2023-02-12    Share on: Twitter | Facebook | HackerNews | Reddit

How to Select Rows From a DataFrame Based on Column Values

To select rows from a pandas DataFrame based on column values, you can use various methods. Here are some of the most common ones:

1. Using Boolean indexing

This involves creating a Boolean condition based on the values in a specific column, and then passing that condition to the DataFrame to select only the rows that meet the condition. For example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Gender': ['F', 'M', 'M', 'M']
})

# Select only the rows where Gender is 'M'
male_df = df[df['Gender'] == 'M']
print(male_df)

This will output:

      Name  Age Gender
1      Bob   30      M
2  Charlie   35      M
3    David   40      M

2. Using the query() method

This method allows you to select rows based on a more complex condition using a string expression. For example:

# Select only the rows where Age is greater than 30
over_30_df = df.query('Age > 30')
print(over_30_df)

This will output:

      Name  Age Gender
2  Charlie   35      M
3    David   40      M

3. Using the loc() method

This method allows you to select rows based on a specific label or index. For example:

# Set the Name column as the index
df.set_index('Name', inplace=True)

# Select only the row for Bob
bob_df = df.loc['Bob']
print(bob_df)

This will output:

Age       30
Gender     M
Name: Bob, dtype: object

4. Using the iloc() method

This method allows you to select rows based on their integer position in the DataFrame. For example:

# Select the first two rows
first_two_df = df.iloc[:2]
print(first_two_df)

This will output:

      Age Gender
Name            
Alice   25      F
Bob     30      M

There are some other methods, see stackoverflow answers to question: How do I select rows from a DataFrame based on column values?