How do you compare two columns in a data frame?

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not be true if df['one'] or df['two'] contain NaNs.)


Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False


Often you may want to compare two columns in a Pandas DataFrame and write the results of the comparison to a third column.

You can easily do this by using the following syntax:

conditions=[(condition1),(condition2)]
choices=["choice1","choice2"]

df["new_column_name"]=np.select(conditions, choices, default)

Here’s what this code does:

  • conditions are the conditions to check for between the two columns
  • choices are the results to return based on the conditions
  • np.select is used to return the results to the new column

The following example shows how to use this code in practice.

Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:

import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})
             
#view DataFrame      
df

          A_points  B_points
0         1         4
1         3         5
2         3         2
3         3         3
4         5         2

We can use the following code to compare the number of goals by row and output the winner of the match in a third column:

#define conditions
conditions = [df['A_points'] > df['B_points'], 
              df['A_points'] < df['B_points']]

#define choices
choices = ['A', 'B']

#create new column in DataFrame that displays results of comparisons
df['winner'] = np.select(conditions, choices, default='Tie')

#view the DataFrame
df

          A_points  B_points  winner
0         1         4         B
1         3         5         B
2         3         2         A
3         3         3         Tie
4         5         2         A

The results of the comparison are shown in the new column called winner.

Notes

Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:

  • The number of conditions and choices should be equal.
  • The default value specifies the value to display in the new column if none of the conditions are met.
  • Both NumPy and Pandas are required to make this code work.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Rename Columns in Pandas
How to Add a Column to a Pandas DataFrame
How to Change the Order of Columns in Pandas DataFrame

How do you compare values in two data frames?

Steps to Compare Values Between two Pandas DataFrames.
Step 1: Prepare the datasets to be compared. To start, let's say that you have the following two datasets that you want to compare: ... .
Step 2: Create the two DataFrames. ... .
Step 3: Compare the values between the two Pandas DataFrames..

How do I compare two columns to match in R?

We can compare two columns in R by using ifelse(). This statement is used to check the condition given and return the data accordingly.

How do you find the difference between two columns in pandas?

Difference between rows or columns of a pandas DataFrame object is found using the diff() method. The axis parameter decides whether difference to be calculated is between rows or between columns.

How do I find the difference in columns in R?

Approach.
Create a dataframe and the columns should be of numeric or integer data type so that we can find the difference between them..
Extract required data from columns using the $ operator into separate variables. ... .
Then perform the minus operation for the difference between those columns..
Finally, print the result..