You could use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)defines C to be equal to A where cond is True, and B where cond is False.
import numpy as np import pandas as pd a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']] df = pd.DataFrame(a, columns=['one', 'two', 'three']) df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']) , df['one'], np.nan)yields
one two three que 0 10 1.2 4.2 10 1 15 70 0.03 NaN 2 8 5 0 NaNIf you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [ (df['one'] >= df['two']) & (df['one'] <= df['three']), df['one'] < df['two']] choices = [df['one'], df['two']] df['que'] = np.select(conditions, choices, default=np.nan)yields
If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to
conditions = [ df['one'] < df['two'], df['one'] <= df['three']] choices = [df['two'], df['one']](The assumption may not be true if df['one'] or df['two'] contain NaNs.)
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']] df = pd.DataFrame(a, columns=['one', 'two', 'three'])defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)This changes the results, however, since strings compare character-by-character, while floats are compared numerically.
Often you may want to compare two columns in a Pandas DataFrame and write the results of the comparison to a third column.
You can easily do this by using the following syntax:
conditions=[(condition1),(condition2)] choices=["choice1","choice2"] df["new_column_name"]=np.select(conditions, choices, default)Here’s what this code does:
- conditions are the conditions to check for between the two columns
- choices are the results to return based on the conditions
- np.select is used to return the results to the new column
The following example shows how to use this code in practice.
Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:
import numpy as np import pandas as pd #create DataFrame df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5], 'B_points': [4, 5, 2, 3, 2]}) #view DataFrame df A_points B_points 0 1 4 1 3 5 2 3 2 3 3 3 4 5 2We can use the following code to compare the number of goals by row and output the winner of the match in a third column:
#define conditions conditions = [df['A_points'] > df['B_points'], df['A_points'] < df['B_points']] #define choices choices = ['A', 'B'] #create new column in DataFrame that displays results of comparisons df['winner'] = np.select(conditions, choices, default='Tie') #view the DataFrame df A_points B_points winner 0 1 4 B 1 3 5 B 2 3 2 A 3 3 3 Tie 4 5 2 AThe results of the comparison are shown in the new column called winner.
Notes
Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:
- The number of conditions and choices should be equal.
- The default value specifies the value to display in the new column if none of the conditions are met.
- Both NumPy and Pandas are required to make this code work.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Rename Columns in Pandas
How to Add a Column to a Pandas DataFrame
How to Change the Order of Columns in Pandas DataFrame