How do you compare values in two data frames?

We have two dataframes with a common key column, and we want to compare them to find the matching values, the missing values and, sometimes, the difference between the values for each key.

We will first concatenate the two dataframes into one to see how they look side by side, and then find the differences between them.

However, if you are interested in finding the difference between two dataframes, then read this post.

We will follow these steps to find the difference between a column in two dataframes:

  1. Create two dataframes - df1 and df2
  2. Concatenate the two dataframes side by side
  3. Compare the column in the two dataframes on a common key
  4. Additionally, find the matching rows between the two dataframes
  5. Find the non-matching rows between the two dataframes

Let’s get started. We will first create two test dataframes (df1 and df2) to work with.

Create Two Dataframes

First Dataframe:

The first dataframe has two columns: Items and Sale

import pandas as pd
import numpy as np

# Sale values match the output table below
df1 = pd.DataFrame([['A', 200], ['B', 410]],
                    columns=['Items', 'Sale'])
df1

   Items  Sale
0  A      200
1  B      410

Second Dataframe:

The second dataframe has three columns: Items, Sale and Category

# Sale and Category values match the output table below
df2 = pd.DataFrame([['A', 320, 'Food'], ['B', 550, 'Home'], ['C', 530, 'Furniture']],
                    columns=['Items', 'Sale', 'Category'])
df2

   Items  Sale  Category
0  A      320   Food
1  B      550   Home
2  C      530   Furniture

Concatenate the two dataframes

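We concatenate the two dataframes (df1 and df2) along the columns (axis=1) so we can see them side by side. The keys argument labels each half with the name of its source dataframe, and the result is stored in the variable df. Because df1 has only two rows, its columns are filled with NaN in the third row.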

df = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
df

   df1          df2
   Items  Sale  Items  Sale  Category
0  A      200   A      320   Food
1  B      410   B      550   Home
2  NaN    NaN   C      530   Furniture

Compare the columns in two dataframes

We will find the difference between the Sale values in the two dataframes for each of the Items.

We add a new column called sales-diff that holds the difference between the Sale values in the two dataframes where the Items values match; otherwise the difference is set to 0.

numpy.where(condition, x, y) returns the element from x where the condition is True and the element from y where it is False.
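To see what numpy.where() does in isolation, here is a minimal sketch on plain NumPy arrays. The arrays condition, x and y are made up purely for illustration.

```python
import numpy as np

# Hypothetical inputs: a boolean mask and two candidate arrays of the same length
condition = np.array([True, False, True])
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Pick from x where condition is True, otherwise from y
result = np.where(condition, x, y)
print(result)  # [ 1 20  3]
```

The same pattern applies to dataframe columns: the condition becomes a comparison between two Series, and x and y can be Series or scalars such as 0.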

# Where the Items match, subtract df1's Sale from df2's Sale; otherwise use 0
df['sales-diff'] = np.where(df['df1']['Items'] == df['df2']['Items'],
                            df['df2']['Sale'] - df['df1']['Sale'],
                            0)
df

We now have a new column, sales-diff, that shows the difference between the Sale values in df2 and df1 for each item that appears in both dataframes.

   df1          df2                     sales-diff
   Items  Sale  Items  Sale  Category
0  A      200   A      320   Food       120
1  B      410   B      550   Home       140
2  NaN    NaN   C      530   Furniture  0

Non-matching rows between two dataframes

Let’s find the rows that do not match between the two dataframes (df1 and df2) based on the column Items, i.e. the elements of the Series df1['Items'] that are not in df2['Items'].

df[~df['df1']['Items'].isin(df['df2']['Items'])]

   df1          df2                     sales-diff
   Items  Sale  Items  Sale  Category
2  NaN    NaN   C      530   Furniture  0

Matching rows between two dataframes

We will find the rows that match between the two dataframes (df1 and df2) based on the column Items, i.e. the elements of the Series df1['Items'] that are in df2['Items'].

df[df['df1']['Items'].isin(df['df2']['Items'])]

   df1          df2                     sales-diff
   Items  Sale  Items  Sale  Category
0  A      200   A      320   Food       120
1  B      410   B      550   Home       140

Alternatively, we can use pandas.merge() to merge the two dataframes (df1 and df2) on the column Items with an inner join. An inner join uses the intersection of keys from both dataframes, similar to a SQL inner join, and preserves the order of the left keys.
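A minimal sketch of the merge approach, recreating df1 and df2 with the values from the output tables in this post. Because both dataframes have a Sale column, pandas adds its default suffixes _x (left) and _y (right); the sales-diff column is computed the same way as before, but only for the matching items.

```python
import pandas as pd

# Recreate the two dataframes with the values from the output tables
df1 = pd.DataFrame([['A', 200], ['B', 410]], columns=['Items', 'Sale'])
df2 = pd.DataFrame([['A', 320, 'Food'], ['B', 550, 'Home'], ['C', 530, 'Furniture']],
                   columns=['Items', 'Sale', 'Category'])

# Inner join on the key column: only Items present in both dataframes are kept,
# in the order they appear in df1. Overlapping columns become Sale_x and Sale_y.
merged = pd.merge(df1, df2, on='Items', how='inner')

# Difference between df2's and df1's Sale for each matching item
merged['sales-diff'] = merged['Sale_y'] - merged['Sale_x']
print(merged)
```

Unlike the concatenation approach, the non-matching item C simply drops out of the result instead of appearing with NaN values.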

How do I compare Dataframe column values?

Use numpy.where() and pass it a condition that compares the columns. For example, if 'column1' is less than 'column2' and 'column1' is less than 'column3', we take the value of 'column1'; if the condition fails, we use NaN instead. The results are stored in a new column of the dataframe.
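A minimal sketch of that comparison. The dataframe and its columns column1, column2 and column3 are hypothetical, and the new column name result is chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with three numeric columns
df = pd.DataFrame({'column1': [1, 5, 2],
                   'column2': [3, 4, 6],
                   'column3': [4, 9, 1]})

# True only where column1 is smaller than both column2 and column3
condition = (df['column1'] < df['column2']) & (df['column1'] < df['column3'])

# Keep column1 where the condition holds, otherwise NaN
df['result'] = np.where(condition, df['column1'], np.nan)
print(df)
```

Only the first row satisfies both comparisons, so result holds 1.0 there and NaN in the other two rows. The & operator combines the two boolean Series element-wise; the parentheses are required because & binds more tightly than <.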

How do I compare values in two DataFrames in Python?

Here are the steps for comparing values in two pandas dataframes:
Step 1, dataframe creation: create one dataframe for each of the two datasets.
Step 2, comparison of values: import numpy, which this step needs, and compare the values, for example with numpy.where().
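A minimal sketch of these two steps, assuming two hypothetical dataframes, first and second, that list the same products in the same row order. That assumption matters: comparing two Series with == requires them to have identical index labels.

```python
import numpy as np
import pandas as pd

# Step 1: create the two dataframes (hypothetical prices for the same products)
first = pd.DataFrame({'Product': ['Laptop', 'Phone', 'Tablet'],
                      'Price': [1000, 500, 300]})
second = pd.DataFrame({'Product': ['Laptop', 'Phone', 'Tablet'],
                       'Price': [1000, 450, 300]})

# Step 2: compare the values row by row and label each row
first['price_matches'] = np.where(first['Price'] == second['Price'], 'Yes', 'No')
print(first)
```

If the rows are not in the same order, joining on a key column first (as with pandas.merge() earlier in this post) is the safer choice.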

How do I compare values between two DataFrames in R?

In R, we can use the comparedf() function from the arsenal package. It takes two data frames and compares them, and calling summary() on the result shows to what extent they differ.
