Pandas compare row to previous row

I’m still playing around with Covid vaccine data, this time exploring how the number of doses varies week by week. I want to know how many more (or less) vaccines have been done on a given day compared to that same day last week.

We’ll be using Pandas in this blog post, so let’s first install that library and import it:

Install Pandas

pip install pandas

Import module

import pandas as pd

And now let’s create a DataFrame containing a subset of the data that I’m working with:

df = pd.DataFrame([
    {'date': '2021-01-11', 'vaccinations': 165844.0},
    {'date': '2021-01-18', 'vaccinations': 208641.0},
    {'date': '2021-01-25', 'vaccinations': 281725.0},
    {'date': '2021-02-01', 'vaccinations': 352935.0},
    {'date': '2021-02-08', 'vaccinations': 356291.0}
])

Table 1. Outputdatevaccinations

2021-01-11

165844.0

2021-01-18

208641.0

2021-01-25

281725.0

2021-02-01

352935.0

2021-02-08

356291.0

I’ve filtered the data to include the data for 5 Mondays start from the beginning of January. In the real data set we have the data for every day from January until today.

I want to add a column that shows the number of vaccinations done on the previous week. So:

  • for 2021-01-18, we’d have the value for 2021-01-11

  • for

    import pandas as pd
    0, we’d have the value for
    import pandas as pd
    1

and so on.

It took me a while to find the function to do this, but it turns out that

import pandas as pd
2 is what we want. This function shifts the index by the desired number of periods. We’ll pass in
import pandas as pd
3 because we want to shift every row down by 1:

df["vaccinations"].shift(periods=1)

Note

I should note that for the real DataFrame I had data for every day and therefore wanted to compare the data from 7 rows earlier in the DataFrame,which meant I had to use

import pandas as pd
4. If we wanted to compare the data from 3 days earlier, we could use
import pandas as pd
5, and so on.

Table 2. Outputvaccinations

NaN

165844.0

208641.0

281725.0

352935.0

We can then add a new column to our DataFrame using the following code:

df.loc[:, "lastWeekVaccinations"] = df["vaccinations"].shift(periods=1)

Table 3. OutputdatevaccinationslastWeekVaccinations

2021-01-11

165844.0

NaN

2021-01-18

208641.0

165844.0

2021-01-25

281725.0

208641.0

2021-02-01

352935.0

281725.0

2021-02-08

356291.0

352935.0

If we then wanted to compute the difference between this week and last week, we could compute this by writing the following code:

df["vaccinations"] - df["lastWeekVaccinations"]

Or we could use the

import pandas as pd
6 function, which achieves the same thing:

df.loc[:, "lastWeekVaccinationsDiff"] = df["vaccinations"].diff(periods=1)

Table 4. OutputdatevaccinationslastWeekVaccinationslastWeekVaccinationsDiff

2021-01-11

165844.0

NaN

NaN

2021-01-18

208641.0

165844.0

42797.0

2021-01-25

281725.0

208641.0

73084.0

2021-02-01

352935.0

281725.0

71210.0

2021-02-08

356291.0

352935.0

3356.0

We can also compute the percentage change between the weeks using

import pandas as pd
7:

df.loc[:, "lastWeekVaccinationsChange"] = df["vaccinations"].pct_change(periods=1)

Table 5. OutputdatevaccinationslastWeekVaccinationslastWeekVaccinationsDifflastWeekVaccinationsChange

2021-01-11

165844.0

NaN

NaN

NaN

2021-01-18

208641.0

165844.0

42797.0

0.2580557632473892

2021-01-25

281725.0

208641.0

73084.0

0.3502858977861494

2021-02-01

352935.0

281725.0

71210.0

0.25276422042772206

2021-02-08

356291.0

352935.0

3356.0

0.009508833071245393

With these functions I’ve been able to do exactly what I wanted and could then create a chart based on this data.

  • Email
  • Facebook
  • Twitter
  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket

About the author

I'm currently working on real-time user-facing analytics with Apache Pinot at StarTree. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also I co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.

How do I compare rows in pandas?

Steps to Compare Values Between two Pandas DataFrames.
Step 1: Prepare the datasets to be compared. To start, let's say that you have the following two datasets that you want to compare: ... .
Step 2: Create the two DataFrames. ... .
Step 3: Compare the values between the two Pandas DataFrames..

What does diff () do in pandas?

Pandas DataFrame diff() Method The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.

What is diff () in Python?

diff(arr[, n[, axis]]) function is used when we calculate the n-th order discrete difference along the given axis. The first order difference is given by out[i] = arr[i+1] – arr[i] along the given axis. If we have to calculate higher differences, we are using diff recursively. Syntax: numpy.diff()

How do I compare two column values in pandas?

By using the Where() method in NumPy, we are given the condition to compare the columns. If 'column1' is lesser than 'column2' and 'column1' is lesser than the 'column3', We print the values of 'column1'. If the condition fails, we give the value as 'NaN'. These results are stored in the new column in the dataframe.