Sunday, 8 September 2013

Counting similarities among two pandas dataframes

Counting similarities among two pandas dataframes

I am relatively new to python, but have been using Wes Kinney's, "Python
for Data Analysis" religiously for the last few weeks. I have spent hours
trying to figure out a solution to my current issue, but I need some help.
I have a data set containing details for shipments that were sent in this
calendar year, however; as I get new data each month, some details may
have changed. I have figured out how to identify the shipments that have
changed, as well as how to identify what those changes may be.
So, let's say I've identified that this shipment (in dataframe, original)
has changed:
ID Code Mode Amount From To Weight Cube
Service_Date
MNO123 BBB Air 50 M1234 M9876 60 6 1-1-2013
And I've identified the potential changes (in a dataframe, changes) as
ID Code Mode Amount From To Weight Cube
Service_Date
MNO123 BBB Air 50 M1234 M9876 60 6 2-2-2013
MNO123 BBB Air 60 M1234 M9876 60 6 2-2-2013
MNO123 BBB Air 70 M1234 M1111 60 6 2-2-2013
All I'm trying to do is add a count column to the changes dataframe which
totals the amount of values that match the respective values in the
original dataframe. So, because code, mode, amount, from, to, weight, and
cube match, the count column would get the value of 7 for the first
observation. Similarly, but having one less matching value, the second
observation would have a count value of 6 and the third would have a count
value of 5.
Having gone through Wes' book and numerous somewhat similar posts on this
site, I believe I need to be using df.iterrows(), but I am struggling with
the process to iterate over both dataframes while also checking and
counting for matching values.
This was my latest attempt:
for i in changes.iterrows():
for i in original.iterrows():
changes['count'] = 0
if changes(i) == original(i):
changes['count'] +=1
Thanks in advance for your time and efforts!

No comments:

Post a Comment