Merging data-frames¶
Panada is very useful for merging dataset; to merging data consider the following data sets, where 'id1' and 'id2' include the ids of data.
raw_data = {'id1': range(4),'income': [10,12,14,16]}
dat1 =pd.DataFrame(raw_data, columns = ['id1', 'income'])
raw_data = {'id2': range(6),'pay': [9,11,13,15,17,19]}
dat2 =pd.DataFrame(raw_data, columns = ['id2', 'pay'])
Obviously the id variables are not the same, they can be compared using
pd.merge can merge different data-frames, the merging can be done based on the identities of left dataset, if there is no match in the right file, Python adds NaN.
Since the ids are not the same, one can do merging based on the intersection of the ids,
Merging can also be done based on the union of the ids,
Note: If the names of id variables are the same in the both datasets, you can use on=id_name instead of left_on= and right_on=.
Note: if you want to identify where the elements in rows are from, add argument indicator=True, then new column named _merge would be added to the merged data which shows its originate.
To combine datasets row-wisely, use concat,