Combining Datasets
• Whether the task is to concatenate several datasets from different CSV files or to merge sets of aggregated data from different Google Analytics accounts, combining data from various sources is critical to drawing the right conclusions and extracting the most value from data analytics.
• When using pandas, data scientists often have to concatenate multiple pandas DataFrames, either vertically (adding rows) or horizontally (adding columns).
DataFrame.append
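• This method returns a new DataFrame in which the rows of another DataFrame are added below those of the existing one. Columns with matching names are concatenated together, while columns with different labels are filled with NA. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, where pandas.concat takes its place.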
>>> df1
   ints  bools
0     0   True
1     1  False
2     2   True
>>> df2
   ints  floats
0     3     1.5
1     4     2.5
2     5     3.5
>>> df1.append(df2)
   ints  bools  floats
0     0   True     NaN
1     1  False     NaN
2     2   True     NaN
0     3    NaN     1.5
1     4    NaN     2.5
2     5    NaN     3.5
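The listing above can be reproduced as a short runnable sketch. Since DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, the sketch assumes a recent pandas and swaps in pd.concat([df1, df2]), which should give the same row-wise result. The frame contents are copied from the listing; the variable name `combined` is only illustrative.

```python
import pandas as pd

# Frames with one shared column ("ints") and one column unique to each.
df1 = pd.DataFrame({"ints": [0, 1, 2], "bools": [True, False, True]})
df2 = pd.DataFrame({"ints": [3, 4, 5], "floats": [1.5, 2.5, 3.5]})

# Row-wise combination; stands in for the older df1.append(df2).
# Columns missing from one frame are filled with NaN in its rows.
combined = pd.concat([df1, df2])

print(combined)
```

Note that the original row labels (0, 1, 2) are kept from both inputs, so the resulting index contains duplicates.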
• In addition, DataFrame.append provides further options, such as resetting the resulting index (ignore_index), sorting the resulting columns (sort), or raising an error when the resulting index contains duplicate labels (verify_integrity).
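The three options mentioned above can be sketched as follows. This is a minimal illustration, assuming a recent pandas in which DataFrame.append is no longer available; it uses pd.concat, which accepts the same keyword names (ignore_index, sort, verify_integrity). The variable names `renumbered`, `sorted_cols`, and `duplicate_error` are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"ints": [0, 1, 2], "bools": [True, False, True]})
df2 = pd.DataFrame({"ints": [3, 4, 5], "floats": [1.5, 2.5, 3.5]})

# ignore_index=True discards the original labels and renumbers the rows 0..n-1.
renumbered = pd.concat([df1, df2], ignore_index=True)

# sort=True orders the non-concatenation axis (here, the column labels).
sorted_cols = pd.concat([df1, df2], sort=True)

# verify_integrity=True raises ValueError if the resulting index has duplicates,
# which is the case here because both frames use the labels 0, 1, 2.
try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(renumbered.index.tolist())
print(sorted_cols.columns.tolist())
print(type(duplicate_error).__name__)
```

Combining ignore_index=True with the default settings is usually the simplest way to avoid duplicate row labels after stacking frames.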
pandas.concat
• We can concatenate DataFrames both vertically (axis=0) and horizontally (axis=1) by using the pandas.concat function. Unlike DataFrame.append, pandas.concat is not a method but a function that takes a list of objects as input. As with DataFrame.append, columns with different labels are filled with NA values.
>>> df3
   bools  floats
0  False     4.5
1   True     5.5
2  False     6.5
>>> pd.concat([df1, df2, df3])
   ints  bools  floats
0   0.0   True     NaN
1   1.0  False     NaN
2   2.0   True     NaN
0   3.0    NaN     1.5
1   4.0    NaN     2.5
2   5.0    NaN     3.5
0   NaN  False     4.5
1   NaN   True     5.5
2   NaN  False     6.5
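The listing above shows only the vertical case (axis=0). As a complement, here is a hedged sketch of horizontal concatenation (axis=1) using the same frames. The keys= and join= arguments are not discussed in the text above; they are real pd.concat parameters, included here as optional extras, and the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"ints": [0, 1, 2], "bools": [True, False, True]})
df2 = pd.DataFrame({"ints": [3, 4, 5], "floats": [1.5, 2.5, 3.5]})
df3 = pd.DataFrame({"bools": [False, True, False], "floats": [4.5, 5.5, 6.5]})

# axis=1 places the frames side by side, aligning rows on the index labels.
# Both inputs have a "bools" column, so that label appears twice in the result.
side_by_side = pd.concat([df1, df3], axis=1)

# keys= adds an outer column level so the two "bools" columns stay distinguishable.
labelled = pd.concat([df1, df3], axis=1, keys=["df1", "df3"])

# join="inner" on a vertical concat keeps only the columns shared by all inputs,
# so no NaN filling is needed.
shared_only = pd.concat([df1, df2], join="inner")

print(side_by_side)
print(labelled)
print(shared_only)
```

Horizontal concatenation aligns on the row index rather than on position, so frames whose indexes differ would produce NaN-filled rows in the same way that mismatched columns do in the vertical case.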
Foundation of Data Science (CS3352): Unit IV: Python Libraries for Data Wrangling: Combining Datasets
3rd Semester CSE Dept | 2021 Regulation