Aggregation and Grouping
• Pandas aggregation methods are as follows:
a) count(): Total number of non-null items
b) first(), last(): First and last item
c) mean(), median(): Mean and median
d) min(), max(): Minimum and maximum
e) std(), var(): Standard deviation and variance
f) mad(): Mean absolute deviation (removed in pandas 2.0)
g) prod(): Product of all items
h) sum(): Sum of all items
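As a rough illustration of the methods listed above, the following sketch applies several of them to a small Series. The values are made up for this example and are not from the sample data set. Because mad() is no longer available in pandas 2.0 and later, the mean absolute deviation is computed by hand here.

```python
import pandas as pd

# Hypothetical call durations in seconds (illustrative values only)
durations = pd.Series([10.0, 20.0, 30.0, 40.0], name="duration")

# Apply several aggregation methods at once; the result is a Series
# indexed by the name of each aggregation
summary = durations.agg(
    ["count", "mean", "median", "min", "max", "std", "var", "prod", "sum"]
)
print(summary)

# Mean absolute deviation computed manually, since Series.mad()
# was removed in pandas 2.0
mad = (durations - durations.mean()).abs().mean()
print(mad)
```

Note that std() and var() return the sample statistics by default (ddof=1), which matters for small data sets like this one.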
• The examples that follow use a sample CSV file, phone_data.csv, whose columns include date, duration, item, month and network.
• The date column can be parsed using the extremely handy dateutil library.
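import pandas as pd
import dateutil.parser
# Load data from the CSV file. DataFrame.from_csv() was removed in pandas 1.0;
# read_csv() with index_col=0 keeps its default of using the first column as the index.
data = pd.read_csv('phone_data.csv', index_col=0)
# Convert the date column from strings to datetime objects (day comes first)
data['date'] = data['date'].apply(dateutil.parser.parse, dayfirst=True)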
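The parsing step can be tried without the CSV file. The sketch below uses two made-up day-first date strings (an assumption about the format, not taken from the sample file) and compares the dateutil approach with pandas' own to_datetime() given an explicit format.

```python
import dateutil.parser
import pandas as pd

# Hypothetical day-first date strings (illustrative values only)
raw = pd.Series(["15/10/14 06:58", "01/02/15 13:33"])

# Parse each string with dateutil, treating the first number as the day
parsed_dateutil = raw.apply(dateutil.parser.parse, dayfirst=True)

# Alternative: let pandas parse the whole column with an explicit format
parsed_pandas = pd.to_datetime(raw, format="%d/%m/%y %H:%M")

print(parsed_dateutil)
print(parsed_pandas)
```

An explicit format is usually faster on large columns and fails loudly on unexpected input, while dateutil is more forgiving of mixed formats.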
• Once the data has been loaded into Python, Pandas makes the calculation of different statistics very simple. For example, the mean, maximum, minimum, standard deviation and more can be calculated easily for any column:
# How many rows are in the dataset? (count() counts non-null values)
data['item'].count()
Out[38]: 830
# What was the longest phone call / data entry?
data['duration'].max()
Out[39]: 10528.0
# How many seconds of phone calls are recorded in total?
data['duration'][data['item'] == 'call'].sum()
Out[40]: 92321.0
# How many entries are there for each month?
data['month'].value_counts()
Out[41]:
2014-11    230
2015-01    205
2014-12    157
2015-02    137
2015-03    101
dtype: int64
# Number of non-null unique network entries
data['network'].nunique()
Out[42]: 9
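The same calculations can be reproduced on a small in-memory DataFrame. The rows below are made up for illustration and only mimic the column names used above. The sketch also uses .loc with a boolean mask, which is one common way to write the filtered sum.

```python
import pandas as pd

# Hypothetical rows that mimic the columns of the phone data (illustrative only)
data = pd.DataFrame({
    "item": ["call", "sms", "call", "data", "call"],
    "duration": [120.0, 1.0, 30.0, 34.4, 60.0],
    "month": ["2014-11", "2014-11", "2014-12", "2014-12", "2014-11"],
    "network": ["Vodafone", "Tesco", "Tesco", "data", None],
})

n_items = data["item"].count()            # non-null entries in the column
longest = data["duration"].max()          # largest duration
# Total duration of rows whose item is 'call', selected with a boolean mask
call_seconds = data.loc[data["item"] == "call", "duration"].sum()
per_month = data["month"].value_counts()  # number of entries per month
n_networks = data["network"].nunique()    # unique values, missing ones excluded

print(n_items, longest, call_seconds, n_networks)
print(per_month)
```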
groupby() function
• groupby() essentially splits the data into different groups depending on a variable of the user's choice.
• The groupby() function returns a GroupBy object, which essentially describes how the rows of the original data set have been split. The GroupBy object's groups attribute is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group.
• Functions like max(), min(), mean(), first(), last() can be quickly applied to the GroupBy object to obtain summary statistics for each group.
• The GroupBy object supports column indexing in the same way as the DataFrame and returns a modified GroupBy object.
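The points above can be sketched on a small DataFrame. The rows are made up for illustration and only mimic the column names of the phone data. The sketch groups by month, inspects the groups dictionary, applies first() to every group, and uses column indexing to sum a single column per group.

```python
import pandas as pd

# Hypothetical rows that mimic the phone data (illustrative only)
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12", "2014-12", "2014-12"],
    "item": ["call", "sms", "call", "call", "data"],
    "duration": [120.0, 1.0, 30.0, 60.0, 34.4],
})

# Split the rows into one group per month; this returns a GroupBy object
grouped = data.groupby("month")

# groups maps each group key to the row labels that belong to it
groups = {key: list(labels) for key, labels in grouped.groups.items()}
print(groups)

# First row of each group, one output row per month
first_rows = grouped.first()
print(first_rows)

# Column indexing on the GroupBy object, then a per-group sum
total_duration = grouped["duration"].sum()
print(total_duration)
```

Selecting the column before aggregating avoids computing the statistic for columns that are not needed.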
Foundation of Data Science
CS3352 3rd Semester CSE Dept | 2021 Regulation