Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Aggregation and Grouping

Python Libraries for Data Wrangling



• Pandas aggregation methods are as follows:

a) count(): Total number of non-null items

b) first(), last(): First and last item

c) mean(), median(): Mean and median

d) min(), max(): Minimum and maximum

e) std(), var(): Standard deviation and variance

f) mad(): Mean absolute deviation (deprecated in pandas 1.5 and removed in pandas 2.0)

g) prod(): Product of all items

h) sum(): Sum of all items
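A minimal sketch of these methods on a small Series may help. The values are made up for illustration and do not come from the sample file. Two assumptions are worth flagging: on a plain Series the first and last items are read here with iloc, and the mean absolute deviation is computed by hand because mad() is not available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical values, chosen only for illustration
s = pd.Series([2.0, 4.0, 4.0, 6.0])

summary = {
    "count": s.count(),                   # number of non-null items
    "first": s.iloc[0],                   # first item
    "last": s.iloc[-1],                   # last item
    "mean": s.mean(),
    "median": s.median(),
    "min": s.min(),
    "max": s.max(),
    "std": s.std(),                       # sample standard deviation (ddof=1 by default)
    "var": s.var(),                       # sample variance (ddof=1 by default)
    "mad": (s - s.mean()).abs().mean(),   # mean absolute deviation, computed manually
    "prod": s.prod(),
    "sum": s.sum(),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that std() and var() use the sample formulas (dividing by n - 1) unless ddof is changed.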

• The examples below use a sample CSV file, phone_data.csv, whose columns include date, duration, item, month and network.

• The date column can be parsed using the extremely handy dateutil library.

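import pandas as pd

import dateutil.parser

# Load data from the CSV file, using the first column as the index
# (DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it)

data = pd.read_csv('phone_data.csv', index_col=0)

# Convert the date column from strings to datetimes

data['date'] = data['date'].apply(dateutil.parser.parse, dayfirst=True)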
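To see why dayfirst=True matters, the following sketch parses one ambiguous date string both ways. The string itself is a hypothetical example, not a value taken from the sample file.

```python
from datetime import datetime

import dateutil.parser

# Hypothetical date string that can be read as day/month or month/day
ambiguous = "01/02/2015"

# With dayfirst=True the first number is treated as the day
day_first = dateutil.parser.parse(ambiguous, dayfirst=True)

# By default the first number is treated as the month
month_first = dateutil.parser.parse(ambiguous)

print(day_first)    # 2015-02-01 00:00:00
print(month_first)  # 2015-01-02 00:00:00
```

For a whole column, pd.to_datetime(data['date'], dayfirst=True) is a possible vectorized alternative, assuming the strings are in a format pandas can infer.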

• Once the data has been loaded into Python, Pandas makes it simple to calculate statistics. For example, the count, maximum, sum and number of unique values of a column are each a single method call:

# How many rows are in the dataset?

data['item'].count()

Out[38]: 830

# What was the longest phone call / data entry?

data['duration'].max()

Out[39]: 10528.0

# How many seconds of phone calls are recorded in total?

data.loc[data['item'] == 'call', 'duration'].sum()

Out[40]: 92321.0

# How many entries are there for each month?

data['month'].value_counts()

Out[41]:

2014-11 230

2015-01 205

2014-12 157

2015-02 137

2015-03 101

dtype: int64

# Number of non-null unique network entries

data['network'].nunique()

Out[42]: 9
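The calls above need phone_data.csv to run. As a self-contained sketch, the same operations can be tried on a tiny hand-made DataFrame. Every row and the network names net_a, net_b and net_c are hypothetical; only the column names follow the examples above.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used above (not the real phone_data.csv)
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12", "2014-12", "2015-01"],
    "item": ["call", "data", "call", "sms", "call"],
    "duration": [30.0, 34.0, 120.0, 1.0, 60.0],
    "network": ["net_a", "net_b", "net_c", "net_a", None],
})

# Number of non-null entries in the item column
n_rows = data["item"].count()

# Largest duration across all entries
longest = data["duration"].max()

# Total duration of the rows whose item is 'call'
call_seconds = data.loc[data["item"] == "call", "duration"].sum()

# Number of entries per month
per_month = data["month"].value_counts()

# Number of distinct networks; missing values are not counted
n_networks = data["network"].nunique()

print(n_rows, longest, call_seconds, n_networks)
print(per_month)
```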

groupby() function:

• groupby essentially splits the data into different groups depending on a variable of user choice.

• The groupby() function returns a GroupBy object, which describes how the rows of the original data set have been split. The GroupBy object's groups attribute is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group.

• Functions like max(), min(), mean(), first(), last() can be quickly applied to the GroupBy object to obtain summary statistics for each group.

• The GroupBy object supports column indexing in the same way as the DataFrame and returns a modified GroupBy object.
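The points above can be sketched on a small hand-made DataFrame. The rows are hypothetical and only the column names month, item and duration follow the earlier examples.

```python
import pandas as pd

# Hypothetical rows, used only to illustrate grouping
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12", "2014-12", "2015-01"],
    "item": ["call", "data", "call", "sms", "call"],
    "duration": [30.0, 34.0, 120.0, 1.0, 60.0],
})

# Split the rows into one group per month
grouped = data.groupby("month")

# groups maps each group key to the index labels of the rows in that group
print(grouped.groups)

# A summary function applied to the GroupBy object gives one row per group
first_rows = grouped.first()
print(first_rows)

# Column indexing narrows the GroupBy object to a single column
total_duration = grouped["duration"].sum()
print(total_duration)

# Grouping by a different variable works the same way
max_duration = data.groupby("item")["duration"].max()
print(max_duration)
```

Selecting the column before aggregating, as in grouped["duration"].sum(), avoids computing the statistic for columns that are not needed.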
