Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Aggregations


• An aggregation function is one that takes multiple individual values and returns a summary. In most cases, this summary is a single value. The most common aggregation functions are the simple average (mean) and the sum of the values.

• Let us consider the following example:

>>> import numpy as np
>>> arr1 = np.array([10, 20, 30, 40, 50])
>>> arr1
array([10, 20, 30, 40, 50])
>>> arr2 = np.array([[0, 10, 20], [30, 40, 50], [60, 70, 80]])
>>> arr2
array([[ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])
>>> arr3 = np.array([[14, 6, 9, -12, 19, 72], [-9, 8, 22, 0, 99, -11]])
>>> arr3
array([[ 14,   6,   9, -12,  19,  72],
       [ -9,   8,  22,   0,  99, -11]])
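
The definition above names the simple average and the sum as the most common aggregations. As a minimal sketch, assuming only NumPy, the following reduces arr1 to each of these summaries and then recomputes both with a plain loop, to show that an aggregation turns many values into one. The variable names are illustrative only.

```python
import numpy as np

arr1 = np.array([10, 20, 30, 40, 50])

# Two common aggregations: each collapses five values into one number.
total = arr1.sum()     # summation
average = arr1.mean()  # simple (arithmetic) average

# The same summaries computed by hand with a plain loop.
manual_total = 0
for value in arr1:
    manual_total += int(value)
manual_average = manual_total / len(arr1)

print(total, average)                # 150 30.0
print(manual_total, manual_average)  # 150 30.0
```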

• NumPy's sum() function calculates the sum of the values in an array. It is available both as an array method, arr.sum(), and as the function np.sum(arr).

arr1.sum()
arr2.sum()
arr3.sum()
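
The method form and the function form of sum() give the same result. The short sketch below, assuming only NumPy, checks this on arr2 and also shows why Python's built-in sum() is not a substitute on a 2-D array: it loops only over the rows and adds them element by element, so it returns one total per column instead of a single grand total.

```python
import numpy as np

arr2 = np.array([[0, 10, 20], [30, 40, 50], [60, 70, 80]])

# The method form and the function form compute the same grand total.
method_total = arr2.sum()
function_total = np.sum(arr2)
print(method_total, function_total)  # 360 360

# Python's built-in sum() iterates over the first axis only, so on a
# 2-D array it adds whole rows together element by element.
builtin_result = sum(arr2)
print(builtin_result)  # column totals: 90, 120, 150
```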

• sum() also accepts an optional axis argument, which computes the sum along a given axis instead of over the whole array. For a 2-D array, axis=0 collapses the rows and returns the sum of each column.

arr2.sum(axis=0)
arr3.sum(axis=0)

• axis=1 collapses the columns and returns the sum of each row.

arr2.sum(axis=1)
arr3.sum(axis=1)

• Running these calls in the interpreter gives the following results:

>>> arr1.sum()
150
>>> arr2.sum()
360
>>> arr3.sum()
217
>>> arr2.sum(axis=0)
array([ 90, 120, 150])
>>> arr3.sum(axis=0)
array([  5,  14,  31, -12, 118,  61])
>>> arr2.sum(axis=1)
array([ 30, 120, 210])
>>> arr3.sum(axis=1)
array([108, 109])
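
To make the meaning of axis concrete, here is a minimal sketch, assuming only NumPy, that recomputes the axis sums of arr3 with explicit loops. It also shows that the axis being summed over disappears from the result's shape, and that summing either partial result again gives the grand total. The loop variables are illustrative names, not part of any API.

```python
import numpy as np

arr3 = np.array([[14, 6, 9, -12, 19, 72], [-9, 8, 22, 0, 99, -11]])

col_sums = arr3.sum(axis=0)  # axis 0 (rows) is collapsed: one sum per column
row_sums = arr3.sum(axis=1)  # axis 1 (columns) is collapsed: one sum per row

print(arr3.shape, col_sums.shape, row_sums.shape)  # (2, 6) (6,) (2,)

n_rows, n_cols = arr3.shape

# Explicit loops that do the same work as the axis argument.
manual_cols = [int(sum(arr3[r, c] for r in range(n_rows))) for c in range(n_cols)]
manual_rows = [int(sum(arr3[r, c] for c in range(n_cols))) for r in range(n_rows)]

print(manual_cols)  # [5, 14, 31, -12, 118, 61]
print(manual_rows)  # [108, 109]

# Summing either partial result again gives the grand total.
print(int(col_sums.sum()), int(row_sums.sum()))  # 217 217
```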

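• Python has built-in min() and max() functions that find the minimum and maximum value of an iterable such as a list or a 1-D array. For NumPy arrays of any shape, the array methods arr.min() and arr.max() (or np.min() and np.max()) are the better choice; like sum(), they accept an axis argument.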

• Python's built-in min() and max() return the smallest and the largest item of a list, respectively. Both can also be called with two or more arguments: min(a, b) returns the smaller of the two values and max(a, b) the larger. This also works when the arguments are lists, which are compared element by element.
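
The following sketch, assuming only NumPy, contrasts the built-in min() and max() described above with NumPy's arr.min() and arr.max() methods. It also shows what goes wrong when the built-in min() is applied to a 2-D array: it iterates over the rows and must decide whether one whole row is smaller than another, which NumPy refuses to answer with a single True or False, so a ValueError is raised.

```python
import numpy as np

# Built-in min()/max() on a list, and the two-argument form.
values = [14, 6, 9, -12, 19, 72]
print(min(values), max(values))  # -12 72
print(min(3, 7), max(3, 7))      # 3 7
print(max([1, 2, 3], [1, 5]))    # [1, 5]  (lists compare element by element)

arr3 = np.array([[14, 6, 9, -12, 19, 72], [-9, 8, 22, 0, 99, -11]])

# NumPy's own methods handle arrays of any shape and accept axis, like sum().
print(arr3.min(), arr3.max())  # -12 99
col_mins = arr3.min(axis=0)    # values -9, 6, 9, -12, 19, -11
row_maxes = arr3.max(axis=1)   # values 72, 99

# The built-in min() compares whole rows of the 2-D array, which fails.
builtin_min_failed = False
try:
    min(arr3)
except ValueError:
    builtin_min_failed = True
print(builtin_min_failed)  # True
```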
