Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Comparisons, Masks and Boolean Logic

Python Libraries for Data Wrangling

Masking means extracting, modifying, counting, or otherwise manipulating the values in an array based on some criterion.


• Boolean masking, also called boolean indexing, is a NumPy feature for filtering the values of an array. There are two main ways to carry out boolean masking:

a) Method one: Returning the result array.

b) Method two: Returning a boolean array.
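• The two methods can be sketched as follows. This is a minimal illustration, not a definitive recipe: the array values and the variable names (values, mask, result) are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = np.array([3, 8, 1, 9, 4])

# Method two: the comparison on its own returns a boolean array (the mask).
mask = values > 3
print(mask)      # [False  True False  True  True]

# Method one: indexing with the mask returns the result array,
# which holds only the elements where the mask is True.
result = values[mask]
print(result)    # [8 9 4]
```

In practice the two steps are often written together as values[values > 3].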

Comparison operators as ufuncs

• NumPy implements the comparison operators as element-wise ufuncs. All six of the standard comparison operations are available, and the result is always an array with a Boolean data type. For example, we might wish to count all values greater than a certain value, or remove all outliers that are above some threshold. In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.

import numpy as np

x = np.array([1, 2, 3, 4, 5])

print(x < 3)   # less than

print(x > 3)   # greater than

print(x <= 3)  # less than or equal

print(x >= 3)  # greater than or equal

print(x != 3)  # not equal

print(x == 3)  # equal
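• The counting and outlier-removal tasks mentioned above might look like the sketch below. The data values and the threshold of 100 are hypothetical, picked only to make the behaviour visible.

```python
import numpy as np

# Hypothetical measurements; 250 is treated as an outlier.
data = np.array([12, 15, 250, 9, 18, 14])
threshold = 100  # assumed cut-off, for illustration only

# True counts as 1, so summing a mask counts the matching elements.
count_above_10 = np.sum(data > 10)
print(count_above_10)   # 5

# Keep only the values at or below the threshold.
cleaned = data[data <= threshold]
print(cleaned)          # [12 15  9 18 14]
```

np.count_nonzero(data > 10) gives the same count and states the intent more directly.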

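• Comparison operators and their equivalent ufuncs:

Operator    Equivalent ufunc

==          np.equal

!=          np.not_equal

<           np.less

<=          np.less_equal

>           np.greater

>=          np.greater_equal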
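• The correspondence between each operator and its ufunc can be checked directly. The sketch below reuses the array x from the earlier example; the list name pairs is an assumption introduced for the illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Each comparison operator is shorthand for a NumPy ufunc.
pairs = [
    (x == 3, np.equal(x, 3)),
    (x != 3, np.not_equal(x, 3)),
    (x < 3, np.less(x, 3)),
    (x <= 3, np.less_equal(x, 3)),
    (x > 3, np.greater(x, 3)),
    (x >= 3, np.greater_equal(x, 3)),
]

for via_operator, via_ufunc in pairs:
    # Both forms produce the same boolean array.
    assert np.array_equal(via_operator, via_ufunc)

print(np.less(x, 3))   # [ True  True False False False]
```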

Boolean array:

• A boolean array is a NumPy array with boolean (True/False) values. Such an array can be obtained by applying a comparison operator to another NumPy array:

import numpy as np

a = np.reshape(np.arange(16), (4,4)) # create a 4x4 array of integers

print(a)

[[ 0 1 2 3]

[ 4 5 6 7]

[ 8 9 10 11]

[12 13 14 15]]

large_values = (a > 10) # test which elements of a are greater than 10

print(large_values)

[[False False False False]

 [False False False False]

 [False False False  True]

 [ True  True  True  True]]

even_values = (a%2==0) # test which elements of a are even

print(even_values)

[[ True False  True False]

 [ True False  True False]

 [ True False  True False]

 [ True False  True False]]
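• Once a boolean array exists, it can be used as a mask to extract or modify the matching elements. The sketch below rebuilds a and even_values so that it runs on its own; the name modified and the replacement value -1 are assumptions made for the illustration.

```python
import numpy as np

a = np.reshape(np.arange(16), (4, 4))
even_values = (a % 2 == 0)

# Selecting with a mask returns a flat, one-dimensional array.
print(a[even_values])    # [ 0  2  4  6  8 10 12 14]

# Assigning through a mask changes only the selected positions.
# A copy is used here so that the original array a stays unchanged.
modified = a.copy()
modified[even_values] = -1
print(modified[0])       # [-1  1 -1  3]
```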

Logical operations on boolean arrays

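• Boolean arrays can be inverted and combined using the element-wise logical operators ~ (not), & (and), | (or) and ^ (xor). For example, ~ inverts every value of a boolean array: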

b = ~(a%3 == 0) # test which elements of a are not divisible by 3

print('array a:\n{}\n'.format(a))

print('array b:\n{}'.format(b))

array a:

[[ 0 1 2 3]

[ 4 5 6 7]

[ 8 9 10 11]

[12 13 14 15]]

array b:

[[False  True  True False]

 [ True  True False  True]

 [ True False  True  True]

 [False  True  True False]]
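• Two conditions can be combined in the same way with & and |. The sketch below reuses the 4x4 array a; the variable names even_and_large and small_or_large are assumptions introduced for the example.

```python
import numpy as np

a = np.reshape(np.arange(16), (4, 4))

# Parentheses are required: & and | bind more tightly than comparisons.
even_and_large = (a % 2 == 0) & (a > 10)
small_or_large = (a < 2) | (a > 13)

print(a[even_and_large])   # [12 14]
print(a[small_or_large])   # [ 0  1 14 15]
```

Python's keywords and / or cannot be used here: they try to evaluate the whole array as a single truth value, which raises a ValueError for arrays with more than one element.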
