Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Basics of Numpy Arrays

Python Libraries for Data Wrangling


• A NumPy array is a powerful N-dimensional array object; in the two-dimensional case its elements are arranged in rows and columns. We can initialize NumPy arrays from nested Python lists and access their elements. A NumPy array is a collection of elements that all have the same data type.

• A one-dimensional NumPy array can be thought of as a vector, a two-dimensional array as a matrix (i.e., a set of vectors), and a three-dimensional array as a tensor (i.e., a set of matrices).

• To define an array manually, we can use the np.array() function.

• Basic array manipulations are as follows :

1. Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.

2. Indexing of arrays: Getting and setting the value of individual array elements.

3. Slicing of arrays: Getting and setting smaller subarrays within a larger array.

4. Reshaping of arrays: Changing the shape of a given array.

5. Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many.
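Before going through these manipulations, here is a minimal sketch of creating arrays with np.array() from plain and nested Python lists. The variable names (vector, matrix, mixed) are illustrative and do not come from the original notes.

```python
import numpy as np

# One-dimensional array (vector) built from a flat list
vector = np.array([1, 2, 3])

# Two-dimensional array (matrix) built from a nested list:
# each inner list becomes one row
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector.ndim, matrix.ndim)  # number of dimensions: 1 and 2
print(matrix[0, 2])              # element in row 0, column 2 -> 3

# All elements share one data type: mixing ints and floats
# upcasts every value to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64
```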

a) Attributes of array

• In Python, arrays from the NumPy library, called N-dimensional arrays or ndarrays, are used as the primary data structure for representing data.

• When working with NumPy, data in an ndarray is simply referred to as an array. It is a fixed-size array in memory that contains data of the same type, such as integers or floating-point values.

• The data type of an array's elements can be accessed via the "dtype" attribute. The dimensions of an array can be accessed via the "shape" attribute, which returns a tuple giving the length of each dimension.

• Array attributes are essential to find out the shape, dimension, item size etc.

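• ndarray.shape: This attribute (it is not a method) gives the array dimensions as a tuple. Assigning a new tuple to it reshapes the array in place, provided the total number of elements stays the same, although np.reshape() is the more usual way to do this. Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total number of elements in the array).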

• ndarray.size: The total number of elements of the array. This is equal to the product of the elements of the array's shape.

• ndarray.dtype: An object describing the data type of the elements in the array. Recall that NumPy's N-dimensional arrays are homogeneous: all elements of an array have the same data type.
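The attributes above can be inspected as in the following sketch. The array arr is an illustrative example; the dtype is set explicitly to int64 so that the item size is the same on every platform. The itemsize and nbytes attributes are not named in the text above, but they are what report the memory consumption it mentions.

```python
import numpy as np

# A 3 x 4 array of 64-bit integers (dtype fixed explicitly)
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.ndim)      # number of dimensions -> 2
print(arr.shape)     # length of each dimension -> (3, 4)
print(arr.size)      # total number of elements -> 12 (= 3 * 4)
print(arr.dtype)     # data type of the elements -> int64
print(arr.itemsize)  # bytes per element -> 8
print(arr.nbytes)    # total bytes of data -> 96 (= size * itemsize)
```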

b) Indexing of arrays

• Array indexing always uses square brackets ("[ ]") to index the elements of the array. To access a single element of an array, we refer to it by its index.

• Fig. 4.4.1 shows the indexing of a one-dimensional ndarray.

>>> import numpy as np
>>> a = np.arange(25, 31)
>>> a
array([25, 26, 27, 28, 29, 30])
>>> a[3]
28

• NumPy arrays also accept negative indexes, which count backwards from the end of the array: -1 refers to the last element, -2 to the second-to-last, and so on.

>>> a[-1]
30
>>> a[-6]
25

• In a multidimensional array, we can access items using a comma-separated tuple of indices. To select multiple items at once, we can pass a list or array of indexes within the square brackets.

>>> a[[1, 3, 4]]
array([26, 28, 29])

• Moving on to the two-dimensional case, namely, the matrices, they are represented as rectangular arrays consisting of rows and columns, defined by two axes, where axis 0 is represented by the rows and axis 1 is represented by the columns. Thus, indexing in this case is represented by a pair of values : the first value is the index of the row and the second is the index of the column.

• Fig. 4.4.2 shows the indexing of a bi-dimensional array.

>>> A = np.arange(10, 19).reshape((3, 3))
>>> A
array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])
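As a minimal sketch of the row and column indexing just described, the following reads and sets individual elements of the same 3 x 3 array A. The particular positions chosen are illustrative.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

# Getting: first value is the row index, second is the column index
print(A[1, 2])   # row 1, column 2 -> 15
print(A[-1, 0])  # last row, first column -> 16

# A single index selects a whole row
print(A[0])      # first row -> [10 11 12]

# Setting: assignment through an index changes the array in place
A[0, 0] = 99
print(A[0, 0])   # -> 99
```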

c) Slicing of arrays

• Slicing is the operation that extracts portions of an array to generate new ones. Whereas slicing a Python list produces a copy, slicing a NumPy array produces a view onto the same underlying buffer.

• Slicing an array means accessing sub-parts of it. These sub-parts can be stored in other variables and further modified; because a slice is a view, modifying it also modifies the original array.

• To extract or view a portion of an array, use the slice syntax: a sequence of numbers separated by colons (':') within the square brackets.

• Syntax: arr[start:stop:step] or, equivalently, arr[slice(start, stop, step)]

• The start parameter is the index of the first element, stop is the index at which the slice ends (the element at stop itself is not included), and step is the increment between consecutive indexes. If any of these are unspecified, they default to start=0, stop=size of dimension, step=1.

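import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr[1:3:2])  # from index 1 up to (not including) 3, in steps of 2
print(arr[:3])     # first three elements
print(arr[::2])    # every second element

Output:

[2]
[1 2 3]
[1 3]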
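The difference between NumPy views and Python list copies can be checked with a short sketch like the one below. The variable names are illustrative; np.shares_memory() and the .copy() method are standard NumPy calls that the notes above do not mention.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# A NumPy slice is a view: it shares memory with the original array
view = arr[1:3]
view[0] = 20
print(arr[1])                       # the original now holds 20 at index 1
print(np.shares_memory(arr, view))  # True

# An explicit copy is independent of the original
independent = arr[1:3].copy()
independent[0] = -1
print(arr[1])                       # still 20

# A Python list slice is already a copy
py_list = [1, 2, 3, 4]
sub_list = py_list[1:3]
sub_list[0] = 20
print(py_list)                      # unchanged: [1, 2, 3, 4]
```

Calling .copy() on a slice is the usual way to get a subarray that can be modified without touching the original.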

Multidimensional sub-arrays:

• Multidimensional slices work in the same way, with multiple slices separated by commas. For example:

In[24]: x2
Out[24]: array([[12,  5,  2,  4],
                [ 7,  6,  8,  8],
                [ 1,  6,  7,  7]])

In[25]: x2[:2, :3] # two rows, three columns
Out[25]: array([[12,  5,  2],
                [ 7,  6,  8]])

In[26]: x2[:3, ::2] # all rows, every other column
Out[26]: array([[12,  2],
                [ 7,  8],
                [ 1,  7]])

• Let us create an array using the NumPy package and access its columns.

# Creating an array
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

• Now let us access the elements column-wise. To do this, the colon (:) symbol is used in place of the row index, which selects every row. Let us see that with an example.

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a[:, 1])  # all rows, column 1

Output:

[2 5 8]
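The same colon notation selects rows and rectangular sub-blocks as well as columns. The following is a small sketch on the same array a; the particular slices are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a[1, :])    # row 1, all columns -> [4 5 6]
print(a[:, 0])    # all rows, column 0 -> [1 4 7]

# First two rows, last two columns: a 2 x 2 sub-block
block = a[:2, 1:]
print(block.shape)  # (2, 2)

# Slicing with 1:2 instead of the integer 1 keeps the column two-dimensional
column_2d = a[:, 1:2]
print(column_2d.shape)  # (3, 1)
```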

d) Reshaping of array

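• The numpy.reshape() function is used to give a NumPy array a new shape without changing its data.

• Syntax:

numpy.reshape(a, newshape, order='C')

where order is one of {'C', 'F', 'A'} and is optional. It sets the index order used both to read the elements of a and to place them into the reshaped array: 'C' means row-major (C-like) order, 'F' means column-major (Fortran-like) order, and 'A' means Fortran-like order if a is Fortran-contiguous in memory and C-like order otherwise. Recent NumPy releases name the second parameter shape rather than newshape; passing it positionally works in either case.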

Step 1: Create a NumPy array of shape (8,)

num_array = np.array([1, 2, 3, 4, 5, 6, 7, 8])
num_array

Output:

array([1, 2, 3, 4, 5, 6, 7, 8])

Step 2: Use the np.reshape() function with the new shape (4, 2)

np.reshape(num_array, (4, 2))

Output:

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

• The result has shape (4, 2). It is a 2-D array that contains the same data as the original 1-D input array; np.reshape() returns the reshaped array and leaves the shape of num_array itself unchanged.
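To illustrate the order parameter, the following sketch reshapes the same eight-element array with the default order='C' and with order='F', and shows what happens when the requested shape does not match the number of elements. The variable names c_order and f_order are illustrative.

```python
import numpy as np

num_array = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Default order='C': elements fill the result row by row
c_order = np.reshape(num_array, (4, 2))
print(c_order)  # rows: [1 2], [3 4], [5 6], [7 8]

# order='F': elements fill the result column by column
f_order = np.reshape(num_array, (4, 2), order='F')
print(f_order)  # rows: [1 5], [2 6], [3 7], [4 8]

# The new shape must hold exactly the same number of elements
try:
    np.reshape(num_array, (3, 3))
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("reshape failed:", err)
```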

e) Array concatenation and splitting

• The np.concatenate() function is used to concatenate, or join, two or more arrays into one. The only required argument is a list or tuple of arrays.

# first, import numpy
import numpy as np

# making two arrays to concatenate
arr1 = np.arange(1, 4)
arr2 = np.arange(4, 7)

print("Arrays to concatenate:")
print(arr1)
print(arr2)

print("After concatenation:")
print(np.concatenate([arr1, arr2]))

Output:

Arrays to concatenate:
[1 2 3]
[4 5 6]
After concatenation:
[1 2 3 4 5 6]
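The example above covers joining only. As a hedged sketch of the splitting half of this topic, the following uses np.split(), a standard NumPy function that the notes above do not show, and also joins two-dimensional arrays along each axis with the axis argument of np.concatenate(). All variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 7)  # [1 2 3 4 5 6]

# Split into two equal parts
first, second = np.split(arr, 2)
print(first, second)   # [1 2 3] [4 5 6]

# Split at explicit indexes: before index 2, from 2 to 4, from 4 onwards
head, middle, tail = np.split(arr, [2, 4])
print(head, middle, tail)  # [1 2] [3 4] [5 6]

# For 2-D arrays, axis chooses the direction of the join
grid = np.array([[1, 2],
                 [3, 4]])
stacked_rows = np.concatenate([grid, grid], axis=0)  # one below the other
stacked_cols = np.concatenate([grid, grid], axis=1)  # side by side
print(stacked_rows.shape)  # (4, 2)
print(stacked_cols.shape)  # (2, 4)
```

Concatenating the pieces returned by np.split() in their original order gives back the original array.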
