Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Structured Arrays

Python Libraries for Data Wrangling

• A structured NumPy array is an array of structures. NumPy arrays are homogeneous, i.e. every element must have the same type. So, instead of creating a NumPy array of int or float, we can create an array in which every element is a structure of the same type.

• First of all, import the numpy module, i.e.

    import numpy as np

• Now, to create a structured NumPy array, we can pass a list of tuples containing the structure elements, i.e.

[('Ram', 22.2, 3), ('Rutu', 39.4, 5), ('Rupu', 55.5, 6), ('Iresh', 99.9, 7)]

• But since the elements of a NumPy array are homogeneous, how are the size and type of each structure decided? For that, we pass the type of the structure, i.e. its schema, in the dtype parameter.

• Let's create a dtype for above structure i.e.

    # Creating the type of a structure

    dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]

• Let's create a numpy array based on this dtype i.e.

    # Creating a structured NumPy array

    structuredArr = np.array([('Ram', 22.2, 3), ('Rutu', 39.4, 5), ('Rupu', 55.5, 6), ('Iresh', 99.9, 7)], dtype=dtype)

• It will create a structured numpy array and its contents will be,

    [('Ram', 22.2, 3), ('Rutu', 39.4, 5), ('Rupu', 55.5, 6), ('Iresh', 99.9, 7)]

• Let's check the data type of the array created above:

    print(structuredArr.dtype)

      Output:

       [('Name', '<U10'), ('Marks', '<f8'), ('GradeLevel', '<i4')]
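The steps above can be put together into one runnable sketch. The field access by name at the end (structuredArr['Name'], first['GradeLevel']) is not covered in the text above; it is added here only to illustrate what the named schema makes possible.

```python
import numpy as np

# Same schema as in the text: a 10-character string, a float and an integer.
dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]

structuredArr = np.array(
    [('Ram', 22.2, 3), ('Rutu', 39.4, 5), ('Rupu', 55.5, 6), ('Iresh', 99.9, 7)],
    dtype=dtype,
)

# Selecting a field by name returns an ordinary array of that field's type.
names = structuredArr['Name']
marks = structuredArr['Marks']

# Indexing by position returns one record; its fields are again accessed by name.
first = structuredArr[0]

print(structuredArr.dtype.names)   # the field names, in order
print(names)
print(marks.mean())
print(first['Name'], first['GradeLevel'])
```

Every record has the same layout, so the array is still homogeneous even though each record mixes a string, a float and an integer.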

Creating structured arrays:

• Structured array data types can be specified in a number of ways.

1. Dictionary method :

    np.dtype({'names': ('name', 'age', 'weight'),

              'formats': ('U10', 'i4', 'f8')})

    Output: dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

2. Numerical types can be specified with Python types or NumPy dtypes instead :

    np.dtype({'names': ('name', 'age', 'weight'),

              'formats': ((np.str_, 10), int, np.float32)})

    Output: dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])

3. A compound type can also be specified as a list of tuples :

     np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

    Output: dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
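As a minimal sketch of how these forms relate, the example below builds the same schema with the dictionary method and with a list of tuples, then compares the two. It uses 'U10' for the name field in both so the layouts match. The variable names dt_dict, dt_list and people, and the age and weight values, are illustrative assumptions and do not come from the text above.

```python
import numpy as np

# Dictionary form: parallel sequences of field names and formats.
dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('U10', 'i4', 'f8')})

# List-of-tuples form: one (name, format) pair per field.
dt_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Both describe the same record layout.
print(dt_dict == dt_list)
print(dt_dict.names)
print(dt_dict.itemsize)   # bytes per record: 10 * 4 (U10) + 4 (i4) + 8 (f8)

# An empty array with this dtype can be filled field by field.
# The values below are made up for illustration.
people = np.zeros(2, dtype=dt_list)
people['name'] = ['Ram', 'Rutu']
people['age'] = [22, 39]
people['weight'] = [55.0, 61.5]
print(people)
```

Note that 'S10' in the third form is a 10-byte byte string, whereas 'U10' is a 10-character unicode string, so the two are not interchangeable.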

NumPy data types:

• Below is a list of the kinds of data type available in NumPy and the character that represents each one.

1) i - integer

2) b - boolean

3) u - unsigned integer

4) f - float

5) c - complex float

6) m - timedelta

7) M - datetime

8) O - object

9) S - string (byte string)

10) U - unicode string

11) V - fixed chunk of memory for other types (void)
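The characters listed above can be checked against NumPy itself through the kind attribute of a dtype. The sketch below is only an illustration: the format strings (sizes such as 4, 8 or 16, and the [D] day unit) are arbitrary choices, and the name kinds is hypothetical.

```python
import numpy as np

# Each format string is a kind character followed by a size in bytes
# (or in characters for 'U'); 'M' and 'm' also carry a time unit.
formats = ('i4', 'u2', 'f8', 'c16', 'm8[D]', 'M8[D]', 'O', 'S10', 'U10', 'V8')

# Map every format string to the kind character NumPy reports for it.
kinds = {fmt: np.dtype(fmt).kind for fmt in formats}

for fmt, kind in kinds.items():
    print(fmt, '->', kind)

# The boolean type reports kind 'b'.
print(np.dtype(bool).kind)
```

One caveat: as a format string on its own, 'b' means a signed one-byte integer, not a boolean. The boolean format string is '?'.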
