Hierarchical Indexing
• Hierarchical indexing is a method of creating structured group relationships
in data.
• A MultiIndex, or hierarchical index, comes in when our data has more than two
dimensions. As we already know, a Series is a one-dimensional labelled array
and a DataFrame is a two-dimensional table whose columns are Series. In some
instances, in order to carry out more sophisticated data analysis and
manipulation, our data needs to be presented in higher dimensions.
• A MultiIndex adds at least one more dimension to the data. A hierarchical
index, as the name suggests, labels each row with more than one key and ranks
those keys in levels, from the outermost to the innermost.
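As a minimal sketch of how an extra index level can stand in for an extra dimension, the example below stores a small two-dimensional table in a one-dimensional Series with a two-level index and then unstacks it. The labels ('team', 'season') and the values are made up for illustration and are not taken from the FIFA 19 dataset.

```python
import pandas as pd

# Hypothetical values: one score per (team, season) pair.
index = pd.MultiIndex.from_tuples(
    [('A', 2018), ('A', 2019), ('B', 2018), ('B', 2019)],
    names=['team', 'season'],
)
scores = pd.Series([10, 12, 7, 9], index=index)

# The Series is one-dimensional, but its index has two levels.
print(scores.ndim, scores.index.nlevels)

# unstack() moves the inner level to the columns, giving a 2-D table.
table = scores.unstack()
print(table)
```

The same data can therefore be viewed either as a 1-D object with a two-level index or as a 2-D table, which is the sense in which a MultiIndex carries an additional dimension.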
• To illustrate, we create a DataFrame with the ratings of a few players from
the FIFA 19 dataset.
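In [1]: import pandas as pd
In [2]: data = {'Position': ['GK', 'GK', 'GK', 'DF', 'DF', 'DF',
   ...:                      'MF', 'MF', 'MF', 'CF', 'CF', 'CF'],
   ...:         'Name': ['De Gea', 'Courtois', 'Allison', 'Van Dijk',
   ...:                  'Ramos', 'Godin', 'Hazard', 'Kante',
   ...:                  'De Bruyne', 'Ronaldo', 'Messi', 'Neymar'],
   ...:         # ratings are stored as strings in this example
   ...:         'Overall': ['91', '88', '89', '89', '91', '90',
   ...:                     '91', '90', '92', '94', '93', '92'],
   ...:         'Rank': ['1st', '3rd', '2nd', '3rd', '1st', '2nd',
   ...:                  '2nd', '3rd', '1st', '1st', '2nd', '3rd']}
In [3]: fifa19 = pd.DataFrame(data, columns=['Position', 'Name', 'Overall', 'Rank'])
In [4]: fifa19
Out[4]:
    Position       Name  Overall  Rank
0         GK     De Gea       91   1st
1         GK   Courtois       88   3rd
2         GK    Allison       89   2nd
3         DF   Van Dijk       89   3rd
4         DF      Ramos       91   1st
5         DF      Godin       90   2nd
6         MF     Hazard       91   2nd
7         MF      Kante       90   3rd
8         MF  De Bruyne       92   1st
9         CF    Ronaldo       94   1st
10        CF      Messi       93   2nd
11        CF     Neymar       92   3rd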
• From the DataFrame above, we notice that the index is the default pandas
index, and that the columns 'Position' and 'Rank' both contain repeated values.
This can pose a problem when we want to analyse the data. What we would like is
a set of meaningful index labels that uniquely identify each row and make it
easier to get a sense of the data we are working with. This is where a
MultiIndex, or hierarchical indexing, comes in.
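The claim about repeated values can be checked directly. The sketch below rebuilds only the 'Position' and 'Rank' columns of the table in this section (the variable names are chosen here for illustration) and tests each column, and then the pair, for duplicates.

```python
import pandas as pd

# Same positions and ranks as the fifa19 table in this section.
positions = ['GK'] * 3 + ['DF'] * 3 + ['MF'] * 3 + ['CF'] * 3
ranks = ['1st', '3rd', '2nd', '3rd', '1st', '2nd',
         '2nd', '3rd', '1st', '1st', '2nd', '3rd']
df = pd.DataFrame({'Position': positions, 'Rank': ranks})

# Each column on its own contains repeated values ...
position_repeats = df['Position'].duplicated().any()
rank_repeats = df['Rank'].duplicated().any()

# ... but no (Position, Rank) pair occurs twice.
pair_repeats = df.duplicated(subset=['Position', 'Rank']).any()

print(position_repeats, rank_repeats, pair_repeats)
```

Neither column can identify a row by itself, but the two together can, which is why they make a reasonable pair of index levels for this table.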
• We build the hierarchical index with the set_index() method, passing it a
list of column labels that together identify each row uniquely.
In [5]: fifa19_indexed = fifa19.set_index(['Position', 'Rank'], drop=False)
In [6]: fifa19_indexed
Out[6]:
• We can see from the code above that we have set our new index to 'Position'
and 'Rank', but these columns are also duplicated in the table. This is because
we passed drop=False, which keeps the columns where they are. The default,
however, is drop=True, so without drop=False the two columns are moved into the
index and removed from the columns automatically.
In [7]: fifa19.set_index(['Position', 'Rank'])
Out[7]:
                    Name  Overall
Position Rank
GK       1st      De Gea       91
GK       3rd    Courtois       88
GK       2nd     Allison       89
DF       3rd    Van Dijk       89
DF       1st       Ramos       91
DF       2nd       Godin       90
MF       2nd      Hazard       91
MF       3rd       Kante       90
MF       1st   De Bruyne       92
CF       1st     Ronaldo       94
CF       2nd       Messi       93
CF       3rd      Neymar       92
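The effect of the drop parameter is easiest to see by comparing the column lists. This is a small sketch on a three-row stand-in for fifa19 (same column names, fewer rows); the names df, kept and dropped are chosen here for illustration.

```python
import pandas as pd

# A three-row stand-in for fifa19 (same column names, fewer rows).
df = pd.DataFrame({
    'Position': ['GK', 'GK', 'DF'],
    'Rank': ['1st', '2nd', '1st'],
    'Name': ['De Gea', 'Allison', 'Ramos'],
})

kept = df.set_index(['Position', 'Rank'], drop=False)
dropped = df.set_index(['Position', 'Rank'])  # drop=True is the default

print(list(kept.columns))     # index columns are still present
print(list(dropped.columns))  # only 'Name' remains as a column

# set_index() returns a new DataFrame; df itself keeps its default index.
print(list(df.columns), type(df.index).__name__)
```

Because set_index() returns a new object, the result has to be assigned to a name if it is to be reused later.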
• We use set_index() with an ordered list of column labels to make the new
index. To verify that our DataFrame now has a hierarchical index, we inspect
its .index attribute.
In [8]: fifa19 = fifa19.set_index(['Position', 'Rank'])
In [9]: fifa19.index
Out[9]:
MultiIndex(levels=[['CF', 'DF', 'GK', 'MF'], ['1st', '2nd', '3rd']],
           codes=[[2, 2, 2, 1, 1, 1, 3, 3, 3, 0, 0, 0],
                  [0, 2, 1, 2, 0, 1, 1, 2, 0, 0, 1, 2]],
           names=['Position', 'Rank'])
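The printed form of a MultiIndex can differ between pandas versions, so it may be more robust to verify the index through its attributes. The sketch below does this on a four-row stand-in for fifa19 (rows listed in sorted order for simplicity) and then uses the two levels to select rows; the variable names are chosen here for illustration.

```python
import pandas as pd

# A four-row stand-in for fifa19, indexed by Position and Rank.
df = pd.DataFrame({
    'Position': ['DF', 'DF', 'GK', 'GK'],
    'Rank': ['1st', '2nd', '1st', '2nd'],
    'Name': ['Ramos', 'Godin', 'De Gea', 'Allison'],
}).set_index(['Position', 'Rank'])

# Checks that do not rely on how the index is printed.
print(isinstance(df.index, pd.MultiIndex))
print(df.index.nlevels)
print(list(df.index.names))
print(df.index.is_unique)

# Select by the outer level only, then by a full (Position, Rank) key.
goalkeepers = df.loc['GK']
top_defender = df.loc[('DF', '1st'), 'Name']
print(goalkeepers)
print(top_defender)
```

Selecting on the outer level returns all rows for that position, indexed by the remaining 'Rank' level, while a full key picks out a single row.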
Foundation of Data Science (CS3352): Unit IV: Python Libraries for Data Wrangling | 3rd Semester CSE Dept | 2021 Regulation