Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling

Two marks Questions with Answers

Q.1 Define data wrangling.

Ans. Data wrangling is the process of transforming data from its original "raw" form into a more digestible format and organizing data sets from various sources into a single coherent whole for further processing.

Q.2 What is Python?

Ans. Python is a high-level scripting language which can be used for a wide variety of text processing, system administration and internet-related tasks. Python is a true object-oriented language and is available on a wide variety of platforms.

Q.3 What is NumPy?

Ans. NumPy, short for Numerical Python, is the core library for scientific computing in Python. It has been designed specifically for performing basic and advanced array operations. It primarily supports multi-dimensional arrays and vectors for complex arithmetic operations.
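
As a minimal sketch of the array operations described above (the array values are made up purely for illustration), NumPy arithmetic can be applied to whole multi-dimensional arrays at once:

```python
import numpy as np

# A small 2 x 3 array; the values are illustrative only.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

doubled = a * 2            # element-wise arithmetic, no explicit loop
transposed = a.T           # transpose, shape (3, 2)
product = a @ transposed   # matrix product, shape (2, 2)

print(a.shape)
print(doubled)
print(product)
```

The same operators work on one-dimensional vectors as well as on arrays with more dimensions.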

Q.4 What is an aggregation function?

Ans. An aggregation function is one which takes multiple individual values and returns a summary. In the majority of the cases, this summary is a single value. The most common aggregation functions are a simple average or summation of values.
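
A short sketch of aggregation using NumPy, with made-up values. The first calls reduce many values to a single summary value; the last call shows the less common case where the summary is one value per column rather than a single number:

```python
import numpy as np

# Illustrative values only.
values = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])

total = values.sum()       # summation of all values
mean = values.mean()       # simple average
minimum = values.min()
maximum = values.max()

# Aggregating along an axis returns one summary value per column.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
column_sums = table.sum(axis=0)

print(total, mean, minimum, maximum)
print(column_sums)
```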

Q.5 What are structured arrays?

Ans. A structured NumPy array is an array of structures. NumPy arrays are homogeneous, i.e. they can contain data of only one type. So, instead of creating a NumPy array of int or float, we can create a NumPy array whose elements are all structures of the same type, each made up of named fields that may have different data types.
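
The following sketch shows one way such an array might be built. The field names (`name`, `age`, `weight`) and the records are hypothetical and chosen only for illustration:

```python
import numpy as np

# Each element is a structure with three named fields:
# a string of up to 10 characters, a 32-bit integer and a 64-bit float.
person_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

people = np.array(
    [("Alice", 25, 55.0), ("Bob", 45, 85.5), ("Cathy", 37, 68.0)],
    dtype=person_dtype,
)

names = people["name"]                          # access one field by name
over_30 = people[people["age"] > 30]["name"]    # filter on one field

print(names)
print(over_30)
```

Every element has the same structure, so the array is still homogeneous even though the fields inside each element have different types.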

Q.6 Describe Pandas.

Ans. Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on top of the NumPy package, so much of the structure of NumPy is used or replicated in Pandas. Its key data structure is the DataFrame, which allows you to store and manipulate tabular data in rows of observations and columns of variables.
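
A minimal sketch of a DataFrame, using hypothetical student marks as the data. Each row is an observation and each column is a variable:

```python
import pandas as pd

# Hypothetical data: one row per student, one column per variable.
df = pd.DataFrame({
    "student": ["A", "B", "C"],
    "maths": [72, 85, 90],
    "physics": [65, 78, 88],
})

df["total"] = df["maths"] + df["physics"]   # column-wise arithmetic
top = df[df["total"] > 150]                 # keep rows matching a condition

# The numeric columns can be viewed as a NumPy array.
marks = df[["maths", "physics"]].to_numpy()

print(df)
print(top)
print(marks.shape)
```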

Q.7 How are categorical variables created and manipulated?

Ans. A categorical variable is one that takes its value from a limited, and usually fixed, set of possible values. For example, if a dataset holds information about users, you will typically find features like country, gender, age group, etc. Alternatively, if the data is about products, you will find features like product type, manufacturer, seller and so on.
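
One possible way to create and manipulate categorical variables in Pandas is sketched below. The age-group labels and bin edges are assumptions made for illustration, not values from any particular dataset:

```python
import pandas as pd

# Create a categorical variable from plain strings.
age_group = pd.Series(["adult", "child", "senior", "adult", "child"])
cat = age_group.astype("category")

categories = list(cat.cat.categories)   # the limited set of possible values
codes = cat.cat.codes.tolist()          # integer code for each entry

# Give the categories an explicit order.
ordered = pd.Categorical(
    age_group, categories=["child", "adult", "senior"], ordered=True
)

# Create a categorical variable from numbers by binning them.
ages = pd.Series([5, 17, 34, 70])
binned = pd.cut(
    ages,
    bins=[0, 12, 19, 64, 120],
    labels=["child", "teen", "adult", "senior"],
)

print(categories, codes)
print(ordered)
print(binned)
```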

Q.8 Explain Hierarchical Indexing.

Ans. Hierarchical indexing is a method of creating structured group relationships in data. A MultiIndex, or hierarchical index, places two or more index levels on a single axis, and it is useful when our data has more than two dimensions. As we already know, a Series is a one-dimensional labelled array and a DataFrame is a two-dimensional table whose columns are Series. In some instances, sophisticated data analysis and manipulation requires data with more dimensions than these, and a hierarchical index lets such data be represented in a Series or DataFrame.
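
As a rough sketch, a Series with a two-level index can hold data that depends on two keys. The city names, years and sales figures here are invented for illustration:

```python
import pandas as pd

# Two index levels (city, year) on a one-dimensional Series.
index = pd.MultiIndex.from_tuples(
    [("Chennai", 2020), ("Chennai", 2021), ("Mumbai", 2020), ("Mumbai", 2021)],
    names=["city", "year"],
)
sales = pd.Series([10, 12, 20, 25], index=index)

chennai = sales.loc["Chennai"]           # all years for one city
one_value = sales.loc[("Mumbai", 2021)]  # a single (city, year) entry

# Move the inner level to the columns to get a 2-D table.
wide = sales.unstack("year")

print(chennai)
print(one_value)
print(wide)
```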

Q.9 What is a pivot table?

Ans. A pivot table is an operation commonly seen in spreadsheets and other programs that operate on tabular data. It takes simple column-wise data as input and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
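
A small sketch of a pivot table in Pandas. The column names and sales figures are hypothetical; the column-wise input is grouped into a region-by-product table of summed sales:

```python
import pandas as pd

# Column-wise input data (illustrative values only).
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "sales": [100, 150, 200, 50, 300],
})

# Rows are regions, columns are products, cells are total sales.
table = df.pivot_table(
    values="sales", index="region", columns="product", aggfunc="sum"
)

print(table)
```

The two "North, A" rows are combined into one cell because `aggfunc="sum"` adds up all entries that share the same region and product.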
