Two Marks Questions with Answers
Q.1 Define data wrangling.
Ans. Data wrangling is the process of transforming data from its original "raw" form into a more digestible format and of organizing data sets from various sources into a single coherent whole for further processing.
Q.2 What is Python?
Ans. Python is a high-level scripting language that can be used for a wide variety of text processing, system administration and internet-related tasks. Python is a true object-oriented language and is available on a wide variety of platforms.
Q.3 What is NumPy?
Ans. NumPy, short for Numerical Python, is the core library for scientific computing in Python. It has been designed specifically for performing basic and advanced array operations. It primarily supports multi-dimensional arrays and vectors for complex arithmetic operations.
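As an illustration of the array operations mentioned above, the following is a minimal sketch; the array values and variable names are made up for the example.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns) and a 1-D vector; the values are illustrative.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
vector = np.array([10, 20, 30])

# Element-wise addition: the vector is broadcast across each row of the matrix.
shifted = matrix + vector

# Matrix-vector product using the @ operator.
product = matrix @ vector

print(matrix.shape)
print(shifted)
print(product)
```

Both operations work on whole arrays at once, so no explicit Python loop is needed.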
Q.4 What is an aggregation function?
Ans. An aggregation function is one that takes multiple individual values and returns a summary. In most cases, this summary is a single value. The most common aggregation functions are the average (mean) and the sum of the values.
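A short sketch of aggregation using NumPy is given here; the scores and the small table are hypothetical data chosen only for illustration.

```python
import numpy as np

# Hypothetical marks of five students.
scores = np.array([72, 85, 90, 66, 77])

# Each aggregation reduces many values to a single summary value.
total = scores.sum()
average = scores.mean()
highest = scores.max()

# With the axis argument, the summary has one value per column
# instead of a single value for the whole array.
table = np.array([[1, 2],
                  [3, 4]])
column_sums = table.sum(axis=0)

print(total, average, highest)
print(column_sums)
```

The last call shows why the summary is a single value only in the majority of cases: aggregating along one axis of a two-dimensional array returns one result per column.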
Q.5 What are structured arrays?
Ans. A structured NumPy array is an array of structures (records). NumPy arrays are homogeneous, i.e. they can contain data of only one type. So, instead of creating a NumPy array of int or float, we can also create a NumPy array whose elements are all structures with the same layout of fields.
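The idea can be sketched as follows; the field names (name, age, weight) and the records are assumptions made up for this example.

```python
import numpy as np

# Every element is a record with the same three fields (hypothetical layout):
# a string of up to 10 characters, a 32-bit integer and a 64-bit float.
person_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

people = np.array([("Alice", 25, 55.0),
                   ("Bob", 45, 85.5)], dtype=person_dtype)

# A field is accessed by name and comes back as an ordinary array.
ages = people["age"]

# Boolean masks work on fields, so records can be filtered.
older_names = people[people["age"] > 30]["name"]

print(ages)
print(older_names)
```

The array is still homogeneous, because every element has the same structured data type.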
Q.6 Describe Pandas.
Ans. Pandas is a high-level data manipulation tool developed by Wes McKinney. Its key data structure is the DataFrame, which allows you to store and manipulate tabular data in rows of observations and columns of variables. Pandas is built on top of the NumPy package, meaning that a lot of the structure of NumPy is used or replicated in Pandas.
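A minimal sketch of a DataFrame is shown here; the column names and values are invented for illustration.

```python
import pandas as pd

# Rows are observations, columns are variables (hypothetical data).
df = pd.DataFrame({
    "city": ["Chennai", "Delhi", "Mumbai"],
    "temp_c": [31.5, 28.0, 30.2],
})

# Select the rows that satisfy a condition on one column.
warm = df[df["temp_c"] > 30]

# A column can be converted to a NumPy array, reflecting the NumPy foundation.
values = df["temp_c"].to_numpy()

print(df.shape)
print(warm)
print(type(values))
```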
Q.7 How to create and manipulate categorical variables?
Ans. A categorical variable is one that takes its value from a limited, and usually fixed, set of possible values. For example, if a dataset holds information about users, you will typically find features like country, gender, age group, etc. Alternatively, if the data is related to products, you will find features like product type, manufacturer, seller and so on.
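One possible way to create and manipulate categorical variables in Pandas is sketched here. The user data, the bin edges and the group labels are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical user data.
users = pd.DataFrame({
    "country": ["India", "Japan", "India", "Brazil"],
    "age": [23, 41, 35, 67],
})

# Convert a text column to the categorical data type.
users["country"] = users["country"].astype("category")

# The distinct categories and the integer code stored for each row.
categories = list(users["country"].cat.categories)
codes = users["country"].cat.codes.tolist()

# Create a new categorical variable by binning a numeric column.
# Each bin includes its right edge: (0, 30], (30, 60], (60, 120].
users["age_group"] = pd.cut(users["age"],
                            bins=[0, 30, 60, 120],
                            labels=["young", "middle", "senior"])

print(categories)
print(codes)
print(users)
```

Storing a column as a category keeps only the small set of distinct values plus one integer code per row.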
Q.8 Explain Hierarchical Indexing.
Ans. Hierarchical indexing is a method of creating structured group relationships in data. A MultiIndex, or hierarchical index, comes in when our data has more than two dimensions. As we already know, a Series is a one-dimensional labelled array and a DataFrame is usually a two-dimensional table whose columns are Series. In some instances, sophisticated data analysis and manipulation requires data with more dimensions; a hierarchical index represents such data within a Series or DataFrame by using more than one index level.
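A small sketch of a hierarchical index is given here; the level names (year, quarter) and the sales figures are hypothetical.

```python
import pandas as pd

# Two index levels: year (outer) and quarter (inner). The data is illustrative.
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index)

# Selecting on the outer level returns a Series indexed by quarter.
sales_2024 = sales.loc["2024"]

# A cross-section on the inner level returns a Series indexed by year.
first_quarters = sales.xs("Q1", level="quarter")

# Unstacking moves the inner level into columns, giving a 2-D table.
table = sales.unstack("quarter")

print(sales_2024)
print(first_quarters)
print(table)
```

Here a one-dimensional Series carries two dimensions of information (year and quarter) through its two index levels.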
Q.9 What is a pivot table?
Foundation of Data Science (CS3352), 3rd Semester CSE Dept, 2021 Regulation: Unit IV: Python Libraries for Data Wrangling