UNIT IV: Python Libraries for Data Wrangling
Syllabus
Basics of NumPy arrays - aggregations - computations on arrays - comparisons, masks, boolean logic - fancy indexing - structured arrays - Data manipulation with Pandas - data indexing and selection - operating on data - missing data - hierarchical indexing - combining datasets - aggregation and grouping - pivot tables.
• Data Wrangling is the process of transforming data from its original "raw" form into a more digestible format and organizing sets from various sources into a singular coherent whole for further processing.
• Data wrangling is also called data munging.
• The primary purpose of data wrangling is to get data into a coherent shape. In other words, it makes raw data usable and provides substance for further processing.
• Data wrangling covers the following processes:
1. Getting data from the various sources into one place
2. Piecing the data together according to the determined setting
3. Cleaning the data of noise and of erroneous or missing elements
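The three processes above can be sketched with pandas. This is a minimal illustration using hypothetical customer records from two made-up sources (`store_a`, `store_b`); the column names are assumptions for the example:

```python
import pandas as pd

# Hypothetical customer records from two separate sources.
store_a = pd.DataFrame({"customer_id": [1, 2], "amount": [250.0, 100.0]})
store_b = pd.DataFrame({"customer_id": [2, 3], "amount": [None, 75.0]})

# 1. Get data from the various sources into one place.
combined = pd.concat([store_a, store_b], ignore_index=True)

# 2. Clean: remove rows with missing amounts.
cleaned = combined.dropna(subset=["amount"])

# 3. Piece together a coherent whole: total spend per customer.
per_customer = cleaned.groupby("customer_id", as_index=False)["amount"].sum()
print(per_customer)
```

The result is a single tidy table, one row per customer, ready for further processing.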
• Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format for better decision making in less time.
• There are typically six iterative steps that make up the data wrangling process:
1. Discovering: Before you can dive in deeply, you must better understand what is in your data, which will inform how you want to analyze it. How you wrangle customer data, for example, may be informed by where the customers are located, what they bought, or what promotions they received.
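Discovery usually starts with quick summaries. As a sketch, assuming a hypothetical customer table with `state` and `amount` columns:

```python
import pandas as pd

# Hypothetical customer data to explore before deciding how to wrangle it.
customers = pd.DataFrame({
    "state": ["AP", "TS", "AP", "KA"],
    "amount": [250.0, 100.0, 75.0, 300.0],
})

# Summary statistics reveal ranges and potential outliers.
print(customers["amount"].describe())

# Frequency counts show which categories dominate (and hint at typos).
print(customers["state"].value_counts())
```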
2. Structuring: This means organizing the data, which is necessary because raw data comes in many different shapes and sizes. A single column may turn into several rows for easier analysis, or one column may become two. Data is reshaped for easier computation and analysis.
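Both restructurings mentioned above can be sketched in pandas. The column names (`customer`, `q1`, `q2`) are hypothetical:

```python
import pandas as pd

# Hypothetical raw data: name and city packed into a single column,
# and one column per quarter (a "wide" shape).
raw = pd.DataFrame({
    "customer": ["Asha, Hyderabad", "Ravi, Chennai"],
    "q1": [100, 200],
    "q2": [150, 250],
})

# One column becomes two: split "name, city" on the comma.
raw[["name", "city"]] = raw["customer"].str.split(", ", expand=True)

# One column turns into several rows: melt the quarter columns
# into a long shape for easier analysis.
long = raw.melt(id_vars=["name", "city"], value_vars=["q1", "q2"],
                var_name="quarter", value_name="sales")
print(long)
```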
3. Cleaning: What happens when errors and outliers skew your data? You clean the data. What happens when state data is entered inconsistently, as AP or Andhra Pradesh or Arunachal Pradesh? You clean the data. Null values are changed and standard formatting is implemented, ultimately increasing data quality.
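The state example above can be sketched with pandas `replace` and `fillna`. The mapping chosen here (Andhra Pradesh to "AP", Arunachal Pradesh to "AR") is one illustrative convention, not a prescribed standard:

```python
import pandas as pd

# Hypothetical records: the same state appears under different spellings
# and some values are missing.
df = pd.DataFrame({
    "state": ["AP", "Andhra Pradesh", "Arunachal Pradesh", None],
    "amount": [100.0, None, 75.0, 50.0],
})

# Map inconsistent entries onto one standard form.
df["state"] = df["state"].replace({"Andhra Pradesh": "AP",
                                   "Arunachal Pradesh": "AR"})

# Replace nulls with explicit defaults.
df["state"] = df["state"].fillna("UNKNOWN")
df["amount"] = df["amount"].fillna(0.0)
print(df)
```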
4. Enriching: Here you take stock of your data and strategize about how other, additional data might augment it. Questions asked during this data wrangling step might be: what new types of data can I derive from what I already have, or what other information would better inform my decision making about this current data?
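Deriving new data from what you already have can be as simple as computing new columns. A sketch over a hypothetical order table:

```python
import pandas as pd

# Hypothetical order data; enrichment derives new fields from what exists.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-05", "2021-02-14"]),
    "quantity": [3, 2],
    "unit_price": [10.0, 25.0],
})

# Derive a total value column from quantity and price.
orders["total"] = orders["quantity"] * orders["unit_price"]

# Derive the order month, useful for joining in seasonal reference data.
orders["month"] = orders["order_date"].dt.month
print(orders)
```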
5. Validating: Validation rules are repetitive programming sequences that verify data consistency, quality, and security. Examples of validation include checking that attributes follow their expected distribution (e.g. birth dates) or confirming the accuracy of fields through a cross-check against other data.
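Validation rules can be expressed as simple repeatable checks. A minimal sketch, assuming a hypothetical wrangled customer table:

```python
import pandas as pd

# Hypothetical wrangled table to validate before publishing.
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [250.0, 100.0, 75.0],
})

# Rule 1: the key field must be unique.
assert df["customer_id"].is_unique

# Rule 2: amounts must be non-negative.
assert (df["amount"] >= 0).all()

# Rule 3: no missing values anywhere.
assert df.notna().all().all()
print("all validation rules passed")
```

Because the rules are plain code, they can be rerun every time the data is refreshed.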
6. Publishing: Analysts prepare the wrangled data for use downstream, whether by a particular user or software, and document any particular steps taken or logic used to wrangle the data. Data wrangling experts understand that implementation of insights relies upon the ease with which they can be accessed and utilized by others.
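Publishing often means exporting the result in a widely consumable format such as CSV, together with a note documenting the wrangling logic. A sketch using hypothetical data:

```python
import pandas as pd

# Hypothetical wrangled result, ready for downstream users.
result = pd.DataFrame({"customer_id": [1, 2], "total": [250.0, 175.0]})

# Publish as CSV text (to_csv with no path returns a string;
# pass a filename to write a file instead).
csv_text = result.to_csv(index=False)
print(csv_text)

# Document the wrangling logic alongside the data.
readme = "Totals aggregated per customer; missing amounts dropped."
```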
Foundation of Data Science (CS3352) | 3rd Semester CSE Dept | 2021 Regulation