Data Science Process
The data science process consists of six stages:
1. Discovery or setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
6. Presentation and automation
• Fig. 1.3.1 shows the data science design process.
• Step 1: Discovery or defining the research goal
This step sets the research goal for the project. It involves gaining a business understanding of the problem and of the data the user has, and deciphering what each piece of data means in terms of the company. If a client gives us a data set, for example, we need to know what each column and row represents.
• Step 2: Retrieving data
This step is the collection of the data required for the project. It entails determining exactly what data is needed and the best methods for obtaining it, and then acquiring that data from all the identified internal and external sources that help to answer the business question.
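As a minimal sketch of this step, the Python below combines a hypothetical internal source (a CSV export of customer records) with a hypothetical external source (regional statistics) using only the standard library. The column names, values and the `load_csv` and `retrieve` helpers are illustrative assumptions; in a real project the sources might be databases, files or web APIs.

```python
import csv
import io

# Hypothetical internal source: a CSV export from a company system.
INTERNAL_CSV = """customer_id,region,monthly_spend
1,north,120.5
2,south,80.0
3,north,
"""

# Hypothetical external source: e.g. public regional statistics.
EXTERNAL_CSV = """region,avg_income
north,52000
south,47000
"""


def load_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def retrieve():
    """Combine internal and external data on the shared 'region' key."""
    internal = load_csv(INTERNAL_CSV)
    external = {row["region"]: row for row in load_csv(EXTERNAL_CSV)}
    combined = []
    for row in internal:
        merged = dict(row)
        # Left join: keep every internal row, add external fields when a match exists.
        extra = external.get(row["region"], {})
        merged.update({k: v for k, v in extra.items() if k != "region"})
        combined.append(merged)
    return combined


data = retrieve()
# Knowing what each column represents is part of understanding the data.
print(list(data[0].keys()))
```

Note that the third customer's `monthly_spend` comes back empty; gaps like this are what data preparation has to deal with.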
• Step 3: Data preparation
Data can have many inconsistencies, such as missing values, blank columns and incorrect data formats, which need to be cleaned. We need to process, explore and condition the data before modeling, because clean data gives better predictions.
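A minimal cleaning sketch in plain Python, assuming records arrive as dictionaries of strings (as `csv.DictReader` produces) and that every non-blank column is numeric. It handles the three problems named above: it drops a column that is blank in every row, converts text to numbers (treating malformed values such as "n/a" as missing), and fills missing values with the column mean. Mean imputation is only one common choice; the sample records and the `clean` and `to_float` helpers are hypothetical.

```python
from statistics import mean

# Hypothetical raw records with a blank column, a missing value
# and a value in the wrong format ("n/a" instead of a number).
raw = [
    {"age": "34", "income": "52000", "notes": ""},
    {"age": "", "income": "47000", "notes": ""},
    {"age": "29", "income": "n/a", "notes": ""},
    {"age": "41", "income": "61000", "notes": ""},
]


def to_float(value):
    """Convert a string to float; return None if it is empty or malformed."""
    try:
        return float(value)
    except ValueError:
        return None


def clean(records):
    columns = list(records[0].keys())
    # 1. Drop columns that are blank in every record.
    kept = [c for c in columns if any(r[c].strip() for r in records)]
    # 2. Fix the data format: parse every kept value as a number.
    parsed = [{c: to_float(r[c]) for c in kept} for r in records]
    # 3. Fill missing values with the mean of the observed values in that column.
    for c in kept:
        fill = mean(r[c] for r in parsed if r[c] is not None)
        for r in parsed:
            if r[c] is None:
                r[c] = fill
    return parsed


cleaned = clean(raw)
print(cleaned[1])  # the missing age is replaced by the mean of 34, 29 and 41
```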
• Step 4: Data exploration
Data exploration is about gaining a deeper understanding of the data: how variables interact with each other, how the data is distributed and whether there are outliers. To achieve this, we use descriptive statistics, visual techniques and simple modeling. This step is also called Exploratory Data Analysis (EDA).
• Step 5: Data modeling
In this step, the actual model building process starts. The data scientist splits the dataset into training and testing sets. Techniques such as association, classification and clustering are applied to the training set. Once the model is built, it is tested against the testing set.
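A minimal sketch of the train/test workflow, written in plain Python so it runs without extra packages. As a stand-in for the classification techniques mentioned above it uses 1-nearest-neighbour classification; the student data and the `train_test_split` and `predict_1nn` helpers are hypothetical (libraries such as scikit-learn provide their own `train_test_split` and classifiers for real projects).

```python
import random

# Hypothetical labelled data: (hours_studied, attendance_pct) -> passed (1) or failed (0).
samples = [
    ((2, 50), 0), ((3, 52), 0), ((1, 48), 0), ((2, 53), 0), ((3, 49), 0),
    ((8, 90), 1), ((7, 88), 1), ((9, 92), 1), ((8, 87), 1), ((7, 91), 1),
]


def train_test_split(data, test_fraction=0.3, seed=42):
    """Shuffle a copy of the data and split it into training and testing sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, features):
    """1-nearest-neighbour: return the label of the closest training point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda s: sq_dist(s[0], features))
    return label


train, test = train_test_split(samples)
# The model only sees the training set; it is evaluated on the unseen test set.
correct = sum(predict_1nn(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Keeping the test set out of training is the point of the split: accuracy measured on data the model has already seen would overstate how well it generalises.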
• Step 6: Presentation and automation
In this stage, the final baselined model is delivered along with reports, code and technical documents. After thorough testing, the model is deployed into a real-time production environment. The key findings are communicated to all stakeholders, which helps them decide whether the project is a success or a failure based on the results from the model.
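To illustrate automation, the sketch below loads a delivered model from a serialised form, applies it to a new batch of records without retraining, and produces a short summary that could be shared with stakeholders. The threshold "model", its JSON format and the `load_model`, `score` and `report` helpers are hypothetical; a real deployment would use whatever serialisation and serving setup the production environment requires.

```python
import json
from statistics import mean

# Hypothetical baselined model from the modeling step:
# a single decision threshold on one feature, stored as JSON.
MODEL_JSON = '{"feature": "attendance_pct", "threshold": 70, "version": "1.0"}'


def load_model(text):
    """Load the delivered model from its serialised form."""
    return json.loads(text)


def score(model, records):
    """Automation: apply the saved model to new records without retraining."""
    return [1 if r[model["feature"]] >= model["threshold"] else 0 for r in records]


def report(model, records, predictions):
    """Summarise key findings for stakeholders."""
    return {
        "model_version": model["version"],
        "records_scored": len(records),
        "predicted_pass_rate": mean(predictions),
    }


new_batch = [{"attendance_pct": 92}, {"attendance_pct": 55},
             {"attendance_pct": 78}, {"attendance_pct": 64}]
model = load_model(MODEL_JSON)
preds = score(model, new_batch)
summary = report(model, new_batch, preds)
print(json.dumps(summary, indent=2))
```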
Foundation of Data Science: Unit I: Introduction
CS3352 3rd Semester CSE Dept | 2021 Regulation