Foundation of Data Science: Unit I: Introduction

Data Science Process

The data science process consists of six stages:

1. Discovery or Setting the research goal

2. Retrieving data

3. Data preparation

4. Data exploration

5. Data modeling

6. Presentation and automation

• Fig. 1.3.1 shows the data science design process.

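• Step 1: Discovery or defining the research goal

This step involves understanding the business question that the project must answer: what is to be investigated, why it matters to the company and what results are expected. A clearly defined research goal also helps to identify the internal and external data sources that can answer the business question.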

• Step 2: Retrieving data

This is the collection of the data required for the project. It involves acquiring data from the identified internal and external sources and gaining a business understanding of the data we have: determining exactly what data is required, the best methods for obtaining it and what each data point means in terms of the company. If we are given a data set by a client, for example, we need to know what each column and row represents.
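As an illustration of finding out what each column of a client data set contains, the following is a minimal sketch using only the Python standard library. The file contents, the column names and the helper `profile_columns` are hypothetical and stand in for a real client file.

```python
import csv
import io

# Hypothetical client file; in practice this would be an opened CSV file.
client_csv = io.StringIO(
    "order_id,region,amount\n"
    "1001,North,250.00\n"
    "1002,South,\n"
    "1003,North,99.50\n"
)

def profile_columns(handle):
    """Return the row count and, per column, how many values are filled in
    plus a few distinct example values."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    profile = {}
    for col in reader.fieldnames:
        values = [row[col] for row in rows]
        profile[col] = {
            "non_empty": sum(1 for v in values if v != ""),
            "examples": sorted({v for v in values if v != ""})[:3],
        }
    return len(rows), profile

row_count, profile = profile_columns(client_csv)
for col, info in profile.items():
    print(col, info)
```

A summary like this does not replace asking the client what each column means, but it shows quickly which columns are incomplete and what kind of values they hold.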

• Step 3: Data preparation

Data can have many inconsistencies, such as missing values, blank columns and incorrect data formats, which need to be cleaned. We need to process, explore and condition the data before modeling. Clean data gives better predictions.
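The three kinds of inconsistency mentioned above can be handled as in the following minimal sketch, which uses only the Python standard library. The records, the column names and the choice of filling missing ages with the median are assumptions made for illustration; a real project would choose its cleaning rules per column.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"customer": "A", "age": "34", "joined": "2021-03-05", "notes": None},
    {"customer": "B", "age": None, "joined": "05/04/2021", "notes": None},
    {"customer": "C", "age": "41", "joined": "2021-06-17", "notes": None},
]

def drop_blank_columns(rows):
    """Remove columns whose value is missing in every row."""
    keep = [col for col in rows[0] if any(r[col] is not None for r in rows)]
    return [{col: r[col] for col in keep} for r in rows]

def parse_date(text):
    """Accept YYYY-MM-DD or DD/MM/YYYY and return a date object."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

def clean(rows):
    rows = drop_blank_columns(rows)
    known_ages = [int(r["age"]) for r in rows if r["age"] is not None]
    fill_age = median(known_ages)  # assumption: impute with the median
    return [
        {
            "customer": r["customer"],
            "age": int(r["age"]) if r["age"] is not None else fill_age,
            "joined": parse_date(r["joined"]),
        }
        for r in rows
    ]

clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```

After cleaning, the blank `notes` column is gone, every row has an age and all dates share one format.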

• Step 4: Data exploration

Data exploration is about gaining a deeper understanding of the data. Try to understand how the variables interact with each other, how the data is distributed and whether there are outliers. To achieve this, use descriptive statistics, visual techniques and simple modeling. This step is also called Exploratory Data Analysis (EDA).
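Descriptive statistics and a simple outlier check can be sketched as follows with the Python standard library. The sample values are hypothetical, and the rule used here (flag values more than 1.5 times the interquartile range beyond the quartiles) is one common convention, not the only possible choice.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical measurements of one variable.
values = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

def summarize(data):
    """Basic descriptive statistics for one numeric variable."""
    return {
        "count": len(data),
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
        "min": min(data),
        "max": max(data),
    }

def iqr_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

summary = summarize(values)
outliers = iqr_outliers(values)
print(summary)
print("outliers:", outliers)
```

The mean is pulled well above the median by a single large value, which is a typical sign that the distribution should be inspected for outliers before modeling.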

• Step 5: Data modeling

In this step, the actual model building process starts. Here, the data scientist splits the dataset into a training set and a testing set. Techniques like association, classification and clustering are applied to the training data set. The model, once prepared, is tested against the "testing" dataset.
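The split into training and testing data can be sketched as follows. The data points, the 25 % test fraction and the nearest-centroid classifier are all assumptions chosen to keep the example short; they are not a method prescribed by the text.

```python
import random

# Hypothetical labelled points: (feature_1, feature_2, label).
data = (
    [(1.0 + 0.1 * i, 1.0 + 0.05 * i, "low") for i in range(10)]
    + [(5.0 + 0.1 * i, 5.0 + 0.05 * i, "high") for i in range(10)]
)

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def fit_centroids(train):
    """Training: compute the mean point (centroid) of each class."""
    sums = {}
    for x, y, label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, x, y):
    """Classify a point by its nearest class centroid."""
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )

train, test = train_test_split(data)
model = fit_centroids(train)  # built from the training data only
correct = sum(predict(model, x, y) == label for x, y, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

The model never sees the held-out rows during training, so the accuracy measured on them estimates how it would behave on new data.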

• Step 6: Presentation and automation

In this stage, deliver the final baselined model along with reports, code and technical documents. The model is deployed into a real-time production environment after thorough testing, and the key findings are communicated to all stakeholders. This helps to decide whether the project results are a success or a failure based on the inputs from the model.
