Foundation of Data Science: Unit I: Introduction

Two marks Questions with Answers

Q.1 What is data science?

Ans.

• Data science is an interdisciplinary field that seeks to extract knowledge or insights from various forms of data.

• At its core, data science aims to discover and extract actionable knowledge from data that can be used to make sound business decisions and predictions.

• Data science uses advanced analytical theory and various methods, such as time series analysis, to predict future outcomes.

Q.2 Define structured data.

Ans. Structured data is arranged in rows and columns. This format lets applications retrieve and process the data easily. A database management system is typically used to store structured data. The term refers to data that is identifiable because it is organized in a defined structure.
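The row-and-column idea can be illustrated with a small sketch. The table below is made-up sample data, and the column names (id, name, age) are assumptions chosen only for illustration. It uses Python's standard csv module to show why a known structure makes retrieval easy.

```python
import csv
import io

# Hypothetical structured data: every row has the same named columns.
raw = """id,name,age
1,Asha,21
2,Ravi,23
3,Meena,22
"""

# Each row becomes a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the structure is known in advance, retrieval is a simple
# lookup by column name.
names = [row["name"] for row in rows]
older_than_21 = [row["name"] for row in rows if int(row["age"]) > 21]

print(names)
print(older_than_21)
```

Unstructured data such as free text or images has no such column names, so a comparable lookup is not possible without extra processing.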

Q.3 What is data?

Ans. Data is a collection of related records or information, often called a data set. The information may describe some entity or some subject area.

Q.4 What is unstructured data?

Ans. Unstructured data is data that does not follow a specified format. Rows and columns are not used for unstructured data, so it is difficult to retrieve the required information. Unstructured data has no identifiable structure.

Q.5 What is machine-generated data?

Ans. Machine-generated data is information that is created without human interaction, as a result of a computer process or application activity. Data entered manually by an end user is therefore not considered machine-generated.

Q.6 Define streaming data.

Ans. Streaming data is data that is generated continuously by thousands of data sources, which typically send data records simultaneously and in small sizes (on the order of kilobytes).

Q.7 List the stages of the data science process.

Ans. The stages of the data science process are as follows:

1. Discovery or Setting the research goal

2. Retrieving data

3. Data preparation

4. Data exploration

5. Data modeling

6. Presentation and automation

Q.8 What are the advantages of data repositories?

Ans.: Advantages are as follows:

i. Data is preserved and archived.

ii. Data isolation allows for easier and faster data reporting.

iii. Database administrators have an easier time tracking problems.

iv. There is value to storing and analyzing data.

Q.9 What is data cleaning?

Ans. Data cleaning means removing inconsistent data or noise and retaining the necessary information from a collection of interrelated data.

Q.10 What is outlier detection?

Ans. : Outlier detection is the process of detecting and subsequently excluding outliers from a given set of data. The easiest way to find outliers is to use a plot or a table with the minimum and maximum values.
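A minimal sketch of this idea follows. The numbers are made-up sample data. The first step prints the minimum and maximum, as the answer suggests. The second step uses the common 1.5 × IQR rule to decide which values to exclude; that rule is an assumption added here for illustration and is only one of several possible criteria.

```python
import statistics

# Made-up measurements; 98 looks suspicious next to the others.
values = [12, 14, 13, 15, 14, 13, 98, 12, 15, 14]

# Step 1: a quick look at the extremes often reveals an outlier.
print("min:", min(values), "max:", max(values))

# Step 2 (assumed rule): flag values more than 1.5 * IQR outside the quartiles.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
cleaned = [v for v in values if lower <= v <= upper]

print("outliers:", outliers)
print("cleaned:", cleaned)
```

With this sample, only the value 98 falls outside the computed bounds, so it is the only value excluded.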

Q.11 Explain exploratory data analysis.

Ans. : Exploratory Data Analysis (EDA) is a general approach to exploring datasets by means of simple summary statistics and graphic visualizations in order to gain a deeper understanding of data. EDA is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.
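As a rough illustration, the sketch below computes a few summary statistics and prints a simple text histogram using only Python's standard library. The scores are made-up sample data, and the choice of statistics and the bucket width of 10 are assumptions made for this example.

```python
import statistics
from collections import Counter

# Made-up exam scores used only for illustration.
scores = [55, 62, 62, 70, 71, 71, 71, 80, 88, 95]

# Simple summary statistics describe the main characteristics of the data.
summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),
}
for name, value in summary.items():
    print(f"{name}: {value}")

# A text histogram stands in for a graphic visualization:
# each score is grouped into a bucket of width 10.
buckets = Counter((s // 10) * 10 for s in scores)
for start in sorted(buckets):
    print(f"{start}-{start + 9}: {'#' * buckets[start]}")
```

In practice a plotting library would usually replace the text histogram, but the purpose is the same: to see the shape of the data before modeling it.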

Q.12 Define data mining.

Ans. : Data mining refers to extracting or mining knowledge from large amounts of data. It is a process of discovering interesting patterns or knowledge from a large amount of data stored in databases, data warehouses, or other information repositories.

Q.13 What are the three challenges to data mining regarding data mining methodology?

Ans. Challenges to data mining regarding data mining methodology include the following:

1. Mining different kinds of knowledge in databases,

2. Interactive mining of knowledge at multiple levels of abstraction,

3. Incorporation of background knowledge.

Q.14 What is predictive mining?

Ans. Predictive mining tasks perform inference on the current data in order to make predictions. Predictive analysis answers questions about the future by using historical data as the main basis for decisions.
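One very simple way to picture prediction from historical data is to fit a straight line to past values and extend it. The sketch below does this with ordinary least squares. The monthly sales figures are made-up, and the helper names fit_line and predict are hypothetical. Real predictive mining uses far richer models, so this is only an illustration of the idea.

```python
# Made-up historical data: sales recorded for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)


def predict(month):
    """Extend the fitted line to a month that has not been observed yet."""
    return intercept + slope * month


print("predicted sales for month 6:", predict(6))
```

Because the sample data rise by exactly 2 each month, the fitted line has slope 2 and the prediction for month 6 is 20.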

Q.15 What is data cleaning?

Ans. Data cleaning means removing inconsistent data or noise and retaining the necessary information from a collection of interrelated data.

Q.16 List the five primitives for specifying a data mining task.

Ans. :

1. The set of task-relevant data to be mined

2. The kind of knowledge to be mined

3. The background knowledge to be used in the discovery process

4. The interestingness measures and thresholds for pattern evaluation

5. The expected representation for visualizing the discovered patterns.

Q.17 List the stages of the data science process.

Ans. The data science process consists of six stages:

1. Discovery or Setting the research goal

2. Retrieving data

3. Data preparation

4. Data exploration

5. Data modeling

6. Presentation and automation

Q.18 What is a data repository?

Ans. A data repository is also known as a data library or data archive. It is a general term for a data set isolated to be mined for data reporting and analysis. A data repository is a large database infrastructure made up of several databases that collect, manage and store data sets for data analysis, sharing and reporting.

Q.19 List the data cleaning tasks.

Ans. Data cleaning tasks are as follows:

1. Data acquisition and metadata

2. Fill in missing values

3. Unified date format

4. Converting nominal to numeric

5. Identify outliers and smooth out noisy data

6. Correct inconsistent data
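Three of the tasks listed above (filling in missing values, unifying the date format, and converting nominal values to numeric) can be sketched in a few lines. The records, the two date formats, the mean-fill strategy, and the GRADE_CODES mapping are all assumptions made for illustration; real data would need its own choices.

```python
import statistics
from datetime import datetime

# Made-up records with a missing value, mixed date formats, and a nominal field.
records = [
    {"date": "2023-01-05", "grade": "low", "temp": 21.0},
    {"date": "05/02/2023", "grade": "high", "temp": None},
    {"date": "2023-03-07", "grade": "medium", "temp": 23.0},
]

# Fill in missing values: here, with the mean of the known values (one option).
known = [r["temp"] for r in records if r["temp"] is not None]
fill_value = statistics.mean(known)


def to_iso(text):
    """Unify dates to YYYY-MM-DD, trying each assumed input format in turn."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


# Convert nominal to numeric: an assumed ordinal encoding.
GRADE_CODES = {"low": 0, "medium": 1, "high": 2}

cleaned = [
    {
        "date": to_iso(r["date"]),
        "grade": GRADE_CODES[r["grade"]],
        "temp": fill_value if r["temp"] is None else r["temp"],
    }
    for r in records
]

for row in cleaned:
    print(row)
```

The second record shows all three fixes at once: its date becomes 2023-02-05, its grade becomes 2, and its missing temperature is filled with 22.0.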

Q.20 What is Euclidean distance ?

Ans. Euclidean distance is used to measure how similar two observations are: the smaller the distance, the more similar they are. It is calculated as the square root of the sum of the squared differences between the corresponding coordinates of the two points.
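For two points p and q with n coordinates, the Euclidean distance is the square root of (p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2. The short sketch below computes it by hand and compares the result with math.dist from Python's standard library (available from Python 3.8). The function name euclidean and the sample points are chosen only for illustration.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Made-up observations with two features each.
p = (1, 2)
q = (4, 6)

distance = euclidean(p, q)
print(distance)  # differences are 3 and 4, so the distance is 5.0

# The standard library gives the same result.
print(math.isclose(distance, math.dist(p, q)))
```

A distance of 0 means the two observations are identical; larger values mean they are less alike.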
