Two Marks Questions with Answers
Q.1 What is data science?
Ans.
• Data science is an interdisciplinary field that seeks to extract knowledge or insights from various forms of data.
• At its core, data science aims to discover and extract actionable knowledge from data that can be used to make sound business decisions and predictions.
• Data science uses advanced analytical theory and various methods, such as time series analysis, to predict future outcomes.
Q.2 Define structured data.
Ans. Structured data is data arranged in rows and columns. This format makes it easy for applications to retrieve and process the data. A database management system is typically used to store structured data. The term refers to data that is identifiable because it is organized in a structure.
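As a minimal sketch of the row-and-column idea, the Python snippet below stores a few rows in an in-memory SQLite database and retrieves them by column. The table name `students`, its columns and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; the "students" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meena", 91)],
)

# Every row has the same columns, so a query can select exactly what it needs.
high_scorers = conn.execute(
    "SELECT name, marks FROM students WHERE marks > ? ORDER BY marks DESC",
    (80,),
).fetchall()
print(high_scorers)
conn.close()
```

Because the structure is known in advance, the application can filter and sort without parsing free-form text.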
Q.3 What is a data set?
Ans. A data set is a collection of related records or information. The information may concern some entity or some subject area.
Q.4 What is unstructured data?
Ans. Unstructured data is data that does not follow a specified format. It is not organized into rows and columns, so retrieving the required information is difficult. Unstructured data has no identifiable structure.
Q.5 What is machine-generated data?
Ans. Machine-generated data is information that is created without human interaction as a result of a computer process or application activity. This means that data entered manually by an end user is not considered machine-generated.
Q.6 Define streaming data.
Ans. Streaming data is data that is generated continuously by thousands of data sources, which typically send the data records simultaneously and in small sizes (on the order of kilobytes).
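One practical consequence is that streaming records are usually processed as they arrive rather than stored and analyzed later. The Python sketch below illustrates this with a simulated stream; the function names `sensor_stream` and `running_mean` and the sample readings are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple


def sensor_stream() -> Iterator[Tuple[str, float]]:
    """Simulated stream: small (source_id, reading) records, one at a time."""
    records = [("s1", 20.0), ("s2", 22.0), ("s1", 21.0), ("s3", 25.0)]
    for record in records:
        yield record


def running_mean(stream: Iterable[Tuple[str, float]]) -> Iterator[float]:
    """Yield the mean of all readings seen so far without storing the stream."""
    count = 0
    total = 0.0
    for _source, value in stream:
        count += 1
        total += value
        yield total / count


means = list(running_mean(sensor_stream()))
print(means)
```

Only a count and a running total are kept in memory, which is what makes this approach workable for a continuous stream.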
Q.7 List the stages of the data science process.
Ans. The stages of the data science process are as follows:
1. Discovery or setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
6. Presentation and automation
Q.8 What are the advantages of data repositories?
Ans. The advantages are as follows:
i. Data is preserved and archived.
ii. Data isolation allows for easier and faster data reporting.
iii. Database administrators have an easier time tracking problems.
iv. There is value in storing and analyzing data.
Q.9 What is data cleaning?
Ans. Data cleaning means removing inconsistent data or noise and collecting the necessary information from a collection of interrelated data.
Q.10 What is outlier detection?
Ans. Outlier detection is the process of detecting and subsequently excluding outliers from a given set of data. The easiest way to find outliers is to use a plot or a table with the minimum and maximum values.
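The minimum/maximum check can be sketched in Python as follows. The sample values are made up, and the interquartile range (IQR) rule used to flag outliers is a common convention added here as an assumption, not a method stated in the answer above.

```python
from statistics import quantiles

# Hypothetical sample; 95 is an obviously suspicious value.
data = [12, 14, 15, 15, 16, 17, 18, 95]

# Step 1: a min/max summary is often enough to spot an extreme value.
lowest, highest = min(data), max(data)
print(f"min={lowest} max={highest}")

# Step 2 (assumed rule): flag values more than 1.5 * IQR outside the quartiles.
q1, _median, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
cleaned = [x for x in data if lower_fence <= x <= upper_fence]
print("outliers:", outliers)
```

Whether a flagged value should actually be excluded depends on the data: an extreme reading may be an error or a genuine observation.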
Q.11 Explain exploratory data analysis.
Ans. Exploratory Data Analysis (EDA) is a general approach to exploring datasets by means of simple summary statistics and graphic visualizations in order to gain a deeper understanding of data. EDA is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.
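A minimal sketch of the "simple summary statistics" part of EDA, using only the Python standard library. The `ages` sample and the ten-year bin width are assumptions; a real analysis would typically add plots from a visualization library.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical sample column.
ages = [23, 25, 25, 30, 31, 35, 41]

# Summary statistics describing the centre and spread of the data.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
    "stdev": stdev(ages),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# A crude text histogram: group values into ten-year bins (20s, 30s, ...).
bins = Counter((age // 10) * 10 for age in ages)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```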
Q.12 Define data mining.
Ans. Data mining refers to extracting or mining knowledge from large amounts of data. It is a process of discovering interesting patterns or knowledge from a large amount of data stored either in databases, data warehouses, or other information repositories.
Q.13 What are the three challenges to data mining regarding data mining methodology?
Ans. Challenges to data mining regarding data mining methodology include the following:
1. Mining different kinds of knowledge in databases
2. Interactive mining of knowledge at multiple levels of abstraction
3. Incorporation of background knowledge
Q.14 What is predictive mining?
Ans. Predictive mining tasks perform inference on the current data in order to make predictions. Predictive analysis answers questions about the future, using historical data as the main basis for decisions.
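As one simple illustration of predicting from historical data, the Python sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. Linear regression is only one of many predictive techniques and is chosen here as an assumption; the monthly sales figures and the helper `fit_line` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

intercept, slope = fit_line(months, sales)

# Predict the value for a month that has not been observed yet.
forecast = intercept + slope * 6
print(f"forecast for month 6: {forecast}")
```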
Q.15 What is data cleaning?
Ans. Data cleaning means removing inconsistent data or noise and collecting the necessary information from a collection of interrelated data.
Q.16 List the five primitives for specifying a data mining task.
Ans.
1. The set of task-relevant data to be mined
2. The kind of knowledge to be mined
3. The background knowledge to be used in the discovery process
4. The interestingness measures and thresholds for pattern evaluation
5. The expected representation for visualizing the discovered pattern
Q.17 List the stages of the data science process.
Ans. The data science process consists of six stages:
1. Discovery or setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
6. Presentation and automation
Q.18 What is a data repository?
Ans. A data repository is also known as a data library or data archive. It is a general term for a data set isolated to be mined for data reporting and analysis. A data repository is a large database infrastructure, made up of several databases, that collects, manages and stores data sets for data analysis, sharing and reporting.
Q.19 List the data cleaning tasks.
Ans. The data cleaning tasks are as follows:
1. Data acquisition and metadata
2. Fill in missing values
3. Unified date format
4. Converting nominal to numeric
5. Identify outliers and smooth out noisy data
6. Correct inconsistent data
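Three of the tasks listed above (filling in missing values, unifying the date format, and converting nominal to numeric) can be sketched in plain Python. The records, the two accepted date formats, the mean-fill strategy and the helper `unify_date` are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw records: one missing value, mixed date formats, nominal city.
raw = [
    {"date": "2021-03-05", "city": "Chennai", "temp": 31.0},
    {"date": "06/03/2021", "city": "Madurai", "temp": None},
    {"date": "2021-03-07", "city": "Chennai", "temp": 33.0},
]


def unify_date(text):
    """Return the date as YYYY-MM-DD; the accepted input formats are assumed."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


# Fill in missing values with the mean of the observed ones (one strategy).
fill_value = mean(r["temp"] for r in raw if r["temp"] is not None)

# Convert the nominal "city" column to integer codes, ordered by name.
city_codes = {
    name: code for code, name in enumerate(sorted({r["city"] for r in raw}))
}

cleaned = [
    {
        "date": unify_date(r["date"]),
        "city": city_codes[r["city"]],
        "temp": r["temp"] if r["temp"] is not None else fill_value,
    }
    for r in raw
]
for row in cleaned:
    print(row)
```

Integer codes imply an ordering that nominal values do not have, so some models call for a different encoding such as one indicator column per category.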
Q.20 What is Euclidean distance?
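By the standard definition, the Euclidean distance between two points p and q with n coordinates each is the straight-line distance between them: the square root of the sum of the squared coordinate differences. A minimal Python sketch, in which the function name `euclidean_distance` is hypothetical:

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The classic 3-4-5 right triangle gives a distance of exactly 5.
distance = euclidean_distance((0, 0), (3, 4))
print(distance)
```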
Foundation of Data Science | CS3352 | 3rd Semester CSE Dept | 2021 Regulation