UNIT II : Describing Data
Syllabus
Types of Data - Types of Variables - Describing Data with Tables and Graphs - Describing Data with Averages - Describing Variability - Normal Distributions and Standard (z) Scores.
Types of Data
• Data is a collection of facts and figures that relay something specific but are not organized in any way. It can be numbers, words, measurements, observations or even just descriptions of things. We can say that data is the raw material in the production of information.
• A data set is a collection of related records or information. The information may be about some entity or some subject area.
• A data set is a collection of data objects and their attributes. Attributes capture the basic characteristics of an object.
• Each row of a data set is called a record. Each data set also has multiple attributes, each of which gives information on a specific characteristic.
• Data can broadly be divided into the following two types: qualitative data and quantitative data.
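To make the record and attribute vocabulary concrete, here is a minimal Python sketch. It assumes a small hypothetical student data set stored as a list of dictionaries: each dictionary is one record (row) and each key is one attribute. The names and values are invented for illustration only.

```python
# A hypothetical data set: each dict is one record (row),
# and each key is an attribute (a characteristic of the object).
students = [
    {"name": "Asha", "major": "Physics", "age": 19, "hours_of_study": 12},
    {"name": "Ravi", "major": "History", "age": 21, "hours_of_study": 8},
    {"name": "Meena", "major": "Physics", "age": 20, "hours_of_study": 15},
]

# The attributes are the keys shared by every record.
attributes = list(students[0].keys())

# Selecting one attribute across all records gives a column of values.
ages = [record["age"] for record in students]

print(len(students), "records")
print("attributes:", attributes)
print("ages:", ages)
```

Here "major" is a qualitative attribute and "age" and "hours_of_study" are quantitative ones, which is the distinction drawn next.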
Qualitative data:
• Qualitative data provides information about the quality of an object, that is, information which cannot be measured. Qualitative data cannot be expressed as a number. Data that represent nominal scales, such as gender, economic status or religious preference, are usually considered to be qualitative data.
• Qualitative data is concerned with descriptions, which can be observed but cannot be computed. Qualitative data is also called categorical data.
Qualitative data can be further subdivided into two types as follows:
1. Nominal data
2. Ordinal data
Quantitative data:
• Quantitative data focuses on numbers and mathematical calculations; it can be calculated and computed.
• Quantitative data is anything that can be expressed as a number or quantified. Examples of quantitative data are scores on achievement tests, number of hours of study or weight of a subject. These data may be represented by ordinal, interval or ratio scales and lend themselves to most statistical manipulation.
• There are two types of quantitative data: interval data and ratio data.
Advantages and disadvantages of qualitative data:
1. Advantages:
• It helps in in-depth analysis.
• Qualitative data helps market researchers to understand the mindset of their customers.
• It avoids pre-judgments.
2. Disadvantages:
• Time consuming.
• Not easy to generalize.
• Difficult to make systematic comparisons.
Advantages and disadvantages of quantitative data:
1. Advantages:
• Easier to summarize and make comparisons.
• It is often easier to obtain large sample sizes.
• It is less time consuming, since it is based on statistical analysis.
2. Disadvantages:
• The cost is relatively high.
• There is no accurate generalization of the data the researcher receives.
Ranked data:
• Ranked data is a variable in which the value of the data is captured from an ordered set and recorded in order of magnitude. Ranked data is also called ordinal data.
• Ordinal represents the "order". Ordinal data is a kind of qualitative (categorical) data. It can be grouped, named and also ranked.
• Characteristics of ranked data:
a) Ordinal data shows the relative ranking of the variables.
b) It identifies and describes the magnitude of a variable.
c) Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables.
d) The interval properties are not known.
e) Surveyors can quickly analyze the degree of agreement concerning the identified order of variables.
• Examples:
a) University ranking: 1st, 9th, 87th...
b) Socioeconomic status: poor, middle class, rich.
c) Level of agreement: yes, maybe, no.
d) Time of day: dawn, morning, noon, afternoon, evening, night.
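Because ordinal data carries order but not distance, a program can compare and sort ordinal values but should not subtract or average them. The minimal Python sketch below assumes the socioeconomic-status levels from example (b) are stored as an ordered list; the helper names `is_higher` and `median_level` and the survey responses are hypothetical.

```python
# Ordered categories for the socioeconomic-status example (lowest to highest).
LEVELS = ["poor", "middle class", "rich"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def is_higher(a, b):
    """Ordering comparisons are meaningful for ordinal data."""
    return RANK[a] > RANK[b]


def median_level(responses):
    """Median category: sort by rank and take the middle response.
    For an even count this takes the upper-middle value, because
    averaging two categories is not defined on an ordinal scale."""
    ordered = sorted(responses, key=RANK.__getitem__)
    return ordered[len(ordered) // 2]


survey = ["rich", "poor", "middle class", "middle class", "poor"]
print(is_higher("rich", "poor"))   # True
print(median_level(survey))        # middle class
```

The integer positions are only a storage convenience: the gap between "poor" and "middle class" is not known to equal the gap between "middle class" and "rich" (characteristic d above), which is why the sketch never subtracts or averages ranks.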
• Scales of measurement are also called levels of measurement. Each level of measurement has specific properties that determine which statistical analyses can be used.
• There are four different scales of measurement, and any data can be classified as belonging to one of them. The four scales are: nominal, ordinal, interval and ratio.
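The four scales are cumulative: each one permits everything the previous scale permits and adds something new. The Python sketch below encodes that idea as a lookup. The exact operations listed per scale are a common textbook summary chosen for illustration (consistent with the characteristics given for each scale in this section), not an exhaustive or official list; `permitted` is a hypothetical helper.

```python
# Scales in increasing order of information.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# What each scale adds on top of the scales before it (illustrative summary).
ADDS = {
    "nominal": ["count", "mode"],
    "ordinal": ["median", "rank comparison"],
    "interval": ["difference", "mean"],
    "ratio": ["ratio"],
}


def permitted(scale):
    """Return every operation allowed on the given scale,
    including those inherited from lower scales."""
    allowed = []
    for level in SCALES[: SCALES.index(scale) + 1]:
        allowed.extend(ADDS[level])
    return allowed


for s in SCALES:
    print(s, "->", permitted(s))
```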
Nominal
• Nominal data is the first level of measurement scale, in which the numbers serve as "tags" or "labels" to classify or identify the objects.
• Nominal data usually deals with non-numeric variables, or with numbers that do not carry any quantitative value. When developing statistical models, nominal data are usually transformed before building the model.
• Nominal variables are also known as categorical variables.
Characteristics of nominal data:
1. A nominal variable is classified into two or more categories. In this measurement mechanism, each answer must fall into one of the classes.
2. It is qualitative. The numbers are used here only to identify the objects.
3. The numbers don't define the object characteristics. The only permissible operation on numbers in the nominal scale is "counting".
• Example:
1. Gender: male, female, other.
2. Hair color: brown, black, blonde, red, other.
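Since counting is the only permissible operation on nominal data, frequencies and the mode are its natural summaries. The note above says nominal data is usually transformed before modelling; one common choice of transformation (an assumption here, not the only option) is one-hot encoding, which gives each category its own 0/1 column so that a model does not read a false order into the labels. The hair-color observations below are hypothetical.

```python
from collections import Counter

# Hypothetical observations of a nominal variable.
hair = ["brown", "black", "brown", "blonde", "red", "brown"]

# Counting is the only meaningful arithmetic on nominal labels,
# so frequencies and the mode are valid summaries.
counts = Counter(hair)
mode = counts.most_common(1)[0][0]

# One-hot encoding: one 0/1 column per category (sorted for a stable order).
categories = sorted(counts)


def one_hot(value):
    """Return a list with 1 in the column for `value` and 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]


print(mode)              # brown
print(categories)        # ['black', 'blonde', 'brown', 'red']
print(one_hot("red"))    # [0, 0, 0, 1]
```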
Interval
• Interval data corresponds to a variable in which the value is chosen from an interval set.
• It is defined as a quantitative measurement scale in which the difference between two values is meaningful: equal steps on the scale represent equal differences in the quantity measured. However, the zero point is arbitrary rather than absolute, so ratios between values are not meaningful.
• Characteristics of interval data:
a) Interval data is quantitative, as it can quantify the difference between values.
b) It allows calculating the mean and median of the variables.
c) To understand the difference between the variables, you can subtract the values.
d) The interval scale is the preferred scale in statistics, as it helps to assign numerical values to arbitrary assessments such as feelings, calendar types, etc.
• Examples:
1. Celsius temperature
2. Fahrenheit temperature
3. Time on a clock with hands.
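The Celsius and Fahrenheit examples show why differences are meaningful on an interval scale but ratios are not. The Python sketch below uses the standard conversion F = 1.8 × C + 32 and two illustrative temperatures; `c_to_f` is a helper named here for illustration.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit: F = 1.8 * C + 32."""
    return 1.8 * celsius + 32


low_c, high_c = 10, 20

# Differences are meaningful: a 10 degree Celsius gap is always
# an 18 degree Fahrenheit gap, wherever it sits on the scale.
diff_c = high_c - low_c
diff_f = c_to_f(high_c) - c_to_f(low_c)

# Ratios are not: 20 C / 10 C = 2, but the same temperatures in
# Fahrenheit give 68 F / 50 F = 1.36, because both zeros are arbitrary.
ratio_c = high_c / low_c
ratio_f = c_to_f(high_c) / c_to_f(low_c)

print(diff_c, diff_f)
print(ratio_c, ratio_f)
```

Saying "20 °C is twice as hot as 10 °C" therefore depends on the arbitrary choice of zero, which is exactly what separates interval data from ratio data.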
Ratio
• Any variable for which ratios can be computed and are meaningful is called ratio data.
• It is a type of variable measurement scale. It allows researchers to compare differences or intervals. The ratio scale has a unique feature: it possesses a true origin, or zero point.
• Characteristics of ratio data:
a) The ratio scale has an absolute zero.
b) It doesn't have negative numbers, because of its zero-point feature.
c) It affords unique opportunities for statistical analysis. The values can be added, subtracted, multiplied and divided. Mean, median and mode can be calculated using the ratio scale.
d) Ratio data has unique and useful properties. One such feature is that it allows unit conversions, such as between kilogram-calories (kilocalories) and gram-calories (calories), without changing the ratios between values.
• Examples: age, weight, height, ruler measurements, number of children.
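On a ratio scale the true zero makes statements like "twice as heavy" meaningful, and converting units rescales every value by the same factor, so ratios survive the conversion. The Python sketch below uses hypothetical body weights and a kilogram-to-gram conversion (chosen for simplicity instead of the calorie example above), and computes mean, median and mode with the standard-library `statistics` module.

```python
import statistics

# Hypothetical weights on a ratio scale (kilograms).
weights_kg = [50.0, 60.0, 60.0, 75.0, 100.0]

# A true zero makes "twice as heavy" meaningful: 100 kg / 50 kg.
ratio_kg = weights_kg[4] / weights_kg[0]

# Converting units (kg -> g) multiplies every value by the same
# factor, so the ratio between two values does not change.
weights_g = [w * 1000 for w in weights_kg]
ratio_g = weights_g[4] / weights_g[0]

print(ratio_kg, ratio_g)
print(statistics.mean(weights_kg))
print(statistics.median(weights_kg))
print(statistics.mode(weights_kg))
```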
Example 2.1.1: Indicate whether each of the following terms is qualitative, ranked or quantitative:
(a) ethnic group
(b) academic major
(c) age
(d) family size
(e) net worth (in rupees)
(f) temperature
(g) sexual preference
(h) second-place finish
(i) IQ score
(j) gender
Solution :
(a) ethnic group → Qualitative
(b) academic major → Qualitative
(c) age → Quantitative
(d) family size → Quantitative
(e) net worth (in rupees) → Quantitative
(f) temperature → Quantitative
(g) sexual preference → Qualitative
(h) second-place finish → Ranked
(i) IQ score → Quantitative
(j) gender → Qualitative
Foundation of Data Science
CS3352 3rd Semester CSE Dept | 2021 Regulation