Foundation of Data Science: Unit II: Describing Data

Types of Data

UNIT II : Describing Data

Syllabus

Types of Data - Types of Variables - Describing Data with Tables and Graphs -Describing Data with Averages - Describing Variability - Normal Distributions and Standard (z) Scores.

Types of Data

• Data is a collection of facts and figures that convey something specific but are not organized in any way. It can be numbers, words, measurements, observations or even just descriptions of things. In other words, data is the raw material in the production of information.

• A data set is a collection of related records or information. The information may be about some entity or some subject area.

• A data set can also be described as a collection of data objects and their attributes. An attribute captures a basic characteristic of an object.

• Each row of a data set is called a record. Each data set also has multiple attributes, each of which gives information on a specific characteristic.
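The record and attribute vocabulary above can be illustrated with a small sketch. The student records, names and values below are made up for illustration; they do not come from any real data set.

```python
# A tiny, made-up data set: each dict is one record (row),
# and each key is an attribute (column).
students = [
    {"name": "Asha", "major": "Physics", "age": 20, "weight_kg": 55.5},
    {"name": "Ravi", "major": "History", "age": 22, "weight_kg": 68.0},
    {"name": "Meena", "major": "Physics", "age": 21, "weight_kg": 60.2},
]

# Every record in the data set shares the same attributes.
attributes = list(students[0].keys())

# One attribute viewed across all records.
ages = [record["age"] for record in students]

print(len(students))  # number of records
print(attributes)
print(ages)
```

Here each dictionary plays the role of a record, and each key (name, major, age, weight_kg) is an attribute describing one characteristic of the object.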

Qualitative and Quantitative Data

• Data can be broadly divided into two types: qualitative data and quantitative data.

Qualitative data:

• Qualitative data provides information about the quality of an object or information which cannot be measured. Qualitative data cannot be expressed as a number. Data that represent nominal scales such as gender, economic status, religious preference are usually considered to be qualitative data.

• Qualitative data is data concerned with descriptions, which can be observed but cannot be computed. Qualitative data is also called categorical data. Qualitative data can be further subdivided into two types as follows:

1. Nominal data

2. Ordinal data

Quantitative data:

• Quantitative data focuses on numbers and mathematical calculations; it can be measured and computed.

• Quantitative data is anything that can be expressed as a number or quantified. Examples of quantitative data are scores on achievement tests, number of hours of study or weight of a subject. These data may be represented by interval or ratio scales and lend themselves to most statistical manipulation.

• There are two types of quantitative data: interval data and ratio data.
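As a rough sketch of how the two kinds of data are typically summarized, the example below counts the categories of a qualitative attribute and averages a quantitative one. The variable names and values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Made-up observations for four students.
majors = ["Physics", "History", "Physics", "Math"]  # qualitative (categories)
study_hours = [2.5, 4.0, 3.0, 1.5]                  # quantitative (numbers)

# Qualitative data can be counted per category, but not averaged.
major_counts = Counter(majors)

# Quantitative data supports arithmetic such as the mean.
average_hours = mean(study_hours)

print(major_counts)
print(average_hours)
```

Averaging the majors would be meaningless, which is the practical difference between the two types.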

Difference between Qualitative and Quantitative Data

Advantages and Disadvantages of Qualitative Data

1. Advantages:

• It helps in-depth analysis

• Qualitative data helps market researchers understand the mindset of their customers.

• Avoid pre-judgments

2. Disadvantages:

• Time consuming

• Not easy to generalize

• Difficult to make systematic comparisons

Advantages and Disadvantages of Quantitative Data

1. Advantages:

• Easier to summarize and make comparisons.

• It is often easier to obtain large sample sizes

• It is less time consuming since it is based on statistical analysis.

2. Disadvantages:

• The cost is relatively high.

• The data the researcher receives may not generalize accurately.

Ranked Data

• Ranked data is data whose values are drawn from an ordered set and recorded in order of magnitude. Ranked data is also called ordinal data.

• Ordinal represents the "order." Ordinal data is known as qualitative data or categorical data. It can be grouped, named and also ranked.

• Characteristics of the Ranked data:

a) The ordinal data shows the relative ranking of the variables

b) It identifies and describes the magnitude of a variable

c) Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables

d) The interval properties are not known

e) The surveyors can quickly analyze the degree of agreement concerning the identified order of variables

• Examples:

a) University ranking : 1st, 9th, 87th...

b) Socioeconomic status: poor, middle class, rich.

c) Level of agreement: yes, maybe, no.

d) Time of day: dawn, morning, noon, afternoon, evening, night
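A minimal sketch of working with ranked data, assuming the socioeconomic categories from the examples above: the categories can be sorted and a middle category found, but the distance between categories stays unknown. The observed values are made up.

```python
from statistics import median_low

# Ordered categories, lowest to highest.
LEVELS = ["poor", "middle class", "rich"]
rank = {level: position for position, level in enumerate(LEVELS)}

# Made-up observations.
observed = ["rich", "poor", "middle class", "poor", "middle class"]

# Ordinal data can be sorted by rank.
ordered = sorted(observed, key=lambda level: rank[level])

# A middle category exists because the order is known, even though
# the spacing between categories is not.
middle = LEVELS[median_low([rank[level] for level in observed])]

print(ordered)
print(middle)
```

The integer ranks are used only for ordering; subtracting or averaging them would assume interval properties that ordinal data does not have.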

Scale of Measurement

• Scales of measurement are also called levels of measurement. Each scale has specific properties that determine which statistical analyses are appropriate.

• There are four different scales of measurement. The data can be defined as being one of the four scales. The four types of scales are: Nominal, ordinal, interval and ratio.
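One way to picture the four scales is as a cumulative list of properties, where each scale keeps the properties of the previous one and adds another. The property labels used below ("identity", "order", "equal intervals", "true zero") are shorthand chosen for this sketch, not terms taken from the text.

```python
# Each scale keeps the properties of the scales before it and adds one more.
SCALE_PROPERTIES = {
    "nominal":  ["identity"],
    "ordinal":  ["identity", "order"],
    "interval": ["identity", "order", "equal intervals"],
    "ratio":    ["identity", "order", "equal intervals", "true zero"],
}


def has_property(scale, prop):
    """Return True if the given scale has the given property."""
    return prop in SCALE_PROPERTIES[scale]


for scale, props in SCALE_PROPERTIES.items():
    print(f"{scale:<8} {', '.join(props)}")
```

For instance, has_property("ordinal", "equal intervals") is False, which is why differences between ranks are not meaningful.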

Nominal

• Nominal data is the first level of measurement scale, in which numbers serve only as "tags" or "labels" to classify or identify objects.

• Nominal data usually deals with non-numeric variables or with numbers that carry no quantitative value. When developing statistical models, nominal data is usually transformed before the model is built.

• Nominal variables are also known as categorical variables.

Characteristics of nominal data:

1. A nominal variable is classified into two or more categories. In this measurement mechanism, each answer should fall into exactly one of the classes.

2. It is qualitative. The numbers are used here to identify the objects.

3. The numbers don't define the object characteristics. The only permissible aspect of numbers in the nominal scale is "counting".

• Example:

1. Gender: Male, female, other.

2. Hair Color: Brown, black, blonde, red, other.
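The sketch below shows the two operations mentioned above for nominal data: counting, and transforming the labels before modelling. One-hot encoding is used here as one common transformation; the text does not say which transformation is intended, and the hair colour values are made up.

```python
from collections import Counter

# Made-up nominal observations.
hair = ["brown", "black", "brown", "red", "black", "brown"]

# Counting is the only arithmetic that makes sense for nominal labels.
counts = Counter(hair)
mode_value, mode_count = counts.most_common(1)[0]

# One-hot encoding: one 0/1 column per category (columns in alphabetical order).
categories = sorted(set(hair))
one_hot = [[1 if value == c else 0 for c in categories] for value in hair]

print(mode_value, mode_count)
print(categories)
print(one_hot[0])
```

Each encoded row contains a single 1, so no artificial order or distance is introduced between the categories.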

Interval

• Interval data corresponds to a variable in which the value is chosen from an interval set.

• It is defined as a quantitative measurement scale in which the difference between two values is meaningful. In other words, the values are measured on an exact scale with equal steps rather than in a merely relative way, but the zero point is arbitrary.

• Characteristics of interval data:

a) The interval data is quantitative as it can quantify the difference between the values.

b) It allows calculating the mean and median of the variables.

c) To understand the difference between the variables, you can subtract the values between the variables.

d) The interval scale is the preferred scale in statistics as it helps to assign numerical values to arbitrary assessments such as feelings, calendar types, etc.

• Examples:

1. Celsius temperature

2. Fahrenheit temperature

3. Time on a clock with hands.
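A short worked sketch of why differences are meaningful on an interval scale while ratios are not. It uses the standard Celsius to Fahrenheit conversion; the two temperatures are arbitrary example values.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


a_c, b_c = 10.0, 20.0
a_f, b_f = c_to_f(a_c), c_to_f(b_c)

# Differences survive a change of scale (up to the constant factor 9/5).
diff_c = b_c - a_c
diff_f = b_f - a_f

# Ratios do not: they depend on where the arbitrary zero is placed.
ratio_c = b_c / a_c
ratio_f = b_f / a_f

print(diff_c, diff_f)
print(ratio_c, ratio_f)
```

The two temperatures are the same physical states in both scales, yet "twice as hot" holds only in Celsius, so the ratio carries no real meaning.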

Ratio

• Any variable for which the ratios can be computed and are meaningful is called ratio data.

• It is a type of variable measurement scale. It allows researchers to compare differences or intervals. The ratio scale has a unique feature: it possesses a true origin, or zero point.

• Characteristics of ratio data:

a) Ratio scale has a feature of absolute zero.

b) It doesn't have negative numbers, because of its zero-point feature.

c) It affords unique opportunities for statistical analysis. The variables can be orderly added, subtracted, multiplied, divided. Mean, median and mode can be calculated using the ratio scale.

d) Ratio data has unique and useful properties. One such feature is that it allows unit conversions, such as kilograms to grams or kilocalories to calories.

• Examples: Age, weight, height, ruler measurements, number of children.
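A brief sketch of the absolute-zero property, using made-up weights: because zero means "no weight" in every unit, the ratio between two values stays the same after a unit conversion.

```python
# Two made-up weights in kilograms.
weights_kg = [40.0, 80.0]

# Unit conversion: kilograms to grams.
weights_g = [w * 1000 for w in weights_kg]

# The ratio is the same in both units because zero is absolute.
ratio_kg = weights_kg[1] / weights_kg[0]
ratio_g = weights_g[1] / weights_g[0]

print(ratio_kg, ratio_g)
```

Statements such as "twice as heavy" are therefore meaningful for ratio data, unlike for interval data such as Celsius temperature.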

Example 2.1.1: Indicate whether each of the following terms is qualitative, ranked or quantitative:

(a) ethnic group

(b) academic major

(c) age

(d) family size

(e) net worth (in Rupees)

(f) temperature

(g) sexual preference

(h) second-place finish

(i) IQ score

(j) gender

Solution :

(a) ethnic group: Qualitative

(b) academic major: Qualitative

(c) age: Quantitative

(d) family size: Quantitative

(e) net worth (in Rupees): Quantitative

(f) temperature: Quantitative

(g) sexual preference: Qualitative

(h) second-place finish: Ranked

(i) IQ score: Quantitative

(j) gender: Qualitative
