Data warehousing is the process of constructing and using a data warehouse.
Data
Warehousing
• Data
warehousing is the process of constructing and using a data warehouse. A data
warehouse is constructed by integrating data from multiple heterogeneous
sources that support analytical reporting, structured and/or ad hoc queries and
decision making. Data warehousing involves data cleaning, data integration and
data consolidations.
• A data
warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision-making process. A data
warehouse stores historical data for purposes of decision support.
• A
database an application-oriented collection of data that is organized,
structured, coherent, with minimum and controlled redundancy, which may be
accessed by several users in due time.
• Data
warehousing provides architectures and tools for business executives to
systematically organize, understand and use their data to make strategic
decisions.
• A data
warehouse is a subject-oriented collection of data that is integrated,
time-variant, non-volatile, which may be used to support the decision-making
process.
• Data
warehouses are databases that store and maintain analytical data separately
from transaction-oriented databases for the purpose of decision support. Data
warehouses separate analysis workload from transaction workload and enable an
organization to consolidate data from several source.
• Data
organization in data warehouses is based on areas of interest, on the major
subjects of the organization: Customers, products, activities etc. databases
organize data based on enterprise applications resulted from its functions.
• The
main objective of a data warehouse is to support the decision-making system,
focusing on the subjects of the organization. The objective of a database is to
support the operational system and information is organized on applications and
processes.
• A data
warehouse usually stores many months or years of data to support historical
analysis. The data in a data warehouse is typically loaded through an
extraction, transformation and loading (ETL) process from multiple data
sources.
• Databases and data warehouses are
related but not the same.
• A
database is a way to record and access information from a single source. A
database is often handling real-time data to support day-to-day business
processes like transaction processing.
• A data warehouse is a
way to store historical information from multiple sources to allow you to
analyse and report on related data (e.g., your sales transaction data, mobile
app data and CRM data). Unlike a database, the information isn't updated in
real-time and is better for data analysis of broader trends.
• Modern
data warehouses are moving toward an Extract, Load, Transformation (ELT)
architecture in which all or most data transformation is performed on the
database that hosts the data warehouse.
• Goals
of data warehousing:
1. To
help reporting as well as analysis.
2.
Maintain the organization's historical information.
3. Be
the foundation for decision making.
"How are organizations using the
information from data warehouses ?"
• Most
of the organizations makes use of this information for taking business decision
like :
a)
Increasing customer focus: It is possible by performing analysis of customer
buying.
b)
Repositioning products and managing product portfolios by comparing the
performance of last year sales.
c)
Analysing operations and looking for sources of profit.
d)
Managing customer relationships, making environmental corrections and managing
the cost of corporate assets.
1.
Subject oriented Data are organized based on how the users refer to them. A
data warehouse can be used to analyse a particular subject area. For example,
"sales" can be a particular subject.
2.
Integrated: All inconsistencies regarding naming convention and value
representations are removed. For example, source A and source B may have
different ways of identifying a product, but in a data warehouse, there will be
only a single way of identifying a product.
3.
Non-volatile: Data are stored in read-only format and do not change over time.
Typical activities such as deletes, inserts and changes that are performed in
an operational application environment are completely non-existent in a DW
environment.
4. Time
variant : Data are not current but normally time series. Historical information
is kept in a data warehouse. For example, one can retrieve files from 3 months,
6 months, 12 months or even previous data from a data warehouse.
Key characteristics of a Data Warehouse
1. Data
is structured for simplicity of access and high-speed query performance.
2. End
users are time-sensitive and desire speed-of-thought response times.
3. Large
amounts of historical data are used.
4.
Queries often retrieve large amounts of data, perhaps many thousands of rows.
5. Both
predefined and ad hoc queries are common.
6. The
data load involves multiple sources and transformations.
• Data
warehouse architecture is a data storage framework's design of an organization.
A data warehouse architecture takes information from raw sets of data and
stores it in a structured and easily digestible format.
• Data
warehouse system is constructed in three ways. These approaches are classified
the number of tiers in the architecture.
a)
Single-tier architecture.
b)
Two-tier architecture.
c)
Three-tier architecture (Multi-tier architecture).
• Single
tier warehouse architecture focuses on creating a compact data set and
minimizing the amount of data stored. While it is useful for removing
redundancies. It is not effective for organizations with large data needs and
multiple streams.
•
Two-tier warehouse structures separate the resources physically available from
the warehouse itself. This is most commonly used in small organizations where a
server is used as a data mart. While it is more effective at storing and
sorting data. Two-tier is not scalable and it supports a minimal number of
end-users.
Three tier (Multi-tier) architecture:
• Three
tier architecture creates a more structured flow for data from raw sets to
actionable insights. It is the most widely used architecture for data warehouse
systems.
• Fig.
1.11.1 shows three tier architecture. Three tier architecture sometimes called
multi-tier architecture.
• The
bottom tier is the database of the warehouse, where the cleansed and
transformed data is loaded. The bottom tier is a warehouse database server.
• The
middle tier is the application layer giving an abstracted view of the database.
It arranges the data to make it more suitable for analysis. This is done with
an OLAP server, implemented using the ROLAP or MOLAP model.
• OLAPS
can interact with both relational databases and multidimensional databases,
which lets them collect data better based on broader parameters.
• The
top tier is the front-end of an organization's overall business intelligence
suite. The top-tier is where the user accesses and interacts with data via
queries, data visualizations and data analytics tools.
• The
top tier represents the front-end client layer. The client level which includes
the tools and Application Programming Interface (API) used for high-level data
analysis, inquiring and reporting. User can use reporting tools, query,
analysis or data mining tools.
1)
Business user: Business users require a data warehouse to view summarized data
from the past. Since these people are non-technical, the data may be presented
to them in an elementary form.
2) Store
historical data: Data warehouse is required to store the time variable data
from the past. This input is made to be used for various purposes.
3) Make
strategic decisions: Some strategies may be depending upon the data in the data
warehouse. So, data warehouse contributes to making strategic decisions.
4) For
data consistency and quality Bringing the data from different sources at a
commonplace, the user can effectively undertake to bring the uniformity and
consistency in data.
5) High
response time: Data warehouse has to be ready for somewhat unexpected loads and
types of queries, which demands a significant degree of flexibility and quick
response time.
a)
Understand business trends and make better forecasting decisions.
b) Data
warehouses are designed to perform well enormous amounts of data.
c) The
structure of data warehouses is more accessible for end-users to navigate,
understand and query.
d)
Queries that would be complex in many normalized databases could be easier to
build and maintain in data warehouses.
e) Data
warehousing is an efficient method to manage demand for lots of information
from lots of users.
f) Data
warehousing provide the capabilities to analyze a large amount of historical
data.
• Metadata
is simply defined as data about data. The data that is used to represent other
data is known as metadata. In data warehousing, metadata is one of the
essential aspects.
• We can
define metadata as follows:
a)
Metadata is the road-map to a data warehouse.
b)
Metadata in a data warehouse defines the warehouse objects.
c)
Metadata acts as a directory. This directory helps the decision support system
to locate the contents of a data warehouse.
• In a data
warehouse, we create metadata for the data names and definitions of a given
data warehouse. Along with this metadata, additional metadata is also created
for time-stamping any extracted data, the source of extracted data.
Why is metadata necessary in a data warehouse ?
a)
First, it acts as the glue that links all parts of the data warehouses.
b) Next,
it provides information about the contents and structures to the developers.
c)
Finally, it opens the doors to the end-users and makes the contents recognizable
in their terms.
• Fig.
1.11.2 shows warehouse metadata.
Foundation of Data Science: Unit I: Introduction : Tag: : Characteristics, Multitier Architecture, Needs, Benefits, Metadata - Data Warehousing
Foundation of Data Science
CS3352 3rd Semester CSE Dept | 2021 Regulation | 3rd Semester CSE Dept 2021 Regulation