Understanding Data Warehousing
Data has always been an essential ingredient to decision-making and, in modern business, the need to obtain, store, and use data has increased dramatically as the complexities and scope of the global marketplace has expanded.
Data warehousing is an environment established for the sole purpose of gathering, integrating, and delivering data from across multiple data sources for use in enterprise decision-making. However, its effectiveness can be expanded to support any person, process, or system needing current and historical data which is consistent and relatable.
Defining Data Warehouse
Data warehouse is a computing environment composed of several technologies and products, including:
Defining Data Warehouse (Part 2)
Data Warehousing is about managing the data. The following data features are key reasons for having a data warehouse:
Benefits of Data Warehousing
Data Warehousing provides the following benefits:
A comprehensive and integrated perspective of the enterprise
Availability of current and historical information for strategic decision making
Mitigating operational risks related to supporting the decision-making process
Providing a flexible and interactive source of information
Introducing Business Intelligence
Business Intelligence is a set of disciplines designed specifically to establish a consistent decision-making environment.
Business Intelligence does not replace Data Warehousing, but uses it extensively in it processes.
Business Intelligence can be described as a two-step process:
Transforming data into information
Transforming information into knowledge
Functional Components of a Data Warehouse
Physical Components of a Data Warehouse
Data Sources can include:
Data can consist of structured or unstructured, prepared or raw formats.
The activities of data staging are:
Extracting data from the data sources
Transforming the data into usable information
Loading the data and metadata into data storage
ETL (Data Extraction, Transformation, and Loading) is considered the most time-consuming and human-intensive activities in data warehousing.
One purpose of Data Staging is to raise the quality of the data used in decision making: bad data will lead to bad decisions.
Data Quality is influenced by:
Inadequate database designs
Aging of data
Dummy or absent data
Ineffective primary keys
Violation of business rules
Lack of policies and procedures
Organizations must establish the storage requirements for:
Corporate data warehouse
Individual data marts
OLAP-based multidimensional databases
The requirements for Information Delivery reside in expectations related to:
Query types and frequencies
Report types and frequencies
Types of analysis
Distribution of information
Applications for decision support
Potential growth and expansion
The core of the data warehouse is its
What is a Data Mart?
A data mart is a subset of a data warehouse. A data warehouse will typically contain data relevant to the entire enterprise, while a data mart contains data relevant to a line of business or department within the enterprise.
Deployment of data warehouses and data marts will usually take one of the following approaches:
Top-down (data warehouse first, data marts second)
Bottom-up (data marts first, data warehouse second)
Data Warehouse Architecture
There are five basic architectures in data warehousing:
Centralized Data Warehouse – one data warehouse with no data marts.
Independent Data Marts – several autonomous data marts with no central data warehouse.
Federated Data Marts – several data marts operating under standardized controls with no central warehouse.
Hub-and-Spoke – several data marts with a central data warehouse.
Data-Mart Bus – several data marts are created and conform to the standards and controls of the original data mart.
Why Data Warehousing?
What does a data warehouse provide the user?
Ability to run simple queries and reports against current and historical data
Ability to perform “what if” scenarios
Ability to iteratively query and analyze deeper into the data
Ability to identify historical trends and apply them effectively to future situations.
Challenges in Data Acquisition
The typical challenges facing data acquisition activities are:
Large number of data sources
Disparate data sources
External data sources
Ongoing data feeds
Different computing platforms
Complex data transformations
Challenges in Data Storage
The typical challenges facing data storage activities are:
Large data volumes
Large data sets
New data types
Data storage in staging area
Multiple index types
Challenges in Information Delivery
The typical challenges facing information delivery activities are:
Multiple user types
Multiple query types
Tools from multiple vendors
Relevant Data Warehouse Standards
Relevant standards for data warehousing, specifically metadata, are provided through:
Meta Data Coalition
Object Management Group
OLAP Council for Multi-dimensional Application Programmers Interface (MDAPI)
Basic Project Plan
The basic plan for a data warehouse project is:
The Toolkit is designed to be holistic to the enterprise’s relationship with data, not just data warehousing. As part of its scope, a second presentation is available to introduce Data Analytics and Data Mining, which is related to the second step of Business Intelligence.
The goal of the Data Warehouse/Analytics Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.
The participant can take two directions in using the toolkit at this point. To continue with the data warehouse discussion, the next document of interest is, Developing Warehouse Capabilities, which is intended to be a step-by-step guide in creating a Big Data foundation in your organization. To learn more about data-related activities within an enterprise, see the presentation, Introduction to Data Analytics and Mining.
. Multiple templates have been created to support the process and aid organizations in their efforts to improve their Data Warehouse and Data Analytic capabilities.