A Data Warehouse Tutorial - SQL, database, informix, oracle, SQL Server

OLAP,OLTP,MOLAP,ROLAP,HOLAP,tutorial,explains

SQL Stats

Code: 31,327 lines
Jobs: 372 postings

Sponsored by:

You are in:

Latest Code Ticker for SQL.

Unix Date Convertor Function
By George Graff on 10/23

Insert pipe delimited rows into multiple rows.
By Charles Toepfer on 10/21

Logfiles by PL/SQL
By Stephan Rechberger on 10/21

To display a name in a default language if the given one doesn't exist
By Serge Alard on 10/18

Order by column except a few values
By Serge Alard on 10/18

Introduction to PL/SQL (Series 3) Cursors
By David Nishimoto on 10/14

Sorting a String using T-SQL
By Gaurav Pugalia on 10/12

Protecting against TSQL virii, worms and time bombs
By Joseph Gama on 10/11

Get size in bytes of SP, View, Trigger, UDF or Rule
By Joseph Gama on 10/11

(Screen Shot)

Click here to put this ticker on your site!

Add this ticker to your desktop!

Daily Code Email

To join the 'Code of the Day' Mailing List click here!

Affiliate Sites

A Data Warehouse Tutorial

	Print
	Email

Submitted on: 12/27/2001 6:14:26 PM
By: Visual-Basic-Data-Mining.NET/DesktopDefault.aspx
Level: Beginner
User Rating:

By 9 Users
Compatibility:SQL Server 7.0, Other

Users have accessed this article 22470 times.

(About the author)

A tutorial that explains data warehousing concepts. Concepts include data warehouse, data mart, benefits of data warehousing, OLAP, OLTP, dimensions, fact tables, measures, aggregations, cubes, PivotTableŽ Service, offline OLAP cubes, virtual cubes, MOLAP cubes, ROLAP cubes, HOLAP cubes. Attached is this article in MicrsoftŽ Word™ format. Submitted by Pareto Analyst (www.paretoanalysts.com)

This article has accompanying files

Terms of Agreement:
By using this article, you agree to the following terms...
1) You may use this article in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.
2) You MAY NOT redistribute this article (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.
3) You may link to this article from another website, but ONLY if it is not wrapped in a frame.
4) You will abide by any additional copyright restrictions which the author may have placed in the article or article's description.

<%@ Language=JavaScript %> Data Warehousing

submitted BY: WWW.PARETOANALYSTS.COM

What is a data warehouse?
A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis as a data warehouse. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent. A data warehouse can also be viewed as a database for historical data from different functions within a company.

What is a data mart?
A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.

What are the benefits of data warehousing?
Data warehouses are designed to perform well with aggregate queries running on large amounts of data.

The structure of data warehouses is easier for end users to navigate, understand and query against unlike the relational databases primarily designed to handle lots of transactions.

Data warehouses enable queries that cut across different segments of a company's operation. E.g. production data could be compared against inventory data even if they were originally stored in different databases with different structures.

Queries that would be complex in very normalized databases could be easier to build and maintain in data warehouses, decreasing the workload on transaction systems.

Data warehousing is an efficient way to manage and report on data that is from a variety of sources, non uniform and scattered throughout a company.

Data warehousing is an efficient way to manage demand for lots of information from lots of users.

Data warehousing provides the capability to analyze large amounts of historical data for nuggets of wisdom that can provide an organization with competitive advantage.

What is OLAP?
OLAP stands for Online Analytical Processing.
It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data. E.g. OLAP technology could provide management with fast answers to complex queries on their operational data or enable them to analyze their company's historical data for trends and patterns.

What is OLTP?
OLTP stands for Online Transaction Processing.
OLTP uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently OLTP database are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must record successfully a large number of updates during critical periods like lunch hour, breaks, startup and close of work.

What are dimensions?
Dimensions are categories by which summarized data can be viewed. E.g. a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year), Region dimension (profit by country, state, city), Product dimension (profit for product1, product2).

What are fact tables?
A fact table is a table that contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys from the primary keys of related dimension tables.

What are measures?
Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in. E.g. a sales fact table may contain a profit measure which represents profit on each sale.

What are aggregations?
Aggregations are precalculated numeric data. By calculating and storing the answers to a query before users ask for it, the query processing time can be reduced. This is key in providing fast query performance in OLAP.

What are cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.

What is the PivotTableŽ Service?
This is the primary component that connects clients to the MicrosoftŽ SQL Server™ 2000 Analysis Server. It also provides the capability for clients to create local offline cubes using it as an OLAP server. PivotTableŽ Service does not have a user interface, the clients using its services has to provide its user interface.

What are offline OLAP cubes?
These are OLAP cubes created by clients, end users or third-party applications accessing a data warehouse, relational database or OLAP cube through the MicrosoftŽ PivotTableŽ Service. E.g. MicrosoftŽ Excel™ is very popular as a client for creating offline local OLAP cubes from relational databases for multidimensional analysis. These cubes have to be maintained and managed by the end users who have to manually refresh their data.

What are virtual cubes?
These are combinations of one or more real cubes and require no disk space to store them. They store only the definitions and not the data of the referenced source cubes. They are similar to views in relational databases.

What are MOLAP cubes?
MOLAP Cubes: stands for Multidimensional OLAP. In MOLAP cubes the data aggregations and a copy of the fact data are stored in a multidimensional structure on the Analysis Server computer. It is best when extra storage space is available on the Analysis Server computer and the best query performance is desired. MOLAP local cubes contain all the necessary data for calculating aggregates and can be used offline. MOLAP cubes provide the fastest query response time and performance but require additional storage space for the extra copy of data from the fact table.

What are ROLAP cubes?
ROLAP Cubes: stands for Relational OLAP. In ROLAP cubes a copy of data from the fact table is not made and the data aggregates are stored in tables in the source relational database. A ROLAP cube is best when there is limited space on the Analysis Server and query performance is not very important. ROLAP local cubes contain the dimensions and cube definitions but aggregates are calculated when they are needed. ROLAP cubes requires less storage space than MOLAP and HOLAP cubes.

What are HOLAP cubes?
HOLAP Cubes: stands for Hybrid OLAP. A ROLAP cube has a combination of the ROLAP and MOLAP cube characteristics. It does not create a copy of the source data however, data aggregations are stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best when storage space is limited but faster query responses are needed.

What is the approximate size of a data warehouse?

You can estimate the approximate size of a data warehouse made up of only fact and dimension tables by estimating the approximate size of the fact tables and ignoring the sizes of the dimension tables.

To estimate the size of the fact table in bytes, multiply the size of a row by the number of rows in the fact table. A more exact estimate would include the data types, indexes, page sizes, etc. An estimate of the number of rows in the fact table is obtained by multiplying the number of transactions per hour by the number of hours in a typical work day and then multiplying the result by the number of days in a year and finally multiply this result by the number of years of transactions involved. Divide this result by 1024 to convert to kilobytes and by 1024 again to convert to megabytes.

E.g. A data warehouse will store facts about the help provided by a companyâs product support representatives. The fact table is made of up of a composite key of 7 indexes (int data type) including the primary key. The fact table also contains 1 measure of time (datetime data type) and another measure of duration (int data type). 2000 product incidents are recorded each hour in a relational database. A typical work day is 8 hours and support is provided for every day in the year. What will be approximate size of this data warehouse in 5 years?

First calculate the approximate size of a row in bytes (int data type = 4 bytes, datetime data type = 8 bytes):

size of a row = size of all composite indexes (add the size of all indexes) + size of all measures (add the size of all measures).

Size of a row (bytes) = (4 * 7) + (8 + 4).

Size of a row (bytes) = 40 bytes.

Number of rows in fact table = (number of transactions per hour) * (8 hours) * (365 days in a year).

Number of rows in fact table = (2000 product incidents per hour) * (8 hours ) * (365 days in a year).

Number of rows in fact table = 2000 * 8 * 365

Number of rows in fact table = 5840000

Size of fact table (1 year) = (Number of rows in fact table) * (Size of a row)

Size of fact table (bytes per year) = 5840000 * 40

Size of fact table (bytes per year) = 233600000.

Size of fact table (megabytes per year) = 233600000 / (1024*1024)

Size of fact table (in megabytes for 5 years) = (23360000 * 5) / (1024 *1024)

Size of fact table (megabytes) = 1113.89 MB

Size of fact table (gigabytes) = 1113.89 / 1024

Size of fact table (gigabytes) = 1.089 GB

Download article

Note: Due to the size or complexity of this submission, the author has submitted it as a .zip file to shorten your download time. Afterdownloading it, you will need a program like Winzipto decompress it.

Virus note:All files are scanned once-a-day by Planet Source Code for viruses,but new viruses come out every day, so no prevention program can catch 100% of them.

FOR YOUR OWN SAFETY, PLEASE:
1)Re-scan downloaded files using your personal virus checker before using it.
2)NEVER, EVER run compiled files (.exe's, .ocx's, .dll's etc.)--only run source code.

If you don't have a virus scanner, you can get one at many places on the net including:McAfee.com

Other 4 submission(s) by this author

Report Bad Submission

Your Vote!

Other User Comments

There are no comments on this submission.

Add Your Feedback!

Note:Not only will your feedback be posted, but an email will be sent to the code's author in your name.

NOTICE: The author of this article has been kind enough to share it with you. If you have a criticism, please state it politely or it will be deleted.

For feedback not related to this particular article, please click here.