Expert Consultancy from Yellow Pelican

Stock Market Analysis Project - Create Data Warehouse

A site about Talend

Create Data Warehouse

This Job is part of the Stock Market Analysis Project series of articles, and is intended to demonstrate the creation and maintenance of a real Data Warehouse, using Talend. Generally speaking, my preference for maintaining the database would be outside of Talend; however, it is a useful exercise to have Talend initially create the database structures from the available data.

In this initial release, we will create both a Date and Time Dimension, using reusable library Jobs; which are described in the Data Warehouse Cookbook.

Download CreateStockMarketDW

You may download CreateStockMarketDW and import (Job Designs->Import items) in to a Talend Project of your choice. This download is supported by Talend 5.2.2 and above.

Download

CreateStockMarketDW

This Job creates both a Date and Time Dimension in a MySQLDatabase. As our Data Warehouse expands, we will build on this.

This Job makes use of the reusable library Job LibContextReader, to provide connection information to a MySQL database. Before running this Job, you will need to install MySQL and create a database.

Job Configuration

This Job requires two configuration files to be created. If you haven't already done so, read the articles Project Configuration and Database Configuration now.

Job Design

The following screenshot shows the design of our Job. It is a fairly simple Job that: -

  • Establishes a connection to MySQL.
  • Executes LibMakeDimDate.
  • Creates MySQL table DIM_DATE and loads data.
  • Executes LibMakeDimTime.
  • Creates MySQL table DIM_TIME and loads data.

Image 1

Init (tJava)

This is simply a place-holder and records the start of the Job.

LibMySQLSharedConnection (tRunJob)

This library Job establishes a shared connection to the specified MySQL database. In the case of this Job, the default configuration file will be used, as described above.

MySQLConnection (tMySqlConnection)

Regerences the shared MySQL database connection name "MySQLSharedConnection", providing connectivity for the MySQL components used within this Job. Not that "MySQLSharedConnection" is the default name, created by LibMySQLSharedConnection.

Note that Auto Commit has been set to true (tRunJob(tMySqlConnection)->Advanced settings).

InitDimDate (tJava)

This is simply a place-holder and records the start of the Date Dimension Subjob.

LibMakeDimDate (tRunJob)

Calls the reusable library Job LibMakeDimDate, to generate the data required to populate DIM_DATE.

To make the generated data available, select Copy Child Job Schema (tRunJob(LibMakeDimDate)->Basic settings).

MapDimDate (tMap)

Maps the input to output data, without modification. This component acts as a place-holder an is where generated atributes may be dropped, modified or added, as required.

DIM_DATE (tMySqlOutput)

Creates the database table DIM_DATE and populates it with the generated data.

  • Set Use an existing connection to true
  • Set Action on table to Drop table if exists and create

These parameters may be found on tMySqlOutput->Basic settings.

DIM_TIME

The components used for the population of DIM_TIME behave in a similar manner to those used in the population of DIM_DATE.

DeInit (tJava)

This is simply a place-holder and records the end of the Job.

Conclusion

We have now created the first two core tables for our Data Warehouse. We can test these using a MySQL query tool, such as MySQL Workbench.

select THE_DATE, DAY_OF_WEEK, MONTH, YEAR
from DIM_DATE
where IS_TODAY = true;

THE_DATEDAY_OF_WEEKMONTHYEAR
2013-11-14 00:00:00ThursdayNovember2013




Expert Consultancy from Yellow Pelican
comments powered by Disqus

© www.TalendByExample.com