Expert Consultancy from Yellow Pelican

Stock Market Analysis Project - Get Stock Quotes

A site about Talend

Get Stock Quotes

This Job is part of the Stock Market Analysis Project series of articles. This Job provides core data for this project, and will be used to populate our Operational Data Store. All of the data that we load in to our Data Warehouse will pass through this Operational Data Store.

Download GetQuotes

You may download GetQuotes and import (Job Designs->Import items) in to a Talend Project of your choice. This download is supported by Talend 5.2.2 and above.

Download

GetQuotes

This Job will query Yahoo! Finance to fetch data on Stocks. This process is core to the maintenance of our Data Warehouse.

Job Configuration

This Job requires two configuration files to be created. If you haven't already done so, read the articles Project Configuration and Database Configuration now.

Job Design

The following sections describe the functionality of this Job.

Context Loading

Operation of this Job will be controlled through configuration parameters that are loaded from the files described in the section Job Configuration, above. These will assign the Context Variables that are described in the StockMarket Context Group.


All Jobs within this project will use a central configuration file and we will name this file when Context loading, as shown below. For more information on Context loading, read the article on the reusable library Job LibContextReader.



Archiving Previous Dataset

The data that that this Job retrieves from Yahoo! Finance is, initially, written to a CSV file. It is this data that is then processed, for loading to our MySQL Database.

The first task that our Job will perform is to archive any previously created file. Not only does this preserve our previous raw data, it also provides a simple indicator for determining if this Job has previously run. We can make this detection more sophisticated at a later time, should the need rise. As well as archiving the file, we will set the indicator globalMap.put("prevFileArchived", true); which will be placed on globalMap.



Establish MySQL Database Connection

Connection to our MySQL database will be established using the reusable library Job LibMySQLSharedConnection.



Full or Fast Refresh

In the initial release of this Job, we'll provide a simple mechanism for determining if we will perform a full or fast refresh. We also allow the Job to make a smart decision, based on whether or not this Job has previously executed (see Archiving Previous Dataset, above). As we build this application, we will look to make this functionality more sophisticated; however, for now, this basic functionality will suffice. The settings for the refresh type is determined by the parameter getQuotesFastRefresh=smart, defined in the file Default.StockMarket.cfg. Read the article Project Configuration, for more information.

Full Refresh

On a Full Refresh Candidate Ticker Symbol will be generated by the reusable library Job LibMakeTickerSymbols. Quotes will be requested for all of these Candidate Ticker Symbol.

Fast Refresh

On a Fast Refresh, Quotes will be requested for all Ticker Symbol that exist in the database table STOCK. This is the master table for holding details of Stocks, in our Operational Data Store and is maintained by this Job i.e. if there are entries in this table, this Job has been previously executed. If no rows are found in this table, then no quotations will be requested.

Auto Refresh

If Auto Refresh is set to true (the default), then Job will determine the refresh type based on the value returned by (Boolean) globalMap.get("prevFileArchived"). If this value is true, then it will be assumed that rows exist in the database table STOCK.

Buffering Symbols

As can be seen from the screenshot below, we've made our decision on whether we will generate candidate symbols or read symbols from our database table STOCK. We now generate or read our symbols and then buffer them using the component tBufferOutput. This is a convenient way for generating row data from either of our two decision branches, making the data available for later processing using the companion component tBufferInput.



Getting Stock Quotations

Stock Quotations are available from Yahoo! Finance using the API call http://finance.yahoo.com/d/quotes.csv?s=. When this call is made, up to 200 Stock Ticker Symbol may be requested within a single call, together with a format parameter, for example, &f=snab. An example call might look something like http://finance.yahoo.com/d/quotes.csv?s=TSCO.L,ARM.L&f=snab. You can try entering this URL in to a Web Browser to see the results.

Quotation Format

When requesting quotation from Yahoo! Finance, a format parameter &f= must be specified. This parameter allows you to specify the values that should be returned by the query.

FlagValueComment
sSymbolThe Stock Ticker Symbol
nNameThe Stock Name
aAskThe Ask Price
sBidThe Bin Price

We will build on the requested values in a later article.

Pivoting Symbols

Yahoo! Finance allows up to 200 symbols to be passed in a single request. For speed of request, we will pivot put input symbols using a the component tDenormalize to that we can minimise the number of requests that we make.

Storing Query Results

Query results are returned in CSV format and, for convenience, we will write these to an file. This is the same file that we archived at the start of this Job (see Archiving Previous Dataset).

Record Quotation Date & Time

Quotes from Yahoo! Finance are usually delayed by 20 minutes. In this first version of this Job, we'll record the date & time following our last request. We will use this value in combination with the Stock Ticker Symbol as the Primary Key to our database table QUOTE, indicting the approximate time that the data was valid. We will build on this in a later article.

Reviewing our Quotation Data

We should now have all of our Quotation data saved in the file context.outputDir + "getQuotes.csv". It is now worth taking the opportunity to see what we have.

In our simple example call http://finance.yahoo.com/d/quotes.csv?s=TSCO.L,ARM.L&f=snab, described in the section Getting Stock Quotations, above, we may expect to see a file with content similar to that shown below.

"TSCO.L","TESCO PLC",246.15,245.40
"ARM.L","ARM HOLDINGS",905.0001,901.3999

In testing, some anomalies have also been detected and these need dealing with in our data quality mapping rules.

MapQuoteData (tMap_3)

For some of our Stock Ticker Symbol data may not be available, especially if we are requesting quotations for candidate symbols. In these instances, we will immediately discard this data using the Expression Filter, as can be seen below.

QuoteData.symbolId != null &&
! "".equals(QuoteData.symbolId) &&
QuoteData.name != null &&
! "".equals(QuoteData.name) &&
! QuoteData.symbolId.equals(QuoteData.name)

For both Ask and Ask, we may see some anomolies in the data, for example, we may see the value N/A. For our purposes, we will simply may this value to 0 as can be seen from the example mapping shown below.

"N/A".equals(QuoteData.ask) ? new BigDecimal(0) : new BigDecimal(QuoteData.ask)

Loading Quotation Data

As can be seen from the screenshot below, our cleansed data can now be loaded in to our Operational Data Store database.


Review

We now have a Talend Job that will request quote data from Yahoo! Finance, writing the data to two MySQL tables, STOCK and QUOTE. These tables are part of our Operational Data Store.

Each tile we run this Job, it will update the table STOCK and write a new set of data to the table QUOTE.

You may query the two tables using a tool such as MySQL Workbench.

select * from STOCK where SYMBOL = 'AFN.L';

SYMBOLSTOCK_NAME
AFN.LADVFN

select * from QUOTE where SYMBOL = 'AFN.L';

SYMBOLQUOTE_TIMESTAMPASKBID
AFN.L2013-12-02 07:17:572.45302.4510

Scheduling our Job

Now that we've completed our Job and executed it from within Talend Studio, we will now look at our option for scheduling the Job.

The Stock prices available from Yahoo! Finance are delayed by 20 minutes and the London Stock Exchange (LSE) usually opens at 8 AM, closing at 5:30 PM. This provides us with a sensible scheduling option.

In this example, we are going to execute our Job using a Unix-style Shell Launcher. If you are not already familiar with scheduling Jobs in this way, read the articles Talend Job Deployment and Job Shell Launchers, now.

We will schedule our Job using the cron job scheduler.

A Brief Introduction to Cron

To schedule a program using cron, you need to create a crontab file. A crontab file is maintained using the program crontab. For further information on scheduling Talend Jobs using cron, read our article Scheduling Talend Jobs using cron.

crontab File

The following shows an example crontab file. This file is maintained using the command crontab -e.

# TalendByExample.com - Stock Market Analysis Project
21,41	8	*	*	1,2,3,4,5 $HOME/talend/bin/runTalendJob StockMarket GetQuotes
1,21,41	7,9,10,11,12,13,14,15,16	*	*	1,2,3,4,5 $HOME/talend/bin/runTalendJob StockMarket GetQuotes
1,21,51	16	*	*	1,2,3,4,5 $HOME/talend/bin/runTalendJob StockMarket GetQuotes

A Simple Job Control Script

When we exported our Job, Talend created a launch script, for example, GetQuotes_run.sh. In some cases, simply executing this script is sufficient for running our Job. Often, we want more control over the Job we are executing.

As can be seen from the crontab file, above, we are using a single Shell Script to execute and control our Jobs. This is a very basic script and we will enhance this as we work through this project. The current version of this script is shown below.

#!/bin/sh

if [ ${#} -ne 2 ]; then
        echo "runTalendJob: usage: project_folder job_name"
        exit 2
fi
echo "`date`: Starting Job ${1}.${2}" 1>> ${HOME}/talend/${1}/${2}.log
${HOME}/talend/${1}/Jobs/${2}/${2}_run.sh 1>> ${HOME}/talend/${1}/${2}.log 2>&1
JobExitStatus=${?}

if [ ${JobExitStatus} -eq 0 ]; then
        echo "`date`: Job ${1}.${2} ended successfully" 1>> ${HOME}/talend/${1}/${2}.log
else
        echo "`date`: Job ${1}.${2} ended in error" 1>> ${HOME}/talend/${1}/${2}.log
fi
exit ${JobExitStatus}

This script assumes that our exported Job has been extracted to the directory $HOME/talend/StockMarket/Jobs (where $HOME represents our Home directory). For more information on exporting Jobs, read the articles Talend Job Deployment and Job Shell Launchers, now.

Next Step

Now that we've got a couple of basic tables in our Operational Data Store, we can look at Creating our Data Warehouse.

Revision History

RevisionDescription
0.1Initial Release, as documented in this article.
0.2Added exception handler to tHTTPRequest calls, allowing retry on failure (max attmepts 3).




Expert Consultancy from Yellow Pelican
comments powered by Disqus

© www.TalendByExample.com