Expert Consultancy from Yellow Pelican

Reusable Context Load Job

This tutorial explains how to create a reusable Job for loading Context. You may call this reusable Job from any of your other Jobs for simple Context loading.

Download LibContextReader

You may download LibContextReader and import it into a Talend Project of your choice. This tutorial supports Talend 5.2.0 and above.

In our tutorial on Loading Context from a File we looked at how Context could be read and loaded from a file, employing all of the error handling that we would expect from a well-designed Job.

That tutorial gives us an understanding of the process, and loading Context from a file eases the maintenance of our Jobs. We now want to take this a step further and build a reusable Job that any other Job can call to load its Context. This lets us load Context in a reliable and consistent way, with the minimum of programming in each of our task-specific Jobs.

You may like to take the tutorial on Loading Context from a File first, as that tutorial documents the process in a little more detail; however, everything you need to know is contained within this tutorial.

Job Design

The following screenshot shows our new Job that will read a Context file and present the results back to a Parent Job.

Talend Reusable Context Load Job Image 1

It's a pretty simple Job that performs the following tasks:

  • Initialisation
  • Check that a Context File Exists
  • Die if No Context File
  • Read Context File
  • Write Context to a Buffer

Create Job LibContextReader

As this is going to be a reusable Job, we'll add it to our reusable Job library. In the Talend Design Repository, create a new folder under Job Designs, named Lib (or a folder with a name of your choice). We'll create our new Job in this folder, and name it LibContextReader.

Talend Reusable Context Load Job Image 2

Add Context Variables

This Job uses the following Context Variables. Each of these will be given a default value by InitJob if a value is not passed from the parent Job.

  • contextDir String
  • contextName String
  • contextStr String

Add these context variables now.

The Context Variable contextDir is used to identify the location of your Context files. If a value is not passed from the parent Job, then the value will be set to $HOME/talend/context (where $HOME represents the Home Directory of the user running the Job).

The Context Variable contextName is used to identify the name of your context file. If a value is not passed from the parent Job, then the value will be set to General.

The Context Variable contextStr is used to identify the Context string of the parent Job, for example, Default, Test or Production (you may choose to only use a Default Context). If a value is not passed from the parent Job, then the value will be set to the Context string of LibContextReader itself, which is Default.

With no parameters being passed, these Context Variables will be used to identify a Context file located at $HOME/talend/context/Default.General.cfg.

Add Components

Add the Components listed below, as shown in the Job Design screenshot.

  • Initialisation (InitJob tJava)
  • Check that Context File Exists (CheckContextFileExists tFileExist)
  • Die if No Context File (DieOnNotContextFileExists tDie)
  • Read Context File (ReadContext tFileInputDelimited)
  • Write Context to a Buffer (BufferContext tBufferOutput)

Connect Components

Connect these Components, as shown in the Job Design screenshot above.

  • InitJob->CheckContextFileExists: Trigger->On Subjob Ok
  • CheckContextFileExists->DieOnNotContextFileExists: Trigger->Run if (rename to NotContextFileExists)
  • CheckContextFileExists->ReadContext: Trigger->Run if (rename to ContextFileExists)
  • ReadContext->BufferContext: Row->Main (rename to Context)

Configure Components

You can now configure the Components of your Job, as shown below.

InitJob (tJava)

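// Context Variables
// Apply a default to each Context Variable that the parent Job did not pass.
// A value counts as "not passed" if it is null, the string "null" or empty.

// contextDir: defaults to the talend/context directory under the user's home
if(context.contextDir == null ||
   "null".equals(context.contextDir) ||
   "".equals(context.contextDir)) {
    context.contextDir = System.getProperty("user.home") + "/talend/context";
}

// contextStr: defaults to this Job's own Context string (contextStr, without the context. prefix)
if(context.contextStr == null ||
   "null".equals(context.contextStr) ||
   "".equals(context.contextStr)) {
    context.contextStr = contextStr;
}

// contextName: defaults to General
if(context.contextName == null ||
   "null".equals(context.contextName) ||
   "".equals(context.contextName)) {
    context.contextName = "General";
}

// Build the full path of the Context file and store it for the Components that follow
globalMap.put("contextFile", context.contextDir + "/" + context.contextStr + "." + context.contextName + ".cfg");

System.out.println(jobName + ": context will be loaded from " + (String) globalMap.get("contextFile"));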
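
If you'd like to try the defaulting rules outside of Talend, the following is a minimal standalone sketch of the same logic. The class ContextFileLocator, its method names and the jobContext parameter (which stands in for the Job's own Context string) are hypothetical and are not part of Talend.

```java
/** Standalone sketch of the InitJob defaulting rules; not Talend-generated code. */
final class ContextFileLocator {

    /** Treats null, the literal string "null" and the empty string as "not passed". */
    static boolean isUnset(String value) {
        return value == null || "null".equals(value) || value.isEmpty();
    }

    /** Returns value unless it is unset, in which case fallback is returned. */
    static String orDefault(String value, String fallback) {
        return isUnset(value) ? fallback : value;
    }

    /**
     * Builds contextDir/contextStr.contextName.cfg.
     * jobContext is a hypothetical stand-in for the Job's own Context string.
     */
    static String locate(String contextDir, String contextStr, String contextName, String jobContext) {
        String dir = orDefault(contextDir, System.getProperty("user.home") + "/talend/context");
        String str = orDefault(contextStr, jobContext);
        String name = orDefault(contextName, "General");
        return dir + "/" + str + "." + name + ".cfg";
    }

    public static void main(String[] args) {
        // Nothing passed: resolves to Default.General.cfg under the user's home directory
        System.out.println(locate(null, "", "null", "Default"));
        // Everything passed: prints /etc/talend/Test.Payroll.cfg
        System.out.println(locate("/etc/talend", "Test", "Payroll", "Default"));
    }
}
```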

CheckContextFileExists (tFileExist)

Set File name/Stream to (String) globalMap.get("contextFile").

If (NotContextFileExists) (order:1) (Trigger->Run if)

Set Condition to ! (Boolean) globalMap.get("tFileExist_1_EXISTS").

DieOnNotContextFileExists (tDie)

Set Die message to jobName + ": cannot open file " + (String) globalMap.get("contextFile").

If (ContextFileExists) (order:2) (Trigger->Run if)

Set Condition to (Boolean) globalMap.get("tFileExist_1_EXISTS").

ReadContext (tFileInputDelimited)

Set File name/Stream to (String) globalMap.get("contextFile").
Set Field Separator to "=".
Check the option CSV Options (this allows us to have the = character in our field values, by quoting values).

Define the Component's Schema as shown below. Select Yes when asked to propagate the Schema. This will copy this new Schema to BufferContext.

Talend Reusable Context Load Job Image 3
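
To make the flow of the Job easier to picture, here is a rough plain-Java sketch of what the Components do between them: check that the file exists, fail if it does not, and otherwise collect one key/value row per line. It assumes a two-column key and value Schema, and it splits each line at the first = only, so quoted values are not unwrapped. The class ContextReaderSketch and its methods are hypothetical; this is not the code that Talend generates.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of the LibContextReader flow; not Talend-generated code. */
final class ContextReaderSketch {

    /** Turns lines into {key, value} rows, splitting at the first '=' only. */
    static List<String[]> toRows(List<String> lines) {
        List<String[]> buffer = new ArrayList<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // assumption: lines without a separator are skipped
            }
            buffer.add(new String[] { line.substring(0, eq), line.substring(eq + 1) });
        }
        return buffer;
    }

    /** Fails if the file is missing, otherwise returns its rows. */
    static List<String[]> read(Path contextFile) {
        if (!Files.exists(contextFile)) {
            // Stands in for DieOnNotContextFileExists (tDie)
            throw new IllegalStateException("cannot open file " + contextFile);
        }
        try {
            return toRows(Files.readAllLines(contextFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("Default.General", ".cfg");
        Files.write(file, Arrays.asList("myString=Hello World!", "myNumber=12345", "myBoolean=true"));
        for (String[] row : read(file)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
        Files.delete(file);
    }
}
```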

Run Your Job

You should now be able to run this Job. Assuming that you have not already created a Context file, you should see output similar to that shown below.

Starting job LibContextReader at 18:49 30/05/2013.
[statistics] connecting to socket on port 3498
[statistics] connected
LibContextReader: cannot open file /Users/talend_user/talend/context/Default.General.cfg
LibContextReader: context will be loaded from /Users/talend_user/talend/context/Default.General.cfg
[statistics] disconnected
Job LibContextReader ended at 18:49 30/05/2013. [exit code=4]

Now create a Context file, for example, /Users/talend_user/talend/context/Default.General.cfg and add some Context Variables. It doesn't matter what you add, provided that it is in the correct format.

myString=Hello World!
myNumber=12345
myBoolean=true

Now run your Job again, and see that the Context file is now found and processed. You should see that no errors are returned and the correct number of rows flow through your Job.

Starting job LibContextReader at 18:56 30/05/2013.
[statistics] connecting to socket on port 3519
[statistics] connected
LibContextReader: context will be loaded from /Users/talend_user/talend/context/Default.General.cfg
[statistics] disconnected
Job LibContextReader ended at 18:56 30/05/2013. [exit code=0]

Talend Reusable Context Load Job Image 4

Calling your new Reusable Job

Now that you've completed your reusable Job, we can look at how it's called from another Job.

You'll now use a fictional MySQL Payroll database to demonstrate how you use your new Context loader. This Job will simply assign some parameters to a tMySQLConnection Component.

Create a new Job named MySQLContextTest.

Talend Reusable Context Load Job Image 5

Add Components

Add the Components listed below, as shown in the Job Design screenshot above.

  • Your new reusable Job (ReadPayrollContext tRunJob(LibContextReader))
  • A Context Loader (LoadPayrollContext tContextLoad)
  • A MySQL Connection (PayrollConnection tMySQLConnection)

Connect Components

Connect these Components, as shown in the Job Design screenshot above.

  • ReadPayrollContext->LoadPayrollContext: Row->Main (rename to PayrollContext)
  • ReadPayrollContext->PayrollConnection: Trigger->On Subjob Ok

Add a Context Group and File

If you're likely to use a Context Variable in more than one Job, it is good practice to create a Context Group rather than adding individual Context Variables to your Job.

Select the Component PayrollConnection (tMySQLConnection), and see which parameters can be passed at runtime.

Talend Reusable Context Load Job Image 6

  • Host
  • Port
  • Database
  • Additional JDBC Parameters
  • Username
  • Password

We'll now create a Context Group named Payroll and an accompanying Context file named $HOME/talend/context/Default.Payroll.cfg.

payrollHost=MySQLServer1
payrollPort=3306
payrollDatabase=payroll_db
payrollAdditionalJDBCParameters="noDatetimeStringSync=true"
payrollUsername=talend
payrollPassword=pa$$word

Note the quotation marks around "noDatetimeStringSync=true". They are needed because the value contains the Field Separator character =.
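
To see why the quotation marks matter, here is a small sketch of a quote-aware splitter that uses = as the field separator. It is a simplified illustration and not Talend's CSV parser (for example, it does not handle escaped quotation marks); the class name ContextLineSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified quote-aware field splitter; not Talend's CSV implementation. */
final class ContextLineSplitter {

    /** Splits a line at every '=' that is outside double quotes and drops the quotes. */
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes; // the enclosure character itself is not kept
            } else if (c == '=' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Quoted value: two fields
        // prints [payrollAdditionalJDBCParameters, noDatetimeStringSync=true]
        System.out.println(splitFields("payrollAdditionalJDBCParameters=\"noDatetimeStringSync=true\""));
        // Unquoted value: the value is broken into two separate fields
        // prints [payrollAdditionalJDBCParameters, noDatetimeStringSync, true]
        System.out.println(splitFields("payrollAdditionalJDBCParameters=noDatetimeStringSync=true"));
    }
}
```

With a two-column Schema, the unquoted form would leave only noDatetimeStringSync in the value column of this sketch, which is why the value is quoted in the Context file.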

When you've created your Context Group, as shown in the screenshot below, add it to your Job by dragging it to the Job's Contexts (Job MySQLContextTest 0.1) tab.

Talend Reusable Context Load Job Image 7

Configure Components

You can now configure the Components of your Job, as shown below.

PayrollConnection (tMySQLConnection)

Now that you've added the Payroll Context Group to your Job, you can configure PayrollConnection (tMySQLConnection_1) so that it uses your new Context Variables, as shown in the screenshot below.

Talend Reusable Context Load Job Image 8
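
As a rough illustration of how these Context Variables end up being used, the sketch below combines them into a MySQL JDBC URL of the usual form jdbc:mysql://host:port/database?parameters. The class PayrollJdbcUrl is hypothetical, and the code that Talend actually generates for the connection Component may differ.

```java
/** Sketch of combining connection settings into a MySQL JDBC URL; not Talend-generated code. */
final class PayrollJdbcUrl {

    /** Builds jdbc:mysql://host:port/database, appending ?additionalParams when present. */
    static String build(String host, String port, String database, String additionalParams) {
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (additionalParams != null && !additionalParams.isEmpty()) {
            url += "?" + additionalParams;
        }
        return url;
    }

    public static void main(String[] args) {
        // Values taken from the example Default.Payroll.cfg file
        // prints jdbc:mysql://MySQLServer1:3306/payroll_db?noDatetimeStringSync=true
        System.out.println(build("MySQLServer1", "3306", "payroll_db", "noDatetimeStringSync=true"));
    }
}
```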

ReadPayrollContext (tRunJob)

You need to make two changes to this Component. First of all, you need to tell it to use the Schema that you defined as the output for the Job LibContextReader, as this defines the data that will be sent to LoadPayrollContext. To do this, simply click the button Copy Child Job Schema, which can be found on the Component tab. The copied Schema will be displayed and you can then click OK.

Secondly, you've created a new Context file $HOME/talend/context/Default.Payroll.cfg rather than using your previously created $HOME/talend/context/Default.General.cfg Context file. You now need to tell LibContextReader to read this new file. This is achieved by passing the name as a parameter, as shown in the screenshot below.

Talend Reusable Context Load Job Image 9

LoadPayrollContext (tContextLoad)

For debugging purposes, check the option Print operations. This will display the actions of this Component. You may choose to un-check this option when your Job is complete, as this option writes information to Talend log files and you may not want to be writing out sensitive passwords. It is always recommended that you restrict the permissions of Context files that contain sensitive information.
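
If you'd like to keep some debugging output without exposing passwords, one option is to mask sensitive values before printing them. The sketch below illustrates the idea only: the class ContextLoadSketch is hypothetical, the rule of masking any key containing "password" is an assumption, and this is not a feature of tContextLoad.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of applying key/value rows to a context map with masked printing; not tContextLoad itself. */
final class ContextLoadSketch {

    /** Describes an assignment, hiding the value when the key looks sensitive (assumed rule). */
    static String describe(String key, String value) {
        boolean sensitive = key.toLowerCase(Locale.ROOT).contains("password");
        return key + " set to " + (sensitive ? "****" : value);
    }

    /** Applies each {key, value} row to a map, optionally printing what was set. */
    static Map<String, String> load(String[][] rows, boolean print) {
        Map<String, String> context = new LinkedHashMap<>();
        for (String[] row : rows) {
            context.put(row[0], row[1]);
            if (print) {
                System.out.println(describe(row[0], row[1]));
            }
        }
        return context;
    }

    public static void main(String[] args) {
        String[][] rows = {
            { "payrollHost", "MySQLServer1" },
            { "payrollUsername", "talend" },
            { "payrollPassword", "pa$$word" }
        };
        // prints the host and username values, and "payrollPassword set to ****"
        load(rows, true);
    }
}
```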

Run Your Job

You can now run your new Job. We expect that it will fail as you will not have a MySQL database to connect to; however, as can be seen from the screenshot below, you will see the assignment of your Context Variables.

Talend Reusable Context Load Job Image 10

Conclusion

This tutorial has shown you how to build a reusable Job (LibContextReader) that, with some minor configuration, can be called from any Job to assist it in loading its Context. This reusable Job has the flexibility to allow you to use multiple Context Files and to run your Jobs with multiple Contexts (e.g. Default, Test, Production). The choice is yours. This is a first step in building a Library of reusable Talend Jobs. You may also want to consider using the example Job MySQLContextTest as a Template. I usually have a folder named Job Designs->Templates for storing pre-configured Jobs and Components.

If you have any feedback or suggestions for this tutorial, please feel free to comment.





© www.TalendByExample.com