Expert Consultancy from Yellow Pelican

My First Talend Job

A site about Talend

Start Talend Open Studio for Data Integration

In this tutorial, you'll learn how to create a simple Talend Job. To do this, you're going to create a Job that reads data from a delimited file, transforms it and then writes the result to a new delimited file.

Creating a Talend Job involves four steps: create the Job, add Components from the Component Palette, configure and connect the Components, and then run the Job.

Create an Input Data File

To complete this tutorial, you will need a data file to act as the input to your Job. Create a new directory to hold your example files, then use a Text Editor to create a file named Employee.csv and copy and paste the lines below into it.

Id;Name;Department;StartDate;Salary
1;John Smith;Sales;01-Mar-2012;20000
2;Jane Roberts;Sales;05-Jun-2010;22500
3;Paul Jones;Marketing;22-Jun-2010;30000
4;Mark Jacobs;Sales;05-Apr-2010;22000
5;Mary Stephenson;IT;13-Sep-2009;22500
6;Steve Roberts;IT;14-Dec-2011;18000
7;Margaret Johnson;CEO;30-Feb-2011;41000
8;Paul Harrison;Sales;07-Nov-2009;19500
9;Peter Pedlar;Admin;22-Dec-2012;21000
10;Douglas Peterson;Admin;28-May-2011;21500

Create a new Job

To create a new Job, right-click Job Designs in the Repository pane and then select Create Job from the context menu. This opens the New Job dialog. For now, simply enter a Job name (this tutorial uses MyFirstTalendJob) and then select Finish. It's good practice to complete Purpose and Description, but we'll talk about completing the remaining settings in a later example.

My First Talend Job Image 1

Your new Job will now open as a blank canvas in the design window. Note: the Talend design tool is based on Eclipse. If you'd like to know more about Eclipse, visit the Eclipse website, where you'll find information and tutorials that will help you to use and understand both Eclipse and the Talend design tool.

Add Components

You'll now add three components to your Job by selecting them from the Component Palette. In the screenshot below, we've searched for the tFileInputDelimited component. You can also browse through the component groups, which is a great way to see what's available in Talend. To add this component to your Job, simply drag it to the design window. When you've done this, also add tMap and tFileOutputDelimited components to your Job.

My First Talend Job Image 2

Configure and Connect Components

Now that you've added three components, your Job should look like the one below. You can now start configuring and connecting your components.

  • The asterisk next to the Job name indicates that the Job has not been saved.
  • You can save your Job at any time. It is saved automatically when you run it.
  • You can see both a Design and a Code tab.
  • Talend generates Java code, which is what you'll see in the Code tab.
  • You can't modify the Java code, but it's an invaluable resource for debugging and understanding your Job.

My First Talend Job Image 3

Change Component Labels

You'll see that Talend has automatically named your components using the component type, followed by an underscore and a sequential number (for example, tFileInputDelimited_1). As well as being the component's name, this is also the label that you see on screen. I'd strongly recommend that you always change the labels to something more meaningful: a key aspect of any Talend Job is that it is readable and well documented.

To change a component label, click the label to select it and then click a second time to enter edit mode (note that this is not a double-click, so take your time!).

Change your component labels to "EmployeeInputFile", "MapEmployee" and "EmployeeOutputFile". Note that you are only changing the labels. The Components will still be referred to in your Job by the original names that were issued by Talend. Component labels do not need to be unique within your Job.

At the bottom of the design window, you'll see a series of tabs for Contexts, Component, Run and Problems. By default, the Problems tab is selected, and you'll see one error: Talend reports it because tFileInputDelimited has no output connector, while the other components only report warnings about their missing connectors. You'll also notice small error and warning icons next to the components. Hover your mouse over these to see the corresponding message.

Configure EmployeeInputFile

Select the EmployeeInputFile (tFileInputDelimited_1) component and then select the Component tab, as shown in the screenshot below. Note that we've referred to this component by both its Label and its Component Name, as displayed in the Component tab. From now on, we'll simply refer to the Label, but it's important to remember how the Component is referred to programmatically.

My First Talend Job Image 4

There are numerous settings that you can make here, but for now we'll only review File name/Stream and Header.

In the screenshot above, File name/Stream is set to its default value: a file named in.csv, located in the workspace directory. The workspace is the directory where Talend holds your projects and source code, and is probably not the best place for your data files. Enter the full path to the Employee.csv file that you created earlier.

Your input file has a header row, so you will need to skip it. Talend lets you do this by specifying the number of header rows in the Header field. Set this to a value of 1.
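To make the File name/Stream and Header settings more concrete, here is a minimal plain-Java sketch of what reading a delimited file with one header row involves. It is not the code Talend generates; it assumes the field separator is a semicolon, matching Employee.csv, and the class and method names (DelimitedInputSketch, readRows) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedInputSketch {

    // Splits each line on the field separator, skipping the first `headerRows` lines.
    // Pattern.quote treats the separator literally (";" is safe anyway, but "|" would not be).
    public static List<String[]> readRows(List<String> lines, String separator, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        for (int i = headerRows; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines, e.g. after a trailing newline
            }
            // limit -1 keeps trailing empty fields instead of silently dropping them
            rows.add(line.split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Id;Name;Department;StartDate;Salary",
                "1;John Smith;Sales;01-Mar-2012;20000",
                "2;Jane Roberts;Sales;05-Jun-2010;22500");
        // Header = 1: the first line holds column names, not data
        for (String[] row : readRows(lines, ";", 1)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

To read the real file rather than the in-memory sample, you could pass in `Files.readAllLines(Paths.get(path))`, where `path` is the full path you entered in File name/Stream.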

Your Job should now look something like the screenshot below.

My First Talend Job Image 5

Define Schema for EmployeeInputFile

In a later tutorial, you'll be looking at the Talend Repository where you can define files and use Talend's file sampling to simplify the definition of Files (and other data sources). For now, you'll do things manually so that you get a better understanding of defining a Schema.

Select EmployeeInputFile and then select the Component tab. Select the ellipsis (...) to the right of Edit schema and complete the dialog, as shown below.

A schema defines the structure of your data and how that data moves around your Job.

  • Use the add button to add a Column.
  • Use the delete button to delete a Column.
  • Set the Column name, Type, Length and Precision.
  • For Id, check Key and un-check Nullable.
  • For StartDate, set the Date Pattern. You can press Ctrl+Space in the Date Pattern field to see a selection of values, or simply enter a valid pattern.
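One way to picture the schema you've just defined is as an ordered list of column definitions, as in the hypothetical Java sketch below. The Salary type (Float) and the Date Pattern (dd-MMM-yyyy) are assumptions, since the dialog screenshot isn't reproduced here; the pattern is simply one that matches the dates in Employee.csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmployeeSchemaSketch {

    // One column definition, mirroring the fields of the Edit schema dialog.
    public static final class Column {
        final String name;
        final String type;
        final boolean key;
        final boolean nullable;
        final String datePattern; // only meaningful for Date columns

        Column(String name, String type, boolean key, boolean nullable, String datePattern) {
            this.name = name;
            this.type = type;
            this.key = key;
            this.nullable = nullable;
            this.datePattern = datePattern;
        }
    }

    // The EmployeeInput schema; Salary's type and the date pattern are assumptions.
    public static final List<Column> EMPLOYEE_INPUT = Arrays.asList(
            new Column("Id", "int", true, false, null),
            new Column("Name", "String", false, true, null),
            new Column("Department", "String", false, true, null),
            new Column("StartDate", "Date", false, true, "dd-MMM-yyyy"),
            new Column("Salary", "Float", false, true, null));

    // Column names in schema order, joined like the header row of Employee.csv.
    public static String headerLine(List<Column> schema, String separator) {
        List<String> names = new ArrayList<>();
        for (Column c : schema) {
            names.add(c.name);
        }
        return String.join(separator, names);
    }

    public static void main(String[] args) {
        System.out.println(headerLine(EMPLOYEE_INPUT, ";"));
        for (Column c : EMPLOYEE_INPUT) {
            System.out.println(c.name + ": " + c.type
                    + (c.key ? " (key)" : "")
                    + (c.nullable ? "" : " NOT NULL"));
        }
    }
}
```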

Note: you may have noticed that, when you selected the Type for the column Id, the option in the drop-down list was int | Integer. Try toggling the Nullable check-box and you'll see that Type swaps between int and Integer. Remember that the code generated by Talend is Java: by changing the Nullable setting, you cause Talend to swap the type between the primitive int and the class Integer, because a primitive type can't represent null.
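The following minimal sketch illustrates the difference described in the note, using only standard Java; the method names are hypothetical. A nullable column can hold null in an Integer, whereas a non-nullable int column always needs a value.

```java
public class NullableTypesSketch {

    // Nullable column -> wrapper class Integer: null is a legal value.
    public static Integer nullableId(String raw) {
        return (raw == null || raw.isEmpty()) ? null : Integer.valueOf(raw);
    }

    // Non-nullable column -> primitive int: there is no way to store "no value".
    public static int requiredId(String raw) {
        return Integer.parseInt(raw); // throws NumberFormatException for "" or null
    }

    public static void main(String[] args) {
        Integer missing = nullableId("");
        System.out.println(missing);          // null
        System.out.println(requiredId("7"));  // 7

        try {
            int id = missing;                 // auto-unboxing calls missing.intValue()
            System.out.println(id);
        } catch (NullPointerException e) {
            System.out.println("cannot store null in an int");
        }
    }
}
```

Unboxing a null Integer into an int, as in the try block, is a common cause of NullPointerException in Java code, including expressions you write yourself.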

When you've completed this dialog, press OK.

My First Talend Job Image 7

Connect EmployeeInputFile to MapEmployee

You're now ready to connect your first two components. Right-click EmployeeInputFile, select Row and then select Main. You will now see a line drawn from the selected component to the cursor. Move the cursor to the MapEmployee component and click it. As you move, the cursor changes to a small plug symbol, and shows a stop symbol when it is over an area or component that you can't connect to.

My First Talend Job Image 6

Now that you've connected your first two components, your Job should look like the screenshot below. Take a moment to look at what has happened to your Job.

  • The small red cross by EmployeeInputFile has disappeared, because the component now has an output connector.
  • Talend has created a row connection named row1 (Main).
  • A pale background colour now groups EmployeeInputFile and MapEmployee together. This is because they have formed a SubJob.

My First Talend Job Image 8

Configure and Connect EmployeeOutputFile

You can now configure the EmployeeOutputFile component in a similar way to EmployeeInputFile. Select its Component tab and set File name/Stream to the full path of MappedEmployee.csv, in the same directory that you used for Employee.csv. Then connect MapEmployee's output to EmployeeOutputFile. When you make this connection, Talend will ask you to name the output. Call it MappedEmployee.

My First Talend Job Image 9

Naming Outputs

You'll see that all of your components are now connected and that they all belong to the same SubJob. You have two outputs, named row1 (Main) and MappedEmployee (Main). Talend forces you to name the output of tMap (because tMap allows you to specify multiple outputs), but it doesn't do this for all components. It is good practice to give all of your outputs meaningful names.

In the same way that you changed your component labels, change the name of the output row1 to EmployeeInput.

When you changed the names of your components earlier, you were only changing their labels; their names remained the same. Output connectors behave differently: you are not only changing the label, you're also changing the name that you will use to refer to the connection programmatically. This name must be unique within your Job.
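It can help to think of each connection name as a Java variable. The sketch below is a heavily simplified, hypothetical illustration (Talend's generated code is considerably more involved, and these row classes are invented for this example), but the way it reads EmployeeInput.Name mirrors the expression syntax you'll use in tMap later in this tutorial. The capitalised variable names are deliberate, to match the connection names.

```java
public class ConnectionNamesSketch {

    // Hypothetical row types, one per connection schema.
    public static class EmployeeInputRow {
        public int Id;
        public String Name;
    }

    public static class MappedEmployeeRow {
        public int Id;
        public String Name;
    }

    // Mimics a tMap that copies input columns straight to output columns.
    public static MappedEmployeeRow map(EmployeeInputRow EmployeeInput) {
        MappedEmployeeRow MappedEmployee = new MappedEmployeeRow();
        MappedEmployee.Id = EmployeeInput.Id;
        MappedEmployee.Name = EmployeeInput.Name;
        // Declaring a second variable named EmployeeInput or MappedEmployee in this
        // method would not compile, just as two connections can't share a name.
        return MappedEmployee;
    }

    public static void main(String[] args) {
        EmployeeInputRow EmployeeInput = new EmployeeInputRow();
        EmployeeInput.Id = 1;
        EmployeeInput.Name = "John Smith";
        MappedEmployeeRow out = map(EmployeeInput);
        System.out.println(out.Id + ";" + out.Name); // 1;John Smith
    }
}
```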

You should now have a Job that looks like the screenshot below. Notice that the EmployeeInput output remains selected and that you can see the schema from tFileInputDelimited_1 (EmployeeInputFile).

My First Talend Job Image 10

Configure MapEmployee

You can now configure the MapEmployee component. Select its Component tab and then select the ellipsis (...) to the right of Map Editor. Alternatively, you can double-click the component to open the Map Editor.

For the purposes of this tutorial, you are going to make the structure of your output file identical to that of your input file, and use Talend to propagate the EmployeeInputFile schema throughout the Job. You'll remember that, when you configured EmployeeOutputFile, you did not edit its schema.

My First Talend Job Image 11

Talend allows you to copy and paste columns from one schema to another. Select all of the columns in the EmployeeInput schema and use the Copy and Paste buttons to copy them into the MappedEmployee schema. You can multi-select columns by clicking in the empty left-most column of the schema and using the usual multi-select features of your mouse or cursor keys.

Once you have copied the input schema to the output schema, press Auto map!, which is located above the MappedEmployee Expression editor. Auto map! automatically fills in each empty Expression where an output Column name matches an input Column name.
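The matching rule described above can be modeled in a few lines of Java. This is a hypothetical sketch of that rule only, not Talend's implementation (which may offer further matching options): an empty output Expression becomes EmployeeInput.&lt;column&gt; when an input column has exactly the same name, and any Expression that is already set is left alone.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AutoMapSketch {

    // Returns a copy of outputExpressions in which every empty Expression whose
    // column name exactly matches an input column becomes "<inputRow>.<column>".
    public static Map<String, String> autoMap(String inputRow, List<String> inputColumns,
                                              Map<String, String> outputExpressions) {
        Map<String, String> result = new LinkedHashMap<>(outputExpressions);
        for (Map.Entry<String, String> entry : result.entrySet()) {
            String expression = entry.getValue();
            boolean empty = expression == null || expression.trim().isEmpty();
            if (empty && inputColumns.contains(entry.getKey())) {
                entry.setValue(inputRow + "." + entry.getKey());
            }
            // Non-empty Expressions, and columns with no matching input, stay unchanged.
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Id", "Name", "Department", "StartDate", "Salary");
        Map<String, String> output = new LinkedHashMap<>();
        for (String column : input) {
            output.put(column, ""); // pasted columns start with empty Expressions
        }
        output.put("Bonus", "");    // hypothetical extra column with no matching input
        System.out.println(autoMap("EmployeeInput", input, output));
    }
}
```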

Mapping Expressions are how you transform your data, and you'll try an example of this later in this tutorial. For now, your Job will simply move data from input to output without any transformation.

Your Map Editor should now look the same as the screenshot below.

My First Talend Job Image 13

Now that you've completed the Map Editor, press the Ok button to save your changes. Talend will ask whether you want to propagate your changes. You haven't yet defined a schema for EmployeeOutputFile, so press the Yes button. Talend will now copy the output schema from MapEmployee to EmployeeOutputFile.

My First Talend Job Image 14

Run Your Job

To run your Job, select the Run (Job MyFirstTalendJob) tab and press the Run button.

Hopefully, you'll receive no errors and the results will look like the screenshot below. If you do receive errors, review this tutorial to identify the problem. Errors may be either compile-time or runtime errors.

My First Talend Job Image 15

You can see from the screenshot above that Talend shows the number of rows that pass through each connector, together with the throughput (rows per second). You can also see the Job's output shown in the Run tab. This window shows you output written to System.out and System.err.

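You can view the output file MappedEmployee.csv in a Text Editor. You should observe that it contains the same rows as Employee.csv, without the header row. Some values may be reformatted according to your schema; for example, the listing at the end of this tutorial shows Salary written with two decimal places.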
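For comparison, the sketch below shows the writing side in plain Java: each row's fields are joined with the separator and written one line per row, with no header line. The class and method names are hypothetical, the semicolon separator is an assumption based on the sample files, and this is not the code Talend generates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DelimitedOutputSketch {

    // Joins each row's fields with the separator and writes one line per row.
    // No header line is written.
    public static void writeRows(Path file, List<String[]> rows, String separator) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] row : rows) {
            lines.add(String.join(separator, row));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for MappedEmployee.csv
        Path out = Files.createTempFile("MappedEmployee", ".csv");
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "John Smith", "Sales", "01-Mar-2012", "20000"},
                new String[] {"2", "Jane Roberts", "Sales", "05-Jun-2010", "22500"});
        writeRows(out, rows, ";");
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```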

Add a Mapping Expression

You'll now add a new Mapping Expression to your Job. So far, your Job copies data from input to output without modification.

My First Talend Job Image 16

Select the MapEmployee component and open the Map Editor. Select the Expression for the Name column and then select the ellipsis (...) to the right of its current value, EmployeeInput.Name. This opens the Expression Editor.

You'll now edit the current expression to convert the value of Name to uppercase. Change the expression to EmployeeInput.Name.toUpperCase(). As you type, you should notice that Talend pops up a window showing the methods available for the current object's class (remember that you are entering Java code).
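Note that toUpperCase() throws a NullPointerException if Name is null, and Name was left Nullable when you defined the schema (only Id had Nullable un-checked). If Name could ever be null in your data, a null-safe form of the expression would be EmployeeInput.Name == null ? null : EmployeeInput.Name.toUpperCase(). The sketch below shows the same logic as a standalone Java method; the class and method names are hypothetical.

```java
import java.util.Locale;

public class UpperCaseExpressionSketch {

    // Null-safe equivalent of EmployeeInput.Name.toUpperCase().
    public static String upperOrNull(String name) {
        // Locale.ROOT gives locale-independent results
        return name == null ? null : name.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperOrNull("John Smith")); // JOHN SMITH
        System.out.println(upperOrNull(null));         // null
    }
}
```

The sketch passes Locale.ROOT so that the result doesn't depend on the machine's default locale (Turkish, for example, has its own uppercase rule for the letter i); the plain toUpperCase() used in the tutorial follows the default locale.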

My First Talend Job Image 17

When you've completed the new expression, press Ok to close the Expression Editor and then press Ok to close the Map Editor.

Re-run your Job.

Review

You've created a simple Talend Job that reads data from a file, makes a simple modification and then writes the output to a new file. This tutorial has tried not only to show you the mechanics of creating this simple Job, but also to provide an insight into how Talend works. This is far from being a production-ready Job, but it should provide you with a good foundation for building your own Talend Jobs. Please view our other tutorials for more on Talend, its Components and processes.

You can view the output file MappedEmployee.csv in a Text Editor. You should observe that the file is identical to its previous version, except that each Name is now in uppercase.

1;JOHN SMITH;Sales;01-Mar-2012;20000.00
2;JANE ROBERTS;Sales;05-Jun-2010;22500.00
3;PAUL JONES;Marketing;22-Jun-2010;30000.00
4;MARK JACOBS;Sales;05-Apr-2010;22000.00
5;MARY STEPHENSON;IT;13-Sep-2009;22500.00
6;STEVE ROBERTS;IT;14-Dec-2011;18000.00
7;MARGARET JOHNSON;CEO;02-Mar-2011;41000.00
8;PAUL HARRISON;Sales;07-Nov-2009;19500.00
9;PETER PEDLAR;Admin;22-Dec-2012;21000.00
10;DOUGLAS PETERSON;Admin;28-May-2011;21500.00
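You may also notice that Margaret Johnson's StartDate has changed from 30-Feb-2011 in Employee.csv to 02-Mar-2011 in the output, because 2011 has no 30th of February. The sketch below reproduces the same roll-over with java.text.SimpleDateFormat, which is lenient by default. It assumes a Date Pattern of dd-MMM-yyyy (consistent with the data, though the pattern you entered isn't shown here), the class and method names are hypothetical, and it isn't necessarily how Talend parses dates internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LenientDateSketch {

    // Parses a value with the given pattern, then formats it back with the same pattern.
    public static String roundTrip(String value, String pattern, boolean lenient) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(lenient); // SimpleDateFormat is lenient unless told otherwise
        Date date = format.parse(value);
        return format.format(date);
    }

    public static void main(String[] args) throws ParseException {
        // Lenient: 30 February 2011 rolls over past 28 February to 2 March.
        System.out.println(roundTrip("30-Feb-2011", "dd-MMM-yyyy", true)); // 02-Mar-2011

        // Strict: the invalid date is rejected instead of silently corrected.
        try {
            roundTrip("30-Feb-2011", "dd-MMM-yyyy", false);
        } catch (ParseException e) {
            System.out.println("strict parsing rejects 30-Feb-2011");
        }
    }
}
```

Whether silent correction or an outright rejection is preferable depends on your data; for a production Job you would usually want invalid dates reported rather than quietly changed.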





© www.TalendByExample.com