In this tutorial, you'll learn how to create a simple Talend Job. To do this, you're going to create a Job that reads data from one Delimited File, does something with it and then writes the data to a new Delimited File.
Creating a Talend Job involves four steps: create the Job, add Components from the Component Palette, configure and connect the components, and then run the Job.
To complete this tutorial, you will need a data file that can act as an input to your Job. Create a new directory to hold your example files and then use a Text Editor to create a file named Employee.csv. You can now Cut-and-Paste the values from the table below.
Id;Name;Department;StartDate;Salary
1;John Smith;Sales;01-Mar-2012;20000
2;Jane Roberts;Sales;05-Jun-2010;22500
3;Paul Jones;Marketing;22-Jun-2010;30000
4;Mark Jacobs;Sales;05-Apr-2010;22000
5;Mary Stephenson;IT;13-Sep-2009;22500
6;Steve Roberts;IT;14-Dec-2011;18000
7;Margaret Johnson;CEO;30-Feb-2011;41000
8;Paul Harrison;Sales;07-Nov-2009;19500
9;Peter Pedlar;Admin;22-Dec-2012;21000
10;Douglas Peterson;Admin;28-May-2011;21500
To create a new Job, right-click Job Designs, which is found in the Repository pane and then select Create Job from the drop-down list. This will open the New Job dialog. For now, you'll simply enter a Job name and then select Finish. It's good practice to complete Purpose and Description; but we'll talk about completing the remaining settings in a later example.
Your new Job will now open up as a blank canvas in the design Window. Note. The Talend design tool is based on Eclipse. If you'd like to know more about Eclipse, then visit the Eclipse website where you'll find information and tutorials that will help you to use and understand Eclipse and the Talend design tool.
You'll now add three components to your Job, by selecting them from the Component Palette. You can see from the screenshot below, that we've searched for the tFileInputDelimited component. You can also browse through the component groups; which is a great way to see what's available in Talend. You can add this component to your Job by simply dragging it to the design window. When you've done this, also add tMap and tFileOutputDelimited components to your Job.
Now that you've added three components, your Job should look like the one below. You can now start configuring and connecting your components.
You'll see that Talend has automatically named your components, using the component's Class Type, followed by an underscore and a sequential number. As well as this being the component's name, this is also the label that you will see on the screen. I'd strongly recommend that you always change the label names to something more useful. It is a key aspect of any Talend Job that it is readable and well documented.
To change the component labels, click on a label to select it and then click a second time to enter edit-mode (note that this is not a double-click, so take your time!).
Change your component labels to "EmployeeInputFile", "MapEmployee" and "EmployeeOutputFile". Note that you are only changing the labels. The Components will still be referred to in your Job by the original names that were issued by Talend. Component labels do not need to be unique within your Job.
At the bottom of the design window, you'll see a series of tabs for Contexts, Component, Run and Problems. By default, the Problems tab is selected. You'll see one error. Talend reports this where tFileInputDelimited has no output connector, with the other components just reporting warnings for lack of output connectors. You'll also notice small error/warning icons next to the components. Hover your mouse over these, to see an error message.
Select the EmployeeInputFile (tFileInputDelimited_1) component and then select the Component tab, as shown in the screenshot below. Note that we've referred to this component by both the Label and Component Name, as displayed in the Component tab. In future, we'll simply refer to the Label; but it's important to remember how the Component is referred to programmatically.
There are numerous settings that you can make here; but for now, we'll only review File name/Stream and Header.
In the screenshot shown above, File name/Stream is set to the default value, with a file named in.csv located in the workspace directory. workspace is the directory where Talend holds your projects and source code and is probably not the best place for locating your data files. Enter the full path to the Employee.csv file that you created earlier.
You have a header row in your input file, so you will need to suppress this. Talend allows us to do this by specifying the number of header rows, using the field Header. Set this to a value of 1. Your Job should now look something like the screenshot below.
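To make the Header setting more concrete, here is a minimal plain-Java sketch of what skipping header rows and splitting on a field separator amounts to. This is not the code that Talend generates; the class name DelimitedReaderSketch and the method readRows are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: not Talend's generated code.
public class DelimitedReaderSketch {

    // Skips the first `headerRows` lines, then splits every remaining line
    // on the given field separator.
    static List<String[]> readRows(List<String> lines, String separator, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        for (int i = headerRows; i < lines.size(); i++) {
            // Pattern.quote treats the separator literally;
            // the -1 limit keeps trailing empty fields.
            rows.add(lines.get(i).split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Id;Name;Department;StartDate;Salary",
                "1;John Smith;Sales;01-Mar-2012;20000",
                "2;Jane Roberts;Sales;05-Jun-2010;22500");

        // Header = 1, so the column-name line is not treated as data.
        for (String[] row : readRows(lines, ";", 1)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With a header count of 0, the line of column names would be passed through as if it were the first employee.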
In a later tutorial, you'll be looking at the Talend Repository where you can define files and use Talend's file sampling to simplify the definition of Files (and other data sources). For now, you'll do things manually so that you get a better understanding of defining a Schema.
Select EmployeeInputFile and then select the Component settings tab. Select the ellipsis (...) to the right of Edit schema and complete the dialog, as shown below.
Schemas are the definition of your data and how data moves around your Job.
Note. You may have noticed that, when you selected the Type for the column Id, the option in the drop-down list was int | Integer. Try toggling the check-box Nullable and you'll see that Type swaps between int and Integer. Remember that the code generated by Talend is Java and by changing the Nullable setting, Talend is swapping the type between the primitive int and the Class Integer. i.e. You can't represent null in a primitive type.
When you've completed this dialog, press OK.
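The note above about Nullable can be tried out in plain Java. The sketch below is only an illustration of the difference between the primitive int and the class Integer; the class name NullableTypeSketch and the method describe are made up for this example.

```java
// Illustrative only: shows why a nullable column needs Integer rather than int.
public class NullableTypeSketch {

    // An Integer parameter can be null, so the method has to check for it.
    static String describe(Integer id) {
        return id == null ? "no Id" : "Id " + id;
    }

    public static void main(String[] args) {
        int primitiveId = 7;        // a primitive always holds a value
        Integer nullableId = null;  // the wrapper class may hold null

        System.out.println(describe(primitiveId)); // the int is autoboxed to Integer
        System.out.println(describe(nullableId));

        // int broken = null;        // would not compile
        // int unboxed = nullableId; // compiles, but throws NullPointerException at run time
    }
}
```

The last commented-out line is worth remembering: converting a null Integer back to an int fails at run time, not at compile time.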
You're now ready to connect your first two components. Right-click on EmployeeInputFile, select Row and then select Main. You will now see a line drawn from the selected component to the cursor. Move the cursor to the MapEmployee component and select it. You will notice that the cursor image changes to a small plug symbol and also includes a stop symbol when the cursor is over an area or component that you may not connect to.
Now that you've connected your first two components, your Job should look like the screenshot below. You can now take a moment to look at what's happened to your Job.
You can now configure the EmployeeOutputFile component in a similar way to EmployeeInputFile. Select its Component properties tab and set the value of File name/Stream to MappedEmployee.csv (in the same directory that you have used previously). You can then connect it to MapEmployee by connecting MapEmployee's output to EmployeeOutputFile. You'll notice that when you make the connection from MapEmployee to EmployeeOutputFile, Talend will ask you to name the output. Call it MappedEmployee.
You'll see that you now have all of your components connected and that they all belong to the same SubJob. You have two outputs named row1 (Main) and MappedEmployee (Main). Talend forces you to name the output for tMap (this is because tMap allows you to specify multiple outputs); but it doesn't do this for all components. It is good practice to give all of your outputs meaningful names.
In a similar way to how you changed your component names, change the name of output row1 to EmployeeInput.
When you previously changed the names of your components, you were only changing their labels; their names remained the same. Output connectors behave differently. You are not only changing the label, you're also changing the name that you will use to refer to the connector programmatically. This name must be unique within your Job.
You should now have a Job that looks like the screenshot below. You'll notice that the EmployeeInput output remains selected and that you can see the schema from tFileInputDelimited_1 (EmployeeInputFile).
You can now configure the MapEmployee component. Select its Component properties tab and then select the ellipsis (...) to the right of Map Editor. Alternatively, you can double-click the component to access the Map Editor.
For the purposes of this tutorial, you are going to make the structure of your output file identical to that of your input file and you're going to use Talend to propagate the EmployeeInputFile schema throughout the Job. You'll remember that, when you configured EmployeeOutputFile, you did not edit its schema.
Talend allows you to copy & paste columns from one schema to another. Select all of the Columns from the EmployeeInput schema and use the buttons to copy the EmployeeInput schema and to paste it to the MappedEmployee schema. You can multi-select columns by selecting the empty left-most column in the schema and using the usual multi-select features of your mouse or cursor keys.
Once you have successfully copied the input schema to the output schema, press Auto map! and Talend will automatically map Expressions. Auto map! automatically maps empty Expressions where an output Column name matches an input Column name. Auto map! is located above the MappedEmployee Expression editor.
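The rule that Auto map! follows, as described above, can be sketched in a few lines of Java. This is a hypothetical illustration and not Talend's implementation: the class AutoMapSketch, the method autoMap and the Bonus column are invented for the example, and the exact, case-sensitive name comparison is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not Talend's implementation of Auto map!.
public class AutoMapSketch {

    // Fills an output expression only when it is empty and an input column
    // has the same name (assumed here to be an exact, case-sensitive match).
    static Map<String, String> autoMap(String inputName, List<String> inputColumns,
                                       Map<String, String> outputExpressions) {
        Map<String, String> result = new LinkedHashMap<>(outputExpressions);
        for (Map.Entry<String, String> entry : result.entrySet()) {
            boolean empty = entry.getValue() == null || entry.getValue().isEmpty();
            if (empty && inputColumns.contains(entry.getKey())) {
                entry.setValue(inputName + "." + entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> output = new LinkedHashMap<>();
        output.put("Id", "");                             // empty, has a matching input column
        output.put("Name", "EmployeeInput.Name.trim()");  // already filled in, left alone
        output.put("Bonus", "");                          // hypothetical column with no match

        Map<String, String> mapped =
                autoMap("EmployeeInput", Arrays.asList("Id", "Name"), output);
        mapped.forEach((column, expression) ->
                System.out.println(column + " <- " + expression));
    }
}
```

The point to take away is that existing expressions are not overwritten, and output columns without a matching input column stay empty until you map them yourself.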
Mapping Expressions is how you transform your data and you'll try an example of this later in this tutorial. For now, your Job will simply move data from input to output without any transformation.
Your Map Editor should now look the same as the screenshot below.
Now that you've completed the Map Editor, you may now press the Ok button to save your changes. When you select Ok, Talend will ask you if you want to propagate your changes. We haven't yet defined a schema for EmployeeOutputFile, so press the Yes button. Talend will now copy the output schema from MapEmployee to EmployeeOutputFile.
To run your Job, select the Run (Job MyFirstTalendJob) tab and press the Run button.
Hopefully, you'll receive no errors and the results will look like the screenshot below. If you do receive errors, then review this tutorial to identify the problem. Errors may be either Compile or Runtime errors.
You can see from the screenshot above that Talend shows the number of rows that pass through each connector, together with the throughput (rows per second). You can also see the Job's output shown in the Run tab. This window shows you output written to System.out and System.err.
You can view the output file MappedEmployee.csv in a Text Editor. You should observe that the file is identical to Employee.csv except that the header row has been dropped.
You'll now add a new Mapping Expression to your Job. So far, your Job copies data from input to output without modification.
Select the MapEmployee component and load the Map Editor. Select the Expression for the Column Name and select the ellipsis (...) to the right of the current value EmployeeInput.Name. This will load the Expression Editor.
You'll now edit the current expression and change the value of Name to uppercase. Change the expression to EmployeeInput.Name.toUpperCase(). As you type, you should notice that Talend pops up a window showing the methods available for the current Object Class (remember that you are entering Java code).
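Because the expression is ordinary Java, you can try its behaviour outside Talend. The sketch below wraps the same toUpperCase() call in a hypothetical method, upperName, and adds a null check. The null check is not part of this tutorial's expression; it is a defensive assumption, since calling a method on a null String throws a NullPointerException.

```java
// Illustrative only: the method body mirrors the tMap expression, plus a null check.
public class UpperCaseExpressionSketch {

    // Same call as EmployeeInput.Name.toUpperCase(), guarded against null.
    static String upperName(String name) {
        return name == null ? null : name.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(upperName("John Smith"));
        System.out.println(upperName(null)); // no exception, because of the guard
    }
}
```

If your Name column were Nullable, the equivalent guarded expression in the Expression Editor would be EmployeeInput.Name == null ? null : EmployeeInput.Name.toUpperCase().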
When you've completed the new expression, press Ok to close the Expression Editor and then press Ok to close the Map Editor.
Re-run your Job.
You've created a simple Talend Job that reads data from a file, makes a simple modification and then writes the output to a new file. This tutorial has tried not only to show you the mechanics of creating this simple Job; but also to provide an insight into how Talend works. This is far from being a production-ready Job; but should provide you with a good foundation to start building your Talend Jobs. Please view our other tutorials for more on Talend, its Components and processes.
You can view the output file MappedEmployee.csv in a Text Editor. You should observe that the file is identical to the previous file except that Name has now been uppercased.
1;JOHN SMITH;Sales;01-Mar-2012;20000.00
2;JANE ROBERTS;Sales;05-Jun-2010;22500.00
3;PAUL JONES;Marketing;22-Jun-2010;30000.00
4;MARK JACOBS;Sales;05-Apr-2010;22000.00
5;MARY STEPHENSON;IT;13-Sep-2009;22500.00
6;STEVE ROBERTS;IT;14-Dec-2011;18000.00
7;MARGARET JOHNSON;CEO;02-Mar-2011;41000.00
8;PAUL HARRISON;Sales;07-Nov-2009;19500.00
9;PETER PEDLAR;Admin;22-Dec-2012;21000.00
10;DOUGLAS PETERSON;Admin;28-May-2011;21500.00
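You may have spotted that row 7 went in as 30-Feb-2011 and came out as 02-Mar-2011. One plausible explanation, assuming that StartDate was given the type Date with the pattern "dd-MMM-yyyy" in the schema and that the date is parsed leniently, is that Java rolls an impossible day forward into the next month. The sketch below reproduces that behaviour with java.text.SimpleDateFormat; the class LenientDateSketch and the method roundTrip are hypothetical, and this is not a statement about how Talend parses dates internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Illustrative only: lenient versus strict date parsing in plain Java.
public class LenientDateSketch {

    // Parses the text and formats it again with the same pattern.
    // Throws IllegalArgumentException if the text cannot be parsed.
    static String roundTrip(String text, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        format.setLenient(lenient);
        try {
            Date parsed = format.parse(text);
            return format.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a valid date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Lenient parsing accepts 30 February and rolls it into March.
        System.out.println(roundTrip("30-Feb-2011", true));

        // Strict parsing rejects the same value.
        try {
            roundTrip("30-Feb-2011", false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether silently correcting a bad date is what you want depends on your data; in a real Job you would more likely want to reject or report such rows.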