A site about Talend
This tutorial provides comprehensive information on the functionality that Talend offers, when working with databases. You may find it helpful to read our article Working with Databases before taking this tutorial; however, it is not a prerequisite.
This tutorial is based on MySQL; but referring to other database vendors if useful. You may find it helpful to download the latest version of MySQL so that you can try some of the techniques that are described here.
You may Download Example Job and import it in to a Talend Project of your choice. This tutorial supports Talend 5.2.2 and above.
The example Job in this tutorial will perform the following tasks: -
This tutorial uses several reusable Jobs that are described on TalendByExample. These Jobs are design to simplify development, allowing you to quickly build robust and maintainable Jobs.
To run the example Job TBEMySQLTutorial, you will need to create the following Context files. These files should be placed in a directory located from your Home directory, i.e.
This file holds general Context variables. Change the value of outputDir so that it identifies a directory where you would like the example Job to write any files that it creates.
# This is a sample Talend Context File, for reading by LibContextReader
This file holds MySQL Context variables that will allow the example Job to connect, create tables and then read/write data. Amend these parameters to suit your environment. I would recommend that you test the MySQL connection using the tools provided by MySQL prior to running the Job. This is the easiest way to eliminate any connectivity issues.
# This is a sample Talend Context File, for connecting to a MySQL Database
These two reusable Jobs generate their respective pseudo-random data. These will provide input for the example Job. For more information on these reusable Jobs, read Talend Data Generation.
If you've already created the Context files
Default.MySQL.cfg, you should be able to run the example Job TBEMySQLTutorial. The following screenshot shows the general design of the Job.
This example Job supports the Context of Default, Test and Production. It is a personal choice as to whether or not you only use the Default Context, or have additional ones for your different environments. You may still find it helpful to add the additional Context to your Jobs, even if you do not think you'll use them, as this gives flexibility later on.
You'll see from the screenshot below that a Context Group named General has been added. This, currently, contains just a single variable outputDir. I have added this to a group as it seems likely that we will want to use this in many of our task-specific Jobs. The variables numberOfAddress and numberOfPerson are very local to this Job, so I've added these as built-in.
The first task, is to load some general context. This tells our Job something about the environment in which it is executing. Context is loaded using the reusable Job LibContextReader. This Job will load Context from the default file
The reusable Job LibMySQLSharedConnection is used to establish a connection with our Database. Although this Job is database vendor-specific, it can be easily adapted to work with the database vendor of your choice. Context for this connection is provided from the default Context file,
$HOME/talend/context/Default.MySQL.cfg. This connection is shared, meaning that you can refer to it from any tMysqlConnection components located in the parent Job, its children and siblings.
The following screenshot shows that we've configured the component tMysqlConnection_1 so that it uses the shared connection
"MySQLSharedConnection", that has been established by the reusable Job LibMySQLSharedConnection.
"MySQLSharedConnection" is the default Shared DB Connection Name that is established by tLibMySQLSharedConnection, and may me overridden by passing the parameter MySQLSharedDBConnectionName. To use the shared connection, check the option Use or register a shared DB Connection and then enter the connection name in the field Shared DB Connection Name.
In a later article, we'll look at the different Transaction Control strategies that we can employ in our Job. For now, we'll set this connection to Auto Commit. This option can be found on the Advanced settings tab of the tMysqlConnection component, as shown in the following screenshot. Transactions will be committed every Number of rows per insert, as specified in the component tMysqlOutput.
Now that we've established a shared connection to our database, we can create some data. To do this, two pseudo-random data generators will be used. In our example, these have been configured so that they will create 500 Address rows and 2000 Person rows. Each Person row will be linked to a random Address. Read Talend Data Generation, for more information on these data generators.
When adding child Jobs that are capable of returning data to their parent (using the tRunJob component), it is important to remember to hit Copy Child Job Schema. This gives the tRunJob component the same Schema as the child Job. This button can be found on the Basic settings tab.
The following screenshot shows the configuration of one of the two tMysqlOutput components. Both components have been configured in a similar manner.
Checking this option makes the Component List drop-down available. Here, you can select the tMysqlConnection component that is to provide the database connection.
This option allows you to specify any Data Definition Language (DDL) operation that should be performed against Table. This example will create a new table each time the Job runs (dropping it first if it already exists) and, therefore, this option has been set to Drop table if exists and create.
This option allows you to specify the Data Manipulation Language (DML) operation that should be performed. As this Job creates a new table each time it runs, this option is set to Insert as we want to only create new rows.
We would usually expect the Schema to have been propagated to this component, when it was connected to the preceding component. If this has not happened, hit the Sync columns button. This will always copy the Schema from the preceding component.
Note. This option is dependent on both database vendor and Action on data. Please check the options available for your database vendor and the action that you are performing. For more information, read t[DB]Output Reference.
As can be seen from the screenshot below, Number of rows per insert can be found on the Advanced settings tab (when you check the option Extend Insert) and has a default value of
100. This value specifies the number of rows that will be inserted before a transaction is Committed. This value works in conjunction with Auto Commit, that we specified for our tMysqlConnection component.
Now that we've created some data in our MySQL database, we can now read it back. As can be seen from the screenshot below, we'll read both our Person and Address tables, join them using a tMap component, and then write the combined output to a file using a tFileOutputDelimited component. I have chosen to use a tMap component over a tJoin (Comparison to tMap), as I find no use case for tJoin.
The following screenshot shows the configuration of one of the two tMysqlInput components. Both components have been configured in a similar manner.
In this example, Schema, Table Name, Query Type and Query were pre-configured. This is because this component was added from the Talend DB Connections Repository. This is the best way of adding tMysqlInput components to your Job. You can modify these once they have been added (including setting Use an existing connection). For more information on the Talend DB Connections Repository, read our article Working with Databases.
The following screenshot shows the Mapping Editor of the tMap component that joins our two MySQL data sources. The mapping of the data is fairly basic. The entire Person and Address rows are written, less the two Address keys
Person.AddressId. We are dropping these keys as we have now Normalized our data.
You will see that Address is the Lookup of our two inputs and that we've instructed our tMap component to perform a look-up based on the relationship between the two tables, based on
Finally, we'll write our output to a delimited file using a tFileOutputDelimited component. For this example, we've left all of the default values except for File Name. Here, we've set the value to
context.outputDir + "/PersonAddress.txt". This allows us to locate the file in a directory of our choice, as specified in the Context file
In this tutorial, we have used some reusable Jobs that will help to make our Talend development easier. Use these in your own Jobs, or use the techniques employed, to help your understanding of Talend. Do avoid embedding database connection details inside individual components.
We've used some fairly basic functionality to read and write to a MySQL database. In later tutorials, we'll build on these techniques.comments powered by Disqus