A site about Talend
If you're using Talend, then it's fairly likely that at some time soon, you'll be connecting to a Relational Database, either reading data, writing data, or both.
This series of articles will help you to navigate the numerous database components and repository metadata so that you can make the most out of Talend's functionality and have simply yet reliable database connectivity. Our examples will be based on MySQL; but referring to other database vendors when appropriate. You may find it helpful to download the latest version of MySQL so that you can try some of the techniques described here.
Whichever database vendor you are chosing to work with, you may find it helpful to test your connections to the database with the tools supplied by the vendor, before working with Talend. This should help to isolate any connectivity issues, before introducing Talend.
I have written some short notes on Installing & Configuring MySQL Server. You may find this helpful, especially if you would like to complete our tutorials; however, there are plenty of tutorials available for MySQL.
As well has supporting a number of database vendors such as MySQL and Oracle, Talend also provides some generic database components, to help provide connectivity to unsupported vendors.
The Talend repository supports database metadata. This allows you to define connections to databases and then import schema objects such as tables. Setting up this metadata allows you to test your database connectivity from within Talend, as well as making it convenient to work with databases within your Jobs.
To create a new database connection within the repository, right-click Repository->Metadata->Db Connections and select the option Create connection. This will display the Database Connection dialog.
Enter a Name, Purpose and Description for your database connection. Use a Name that suggests its purpose, for example, DevelopmentAccounts indicating that this connection is for the development instance of your Accounts database. Depending on your own working environment, I would recommend always establishing these connections to a development or test database and never to production. Be consistent in the databases that you connect to and the naming conventions you use. This avoids mishaps later.
When you've completed step 1 of this dialog, press Next, as shown in the screenshot below.
Step 2 of the Database Connection dialog allows you to enter the connection parameters of the database that you want to connect to.
Select the DB Type that you want to connect to. For some database vendors, there may be multiple types types that you may select, for example, Oracle connections are supported using OCI, RAC, SID and Service Name. Once you have selected a DB type, the remainder of the dialog will be preconfigured for the database vendor and connection type that you selected. As shown below, for MySQL, we have some basic setings that we need to make for a connection to a MySQL 5 database.
When you have completed this page of the dialog, you can Check your connection and then hit Finish.
Now that your repository database connection has been established, right-click your Database Connection and select Retrieve Schema from the pop-up menu. This will display the Schema dialog. Step 1 of this dialog allows you to set your Filter Conditions and your Selection Types.
This step will only specify the filtering rules for the objects to import. Once these options have been completed, hit Next, to select the objects that you would like to import.
This option allows you to filter based on the object name. Enter your criteria in Set the Name Filter The default value is %
, which means that all objects will be selected.
This option allows you to write your own SQL filter, by entering an SQL query in Set the Sql Filter. This allows you to select objects from one of the database dictionary tables, for example, SELECT TNAME FROM TAB WHERE TNAME LIKE 'BAL%'
.
The Select Types option allows you to specify the database object types to choose from. You may select tables, views and synonyms.
Stage 2 allows you to choose objects. You may expand a node in the list by pressing . Name Filter allows you to hide objects from view. Select the objects that you would like to import, using the check boxes to the left of each node. When completed, hit Next. Note that you may need to allow Talend time to retrieve information from the database, before this button becomes available.
Stage 3 allows you to make some final amendments before you import. This is more useful when you are reviewing previously imported objects. For the purposes of this tutorial, we'll leave these options with their default values. Hit Finish to import the schema.
You can now create a simple Job that will read data from your database.
Create a new Job and drag one of the Table objects from the Repository to the Job Workspace. In the example shown here, the Users table from the MySQL catalogue, will be used. When you drop the object on to the Workspace, you can choose the Talend component that should be used. Choose the default which is tMysqlInput (or the database input comonent for the database vendor you are using).
Note. In this example, you're going to write the output of this table to the Talend Console so you may want to choose a table with a small number of rows.
Add a tLogRow component to your Job and connect the two components using Row->Main. You should now be able to run your Job and see data displayed in the Talend Console. Your completed Job, and results, should look similar to the screenshot below.
You've now managed to connect to your database and read some data. All very simple. Now is a perfect opportunity to think about how we are going to write easily maintainable Jobs that are built in a consistent way and that can be promoted to your Test and Production environments with the minimum of effort and without the risk of a calamity.
Select your database input component and then select Component tab, as shown below.
Property Type Allows you to specify if your database connection parameters should be taken from the Repository (as they are in our example), or Built-In to the component itself.
At first glance, maintaining this component with a Repository connection seems like a sensibly idea; however, you need to think about how your Job is going to be promoted to your Test and Production environments. Are we going to change the repository connections each time we want to deploy to a different environment?
My preference is to use Repository connections only within a Development enviroment, as a starter to writing Jobs. Once I have added a database object to my Job, I immediately disassociate it with the Repository connection (as opposed to disassociating it with the Repository Schema; which will be discussed later).
Built-In allows you to hard-code the connection parameters within this component (don't do this), use Context Variables, or use globalMap Objects.
Usually, I'd recommend using Context Variables and then Externalizing these; however, with database components, there is a third way (which can, indirectly, use externalized Context Variables).
This option allows you to select an existing database connection, effectively, delegating the responsibility of the connection to another component. In the case of MySQL, a tMysqlConnection component.
This greatly reduces the configuration of your components such as tMysqlInput, as, with the delegation of connection responsibility, they need no further configuration if your database connection details change.
You will still want to externalize your database connection parameters; but now these may be configured within a greatly reduced number of components i.e. your database connection component. Also see our download LibMySQLSharedConnection and associated tutorial Reusable Database Connection Job.
Add a database connection component to your Job, for example, tMysqlConnection and then connect it to your database input component using Trigger->On Subjob Ok. Select the Component tab for this new component and set Property Type to your Repository database connection. Once you've selected Repository, you will need to hit the ...
to select the Repository item. Now return to the Component tab of your database input and select the option Use existing database connection. You should see that a new Component List drop-down list is displayed and that the new database connection is automatically selected, as shown in the screenshot below.
In this first article on Working with Databases, we've looked at some basics and we've wrapped-up our database connection parameters into a database connection component.
We can simplfy this process further, by creating reusable Jobs to reduce our coding effort, and to externalise our connection parameters. I would strongly recommend that you do this before embarking on creating lots of database Jobs.
For more information on this subject, read our tutorial on Reusable Database Connection Job and take a look at our download LibMySQLSharedConnection.
There are more tutorials on Working with Databases to follow.
comments powered by Disqus