Expert Consultancy from Yellow Pelican

Talend t[DB]Input Reference

A site about Talend

Talend t[DB]Input Reference

Talend provides a number of t[DB]Input components, providing the ability to select data from database objects using SQL. Although Talend provides a generic component, tDBInput, you are more than likely to find a custom component that supports the database vendor of your choice.

Our examples will be based on MySQL; but referring to other database vendors when appropriate. The principles learned here, can be applied to all of the database vendors that Talend supports.

tMysqlInput

Talend provides the tMysqlInput component, for selecting data from MySQL database Tables, Views and Synonyms, using Structured Query Language (SQL).

Although not demonstrated in the examples shown here, I would always recommend that you use the tMysqlConnection component to establish your database connections, rather than configuring them directly within this component. For more information on this, read our t[DB]Connection tutorial, and Working with Databases.

The simplest way to get going with this component, is to import the database schemas that you want to use, in to the Talend Repository. More information on this can be found in Working with Databases.

By importing your database schemas, you can then add Repository items to your Job design workspace. When you drag a database object to the design workspace, you will be able to choose the Component that should be used. In this case, it will be the default option of tMysqlInput.

The following screenshot shows the Component tab, for our newly created component.

Image 10

Property Type

Property Type allows you to specify if your database connection parameters should be taken from the Repository (as they are in our example), or Built-In to the component itself.

As discussed in Working with Databases, I would recommend that you do not use this option. Add a t[DB]Connection component and use this to maintain your database connections.

Use an existing connection

This option allows you to use an existing database connection. That is, a connection that has already been established with a t[DB]Connection component. On selecting this option, a new drop-down list will be displayed. This list will show all of the available connections. This is the connection method that I would always recommend.

Schema

As with all Talend components that handle data, you can specify a Schema that may be either held in your Repository, or Built-In to the component itself. If you have added this component from the Repository then, as with Property Type, the default option will be Repository.

Query Type

As can be seen from the screenshot above, although this component is using a Repository Schema, the default Query Type is Built-In. You can have Repository Queries; but you need to add these yourself - Repository->DB Connections->MyConnection->Queries.

Table Name

The only purpose served by the option Table Name is as an enabler for the button Guess Query. If you add a database object from the Repository, then it will be correctly set to the object name, for example, "users"; but it serves no purpose unless you want to Guess Query.

When setting this option, you may type it in or hit the elpises ... to select an object from the Repository.

Guess Query

Talend is able to guess the Query. To do this, press the Guess Query button. This is a simple process of taking the value from the field Table Name, the columns that are listed in the Schema, and using these to construct a simple SQL Select Statement.

Guess Schema

Guess Schema is a little more sophisticated than Guess Query. When you press the Guess Schema button, Talend takes the statement entered in Query and then uses this to interrogate the Database Data Dictionary, validating the table and columns, and to retrieve the correct Data Types. The SQL Data Types are mapped to Java types, which may include Java Primitive Data Types and Classes.

Mapping Query to Schema

Columns selected in your Query are mapped to the Schema by their position.

You may have more columns defined in your Schema than are defined in your Query. In this case, trailing Schema columns will be assigned with null values (or zero for numeric Primitive data types). If there are more columns defined in your Query than are defined in your Schema, the trailing columns will not be mapped to an output.

The data types of your Schema must be compatible with the data types returned from your Query. If they do not, it is likely that a Java Exception will be thrown when you run your Job.

Remember that you can Guess Schema or Guess Query at any time, should you ever need to rationalise your mapping between Query and Schema. Bear in mind that, if you have a complex Query, you risk losing aspects of this when you Guess Query, so always make a back-up of it first.

Advanced Settings

The Advanced Settings of the component may contain options that are database vendor specific.

Enable stream

Note. This option is an option that is specific to MySQL. Other options may be available for other database vendors.

When you read data from a MySQL database, by default, all available data will be read and held in memory. When appropriate, you may stream data by checking Enable stream. If you are using a shared database connection, you will not be able to write to the connection that is being streamed. For more information on streaming, read our t[DB]Connection Reference and Database Joins Tutorial.

Column Trimming

Various options are provided for trimming input data, should you find that your input data contains leading or trailing whitespace and you want to remove it.




Expert Consultancy from Yellow Pelican
comments powered by Disqus

© www.TalendByExample.com