If you're designing your own Talend component, the first file that you'll need to deal with is the Component Descriptor file, for example, tMyComponent_java.xml. This is the file that describes your component, including its inputs, outputs and settings.
Your Component Descriptor file must conform to the XSD file Component.xsd. This file is located under the Talend installation directory. In my case (Mac OS X with Talend Open Studio 5.5.0), the file can be found at /Applications/TOS_DI-r117820-V5.5.0/configuration/org.eclipse.osgi/bundles/1835/1/.cp/model/Component.xsd.
Component.xsd is used internally by Talend and is also a useful reference for understanding the structure of your XML file and knowing which elements and attributes are available and which are mandatory.
Unfortunately, this XSD is light on documentation and, despite its importance, little information about it is available. This document helps to fill that gap by looking at each element and attribute of this file and providing some helpful documentation. This documentation is by no means complete (yet); however, it starts to build a useful reference, beginning with the more common elements and attributes of this file.
You would usually expect the first line of an XML file to specify the XML version and the encoding. The source code for most of the standard components does not have this line. When you create a new component using Component Designer, the line shown below will be added. Although it would appear that this line is not mandatory, it would seem sensible that it is always specified.
<?xml version="1.0" encoding="UTF-8"?>
You can comment your XML in the following way.
<!-- This is a comment -->
The following tables describe the elements and attributes that are defined in Component.xsd. For clarity, attribute names are prefixed with @. Where a value is described as the default value, this is the value that is assigned when you create a new component using the Component Designer.
Under the column Mandatory, I have also included elements and attributes that are not strictly mandatory but that there is no reason to leave out; you should simply include them in all of your components.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
COMPONENT | Yes | | | COMPONENT is the root element. |
HEADER | Yes | COMPONENT | | HEADER is where you specify a number of attributes and elements to describe your component. |
FAMILIES | Yes | COMPONENT | | FAMILIES is where you specify the families that you want your component to belong to. These are the groups under which your component will appear in the Component Palette. |
DOCUMENTATION | Yes | COMPONENT | | DOCUMENTATION allows you to specify a URL as the location of your component's documentation. If you have no documentation, specify URL as <URL/>. |
CONNECTORS | Yes | COMPONENT | | CONNECTORS is where you specify the inputs and outputs for your component. If you have no connectors, specify <CONNECTORS/>. |
PARAMETERS | Yes | COMPONENT | | PARAMETERS is where you specify the settings for your component. If you have no parameters, specify <PARAMETERS/>. |
ADVANCED_PARAMETERS | Yes | COMPONENT | | ADVANCED_PARAMETERS is where you specify the advanced settings for your component. These are shown on a separate tab from PARAMETERS. If you have no advanced parameters, specify <ADVANCED_PARAMETERS/>. |
CODEGENERATION | Yes | COMPONENT | | TBC. If you have no code generation, specify <CODEGENERATION/>. |
RETURNS | Yes | COMPONENT | | RETURNS is where you specify your component's return values. If you have no return values, specify <RETURNS/>. |
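Putting the top-level elements together, a minimal descriptor might look like the sketch below. It simply combines the empty-element forms listed in the table with the HEADER sample values described in the tables that follow. The component name in the comment and the single Sample family are hypothetical, and I have not checked that Talend will load a component this sparse, so treat it as a starting skeleton rather than a tested file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Skeleton descriptor for a hypothetical component, tMyComponent -->
<COMPONENT>
  <!-- Mandatory header attributes, using the sample values from this page -->
  <HEADER AUTHOR="Talend" COMPATIBILITY="ALL" PLATEFORM="ALL"
          RELEASE_DATE="20140531A" SERIAL="" STARTABLE="true"
          STATUS="ALPHA" VERSION="0.1">
    <SIGNATURE/>
  </HEADER>
  <!-- At least one family, so that the component appears on the Palette -->
  <FAMILIES>
    <FAMILY>Sample</FAMILY>
  </FAMILIES>
  <!-- No documentation yet, so URL is left empty -->
  <DOCUMENTATION>
    <URL/>
  </DOCUMENTATION>
  <!-- Nothing defined yet: each section uses its empty form -->
  <CONNECTORS/>
  <PARAMETERS/>
  <ADVANCED_PARAMETERS/>
  <CODEGENERATION/>
  <RETURNS/>
</COMPONENT>
```

The element order follows the order of the rows in the table above; from here, each empty section can be filled in as your component grows.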
I have split the HEADER elements and attributes into two tables: those which are mandatory and those which are optional.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
HEADER | Yes | COMPONENT | | HEADER is where you specify a number of attributes and elements to describe your component. |
@AUTHOR | Yes | HEADER | Talend | This value specifies the component's Author Name. The default value is Talend; however, you may enter your own name. |
@COMPATIBILITY | Yes | HEADER | ALL | This appears to indicate the Operating System that this component is supported on. Analysis of the standard components shows that the standard values are ALL and WIN32. A few components have a value of WIN32, for example, tMap; however, it is clear that this component is not restricted to WIN32. My feeling is that this value has been deprecated. It is also unclear what the relationship is to PLATEFORM. Setting this to a value of ALL appears to be a fair bet. |
@PLATEFORM | Yes | HEADER | ALL | I suspect that PLATEFORM got lost in translation somehow, and that it should actually have read PLATFORM. This appears to indicate the Operating System that this component is supported on. Analysis of the standard components shows that the standard values are ALL and WINDOWS. The only components that do not have the value of ALL are a series of tSPSS* components. Also see the comments for COMPATIBILITY, above. Setting this to a value of ALL appears to be a reasonable choice. |
@RELEASE_DATE | Yes | HEADER | 20140531A | This value specifies the release date of your component, in the format YYYYMMDD, followed by a single uppercase letter. The default is the oddly chosen 20080229A. Analysis of the standard components shows that the single characters used are A and P. Although it is not documented, I suspect that these values represent the state of the component: Alpha and Production. Also see the comments for STATUS. I can also see that there are invalid values for many of the standard components, so I suspect there is no validation of this value. All but one of the standard components use a value of A, so I see no reason to use any other value for my own components, i.e. I stick to a value pattern of YYYYMMDDA and simply enter the date that I created my component. |
@SERIAL | Yes | HEADER | "" | This attribute appears to be included in all components; however, there is no indication as to its purpose, and analysis of the standard components shows that it never receives a value. I always set it to SERIAL="". |
@STARTABLE | Yes | HEADER | true | This is a boolean attribute that indicates if the component is permitted to start a Subjob. The permitted values are true or false. You should be able to determine from your component's purpose what the appropriate value should be. |
@STATUS | Yes | HEADER | ALPHA | This value indicates the status of your component. Analysis of the standard components shows the values ALPHA and BETA (I have seen community-developed components on the Talend Exchange using PROD). Although the purpose of this attribute appears clear, the vast majority of the standard components have STATUS="ALPHA" and I see no value in doing anything different with my own components. Also see the comments for RELEASE_DATE, above. |
@VERSION | Yes | HEADER | 0.1 | This value specifies the component's Version Number. The default value is an unexpected 0.102. Typical Talend versioning tends to start with 0.1 and tick up by minor numbers. There are no restrictions on version numbers, so you may use a scheme of your choice. Just remember to tick the number up each time you release a new version of your component. |
SIGNATURE | Yes | HEADER | | This element appears to be included in all components; however, there is no indication as to its purpose, and analysis of the standard components shows that it never receives a value. I always set it to <SIGNATURE/>. |
The following code fragment shows a typical HEADER.
<HEADER AUTHOR="Talend" COMPATIBILITY="ALL" PLATEFORM="ALL" RELEASE_DATE="20140531A" SERIAL="" STARTABLE="true" STATUS="ALPHA" VERSION="0.1">
  <SIGNATURE/>
</HEADER>
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
@DATA_AUTO_PROPAGATE | No | HEADER | false | I've noticed that, with later versions of Talend, Component Designer adds the attribute DATA_AUTO_PROPAGATE="false". Some, but not all, of the standard components set DATA_AUTO_PROPAGATE, with most setting the value to false. |
@COMBINED | No | HEADER | | This attribute is only used by the 4 components in the ELT/Combined SQL family. This appears to be a specialised attribute and is, therefore, outside the scope of this documentation. |
@SCHEMA_AUTO_PROPAGATE | No | HEADER | | TBC |
@EXTENSION | No | HEADER | | TBC |
@HAS_CONDITIONAL_OUTPUTS | No | HEADER | | This attribute specifies if the component has conditional outputs. TBC |
@HASH_COMPONENT | No | HEADER | | This attribute is only used by a number of tHash* components. This appears to be a specialised attribute and is, therefore, outside the scope of this documentation. |
@IS_MULTIPLYING_OUTPUTS | No | HEADER | | TBC |
@MAIN_CODE_CALLED | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation. |
@NUMBER_PARALELLIZE | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation. |
@SHORT_NAME | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation. |
@SINGLETON | No | HEADER | | This boolean attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a true value. I believe that this is to indicate that no other component may belong to the Subjob. |
@SUBJOB_COLOR | No | HEADER | | This string attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a value of 255;220;180. This attribute allows you to change the default background colour of the Subjob. Best practice appears to be that the colour is only changed for SINGLETON components. |
@SUBJOB_TITLE_COLOR | No | HEADER | | This string attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a value of 230;100;080. This attribute allows you to change the default title bar colour of the Subjob. Best practice appears to be that the colour is only changed for SINGLETON components. |
@TECHNICAL | No | HEADER | | The purpose of this boolean attribute is not clear. It appears to be reserved, with a true value, for a limited number of components, which appear not to be for publication to the Component Palette. |
@TSTATCATCHER_STATS | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation. |
@VISIBLE | No | HEADER | | The purpose of this boolean attribute is not clear. It appears to be reserved, with a false value, for a limited number of components. |
@PARTITIONING | No | HEADER | | This attribute appears to have been introduced in more recent versions of Talend. It certainly exists for most, if not all, of the standard components in Talend Open Studio 5.5.0. Common values are NONE and AUTO. I have not yet discovered its purpose, but I believe it's related to parallel execution. |
FORMAT | No | HEADER | | This element is not used in any of the standard components and does not appear to be in use. |
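As an illustration of the optional attributes, the sketch below adds the three Subjob-related attributes to the mandatory HEADER attributes. The attribute values are the ones quoted in the table for tPreJob and tPostJob, but this exact combination is my own assembly and has not been copied from a standard component, so treat it as an assumption about how they fit together.

```xml
<!-- Hypothetical header for a component that should sit alone in its Subjob -->
<HEADER AUTHOR="Talend" COMPATIBILITY="ALL" PLATEFORM="ALL"
        RELEASE_DATE="20140531A" SERIAL="" STARTABLE="true"
        STATUS="ALPHA" VERSION="0.1"
        SINGLETON="true"
        SUBJOB_COLOR="255;220;180"
        SUBJOB_TITLE_COLOR="230;100;080"
        DATA_AUTO_PROPAGATE="false">
  <SIGNATURE/>
</HEADER>
```

The colour values are written as three semicolon-separated numbers, exactly as they appear in the table; I am assuming these are red, green and blue components.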
For your component to appear on the Component Palette, you must specify one or more FAMILY elements.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
FAMILIES | Yes | COMPONENT | | FAMILIES is where you specify the families that you want your component to belong to. These are the groups under which your component will appear in the Component Palette. |
FAMILY | Yes | FAMILIES | Sample | Specify one or more families that you would like your component to belong to. These may be existing or new families, and you may also specify sub-folders within a family group. |
The following code fragment shows a typical FAMILIES.
<FAMILIES>
  <FAMILY>File/Input</FAMILY>
  <FAMILY>Sample</FAMILY>
</FAMILIES>
DOCUMENTATION allows you to specify a URL as the location of your component's documentation.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
DOCUMENTATION | Yes | COMPONENT | | DOCUMENTATION is where you specify the location of your component's documentation. |
URL | Yes | DOCUMENTATION | | Specify a URL as the location of your documentation. |
The following code fragment shows a typical DOCUMENTATION.
<DOCUMENTATION>
  <URL>http://www.talendbyexample.com/talend-tcheckpoint-component.html</URL>
</DOCUMENTATION>
CONNECTORS is where you specify the inputs and outputs for your component.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
CONNECTORS | No | COMPONENT | | Specify zero or more connectors for your component. |
CONNECTOR | Yes | CONNECTORS | | Specifies an individual connector for your component. |
@CTYPE | Yes | CONNECTOR | | CTYPE specifies the Connector Type. The permitted values do not appear to be constrained by an enumerated list. In the next table, I have analysed all of the Connector Types that are supported by the standard components. |
@MIN_INPUT | | CONNECTOR | 1 | This integer attribute specifies the minimum number of input instances of this connector that are permitted. |
@MAX_INPUT | | CONNECTOR | 1 | This integer attribute specifies the maximum number of input instances of this connector that are permitted. |
@MIN_OUTPUT | | CONNECTOR | 1 | This integer attribute specifies the minimum number of output instances of this connector that are permitted. |
@MAX_OUTPUT | | CONNECTOR | 1 | This integer attribute specifies the maximum number of output instances of this connector that are permitted. |
@BASE_SCHEMA | | CONNECTOR | | TBC |
@BUILTIN | | CONNECTOR | | TBC |
@COLOR | | CONNECTOR | | COLOR is a string attribute that allows you to specify the colour of the Connector Line. This is typically set to FF0000 (red) for REJECT CTYPE="FLOW" connectors and is used in conjunction with LINE_STYLE. |
@COMPONENT | | CONNECTOR | | TBC |
@INPUT_LINE_SELECTION | | CONNECTOR | | TBC |
@LINE_STYLE | | CONNECTOR | | LINE_STYLE is an integer attribute that allows you to specify the line style of the Connector Line. This is typically set to 2 (dotted line) for REJECT CTYPE="FLOW" connectors and is used in conjunction with COLOR. |
@MERGE_ALLOWED_DIFFERENT_SCHEMA | | CONNECTOR | | TBC |
@MULTI_SCHEMA | | CONNECTOR | | TBC |
@NAME | | CONNECTOR | | This string attribute is used for naming connections where CTYPE="FLOW". This is the only usage that I've observed with the standard components. Typical values are FILTER, INPUT, LOOKUP, MAIN, OTHER, OUTPUT, OUTPUT_MAIN, REJECT, SCHEMA_TARGET and ELTCOMBINE. |
@NOT_SHOW_IF | | CONNECTOR | | This boolean attribute allows you to specify cases where a CTYPE="FLOW" output should not be available from within the Talend Designer. I have only observed this with the standard components for NAME="REJECT", where the test NOT_SHOW_IF="(DIE_ON_ERROR == 'true')" is made, i.e. do not allow this connector to be used when the component has been configured to Die on Error. |
@SHOW_IF | | CONNECTOR | | I have not observed this attribute being used by any of the standard components. If it is supported, it would appear that it may be used in a similar manner to NOT_SHOW_IF. |
The following code fragment shows a typical CONNECTORS. This example has been taken from the tJava component.
<CONNECTORS>
  <CONNECTOR CTYPE="FLOW" MAX_INPUT="1" MAX_OUTPUT="1"/>
  <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
  <CONNECTOR CTYPE="COMPONENT_OK"/>
  <CONNECTOR CTYPE="COMPONENT_ERROR"/>
  <CONNECTOR CTYPE="RUN_IF"/>
</CONNECTORS>
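The tJava example does not use the styling attributes, so here is a sketch of how a REJECT connector might be declared, based on the attribute descriptions in the table above. It is not copied from a standard component. It assumes that the component defines a parameter named DIE_ON_ERROR, that red is written as the six-digit hex value FF0000, and that MAX_INPUT="0" means that no input of that type is permitted.

```xml
<CONNECTORS>
  <!-- Main data flow for an input-style component: no input, one output -->
  <CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>
  <!-- Reject flow: red dotted line, hidden when Die on Error is selected -->
  <CONNECTOR NAME="REJECT" CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"
             COLOR="FF0000" LINE_STYLE="2"
             NOT_SHOW_IF="(DIE_ON_ERROR == 'true')"/>
  <!-- Triggers, as in the tJava example -->
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="COMPONENT_OK"/>
  <CONNECTOR CTYPE="COMPONENT_ERROR"/>
  <CONNECTOR CTYPE="RUN_IF"/>
</CONNECTORS>
```

Both FLOW connectors share the same CTYPE; it is the NAME attribute that distinguishes the reject flow from the main flow.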
The following table describes the permitted values for the CTYPE attribute.
Value | Description |
---|---|
COMPONENT_ERROR | This value allows you to specify that a SubJob will be called in the event of COMPONENT_ERROR. Typically, no other attributes are provided for this Connector Type. Some of the standard components also specify a maximum number of inputs and/or outputs. |
COMPONENT_OK | This value allows you to specify that a SubJob will be called in the event of COMPONENT_OK. Typically, no other attributes are provided for this Connector Type. Some of the standard components also specify a maximum number of inputs and/or outputs. |
FLOW | The FLOW connector is at the heart of most component design, as it allows data to flow into and/or out of a component. For the majority of the standard components, both MAX_INPUT and MAX_OUTPUT are also specified for FLOW connectors. A common variation on this is for REJECT connectors, as described in the table above, where COLOR, for example, is modified. |
ITERATE | The ITERATE connector allows you to iterate data between components, rather than flow. For the majority of the standard components, both MAX_INPUT and MAX_OUTPUT are also specified for ITERATE connectors. I have observed little variation on this with the standard components. |
LOOKUP | TBC |
MERGE | TBC |
REFERENCE | TBC |
ROWS_END | TBC |
RUN_ERROR | TBC |
RUN_IF | This value allows you to specify that a SubJob will be called in the event of a boolean condition being met. This condition is determined by the Job designer, not by the component designer. For the majority of standard components, no other attributes are set for this connector type. A few components set MAX_INPUT and/or MAX_OUTPUT. |
RUN_OK | TBC |
SUBJOB_ERROR | TBC |
SUBJOB_OK | TBC |
TABLE | TBC |
THEN_RUN | TBC |
PARAMETERS and ADVANCED_PARAMETERS are where you specify the settings for your component. The only difference between these two repeating groups of elements is the design tab on which they appear: Basic settings or Advanced settings.
Element / Attribute | Mandatory | Parent | Sample Value | Description |
---|---|---|---|---|
PARAMETERS | Yes | COMPONENT | | PARAMETERS is where you specify the settings for your component. If you have no parameters, specify <PARAMETERS/>. |
ADVANCED_PARAMETERS | Yes | COMPONENT | | ADVANCED_PARAMETERS is where you specify the advanced settings for your component. If you have no advanced parameters, specify <ADVANCED_PARAMETERS/>. |
PARAMETER | Yes | *PARAMETERS | | PARAMETER specifies an individual setting for your component. Its parent may be either PARAMETERS or ADVANCED_PARAMETERS. |
@NAME | Yes | PARAMETER | FILE_NAME | You specify the unique name of this parameter using the string attribute NAME. This is the name that you will use to reference the parameter from within your JET source code files. You will also specify a screen label for the parameter in the message properties file, for example, FILE_NAME.NAME=File Name. |
@NUM_ROW | Yes | PARAMETER | 10 | The integer attribute NUM_ROW specifies the relative line number where the parameter will appear on the component's Settings Pane. Remember that there are two Settings Panes, Basic Settings and Advanced Settings. If you have two parameters with NUM_ROW of 1 and 2, they will occupy the first two lines of the Settings Pane. If the values are 10 and 15, they will still occupy the first two lines. The reason that many components have an incremental jump is to allow new parameters to be easily inserted. You may have multiple parameters on the same line. |
@SHOW | No | PARAMETER | | This boolean attribute determines if the parameter should be available for input. This is unconditional: the parameter is either available or not. |
@SHOW_IF | No | PARAMETER | | The SHOW_IF attribute allows you to control if the parameter should be available for input. The result of SHOW_IF must be a boolean value. If true is returned, then the parameter will be shown. If a parameter is not shown, it will still retain any value that has been set. Example: SHOW_IF="USESTREAM=='false'". |
@NOT_SHOW_IF | No | PARAMETER | | The NOT_SHOW_IF attribute performs in a similar manner to SHOW_IF, except that the test is reversed. |
@REQUIRED | No | PARAMETER | | This boolean attribute specifies if the parameter is required. If this attribute is not specified, the value defaults to false. TBC |
@GROUP | No | PARAMETER | | This string attribute allows you to group parameters. Graphically, parameters within the same GROUP appear in a highlighted panel, together with a group name. In the case of Radio Buttons, this also controls these buttons as a group, i.e. only a single button within the group may be selected. A good example of grouping is the tLoop component. This component supports both a For loop and a While loop. The loop type is determined by Radio Buttons that belong to GROUP="LOOPTYPE". |
@REPOSITORY_VALUE | No | PARAMETER | | The value for a parameter may either come from the Repository or be Built-In. This attribute specifies the repository value that the parameter maps to, for example, REPOSITORY_VALUE="FILE_PATH". It is used in conjunction with a parameter that specifies FIELD="PROPERTY_TYPE", which is described in the following table. The PROPERTY_TYPE parameter must itself specify a REPOSITORY_VALUE to identify the Metadata type. A common setting would be REPOSITORY_VALUE="DELIMITED", indicating a Delimited File. |
@CONTEXT | No | PARAMETER | | This attribute is used by some of the Custom Code components and appears to indicate the JET template file that is associated with the parameter: begin, main or end. Beyond this, I have not been able to determine the purpose of this attribute. |
DEFAULT | No | PARAMETER | | This element allows you to specify a default value for a parameter. A useful default is __COMP_DEFAULT_FILE_DIR__. |
The following code fragment shows a typical PARAMETERS and ADVANCED_PARAMETERS. This example has been taken from the tJava component.
<PARAMETERS>
  <PARAMETER NAME="CODE" FIELD="MEMO_JAVA" REQUIRED="false" NUM_ROW="2" NB_LINES="9" CONTEXT="begin">
    <DEFAULT>String foo = "bar";</DEFAULT>
  </PARAMETER>
</PARAMETERS>
<ADVANCED_PARAMETERS>
  <PARAMETER NAME="IMPORT" FIELD="MEMO_IMPORT" REQUIRED="false" NUM_ROW="1" NB_LINES="3">
    <DEFAULT>//import java.util.List;</DEFAULT>
  </PARAMETER>
</ADVANCED_PARAMETERS>
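The tJava example does not show GROUP or SHOW_IF, so the following sketch illustrates how they might be combined with NUM_ROW. It is not taken from a standard component. The parameter names FORLOOP, WHILELOOP, USESTREAM and FILE_NAME are hypothetical, and the FIELD values RADIO, CHECK and TEXT are the ones described in the FIELD table on this page. The form of the DEFAULT values, including the quoted empty string, is an assumption.

```xml
<PARAMETERS>
  <!-- Two radio buttons in one GROUP: only one of them may be selected -->
  <PARAMETER NAME="FORLOOP" FIELD="RADIO" GROUP="LOOPTYPE" NUM_ROW="1">
    <DEFAULT>true</DEFAULT>
  </PARAMETER>
  <PARAMETER NAME="WHILELOOP" FIELD="RADIO" GROUP="LOOPTYPE" NUM_ROW="2">
    <DEFAULT>false</DEFAULT>
  </PARAMETER>

  <!-- Checkbox that controls whether the next parameter is shown -->
  <PARAMETER NAME="USESTREAM" FIELD="CHECK" NUM_ROW="10">
    <DEFAULT>false</DEFAULT>
  </PARAMETER>

  <!-- Shown only while USESTREAM is false; it keeps its value when hidden -->
  <PARAMETER NAME="FILE_NAME" FIELD="TEXT" REQUIRED="true" NUM_ROW="15"
             SHOW_IF="USESTREAM=='false'">
    <DEFAULT>""</DEFAULT>
  </PARAMETER>
</PARAMETERS>
```

The gaps in the NUM_ROW values (1, 2, 10, 15) are deliberate: they do not affect the layout and leave room to insert parameters later. Each NAME would also need a label in the message properties file, for example FILE_NAME.NAME=File Name.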
The following table describes the permitted values for the FIELD attribute.
Value | Description |
---|---|
PROPERTY_TYPE | Many components allow you to have parameters specified as either coming from the Repository or as Built-In to the component itself. This is often used for parameters such as database connections, where you can define your database connection within the repository and then use this repository connection directly within your Job (I would never recommend doing this with database connections). This value is used in conjunction with the attribute REPOSITORY_VALUE. |
TEXT | This is the simplest FIELD type, allowing the parameter to store a simple string value. |
MEMO | Similar in usage to TEXT, this FIELD type provides a multi-line input. The number of lines is specified by the attribute NB_LINES, for example, NB_LINES="9". A number of styled MEMO fields are also available, for example, MEMO_JAVA. These styled memo fields provide features such as syntax highlighting and auto-completion, if available. |
MEMO_IMPORT | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Java import code. |
MEMO_JAVA | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Java. |
MEMO_SQL | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for SQL. |
MEMO_MESSAGE | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Email messages. This field is only used by the standard component tSendMail. |
RADIO | Radio buttons provide a boolean value and are used within a group. It makes no sense to have a single radio button. If you need to have multiple groups of radio buttons, then you can group these by using the attribute PARAMETER.GROUP. For more information on this attribute, refer to GROUP in the table above. If you do not need to group your boolean values, use a checkbox (CHECK), as detailed below. |
CHECK | A checkbox provides a boolean value. Unlike radio buttons (RADIO), shown above, checkboxes do not need to be used as part of a group and simply toggle a boolean value on and off. |
LABEL | This string field allows you to specify a label that should be displayed. Usually, this is to provide a help message for another parameter. The message is specified in the DEFAULT element. An example of this can be seen with the tFileInputDelimited component, where some text is displayed prior to the File name/Stream parameter. |
COLOR | This field provides a colour picker. You can specify a default colour, for example <DEFAULT>"FF0000"</DEFAULT>. |
DATE | This field provides a date picker. You can specify a default date, for example <DEFAULT>"1970-01-01 00:00:00"</DEFAULT>. |
AS400_CHECK | TBC |
CLOSED_LIST | TBC |
CODE | TBC |
COLUMN_LIST | TBC |
COMMAND | TBC |
COMPONENT_LIST | TBC |
CONNECTION_LIST | TBC |
CONTEXT_PARAM_NAME_LIST | TBC |
DBTABLE | TBC |
DIRECTORY | TBC |
ENCODING_TYPE | TBC |
EXTERNAL | TBC |
FIELD | TBC |
FILE | TBC |
FLOAT | TBC |
GUESS_SCHEMA | TBC |
HADOOP_JARS_DIALOG | TBC |
IMAGE | TBC |
INTEGER | TBC |
LOOKUP_COLUMN_LIST | TBC |
MAPPING_TYPE | TBC |
MODULE_LIST | TBC |
OPENED_LIST | TBC |
PASSWORD | TBC |
PREV_COLUMN_LIST | TBC |
PROCESS_TYPE | TBC |
QUERYSTORE_TYPE | TBC |
SCHEMA_TYPE | TBC |
SCHEMA_XPATH_QUERYS | TBC |
STRING | TBC |
String | TBC |
TABLE | TBC |
TDPID | TBC |
TNS_EDITOR | TBC |
WSDL2JAVA | TBC |
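To make a few of the FIELD values above concrete, the following sketch declares one parameter for each of LABEL, MEMO, COLOR and DATE. The parameter names (NOTE, COMMENT, HIGHLIGHT and START_DATE) are hypothetical, as is the label text. The COLOR and DATE defaults are the ones quoted in the table, and the unquoted form of the LABEL default is an assumption.

```xml
<PARAMETERS>
  <!-- Help text displayed on the Settings Pane; the message lives in DEFAULT -->
  <PARAMETER NAME="NOTE" FIELD="LABEL" NUM_ROW="1">
    <DEFAULT>Enter the details of the file to be read.</DEFAULT>
  </PARAMETER>

  <!-- Multi-line input, five lines high -->
  <PARAMETER NAME="COMMENT" FIELD="MEMO" NB_LINES="5" NUM_ROW="5"/>

  <!-- Colour picker, defaulting to red -->
  <PARAMETER NAME="HIGHLIGHT" FIELD="COLOR" NUM_ROW="10">
    <DEFAULT>"FF0000"</DEFAULT>
  </PARAMETER>

  <!-- Date picker with a default date -->
  <PARAMETER NAME="START_DATE" FIELD="DATE" NUM_ROW="15">
    <DEFAULT>"1970-01-01 00:00:00"</DEFAULT>
  </PARAMETER>
</PARAMETERS>
```

The same PARAMETER elements could equally be placed inside ADVANCED_PARAMETERS, in which case they would appear on the Advanced settings tab instead.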