Expert Consultancy from Yellow Pelican

Component.xsd

A site about Talend

Talend Component.xsd Descriptor File

If you're designing your own Talend component, the first file that you'll need to deal with is the Component Descriptor file, for example, tMyComponent_java.xml. This is the file that describes your component, including inputs, outputs and settings.

Your Component Descriptor file must conform to the XSD file Component.xsd. This file is located under the Talend installation directory. In my case (Mac OS X with Talend Open Studio 5.5.0), the file can be found at /Applications/TOS_DI-r117820-V5.5.0/configuration/org.eclipse.osgi/bundles/1835/1/.cp/model/Component.xsd.

Component.xsd is used internally by Talend and is also a useful reference for understanding the structure of your XML file and for knowing which elements and attributes are available and which are mandatory.

Unfortunately, this XSD is light on documentation and, despite its importance, little information about it is available. This document helps to fill that gap by looking at each element and attribute of this file and providing some helpful documentation. This documentation is by no means complete (yet); however, it starts to build a useful reference, beginning with the more common elements and attributes of this file.

XML Version and Encoding

You would usually expect the first line of an XML file to specify the XML version and the encoding. The source code for most of the standard components does not have this line. When you create a new component using Component Designer, the line shown below will be added. Although this line does not appear to be mandatory, it seems sensible always to include it.

<?xml version="1.0" encoding="UTF-8"?>

Comments

You can comment your XML in the following way.

<!-- This is a comment -->

Elements and Attributes

The following tables describe the elements and attributes that are defined in Component.xsd. For clarity, attribute names are prefixed with @. Where a value is described as the default value, this is the value that is assigned when you create a new component using the Component Designer.

Under the column Mandatory, I have also marked as Yes those elements and attributes that are not strictly mandatory but that there is no reason to exclude; you should simply include them in all of your components.

COMPONENT Elements and Attributes

Element / Attribute | Mandatory | Parent | Sample Value | Description
COMPONENT | Yes | | | COMPONENT is the root element.
HEADER | Yes | COMPONENT | | HEADER is where you specify a number of attributes and elements to describe your component.
FAMILIES | Yes | COMPONENT | | FAMILIES is where you specify the families that you want your component to belong to. This specifies the groups that your component will appear in, within the Component Palette.
DOCUMENTATION | Yes | COMPONENT | | DOCUMENTATION allows you to specify a URL as the location of your component's documentation. If you have no documentation, specify URL as <URL/>.
CONNECTORS | Yes | COMPONENT | | CONNECTORS is where you specify the inputs and outputs for your component. If you have no connectors, specify <CONNECTORS/>.
PARAMETERS | Yes | COMPONENT | | PARAMETERS is where you specify the settings for your component. If you have no parameters, specify <PARAMETERS/>.
ADVANCED_PARAMETERS | Yes | COMPONENT | | ADVANCED_PARAMETERS is where you specify the advanced settings for your component. These are shown on a separate tab from PARAMETERS. If you have no advanced parameters, specify <ADVANCED_PARAMETERS/>.
CODEGENERATION | Yes | COMPONENT | | TBC. If you have no code generation, specify <CODEGENERATION/>.
RETURNS | Yes | COMPONENT | | RETURNS is where you specify your component's return values. If you have no return values, specify <RETURNS/>.
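
The following code-fragment is a minimal sketch of how these top-level elements might fit together in a descriptor with no connectors, parameters, code generation or return values. The element order simply follows the table above, which is assumed (not verified here) to match the sequence that Component.xsd expects. The HEADER values and the family name Sample are illustrative only.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element order assumed to follow the table above -->
<COMPONENT>
	<HEADER
		AUTHOR="Talend"
		COMPATIBILITY="ALL"
		PLATEFORM="ALL"
		RELEASE_DATE="20140531A"
		SERIAL=""
		STARTABLE="true"
		STATUS="ALPHA"
		VERSION="0.1">

		<SIGNATURE/>
	</HEADER>

	<FAMILIES>
		<!-- Illustrative family name -->
		<FAMILY>Sample</FAMILY>
	</FAMILIES>

	<!-- No documentation yet, so URL is left empty -->
	<DOCUMENTATION>
		<URL/>
	</DOCUMENTATION>

	<!-- Empty placeholders for the remaining elements -->
	<CONNECTORS/>
	<PARAMETERS/>
	<ADVANCED_PARAMETERS/>
	<CODEGENERATION/>
	<RETURNS/>
</COMPONENT>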

HEADER Elements and Attributes

I have split the HEADER elements and attributes into two tables: those which are mandatory and those which are optional.

Element / Attribute | Mandatory | Parent | Sample Value | Description
HEADER | Yes | COMPONENT | | HEADER is where you specify a number of attributes and elements to describe your component.
@AUTHOR | Yes | HEADER | Talend | This value specifies the component's Author Name. The default value is Talend; however, you may enter your own name.
@COMPATIBILITY | Yes | HEADER | ALL | This appears to indicate the Operating System that this component is supported on. Analysis of the standard components shows that the standard values are ALL and WIN32. A few components have a value of WIN32, for example, tMap; however, it is clear that this component is not restricted to WIN32. My feeling is that this value has been deprecated. It is also unclear what the relationship is to PLATEFORM. Setting this to a value of ALL appears a fair bet.
@PLATEFORM | Yes | HEADER | ALL | I suspect that PLATEFORM got lost in translation, somehow, and that it should actually have read PLATFORM. This appears to indicate the Operating System that this component is supported on. Analysis of the standard components shows that the standard values are ALL and WINDOWS. The only components that do not have the value of ALL are a series of tSPSS* components. Also see comments for COMPATIBILITY, above. Setting this to a value of ALL appears a reasonable choice.
@RELEASE_DATE | Yes | HEADER | 20140531A | This value specifies the release date of your component, in the format YYYYMMDD, followed by a single uppercase letter. The default is the oddly chosen 20080229A. Analysis of the standard components shows that the single characters used are A and P. Although it is not documented, I suspect that these values represent the state of the component - Alpha and Production. Also see comments for STATUS. I can also see that there are invalid values for many of the standard components, so I suspect there is no validation of this value. All but one of the standard components use a value of A, so I see no reason to use any other value for my own components, i.e. I stick to a value pattern of YYYYMMDDA and I simply enter the date that I created my component.
@SERIAL | Yes | HEADER | "" | This attribute appears to be included in all components; however, there is no indication as to its purpose and analysis of the standard components shows that it never receives a value. I always set it to SERIAL="".
@STARTABLE | Yes | HEADER | true | This is a boolean attribute that indicates if the component is permitted to start a Subjob. The permitted values are true or false. You should be able to determine, from your component's purpose, what the appropriate value should be.
@STATUS | Yes | HEADER | ALPHA | This value indicates the status of your component. Analysis of the standard components shows that they use the values ALPHA and BETA (I have seen community-developed components on the Talend Exchange using PROD). Although the purpose of this attribute appears clear, the vast majority of the standard components have STATUS="ALPHA" and I see no value in doing anything different with my own components. Also see comments for RELEASE_DATE, above.
@VERSION | Yes | HEADER | 0.1 | This value specifies the component's Version Number. The default value is an unexpected 0.102. Typical Talend versioning tends to start with 0.1 and tick up by minor numbers. There are no restrictions on version numbers, so you may use a scheme of your choice. Just remember to tick the number up each time you release a new version of your component.
SIGNATURE | Yes | HEADER | | This element appears to be included in all components; however, there is no indication as to its purpose and analysis of the standard components shows that it never receives a value. I always set it to <SIGNATURE/>.

The following code-fragment shows a typical HEADER.

<HEADER 
	AUTHOR="Talend"
	COMPATIBILITY="ALL" 
	PLATEFORM="ALL" 
	RELEASE_DATE="20140531A" 
	SERIAL=""
	STARTABLE="true" 
	STATUS="ALPHA" 
	VERSION="0.1">

	<SIGNATURE/>
</HEADER>

Element / Attribute | Mandatory | Parent | Sample Value | Description
@DATA_AUTO_PROPAGATE | No | HEADER | false | I've noticed that, with later versions of Talend, Component Designer adds the attribute DATA_AUTO_PROPAGATE="false". Some but not all of the standard components set DATA_AUTO_PROPAGATE, with most setting the value to false.
@COMBINED | No | HEADER | | This attribute is only used by the 4 components in the ELT/Combined SQL family. This appears to be a specialised attribute and is, therefore, outside the scope of this documentation.
@SCHEMA_AUTO_PROPAGATE | No | HEADER | | TBC
@EXTENSION | No | HEADER | | TBC
@HAS_CONDITIONAL_OUTPUTS | No | HEADER | | This attribute specifies if the component has conditional outputs. TBC
@HASH_COMPONENT | No | HEADER | | This attribute is only used by a number of tHash* components. This appears to be a specialised attribute and is, therefore, outside the scope of this documentation.
@IS_MULTIPLYING_OUTPUTS | No | HEADER | | TBC
@MAIN_CODE_CALLED | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation.
@NUMBER_PARALELLIZE | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation.
@SHORT_NAME | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation.
@SINGLETON | No | HEADER | | This boolean attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a true value. I believe that this is to indicate that no other component may belong to the Subjob.
@SUBJOB_COLOR | No | HEADER | | This string attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a value of 255;220;180. This attribute allows you to change the default background colour of the Subjob. Best practice appears to be that the colour is only changed for SINGLETON components.
@SUBJOB_TITLE_COLOR | No | HEADER | | This string attribute appears to have been introduced for the tPreJob and tPostJob components, as these are the only components that set this attribute, with a value of 230;100;080. This attribute allows you to change the default title bar colour of the Subjob. Best practice appears to be that the colour is only changed for SINGLETON components.
@TECHNICAL | No | HEADER | | The purpose of this boolean attribute is not clear. It appears to be reserved, with a true value, for a limited number of components that appear not to be for publication to the Component Palette.
@TSTATCATCHER_STATS | No | HEADER | | This attribute does not appear to be used by any of the standard components and is, therefore, outside the scope of this documentation.
@VISIBLE | No | HEADER | | The purpose of this boolean attribute is not clear. It appears to be reserved, with a false value, for a limited number of components.
@PARTITIONING | No | HEADER | | This attribute appears to have been introduced in more recent versions of Talend. It certainly exists for most, if not all, of the standard components in Talend Open Studio 5.5.0. Common values are NONE and AUTO. I have not yet discovered its purpose, but I believe it's related to parallel execution.
FORMAT | No | HEADER | | This element is not used in any of the standard components and does not appear to be in use.
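
The following code-fragment is a sketch of a HEADER that adds some of these optional attributes to the mandatory ones. It combines the SINGLETON and Subjob colour values that the table reports for tPreJob and tPostJob with DATA_AUTO_PROPAGATE="false". It is not copied from either component, and the remaining values are illustrative only.

<!-- Sketch only: optional attribute values taken from the table above -->
<HEADER
	AUTHOR="Talend"
	COMPATIBILITY="ALL"
	PLATEFORM="ALL"
	RELEASE_DATE="20140531A"
	SERIAL=""
	STARTABLE="true"
	STATUS="ALPHA"
	VERSION="0.1"
	DATA_AUTO_PROPAGATE="false"
	SINGLETON="true"
	SUBJOB_COLOR="255;220;180"
	SUBJOB_TITLE_COLOR="230;100;080">

	<SIGNATURE/>
</HEADER>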

FAMILIES Elements and Attributes

For your component to appear on the Component Palette, you must specify one or more FAMILY elements.

Element / Attribute | Mandatory | Parent | Sample Value | Description
FAMILIES | Yes | COMPONENT | | FAMILIES is where you specify the families that you want your component to belong to. This specifies the groups that your component will appear in, within the Component Palette.
FAMILY | Yes | FAMILIES | Sample | Specify one or more families that you would like your component to belong to. These may be existing or new families and you may also specify sub-folders within a family group.

The following code-fragment shows a typical FAMILIES.

<FAMILIES>
	<FAMILY>File/Input</FAMILY>
	<FAMILY>Sample</FAMILY>
</FAMILIES>

DOCUMENTATION Elements and Attributes

DOCUMENTATION allows you to specify a URL, as the location of your component's documentation.

Element / Attribute | Mandatory | Parent | Sample Value | Description
DOCUMENTATION | Yes | COMPONENT | | DOCUMENTATION is where you specify the location of your component's documentation.
URL | Yes | DOCUMENTATION | | Specify a URL as the location of your documentation.

The following code-fragment shows a typical DOCUMENTATION.

<DOCUMENTATION>
	<URL>http://www.talendbyexample.com/talend-tcheckpoint-component.html</URL>
</DOCUMENTATION>

CONNECTORS Elements and Attributes

CONNECTORS are where you specify the inputs and outputs for your component.

Element / Attribute | Mandatory | Parent | Sample Value | Description
CONNECTORS | No | COMPONENT | | Specify zero or more connectors for your component.
CONNECTOR | Yes | CONNECTORS | | Specifies an individual connector for your component.
@CTYPE | Yes | CONNECTOR | | CTYPE specifies the Connector Type. The permitted values do not appear to be constrained by an enumerated list. In the next table, I have analysed all of the Connector Types that are supported by the standard components.
@MIN_INPUT | | CONNECTOR | 1 | This integer attribute specifies the minimum number of input instances of this connector that are permitted.
@MAX_INPUT | | CONNECTOR | 1 | This integer attribute specifies the maximum number of input instances of this connector that are permitted.
@MIN_OUTPUT | | CONNECTOR | 1 | This integer attribute specifies the minimum number of output instances of this connector that are permitted.
@MAX_OUTPUT | | CONNECTOR | 1 | This integer attribute specifies the maximum number of output instances of this connector that are permitted.
@BASE_SCHEMA | | CONNECTOR | | TBC
@BUILTIN | | CONNECTOR | | TBC
@COLOR | | CONNECTOR | | COLOR is a string attribute that allows you to specify the colour of the Connector Line. This is, typically, set to FF0000 (red) for REJECT CTYPE="FLOW" connectors and is used in conjunction with LINE_STYLE.
@COMPONENT | | CONNECTOR | | TBC
@INPUT_LINE_SELECTION | | CONNECTOR | | TBC
@LINE_STYLE | | CONNECTOR | | LINE_STYLE is an integer attribute that allows you to specify the line style of the Connector Line. This is, typically, set to 2 (dotted line) for REJECT CTYPE="FLOW" connectors and is used in conjunction with COLOR.
@MERGE_ALLOWED_DIFFERENT_SCHEMA | | CONNECTOR | | TBC
@MULTI_SCHEMA | | CONNECTOR | | TBC
@NAME | | CONNECTOR | | This string attribute is used for naming connections where CTYPE="FLOW". This is the only usage that I've observed with the standard components. Typical values are FILTER, INPUT, LOOKUP, MAIN, OTHER, OUTPUT, OUTPUT_MAIN, REJECT, SCHEMA_TARGET, ELTCOMBINE.
@NOT_SHOW_IF | | CONNECTOR | | This attribute allows you to specify cases where a CTYPE="FLOW" output should not be available from within the Talend Designer. I have only observed this with the standard components for NAME="REJECT", where the test NOT_SHOW_IF="(DIE_ON_ERROR == 'true')" is made, i.e. do not allow this connector to be used when the component has been configured to Die on Error.
@SHOW_IF | | CONNECTOR | | I have not observed this attribute being used by any of the standard components. If it is supported, it would appear that it may be used in a similar manner to NOT_SHOW_IF.

The following code-fragment shows a typical CONNECTORS. This example has been taken from the tJava component.


<CONNECTORS>
	<CONNECTOR CTYPE="FLOW" MAX_INPUT="1" MAX_OUTPUT="1"/>
	<CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1"/>
	<CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
	<CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
	<CONNECTOR CTYPE="COMPONENT_OK"/>
	<CONNECTOR CTYPE="COMPONENT_ERROR"/>
	<CONNECTOR CTYPE="RUN_IF"/>
</CONNECTORS>
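
The following code-fragment is a sketch of a REJECT connector alongside a main FLOW connector, using the COLOR, LINE_STYLE and NOT_SHOW_IF attributes described in the table above. It is not copied from a standard component. It assumes that red is written as the six-digit value FF0000, and that the component also defines a parameter named DIE_ON_ERROR. The MAX_INPUT and MAX_OUTPUT values are illustrative.

<!-- Sketch only: assumes a DIE_ON_ERROR parameter exists on the component -->
<CONNECTORS>
	<CONNECTOR CTYPE="FLOW" MAX_INPUT="1" MAX_OUTPUT="1"/>
	<!-- Red, dotted reject line; hidden when Die on Error is selected -->
	<CONNECTOR NAME="REJECT" CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"
		COLOR="FF0000" LINE_STYLE="2"
		NOT_SHOW_IF="(DIE_ON_ERROR == 'true')"/>
</CONNECTORS>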

CONNECTORS Elements and Attributes - CONNECTOR CTYPE Values

The following table describes the permitted values for the CTYPE attribute.

Value | Description
COMPONENT_ERROR | This value allows you to specify that a SubJob will be called in the event of COMPONENT_ERROR. Typically, no other attributes are provided for this Connector Type. Some of the standard components also specify a maximum number of inputs and/or outputs.
COMPONENT_OK | This value allows you to specify that a SubJob will be called in the event of COMPONENT_OK. Typically, no other attributes are provided for this Connector Type. Some of the standard components also specify a maximum number of inputs and/or outputs.
FLOW | The FLOW connector is at the heart of most component design, as it allows data to flow into and/or out of a component. For the majority of the standard components, both MAX_INPUT and MAX_OUTPUT are also specified for FLOW connectors. A common variation on this is for REJECT connectors, as described in the table above, where COLOR, for example, is modified.
ITERATE | The ITERATE connector allows you to iterate data between components, rather than flow. For the majority of the standard components, both MAX_INPUT and MAX_OUTPUT are also specified for ITERATE connectors. I have observed little variation on this with the standard components.
LOOKUP | TBC
MERGE | TBC
REFERENCE | TBC
ROWS_END | TBC
RUN_ERROR | TBC
RUN_IF | This value allows you to specify that a SubJob will be called in the event of a boolean condition being met. This condition is determined by the Job designer, not by the component designer. For the majority of standard components, no other attributes are set for this connector type. A few components set MAX_INPUT and/or MAX_OUTPUT.
RUN_OK | TBC
SUBJOB_ERROR | TBC
SUBJOB_OK | TBC
TABLE | TBC
THEN_RUN | TBC

PARAMETERS and ADVANCED_PARAMETERS Elements and Attributes

PARAMETERS and ADVANCED_PARAMETERS are where you specify the settings for your component. The only difference between these two repeating groups of elements is which design tab they appear on: Basic settings or Advanced settings.

Element / Attribute | Mandatory | Parent | Sample Value | Description
PARAMETERS | Yes | COMPONENT | | PARAMETERS is where you specify the settings for your component. If you have no parameters, specify <PARAMETERS/>.
ADVANCED_PARAMETERS | Yes | COMPONENT | | ADVANCED_PARAMETERS is where you specify the advanced settings for your component. If you have no advanced parameters, specify <ADVANCED_PARAMETERS/>.
PARAMETER | Yes* | PARAMETERS, ADVANCED_PARAMETERS | | PARAMETER specifies an individual setting for your component, within either PARAMETERS or ADVANCED_PARAMETERS.
@NAME | Yes | PARAMETER | FILE_NAME | You specify the unique name of this parameter using the string attribute NAME. This is the name that you will use to reference the parameter from within your JET source code files. You will also specify a screen label for the parameter in the message properties file, for example, FILE_NAME.NAME=File Name.
@NUM_ROW | Yes | PARAMETER | 10 | The integer attribute NUM_ROW specifies the relative line number where the parameter will appear on the component's Settings Pane. Remember that there are two setting panes, Basic Settings and Advanced Settings. If you have two parameters with NUM_ROW of 1 and 2, they will occupy the first two lines of the Settings Pane. If the values are 10 and 15, they will still occupy the first two lines. The reason that many components have an incremental jump is to allow new parameters to be easily inserted. You may have multiple parameters on the same line.
@SHOW | No | PARAMETER | | This boolean attribute determines if the parameter should be available for input. This is unconditional. The parameter is either available, or not.
@SHOW_IF | No | PARAMETER | | The SHOW_IF attribute allows you to control if the parameter should be available for input. The result of SHOW_IF must be a boolean value. If true is returned, then the parameter will be shown. If a parameter is not shown, it will still retain any value that has been set. Example - SHOW_IF="USESTREAM=='false'".
@NOT_SHOW_IF | No | PARAMETER | | The NOT_SHOW_IF attribute performs in a similar manner to SHOW_IF, except that the test is reversed.
@REQUIRED | No | PARAMETER | | This boolean attribute specifies whether the parameter is required. If this attribute is not specified, the value defaults to false. TBC
@GROUP | No | PARAMETER | | This string attribute allows you to group parameters. Graphically, parameters within the same GROUP appear in a highlighted panel, together with a group name. In the case of Radio Buttons, this also controls these buttons as a group, i.e. only a single button within the group may be selected. A good example of grouping is the tLoop component. This component supports both a For loop and a While loop. The loop type is determined by Radio Buttons that belong to GROUP="LOOPTYPE".
@REPOSITORY_VALUE | No | PARAMETER | | The value for this parameter may come from either the Repository or it may be Built-In. This attribute specifies a repository value, for example, REPOSITORY_VALUE="FILE_PATH". This attribute is used in conjunction with a parameter that specifies FIELD="PROPERTY_TYPE", which is described in the following table. The PROPERTY_TYPE parameter must also specify a REPOSITORY_VALUE to identify the Metadata type. A common setting would be REPOSITORY_VALUE="DELIMITED", indicating a Delimited File.
@CONTEXT | No | PARAMETER | | This attribute is used by some of the Custom Code components and appears to indicate the JET template file that is associated with the parameter: begin, main or end. Beyond this, I have not been able to determine the purpose of this attribute.
DEFAULT | No | PARAMETER | | This element allows you to specify a default value for a parameter. A useful default is __COMP_DEFAULT_FILE_DIR__.

The following code-fragment shows a typical PARAMETERS and ADVANCED_PARAMETERS. This example has been taken from the tJava component.

<PARAMETERS>
	<PARAMETER NAME="CODE" FIELD="MEMO_JAVA" REQUIRED="false" NUM_ROW="2" NB_LINES="9" CONTEXT="begin">
		<DEFAULT>String foo = "bar";</DEFAULT>
	</PARAMETER>
</PARAMETERS>

<ADVANCED_PARAMETERS>
	<PARAMETER NAME="IMPORT" FIELD="MEMO_IMPORT" REQUIRED="false" NUM_ROW="1" NB_LINES="3">
		<DEFAULT>//import java.util.List;</DEFAULT>
	</PARAMETER>
</ADVANCED_PARAMETERS>
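
The following code-fragment is a sketch of how NUM_ROW, SHOW_IF and NOT_SHOW_IF might be combined. The parameter names USE_HEADER, HEADER_TEXT and NO_HEADER_NOTE are hypothetical, and the condition syntax simply mirrors the SHOW_IF="USESTREAM=='false'" example in the table above. The use of true and false as the DEFAULT value for a CHECK field is also an assumption.

<!-- Sketch only: all parameter names here are hypothetical -->
<PARAMETERS>
	<!-- Checkbox on the first line of Basic settings -->
	<PARAMETER NAME="USE_HEADER" FIELD="CHECK" REQUIRED="false" NUM_ROW="10">
		<DEFAULT>false</DEFAULT>
	</PARAMETER>
	<!-- Shown on the second line only when the checkbox is selected -->
	<PARAMETER NAME="HEADER_TEXT" FIELD="TEXT" REQUIRED="false" NUM_ROW="15"
		SHOW_IF="USE_HEADER=='true'"/>
	<!-- Reversed test: hidden when the checkbox is selected -->
	<PARAMETER NAME="NO_HEADER_NOTE" FIELD="TEXT" REQUIRED="false" NUM_ROW="15"
		NOT_SHOW_IF="USE_HEADER=='true'"/>
</PARAMETERS>

Because the NUM_ROW values are 10 and 15 rather than 1 and 2, a further parameter could later be inserted between them without renumbering.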

PARAMETERS and ADVANCED_PARAMETERS Elements and Attributes - PARAMETER FIELD Values

The following table describes the permitted values for the FIELD attribute.

Value | Description
PROPERTY_TYPE | Many components allow you to have parameters specified as either coming from the Repository, or as Built-In to the component itself. This is often used for parameters such as database connections, where you can define your database connection within the repository and then use this repository connection directly within your Job (I would never recommend doing this with database connections). This value is used in conjunction with the attribute REPOSITORY_VALUE.
TEXT | This is the simplest FIELD type, allowing the parameter to store a simple string value.
MEMO | Similar in usage to TEXT, this FIELD type provides a multi-line input. The number of lines is specified by the attribute NB_LINES, for example, NB_LINES="9". A number of styled MEMO fields are also available, for example, MEMO_JAVA. These styled memo fields provide features such as syntax-highlighting and auto-completion, if available.
MEMO_IMPORT | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Java import code.
MEMO_JAVA | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Java.
MEMO_SQL | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for SQL.
MEMO_MESSAGE | Similar in usage to MEMO, this FIELD type provides a multi-line input that is styled for Email messages. This field is only used by the standard component tSendMail.
RADIO | Radio buttons provide a boolean value and are used within a group. It makes no sense to have a single radio button. If you need to have multiple groups of radio buttons, then you can group these by using the attribute PARAMETER.GROUP. For more information on this attribute, refer to GROUP in the table above. If you do not need to group your boolean values, use a checkbox (CHECK), as detailed below.
CHECK | Checkbox provides a boolean value. Unlike radio buttons (RADIO), shown above, checkboxes do not need to be used as part of a group and simply toggle a boolean value on and off.
LABEL | This string field allows you to specify a label that should be displayed. Usually, this is to provide a help message for another parameter. The message is specified in the DEFAULT element. An example of this can be seen with the tFileInputDelimited component, where some text is displayed prior to the File name/Stream parameter.
COLOR | This field provides a color picker. You can specify a default color, for example <DEFAULT>"FF0000"</DEFAULT>.
DATE | This field provides a date picker. You can specify a default date, for example <DEFAULT>"1970-01-01 00:00:00"</DEFAULT>.
AS400_CHECK | TBC
CLOSED_LIST | TBC
CODE | TBC
COLUMN_LIST | TBC
COMMAND | TBC
COMPONENT_LIST | TBC
CONNECTION_LIST | TBC
CONTEXT_PARAM_NAME_LIST | TBC
DBTABLE | TBC
DIRECTORY | TBC
ENCODING_TYPE | TBC
EXTERNAL | TBC
FIELD | TBC
FILE | TBC
FLOAT | TBC
GUESS_SCHEMA | TBC
HADOOP_JARS_DIALOG | TBC
IMAGE | TBC
INTEGER | TBC
LOOKUP_COLUMN_LIST | TBC
MAPPING_TYPE | TBC
MODULE_LIST | TBC
OPENED_LIST | TBC
PASSWORD | TBC
PREV_COLUMN_LIST | TBC
PROCESS_TYPE | TBC
QUERYSTORE_TYPE | TBC
SCHEMA_TYPE | TBC
SCHEMA_XPATH_QUERYS | TBC
STRING | TBC
String | TBC
TABLE | TBC
TDPID | TBC
TNS_EDITOR | TBC
WSDL2JAVA | TBC
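
The following code-fragment is a sketch of a pair of RADIO fields tied together by GROUP, followed by a COLOR field with a default. The parameter names and the group name WRITE_MODE are hypothetical. The use of true and false as DEFAULT values for RADIO fields is an assumption, and the colour default follows the example given in the table above.

<!-- Sketch only: parameter and group names are hypothetical -->
<PARAMETERS>
	<!-- Two radio buttons in one group: only one may be selected -->
	<PARAMETER NAME="MODE_APPEND" FIELD="RADIO" REQUIRED="false" NUM_ROW="10" GROUP="WRITE_MODE">
		<DEFAULT>true</DEFAULT>
	</PARAMETER>
	<PARAMETER NAME="MODE_OVERWRITE" FIELD="RADIO" REQUIRED="false" NUM_ROW="10" GROUP="WRITE_MODE">
		<DEFAULT>false</DEFAULT>
	</PARAMETER>
	<!-- Colour picker with a red default -->
	<PARAMETER NAME="HIGHLIGHT_COLOR" FIELD="COLOR" REQUIRED="false" NUM_ROW="20">
		<DEFAULT>"FF0000"</DEFAULT>
	</PARAMETER>
</PARAMETERS>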





© www.TalendByExample.com