Talend provides the tRowGenerator component for generating data. This component allows you to specify the number of rows to generate, define a Schema, and then assign values to the columns that have been defined. Usually, random values are assigned using the methods provided by Repository->Code->Routines->System->TalendDataGeneration; however, you may assign data using Routines of your choice.
My preference is to use tRowGenerator for row generation and, perhaps, the assignment of a Primary Key; but to map the remainder of my data using one or more tMap components. This offers maximum flexibility.
The methods provided by Talend's TalendDataGeneration Routines will give you a good start to your data generation needs; however, you may find them limiting. As part of this tutorial, I have built a set of Routines, TBEDataGeneration, which offer a greater breadth of data (for Address and Person). I will add to these, from time to time. These routines have a UK slant; however, you may modify these to suit your own needs.
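As a rough illustration of what a custom data-generation Routine can look like, here is a minimal sketch. The class name ExampleDataGeneration, its methods, and the sample town list are all hypothetical and are not taken from TBEDataGeneration; the sketch only assumes that a Talend Routine is an ordinary Java class exposing public static methods.

```java
import java.util.Random;

// Hypothetical Routine, for illustration only. When created in Talend Studio under
// Repository->Code->Routines, the class would also need "package routines;" at the top.
final class ExampleDataGeneration {

    // Illustrative sample values with a UK slant (an assumption, not the TBE data set).
    private static final String[] TOWNS = {"London", "Manchester", "Leeds", "Bristol", "Glasgow"};

    private static final Random RANDOM = new Random();

    private ExampleDataGeneration() {
        // Static methods only; no instances needed.
    }

    /** Returns one pseudo-random town name from the sample list. */
    public static String getTown() {
        return TOWNS[RANDOM.nextInt(TOWNS.length)];
    }

    /** Returns a pseudo-random house number between 1 and max, inclusive. */
    public static int getHouseNumber(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be at least 1");
        }
        return RANDOM.nextInt(max) + 1;
    }
}
```

A Routine like this could then be called from a tRowGenerator or tMap expression, for example ExampleDataGeneration.getTown().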
You may download LibTBEDataGenerator and import it into a Talend Project of your choice. This tutorial supports Talend 5.2.2 and above.
LibTBEAddressGenerator is a reusable Job that can be added to any of your task-specific Jobs to create pseudo-random address data.
LibTBEPersonGenerator is a reusable Job that can be added to any of your task-specific Jobs to create pseudo-random person data. An example Job, PersonGeneratorExample, is included as part of this download and shows how to use LibTBEPersonGenerator.
LibTBEPersonGenerator makes use of the included routine TBEDataGenerator, for generating the data items shown below.
The following screenshot shows sample data that has been generated using LibTBEPersonGenerator.
As can be seen from the screenshot below, you can add LibTBEPersonGenerator to any Job using the tRunJob component. Hit the Copy Child Job Schema button and connect to the component of your choice.
The numberOfRows parameter allows you to specify how many rows should be generated.
In this example, output is displayed to the console; however, you may connect this component to any other component that can receive data, for example to write to an output file or to a database. If you are writing to the console, keep numberOfRows low, as writing larger quantities of data to the console may make the Talend Designer unresponsive.
This tutorial is not intended to be a lesson on password management and encryption. The techniques shown here are not intended to suggest how these should be managed within a production environment.
Derivations of the word Monkey have been used for generating passwords. Monkey was chosen because, following hacks on large websites, it has been found to be one of the most common passwords; so, if you're thinking it looks familiar, now may be the time to change your own passwords. For more on passwords, read Schneier on Security.
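The tutorial does not say how the derivations are produced, so the following is only a sketch of one possible approach: random case changes, a couple of common character substitutions, and a two-digit suffix. The class PasswordDerivation and its derive method are hypothetical names introduced for this example.

```java
import java.util.Random;

// Hypothetical helper; the substitution rules are an assumption, not the tutorial's own.
final class PasswordDerivation {

    private static final String BASE = "Monkey";

    private PasswordDerivation() {
    }

    /** Builds one variant of the base word, e.g. by changing case or swapping o->0 and e->3. */
    static String derive(Random random) {
        StringBuilder sb = new StringBuilder();
        for (char c : BASE.toCharArray()) {
            // Randomly choose upper or lower case for each letter.
            char out = random.nextBoolean() ? Character.toUpperCase(c) : Character.toLowerCase(c);
            // Randomly apply a digit substitution where one is defined.
            if (random.nextBoolean()) {
                switch (Character.toLowerCase(c)) {
                    case 'o': out = '0'; break;
                    case 'e': out = '3'; break;
                    default: break;
                }
            }
            sb.append(out);
        }
        // Append a zero-padded two-digit suffix (00 to 99).
        sb.append(String.format("%02d", random.nextInt(100)));
        return sb.toString();
    }
}
```

Passing in the Random instance keeps the method easy to seed, which is handy when you want repeatable test data.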
The password Salt is a unique string that has been generated for each row. It is appended to the password before Hashing. This ensures that two identical passwords do not hash to the same value, which protects against attacks that rely on precomputed hashes, such as Rainbow Tables.
The hashed password shows the result of hashing the password and salt. This has been hashed using SHA-256.
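The salting and hashing steps described above can be sketched with the standard Java security classes. This is an illustration under stated assumptions rather than the code used by TBEDataGenerator: the class SaltedHashExample, the hex encoding of the salt and hash, the salt length, and the UTF-8 encoding are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper class, for illustration only.
final class SaltedHashExample {

    private static final SecureRandom SECURE_RANDOM = new SecureRandom();

    private SaltedHashExample() {
    }

    /** Generates a random salt of numBytes bytes, returned as a hex string (encoding is an assumption). */
    static String newSalt(int numBytes) {
        byte[] bytes = new byte[numBytes];
        SECURE_RANDOM.nextBytes(bytes);
        return toHex(bytes);
    }

    /** Appends the salt to the password and returns the SHA-256 digest as 64 hex characters. */
    static String hash(String password, String salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hashed = digest.digest((password + salt).getBytes(StandardCharsets.UTF_8));
            return toHex(hashed);
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Converts bytes to a lowercase hex string. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

For each row you would call newSalt once, store the salt alongside the row, and store hash(password, salt) as the hashed password. Because the salt differs per row, two rows that share a password end up with different hashes.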
Email address is set to a length of 254, as this is the maximum permitted length.
If you have any suggestions for the addition of data items to Person, or for new data-structures, please leave a comment.