Expert Consultancy from Yellow Pelican

Talend Data Generation

Talend provides the tRowGenerator component for generating data. This component lets you specify how many rows should be generated, define a Schema, and then assign values to the columns that have been defined. Usually, random values are assigned using the methods provided by Repository->Code->Routines->System->TalendDataGenerator; however, you may assign data using Routines of your choice.
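To make the idea concrete, here is a minimal plain-Java sketch of what row generation amounts to: a fixed number of rows, a small schema, and a value assigned to each column. It runs outside Talend and is not the component's implementation; the names RowGeneratorSketch, Row, FIRST_NAMES and generate are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java illustration of the row-generation idea.
// Row, FIRST_NAMES and generate() are hypothetical names, not Talend APIs.
class RowGeneratorSketch {

    // A two-column schema: a sequential key and a randomly chosen value.
    static final class Row {
        final int id;
        final String firstName;

        Row(int id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    // Small illustrative value pool; a real routine would hold far more values.
    static final String[] FIRST_NAMES = {"Alice", "Bob", "Carol", "Dave"};

    // Generate numberOfRows rows; a fixed seed makes the output repeatable.
    static List<Row> generate(int numberOfRows, long seed) {
        Random random = new Random(seed);
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i <= numberOfRows; i++) {
            String name = FIRST_NAMES[random.nextInt(FIRST_NAMES.length)];
            rows.add(new Row(i, name));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : generate(5, 42L)) {
            System.out.println(row.id + "|" + row.firstName);
        }
    }
}
```

Seeding the random source is a design choice for repeatable test data; drop the seed if every run should differ.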

My preference is to use tRowGenerator for row generation and, perhaps, the assignment of a Primary Key; but to map the remainder of my data using one or more tMap components. This offers maximum flexibility.

The methods provided by Talend's TalendDataGenerator Routines give you a good start on your data generation needs; however, you may find them limiting. As part of this tutorial, I have built a set of Routines, TBEDataGenerator, which offer a greater breadth of data (for Address and Person). I will add to these from time to time. These routines have a UK slant; however, you may modify them to suit your own needs.

Download LibTBEDataGenerator

You may download LibTBEDataGenerator and import it into a Talend Project of your choice. This tutorial supports Talend 5.2.2 and above.

Download

LibTBEAddressGenerator

LibTBEAddressGenerator is a reusable Job that can be added to any of your task-specific Jobs for creating pseudo-random data. An example Job, PersonGeneratorExample, is included as part of this download and shows how to use LibTBEPersonGenerator.

LibTBEPersonGenerator

LibTBEPersonGenerator is a reusable Job that can be added to any of your task-specific Jobs for creating pseudo-random data. An example Job, PersonGeneratorExample, is included as part of this download and shows how to use LibTBEPersonGenerator.

LibTBEPersonGenerator makes use of the included routine TBEDataGenerator, for generating the data items shown below.

Image 1

Sample Data

The following screenshot shows sample data that has been generated using LibTBEPersonGenerator.

Image 2

Usage

As can be seen from the screenshot below, you can add LibTBEPersonGenerator to any Job using the tRunJob component. Hit the Copy Child Job Schema button and connect to the component of your choice.

The numberOfRows parameter allows you to specify how many rows should be generated.

In this example, output is displayed to the console; however, you may connect this component to any other component that can receive data, for example to write to an output file or to a database. If you are writing to the console, keep numberOfRows low, as writing large quantities of data to the console may make the Talend Designer unresponsive.

Image 3

Passwords & Encryption

This tutorial is not intended to be a lesson on password management and encryption. The techniques shown here are not intended to suggest how these should be managed within a production environment.

Password

The password is shown in plain text for demonstration only. Passwords should always be salted and hashed, and never stored in plain text.

Derivations of the word Monkey have been used for generating passwords. Monkey was chosen because, following hacks on large websites, it has been found to be among the most common passwords; so, if you're thinking it looks familiar, now may be the time to change your own passwords. For more on passwords, read Schneier on Security.

Password Salt

The password Salt is a unique string that has been generated for each row. It is appended to the password before hashing. This ensures that two identical passwords do not hash to the same value, which protects against precomputed-lookup attacks such as those using Rainbow Tables.

Hashed Password

The hashed password shows the result of hashing the password and salt. This has been hashed using SHA-256.
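The salt-then-hash steps described above can be sketched in plain Java as follows. This is a minimal sketch, not the TBEDataGenerator source: the class and method names (SaltedHashSketch, newSalt, hash) are hypothetical, and using a random UUID as the salt and lowercase hex as the output encoding are assumptions. Only the append-salt-then-SHA-256 order comes from the text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper illustrating salt generation and SHA-256 hashing.
class SaltedHashSketch {

    // One fresh salt per row; a random UUID string is assumed here.
    static String newSalt() {
        return UUID.randomUUID().toString();
    }

    // SHA-256 over the password followed by the salt, as lowercase hex.
    static String hash(String password, String salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest((password + salt).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String password = "Monkey123"; // demonstration value only
        String saltA = newSalt();
        String saltB = newSalt();
        // The same password with two different salts gives two different hashes.
        System.out.println(saltA + " -> " + hash(password, saltA));
        System.out.println(saltB + " -> " + hash(password, saltB));
    }
}
```

Running main prints two different 64-character hashes for the same password, which is the property the salt is there to provide.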

Email Address

Email address is set to a length of 254, as this is the maximum permitted length.
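As a small illustration of the length limit, the sketch below builds an address from name parts and rejects anything longer than 254 characters. The class EmailSketch, the build method, and the firstname.lastname@domain pattern are hypothetical; the tutorial does not state how its routine composes addresses.

```java
import java.util.Locale;

// Hypothetical helper: compose an email address and enforce the 254-character limit.
class EmailSketch {

    static final int MAX_EMAIL_LENGTH = 254;

    // Assumed pattern: firstname.lastname@domain, with the name part lower-cased.
    static String build(String firstName, String lastName, String domain) {
        String email = (firstName + "." + lastName).toLowerCase(Locale.ROOT) + "@" + domain;
        if (email.length() > MAX_EMAIL_LENGTH) {
            throw new IllegalArgumentException("email exceeds " + MAX_EMAIL_LENGTH + " characters");
        }
        return email;
    }

    public static void main(String[] args) {
        System.out.println(build("Ann", "Smith", "example.com"));
    }
}
```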

Conclusion

Talend and Java make it quick and easy to generate pseudo-random data, usually for testing purposes. This data can be high in volume and have good entropy.

The Routine TBEDataGenerator demonstrates the use of various techniques including UUID generation and Hashing.

If you have any suggestions for the addition of data items to Person, or for new data-structures, please leave a comment.




© www.TalendByExample.com