Data Integration is all about moving data around and a key aspect of this is the validation of your input data.
As part of the Data Integration product, Talend already provides some basic data validation routines, and we've already looked at certain aspects of data validation in the articles null Handling, String Handling and Date Handling. We've also looked at how we can implement validation with third-party libraries in our article Google's Phone Number Handling Library and Talend Routines Tutorial.
In this article, we'll review our overall data validation strategy, using some of the techniques that we learned in these other tutorials.
A lot of our data validation will be very basic, often consisting of a simple length or range check. Often, you will simply add these checks to a tMap mapping.
This example tests to see if the length of a String is exactly 10 characters. If it is, then the String is returned; otherwise, the String "Error" is returned.
row1.myString == null ? "Error" : row1.myString.length() == 10 ? row1.myString : "Error"
This example tests to see if the value of an integer is between 1 and 12. If it is, then the integer is returned; otherwise, the value -1 is returned, indicating an error.
row1.myInt < 1 || row1.myInt > 12 ? -1 : row1.myInt
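If the same checks are needed in several mappings, they could be moved into a small helper class and called from the tMap expression. The following is only a minimal sketch: the class name ValidationSketch and the method names exactLength and inRange are hypothetical and are not part of Talend. It also assumes that the integer column may be nullable (a boxed Integer), in which case the tMap expression above would fail on a null value.

```java
// Hypothetical helper class; the names are illustrative and not part of Talend.
final class ValidationSketch {

    // Returns the input when it is non-null and exactly `length` characters long,
    // otherwise the sentinel String "Error".
    static String exactLength(String value, int length) {
        return value != null && value.length() == length ? value : "Error";
    }

    // Returns the input when it is non-null and within [min, max] inclusive,
    // otherwise the sentinel value -1.
    static int inRange(Integer value, int min, int max) {
        return value != null && value >= min && value <= max ? value : -1;
    }
}
```

Under these assumptions, the two tMap expressions would become ValidationSketch.exactLength(row1.myString, 10) and ValidationSketch.inRange(row1.myInt, 1, 12).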
Many questions are asked on the forums about advanced validation. A good example of this is "How do I validate an Email Address?". This question is asked many times, whether in relation to Talend (Java) or any other programming language.
In the case of the Email Address question, you may have a go at writing your own test ("Does the string contain the @ character?") or Google the answer to find any number of suggested Regular Expressions that may or may not almost give you the right answer.
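To illustrate how limited a home-grown test of this kind is, here is a minimal sketch of a slightly stricter version of the "contains @" check. The class name EmailSketch and the method name looksLikeEmail are hypothetical. The check only confirms that there is exactly one @ with some text on either side; it is not a full Email Address validator and will accept many invalid addresses.

```java
// Hypothetical helper class; a naive check, not a complete Email Address validator.
final class EmailSketch {

    // True when the value is non-null and contains exactly one '@'
    // with at least one character before it and one character after it.
    static boolean looksLikeEmail(String value) {
        if (value == null) {
            return false;
        }
        int at = value.indexOf('@');
        return at > 0                            // something before the '@'
                && at == value.lastIndexOf('@')  // only one '@'
                && at < value.length() - 1;      // something after the '@'
    }
}
```

A value such as "a@b" passes this test even though it is unlikely to be a usable address, which is why a naive check like this is rarely enough on its own.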