Expert Consultancy from Yellow Pelican

Talend Date Handling

A site about Talend

Date Handling

Talend Dates are represented by the Java Class java.util.Date. A Date is stored as the number of milliseconds since the Epoch, regardless of your current Time Zone. The Epoch is January 1, 1970, 00:00:00 GMT.
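The following is a minimal plain-Java sketch of this representation. The class name EpochDemo and the helper formatIn are illustrative and are not part of Talend. The Date holds only a millisecond count. A time zone matters only when that instant is rendered as text.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class EpochDemo {

    // Render the same Date instant as wall-clock text in the given time zone.
    static String formatIn(Date date, String zoneId) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        sdf.setTimeZone(TimeZone.getTimeZone(zoneId));
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // zero milliseconds past the Epoch

        // The stored value is just a millisecond count; no time zone is involved.
        System.out.println(epoch.getTime());                     // 0
        // Only the conversion to text depends on a time zone.
        System.out.println(formatIn(epoch, "GMT"));              // 1970-01-01 00:00:00
        System.out.println(formatIn(epoch, "America/New_York")); // 1969-12-31 19:00:00
    }
}
```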

TalendDate.getCurrentDate()

Talend provides the routine Date TalendDate.getCurrentDate() to return the current date and time. It relies on Calendar.getInstance().getTime(), which may be considered an inefficient approach because a new Calendar object is created on every call. This routine is suitable for most circumstances; however, if you expect to call it many times within your Job, you may want to consider an alternative. A more efficient method is available in the routine TBEDate.getCurrentDate().
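Here is a rough sketch of the difference that uses only the standard JDK. CurrentDateSketch, viaCalendar and viaDate are hypothetical names. TBEDate's source is not reproduced here, so new java.util.Date() is shown as one cheaper option and should not be read as TBEDate's actual implementation.

```java
import java.util.Calendar;
import java.util.Date;

public class CurrentDateSketch {

    // The approach described for TalendDate.getCurrentDate():
    // a new Calendar is created on every call, then converted to a Date.
    static Date viaCalendar() {
        return Calendar.getInstance().getTime();
    }

    // A lighter alternative (an assumption, not necessarily what TBEDate does):
    // new Date() reads System.currentTimeMillis() without building a Calendar.
    static Date viaDate() {
        return new Date();
    }

    public static void main(String[] args) {
        Date a = viaCalendar();
        Date b = viaDate();
        // Both represent "now"; they differ by at most a few milliseconds.
        System.out.println(Math.abs(b.getTime() - a.getTime()) + " ms apart");
    }
}
```

Both methods return the same kind of value. The saving comes only from skipping the Calendar allocation and initialisation. For a handful of calls, the difference is likely to be negligible.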

Displaying the Current Date and Time

The simplest way to display the current date and time is to call the toString() method. This will display the date and time in your current Time Zone, as shown below.

System.out.println(TalendDate.getCurrentDate().toString());
---
Wed Dec 04 16:04:09 GMT 2013
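toString() always uses the JVM's default Time Zone and a fixed layout, so you may prefer to format the date explicitly. The sketch below uses only SimpleDateFormat. DisplayDateSketch and display are illustrative names. The fixed instant 1386173049000L corresponds to the Wed Dec 04 16:04:09 GMT 2013 output above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DisplayDateSketch {

    // Render a Date with an explicit pattern, locale and time zone instead of
    // relying on Date.toString(), which uses the JVM default time zone.
    static String display(Date date, String pattern, String zoneId) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ENGLISH);
        sdf.setTimeZone(TimeZone.getTimeZone(zoneId));
        return sdf.format(date);
    }

    public static void main(String[] args) {
        Date sample = new Date(1386173049000L); // the instant printed above
        System.out.println(display(sample, "dd-MM-yyyy HH:mm:ss", "GMT"));              // 04-12-2013 16:04:09
        System.out.println(display(sample, "dd-MM-yyyy HH:mm:ss", "America/New_York")); // 04-12-2013 11:04:09
    }
}
```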

Lenient Date Parsing

The Java Class SimpleDateFormat is lenient by default when parsing dates. This means, for example, that 32-Jan-2013 would be accepted and treated as 1-Feb-2013.
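The following is a minimal plain-Java demonstration of that rollover. It assumes the pattern dd-MMM-yyyy and English month names. LenientDemo and parseAndFormat are illustrative names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LenientDemo {

    // Parse with the given leniency; return the re-formatted text, or null if rejected.
    static String parseAndFormat(String text, boolean lenient) {
        SimpleDateFormat sdf = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        sdf.setLenient(lenient);
        try {
            Date d = sdf.parse(text);
            return sdf.format(d);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient (the SimpleDateFormat default): day 32 rolls over into February.
        System.out.println(parseAndFormat("32-Jan-2013", true));  // 01-Feb-2013
        // Strict: the same input is rejected.
        System.out.println(parseAndFormat("32-Jan-2013", false)); // null
    }
}
```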

There is some debate in the Java world as to whether or not leniency should be on by default, with many suggesting that it should always be switched off as it serves no sensible benefit.

I believe that leniency may have been misunderstood during the development of the TalendDate routines, as can be seen from the following code fragment from the routine TalendDate.isDate.

        ...
        try {
            testDate = sdf.parse(stringDate);
        } catch (ParseException e) {
            return false;
        }

        if (!sdf.format(testDate).equalsIgnoreCase(stringDate)) {
            return false;
        }
        ...

In the first test, try {..., the input date is tested to see whether it can be parsed. If it cannot, FALSE is returned; however, 32-Jan-2013 would pass this test because the parser is lenient. The second test, if(!sdf.format..., attempts to remedy this by formatting the parsed date and checking that the result matches the input string. For 32-Jan-2013 it would not, because the parsed date formats as 01-Feb-2013, so FALSE is returned.
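The self-contained sketch below mirrors the fragment above and shows both tests working together. The method name isDateRoundTrip, the Locale.ENGLISH argument and the dd-MMM-yyyy pattern are assumptions made for this illustration. This is not the TalendDate.isDate source.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class RoundTripCheck {

    // Hypothetical helper mirroring the isDate fragment:
    // 1) lenient parse, 2) format the result and compare it with the input.
    static boolean isDateRoundTrip(String stringDate, String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ENGLISH); // lenient by default
        Date testDate;
        try {
            testDate = sdf.parse(stringDate);
        } catch (ParseException e) {
            return false;
        }
        // A rolled-over date such as 32-Jan-2013 formats back as 01-Feb-2013.
        return sdf.format(testDate).equalsIgnoreCase(stringDate);
    }

    public static void main(String[] args) {
        System.out.println(isDateRoundTrip("31-Jan-2013", "dd-MMM-yyyy")); // true
        System.out.println(isDateRoundTrip("32-Jan-2013", "dd-MMM-yyyy")); // false
        // Comparing text also rejects a real date written without zero padding.
        System.out.println(isDateRoundTrip("1-Feb-2013", "dd-MMM-yyyy"));  // false
    }
}
```

The last case shows a side effect of comparing strings: 1-Feb-2013 fails because the formatter pads the day to 01.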

A more common approach is to call SimpleDateFormat.setLenient(false). Unexpected behaviour caused by badly formatted dates is a frequent topic on the Talend Forum.

The following code fragment, taken from TBEDate, shows this approach.

        java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat(pattern);
        sdf.setLenient(false);

        try {
            sdf.parse(stringDate);
        } catch (ParseException e) {
            return false;
        }
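Below is a self-contained sketch of the strict approach. StrictDateCheck and isDateStrict are hypothetical names. Note that DateFormat.parse(String) may stop before the end of the input: its Javadoc states that it may not use the entire text. A check built only on the fragment above could therefore accept a value such as 31-05-2013a. For this reason the sketch adds a ParsePosition check. That check is an addition of this sketch and is not taken from TBEDate.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class StrictDateCheck {

    // Hypothetical validator in the spirit of the TBEDate fragment, plus a
    // ParsePosition check that insists the whole string is consumed.
    static boolean isDateStrict(String stringDate, String pattern) {
        if (stringDate == null) {
            return false;
        }
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(false); // reject out-of-range fields such as day 32
        ParsePosition pos = new ParsePosition(0);
        Date parsed = sdf.parse(stringDate, pos);
        // parse(String, ParsePosition) returns null on failure, and leaves
        // pos.getIndex() short of the end if trailing characters remain.
        return parsed != null && pos.getIndex() == stringDate.length();
    }

    public static void main(String[] args) {
        System.out.println(isDateStrict("31-05-2013", "dd-MM-yyyy"));  // true
        System.out.println(isDateStrict("32-05-2013", "dd-MM-yyyy"));  // false
        System.out.println(isDateStrict("31-05-2013a", "dd-MM-yyyy")); // false
    }
}
```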

A Simple Test of Lenient Parsing

To see how Talend's lenient date parsing may give unexpected results, try the following. Create a file containing these values.

01-01-2013
20-05-2013
31-05-2013a
32-05-2013
1a-05-2013
aa-05-2013
31-12-2013

Create a simple Job with a tFileInputDelimited component to read your file, and define its Schema as a single Date field with a format of dd-MM-yyyy. Connect this component to a tLogRow component. Run this Job and you should see the results shown below.

Starting job DateTest at 10:22 05/12/2013.


[statistics] connecting to socket on port 3370
Unparseable date: "31-05-2013a"
Unparseable date: "1a-05-2013"
Unparseable date: "aa-05-2013"
[statistics] connected
.----------.
|tLogRow_1 |
|=--------=|
|newColumn |
|=--------=|
|01-01-2013|
|20-05-2013|
|01-06-2013|
|31-12-2013|
'----------'

[statistics] disconnected
Job DateTest ended at 10:22 05/12/2013. [exit code=0]

As can be seen, three of our rows have been discarded because their dates cannot be parsed; however, lenient parsing has silently remapped one of our dates, 32-05-2013, to 01-06-2013.

We can alter this behaviour in the component tFileInputDelimited by checking the option tFileInputDelimited->Advanced settings->Check date. This makes date parsing strict rather than lenient. Behind the scenes, the generated code changes from row1.newColumn = ParserUtils.parseTo_Date(temp, "dd-MM-yyyy"); to row1.newColumn = ParserUtils.parseTo_Date(temp, "dd-MM-yyyy", false);. The modified call uses the method java.util.Date ParserUtils.parseTo_Date(String s, String pattern, boolean lenient), with lenient set to false.
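The effect of the lenient flag on our test file can be approximated in plain Java. The helper parseToDate below is hypothetical. It only copies the shape of the parseTo_Date signature and is not Talend's code. It also assumes, as the Job's log above suggests, that trailing characters (as in 31-05-2013a) cause a rejection.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class CheckDateSketch {

    // Hypothetical stand-in shaped like parseTo_Date(String s, String pattern, boolean lenient).
    // Assumption: trailing characters are rejected, as the Job's log suggests.
    static Date parseToDate(String s, String pattern, boolean lenient) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(lenient);
        ParsePosition pos = new ParsePosition(0);
        Date d = sdf.parse(s, pos);
        return (d != null && pos.getIndex() == s.length()) ? d : null;
    }

    // Parse each value; return the accepted ones formatted as tLogRow shows them.
    static List<String> run(String[] values, boolean lenient) {
        SimpleDateFormat out = new SimpleDateFormat("dd-MM-yyyy");
        List<String> accepted = new ArrayList<>();
        for (String v : values) {
            Date d = parseToDate(v, "dd-MM-yyyy", lenient);
            if (d == null) {
                System.out.println("Rejected: \"" + v + "\"");
            } else {
                accepted.add(out.format(d));
            }
        }
        return accepted;
    }

    // The same values as the test file.
    static final String[] VALUES = {
        "01-01-2013", "20-05-2013", "31-05-2013a", "32-05-2013",
        "1a-05-2013", "aa-05-2013", "31-12-2013"
    };

    public static void main(String[] args) {
        System.out.println("Lenient: " + run(VALUES, true));  // 32-05-2013 becomes 01-06-2013
        System.out.println("Strict:  " + run(VALUES, false)); // 32-05-2013 is rejected as well
    }
}
```

With lenient set to true, the accepted values match the tLogRow output shown earlier. With lenient set to false, 32-05-2013 is also rejected, which is the behaviour that Check date is intended to give.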

Personally, I would prefer all date parsing to be strict or, at least, for strict parsing to be the default.

Understanding this behaviour will reduce any confusion with your date handling.





© www.TalendByExample.com