Expert Consultancy from Yellow Pelican

Returning Values from a Subjob

A site about Talend

How to Return Values from a Subjob in Talend

A commonly asked Talend question is "How do you return a value from a Subjob?"

The answer to this question depends on what you're trying to achieve and what you mean by "return a value". To be more accurate, the questioner is usually asking "How do you return a value from a Child Job?" (that has been executed using a tRunJob component).

There are, potentially, four answers to this question. The first three I have addressed in other articles. The fourth (and I believe that this is the question that is usually being asked, and the one shrouded in the most mystery) is "How do you return a value from a child Job, using Context variables?"

Let's start off by summarising the first three answers. If one of these addresses your question, then you can move on to the relevant article; otherwise, read on for an explanation of the fourth way.

Default tRunJob Handling

The default for tRunJob is to Die on child error (see Basic settings). Alternatively, you may choose not to Die on child error. On completion, tRunJob provides certain information, by way of globalMap, as follows:

  • Error Message - ERROR_MESSAGE (After)
  • Child return code - CHILD_RETURN_CODE (After)
  • Child exception stack trace - CHILD_EXCEPTION_STACKTRACE (After)

If you're simply looking to perform Exception handling for your child Job, then these options and values may meet your needs.
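As a rough illustration, here is how a parent Job might inspect these values after the child has finished. This is a minimal sketch that runs outside Talend: the HashMap stands in for globalMap, the key prefix assumes the component is named tRunJob_1, and the class and method names (ChildResultCheck, describe) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ChildResultCheck {

    // Builds a readable summary from the values tRunJob leaves on globalMap.
    // The "tRunJob_1_" key prefix assumes the component is named tRunJob_1.
    static String describe(Map<String, Object> globalMap) {
        Integer returnCode = (Integer) globalMap.get("tRunJob_1_CHILD_RETURN_CODE");
        String errorMessage = (String) globalMap.get("tRunJob_1_ERROR_MESSAGE");
        if (returnCode == null) {
            return "child Job has not run";
        }
        if (returnCode == 0) {
            return "child Job succeeded";
        }
        return "child Job failed with code " + returnCode + ": " + errorMessage;
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap, populated by hand for illustration.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 1);
        globalMap.put("tRunJob_1_ERROR_MESSAGE", "File not found");
        System.out.println(describe(globalMap));
    }
}
```

In a real Job, the body of describe would sit in a tJava component that runs after tRunJob_1, with Die on child error switched off.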

Catching an Exception

Sometimes, you may find the Default tRunJob Handling, as described above, insufficient. In these instances, you may want to catch the child Job's Exception and handle it yourself, possibly re-executing the Job. You could opt to use a tJavaFlex component to catch the Exception. This article provides a working example.

tBufferOutput

In the article Reusable Context Load Job, we looked at how the tBufferOutput component can be used to return data from a child Job to its parent. This is a great way to return row data. It is not the most appropriate way for returning control or status data.

Returning Values through Context

If you've read this far, then it looks like you want to know how to return values from your child Job using Context.

Returning values using Context goes hand in hand with transmitting Context to a child Job. There's plenty of documentation on transmitting Context and, if you've been using Talend for a while, you'll be getting to grips with this.

Returning values using context requires a little more understanding of Java. It's worth learning this now as it will help with many aspects of your Talend development.

If you're not already familiar with how to pass context to your child Job, using the tRunJob component, read our article on the tRunJob component, now.

Parameter Passing in Java

The key to understanding Talend parameter passing is to understand how Java handles parameter passing. When you pass your Context variables to a child Job (or return values back), you are using the Java parameter passing mechanism.

Here are a couple of great articles that explain how Java handles variables, references, pointers and objects.

Parameter Passing in Java
Pass by Value Please

Pass by Value

Now you've read these articles, you'll understand that Java Passes by Value. This means that when you transmit your Context variables to your child Job, you can't simply reassign the value, for the parent Job to then pick it up.
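To see this in plain Java, consider the following minimal sketch. The class and method names (PassByValueDemo, reassign, callReassign) are made up for illustration; reassign plays the part of the child Job and callReassign the part of the parent.

```java
final class PassByValueDemo {

    // Reassigning the parameter only changes this method's local copy of the reference.
    static void reassign(Integer rowCount) {
        rowCount = 5;
    }

    // The caller's variable still refers to the original Integer afterwards.
    static Integer callReassign() {
        Integer rowCount = 0;
        reassign(rowCount);
        return rowCount;
    }

    public static void main(String[] args) {
        System.out.println("rowCount after reassign = " + callReassign());
    }
}
```

The assignment inside reassign is lost as soon as the method returns, which is exactly what happens when a child Job assigns a new value to one of its Context variables.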

Mutable and Immutable Objects

Objects in Java are either Mutable or Immutable, and there are good reasons for this. In a nutshell, if you pass a reference to an Immutable Object, its state cannot be modified by the receiving method. Java simply does not allow it.

How does this affect Talend?

Let's take a typical use case. You have a child Job that processes a number of rows and you want to return the total number of rows processed back to the parent Job. It would be nice if you could pass an Integer variable to the child Job and for the child Job to assign a value that the parent can then read. It can't be done. Integer is Immutable. No mechanism is provided by Java to alter the original value - the child Job only has a copy of the reference to this Object.

The solution is to pass a reference to a Mutable Object that can be manipulated by the child Job.
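The idea can be sketched in plain Java as follows. The names here (MutableHolderDemo, recordRowCount, callRecord, and the "rowCount" key) are hypothetical, and a simple HashMap is used only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

final class MutableHolderDemo {

    // Mutates the object that the reference points to, rather than reassigning the reference.
    static void recordRowCount(Map<String, Object> shared, int rows) {
        shared.put("rowCount", rows);
    }

    // The caller holds a reference to the same map, so it sees the new entry.
    static Object callRecord() {
        Map<String, Object> shared = new HashMap<>();
        recordRowCount(shared, 5);
        return shared.get("rowCount");
    }

    public static void main(String[] args) {
        System.out.println("rowCount seen by caller = " + callRecord());
    }
}
```

Both methods hold a copy of the same reference, so a change made to the map's contents by one is visible to the other.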

Parameter Passing Example

Now that we have an understanding of Java parameter passing, we can put this into practice. If you're new to Java, this may sound complex; but it isn't really. It just takes a little more effort than it first seems.

In this example, we're going to use the techniques described, to return the number of rows processed, from a child Job to its parent; and do it in a safe manner.

Passing Back Our Return Value

As previously discussed, we will not be able to pass our return value back, using an Immutable Object. This precludes the use of any of the primitive type wrappers such as Integer. We also want to do this in a Thread Safe manner. We will also need to consider the Scope of our variable.

Variable Scope

In our parent Job, at least, Scope will be an issue. We will need to store our Object where we will have global Scope. Our two options are Context or globalMap. My preference is to reserve Context for parameter passing (receiving parameters). In the case of our parent Job, this value will never be passed to the Job as a parameter, so globalMap is the natural choice.

A Thread Safe, Mutable Object

We now need to choose a Thread Safe and Mutable Object. Having given it much consideration, I have chosen to use ConcurrentHashMap, as a single Object that can be passed to all child Jobs, and can accept whatever parameters we choose to add. This will provide similar functionality to globalMap; except that globalMap is not Thread Safe.

If you expect to run aspects of your job in Parallel, then you should do this in a Thread Safe manner.
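As a rough sketch of what this means in practice, the following plain Java example has several threads, each standing in for a child Job running in parallel, writing its own key to one shared ConcurrentHashMap. The class name, method name and key names are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ParallelWritersDemo {

    // Starts one thread per simulated child Job; each writes its own key to the shared map.
    static Map<String, Object> runWriters(int writers) {
        Map<String, Object> sharedMap = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> sharedMap.put("child_" + id + "_NB_LINE", id * 10));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every writer before reading results
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return sharedMap;
    }

    public static void main(String[] args) {
        System.out.println("entries written = " + runWriters(8).size());
    }
}
```

Because each writer uses a distinct key, no entry is lost, whatever order the threads run in.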

ReturnValueExample (Parent Job)

The following screenshot shows our parent Job, consisting of three components.

Image 1


tJava_1

Component tJava_1 is where we will instantiate ConcurrentHashMap, and we'll put a reference to it on globalMap. We will name this sharedMap.

As sharedMap is to be general purpose, it is defined with the default key-value pair of String, Object. This means that when we get a value, it will usually need to be Cast to the correct type. This behaviour is in line with globalMap.

// ReturnValueExample.tJava_1
// Create the shared, Thread Safe map and place a reference to it on globalMap
globalMap.put("sharedMap", new java.util.concurrent.ConcurrentHashMap<String, Object>());


ReturnValueExampleChildJob (tRunJob_1)

Our child Job has a single Context variable, named sharedMap, and is of type Object. We will pass a reference to our ConcurrentHashMap Object, as shown below.

Image 2


tJava_2

Component tJava_2 is where we will retrieve the value that has been added by our child Job and, in the case of this example, simply display the value to the console.

As can be seen from the following code, the value returned from our child Job, will be written to the console. We need to access sharedMap through the reference that we have placed on globalMap, and Cast it to the correct type ((java.util.Map) globalMap.get("sharedMap")), and we need to Cast the value to an Integer.

// ReturnValueExample.tJava_2
System.out.println("ReturnValueExampleChildJob_NB_LINE=" + ((Integer) ((java.util.Map) globalMap.get("sharedMap")).get("ReturnValueExampleChildJob_NB_LINE")));

ReturnValueExampleChildJob (Child Job)

The following screenshot shows our child Job, consisting of three components. We do not need to spend too much time looking at the overall Job. This Job uses tFixedFlowInput_1 to generate five arbitrary rows, and then writes these rows to a file, using the component tFileOutputDelimited_1. The important part of this Job is the component tJava_1.

Image 3


tJava_1

Component tJava_1 is where we will assign the value that is to be returned to our parent Job and, in the case of this example, it will be the number of rows written by the component tFileOutputDelimited_1.

As can be seen from the following code, we need to Cast context.sharedMap, as it has been passed as type Object ((java.util.Map) context.sharedMap). The Map Key is a name of our choice, clearly indicating its purpose. The value to be returned, tFileOutputDelimited_1_NB_LINE, is the value that Talend has placed on globalMap.

// ReturnValueExampleChildJob.tJava_1
((java.util.Map) context.sharedMap).put("ReturnValueExampleChildJob_NB_LINE", globalMap.get("tFileOutputDelimited_1_NB_LINE"));
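One thing to watch for: ConcurrentHashMap does not accept null values, so put will throw a NullPointerException if the value has not yet been set on globalMap when this code runs. A guard along the following lines is one way to handle that. This is only a sketch; the helper name putIfPresent is hypothetical and is not part of Talend or of the Java Map interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NullSafePut {

    // Copies a value into the shared map only when it exists, because
    // ConcurrentHashMap.put throws NullPointerException for a null value.
    static boolean putIfPresent(Map<String, Object> sharedMap, String key, Object value) {
        if (value == null) {
            return false;
        }
        sharedMap.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> sharedMap = new ConcurrentHashMap<>();
        System.out.println(putIfPresent(sharedMap, "ReturnValueExampleChildJob_NB_LINE", null));
        System.out.println(putIfPresent(sharedMap, "ReturnValueExampleChildJob_NB_LINE", 5));
    }
}
```

The parent Job can then treat a missing key as "no value returned" rather than failing inside the child.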

Executing the Job ReturnValueExample

The following is the result of executing our parent Job.

Starting job ReturnValueExample at 18:03 11/09/2013.


[statistics] connecting to socket on port 3420
[statistics] connected
ReturnValueExampleChildJob_NB_LINE=5
[statistics] disconnected
Job ReturnValueExample ended at 18:03 11/09/2013. [exit code=0]
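If you would like to experiment with the mechanism outside Talend, the whole flow can be approximated in a few lines of plain Java. This is a simulation under stated assumptions, not the code that Talend generates: ChildContext, runChild and runParent are hypothetical stand-ins for the child's Context, the child Job and the parent Job, and ordinary HashMaps stand in for each Job's globalMap. It also assumes that the child runs in the same JVM as the parent, since an Object reference cannot cross a process boundary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ReturnValueSimulation {

    // Stand-in for the child Job's Context: a single variable of type Object.
    static final class ChildContext {
        Object sharedMap;
    }

    // Simulates the child Job: records a row count, then publishes it through the Context.
    @SuppressWarnings("unchecked")
    static void runChild(ChildContext context, int rowsWritten) {
        Map<String, Object> childGlobalMap = new HashMap<>();
        childGlobalMap.put("tFileOutputDelimited_1_NB_LINE", rowsWritten);
        ((Map<String, Object>) context.sharedMap).put(
                "ReturnValueExampleChildJob_NB_LINE",
                childGlobalMap.get("tFileOutputDelimited_1_NB_LINE"));
    }

    // Simulates the parent Job: creates the map, runs the child, reads the result.
    @SuppressWarnings("unchecked")
    static Integer runParent(int rowsWritten) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("sharedMap", new ConcurrentHashMap<String, Object>());

        ChildContext context = new ChildContext();
        context.sharedMap = globalMap.get("sharedMap"); // pass the reference to the child
        runChild(context, rowsWritten);

        return (Integer) ((Map<String, Object>) globalMap.get("sharedMap"))
                .get("ReturnValueExampleChildJob_NB_LINE");
    }

    public static void main(String[] args) {
        System.out.println("ReturnValueExampleChildJob_NB_LINE=" + runParent(5));
    }
}
```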

Conclusion

This article shows how a ConcurrentHashMap can be used to pass values to and from child Jobs in a consistent and Thread Safe manner, using the Map interface.

Hopefully, this article has explained the principles behind this, rather than simply providing a prescriptive list of actions to achieve this goal.

Although ConcurrentHashMap is Thread Safe, if you are running multi-threaded Jobs and updating the same Object (that has been added to sharedMap), you may get unpredictable results, as your get/put may not be part of an atomic transaction. This is outside the scope of this document and is not expected to be an issue for typical Talend development.
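For readers who do need to update a single entry from several threads, here is a minimal sketch of one option: ConcurrentHashMap's merge method, which performs the read and the write as one atomic step. The class name, method name and the "totalRows" key are assumptions made for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

final class AtomicUpdateDemo {

    // Each thread adds 1 to the same key many times; merge makes each update atomic,
    // so no increment is lost to a separate get followed by a put.
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Object> sharedMap = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    sharedMap.merge("totalRows", 1,
                            (current, one) -> (Integer) current + (Integer) one);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for all updates before reading the total
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return (Integer) sharedMap.get("totalRows");
    }

    public static void main(String[] args) {
        System.out.println("totalRows = " + countWithMerge(4, 1000));
    }
}
```

Writing the same update as a get followed by a put would allow two threads to read the same old value and overwrite each other's result.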

As there is a need to repeatedly Cast Objects, this may, at first glance, appear more complex than it is. If Talend were to support a Shared Map, then it would be a much more transparent process.

Finally, I do not find the need to return values in this way, in my day-to-day programming. It is possibly the exception to the rule.





© www.TalendByExample.com