A site about Talend
There are two basic methods for running SubJobs in parallel: the tParallelize component, and Multi thread execution of the Job.
The tParallelize component is only available in the Enterprise Edition of Talend and is outside the scope of this documentation. You can achieve all of the functionality of tParallelize by multi-threading your Jobs.
Before rushing in and setting up all of your Jobs to run in parallel, there are several considerations that you should make. It is better to have a well-architected Job than to simply run everything in parallel.
Of course, you should always design your Job to be as efficient as practicable; however, you should have a performance goal set, before looking at your tuning opportunities. You should also consider the cost of development.
If you're going to run two SubJobs in parallel, then you need to consider the dependencies between these two SubJobs. There may also be subsequent SubJobs that are dependent on the completion of both of these two SubJobs.
While ensuring that you get the maximum throughput possible for your Job, you also need to consider the effect that this may have on other resources, for example the databases that your Job reads from.
If, for example, you're reading from an OLTP database to populate your Data Warehouse, you won't be very popular if you fire off 20 bulk-extracts in parallel. It is also true that over-parallelization will have a negative impact on your overall throughput, as resources hit their limits.
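To illustrate the point about over-parallelization, here is a minimal plain-Java sketch (not Talend-generated code) that queues 20 simulated extracts but only lets a fixed number run at once. The class name `BoundedExtractsSketch`, the method `maxConcurrent` and the sleep that stands in for an extract are all hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedExtractsSketch {

    // Runs 'extracts' simulated extracts with at most 'maxParallel' at a time
    // and returns the highest number that were observed running together.
    static int maxConcurrent(int extracts, int maxParallel) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool size is the cap: extra tasks wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (int i = 0; i < extracts; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // assumption: stands in for a bulk-extract
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent extracts: " + maxConcurrent(20, 4));
    }
}
```

With a cap of 4, the reported peak never exceeds 4, however many extracts are queued. The same idea applies to Job design: decide how much load the source system can bear, and parallelize up to that limit rather than beyond it.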
The following screenshots show a simple Job, with two SubJobs executing in parallel. Each of these SubJobs is a tRunJob component, whose child Job announces its commencement and ending, and sleeps for a period of time in between. You'll see from the console output that these two SubJobs executed in parallel, with ParallelExampleSJ2 completing first.
Note the setting of Multi thread execution is TRUE.
As well as setting the parent Job's Multi thread execution property to TRUE, you'll notice that the two SubJobs are not connected. Usually, for non-parallel Jobs, these would be connected using the OnSubJobOk trigger. Not only does this ensure that the second SubJob only executes on the successful completion of the first, but it also determines the execution order. For a non-parallel Job, SubJobs usually execute in the order that they were added; however, it is good practice to always connect them.
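The behaviour described above can be mimicked in plain Java. The following is a minimal sketch, not the code that Talend generates: two unconnected tasks each announce their start, sleep, and announce their end, while running on separate threads. The names `ParallelSubJobsSketch`, `SubJobA` and `SubJobB`, and the sleep durations, are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelSubJobsSketch {

    // Stand-in for a child Job: announce start, sleep, announce end.
    static void childJob(String name, long sleepMillis, List<String> log) {
        log.add(name + " started");
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        log.add(name + " ended");
    }

    // Runs the two unconnected "SubJobs" on their own threads and
    // returns the events in the order in which they were recorded.
    static List<String> runInParallel() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> childJob("SubJobA", 600, log)); // longer sleep
        pool.execute(() -> childJob("SubJobB", 200, log)); // shorter sleep
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        runInParallel().forEach(System.out::println);
    }
}
```

Because nothing connects the two tasks, neither waits for the other, and the one with the shorter sleep finishes first. The order in which the two tasks start is not guaranteed, which is the same reason that unconnected SubJobs should not be relied on to run in a particular order.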
Now suppose that a third SubJob depends on the completion of both of these parallel SubJobs. This third SubJob can only have a single incoming OnSubJobOk connector, so we cannot express the dependency with simple triggers alone.
The solution that I use is to wrap any SubJobs that should run in parallel in a new Parallel Group Job, set this Job's Multi thread execution property to TRUE, and then make the third SubJob dependent on this new Job. This solution both achieves the desired result and provides clarity in the Job design.
The following screenshots show this amended design.
Note the setting of Multi thread execution is FALSE. It is now the Job ParallelExampleGroup1 that is Multi-threaded.
As can be seen from the above screenshots, our first two SubJobs have been encapsulated in a new SubJob, ParallelExampleGroup1, and this SubJob has the dependency with our new SubJob, ParallelExampleSJ3.
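The grouping pattern can also be sketched in plain Java, again as an illustration rather than as Talend's generated code. A group method runs two tasks in parallel and returns only when both have finished; the parent then runs the dependent task. The names `ParallelGroupSketch`, `parallelGroup`, `SubJobA`, `SubJobB` and `SubJobC` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelGroupSketch {

    // Stand-in for a SubJob: announce start, sleep, announce end.
    static void subJob(String name, long sleepMillis, List<String> log) {
        log.add(name + " started");
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        log.add(name + " ended");
    }

    // Stand-in for the multi-threaded group Job: returns only
    // once both of its members have completed.
    static void parallelGroup(List<String> log) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Void> a =
                CompletableFuture.runAsync(() -> subJob("SubJobA", 300, log), pool);
            CompletableFuture<Void> b =
                CompletableFuture.runAsync(() -> subJob("SubJobB", 100, log), pool);
            CompletableFuture.allOf(a, b).join(); // wait for both members
        } finally {
            pool.shutdown();
        }
    }

    // Stand-in for the single-threaded parent Job: group first, then the
    // dependent SubJob, as if the two were linked by one OnSubJobOk trigger.
    static List<String> runParent() {
        List<String> log = new CopyOnWriteArrayList<>();
        parallelGroup(log);
        subJob("SubJobC", 50, log);
        return log;
    }

    public static void main(String[] args) {
        runParent().forEach(System.out::println);
    }
}
```

The parent itself stays sequential; only the group is parallel. That keeps the dependency in one obvious place, which is the same clarity that the wrapped Job gives in the Talend design.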
In creating this second example, I simply renamed ParallelExample to ParallelExampleGroup1, and then recreated ParallelExample and then added the two dependent SubJobs.