I recently read an article comparing the merits of the tJoin and tMap components for joining data. On the face of it, it seemed a foregone conclusion that tMap is the only way to go, as it has far more functionality than tJoin.
The only comparison that was not fully addressed was the speed of execution of each component.
To test the speed of execution of these two components, I created two text files, each containing an identical set of 5,000,000 sequential Integer keys.
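If you want to reproduce the test data, something like the following sketch should do it. It is not the exact code I used; the class name, the file names, the one-key-per-line layout and the choice to start the keys at 1 are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class KeyFileGenerator {

    // Writes the keys 1..count to the given file, one key per line.
    // Starting at 1 is an assumption; any sequential range would serve.
    public static void writeSequentialKeys(Path file, int count) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int key = 1; key <= count; key++) {
                out.write(Integer.toString(key));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int count = 5_000_000;
        // Hypothetical file names: one file for the main flow, one for the lookup.
        writeSequentialKeys(Paths.get("main_keys.txt"), count);
        writeSequentialKeys(Paths.get("lookup_keys.txt"), count);
    }
}
```

Because both files hold the same keys, every row in the main flow finds exactly one match in the lookup, so the join itself does as little work as possible and what is left to compare is component overhead.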
The options available to tJoin are fairly limited; however, I tried various options with tMap to see if I could get it to execute faster than it does with the default options. Any benefit I achieved for this test case was marginal, and with some options I managed to slow tMap down greatly.
Although I did not record formal timings (I ran these tests many times), tJoin appeared to be marginally faster; however, the speed gain seems irrelevant outside the most demanding environments.
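For anyone who wants to put numbers on this rather than rely on impressions, a small harness along these lines could record the elapsed time of repeated runs. This is only a sketch: `RunTimer` is a hypothetical helper, and the `Runnable` stands in for however you launch the job under test (for example, an exported job started from Java), which is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RunTimer {

    // Runs the given job 'runs' times and returns each elapsed time in milliseconds.
    public static List<Long> timeRuns(Runnable job, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            job.run();
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    // Returns the median of the values (the upper middle value for even-sized lists).
    public static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }
}
```

Comparing medians rather than single runs helps to smooth out one-off effects such as JVM warm-up or disk caching.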
Of course, in all but the most trivial of our tasks, we usually want to do some form of manipulation of our data and this often involves the inclusion of a tMap.
In the following two screenshots, I've now introduced a trivial mapping of a String value. As can be seen from the throughput, tMap on its own wins hands-down for both speed and simplicity, and I see little use for tJoin in my day-to-day Talend work.
If you know any use-cases where tJoin is the winner, please feel free to comment.