Comments on: SSIS vs Hadoop – a Mapping Performance Showdown

By: Links Naji

Links Naji — Tue, 19 Nov 2013 23:58:44 +0000

Hey Bruno,

Thanks for your comments and your suggestions.

I wasn’t intentionally swaying the votes towards SSIS. This was actually based on a real dataset that I needed to process, essentially social media data (tweets, Facebook posts, etc.), so it’s fair to say that in this particular real-world scenario, using the technologies out of the box (with minimal configuration), SSIS managed to outperform Hadoop by a considerable margin.

I think in a scenario where you are trying to reduce data across multiple nodes, Hadoop is definitely the weapon of choice, since SSIS does not have a native way of grouping results across multiple instances of the service. Also, if you are trying to “scale gracefully”, then again Hadoop will win hands down.

That being said, SSIS has a much healthier start time and seems to execute mapping operations rapidly. Additionally, I believe that reducing data on one instance of SSIS will be faster than on one instance of Hadoop.

But then again, with Hadoop 2 being released, and the updated Map/Reduce engine, this experiment could be considered outdated.

Regarding your suggested scenarios, it’s definitely a good idea to give them a shout, and if I had some free time I might give it a go!

Cheers!

By: Bruno Wing

Bruno Wing — Fri, 15 Nov 2013 03:29:20 +0000

Your 2 scenarios were clearly built to let SSIS win… you have one with a tiny amount of data (scenario 1) and the other with no data (scenario 2).
If you use SSIS to load this amount of data, keep in mind that Hadoop is not designed to work with this type of volume.
What about a more fun test, like:
– one file of 10 million records
– 100 files of 2 million records

Then you could draw some good conclusions.

By: Is Hadoop the right tool for the job? | picnicerror.net

Is Hadoop the right tool for the job? | picnicerror.net — Sat, 09 Mar 2013 08:59:05 +0000

[…] won’t go into detail on this, as Links has already written up the results over on thinknook.com, but running our Map function on a single SSIS instance performed significantly better in each test […]