Tag Archive for: parallel processing

SSIS vs Hadoop – a Mapping Performance Showdown

February 28, 2013

In one corner we have Hadoop, a massively distributed JVM-based data processing engine with a Map & Reduce API and a proven track record in handling huge data sets. In the other corner we have SSIS, a natively non-distributed ETL engine from the SQL Server family of tools, with .NET code extensibility features and a drag-and-drop UI (for the most part, anyway). Two sweet technologies that probably shouldn’t be compared to each other, but we’re doing it anyway: pitted head to head in a data mapping task, to the death (or at least to the recycling of my test VMs)… Now FIGHT!
