<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>Comments on: SSIS vs Hadoop &#8211; a Mapping Performance Showdown</title>
	<atom:link href="http://thinknook.com/ssis-vs-hadoop-a-mapping-performance-showdown-2013-02-28/feed/" rel="self" type="application/rss+xml" />
	<link>http://thinknook.com/ssis-vs-hadoop-a-mapping-performance-showdown-2013-02-28/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ssis-vs-hadoop-a-mapping-performance-showdown</link>
	<description>Because the world needs another Business Intelligence blog!</description>
	<lastBuildDate>Fri, 14 Sep 2018 19:30:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.41</generator>
	<item>
		<title>By: Links Naji</title>
		<link>http://thinknook.com/ssis-vs-hadoop-a-mapping-performance-showdown-2013-02-28/#comment-3121</link>
		<dc:creator><![CDATA[Links Naji]]></dc:creator>
		<pubDate>Tue, 19 Nov 2013 23:58:44 +0000</pubDate>
		<guid isPermaLink="false">http://thinknook.com/?p=1015#comment-3121</guid>
		<description><![CDATA[Hey Bruno,

Thanks for your comments and your suggestions.

I wasn&#039;t intentionally swaying the votes towards SSIS. This was actually based on a real dataset that I needed to process, essentially social media data (tweets, Facebook posts, etc.), so it&#039;s fair to say that in this particular real-world scenario, using the technologies out of the box (with minimal configuration), SSIS outperformed Hadoop by a considerable amount.

I think that in a scenario where you are trying to &lt;em&gt;reduce data across multiple nodes&lt;/em&gt;, Hadoop is definitely the weapon of choice, since SSIS does not have a native way of grouping results across multiple instances of the service. Also, if you are trying to &quot;scale gracefully&quot;, then again Hadoop will win hands down.

That being said, SSIS has a much healthier start time and seems to execute &lt;em&gt;mapping&lt;/em&gt; operations rapidly. Additionally, I believe that &lt;em&gt;reducing&lt;/em&gt; data on one instance of SSIS will be faster than on one instance of Hadoop.

But then again, with Hadoop 2 being released, and the updated Map/Reduce engine, this experiment could be considered outdated.

Regarding your suggested scenarios, it&#039;s definitely a good idea to give them a shot, and if I find some free time I might give it a go!

Cheers!]]></description>
		<content:encoded><![CDATA[<p>Hey Bruno,</p>
<p>Thanks for your comments and your suggestions.</p>
<p>I wasn&#8217;t intentionally swaying the votes towards SSIS. This was actually based on a real dataset that I needed to process, essentially social media data (tweets, Facebook posts, etc.), so it&#8217;s fair to say that in this particular real-world scenario, using the technologies out of the box (with minimal configuration), SSIS outperformed Hadoop by a considerable amount.</p>
<p>I think that in a scenario where you are trying to <em>reduce data across multiple nodes</em>, Hadoop is definitely the weapon of choice, since SSIS does not have a native way of grouping results across multiple instances of the service. Also, if you are trying to &#8220;scale gracefully&#8221;, then again Hadoop will win hands down.</p>
<p>That being said, SSIS has a much healthier start time and seems to execute <em>mapping</em> operations rapidly. Additionally, I believe that <em>reducing</em> data on one instance of SSIS will be faster than on one instance of Hadoop.</p>
<p>But then again, with Hadoop 2 being released, and the updated Map/Reduce engine, this experiment could be considered outdated.</p>
<p>Regarding your suggested scenarios, it&#8217;s definitely a good idea to give them a shot, and if I find some free time I might give it a go!</p>
<p>Cheers!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bruno Wing</title>
		<link>http://thinknook.com/ssis-vs-hadoop-a-mapping-performance-showdown-2013-02-28/#comment-3108</link>
		<dc:creator><![CDATA[Bruno Wing]]></dc:creator>
		<pubDate>Fri, 15 Nov 2013 03:29:20 +0000</pubDate>
		<guid isPermaLink="false">http://thinknook.com/?p=1015#comment-3108</guid>
		<description><![CDATA[Your 2 scenarios were clearly built to let SSIS win... you have one with a tiny amount of data (scenario 1) and the other with no data (scenario 2).
If you use SSIS to load this amount of data, note that Hadoop is not designed to work with this type of volume.
What about a more interesting test, like:
  - one file of 10 million records
  - 100 files of 2 million records

Then you could draw some good conclusions.]]></description>
		<content:encoded><![CDATA[<p>Your 2 scenarios were clearly built to let SSIS win&#8230; you have one with a tiny amount of data (scenario 1) and the other with no data (scenario 2).<br />
If you use SSIS to load this amount of data, note that Hadoop is not designed to work with this type of volume.<br />
What about a more interesting test, like:<br />
  &#8211; one file of 10 million records<br />
  &#8211; 100 files of 2 million records</p>
<p>Then you could draw some good conclusions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Is Hadoop the right tool for the job? &#124; picnicerror.net</title>
		<link>http://thinknook.com/ssis-vs-hadoop-a-mapping-performance-showdown-2013-02-28/#comment-1550</link>
		<dc:creator><![CDATA[Is Hadoop the right tool for the job? &#124; picnicerror.net]]></dc:creator>
		<pubDate>Sat, 09 Mar 2013 08:59:05 +0000</pubDate>
		<guid isPermaLink="false">http://thinknook.com/?p=1015#comment-1550</guid>
		<description><![CDATA[[...] won&#8217;t go into detail on this, as Links has already written up the results over on thinknook.com, but running our Map function on a single SSIS instance performed significantly better in each test [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] won&#8217;t go into detail on this, as Links has already written up the results over on thinknook.com, but running our Map function on a single SSIS instance performed significantly better in each test [&#8230;]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
