A year after announcing their partnership and starting the collaboration project, Microsoft (SQL Server) and Hortonworks (Hadoop) have finally announced the result of this integration: Microsoft HDInsight Server and HDInsight Azure Service.
So what is HDInsight? Well, it is essentially Microsoft’s Hadoop-based distribution, built on top of the Hortonworks Data Platform. If you download Microsoft HDInsight Server for a local installation of the distribution, you end up with your own local Hadoop cluster (including Hive) able to run Hadoop jobs. You also benefit from the already released Hadoop integration points with SharePoint and Excel. This is just so powerful!
This release is part of Microsoft’s “End-To-End” approach to handling Big Data, in which Microsoft is trying to close the remaining gaps and offer a holistic, comprehensive approach to data warehousing, analysis, prediction, data enrichment (or augmentation) and general Business Intelligence solutions.
You can sign up for Microsoft’s HDInsight Service on Azure. This is a 10-minute process, and by the end you can have up to 16 clusters ready to start running Hadoop map/reduce jobs. Or, if you want to keep things local, you can install Microsoft HDInsight Server on Windows Server. The instructions for doing so are simple:
- Download and install Microsoft Web Platform Installer 4.0; the distribution is currently being released through this channel.
- Once it is installed, open Microsoft Web Platform Installer and search for “Hadoop”. You should get a result titled “Microsoft HDInsight for Windows Server Community Technology Preview”; select this installer and click “Install”. I had to try the installer a couple of times before it worked, probably due to the high demand for it.
- Once the installer completes and configures the Hadoop distribution and the cluster-management user interface on IIS, you should be able to browse to your local Hadoop cluster at http://localhost:8085/. If everything went OK, you should see the screen below, which will be familiar to anyone who managed to get onto the “Hadoop on Azure” community preview that was available at the start of 2012. If the link to the Hadoop Dashboard does not work, check in IIS which port it has been assigned: you should see two new sites under Sites in IIS, HadoopDashboard and HadoopWebAPI.
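If the dashboard does not come up, a quick scripted check can tell you whether anything is answering on the default address before you go digging through IIS. The sketch below is a minimal Python example, assuming Python 3 is available on the server (it is not part of the HDInsight install). `dashboard_reachable` is a hypothetical helper name, and 8085 is simply the default port mentioned above; if IIS assigned a different port, pass that URL instead.

```python
import urllib.error
import urllib.request

# Default dashboard address from the install steps; your IIS binding may differ.
DASHBOARD_URL = "http://localhost:8085/"


def dashboard_reachable(url: str = DASHBOARD_URL, timeout: float = 5.0) -> bool:
    """Hypothetical helper: return True if an HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # Assumption: any HTTP error response (e.g. 401 when Windows
        # authentication is enabled) still means the site is up and listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or name-resolution failure: nothing answered.
        return False


if __name__ == "__main__":
    if dashboard_reachable():
        print(f"Hadoop Dashboard is responding at {DASHBOARD_URL}")
    else:
        print("No response; check the HadoopDashboard site bindings in IIS Manager.")
```

Treating an HTTP error status as "up" is a deliberate choice here: the goal is only to distinguish "nothing is listening on this port" from "something is listening", which is exactly the question you need answered before checking the port assignment in IIS.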
And in the classic Microsoft installation manner, that is literally all it takes to get Hadoop on Windows going. It doesn’t sound like much, but if you have ever tried to get Hadoop running on a Windows machine before, you will understand the struggle, “fiddliness” and instability you had to deal with in the past to get it to run either natively or through Cygwin (neither approach is recommended for production use).
To navigate and use Microsoft HDInsight, there is a helpful, albeit basic, set of tutorials on the Azure website that can help you get a grip on the basics, including the classic “Hello World” tutorial to get you started running Pig and Hive jobs on your cluster. There is also a 30-page Word document jump-start guide for HDInsight that gives a bit more detail and an overview.
It’s important to note that this release of Microsoft HDInsight is currently a Community Preview, so it is not recommended for production use.