Tag Archive for: spark

Architectures for Running SQL Server Analysis Service (SSAS) on Data in Hadoop Hive

25 Feb
February 25, 2013

Recently I have been involved in researching and building a low-latency high-data-volume OLAP environment for a social entity and interaction analysis platform, the perfect mixture of concepts such as Big Data collection and processing,  large-scale Network Analysis, Natural Language Processing (NLP) and a highly scaled-out OLAP environment for end users to explore and discover data (essentially a Self-Service and Exploratory BI layer).

It is by all means not an easy mission to orchestrate all the technologies that back those concepts, particularly if you are interested in using the optimum solution for the problem at hand, for example Big Data might be better handled by a Hadoop layer, but Hadoop or Hive (at least on their own) are not geared up to respond to OLAP queries, which are real-time by nature, and even if they were, your end-user needs familiar tools and interfaces to analyse and study this data, which is where SQL Server Analysis Service and the whole Microsoft BI stack might come in and offer great integration with already existing business applications (such as Office or SharePoint).

This post discusses a few architectural approaches to exposing a Hadoop layer through a SQL Server Analysis Service (SSAS) interface, with references to data-latency, redundancy and over-all performance.

Read more →