Archive for category: MS SQL Server

Architectures for Running SQL Server Analysis Services (SSAS) on Data in Hadoop Hive

February 25, 2013

Recently I have been researching and building a low-latency, high-volume OLAP environment for a social entity and interaction analysis platform. It is a perfect mixture of concepts: Big Data collection and processing, large-scale Network Analysis, Natural Language Processing (NLP), and a highly scaled-out OLAP environment for end users to explore and discover data (essentially a Self-Service and Exploratory BI layer).

It is by no means an easy mission to orchestrate all the technologies that back those concepts, particularly if you want the optimum solution for each problem at hand. For example, Big Data might be better handled by a Hadoop layer, but Hadoop and Hive (at least on their own) are not geared up to respond to OLAP queries, which are real-time by nature. Even if they were, your end users need familiar tools and interfaces to analyse and study this data, which is where SQL Server Analysis Services and the whole Microsoft BI stack might come in, offering great integration with existing business applications (such as Office or SharePoint).

This post discusses a few architectural approaches to exposing a Hadoop layer through a SQL Server Analysis Services (SSAS) interface, with reference to data latency, redundancy and overall performance.

Read more →

Running Highcharts within SSRS (or any JS Graph Library)

January 22, 2013

In a previous post I described how to convert an SSRS graph into a Highcharts graph by consuming the XML output of the report from the SSRS Web Service and converting that to an input for a Highcharts graph.

That article turned out to be very popular (in fact it was the most popular for a while), so I decided to take the concept a step further. In this article I will show you how, using JavaScript injection into SSRS reports, you can display a Highcharts graph from within SSRS itself (just like any other SSRS report) when the report is rendered as HTML.

Read more →

10 Tips to Improve your Text Classification Algorithm Accuracy and Performance

January 21, 2013

In this article I discuss some methods you could adopt to improve the accuracy of your text classifier. I’ve taken a generalized approach, so the recommendations here should apply to most text classification problems you are dealing with, be it Sentiment Analysis, Topic Classification or any other text-based classifier. This is by no means a comprehensive list, but it should provide a nice introduction to the subject of text classification algorithm optimisation.

Read more →

Text Classification Threshold Performance Graph

January 20, 2013

One way to increase the accuracy of a classification algorithm is to allow the algorithm to return an “Unknown” value, particularly when the probability of what we are trying to classify is too low to simply belong in one class and the algorithm is essentially guessing an answer, leading to incorrect classification.

In this post I will explore a method for researching and implementing the “Unknown” result in your classifier, based on the probability distribution a classification returns. The idea is to give you the tools to find the thresholds that give you the best accuracy while maintaining an acceptable level of overall data coverage.
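As a rough illustration of that trade-off, here is a minimal Python sketch. It assumes the classifier returns a dictionary of class probabilities for each item; the function names (`classify_with_threshold`, `threshold_report`) and the sample data are made up for this example and are not taken from the full post.

```python
def classify_with_threshold(probabilities, threshold):
    """Return the most probable label, or "Unknown" if it falls below the threshold."""
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    return label if prob >= threshold else "Unknown"


def threshold_report(predictions, gold_labels, thresholds):
    """For each threshold, return (threshold, coverage, accuracy on answered items)."""
    rows = []
    for threshold in thresholds:
        answered = 0
        correct = 0
        for probabilities, gold in zip(predictions, gold_labels):
            label = classify_with_threshold(probabilities, threshold)
            if label == "Unknown":
                continue  # the classifier declined to answer
            answered += 1
            if label == gold:
                correct += 1
        coverage = answered / len(gold_labels) if gold_labels else 0.0
        accuracy = correct / answered if answered else 0.0
        rows.append((threshold, coverage, accuracy))
    return rows


# Made-up probability distributions, for illustration only
predictions = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.55, "neg": 0.45},
    {"pos": 0.3, "neg": 0.7},
    {"pos": 0.52, "neg": 0.48},
]
gold_labels = ["pos", "neg", "neg", "pos"]

for threshold, coverage, accuracy in threshold_report(predictions, gold_labels, [0.5, 0.6, 0.8]):
    print(f"threshold={threshold:.2f} coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Plotting coverage and accuracy against the threshold gives the kind of performance graph the title refers to: raising the threshold generally trades coverage for accuracy.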

Read more →

SQL Saturday in Edinburgh in June 2013

January 20, 2013

SQL Saturday is finally coming to Scotland with a session scheduled for Edinburgh in June 2013.

Pretty damn exciting if you’re into the whole SQL Server scene, and considering I live in Edinburgh you bet I’ll be there (which might sway your decision not to go). Last year I missed SQL Saturday in Dublin because I couldn’t sort out travel arrangements in time, which was such a disappointment considering everyone else on my team ended up going.

Read more →

Testing & Diagnosing a Text Classification Algorithm

January 19, 2013

Getting something going with a text (or any) classification algorithm is easy enough. All you need is an algorithm, such as Maximum Entropy or Naive Bayes (implementations of each are available in many different flavors across various programming languages; I use NLTK on Python for text classification), and a bunch of already classified corpus data to train it on. That is it, you’ve got yourself a basic classifier.

But the story rarely ends here. To get any decent production-level performance or accuracy out of your classification algorithm, you’ll need to iteratively test it for the optimum configuration, understand how different classes interact with each other, and diagnose any abnormality or irregularity your algorithm is experiencing.

In this post I hope to cover some basic mathematical tools for diagnosing and testing a classification algorithm. I will take a real-life algorithm that I have worked on as an example, and explore the various techniques we used to better understand how well it is performing and, when it is not performing, what the underlying characteristics of the failure are.
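One common way to see how classes interact is to count (actual, predicted) pairs and derive per-class precision and recall from them. The Python sketch below is only an illustration of that general technique, not necessarily the set of tools the full post uses; the function names and labels are hypothetical.

```python
from collections import Counter


def confusion_counts(gold_labels, predicted_labels):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(gold_labels, predicted_labels))


def precision_recall(gold_labels, predicted_labels):
    """Return {class: (precision, recall)} computed from the confusion counts."""
    counts = confusion_counts(gold_labels, predicted_labels)
    classes = sorted(set(gold_labels) | set(predicted_labels))
    report = {}
    for cls in classes:
        true_positive = counts[(cls, cls)]
        predicted_as_cls = sum(n for (_, pred), n in counts.items() if pred == cls)
        actually_cls = sum(n for (gold, _), n in counts.items() if gold == cls)
        precision = true_positive / predicted_as_cls if predicted_as_cls else 0.0
        recall = true_positive / actually_cls if actually_cls else 0.0
        report[cls] = (precision, recall)
    return report


# Made-up labels, for illustration only
gold = ["pos", "pos", "neg", "neg", "neutral", "neutral"]
predicted = ["pos", "neg", "neg", "neg", "pos", "neutral"]

for cls, (precision, recall) in precision_recall(gold, predicted).items():
    print(f"{cls}: precision={precision:.2f} recall={recall:.2f}")
```

A class with high recall but low precision is absorbing items from other classes, and the off-diagonal confusion counts show which ones.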

Read more →

Generic Trend Classification Engine using Pearson Correlation Coefficient

December 16, 2012

Trend analysis, in my experience, is generally done through manual (human) review and exploration of data using various BI tools. These tools do a great job of visually highlighting data that may be of interest to the data analyst, and when coupled with data-mining techniques such as clustering and forecasting, they give us invaluable and actionable information that helps us further explore and exploit the business or data model at hand. As far as I can tell, the name of the game these days is “exploratory data analysis and mining”, at least in terms of the Business Intelligence products on the market and the direction they are taking.
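The post title refers to the Pearson correlation coefficient. As a rough idea of how it could automate part of that manual review, the Python sketch below correlates a series against a handful of template shapes and picks the best match. The templates, the 0.8 cut-off and the function names are all assumptions made for this illustration, not the engine described in the full post.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation; treat it as none
    return cov / math.sqrt(var_x * var_y)


# Hypothetical trend templates over five periods
TEMPLATES = {
    "rising": [1, 2, 3, 4, 5],
    "falling": [5, 4, 3, 2, 1],
    "peak": [1, 3, 5, 3, 1],
    "dip": [5, 3, 1, 3, 5],
}


def classify_trend(series, templates=TEMPLATES, min_correlation=0.8):
    """Return the best-correlated template name, or "no clear trend"."""
    best_label, best_r = "no clear trend", min_correlation
    for label, template in templates.items():
        r = pearson(series, template)
        if r >= best_r:
            best_label, best_r = label, r
    return best_label


print(classify_trend([10, 12, 15, 19, 30]))
print(classify_trend([1, 5, 9, 5, 1]))
print(classify_trend([3, 3, 3, 3, 3]))
```

Because Pearson correlation ignores scale and offset, the same templates can be reused across measures with very different magnitudes.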

Read more →

SQL Server Grant Execute Permissions on Stored Procedures

December 9, 2012

There are a few ways you could grant a user execute permission on stored procedures. By assigning permissions at different levels of the object hierarchy (object, schema or database), you can control the scope of the permission and balance security against flexibility.

This post will go through how to grant SQL Server execute permission on individual stored procedures within a database, how to grant execute permission on all objects within a schema (including functions), and finally how to grant execute permission across the whole database.

Read more →

Querying the Full-Text Index in SQL Server

December 5, 2012

SQL Server provides Full-Text search capabilities through its Full-Text Index, a mature document search tool with neat features like thesaurus and stop-word integration, as well as some semantic search and keyword extraction features in SQL Server 2012.

The Full-Text Index is queried through two predicates, CONTAINS and FREETEXT, and two rowset-valued functions, CONTAINSTABLE and FREETEXTTABLE. In this post I will briefly explore the differences between them.

Read more →

SQL Server Locking Control and Transaction Isolation Levels

December 3, 2012

SQL Server uses two mechanisms to ensure transactional consistency and protect the data being accessed: locks and row versioning. They let you manage data concurrency by specifying the level of access other transactions have to the data being processed. The game here is to balance resource usage and data integrity against concurrency.

Read more →