Archive for month: September, 2012

Import Dmoz Content through C# to SQL Server

September 24, 2012

Dmoz (the Open Directory Project) holds a wealth of data about websites, along with a comprehensive list of categories. This has been built up through years of maintaining the directory (before and after its acquisition by Netscape, later part of AOL), and a listing there is one of the most “sought after” pieces of real estate in link building.

Recently I came across Dmoz data through a classification research project I was working on. Essentially, we had a Naive Bayes classifier that we were trying to use to classify companies (through a description snippet) into categories, and then extract the company's competitors within the same category… Simples!

To import the Dmoz data into SQL Server, I resorted to the Dmoz Data Importer solution by bodzebod. It is very good and did the job well for the Dmoz structure file, which contains all the category classifications, but bodzebod has not yet implemented the import of the Dmoz content file, which actually contains the data. This post presents a solution for importing the Dmoz content file into a SQL Server database through C#, building on the work of bodzebod.


Twitter Sentiment Analysis Training Corpus (Dataset)

September 22, 2012

An essential part of creating a Sentiment Analysis algorithm (or any Data Mining algorithm for that matter) is having a comprehensive dataset or corpus to learn from, as well as a test dataset to check that the accuracy of your algorithm meets the standards you expect. This also allows you to tweak your algorithm and identify more precise features of natural language that contribute to stronger sentiment classification, rather than relying on a generic “word bag” approach.
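To make the idea of a training set, a held-back test set and “word bag” features concrete, here is a minimal Python sketch. The function names, the 20% test share and the five sample tweets are all made up for illustration; they are not taken from the dataset in this post.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Turn a tweet into word counts: the generic "word bag" features."""
    return Counter(text.lower().split())


def split_corpus(samples, test_ratio=0.2, seed=0):
    """Shuffle labelled samples and hold back a share of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # First n_test shuffled samples become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Made-up labelled tweets (1 = positive, 0 = negative), for illustration only.
corpus = [
    ("I love this phone", 1),
    ("Worst service ever", 0),
    ("What a great day", 1),
    ("I hate waiting in line", 0),
    ("Really happy with the result", 1),
]

train, test = split_corpus(corpus)

# Features for whichever classifier you train; the test set stays untouched
# until you measure accuracy.
train_features = [(bag_of_words(text), label) for text, label in train]
```

Anything smarter than plain word counts (negation handling, emoticons and so on) would replace `bag_of_words`, while the split stays the same so the accuracy figures remain comparable.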

This post contains a corpus of tweets already classified by sentiment. This Twitter sentiment dataset is by no means diverse and should not be used in a final product for sentiment analysis, at least not without combining it with a much more diverse one.


Kill a Session, SPID or Connection to Analysis Service Cube

September 21, 2012

A very common requirement when administering an SSAS instance or cube is killing a particular connection, SPID or session. This could be because a lengthy operation has exceeded its expected completion time, or merely to cancel a transaction that was issued by mistake; hopefully that won't be a schema change on a live environment, though!

This post goes through the XMLA required for killing an SSAS command, as well as the Analysis Services DMVs that can be used to identify the required IDs.
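As a rough idea of what that looks like, the Python sketch below lists the DMV queries that expose the IDs and builds the XMLA `Cancel` command as a string. The helper name `build_cancel_xmla` and the `DMV_QUERIES` dictionary are hypothetical, and the sketch only builds text; you would still run the queries and the command yourself, for example from an MDX/XMLA query window in Management Studio.

```python
from xml.sax.saxutils import escape

# Namespace used by XMLA engine commands such as Cancel.
XMLA_ENGINE_NS = "http://schemas.microsoft.com/analysis-services/2003/engine"

# DMV queries that list the IDs you may want to cancel.
DMV_QUERIES = {
    "connection": "SELECT * FROM $SYSTEM.DISCOVER_CONNECTIONS",
    "session": "SELECT * FROM $SYSTEM.DISCOVER_SESSIONS",
    "spid": "SELECT * FROM $SYSTEM.DISCOVER_COMMANDS",
}

# Child element of <Cancel> for each kind of ID.
CANCEL_TAGS = {
    "connection": "ConnectionID",
    "session": "SessionID",
    "spid": "SPID",
}


def build_cancel_xmla(kind, value):
    """Build the XMLA Cancel command for a connection, session or SPID."""
    if kind not in CANCEL_TAGS:
        raise ValueError(f"kind must be one of {sorted(CANCEL_TAGS)}")
    tag = CANCEL_TAGS[kind]
    return (
        f'<Cancel xmlns="{XMLA_ENGINE_NS}">\n'
        f"  <{tag}>{escape(str(value))}</{tag}>\n"
        f"</Cancel>"
    )


# Example with a made-up SPID.
print(build_cancel_xmla("spid", 1234))
```

Keeping the query and the tag for each kind of ID side by side makes it harder to pass, say, a session ID where a SPID is expected.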
