Archive for category: C#

Rehashing SQL Server Hashing Algorithms for Large Text Fields

March 23, 2013

Hashing can be a very useful technique for storing and looking up large text fields (for example, a table of URLs or search keywords). Such fields incur high resource utilization on any database engine when they are used directly in DML statements, either as filters or as aggregation keys. Any index built on them is also costly to maintain, if it can be built at all, given that SQL Server limits an index key to 900 bytes.

Hashing functions make large textual data easier for the relational engine to handle: comparing short, fixed-length hashes instead of the full text improves performance when these fields are compared to satisfy a query. Hashes can also be used to build unique and non-unique indexes that are easier to manage than indexes defined directly on the text fields. In this post we will discuss a few options for hashing large text data using functions native to SQL Server, as well as other, external hashing algorithms that can be integrated into Microsoft SQL Server (or any RDBMS, for that matter) and that might provide better performance in practice.
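As a minimal sketch of the idea, assuming SHA-1 as the hash function (the post itself weighs several options), a C# client can compute the hash of a URL before it reaches the database, so that lookups and indexes work on a fixed 20-byte value rather than the full text. Hashing the UTF-16LE bytes (`Encoding.Unicode`) is intended to reproduce what `HASHBYTES('SHA1', @value)` returns for an `NVARCHAR` value; a `VARCHAR` column would need its matching single-byte code page instead. The `TextHasher` class and its method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TextHasher
{
    // Returns the 20-byte SHA-1 digest of the text's UTF-16LE bytes.
    // Encoding.Unicode is UTF-16LE, the byte layout SQL Server uses for NVARCHAR,
    // so this is intended to match HASHBYTES('SHA1', @value) for an NVARCHAR @value.
    public static byte[] Sha1OfNvarchar(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        using (SHA1 sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(Encoding.Unicode.GetBytes(value));
        }
    }

    // Formats a digest as uppercase hex, handy for comparing with the 0x... value
    // that SQL Server Management Studio displays for a BINARY column.
    public static string ToHex(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

The resulting 20 bytes could be stored in an indexed `BINARY(20)` column next to the URL. Because different strings can in principle share a hash, a lookup would still compare the full text after matching on the hash. Computing the hash client-side also sidesteps the 8,000-byte input limit that `HASHBYTES` had before SQL Server 2016.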


Import Dmoz Content through C# to SQL Server

September 24, 2012

Dmoz (the Open Directory Project) holds a wealth of data about websites, as well as a comprehensive list of categories. This has been built up through years of maintaining the directory (both before and after its acquisition by Netscape, later part of AOL), and the directory has been some of the most “sought after” real estate in terms of link building.

I recently came across the Dmoz data through a classification research project I was working on. Essentially, we had a Naive Bayes classifier that we were trying to use to classify companies into categories (based on a description snippet), and then to find which of the company's competitors exist within the same category… Simples!

To import the Dmoz data into SQL Server, I turned to the Dmoz Data Importer solution by bodzebod. It is very good and did the job well for the Dmoz structure files, which contain all the category classifications, but bodzebod has not yet implemented the import of the Dmoz content file, which holds the actual site data. This post presents a solution for importing the Dmoz content file into a SQL Server database through C#, building on bodzebod's work.
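The post's own code is not reproduced here. As a hedged sketch of the general approach, assuming the content file's usual layout of `<ExternalPage about="...">` elements with `d:Title`, `d:Description` and `topic` children, the file can be streamed with `XmlReader` so that the very large document never has to sit in memory at once. The `DmozPage` and `DmozContentReader` names are hypothetical, and elements are matched on their local names so the namespace URIs do not need to be hard-coded.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical record for one site listed in the Dmoz content file.
public sealed class DmozPage
{
    public string Url { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
    public string Topic { get; set; }
}

public static class DmozContentReader
{
    // Streams <ExternalPage> entries one at a time instead of loading the file.
    public static IEnumerable<DmozPage> ReadPages(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore,
            IgnoreWhitespace = true
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "ExternalPage")
                {
                    // ReadFrom consumes the whole element and leaves the reader on
                    // the next node, so Read() must not be called in this branch.
                    var page = (XElement)XNode.ReadFrom(reader);
                    yield return new DmozPage
                    {
                        Url = (string)page.Attribute("about"),
                        Title = ChildValue(page, "Title"),
                        Description = ChildValue(page, "Description"),
                        Topic = ChildValue(page, "topic")
                    };
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Returns the text of the first child with the given local name, or null.
    private static string ChildValue(XElement parent, string localName) =>
        parent.Elements().FirstOrDefault(e => e.Name.LocalName == localName)?.Value;
}
```

Each `DmozPage` could then be batched into a `DataTable` and written with `SqlBulkCopy`; how the original solution writes its rows is not shown here.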


C# Bayesian Network Client Library

March 21, 2012

Bayesian Networks, particularly in their Naive form (or Idiotic, as some angry physicist might call it), are an amazing and intuitive way of reasoning with a probabilistic network model. Bayesian models have been used heavily across a wide array of industries. Even though the Naive model is a very simplistic view of what an actual Bayesian network might look like, it is still a very practical approximation that has gained a lot of popularity in fields such as classification and segmentation. This post introduces a client library for running reasoning patterns on a custom-built Bayesian Network.
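The Naive simplification assumes every feature is conditionally independent of the others given the class, so a class score reduces to a prior multiplied by per-feature probabilities. As a hedged illustration of that idea only (not the client library this post introduces, whose API is not shown here), a tiny multinomial Naive Bayes text classifier might look like this. The class and method names are hypothetical, and add-one smoothing is an assumed choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal multinomial Naive Bayes text classifier.
public sealed class NaiveBayesClassifier
{
    private readonly Dictionary<string, int> _docsPerClass = new Dictionary<string, int>();
    private readonly Dictionary<string, Dictionary<string, int>> _wordCounts = new Dictionary<string, Dictionary<string, int>>();
    private readonly Dictionary<string, int> _totalWords = new Dictionary<string, int>();
    private readonly HashSet<string> _vocabulary = new HashSet<string>();
    private int _totalDocs;

    private static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

    // Records one labelled document: updates class and word frequency counts.
    public void Train(string category, string text)
    {
        _totalDocs++;
        _docsPerClass[category] = _docsPerClass.TryGetValue(category, out int n) ? n + 1 : 1;
        if (!_wordCounts.TryGetValue(category, out var counts))
        {
            counts = new Dictionary<string, int>();
            _wordCounts[category] = counts;
            _totalWords[category] = 0;
        }
        foreach (string word in Tokenize(text))
        {
            counts[word] = counts.TryGetValue(word, out int c) ? c + 1 : 1;
            _totalWords[category]++;
            _vocabulary.Add(word);
        }
    }

    // Returns the class maximising log P(c) + sum of log P(w | c) over the words,
    // with add-one (Laplace) smoothing so an unseen word does not zero out a class.
    public string Classify(string text)
    {
        if (_totalDocs == 0) throw new InvalidOperationException("Train the classifier first.");
        string[] words = Tokenize(text).ToArray();
        string best = null;
        double bestScore = double.NegativeInfinity;
        foreach (var entry in _docsPerClass)
        {
            string category = entry.Key;
            double score = Math.Log((double)entry.Value / _totalDocs);
            double denominator = _totalWords[category] + _vocabulary.Count;
            foreach (string word in words)
            {
                _wordCounts[category].TryGetValue(word, out int count); // 0 if unseen
                score += Math.Log((count + 1) / denominator);
            }
            if (score > bestScore) { bestScore = score; best = category; }
        }
        return best;
    }
}
```

Summing logarithms rather than multiplying raw probabilities avoids floating-point underflow when a description contains many words.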


Execute PHP File Within C#

December 10, 2011

Running PHP from C-Sharp is not a great way to go about building a stable code base, but sometimes it is a necessary evil.

I ran into this very issue recently when trying to convert a new PHP PageRank hashing algorithm into C-Sharp. The problem mainly stemmed from the fact that the code did a lot of bit shifting, which does not map easily from one programming language to another (PHP integers, for example, are always signed and platform-dependent in size, while C# has fixed-width signed and unsigned types), so I thought I would share my experience of running PHP code from C#.
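A minimal sketch of the usual approach, assuming the PHP command-line interpreter is installed and reachable as `php` on the `PATH` (a full path can be passed otherwise): start it with `System.Diagnostics.Process`, run the script with `-f`, and capture whatever it echoes. `PhpRunner` and its methods are hypothetical names. Standard error is read asynchronously so that a full error pipe cannot deadlock the read of standard output.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PhpRunner
{
    // Describes how to launch the PHP CLI for one script.
    // Assumes "php" is on the PATH unless a full path to the executable is supplied.
    public static ProcessStartInfo CreateStartInfo(string scriptPath, string phpExecutable = "php")
    {
        return new ProcessStartInfo
        {
            FileName = phpExecutable,
            Arguments = "-f \"" + scriptPath + "\"", // -f: parse and execute the given file
            UseShellExecute = false,                  // required for stream redirection
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
    }

    // Runs the script and returns everything it wrote to standard output.
    public static string Run(string scriptPath, string phpExecutable = "php")
    {
        using (Process process = Process.Start(CreateStartInfo(scriptPath, phpExecutable)))
        {
            // Drain stderr concurrently so neither pipe can fill up and block PHP.
            Task<string> errors = process.StandardError.ReadToEndAsync();
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "PHP exited with code " + process.ExitCode + ": " + errors.Result);
            }
            return output;
        }
    }
}
```

For example, `PhpRunner.Run(@"C:\scripts\pagerank.php")` (an illustrative path) would return the script's echoed output as a string for the C# side to parse, which keeps the awkward bit-shifting logic inside PHP.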
