SQL Server offers Full Text Search capabilities integrated within it’s framework, Full-Text Search offers very fast search capabilities over large text columns, along with advance search features such as Stemming or Thesaurus, as well as Infection (tense) Searches and Proximity Search, to name a few.
Out of the box, SQL Server’s FullText Search currently does not ship with a thesaurus for any language, here I present an English thesaurus that can be used to stem English language searches.
Microsoft did promise that a thesaurus will be included in future editions of SQL Server, although until then, here is a SQL Server English Language Thesaurus (XML), formatted in the exact manner accepted by SQL Server’s FULL-TEXT Search engine.
This XML thesaurus is based on the open-source CSV thesaurus that can be found at Stormrage.
There is an excellent post by Lowell on SQL Server Central, which shows how you can upload the thesaurus CSV file onto a structured table.
The imported data will appear in two tables, TheThesaurus and TheSynonyms, as SQL Server’s thesaurus takes in an XML file, you will need to write a query (FOR XML) that joins up these two tables to produce the output required by the FT Engine. I like FOR XML PATH, here is my query that I used to convert the data to an acceptable format for SQL Server:
SET @ThesaurusXML = (SELECT
T.[Word] AS pat,
T.[Word] AS sub,
SELECT [TheDefinition] AS sub
FROM TheSynonyms S
WHERE T.ThesaurusID = S.ThesaurusID
FOR XML PATH(''),TYPE
-- T.ThesaurusID IN (252,1)
FOR XML PATH('replacement'),ROOT('thesaurus'),TYPE
SQL Server stores thesaurus XML files by default within its installation structure, generally can be found in the following path structure:
..\Program Files\Microsoft SQL Server\MSSQL
There you will see a collection of files, each file is relevant to a specific language code, aside from the global file (tsglobal.xml).
The global thesaurus file by default is run against every query using the keyword expansion feature, as well as the language specific thesaurus file. Be careful not over-load your global thesaurus, particularly if you are running a multi-language Full-Text search engine.
The XML thesaurus file is set to be diacritics insensitive, you can change the XML value to 1 (from 0) if that is a business requirement.
After uploading the XML thesaurus, you will need to load the FullText Thesaurus file using the following SQL command:
Make sure you have enough memory and CPU resources to process the XML file, it can be resource intensive.
Full-Text is a very cool technology and easy to setup too, I really recommend having a browse through the advance features offered by this search framework.