Twitter Sentiment Analysis Training Corpus (Dataset)
An essential part of creating a Sentiment Analysis algorithm (or any Data Mining algorithm for that matter) is to have a comprehensive dataset or corpus to learn from, as well as a test dataset to ensure that the accuracy of your algorithm meets the standards you expect. This will also allow you to tweak your algorithm and deduce better (or more precise) features of natural language that you could extract from the text that contribute towards stronger sentiment classification, rather than using a generic “word bag” approach.
This post will contain a corpus of already classified tweets in terms of sentiment, this Twitter sentiment dataset is by no means diverse and should not be used in a final product for sentiment analysis, at least not without diluting the dataset with a much more diverse one.