Last weekend, I had some time to work on a sentiment analysis project. Specifically, I looked at bitcoin sentiment by analyzing freshly collected tweets.
To collect the fresh tweets, I followed the example from Microsoft. I like this method very much because it includes a progress bar, so you can follow the collection in real time. This matters most when you collect tweets on a topic that is not very popular, since that can take several hours. One thing you may not like is that the “progressbar” package for Python 3 is not available for Anaconda, at least as far as I tested.
To restrict the topic to bitcoin, I added “bitcoin” and “cryptocurrency” as filters. I collected 10,000 tweets in total. Since this is a weekend project, I didn’t have much time to write my own sentiment algorithm. Fortunately, there is a Python package called “TextBlob” which simplifies text processing. You can find the information about TextBlob here. This package provides a simple API for common natural language processing (NLP) tasks such as classification and sentiment analysis.
To learn a little more about the tweet contents, I pooled all the tweet text together. To make it a bit cleaner, I filtered out stop words and punctuation. Since there are a lot of links and retweets, I also filtered out three additional strings (“http”, “https” and “RT”). After this simple pre-processing, I was able to generate a word cloud, as shown below, using the wordcloud Python package. As expected, Bitcoin is the most frequent word in the tweets. Other related words such as blockchain, Ethereum, and cryptocurrency also appear with high frequency.
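The pre-processing step can be sketched roughly as follows. This is a minimal illustration rather than the exact code I ran: it uses only the standard library, the stop-word list is a tiny hypothetical one (a real run would use a fuller list), and the function names `clean_tokens` and `word_frequencies` are made up for this example. It also drops whole link tokens that start with “http”, which is a slightly stricter variant of filtering the strings “http” and “https”.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "is", "the", "to", "of", "in", "for", "on"}
# Extra strings filtered in the post, compared in lower case.
EXTRA_FILTERS = {"http", "https", "rt"}


def clean_tokens(tweet):
    """Return lower-cased words with links, punctuation, and stop words removed."""
    tokens = []
    for raw in tweet.split():
        # Drop whole link tokens instead of leaving URL fragments behind.
        if raw.lower().startswith("http"):
            continue
        word = raw.strip(string.punctuation).lower()
        if word and word not in STOP_WORDS and word not in EXTRA_FILTERS:
            tokens.append(word)
    return tokens


def word_frequencies(tweets):
    """Pool all tweets and count how often each cleaned word appears."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tokens(tweet))
    return counts


sample = [
    "RT @user: Bitcoin is the future! https://t.co/abc",
    "Bitcoin and Ethereum on the blockchain.",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A frequency table like `freqs` is the kind of input the wordcloud package can draw from, for example through `WordCloud().generate_from_frequencies(freqs)`.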
To analyze the sentiment itself, I used the TextBlob package mentioned above. Out of the 10,000 tweets, 5,493 are classified as neutral, 3,698 as positive, and 809 as negative. So about 55 percent of the tweets are neutral, about 37 percent are positive, and only about 8 percent are negative.
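For readers who want to try something similar, here is a small sketch of how tweets can be sorted into the three classes with TextBlob. The polarity score (`TextBlob(text).sentiment.polarity`, a float between -1 and 1) comes from the package itself; the thresholds are an assumption for this example (above zero is positive, below zero is negative, exactly zero is neutral), and the helper names `label_from_polarity`, `classify_tweet`, and `summarize` are hypothetical.

```python
from collections import Counter


def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a class (assumed thresholds at zero)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def classify_tweet(text):
    """Classify one tweet using TextBlob's polarity score."""
    # Imported here so the helpers above and below work without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)


def summarize(labels):
    """Return {label: (count, percentage of all labels)}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Rebuild the class shares from the counts reported in this post.
reported = ["neutral"] * 5493 + ["positive"] * 3698 + ["negative"] * 809
summary = summarize(reported)
print(summary)
```

With TextBlob installed, `summarize(classify_tweet(t) for t in tweets)` would give the same kind of table for a fresh batch of tweets.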
As I mentioned earlier, I didn’t have much time to work on the project, so the processing steps are fairly simple. For example, I did not use the emojis in the tweets, which could be a good indicator of sentiment. Furthermore, the sample size is rather small, so it may not reflect real-world sentiment. Still, from this analysis, it seems that bitcoin sentiment is pretty positive: positive tweets outnumber negative ones by more than four to one. If you want to learn more about how I did it, please feel free to check it out on my GitHub.
If you have any comments, you are more than welcome to leave them. Thank you for your time.