Over the last few weeks I’ve refined and partially overhauled my algorithms for analyzing sentiment in tweets, with some notable results. Here is what I have come up with so far. I am starting to feel like I’m doing science instead of the tedious tasks that filled my previous semesters.
Reading and interpreting sentences with an algorithm instead of doing it yourself is tough. Reading tweets is worse, so much worse. Let me tell you about some concepts I came up with.
Parsing data files is always a little difficult, since you can’t be sure that your data is formatted properly. I mentioned in earlier posts that I am currently creating a Reader for my training data. Here is how it is going.
I am far from writing the best code possible, but last week I spent some time on a half-decent Reader for my training data sets. I will write my code in Java, since it’s the programming language I’m most fluent in. But first I’ll write a few lines about the pipeline.
Turns out, twitter.com doesn’t like it when you request tons of tweets, and it took me quite a while to get a decent number of them. Here is a little update on what I have collected so far.