The second AI ZA meetup was held on 21 June 2016 at Entelect’s headquarters in Melrose Arch, Johannesburg. The event was packed with over 50 attendees eager to learn about artificial intelligence and machine learning.
The goal of this meetup was to continue with a more in-depth classification problem, as well as to run through the introductory concepts for those who were new to the meetup.
The first concept covered was how to determine whether a classification algorithm is suitable. The diagram below illustrates a decision tree for deciding whether a classification algorithm is right for the problem and dataset at hand.
Two families of algorithms were explored: Linear Support Vector Classification (Linear SVC), a linear Support Vector Machine (SVM), and Naive Bayes. These were chosen to illustrate how classification algorithms differ and where each is most effective.
Linear SVC is a supervised learning algorithm that works well in high-dimensional spaces, where the number of features being analyzed is large. However, it can perform poorly when the number of dimensions far exceeds the number of samples.
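To make the idea concrete, here is a minimal from-scratch sketch of a linear SVM classifier, trained with the Pegasos stochastic subgradient method on the hinge loss. This is not necessarily what was used at the meetup; in practice you would more likely use a library implementation such as scikit-learn's `LinearSVC`, which relies on a different solver. The function names, the regularization strength `lam`, and the toy two-dimensional data are illustrative assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a
    (regularized) bias term. Minimizes
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size used by Pegasos
            # Check the margin with the current weights, before updating.
            violated = y[i] * dot(w, data[i]) < 1
            # Gradient step for the L2 regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge loss is active: push w toward classifying x_i correctly.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Toy, linearly separable data (made up for illustration).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.5, 2.5]), predict(w, [-2.5, -2.5]))  # 1 -1
```

The cost of each update grows with the number of features, not the number of training samples seen so far, which is part of why linear SVMs scale to large, sparse text feature spaces.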
Naive Bayes assumes that each feature is independent of the others given the class, so it works best when relationships between features can safely be ignored. This makes it a good classification algorithm for the problem tackled in the hackathon, text analysis, where each word can be treated as a separate feature.
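The independence assumption is what makes Naive Bayes cheap for text: a document's score for a class is the class prior multiplied by one probability per word. The sketch below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, written from scratch; the tokenizer, function names, and the four toy reviews are hypothetical and not taken from the hackathon code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes: a deliberately simple tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc):
    """Return the class with the highest log-posterior for doc."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class): the "naive" independence assumption.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(doc):
            if word not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    "I love this tea, great flavour",
    "Great snack, love it",
    "Awful taste, I hate it",
    "Stale and awful, would not buy",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "love the flavour"))  # positive
print(predict_naive_bayes(model, "awful, I hate it"))  # negative
```

Working in log space avoids numerical underflow when a review contains many words, and the smoothing stops a word that never appeared with a class from zeroing out that class's score.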
The goal of the hackathon was to analyze the Amazon Fine Food Reviews dataset and determine whether a review's score could be predicted from the reviewer's written text. Most of the group teamed up to tackle this problem together, whilst some got up and running with the introductory hackathon on password strength.
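Before either algorithm can be applied, each review has to be turned into a feature vector. A common approach, assumed here rather than taken from the hackathon code, is a bag-of-words representation: one dimension per vocabulary word, holding that word's count in the review. The helper names and the two short reviews below are made up for illustration.

```python
import re


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Map each distinct word to a fixed column index (sorted for determinism)."""
    words = sorted({word for doc in docs for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(doc, vocab):
    """Return a count vector with one entry per vocabulary word."""
    vector = [0] * len(vocab)
    for word in tokenize(doc):
        if word in vocab:  # words outside the vocabulary are dropped
            vector[vocab[word]] += 1
    return vector


reviews = ["Great coffee, great price", "Bitter coffee"]
vocab = build_vocabulary(reviews)
vectors = [vectorize(review, vocab) for review in reviews]
print(vocab)    # {'bitter': 0, 'coffee': 1, 'great': 2, 'price': 3}
print(vectors)  # [[0, 1, 2, 1], [1, 1, 0, 0]]
```

With real reviews the vocabulary runs to many thousands of words, which is the high-dimensional setting in which Linear SVC is typically applied.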
After 2 hours of hacking away on the text sentiment problem, some came close to a solution but realized that sentiment analysis is difficult and often inaccurate. Many were looking forward to expanding on their code after the meetup to improve its accuracy.
Simply querying the dataset revealed some interesting trends. Many 5-star reviews contain the word “love”, and about a third of those also contain the word “hate”! The dataset also contains a considerable number of 1-star reviews with the word “love” or “like” in them. This shows that relying on keywords alone for text analysis can be extremely inaccurate.
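The kind of query described above, and why a keyword rule struggles, can be sketched on a handful of reviews. The records, the `keyword_share` helper, and the "love means 5 stars" rule are hypothetical stand-ins, not the actual dataset or anyone's hackathon solution, so the numbers say nothing about the real data.

```python
import re


def words(text):
    # Set of lowercase word tokens in a review.
    return set(re.findall(r"[a-z']+", text.lower()))


# Hypothetical (score, text) pairs standing in for rows of the review dataset.
reviews = [
    (5, "I love this cereal and my kids love it too"),
    (5, "I hate most diet snacks but I love these"),
    (5, "Excellent value, arrived quickly"),
    (1, "I wanted to love this, but it tasted like cardboard"),
    (1, "Arrived stale, I hate wasting money"),
]


def keyword_share(reviews, score, word, also):
    """Count reviews with the given score containing `word`, and how many also contain `also`."""
    matching = [text for s, text in reviews if s == score and word in words(text)]
    both = [text for text in matching if also in words(text)]
    return len(matching), len(both)


def keyword_classifier(text):
    # Naive rule (hypothetical): any mention of "love" means 5 stars, otherwise 1 star.
    return 5 if "love" in words(text) else 1


accuracy = sum(keyword_classifier(text) == s for s, text in reviews) / len(reviews)
print(keyword_share(reviews, 5, "love", "hate"))  # (2, 1)
print(accuracy)  # 0.6
```

Even on five reviews the rule misfires in both directions: a 5-star review without "love" is scored 1, and a 1-star review that mentions wanting to "love" the product is scored 5.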
The meetup ended with some interesting casual chats about artificial intelligence, philosophy, ethics, virtual reality, and our future as humans.
To tinker with this problem, visit the GitHub repository.