How to approach sentiment analysis of user reviews

Athanasios Liapis

In this article, we discuss how to develop a mathematical model to analyse sentiment, using a hypothetical scenario of a client wishing to classify a large volume of user comments as having a positive or negative sentiment.
The data in this case consists of online comments on a product. The comments do not have a specific structure but are received as free text.

Strategy

To carry out this analysis, we need a model that classifies the input data. A hand-written mathematical model is impractical here: free text has an effectively unbounded number of parameters, so the result would be too complicated and never accurate enough.
Therefore, we choose to train a model using machine-learning techniques. More specifically, we will train a neural network model using data that we have manually classified as positive or negative.
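To make the idea of training on manually classified data concrete, here is a minimal sketch. It is not a production approach: it uses a single artificial neuron (logistic regression over word counts) as a stand-in for a full neural network, and the six labelled comments in TRAIN are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical hand-labelled comments: 1 = positive, 0 = negative.
TRAIN = [
    ("Great product, works perfectly", 1),
    ("I love it, excellent quality", 1),
    ("Excellent value and great support", 1),
    ("Terrible, broke after a day", 0),
    ("Awful quality, I hate it", 0),
    ("Broke immediately, terrible support", 0),
]


def tokenize(text):
    """Lower-case a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_proba(weights, bias, text):
    """Return the model's probability that a comment is positive."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights[word] * n for word, n in counts.items())
    z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train(labelled, epochs=200, learning_rate=0.1):
    """Learn one weight per word by gradient descent on the log loss."""
    weights = Counter()  # unseen words count as weight 0
    bias = 0.0
    for _ in range(epochs):
        for text, label in labelled:
            error = predict_proba(weights, bias, text) - label
            for word, n in Counter(tokenize(text)).items():
                weights[word] -= learning_rate * error * n
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(TRAIN)
for comment in ["Great product, excellent value",
                "Awful, terrible, broke after a day"]:
    p = predict_proba(weights, bias, comment)
    print(f"{p:.2f}  {'positive' if p >= 0.5 else 'negative'}  {comment}")
```

A real project would use far more data and an established machine-learning library, but the principle is the same: the weights are learned from the labelled examples rather than written by hand.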

The decision to follow this route is based on:

    amount of previous data available – the amount needed to train a neural network model depends on the complexity of the problem and the accuracy we want to achieve
    quality of the data – poor-quality data will affect the result; in this example we control the quality by classifying the comments manually
    target accuracy – different applications (prediction, decision making, etc.) have different needs; in this example we can tolerate a small percentage of errors; the accuracy reached in similar applications is above 95%
    complexity of the model – despite advances in AI, its applications still cannot establish cause-and-effect relationships; in some cases a solid mathematical model created by a human operator is more accurate

Solution

This supervised solution includes many steps that can be split across organisations or handled by a single organisation. Oxford Computer Consultants has previously delivered both complete solutions and individual parts of them, depending on the client’s needs. The steps are:

  1. Data Collection
    a. Get access to the data from any source and in any format.
  2. Data Preparation
    a. Clean – The data will not always be in a uniform format. It may have been obtained from various sources with different coding, format, etc. So, standardisation may be needed.
    b. Label – The existing data is one of the most important assets. It needs to be labelled manually. The precision of this process will significantly affect results in the long term. In this case, it involves reading the comments and labelling them as positive/negative.
    c. Split into train and test data sets – The larger part of the labelled data will be used to train the model. A smaller part will be held back to act as the evaluation sample.
  3. Choose a Model
    a. Artificial neural networks
    b. Decision trees
    c. Support vector machines
    d. Regression analysis
    e. Bayesian networks
    f. Genetic algorithms
  4. Train the Model – Run the chosen algorithm against the training data.
  5. Evaluate the Model – Run the trained model on the evaluation sample and measure the accuracy.
  6. Parameter Tuning – Depending on how the results will be used, it may be necessary to fine-tune some parameters. For example, if it is more important that comments classified as negative really are negative, we can reduce the probability of a comment being falsely classified as negative.
  7. Consume – When the model is trained to meet our needs, then it is ready to classify unlimited input data. This can become part of a tailor-made automated solution, or part of software already in use.
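Steps 2, 5 and 6 above can be sketched with the Python standard library alone. The function names, the 80/20 split and the example scores are assumptions made for illustration; the scores stand in for the probabilities that a trained model would output.

```python
import html
import random
import re
import unicodedata


def clean(comment):
    """Standardise one raw comment (step 2a)."""
    text = html.unescape(comment)               # "&nbsp;" -> real character
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants
    text = re.sub(r"<[^>]+>", " ", text)        # drop leftover HTML tags
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def split(labelled, test_fraction=0.2, seed=0):
    """Shuffle labelled rows and return (train, test) lists (step 2c)."""
    rows = list(labelled)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def evaluate(scores, labels, threshold=0.5):
    """Compare model scores with manual labels (steps 5 and 6).

    A comment is classified as positive when its score >= threshold.
    """
    pairs = list(zip(scores, labels))
    false_negatives = sum(1 for s, y in pairs if y == 1 and s < threshold)
    false_positives = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    correct = len(pairs) - false_negatives - false_positives
    return {
        "accuracy": correct / len(pairs),
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }


# Hypothetical model outputs for five evaluation comments and their labels.
SCORES = [0.9, 0.6, 0.4, 0.2, 0.45]
LABELS = [1, 1, 1, 0, 0]

print(clean("  Great&nbsp;product!<br/>Works   well "))
print(evaluate(SCORES, LABELS, threshold=0.5))
print(evaluate(SCORES, LABELS, threshold=0.3))
```

Lowering the threshold from 0.5 to 0.3 removes the false negative in this toy sample at the cost of one false positive. This is the kind of trade-off that parameter tuning has to settle according to how the results will be used.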

Further possible steps

Parts of this solution can be automated so that the data flow is uninterrupted during consumption.

The challenge lies in the handling of raw data. It may need to be given a specific structure, or to have other automated techniques applied during standardisation. Typical examples are anonymisation and data mining.
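As an illustration of automated anonymisation, the sketch below masks e-mail addresses and phone-like numbers before a comment is stored or labelled. The two patterns are rough assumptions, not a complete personal-data filter: they will miss some identifiers and may mask other long digit sequences.

```python
import re

# Rough, illustrative patterns only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymise(comment):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", comment)
    return PHONE.sub("[PHONE]", text)


print(anonymise("Contact me at jane.doe@example.com or +44 1865 000000."))
```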
To further improve the accuracy of the solution, we recommend that future data are reviewed and incorporated into the training data set.
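One possible way to fold newly labelled comments into the training set is sketched below. The function name and the rule that a newer label replaces an older one for the same comment are assumptions; the right policy depends on the client's labelling process.

```python
def merge_training_data(existing, new_batch):
    """Return the training set extended with newly labelled comments.

    Comments are matched on lower-cased, whitespace-normalised text.
    When the same comment appears twice, the later entry wins.
    """
    merged = {}
    for text, label in list(existing) + list(new_batch):
        key = " ".join(text.lower().split())
        merged[key] = (text, label)
    return list(merged.values())


# Hypothetical data: 1 = positive, 0 = negative.
existing = [("Great product", 1), ("Broke in a week", 0)]
new_batch = [("great  product", 1), ("Not bad at all", 1)]

training_set = merge_training_data(existing, new_batch)
print(len(training_set), "labelled comments available for retraining")
```

The model is then retrained on the merged set and re-evaluated before it replaces the version in use.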

Conclusion

A simple solution like this leverages assets including collected data and the client’s classification expertise. Combining these, we can achieve medium- and long-term resource savings, or even exploit data on a scale that was not possible before.
Artificial Intelligence is already offering some beneficial solutions for businesses. Take advantage of them and stay ahead of the competition.

Athanasios Liapis is a Software Engineer at Oxford Computer Consultants working on client projects involving AI, modelling and machine learning.