Trialling automated sentiment analysis

Introduction

…A deep up-pouring from some saltier well  

Within me, bursts its watery syllable.

Le Monocle de Mon Oncle, Wallace Stevens

My goals are to build my financial strength and stay in the black so that i can relx and not worry about where im getting my money.

Research participant, adapted

 

Language is slippery, intricate, peculiar. To interpret even simple sentences, never mind the above examples, requires an understanding of, not merely word definitions and grammatical rules, but nuance and context. As humans, we’re language experts – some, like Steven Pinker, have even argued that language is a human instinct – and as such, after years of social interaction, we’re able to do much of the job of interpretation automatically, without conscious effort, just as most of us don’t have to think about the complex motor movements involved each time we walk.

Text analytics, or text mining, is the process of deriving high-quality information from text. And just like so many human processes, we humans are trying to get computers to do it for us. Here at Relish, we’re always interested in learning more about what’s out there. Can we researchers use automated text analytics, either to uncover more and better insight, or to get to the insight faster?

So, where to start?

This was a daunting question. A great many text analytics tools exist, in part because text analytics can take many forms, including categorisation, clustering, summarisation and sentiment analysis.

To analyse large datasets covering many topics – such as all the tweets this year that mention a brand, or every first line given to a chatbot helpdesk – categorisation and clustering are vital. That’s because, in order to make sense of the data, it’s necessary to put strings of text into different ‘buckets’ based on what they’re about.

However, as primary researchers, our data looks different. We’re typically able to direct participants towards specific areas for discussion, and the relatively small size of our datasets can make clustering less important. So, in trying to conduct automated text analytics ourselves, we focussed first on sentiment analysis. Not so much the meanings of the words, but the emotions underneath.

At this point I should say that I started out sceptical about how good a job software could really do here, and that this scepticism often proved well-founded. Rather than a sales pitch, what follows is an honest account of trialling a sentiment analysis tool, with an assessment of its strengths and limitations.

The trial

We ran a qualitative online community in which participants completed a week-long diary task. The text from this community, which after cleaning came to over 2,500 text strings, would be our dataset. Our objective was to understand the emotions and sentiments behind participants’ language, and whether different cohorts talked about themselves and their behaviour in different ways. How did our findings compare to those reached through regular human analysis?

We trialled multiple ‘packages’ constructed to run within R, and the most successful trial was with a package called syuzhet. This was designed to apply sentiment analysis to literary texts – syuzhet is actually a term from Russian literary criticism – and it works with the NRC emotion lexicon, which categorises over 14,000 English words according to two sentiments (negative and positive) and eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust). Scores for individual words can then be aggregated across sentences.
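The word-level lookup and aggregation at the heart of this approach is simple to picture. Here’s a minimal sketch in Python, using a hand-made mini-lexicon in the shape of the NRC data – the entries are invented for demonstration, and this is not syuzhet’s actual code:

```python
from collections import Counter
import re

# Illustrative mini-lexicon: each word maps to the sentiment/emotion
# categories it is tagged with. Entries are made up for demonstration;
# the real NRC lexicon covers over 14,000 words.
NRC_MINI = {
    "build":    {"positive"},
    "strength": {"positive", "trust"},
    "black":    {"negative", "sadness"},
    "worry":    {"negative", "anticipation", "fear", "sadness"},
    "money":    {"positive", "anger", "anticipation", "joy", "surprise", "trust"},
}

def score_sentence(sentence):
    """Look up each word and total the category counts across the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    totals = Counter()
    for word in words:
        totals.update(NRC_MINI.get(word, set()))
    return totals
```

Running `score_sentence` over a sentence returns a tally per category; words absent from the lexicon (including typos) simply contribute nothing.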

Let’s take a look at our example participant text, which I’ve adapted for demonstration purposes:

My goals are to build my financial strength and stay in the black so that i can relx and not worry about where im getting my money.

Makes sense … sort of … maybe. How did syuzhet arrive at these scores? It looks up individual words in the NRC emotion lexicon, and totals the scores across the entire sentence:

Taking the most interesting words in turn:

  • build: This, correctly as I see it, is scored positively.
  • strength: Not only a positive score, but one for trust as well, which makes sense.
  • black: syuzhet is not sophisticated enough to pick up on the idiom in the black, which of course has positive associations with being in credit (as opposed to in the red). Instead, we get scores for sadness and negativity. So, completely wrong.
  • relx: The typo here means no score, illustrating the importance of correcting typos where possible.
  • worry: A word that does indeed have negative associations with anticipation, fear and sadness, but here it’s qualified by the word not. However, one could argue that its use in this instance still carries the negative associations, even though they are something the writer is trying to avoid. This is an example of where human analysts might disagree.
  • money: This is associated with anger, anticipation, joy, surprise, trust and positivity. As stated above, syuzhet is designed for literary texts, and this is an example of a more literary, or even poetic, interpretation. Understandable, but to me, some of these feel a bit of a stretch.

 

A mixed picture. How did syuzhet do when looking at large volumes of text across different cohorts, in comparison to human analysis?

Human analysis of participants’ responses uncovered various differences between the cohorts, shaped by the types of behaviours and products they were using. Syuzhet, when analysing responses to the relevant questions across participant types, also found these differences. For example, text strings from one cohort in particular scored more highly on anticipation and joy, and syuzhet’s higher overall sentiment scores showed that they tended to be more positive about their lives in general.

So, for all its limitations, automated sentiment analysis uncovered some of the same insights as human analysis. And while there were questions for which syuzhet’s scores made little sense, in other cases, especially where participants used emotionally charged language, it did a good job. Scores here enabled us, simply by sorting or filtering sentences, to find powerful examples of participants displaying their emotions.
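That sorting-and-filtering step can be as simple as ranking text strings by a chosen emotion score. A hypothetical Python sketch, assuming the scores have already been computed per string (the example responses and scores below are invented):

```python
def top_strings(scored, emotion, n=3):
    """Return the n (text, scores) pairs ranked highest for one emotion."""
    return sorted(scored, key=lambda pair: pair[1].get(emotion, 0), reverse=True)[:n]

# Invented example data: each response paired with its emotion tallies.
responses = [
    ("It was fine, nothing special", {"joy": 0}),
    ("I was absolutely thrilled and delighted", {"joy": 3}),
    ("So happy I could burst", {"joy": 2}),
]

best = top_strings(responses, "joy", n=2)
```

The top-ranked strings are then natural candidates for verbatim quotes in reporting.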

Learnings and next steps

I fully acknowledge that we’ve only just scratched the surface of what sentiment analysis can offer, and that our simple trial had its limitations. Nevertheless, we learned a lot for next time:

  • Most importantly, automated sentiment analysis is no substitute for its human equivalent. You might be worried about robots taking your job, and you might even be right to be worried. But syuzhet is not that robot (unless you’re, you know, really bad). This tool simply cannot provide the richness and insight of human analysis…
  • …but it can complement or accelerate it. Data preparation and R coding aside, the generation of scores – imperfect, but internally consistent – takes only a few seconds; and as we’ve seen, in some circumstances they can be valuable. Automated sentiment analysis can generate hypotheses, rapidly identify the most powerful text strings, and provide quantified evidence.
  • Speaking of data preparation, don’t underestimate the amount of time the cleaning process takes. This could include correcting spelling errors, removing brand or concept names, and even developing domain-specific lists of words that should also be excluded from analysis. (A starting point for developing such lists might be word frequency analysis.)
  • The question matters. We’re trying to understand broad sentiments and emotions, not minor details or technical processes. Our results made most sense where we invited an emotional response.
  • When presenting, illustrate what the scores mean through the most powerful examples. Comparison, whether between audiences or concepts, is more powerful and meaningful than total scores, which can feel abstract and be hard to interpret.
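As a concrete starting point for the word-frequency suggestion above, a few lines of Python are enough to surface over-represented words that might belong on an exclusion list. The data and brand name below are invented for illustration:

```python
from collections import Counter
import re

def word_frequencies(strings):
    """Count word occurrences across a list of text strings."""
    counts = Counter()
    for s in strings:
        counts.update(re.findall(r"[a-z']+", s.lower()))
    return counts

# Invented example dataset: the brand name dominates the counts,
# flagging it as a candidate for exclusion before sentiment scoring.
data = [
    "BrandX helps me relax",
    "I use BrandX every day",
    "BrandX again, and again BrandX",
]
freq = word_frequencies(data)
```

Inspecting `freq.most_common()` quickly shows which words dominate the dataset for reasons unrelated to sentiment.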

 

For the future, where we include suitable questions in our research (either qualitative or quantitative), or are otherwise analysing a large corpus of text, we’ll continue to experiment with syuzhet, and where appropriate use the results to complement – not replace – our more traditional approaches. We’ll also continue to investigate alternative tools and packages, both for sentiment analysis and other forms of text analytics.

Joe