Triangulated Sentiment Analysis of Tweets


Dr Simone E Griesser
School of Applied Psychology, FHNW University of Applied Sciences and Arts Northwestern Switzerland

Dr Neha Gupta
Warwick Institute for the Science of Cities, University of Warwick, United Kingdom

6th Swiss Data Science Conference, 14th June 2019, Berne

Agenda

• Research Motivation
• Overview of Lexical, Machine Learning, and Psycholinguistic Sentiment Approaches
• Dataset
• Sentiment Analysis with Lexical, Machine Learning, and Psycholinguistic Approaches
• Results
• The Nuances of Psycholinguistics: Sentiment Intensity
• Outlook

14 June 2019

Institute for Market Supply and Consumer Decision-Making, Dr Simone E Griesser

Research Motivation: Data Abundance and Lack of Interdisciplinary Approach

• Computer science has been at the forefront of developing sentiment scoring approaches.
• An increasing amount of unstructured language data reflecting consumer opinions is available.
• The goal is to understand the similarities and differences between sentiment approaches from computer science and psychology. These approaches complement each other (Brysbaert, Keuleers & Mandera, 2014).

Computer Science Approaches: Lexical and Machine Learning

[Diagram: data dictionaries (lexicons), word comparisons, and sentiment scores, shown for the lexical method and for several machine learning methods]

Psychological Approach: Psycholinguistics

• Psycholinguistics is concerned with language comprehension and the relationship between language and psychological processes (Miller, 1965; Rubenstein & Aborn, 1960).
• It views sentiment as a continuum and differentiates between positive emotions by degree, e.g. how positive an emotion is.
• Emotional experiences are multidimensional (Warriner et al., 2013).

[Figure: the words content, happy, joyous, and excited placed along the sentiment positivity axis]

Comparing and Contrasting the Different Approaches

Ensuing Propositions

• The Lexical and Psycholinguistic approaches are similar because both operate on unigrams.
• The Lexical and Machine Learning approaches are similar because they start from the same initial dictionary database.
• The Lexical and Machine Learning approaches are similar in how the score is calculated: the number of negative word occurrences is subtracted from the number of positive word occurrences.

             Lexical               Machine Learning                      Psycholinguistics
N-gram       Unigram               Bigram                                Unigram
Resources    1 or >1 lexicon(s)    1 or >1 lexicon(s), training data     1 dictionary database (lexicon)

The Dataset and Research Context: Service Outage

• Skype outage on 21st September 2015.
• Data collection with the Twitter Streaming API and the twitter4j Java library.
• Real-time collection of 1% to 40% of sent tweets.
• Use of the keywords ‘#skypedown’ and ‘skypedown’ in the tweet text.
• Collection of approximately 10,000 tweets.

Sentiment Scoring: Lexical Approach

• Remove stop words from the text.
• Extract unigrams (single words).
• Obtain a sentiment score per word from the Bing Liu lexicon.
• Classify tweets into positive, negative, and neutral categories.

[Diagram: obtain text, data cleaning, bag of words, compare with opinion lexicons, scoring algorithm]
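The lexical steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the word sets `POSITIVE`, `NEGATIVE`, and `STOP_WORDS` are tiny hypothetical stand-ins for the Bing Liu lexicon and a real stop-word list, and the function names are made up for this example.

```python
import re

# Hypothetical mini-lexicon and stop-word list, for illustration only.
# A real run would load the full Bing Liu opinion lexicon instead.
POSITIVE = {"good", "great", "love", "thanks", "working"}
NEGATIVE = {"down", "broken", "hate", "outage", "fail"}
STOP_WORDS = {"the", "is", "a", "an", "i", "my", "it", "to", "and", "again"}


def lexical_score(tweet: str) -> int:
    """Count positive minus negative unigrams after stop-word removal."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    unigrams = [t for t in tokens if t not in STOP_WORDS]
    positives = sum(t in POSITIVE for t in unigrams)
    negatives = sum(t in NEGATIVE for t in unigrams)
    return positives - negatives


def classify(tweet: str) -> str:
    """Map the score to the three categories used on the slide."""
    score = lexical_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for example in ["Skype is down again, I hate it", "Thanks Skype, working great"]:
    print(example, "->", classify(example))
```

Because every lexicon word contributes exactly +1 or -1, words of different emotional strength are indistinguishable in this scheme.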

Institute for Market Supply and Consumer Decision-Making, Dr Neha Gupta

Sentiment Scoring: Machine Learning Approach

• Divide the data into a labelled training dataset and a test dataset.
• Train the machine learning model (Logistic Regression).
• Check performance with cross-validation.
• Run the model on unseen data.
• Repeat.
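The loop above can be sketched with a small logistic regression written in plain Python. Everything here is an assumption made for illustration: the six labelled tweets in `TEXTS` and `LABELS` are invented, the features are raw bigram counts only, and the helper names `bigrams`, `train`, `predict`, and `cross_validate` are hypothetical. The slides do not state how the actual model was configured.

```python
import math
from collections import Counter

# Invented labelled examples (1 = positive, 0 = negative), for illustration only.
TEXTS = ["skype is down again", "skype is working fine",
         "cannot log in skype is down", "so glad skype is working",
         "skype down for hours", "skype works great today"]
LABELS = [0, 1, 0, 1, 0, 1]


def bigrams(text):
    """Return the word bigrams of a lower-cased, whitespace-tokenised text."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def train(texts, labels, epochs=200, lr=0.5):
    """Fit logistic regression weights on bigram counts by gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = Counter(bigrams(text))
            z = bias + sum(weights[f] * c for f, c in feats.items())
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            bias -= lr * error
            for f, c in feats.items():
                weights[f] -= lr * error * c
    return weights, bias


def predict(model, text):
    """Return 1 (positive) or 0 (negative) for a text."""
    weights, bias = model
    z = bias + sum(weights[f] * c for f, c in Counter(bigrams(text)).items())
    return 1 if z > 0 else 0


def cross_validate(texts, labels, k=3):
    """Mean accuracy over k folds; fold j holds out every example with index % k == j."""
    scores = []
    for fold in range(k):
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        test_idx = [i for i in range(len(texts)) if i % k == fold]
        model = train([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, texts[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k


print("cross-validated accuracy:", cross_validate(TEXTS, LABELS))
model = train(TEXTS, LABELS)
print("unseen tweet:", predict(model, "skype is down"))
```

With only six examples the accuracy figure is not meaningful; the point is the order of the steps: split, train, validate, then score unseen tweets.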

Sentiment Scoring: Psycholinguistic Approach

• Sentiment positivity ratings for 13,915 word lemmas (Warriner et al., 2013).
• Each word has been rated by at least 18 individuals.
• The database has recently been used in the consumer behaviour discipline (Ren & Nickerson, 2014; Hildebrand et al., 2017).

Processing steps, from tweets about the Skype outage to sentiment positivity per tweet:

• Remove numbers, website links, emoticons, special characters, and stop words.
• Tokenise the cleaned tweets.
• Rate the remaining words according to the word sentiment positivity ratings in the psycholinguistic database.
• Compute the mean and median sentiment positivity per tweet.

Sentiment positivity rating scale, 1 to 9:

1 = completely unhappy, annoyed, unsatisfied, melancholic, or despaired
9 = completely happy, pleased, satisfied, or contented
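A minimal Python sketch of these processing steps follows. The ratings in `VALENCE` are hypothetical values on the 1 to 9 scale, not the published Warriner et al. (2013) norms, and `STOP_WORDS`, `clean_and_tokenise`, and `tweet_positivity` are names invented for this example. The sketch also skips lemmatisation, which a real lookup against a lemma database would need.

```python
import re
from statistics import mean, median

# Hypothetical valence ratings on the 1-9 scale, for illustration only.
# They are NOT the published Warriner et al. (2013) values.
VALENCE = {"happy": 8.0, "call": 5.5, "problem": 2.5, "angry": 2.0, "fix": 6.0}
STOP_WORDS = {"the", "is", "a", "i", "am", "to", "and", "with", "please"}


def clean_and_tokenise(tweet):
    """Strip links, numbers, and special characters, then drop stop words."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    text = re.sub(r"[^a-z\s]", " ", text)  # removes digits, emoticons, symbols
    return [t for t in text.split() if t not in STOP_WORDS]


def tweet_positivity(tweet):
    """Return (mean, median) valence of the rated words, or None if none are rated."""
    ratings = [VALENCE[t] for t in clean_and_tokenise(tweet) if t in VALENCE]
    if not ratings:
        return None
    return mean(ratings), median(ratings)


print(tweet_positivity("I am angry, 2 hours with the problem! http://t.co/x :("))
```

Tweets without any rated word return `None` here; whether such tweets were dropped or scored differently in the study is not stated on the slides.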

Results: Visual Comparison

[Figure: two panels, each with the axis "Sentiment mean, -1 = unhappy, 1 = happy"]

Results: Statistical Comparison with Kendall’s tau

• The lexical and psycholinguistic dictionary databases (lexicons) seem to be somewhat similar.
• The approaches seem to start deviating from each other once the learning algorithm is introduced.
• Similarities or differences cannot be explained in terms of data cleaning processes or differing stop words.

                           Lexical    Machine Learning    Psycholinguistic Mean    Psycholinguistic Median
Lexical                    -          .473***             .466***                  .403***
Machine Learning                      -                   .295***                  .244***
Psycholinguistic Mean                                     -                        .847***
Psycholinguistic Median                                                            -
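For readers unfamiliar with the statistic, the following is a small pure-Python sketch of Kendall's tau for two lists of per-tweet scores. It assumes the tau-b variant, which corrects for ties; the slides do not say which variant was used, and the function name `kendall_tau_b` is made up for this example. Significance testing is left out.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length score lists (simple O(n^2) pair scan)."""
    concordant = discordant = only_x = only_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied on both variables: excluded everywhere
        if dx == 0:
            only_x += 1             # tied on x only
        elif dy == 0:
            only_y += 1             # tied on y only
        elif dx * dy > 0:
            concordant += 1         # both scores order the pair the same way
        else:
            discordant += 1         # the scores order the pair in opposite ways
    denom = sqrt((concordant + discordant + only_x) *
                 (concordant + discordant + only_y))
    return (concordant - discordant) / denom if denom else float("nan")


lexical = [-2, 0, 1, 3]
psycholinguistic = [2.1, 4.8, 5.0, 7.3]
print(kendall_tau_b(lexical, psycholinguistic))
```

A rank-based measure suits this comparison because the three approaches score tweets on different scales, so only the ordering of tweets is compared.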

The Nuances of Psycholinguistics: How to Obtain More Customer Insight

• Emotional experiences are multidimensional (Warriner et al., 2013):
• Sentiment positivity: language valence

[Figure: the words happy, joyous, excited, and content placed along the sentiment positivity axis]

• Sentiment intensity: language arousal

[Figure: the words content, happy, joyous, and excited placed by sentiment positivity and sentiment intensity]

Sentiment intensity rating scale, 1 to 9:

1 = completely relaxed, calm, sluggish, dull, sleepy, or unaroused
9 = completely stimulated, excited, frenzied, jittery, wide-awake, or aroused

The Nuances of Psycholinguistics: Use in Customer Relationship Management (CRM)

Customer Delight (Oliver, Rust & Varki, 1997): ‘delighted’ customers are more satisfied and loyal than ‘content’ customers.

• ‘Delight’ is a stronger positive emotion than ‘content’: strong emotions influence customer satisfaction more powerfully than weakly experienced emotions.
• The Lexical and Machine Learning approaches reflect these nuances poorly because:
  • The words ‘delighted’ and ‘content’ are treated as equally positive.
  • According to the computation method of the Lexical and Machine Learning approaches, these sentences would have an equal sentiment:
    “It was a joyous event, but I was displeased about the weather.”
    “It was a joyous event, but I was upset about the weather.”

The psycholinguistic approach addresses this lack of detail for CRM with nuanced sentiment positivity and sentiment intensity scores.

The Nuances of Psycholinguistics: Sentiment Intensity in CRM

• Despite the service failure, Skype customers were not strongly upset. Were perhaps only the very unhappy customers strongly upset?
  • Selection of tweets in the sample whose sentiment was three standard deviations above or below the mean.
• Correlation of sentiment positivity and sentiment intensity:
  • With increasing sentiment positivity, unhappy and happy customers use slightly calmer language (tau = -.115, z = -15.453, p < .001; tau = -.143, z = -5.185, p < .001).
  • Negligible difference in sentiment intensity between
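The selection of extreme tweets can be sketched as follows. This is one possible reading of the slide: it assumes "three standard deviations above or below the mean" means at least three standard deviations away, and it uses the population standard deviation. Neither choice is confirmed by the source, and `extreme_tweets` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def extreme_tweets(scores, n_sd=3.0):
    """Return (low, high): indices of scores more than n_sd SDs below / above the mean."""
    centre, spread = mean(scores), pstdev(scores)
    low = [i for i, s in enumerate(scores) if s < centre - n_sd * spread]
    high = [i for i, s in enumerate(scores) if s > centre + n_sd * spread]
    return low, high


# Invented per-tweet positivity scores: twenty neutral tweets and one very unhappy one.
scores = [5.0] * 20 + [1.0]
very_unhappy, very_happy = extreme_tweets(scores)
print(very_unhappy, very_happy)
```

Positivity and intensity scores can then be correlated separately within the very unhappy and the very happy group.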

Outlook

• Monitor customer sentiment positivity and intensity in written or spoken language to assess the impact on customer satisfaction of:
  • Service recovery actions.
  • Customer inconveniences, e.g. delayed or wrong delivery.
• Better understand when your customer gets frustrated with self-service technology and wants a member of staff: very negative and emotionally intense language indicates high levels of frustration.
• Monitor the performance of complaint handling or call centres by analysing customer language.
• Reduce market research and customer insight cost.
• More from psycholinguistics: language abstractness.
  • Measure similarities between brands.
  • Measure brand or product knowledge of individual customer groups.

Thank you!

Dr Simone Griesser

Senior Research and Teaching Fellow

School of Applied Psychology FHNW
Institute for Market Supply and Consumer Decision-Making
Riggenbachstrasse 16
4600 Olten
T +41 62 957 26 78
Simone.Griesser@fhnw.ch
https://www.linkedin.com/in/simonegriesser
https://www.fhnw.ch/en/people/dr-simone-griesser
