Knowledge Mining: Text mining

File: Lab_sentiment_syuzhet01.R

Theme: Running sentiment analysis using the syuzhet package

Data: Twitter data via REST API

Sample program using rtweet and syuzhet for sentiment analysis

Be sure you have a Twitter developer account before running it

install.packages(c("easypackages","rtweet","tidyverse","RColorBrewer","tidytext","syuzhet"))

library(easypackages)
libraries("rtweet","tidyverse","RColorBrewer","tidytext","syuzhet")
## Loading required package: rtweet
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks rtweet::flatten()
## x dplyr::lag()     masks stats::lag()
## Loading required package: RColorBrewer
## Loading required package: tidytext
## Loading required package: syuzhet
## 
## Attaching package: 'syuzhet'
## The following object is masked from 'package:rtweet':
## 
##     get_tokens
## All packages loaded successfully
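
The install step above reinstalls every package each time it is run. A minimal sketch of an alternative, using only base R, checks which packages are missing first. The helper name `missing_pkgs` is hypothetical and not part of easypackages.

```r
# Packages the lab relies on (same list as the install.packages() call above)
pkgs <- c("easypackages", "rtweet", "tidyverse", "RColorBrewer", "tidytext", "syuzhet")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!available])
}

to_install <- missing_pkgs(pkgs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Installing only what is missing keeps repeated runs of the script fast and avoids silently upgrading packages mid-project.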

Use rtweet to collect Twitter data via API

Prerequisite: Twitter developer account

Required package: rtweet

Create token for direct authentication to access Twitter data

token <- rtweet::create_token(
  app = "Knowledge_Mining",
  consumer_key = "KFo5Ua7O1xQDtArxRhYWyPQt9",
  consumer_secret = "O97SYv4LR05vXyvYH2ChXT2AGkFCPScfNvo3GEpz1T3teHe4Mj",
  access_token = "1502021687990128643-w8CtO5zFuGDcs2OtVU0Qs7ib9nfImQ",
  access_secret = "EPc28tX2CSSWkcqlBg5HbaAp709eX3HsuPRewogpZsu58")
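
Hard-coding keys in a script makes them easy to leak when the file is shared. A minimal sketch of one alternative is to read them from environment variables (for example, set in `~/.Renviron`). The variable names `TWITTER_CONSUMER_KEY` and so on, and the helper `read_twitter_credentials`, are assumptions for illustration and are not defined by rtweet.

```r
# Hypothetical environment variable names; choose your own and set them outside the script
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Hypothetical helper: read all four values and fail early if any is unset
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars)          # unset variables come back as ""
  not_set <- vars[!nzchar(values)]
  if (length(not_set) > 0) {
    stop("Missing environment variables: ", paste(not_set, collapse = ", "))
  }
  names(values) <- names(vars)
  as.list(values)
}

# Usage (requires the variables to be set and rtweet to be installed):
# creds <- read_twitter_credentials()
# token <- rtweet::create_token(
#   app             = "Knowledge_Mining",
#   consumer_key    = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token    = creds$access_token,
#   access_secret   = creds$access_secret)
```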

Check token

rtweet::get_token()
## <Token>
## <oauth_endpoint>
##  request:   https://api.twitter.com/oauth/request_token
##  authorize: https://api.twitter.com/oauth/authenticate
##  access:    https://api.twitter.com/oauth/access_token
## <oauth_app> Knowledge_Mining
##   key:    KFo5Ua7O1xQDtArxRhYWyPQt9
##   secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---

Collect data from Twitter using keyword “Taiwan”

tw <- search_tweets("Taiwan", n=1000)

Sentiment analysis

twtweets <- iconv(tw$text) # convert text data encoding
tw_sent_nrc <- get_nrc_sentiment(twtweets) # Get sentiment scores using NRC lexicon
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
## Please use `spread()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
barplot(colSums(tw_sent_nrc),
        las = 2,
        col = rainbow(10),
        ylab = 'Count',
        main = 'Sentiment Scores for Tweets about "Taiwan"')
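
To see what the bar plot is summing, it can help to run `get_nrc_sentiment()` on a few hand-written sentences. The sketch below uses made-up sample texts (not Twitter data) and shows that the result has one row per text and ten columns: eight emotions plus `negative` and `positive`, which is why the plot uses `rainbow(10)`.

```r
library(syuzhet)

# Made-up sample texts standing in for tweets
sample_texts <- c(
  "I love this beautiful island and its wonderful food.",
  "The storm caused terrible damage and fear.",
  "The meeting is scheduled for Monday."
)

nrc_scores <- get_nrc_sentiment(sample_texts)

dim(nrc_scores)       # rows = texts, columns = NRC categories
colnames(nrc_scores)  # eight emotions plus negative and positive

# Column totals are the bar heights in the plot above
sort(colSums(nrc_scores), decreasing = TRUE)
```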

tw_sent <- get_sentiment(twtweets) # Get one polarity score per tweet (default "syuzhet" lexicon)
plot(tw_sent, pch=20, cex = .3, col = "blue")
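
The scatter plot shows one polarity score per tweet, which can be hard to read at a glance. As a rough sketch, again on made-up sample texts rather than the collected tweets, the scores can be compared across the lexicons that `get_sentiment()` supports and then reduced to counts of negative, neutral, and positive texts. The helper name `polarity_counts` is hypothetical.

```r
library(syuzhet)

# Made-up sample texts standing in for tweets
sample_texts <- c(
  "I love this beautiful island and its wonderful food.",
  "The storm caused terrible damage and fear.",
  "The meeting is scheduled for Monday."
)

# Score the same texts with each lexicon; one column per method
methods <- c("syuzhet", "bing", "afinn", "nrc")
scores <- sapply(methods, function(m) get_sentiment(sample_texts, method = m))
print(scores)

# Hypothetical helper: count texts by the sign of their score
polarity_counts <- function(x) {
  table(factor(sign(x), levels = c(-1, 0, 1),
               labels = c("negative", "neutral", "positive")))
}

polarity_counts(scores[, "syuzhet"])
```

The lexicons use different scales, so the columns are best compared by sign or rank rather than by raw value.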