Knowledge Mining: Text mining
File: Lab_sentiment_tidytext01.R
Theme: Running sentiment analysis using the tidytext package
Once my academic research developer account was approved, I created a project named Knowledge_Mining.
The Knowledge_Mining project came with “Keys and tokens” for direct authentication to access Twitter data.
Enter key and tokens from Twitter academic research developer account
(Developer Portal -> Projects & Apps -> Effect of nonprofits on community subjective wellbeing -> Knowledge_Mining -> Keys and tokens)
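library(rtweet)     # create_token(), search_tweets(), ts_plot()
library(tidyverse)  # dplyr, tibble, ggplot2
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()
# Credentials are read from environment variables rather than pasted into the script.
# The variable names below are placeholders chosen here, not names that rtweet requires.
twitter_token <- rtweet::create_token(
  app = "Knowledge_Mining",
  consumer_key = Sys.getenv("TWITTER_CONSUMER_KEY"),
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
  access_token = Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret = Sys.getenv("TWITTER_ACCESS_SECRET"))
tw <- search_tweets("taiwan", n = 100, retryonratelimit = TRUE)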
Plot by time
ts_plot(tw, "mins", cex = .25, alpha = 1) +
  theme_bw() +
  theme(text = element_text(family = "Palatino"),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5)) +
  labs(title = paste("Frequency of keyword 'Taiwan' in the last", nrow(tw), "Twitter tweets"),
       subtitle = "Twitter tweet counts aggregated per minute interval",
       caption = "\nSource: Data collected from Twitter's REST API via rtweet")

Preprocess text data
twtxt <- tw$text
textDF <- tibble(txt = twtxt)
tidytwt <- textDF %>%
  unnest_tokens(word, txt) # one lowercase word per row
tidytwt <- tidytwt %>% anti_join(stop_words) # Removing stopwords
## Joining, by = "word"
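To see what the tokenizing and stop-word steps do, here is a minimal sketch on two made-up sentences. The `toy` data frame and its text are invented for illustration and are not part of the collected tweets.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy input (hypothetical, for illustration only)
toy <- tibble(txt = c("Taiwan is a beautiful island",
                      "I love the food in Taiwan"))

toy_tokens <- toy %>%
  unnest_tokens(word, txt) %>%        # split into one lowercase word per row
  anti_join(stop_words, by = "word")  # drop words listed in tidytext's stop_words

toy_tokens
```

Passing `by = "word"` names the join column explicitly, which also suppresses the "Joining" message.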
tidytwt %>%
  count(word, sort = TRUE) %>%
  filter(n > 10) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab("Keyword") + ylab("Count") +
  coord_flip() + theme_bw()

tidytwt <- tidytwt %>%
  mutate(linenumber = row_number()) # number each token row to use as a position index
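The density chart later in this lab reads a data frame called sentiment_tw with a numeric `sentiment` column and a `posneg` label, but the listing does not show how it is built. One possible way to construct it is sketched below. The helper name `score_chunks`, the chunk size of 20 tokens, and the "Positive"/"Negative"/"Neutral" labels are assumptions made for this sketch, not the original code.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: net Bing sentiment (positive minus negative word count)
# for consecutive chunks of tokens. Expects columns `word` and `linenumber`.
score_chunks <- function(tokens, chunk_size = 20) {
  tokens %>%
    inner_join(get_sentiments("bing"), by = "word") %>%  # keep only lexicon words
    group_by(index = linenumber %/% chunk_size) %>%      # assign tokens to chunks
    summarise(positive = sum(sentiment == "positive"),
              negative = sum(sentiment == "negative"),
              .groups = "drop") %>%
    mutate(sentiment = positive - negative,
           posneg = ifelse(sentiment > 0, "Positive",
                    ifelse(sentiment < 0, "Negative", "Neutral")))
}

# Build sentiment_tw only when the tokenized tweets from above are available
if (exists("tidytwt")) sentiment_tw <- score_chunks(tidytwt)
```

A smaller chunk size gives more, noisier scores; a larger one smooths them but leaves fewer points for the density estimate.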
Plot a stacked density chart of the sentiment scores with ggplot2
ggplot(sentiment_tw, aes(sentiment, fill = posneg)) +
  geom_density(alpha = 0.5, position = "stack") +
  ggtitle("Stacked sentiment density chart") + theme_bw()

bing_word_counts <- tidytwt %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
bing_word_counts %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Sentiments toward Taiwan, March 9, 2022",
       x = NULL) +
  coord_flip() + theme_bw() +
  theme(strip.text.x = element_text(family = "Palatino"),
        axis.title.x = element_text(face = "bold", size = 15, family = "Palatino"),
        axis.title.y = element_text(family = "Palatino"),
        axis.text.x = element_text(family = "Palatino"),
        axis.text.y = element_text(family = "Palatino"))
## Selecting by n
