Import a stata file into R
library("haven")
TEDS_2016 <- read_dta ("https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true")
Generate frequency table of the Tondu variable
library(descr)
freq(TEDS_2016$Tondu)
## Position on unification and independence
## Frequency Percent
## 1 27 1.598
## 2 180 10.651
## 3 546 32.308
## 4 328 19.408
## 5 380 22.485
## 6 108 6.391
## 9 121 7.160
## Total 1690 100.000
Generate barchart of the Tondu variable
library(ggplot2)
ggplot(TEDS_2016, aes(Tondu)) +
geom_bar()
## Don't know how to automatically pick scale for object of type haven_labelled/vctrs_vctr/double. Defaulting to continuous.
Assign labels to the Tondu variable
TEDS_2016$Tondu<-as.numeric(TEDS_2016$Tondu,labels=c("Unification now","Status quo,unif.infuture","Statusquo,decidelater","Statusquoforever","Statusquo,indep.infuture","Independencenow","No response"))
What problems do you encounter when dealing with the dataset?
There appears to be no values for Tondu around the category of 7.5. Tondu referes to an attitude toward unification with mainland China.
How to deal with missing values?
We can simply drop observations with missing values or we can try to impute them (e.g. average imputation or common point imputation).
Explore the relationship between Tondu and other variables including female, DPP, age, income, edu, Taiwanese and Econ_worse. What methods would you use?
For numerical variables such as age, income and years of education, we can correlate them with Tondu or use scatterplots and a linear regression modeling. For categorical variables such as gender, DPP and Taiwanese in relationship to Tondu, we can use a bar plot.
How about the votetsai variable (vote for DPP candidate Tsai Ing-wen)?
Since votetsai is dychotomous, and if we assume it is a DV, we can use logit to analyze the effect of Tondu on whether a voter votes for Tsai Ing-wen or not.