Import a Stata file into R

library("haven")
TEDS_2016 <- read_dta("https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true")

Generate a frequency table of the Tondu variable

library(descr)
freq(TEDS_2016$Tondu)

## Position on unification and independence 
##       Frequency Percent
## 1            27   1.598
## 2           180  10.651
## 3           546  32.308
## 4           328  19.408
## 5           380  22.485
## 6           108   6.391
## 9           121   7.160
## Total      1690 100.000

Generate a bar chart of the Tondu variable

library(ggplot2)
ggplot(TEDS_2016, aes(Tondu)) + 
  geom_bar()
## Don't know how to automatically pick scale for object of type haven_labelled/vctrs_vctr/double. Defaulting to continuous.

Assign labels to the Tondu variable

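# as.numeric() has no labels argument; factor() attaches labels to the codes 1-6 and 9
TEDS_2016$Tondu <- factor(as.numeric(TEDS_2016$Tondu),
                          levels = c(1, 2, 3, 4, 5, 6, 9),
                          labels = c("Unification now", "Status quo, unif. in future", "Status quo, decide later", "Status quo forever", "Status quo, indep. in future", "Independence now", "No response"))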

What problems do you encounter when dealing with the dataset?

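There appear to be no values for Tondu around 7.5: the coded categories run from 1 to 6, and 9 marks "No response", so the bar chart, which treats the variable as continuous, shows a gap between 6 and 9. Tondu refers to attitudes toward unification with mainland China.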

How to deal with missing values?

We can simply drop observations with missing values, or we can impute them (e.g., mean imputation or common-point imputation).
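
As a minimal sketch of both options, the snippet below assumes that code 9 ("No response") is the value to treat as missing. The vector `tondu_toy` is made up for illustration and is not the TEDS data; only base R is used.

```r
# Hypothetical toy values standing in for TEDS_2016$Tondu (9 = "No response")
tondu_toy <- c(3, 4, 9, 5, 3, 9, 2)

# Recode the "No response" code to NA so R treats it as missing
tondu_na <- ifelse(tondu_toy == 9, NA, tondu_toy)

# Option 1: listwise deletion (drop the missing observations)
tondu_dropped <- tondu_na[!is.na(tondu_na)]

# Option 2: mean imputation (replace NA with the mean of the observed values)
tondu_imputed <- ifelse(is.na(tondu_na), mean(tondu_na, na.rm = TRUE), tondu_na)
```

Mean imputation treats Tondu as a numeric scale, which is a strong assumption for an ordinal variable, so dropping the cases may be the safer default here.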

Explore the relationship between Tondu and other variables including female, DPP, age, income, edu, Taiwanese and Econ_worse. What methods would you use?

For numerical variables such as age, income, and years of education, we can correlate them with Tondu or use scatterplots and linear regression. For categorical variables such as gender, DPP, and Taiwanese, we can use bar plots of Tondu by group.
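
The approaches above might look roughly like the following. This is only a sketch on a made-up data frame (`toy` and its values are hypothetical; the column names mirror the TEDS_2016 variables, and coding `female` as 0 = male, 1 = female is an assumption). Spearman's rank correlation is used here as one possible choice because Tondu is ordinal.

```r
# Hypothetical toy data; column names mirror the TEDS_2016 variables
toy <- data.frame(
  Tondu  = c(1, 2, 3, 3, 4, 5, 5, 6),
  age    = c(65, 58, 47, 52, 40, 33, 29, 25),
  female = c(0, 1, 0, 1, 1, 0, 1, 0)   # assumed coding: 0 = male, 1 = female
)

# Numeric predictor: rank correlation between Tondu and age
rho <- cor(toy$Tondu, toy$age, method = "spearman")

# Linear regression of Tondu on age (treats Tondu as continuous)
fit <- lm(Tondu ~ age, data = toy)

# Categorical predictor: cross-tabulate, then draw a grouped bar plot
tab <- table(toy$female, toy$Tondu)
barplot(tab, beside = TRUE, legend.text = c("male", "female"))
```

With the real data, the "No response" code (9) would need to be recoded or dropped first, since it is not a point on the scale.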

How about the votetsai variable (vote for DPP candidate Tsai Ing-wen)?

Since votetsai is dichotomous, and if we treat it as the dependent variable (DV), we can use a logit model to analyze the effect of Tondu on whether a voter votes for Tsai Ing-wen.
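
A logit model of this kind can be sketched with base R's `glm()`. The data frame `toy_vote` below is made up for illustration, and entering Tondu as a single numeric predictor is an assumption; with the real data it could instead be entered as a factor.

```r
# Hypothetical toy data; votetsai is 1 if the respondent voted for Tsai, else 0
toy_vote <- data.frame(
  Tondu    = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6),
  votetsai = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1)
)

# Logistic regression: log-odds of voting for Tsai as a function of Tondu
logit_fit <- glm(votetsai ~ Tondu, data = toy_vote,
                 family = binomial(link = "logit"))
summary(logit_fit)

# Exponentiate the coefficient to read it as an odds ratio per one-step change in Tondu
odds_ratio <- exp(coef(logit_fit)[["Tondu"]])
```

An odds ratio above 1 would indicate that moving toward the independence end of the scale is associated with higher odds of voting for Tsai.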