Use the TEDS2016 dataset to run a logit (logistic regression) model using female as the sole predictor. The dependent variable is the vote (1-0) for Tsai Ing-wen, the female candidate of the then-opposition Democratic Progressive Party (DPP). Load the dataset with the following code:

library(haven)
TEDS_2016<-read_stata("https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true")

Check the dataset

names(TEDS_2016)

##  [1] "District"        "Sex"             "Age"             "Edu"            
##  [5] "Arear"           "Career"          "Career8"         "Ethnic"         
##  [9] "Party"           "PartyID"         "Tondu"           "Tondu3"         
## [13] "nI2"             "votetsai"        "green"           "votetsai_nm"    
## [17] "votetsai_all"    "Independence"    "Unification"     "sq"             
## [21] "Taiwanese"       "edu"             "female"          "whitecollar"    
## [25] "lowincome"       "income"          "income_nm"       "age"            
## [29] "KMT"             "DPP"             "npp"             "noparty"        
## [33] "pfp"             "South"           "north"           "Minnan_father"  
## [37] "Mainland_father" "Econ_worse"      "Inequality"      "inequality5"    
## [41] "econworse5"      "Govt_for_public" "pubwelf5"        "Govt_dont_care" 
## [45] "highincome"      "votekmt"         "votekmt_nm"      "Blue"           
## [49] "Green"           "No_Party"        "voteblue"        "voteblue_nm"    
## [53] "votedpp_1"       "votekmt_1"

Logistic regression

teds.fit <- glm(votetsai ~ female, data = TEDS_2016, family = binomial)
summary(teds.fit)

## 
## Call:
## glm(formula = votetsai ~ female, family = binomial, data = TEDS_2016)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4180  -1.3889   0.9546   0.9797   0.9797  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.54971    0.08245   6.667 2.61e-11 ***
## female      -0.06517    0.11644  -0.560    0.576    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1666.5  on 1260  degrees of freedom
## Residual deviance: 1666.2  on 1259  degrees of freedom
##   (429 observations deleted due to missingness)
## AIC: 1670.2
## 
## Number of Fisher Scoring iterations: 4

Female voters are not more likely to vote for President Tsai: the coefficient for “female” (-0.065) is negative and not statistically significant (p = 0.576), so this model shows no detectable difference between women and men.
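To make the size of the gender coefficient concrete, the estimates printed in the summary above can be converted from log-odds into predicted probabilities and an odds ratio. This is a minimal sketch that copies the two coefficients by hand rather than reading them from the fitted object; the variable names `b0`, `b_female`, `p_male`, `p_female` and `or_female` are illustrative only.

```r
# Coefficients copied from the summary(teds.fit) output above.
b0       <- 0.54971   # intercept: log-odds of a Tsai vote when female = 0
b_female <- -0.06517  # change in log-odds when female = 1

# plogis() is the inverse logit: it maps log-odds to probabilities.
p_male   <- plogis(b0)
p_female <- plogis(b0 + b_female)

# Exponentiating a logit coefficient gives an odds ratio.
or_female <- exp(b_female)

round(c(p_male = p_male, p_female = p_female, odds_ratio = or_female), 3)
```

The predicted probabilities come out at roughly 0.63 for men and 0.62 for women, with an odds ratio of about 0.94, which is consistent with reading the coefficient as a small and statistically insignificant difference.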

Improve the model by adding party ID variables (KMT, DPP) and other demographic variables (Age, edu, income)

teds.fit2 <- glm(votetsai ~ female + KMT + DPP + Age + edu + income,
                 data = TEDS_2016, family = binomial)
summary(teds.fit2)

## 
## Call:
## glm(formula = votetsai ~ female + KMT + DPP + Age + edu + income, 
##     family = binomial, data = TEDS_2016)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7416  -0.3658   0.2370   0.3098   2.5712  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.73673    0.50898   3.412 0.000644 ***
## female       0.04276    0.17769   0.241 0.809828    
## KMT         -3.14616    0.25036 -12.567  < 2e-16 ***
## DPP          2.90604    0.26860  10.819  < 2e-16 ***
## Age         -0.18582    0.08132  -2.285 0.022307 *  
## edu         -0.21355    0.08135  -2.625 0.008660 ** 
## income       0.01534    0.03447   0.445 0.656222    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1661.76  on 1256  degrees of freedom
## Residual deviance:  833.61  on 1250  degrees of freedom
##   (433 observations deleted due to missingness)
## AIC: 847.61
## 
## Number of Fisher Scoring iterations: 6
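Logit coefficients are easier to read as odds ratios. The sketch below takes the estimates and standard errors printed in the second model's summary above and computes odds ratios with approximate 95% Wald intervals. The vectors are typed in by hand, and the names `est`, `se` and `or_table` are illustrative; calling `confint()` on the fitted model would instead give profile-likelihood intervals, which may differ slightly.

```r
# Estimates and standard errors copied from summary(teds.fit2) above
# (intercept omitted).
est <- c(female = 0.04276, KMT = -3.14616, DPP = 2.90604,
         Age = -0.18582, edu = -0.21355, income = 0.01534)
se  <- c(female = 0.17769, KMT = 0.25036, DPP = 0.26860,
         Age = 0.08132, edu = 0.08135, income = 0.03447)

# Wald interval on the log-odds scale, then exponentiate.
z <- qnorm(0.975)
or_table <- cbind(odds_ratio = exp(est),
                  lower      = exp(est - z * se),
                  upper      = exp(est + z * se))

round(or_table, 3)
```

Under these figures, identifying with the KMT multiplies the odds of a Tsai vote by roughly 0.04 and identifying with the DPP by roughly 18, while the interval for female spans 1.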

Add more variables to further improve the model

teds.fit3 <- glm(votetsai ~ female + KMT + DPP + Age + edu + income +
                   Independence + Econ_worse + Govt_dont_care +
                   Minnan_father + Mainland_father + Taiwanese,
                 data = TEDS_2016, family = binomial)
summary(teds.fit3)

## 
## Call:
## glm(formula = votetsai ~ female + KMT + DPP + Age + edu + income + 
##     Independence + Econ_worse + Govt_dont_care + Minnan_father + 
##     Mainland_father + Taiwanese, family = binomial, data = TEDS_2016)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0923  -0.3137   0.1752   0.4018   2.7948  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.30622    0.58758   0.521  0.60226    
## female          -0.09986    0.18979  -0.526  0.59878    
## KMT             -2.91362    0.25916 -11.243  < 2e-16 ***
## DPP              2.47566    0.27566   8.981  < 2e-16 ***
## Age             -0.01681    0.08932  -0.188  0.85075    
## edu             -0.12769    0.08846  -1.444  0.14887    
## income           0.02281    0.03643   0.626  0.53127    
## Independence     0.99884    0.25097   3.980 6.89e-05 ***
## Econ_worse       0.31991    0.19007   1.683  0.09236 .  
## Govt_dont_care  -0.02141    0.18852  -0.114  0.90960    
## Minnan_father   -0.23182    0.25413  -0.912  0.36166    
## Mainland_father -1.04536    0.39853  -2.623  0.00872 ** 
## Taiwanese        0.89430    0.19939   4.485 7.28e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1661.76  on 1256  degrees of freedom
## Residual deviance:  767.27  on 1244  degrees of freedom
##   (433 observations deleted due to missingness)
## AIC: 793.27
## 
## Number of Fisher Scoring iterations: 6

With the addition of the new variables, Age and edu are no longer statistically significant. The party ID variables (KMT and DPP) remain highly significant. Among the added variables, “Independence,” “Mainland_father” and “Taiwanese” are statistically significant at the 0.01 level or better, while “Econ_worse” is only marginally significant (p = 0.092).
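Whether the larger model is an improvement overall can be checked with a likelihood-ratio test. The second and third models are nested and, judging by the identical null deviance and degrees of freedom in the two summaries above, fitted to the same observations, so the drop in residual deviance can be compared to a chi-squared distribution. This sketch works from the printed deviances; the variable names are illustrative.

```r
# Residual deviances and degrees of freedom copied from the summaries above.
dev2 <- 833.61; df2 <- 1250   # teds.fit2
dev3 <- 767.27; df3 <- 1244   # teds.fit3

# Likelihood-ratio statistic: drop in deviance from the added predictors.
lr_stat <- dev2 - dev3
lr_df   <- df2 - df3

# Upper-tail chi-squared probability for the test.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(lr_stat = lr_stat, df = lr_df, p_value = p_value)
```

The statistic is 66.34 on 6 degrees of freedom, with a p-value far below 0.001, in line with the lower AIC of the third model (793.27 versus 847.61). With the fitted objects in memory, `anova(teds.fit2, teds.fit3, test = "Chisq")` performs the same comparison.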

Logistic regression in Stata

Load the dataset

use "https://github.com/datageneration/home/blob/master/DataProgramming/data/TEDS_2016.dta?raw=true"

Logistic regression

logit votetsai Independence Econ_worse Govt_dont_care Minnan_father Mainland_father Taiwanese KMT DPP age edu female

Output

The R and Stata specifications differ in two ways that are visible in the code: the R logit model includes “income” while the Stata model does not, and the R model uses “Age” where the Stata command uses “age”. Even so, the two models’ results are quite similar (the same variables are significant in both models and in the same direction).
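For a like-for-like comparison, the Stata specification can also be fitted in R. The helper below is a hypothetical sketch (`fit_stata_spec` is not part of the original analysis): it takes a data frame with the same column names as TEDS_2016 and fits the predictors listed in the Stata `logit` command.

```r
# Hypothetical helper: fit the Stata specification (no income, lowercase age)
# as a binomial GLM in R.
fit_stata_spec <- function(data) {
  glm(votetsai ~ Independence + Econ_worse + Govt_dont_care +
        Minnan_father + Mainland_father + Taiwanese +
        KMT + DPP + age + edu + female,
      data = data, family = binomial)
}

# Usage, assuming TEDS_2016 has been loaded with read_stata() as shown earlier:
# summary(fit_stata_spec(TEDS_2016))
```

Because `glm()` with `family = binomial` and Stata's `logit` both fit the model by maximum likelihood, the coefficients from this call should match the Stata output up to rounding, provided both drop the same observations for missingness.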

Logistic regression with TEDS_2016

Dasha Djukic-Min

4/5/2022
