library(readr)
ches <- read_csv('https://www.chesdata.eu/s/CHES_2024_final_v2.csv')
Classwork 1: Indexing and subsetting
This document was created using Quarto, a sort of language for combining code and text. When you’re following along in class, you can either copy and paste code directly from this page into an RStudio script file, or you can go over to the GitHub site for this class, download the .qmd file for this classwork, and then open it in RStudio and work with it as an interactive document.
Data and packages
You’ll want to start by loading the CHES data. We can do this using the read_csv function from the readr package. We could access this data set by downloading it from the CHES website and then importing it from a local file, but for .csv files like this one, we can download it directly by putting the web address into our function in quotation marks, as the read_csv call at the top of this page does.
After you’ve imported the data, take a second to View() the data frame you just imported and see how things are coded.
Note that in the raw csv data, the party family variable is initially a set of numbers instead of a text value.
ches$family[1:5]
[1] 5 5 7 7 3
In order to convert these to text, we’ll need to consult the codebook and then convert the numeric variables to their respective text values.
R has a special built-in data type for this exact scenario called a “factor variable”.
R factors are often a source of confusion for new users. Under the hood, a factor variable is just a set of numbers with an extra piece of data that maps those numeric values to text labels. We could also store variables like this as regular text values, but factor variables are helpful because they can make it easier to do things like fix the ordering of survey response items when we’re making a graph or table.
We’ll talk more about working with factors later on, but for now it’s sufficient to know that we’re doing this to turn the numbers into readable text:
labels <- c("Radical Right",
            "Conservatives",
            "Liberal",
            "Christian-Democratic",
            "Socialist",
            "Radical Left",
            "Green",
            "Regionalist",
            "No family",
            "Confessional",
            "Agrarian/Center")
ches$family <- factor(ches$family, labels=labels)
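To make the “numbers plus labels” idea concrete, here is a minimal sketch using a made-up vector of codes. The names toy_codes, toy_labels, and toy_factor are placeholders for illustration and are not part of the CHES data:

```r
# Hypothetical numeric codes, standing in for a coded survey variable
toy_codes <- c(3, 1, 2, 3, 1)
toy_labels <- c("Left", "Center", "Right")

# Map code 1 -> "Left", 2 -> "Center", 3 -> "Right"
toy_factor <- factor(toy_codes, levels = c(1, 2, 3), labels = toy_labels)

levels(toy_factor)        # the text labels
as.integer(toy_factor)    # the integer codes stored under the hood
as.character(toy_factor)  # the labels as plain text
```

Printing toy_factor shows the labels, but as.integer() shows that the numbers are still there underneath.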
We’ll do the same thing for the “country” variable:
country_levels <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 20, 21, 22,
                    23, 24, 25, 26, 27, 28, 29, 31, 34, 35, 36, 37, 38, 40, 45)
country_labels <- c("Belgium", "Denmark", "Germany", "Greece", "Spain", "France",
                    "Ireland", "Italy", "Netherlands", "United Kingdom", "Portugal",
                    "Austria", "Finland", "Sweden", "Bulgaria", "Czech Republic",
                    "Estonia", "Hungary", "Latvia", "Lithuania", "Poland", "Romania",
                    "Slovakia", "Slovenia", "Croatia", "Turkey", "Norway", "Switzerland",
                    "Malta", "Luxembourg", "Cyprus", "Iceland")
ches$country <- factor(ches$country, levels=country_levels, labels=country_labels)
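The country codes skip some numbers (there is no 9, for example), which is one reason to spell out the levels argument rather than relying on labels alone. The sketch below uses a made-up vector of three codes to show what can go wrong when a code is missing from the data; toy_codes, toy_levels, and toy_labels are placeholder names, and the three code-to-country pairs are taken from the lists above:

```r
# Hypothetical codes: 1 = Belgium, 2 = Denmark, 10 = Netherlands
toy_codes <- c(1, 10, 10)   # note that code 2 never appears
toy_levels <- c(1, 2, 10)
toy_labels <- c("Belgium", "Denmark", "Netherlands")

# With explicit levels, each code is matched to its own label
toy_country <- factor(toy_codes, levels = toy_levels, labels = toy_labels)
table(toy_country)          # Denmark is kept as a level with a count of zero

# Without levels, R infers them from the data (here just 1 and 10),
# so supplying three labels for two levels is an error
no_levels <- try(factor(toy_codes, labels = toy_labels), silent = TRUE)
inherits(no_levels, "try-error")
```

Supplying only labels works when every code shows up in the data at least once, but listing the levels explicitly is the safer habit.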
Take a minute to View() the data set and see how things are coded now.
Using a pipe
A lot of code you’ll see in this course will use the “pipe” operator. Pipes are a way of making code more readable by chaining together a set of operations so that they read from left to right.
Consider a script that creates 100 random numbers, exponentiates them, sorts them from lowest to highest (the default for sort()), and then plots the result. One way to do this would be with a set of nested commands like this:
set.seed(100)
# get 100 samples from a normal distribution with mean zero and sd = 1, then exponentiate, sort, and plot them
plot(sort(exp(rnorm(100))))
Alternatively, we could write each step on a separate line and assign the results to a new variable:
set.seed(100)
# get 100 samples from a normal distribution with mean zero and sd =1
x <- rnorm(100)
# exponentiate it
exp_x <- exp(x)
# sort it
sorted_x <- sort(exp_x)
# plot the result
plot(sorted_x)
Using the pipe operator allows us to perform this same operation without the nested parentheses or the creation of intermediate variables:
set.seed(100)
rnorm(100)|>
exp()|>
sort()|>
plot()
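If you want to convince yourself that the piped version really does the same thing as the nested one, a quick check along these lines may help. It leaves out the plot and uses only five draws so the result is easy to inspect; the names nested and piped are just placeholders, and the native pipe needs R 4.1 or later:

```r
set.seed(100)
x <- rnorm(5)

nested <- sort(exp(x))          # nested calls, read inside-out
piped <- x |> exp() |> sort()   # piped calls, read left to right

identical(nested, piped)        # both approaches give the same vector
```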
Note that the pipe operator defaults to passing the left-hand-side object as the first argument of the right-hand-side function, but you can explicitly reference the left-hand-side object using the _ placeholder (available in R 4.2 and later, and only as a named argument). This is especially useful for functions that don’t take “data” as their first argument. One example of this is the lm() command:
# This gives an error:
ches |>
  lm(eu_position ~ immigrate_policy)
Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame
# this works
ches |>
  lm(eu_position ~ immigrate_policy, data=_)
Call:
lm(formula = eu_position ~ immigrate_policy, data = ches)
Coefficients:
(Intercept) immigrate_policy
6.8833 -0.3641
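If you are on a version of R that has the native pipe but not the _ placeholder, one possible workaround is to wrap the call in a small anonymous function. The sketch below uses the built-in mtcars data so that it runs without downloading anything; the variable names fit_placeholder and fit_lambda are placeholders, and the regression itself is only for illustration:

```r
# mtcars ships with R, so no import step is needed
fit_placeholder <- mtcars |>
  lm(mpg ~ wt, data = _)              # needs R >= 4.2

fit_lambda <- mtcars |>
  (\(d) lm(mpg ~ wt, data = d))()     # anonymous function, works in R >= 4.1

all.equal(coef(fit_placeholder), coef(fit_lambda))
```

Both versions fit the same model; they differ only in how the data frame gets routed into the data argument.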
The %>% operator is a version of the pipe that is associated with the magrittr package. It mostly works just like the |> pipe that is part of base R, but it has some minor differences, such as using . instead of _ to reference data. (I mention it here because you might see it elsewhere, especially in code that pre-dates the introduction of the native pipe operator.)
library(magrittr)
ches %>%
  lm(eu_position ~ immigrate_policy, data=.)
Call:
lm(formula = eu_position ~ immigrate_policy, data = .)
Coefficients:
(Intercept) immigrate_policy
6.8833 -0.3641
Q1
Restate the following operation using a sequence of pipes:
chesfam <- subset(ches, select=family) # retrieving only the families column
chesfam <- table(chesfam) # creating a frequency table
chesfam <- sort(chesfam) # sorting from lowest to highest
# type your code here...
Q2
Use one of the pipe operators to retrieve only the countries that are EU members (you’ll want to use the subset function for this).
# type your code here...
Q3
Use the pipe operator to get the square root of the variance for lrecon (which measures the left-right positioning for each party).
# type your code here...
Subsetting
Remember that we have several options for subsetting data sets. The most basic is to use a logical expression. For instance, the expression ches$family=="Confessional" returns TRUE for every element where the value of family is “Confessional” and FALSE for every element where it isn’t. I can then use this logical vector to subset my data frame, returning all rows for confessional parties:
confessional_parties <- ches[ches$family=="Confessional", ]
confessional_parties
country | party_id | party | family | electionyear | vote | seat | epvote | eu_position | eu_salience | eu_dissent | eu_blur | lrecon | lrecon_blur | lrecon_dissent | lrecon_salience | galtan | galtan_blur | galtan_dissent | galtan_salience | lrgen | immigrate_policy | immigrate_salience | immigrate_dissent | multiculturalism | multicult_salience | redistribution | redist_salience | climate_change | climate_change_salience | environment | environment_salience | spendvtax | deregulation | civlib_laworder | womens_rights | lgbtq_rights | samesex_marriage | religious_principles | ethnic_minorities | nationalism | urban_rural | protectionism | regions | executive_power | judicial_independence | corrupt_salience | anti_islam | people_v_elite | anti_elite_salience | eu_foreign | eu_intmark | eu_russia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Greece | 419 | Niki | Confessional | 2023 | 0.00 | 0 | 4.37 | 2.777778 | 3.555556 | 1.000000 | 6.4 | 5.666666 | 6.250000 | 1.500000 | 2.833333 | 9.500000 | 0.2500000 | 1.166667 | 9.300000 | 9.200000 | 9.500000 | 8.888889 | 0.3333333 | 9.625000 | 7.375000 | 5.750000 | 2.600000 | 8.000000 | 1.3333334 | 5.000000 | 1.000 | 6.000000 | 4.333334 | 9.000000 | 9.200000 | 9.500000 | 10.000000 | 9.600000 | 9.800000 | 9.714286 | NA | 10.000000 | 7.500000 | 8.000000 | NA | 3.50 | 6.666666 | 6.00 | 8.5000000 | 2.5 | 2.000000 | 9.50 |
Netherlands | 1006 | SGP | Confessional | 2023 | 2.08 | 3 | 3.70 | 2.750000 | 3.454546 | 2.250000 | 2.5 | 7.200000 | 1.000000 | 2.333333 | 4.083334 | 9.416667 | 0.3750000 | 0.500000 | 9.500000 | 8.000000 | 8.666667 | 5.750000 | 1.5000000 | 8.600000 | 6.875000 | 4.857143 | 3.625000 | 6.200000 | 3.0000000 | 6.000000 | 2.000 | NA | 6.666666 | 8.250000 | 9.800000 | 9.833333 | 9.800000 | 9.857142 | 8.000000 | 5.666666 | 8.571428 | 6.000000 | 3.500000 | 5.250000 | 1.000000 | 1.00 | 4.000000 | 1.50 | 1.7777778 | 1.5 | 3.666667 | 4.00 |
Netherlands | 1016 | CU | Confessional | 2023 | 2.04 | 3 | 2.89 | 4.583334 | 4.000000 | 4.000000 | 1.0 | 4.333334 | 1.600000 | 3.000000 | 5.666666 | 6.916666 | 1.5714285 | 2.500000 | 7.250000 | 4.583334 | 4.250000 | 5.166666 | 3.1666667 | 5.272728 | 5.222222 | 4.111111 | 4.666666 | 3.000000 | 4.7500000 | 3.800000 | 4.000 | 4.000000 | 4.000000 | 4.200000 | 4.600000 | 6.200000 | 5.400000 | 9.000000 | 4.000000 | 3.666667 | 6.333334 | 5.666666 | 4.333334 | 1.600000 | 0.500000 | 1.00 | 2.666667 | 1.50 | 0.8888889 | 3.5 | 5.000000 | 3.00 |
Finland | 1409 | KD | Confessional | 2023 | 4.20 | 5 | 4.12 | 4.000000 | 5.111111 | 3.142857 | 5.0 | 7.000000 | 5.166666 | 4.200000 | 5.818182 | 8.818182 | 2.2000000 | 1.333333 | 8.000000 | 7.545454 | 7.181818 | 4.909091 | 2.4000001 | 8.090909 | 4.800000 | 5.700000 | 5.090909 | 6.400000 | 3.5999999 | 5.800000 | 3.800 | 6.875000 | 5.000000 | 6.500000 | 5.000000 | 9.000000 | 10.000000 | 9.400000 | 6.833334 | 7.200000 | 5.833334 | 6.000000 | 5.428571 | 3.600000 | 3.400000 | 6.00 | 3.333333 | 2.25 | 2.2857144 | 3.0 | 3.750000 | 1.75 |
Turkey | 3416 | YRP | Confessional | 2023 | 2.81 | 5 | NA | 1.250000 | 1.538462 | 0.200000 | 1.0 | 6.166666 | 5.666666 | 0.800000 | 6.785714 | 9.785714 | 1.0000000 | 1.250000 | 8.714286 | 9.142858 | 6.750000 | 6.307692 | 1.9090909 | 7.416666 | 7.416666 | 4.428571 | 5.461538 | 6.000000 | 0.6666667 | 6.333334 | 2.375 | 3.875000 | 4.500000 | 8.285714 | 9.285714 | 10.000000 | 10.000000 | 10.000000 | 7.500000 | 8.571428 | 5.428571 | 7.000000 | 7.000000 | 8.666667 | 6.833334 | 4.75 | 7.333334 | 2.40 | 7.8571429 | NA | NA | NA |
Switzerland | 3607 | EVP/PEV | Confessional | 2023 | 1.95 | 2 | NA | 3.900000 | 3.800000 | 3.333333 | 4.5 | 5.090909 | 5.000000 | NA | 3.428571 | 7.250000 | 0.6666667 | 2.000000 | 7.900000 | 5.636363 | 4.750000 | 2.833333 | 2.4000001 | 7.375000 | 5.600000 | 4.111111 | 4.333334 | 5.333334 | 4.0000000 | 3.800000 | 6.000 | 3.666667 | NA | 4.400000 | 8.333333 | 6.800000 | 7.000000 | 8.285714 | 6.000000 | 6.600000 | 6.250000 | 6.000000 | 4.000000 | 4.500000 | 1.750000 | 2.00 | 2.000000 | 8.20 | 1.3333334 | NA | NA | NA |
Switzerland | 3608 | EDU/UDF | Confessional | 2023 | 1.23 | 2 | NA | 1.833333 | 4.000000 | 1.000000 | 3.0 | 7.222222 | 5.666666 | NA | 3.571429 | 9.250000 | 0.3333333 | 1.000000 | 8.700000 | 8.750000 | 8.636364 | 6.888889 | 0.8571429 | 8.727273 | 7.714286 | 7.000000 | 3.600000 | 6.000000 | 2.6666667 | 7.666666 | 3.000 | 7.250000 | NA | 8.000000 | 8.750000 | 9.500000 | 8.833333 | 8.714286 | 8.750000 | 9.000000 | 7.200000 | 8.000000 | 3.666667 | 7.000000 | 3.333333 | 2.50 | 2.000000 | 9.20 | 4.0000000 | NA | NA | NA |
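To see the logical vector on its own before it is used as an index, here is a small sketch with a made-up four-row data frame. The data frame toy and its values are hypothetical and are not taken from CHES:

```r
# Hypothetical mini data set
toy <- data.frame(
  party  = c("A", "B", "C", "D"),
  family = c("Green", "Confessional", "Liberal", "Confessional")
)

is_confessional <- toy$family == "Confessional"
is_confessional            # FALSE TRUE FALSE TRUE

# Rows go before the comma; leaving the column slot empty keeps every column
toy[is_confessional, ]

sum(is_confessional)       # TRUE counts as 1, so this is the number of matches
```

The comma matters: toy[is_confessional, ] selects rows, and the empty slot after the comma means “all columns”.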
We can also chain together multiple logical expressions to get a more specific slice of the data. For instance, maybe I only want agrarian parties whose position on economic issues is greater than or equal to 5. I could get this by combining two logical comparisons with an ampersand:
right_agrarians <- ches[ches$family == "Agrarian/Center" & ches$lrecon>=5,]
right_agrarians
country | party_id | party | family | electionyear | vote | seat | epvote | eu_position | eu_salience | eu_dissent | eu_blur | lrecon | lrecon_blur | lrecon_dissent | lrecon_salience | galtan | galtan_blur | galtan_dissent | galtan_salience | lrgen | immigrate_policy | immigrate_salience | immigrate_dissent | multiculturalism | multicult_salience | redistribution | redist_salience | climate_change | climate_change_salience | environment | environment_salience | spendvtax | deregulation | civlib_laworder | womens_rights | lgbtq_rights | samesex_marriage | religious_principles | ethnic_minorities | nationalism | urban_rural | protectionism | regions | executive_power | judicial_independence | corrupt_salience | anti_islam | people_v_elite | anti_elite_salience | eu_foreign | eu_intmark | eu_russia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Finland | 1403 | KESK | Agrarian/Center | 2023 | 11.3 | 23 | 11.76 | 4.818182 | 5.600000 | 3.857143 | 4.500000 | 5.818182 | 4.333334 | 4.600000 | 6.363637 | 6.454546 | 4.600 | 4.666666 | 4.636363 | 5.727272 | 5.636363 | 4.818182 | 4.9090910 | 6.272728 | 5.272728 | 5.090909 | 6.090909 | 5.800000 | 4.800000 | 5.500000 | 5.333334 | 5.25 | 5.333334 | 5.500000 | 3.833333 | 5.40 | 5.6666665 | 6.4 | 5.833334 | 5.8 | 9.166667 | 6.000000 | 2.285714 | 3.600000 | 1.8 | 6.000000 | 2.500000 | 1.750000 | 3.714286 | 3.5 | 4.400000 | 1.500000 |
Sweden | 1603 | C | Agrarian/Center | 2022 | 6.7 | 24 | 7.29 | 6.105263 | 4.526316 | 3.500000 | 3.153846 | 7.842105 | 2.090909 | 2.500000 | 7.000000 | 2.947368 | 2.375 | 2.400000 | 6.631579 | 5.947369 | 3.421053 | 6.157895 | 3.3157895 | 3.315790 | 6.000000 | 6.578947 | 5.368421 | 2.888889 | 7.222222 | 2.500000 | 7.500000 | 7.50 | 8.533334 | 3.100000 | 2.444444 | 1.25 | 0.8888889 | 2.5 | 3.090909 | 2.9 | 6.777778 | 1.222222 | 4.285714 | 4.333334 | 1.0 | 1.777778 | 2.142857 | 2.545454 | 2.166667 | 4.4 | 6.444445 | 2.666667 |
Iceland | 4503 | F | Agrarian/Center | 2021 | 7.8 | 5 | NA | 2.000000 | 3.000000 | 3.000000 | 2.800000 | 5.166666 | 3.333333 | 2.333333 | 6.333334 | 5.166666 | 3.200 | 6.000000 | 3.000000 | 5.500000 | 5.333334 | 4.666666 | 3.6666667 | 6.666666 | 4.166666 | 4.833334 | 5.333334 | 4.000000 | 4.333334 | 6.333334 | 2.666667 | 4.00 | 4.400000 | 5.333334 | 2.000000 | 2.00 | 0.5000000 | 0.0 | 4.000000 | 8.0 | 9.250000 | 8.333333 | 3.000000 | 5.666666 | 3.0 | 1.000000 | 0.500000 | 1.000000 | 0.000000 | NA | NA | NA |
Iceland | 4504 | M | Agrarian/Center | 2021 | 12.1 | 8 | NA | 1.166667 | 5.833334 | 3.000000 | 0.400000 | 8.333333 | 2.666667 | 1.000000 | 7.666666 | 8.833333 | 2.400 | 1.000000 | 8.166667 | 8.500000 | 9.000000 | 8.833333 | 0.6666667 | 9.500000 | 9.000000 | 7.200000 | 7.800000 | 9.666667 | 8.666667 | 6.333334 | 6.000000 | 9.00 | 7.200000 | 8.333333 | 3.333333 | 5.50 | 2.6666667 | 0.0 | 8.000000 | 9.5 | 8.500000 | 7.666666 | 3.500000 | 5.666666 | 4.0 | 3.000000 | 3.000000 | 5.000000 | 5.333334 | NA | NA | NA |
We can also use the subset function to do the same operation as above. Note that this function expects a data set as its first argument, and, once we’ve specified that, we can reference columns within that data set without needing to use the $ notation:
subset(ches, family == "Confessional")
country | party_id | party | family | electionyear | vote | seat | epvote | eu_position | eu_salience | eu_dissent | eu_blur | lrecon | lrecon_blur | lrecon_dissent | lrecon_salience | galtan | galtan_blur | galtan_dissent | galtan_salience | lrgen | immigrate_policy | immigrate_salience | immigrate_dissent | multiculturalism | multicult_salience | redistribution | redist_salience | climate_change | climate_change_salience | environment | environment_salience | spendvtax | deregulation | civlib_laworder | womens_rights | lgbtq_rights | samesex_marriage | religious_principles | ethnic_minorities | nationalism | urban_rural | protectionism | regions | executive_power | judicial_independence | corrupt_salience | anti_islam | people_v_elite | anti_elite_salience | eu_foreign | eu_intmark | eu_russia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Greece | 419 | Niki | Confessional | 2023 | 0.00 | 0 | 4.37 | 2.777778 | 3.555556 | 1.000000 | 6.4 | 5.666666 | 6.250000 | 1.500000 | 2.833333 | 9.500000 | 0.2500000 | 1.166667 | 9.300000 | 9.200000 | 9.500000 | 8.888889 | 0.3333333 | 9.625000 | 7.375000 | 5.750000 | 2.600000 | 8.000000 | 1.3333334 | 5.000000 | 1.000 | 6.000000 | 4.333334 | 9.000000 | 9.200000 | 9.500000 | 10.000000 | 9.600000 | 9.800000 | 9.714286 | NA | 10.000000 | 7.500000 | 8.000000 | NA | 3.50 | 6.666666 | 6.00 | 8.5000000 | 2.5 | 2.000000 | 9.50 |
Netherlands | 1006 | SGP | Confessional | 2023 | 2.08 | 3 | 3.70 | 2.750000 | 3.454546 | 2.250000 | 2.5 | 7.200000 | 1.000000 | 2.333333 | 4.083334 | 9.416667 | 0.3750000 | 0.500000 | 9.500000 | 8.000000 | 8.666667 | 5.750000 | 1.5000000 | 8.600000 | 6.875000 | 4.857143 | 3.625000 | 6.200000 | 3.0000000 | 6.000000 | 2.000 | NA | 6.666666 | 8.250000 | 9.800000 | 9.833333 | 9.800000 | 9.857142 | 8.000000 | 5.666666 | 8.571428 | 6.000000 | 3.500000 | 5.250000 | 1.000000 | 1.00 | 4.000000 | 1.50 | 1.7777778 | 1.5 | 3.666667 | 4.00 |
Netherlands | 1016 | CU | Confessional | 2023 | 2.04 | 3 | 2.89 | 4.583334 | 4.000000 | 4.000000 | 1.0 | 4.333334 | 1.600000 | 3.000000 | 5.666666 | 6.916666 | 1.5714285 | 2.500000 | 7.250000 | 4.583334 | 4.250000 | 5.166666 | 3.1666667 | 5.272728 | 5.222222 | 4.111111 | 4.666666 | 3.000000 | 4.7500000 | 3.800000 | 4.000 | 4.000000 | 4.000000 | 4.200000 | 4.600000 | 6.200000 | 5.400000 | 9.000000 | 4.000000 | 3.666667 | 6.333334 | 5.666666 | 4.333334 | 1.600000 | 0.500000 | 1.00 | 2.666667 | 1.50 | 0.8888889 | 3.5 | 5.000000 | 3.00 |
Finland | 1409 | KD | Confessional | 2023 | 4.20 | 5 | 4.12 | 4.000000 | 5.111111 | 3.142857 | 5.0 | 7.000000 | 5.166666 | 4.200000 | 5.818182 | 8.818182 | 2.2000000 | 1.333333 | 8.000000 | 7.545454 | 7.181818 | 4.909091 | 2.4000001 | 8.090909 | 4.800000 | 5.700000 | 5.090909 | 6.400000 | 3.5999999 | 5.800000 | 3.800 | 6.875000 | 5.000000 | 6.500000 | 5.000000 | 9.000000 | 10.000000 | 9.400000 | 6.833334 | 7.200000 | 5.833334 | 6.000000 | 5.428571 | 3.600000 | 3.400000 | 6.00 | 3.333333 | 2.25 | 2.2857144 | 3.0 | 3.750000 | 1.75 |
Turkey | 3416 | YRP | Confessional | 2023 | 2.81 | 5 | NA | 1.250000 | 1.538462 | 0.200000 | 1.0 | 6.166666 | 5.666666 | 0.800000 | 6.785714 | 9.785714 | 1.0000000 | 1.250000 | 8.714286 | 9.142858 | 6.750000 | 6.307692 | 1.9090909 | 7.416666 | 7.416666 | 4.428571 | 5.461538 | 6.000000 | 0.6666667 | 6.333334 | 2.375 | 3.875000 | 4.500000 | 8.285714 | 9.285714 | 10.000000 | 10.000000 | 10.000000 | 7.500000 | 8.571428 | 5.428571 | 7.000000 | 7.000000 | 8.666667 | 6.833334 | 4.75 | 7.333334 | 2.40 | 7.8571429 | NA | NA | NA |
Switzerland | 3607 | EVP/PEV | Confessional | 2023 | 1.95 | 2 | NA | 3.900000 | 3.800000 | 3.333333 | 4.5 | 5.090909 | 5.000000 | NA | 3.428571 | 7.250000 | 0.6666667 | 2.000000 | 7.900000 | 5.636363 | 4.750000 | 2.833333 | 2.4000001 | 7.375000 | 5.600000 | 4.111111 | 4.333334 | 5.333334 | 4.0000000 | 3.800000 | 6.000 | 3.666667 | NA | 4.400000 | 8.333333 | 6.800000 | 7.000000 | 8.285714 | 6.000000 | 6.600000 | 6.250000 | 6.000000 | 4.000000 | 4.500000 | 1.750000 | 2.00 | 2.000000 | 8.20 | 1.3333334 | NA | NA | NA |
Switzerland | 3608 | EDU/UDF | Confessional | 2023 | 1.23 | 2 | NA | 1.833333 | 4.000000 | 1.000000 | 3.0 | 7.222222 | 5.666666 | NA | 3.571429 | 9.250000 | 0.3333333 | 1.000000 | 8.700000 | 8.750000 | 8.636364 | 6.888889 | 0.8571429 | 8.727273 | 7.714286 | 7.000000 | 3.600000 | 6.000000 | 2.6666667 | 7.666666 | 3.000 | 7.250000 | NA | 8.000000 | 8.750000 | 9.500000 | 8.833333 | 8.714286 | 8.750000 | 9.000000 | 7.200000 | 8.000000 | 3.666667 | 7.000000 | 3.333333 | 2.50 | 2.000000 | 9.20 | 4.0000000 | NA | NA | NA |
# OR:
# ches |> subset(family=="Confessional")
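One difference between bracket indexing and subset() shows up when the column you are testing has missing values. The sketch below uses a made-up data frame (toy and its score column are hypothetical) to show how each approach treats an NA in the condition:

```r
toy <- data.frame(
  party = c("A", "B", "C", "D"),
  score = c(7, NA, 3, 9)
)

bracket_rows <- toy[toy$score >= 5, ]         # NA in the condition gives a row of NAs
subset_rows  <- subset(toy, score >= 5)       # subset() treats NA as FALSE
which_rows   <- toy[which(toy$score >= 5), ]  # which() drops the NA positions

nrow(bracket_rows)  # 3
nrow(subset_rows)   # 2
nrow(which_rows)    # 2
```

If a bracket subset ever returns rows that are entirely NA, a missing value in the comparison is a likely cause, and wrapping the condition in which() is one way to avoid it.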
Another option we’ll explore more in the future is the filter function from the dplyr package. The syntax for this is almost identical to the syntax for subset, and it also takes a data set as its first argument:
library(dplyr)
ches |> filter(family=="Confessional")
country | party_id | party | family | electionyear | vote | seat | epvote | eu_position | eu_salience | eu_dissent | eu_blur | lrecon | lrecon_blur | lrecon_dissent | lrecon_salience | galtan | galtan_blur | galtan_dissent | galtan_salience | lrgen | immigrate_policy | immigrate_salience | immigrate_dissent | multiculturalism | multicult_salience | redistribution | redist_salience | climate_change | climate_change_salience | environment | environment_salience | spendvtax | deregulation | civlib_laworder | womens_rights | lgbtq_rights | samesex_marriage | religious_principles | ethnic_minorities | nationalism | urban_rural | protectionism | regions | executive_power | judicial_independence | corrupt_salience | anti_islam | people_v_elite | anti_elite_salience | eu_foreign | eu_intmark | eu_russia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Greece | 419 | Niki | Confessional | 2023 | 0.00 | 0 | 4.37 | 2.777778 | 3.555556 | 1.000000 | 6.4 | 5.666666 | 6.250000 | 1.500000 | 2.833333 | 9.500000 | 0.2500000 | 1.166667 | 9.300000 | 9.200000 | 9.500000 | 8.888889 | 0.3333333 | 9.625000 | 7.375000 | 5.750000 | 2.600000 | 8.000000 | 1.3333334 | 5.000000 | 1.000 | 6.000000 | 4.333334 | 9.000000 | 9.200000 | 9.500000 | 10.000000 | 9.600000 | 9.800000 | 9.714286 | NA | 10.000000 | 7.500000 | 8.000000 | NA | 3.50 | 6.666666 | 6.00 | 8.5000000 | 2.5 | 2.000000 | 9.50 |
Netherlands | 1006 | SGP | Confessional | 2023 | 2.08 | 3 | 3.70 | 2.750000 | 3.454546 | 2.250000 | 2.5 | 7.200000 | 1.000000 | 2.333333 | 4.083334 | 9.416667 | 0.3750000 | 0.500000 | 9.500000 | 8.000000 | 8.666667 | 5.750000 | 1.5000000 | 8.600000 | 6.875000 | 4.857143 | 3.625000 | 6.200000 | 3.0000000 | 6.000000 | 2.000 | NA | 6.666666 | 8.250000 | 9.800000 | 9.833333 | 9.800000 | 9.857142 | 8.000000 | 5.666666 | 8.571428 | 6.000000 | 3.500000 | 5.250000 | 1.000000 | 1.00 | 4.000000 | 1.50 | 1.7777778 | 1.5 | 3.666667 | 4.00 |
Netherlands | 1016 | CU | Confessional | 2023 | 2.04 | 3 | 2.89 | 4.583334 | 4.000000 | 4.000000 | 1.0 | 4.333334 | 1.600000 | 3.000000 | 5.666666 | 6.916666 | 1.5714285 | 2.500000 | 7.250000 | 4.583334 | 4.250000 | 5.166666 | 3.1666667 | 5.272728 | 5.222222 | 4.111111 | 4.666666 | 3.000000 | 4.7500000 | 3.800000 | 4.000 | 4.000000 | 4.000000 | 4.200000 | 4.600000 | 6.200000 | 5.400000 | 9.000000 | 4.000000 | 3.666667 | 6.333334 | 5.666666 | 4.333334 | 1.600000 | 0.500000 | 1.00 | 2.666667 | 1.50 | 0.8888889 | 3.5 | 5.000000 | 3.00 |
Finland | 1409 | KD | Confessional | 2023 | 4.20 | 5 | 4.12 | 4.000000 | 5.111111 | 3.142857 | 5.0 | 7.000000 | 5.166666 | 4.200000 | 5.818182 | 8.818182 | 2.2000000 | 1.333333 | 8.000000 | 7.545454 | 7.181818 | 4.909091 | 2.4000001 | 8.090909 | 4.800000 | 5.700000 | 5.090909 | 6.400000 | 3.5999999 | 5.800000 | 3.800 | 6.875000 | 5.000000 | 6.500000 | 5.000000 | 9.000000 | 10.000000 | 9.400000 | 6.833334 | 7.200000 | 5.833334 | 6.000000 | 5.428571 | 3.600000 | 3.400000 | 6.00 | 3.333333 | 2.25 | 2.2857144 | 3.0 | 3.750000 | 1.75 |
Turkey | 3416 | YRP | Confessional | 2023 | 2.81 | 5 | NA | 1.250000 | 1.538462 | 0.200000 | 1.0 | 6.166666 | 5.666666 | 0.800000 | 6.785714 | 9.785714 | 1.0000000 | 1.250000 | 8.714286 | 9.142858 | 6.750000 | 6.307692 | 1.9090909 | 7.416666 | 7.416666 | 4.428571 | 5.461538 | 6.000000 | 0.6666667 | 6.333334 | 2.375 | 3.875000 | 4.500000 | 8.285714 | 9.285714 | 10.000000 | 10.000000 | 10.000000 | 7.500000 | 8.571428 | 5.428571 | 7.000000 | 7.000000 | 8.666667 | 6.833334 | 4.75 | 7.333334 | 2.40 | 7.8571429 | NA | NA | NA |
Switzerland | 3607 | EVP/PEV | Confessional | 2023 | 1.95 | 2 | NA | 3.900000 | 3.800000 | 3.333333 | 4.5 | 5.090909 | 5.000000 | NA | 3.428571 | 7.250000 | 0.6666667 | 2.000000 | 7.900000 | 5.636363 | 4.750000 | 2.833333 | 2.4000001 | 7.375000 | 5.600000 | 4.111111 | 4.333334 | 5.333334 | 4.0000000 | 3.800000 | 6.000 | 3.666667 | NA | 4.400000 | 8.333333 | 6.800000 | 7.000000 | 8.285714 | 6.000000 | 6.600000 | 6.250000 | 6.000000 | 4.000000 | 4.500000 | 1.750000 | 2.00 | 2.000000 | 8.20 | 1.3333334 | NA | NA | NA |
Switzerland | 3608 | EDU/UDF | Confessional | 2023 | 1.23 | 2 | NA | 1.833333 | 4.000000 | 1.000000 | 3.0 | 7.222222 | 5.666666 | NA | 3.571429 | 9.250000 | 0.3333333 | 1.000000 | 8.700000 | 8.750000 | 8.636364 | 6.888889 | 0.8571429 | 8.727273 | 7.714286 | 7.000000 | 3.600000 | 6.000000 | 2.6666667 | 7.666666 | 3.000 | 7.250000 | NA | 8.000000 | 8.750000 | 9.500000 | 8.833333 | 8.714286 | 8.750000 | 9.000000 | 7.200000 | 8.000000 | 3.666667 | 7.000000 | 3.333333 | 2.50 | 2.000000 | 9.20 | 4.0000000 | NA | NA | NA |
Q5
Use one of the sub-setting operations above to create a subset of the CHES data that only includes parties that received at least 10% of the vote in the last election:
Frequency tables
R’s table command will generate either one- or two-dimensional frequency tables.
table(ches$family)
Radical Right Conservatives Liberal
48 26 46
Christian-Democratic Socialist Radical Left
17 38 26
Green Regionalist No family
31 22 11
Confessional Agrarian/Center
7 7
In some cases, it may be more useful to examine the proportion in each group rather than the raw frequencies. You can use the prop.table function to generate proportions from an existing table object.
table_of_families <- table(ches$family)
prop.table(table_of_families)
Radical Right Conservatives Liberal
0.17204301 0.09318996 0.16487455
Christian-Democratic Socialist Radical Left
0.06093190 0.13620072 0.09318996
Green Regionalist No family
0.11111111 0.07885305 0.03942652
Confessional Agrarian/Center
0.02508961 0.02508961
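For a one-dimensional table, prop.table() is just dividing each count by the total. A minimal sketch with a made-up vector of family names (toy_family, toy_counts, and by_hand are placeholder names) shows the equivalence and one way to turn proportions into percentages:

```r
# Hypothetical family labels for five parties
toy_family <- c("Green", "Liberal", "Green", "Socialist", "Green")
toy_counts <- table(toy_family)

by_hand <- toy_counts / sum(toy_counts)      # counts divided by the total
all.equal(prop.table(toy_counts), by_hand)   # same result as prop.table()

round(100 * prop.table(toy_counts), 1)       # proportions expressed as percentages
```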
Crosstabs
Adding a second categorical variable to the table function will generate a cross tab. The first group will be presented in the rows, and the second group in the columns. In this example, I’m creating a new variable called radical_right that is TRUE if a given party is part of the radical right family. Then I’m creating a cross tab that allows me to see how many radical right parties held seats based on the year in which the election occurred:
ches$radical_right <- ches$family == "Radical Right"
table(ches$electionyear, ches$radical_right)
FALSE TRUE
2021 39 5
2022 41 10
2023 80 17
2024 71 16
In many cases, I’m interested in getting percentages rather than just raw frequencies. I can use the prop.table function to get proportions from a frequency table. By default, prop.table will just give the cell proportions (each cell divided by the total number of observations in the table):
rr_table <- table(ches$electionyear, ches$radical_right)
prop.table(rr_table)
FALSE TRUE
2021 0.13978495 0.01792115
2022 0.14695341 0.03584229
2023 0.28673835 0.06093190
2024 0.25448029 0.05734767
…But this isn’t always a useful metric. More often, we’ll want to use a cross tab to answer a question like “what percentage of people in group X have characteristic Y?” For instance: “what percentage of the parties in a given election year were from the radical right family?” To answer this question, I’ll need to calculate:
\[ \frac{\text{Number of radical right parties in the data for year } i}{\text{Total number of parties in year } i} \]
Q6
Use the table and prop.table functions to calculate the proportion of radical right parties in each country’s parliament. You’ll want to look at the help file for prop.table to figure out how to do this.
# Enter your code here
The Janitor package
There are a number of packages that (ostensibly) help you make nicer looking cross tabs that can be easily exported to a report. It’s worth exploring these to see which ones are most useful for you, but one I want to highlight is the janitor package, which allows you to use the tabyl command to create tables:
library(janitor)
t1 <- tabyl(dat=ches, var1=electionyear, var2=radical_right)
t1
electionyear | FALSE | TRUE |
---|---|---|
2021 | 39 | 5 |
2022 | 41 | 10 |
2023 | 80 | 17 |
2024 | 71 | 16 |
On its own, this isn’t much to write home about, but it provides a lot of nice options for adorning the table with additional statistics:
t1 |>
  adorn_totals("row") |>        # add totals
  adorn_percentages("row") |>   # calculate percentages
  adorn_pct_formatting()        # format percentages
electionyear | FALSE | TRUE |
---|---|---|
2021 | 88.6% | 11.4% |
2022 | 80.4% | 19.6% |
2023 | 82.5% | 17.5% |
2024 | 81.6% | 18.4% |
Total | 82.8% | 17.2% |
The janitor package has the advantage of having relatively simple syntax, which makes it great for exploring a data set. However, if you’re looking to include cross tabs in a report or presentation, you may want to consider one of the R packages that can create more aesthetically pleasing tables, such as gt, kableExtra, or modelsummary.
Aggregation and grouping
We can use aggregate to apply a function to each level of a categorical variable. To do this we’ll use R’s formula syntax, which in general will look something like this:
y ~ x
The variable on the left will be the outcome measure you’re aggregating, the variable on the right will be the category you’re aggregating over. So, if we want to get the average level of corruption salience in each country, we can run this:
# x = the formula,
# FUN= should be a function you want to run for each group
corruption_salience <- aggregate(x = corrupt_salience ~ country,
                                 FUN = mean,
                                 data = ches)
corruption_salience
country | corrupt_salience |
---|---|
Belgium | 0.4583333 |
Denmark | 2.6166666 |
Germany | 4.2949736 |
Greece | 5.5648148 |
Spain | 4.7142857 |
France | 2.6000000 |
Ireland | 4.9541667 |
Italy | 5.1000000 |
Netherlands | 1.6666667 |
United Kingdom | 5.8261905 |
Portugal | 8.3333333 |
Austria | 6.0400000 |
Finland | 6.7222222 |
Sweden | 1.8055556 |
Bulgaria | 5.8909091 |
Czech Republic | 5.1817198 |
Estonia | 6.8750000 |
Hungary | 6.6871693 |
Latvia | 5.3833334 |
Lithuania | 5.5749999 |
Poland | 6.3730159 |
Romania | 4.8809524 |
Slovakia | 5.0462963 |
Slovenia | 6.7708334 |
Croatia | 6.8556547 |
Turkey | 6.3571429 |
Norway | 2.2777778 |
Switzerland | 1.9880952 |
Iceland | 4.5000000 |
dplyr method:
There’s also a dplyr method for this that may be a little easier to parse. Here, we just use “group by” to declare a grouping variable, and then use “summarize” to summarize over the members of each group.
library(dplyr) # if you haven't already loaded dplyr
newtable <- ches |>
  group_by(country) |>
  summarize(avg_corruption_salience = mean(corrupt_salience))
Going forward, we’ll mostly use the dplyr commands for this class, but knowing aggregate can be useful for interpreting other people’s code.
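One thing to watch for when moving between the two approaches is missing data. With the formula interface, aggregate drops rows with missing values by default, whereas a plain mean() returns NA for any group that contains one. The base R sketch below illustrates this on a made-up data frame (toy, group, and score are hypothetical names), using tapply as a stand-in for a per-group mean():

```r
toy <- data.frame(
  group = c("a", "a", "b", "b", "c"),
  score = c(1, 3, 2, NA, NA)
)

# Formula interface: rows with a missing score are dropped before grouping,
# so group "c" disappears and group "b" is averaged over a single value
agg <- aggregate(score ~ group, data = toy, FUN = mean)
agg

# A plain mean() per group returns NA whenever the group has a missing value
plain_means <- tapply(toy$score, toy$group, mean)
plain_means

# na.rm = TRUE skips the missing values instead
clean_means <- tapply(toy$score, toy$group, mean, na.rm = TRUE)
clean_means
```

In a summarize() call, the analogous fix would be to pass na.rm = TRUE to mean().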
Q7
Use either the dplyr commands or aggregate to get each party family’s average position on redistribution issues.
# your code here....
Functions
Remember that R functions are just snippets of code that we can re-use. If I have done a calculation once, I can probably write a function to apply it to more general cases.
For instance, here’s some code that I could use to get the top parties by vote share:
# this gives the row numbers of the 10 highest values, from highest to lowest
top_n <- order(ches$vote, decreasing=TRUE)[1:10]
# this subsets the values of party by the index in top_n:
ches$party[top_n]
[1] "PL" "PN" "ND" "RN" "AKP"
[6] "PiS" "GS" "HDZ" "Fidesz-KDNP" "LAB"
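If order() is new to you, it may help to see it on a tiny made-up example first. It returns positions rather than values, which is why the result can be used to index a different vector. The names toy_votes, toy_parties, and idx below are placeholders:

```r
# Hypothetical vote shares and party names
toy_votes   <- c(12.5, 30.1, 8.4, 22.0)
toy_parties <- c("A", "B", "C", "D")

idx <- order(toy_votes, decreasing = TRUE)
idx                    # 2 4 1 3: positions of the largest, second largest, ...

toy_parties[idx[1:2]]  # "B" "D": the two parties with the most votes
```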
I could probably make this more generalized by creating a function.
topnFunction <- function(labels, values, n){       # the arguments
  top_n <- order(values, decreasing = TRUE)[1:n]   # the subset
  result <- labels[top_n]                          # the result
  return(result)                                   # the return statement
}
Now I can use this in a bunch of different scenarios. Maybe I want to get the top parties by seat share instead of vote share; then all I need to do is change the values argument:
topnFunction(ches$party, ches$seat, n=10)
[1] "LAB" "AKP" "SPD" "PiS" "CHP" "PO" "CDU" "ND" "RN" "PP"
Or I can get more observations by changing n:
topnFunction(ches$party, ches$seat, n=15)
[1] "LAB" "AKP" "SPD" "PiS" "CHP"
[6] "PO" "CDU" "ND" "RN" "PP"
[11] "Fidesz-KDNP" "PSOE" "CONS" "FDI" "Grunen"
Q8
How could I modify the function above to return both the party names and the values themselves?
#...your code here