Classwork 1: Indexing and subsetting

This document was created using Quarto, a sort of language for combining code and text. When you’re following along in class, you can either copy and paste code directly from this page into an RStudio script file, or you can go to the GitHub site for this class, download the .qmd file for this classwork, and then open it in RStudio to work with it as an interactive document.

Data and packages

You’ll want to start by loading the CHES data. We can do this using the read_csv function from the readr package. We could download the data set from the CHES website and then import it from a local file, but for .csv files like this one, we can read it directly by passing the web address, in quotation marks, to the function:

library(readr)

ches<-read_csv('https://www.chesdata.eu/s/CHES_2024_final_v2.csv')

After you’ve imported the data, take a second to View() the data frame you just imported and see how things are coded.

Note that in the raw csv data, the party family variable is initially a set of numbers instead of a text value.

ches$family[1:5]
[1] 5 5 7 7 3

In order to convert these to text, we’ll need to consult the codebook and then convert the numeric variables to their respective text values.

R has a special built-in data type for this exact scenario called a “factor variable”.

R factors are often a source of confusion for new users. Under the hood, a factor variable is just a set of numbers with an extra piece of data that maps those numeric values to text labels. We could also store variables like this as regular text values, but factor variables are helpful because they can make it easier to do things like fix the ordering of survey response items when we’re making a graph or table.

We’ll talk more about working with factors later on, but for now it’s sufficient to know that we’re doing this to turn the numbers into readable text:

labels<-c("Radical Right",
          "Conservatives",
          "Liberal", 
          "Christian-Democratic",
          "Socialist",
          "Radical Left",
          "Green", 
          "Regionalist", 
          "No family",
          "Confessional",
          "Agrarian/Center")

ches$family<-factor(ches$family, labels=labels)

We’ll do the same thing for the “country” variable:

country_levels<-c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 20, 21, 22, 
          23, 24, 25, 26, 27, 28, 29, 31, 34, 35, 36, 37, 38, 40, 45)
country_labels<-c("Belgium", "Denmark", "Germany", "Greece", "Spain", "France", 
          "Ireland", "Italy", "Netherlands",  "United Kingdom", "Portugal", 
          "Austria", "Finland", "Sweden", "Bulgaria", "Czech Republic",  
          "Estonia", "Hungary", "Latvia", "Lithuania","Poland", "Romania", 
          "Slovakia", "Slovenia", "Croatia", "Turkey", "Norway", "Switzerland", 
          "Malta", "Luxembourg", "Cyprus", "Iceland")

ches$country<-factor(ches$country, levels=country_levels, labels=country_labels)
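To see what a factor stores under the hood, here is a minimal sketch using a small made-up vector of codes (the object names `codes` and `f` are purely illustrative and not part of the CHES data). Note the gap between 2 and 10: supplying `levels` explicitly is what keeps each code matched to the right label.

```r
# toy codes with a gap, similar in spirit to the CHES country codes
codes <- c(10, 1, 2, 10, 1)

f <- factor(codes,
            levels = c(1, 2, 10),
            labels = c("Belgium", "Denmark", "Netherlands"))

f              # prints the text labels
levels(f)      # the label set, in the order given
as.integer(f)  # the underlying integer positions: 3 1 2 3 1
```

The integers returned by `as.integer` are positions in the level set, not the original codes, which is one reason factors can surprise new users.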

Take a minute to View() the data set and see how things are coded now.

Using a pipe

A lot of code you’ll see in this course will use the “pipe” operator. Pipes are a way of making code more readable by chaining together a set of operations so that they read from left to right.

Consider a script that creates 100 random numbers, exponentiates them, sorts them from lowest to highest (the default for sort()), and then plots the result. One way to do this would be with a set of nested commands like this:

set.seed(100)
# draw 100 samples from a normal distribution with mean 0 and sd 1, then exponentiate, sort, and plot them
plot(sort(exp(rnorm(100))))

Alternatively, we could write each step on a separate line and assign the results to a new variable:

set.seed(100)
# get 100 samples from a normal distribution with mean zero and sd =1 
x<-rnorm(100)
# exponentiate it
exp_x<-exp(x)
# sort it 
sorted_x<-sort(exp_x)
# plot the result
plot(sorted_x)

Using the pipe operator allows us to perform this same operation without the nested parentheses or the creation of intermediate variables:

set.seed(100)
rnorm(100)|>
  exp()|>
  sort()|>
  plot()
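As a quick check that the pipe is only a different way of writing the same computation, this sketch compares the nested and piped versions on the same random draws (the names `x`, `nested`, and `piped` are illustrative). The native pipe requires R 4.1 or later.

```r
set.seed(100)
x <- rnorm(100)

nested <- sort(exp(x))          # written inside-out
piped  <- x |> exp() |> sort()  # written left to right

identical(nested, piped)  # TRUE
```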

Note that, by default, the pipe passes the left-hand-side object as the first argument of the function on the right-hand side. You can instead place it explicitly with the _ placeholder, which must be supplied to a named argument (this requires R 4.2 or later). This is especially useful for functions that don’t take “data” as their first argument. One example of this is the lm() command:

# This gives an error:
ches|>
  lm(eu_position ~ immigrate_policy)
Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame
# this works
ches|>
  lm(eu_position ~ immigrate_policy, data=_)

Call:
lm(formula = eu_position ~ immigrate_policy, data = ches)

Coefficients:
     (Intercept)  immigrate_policy  
          6.8833           -0.3641  

The %>% operator is a version of the pipe from the magrittr package. It mostly works just like the |> pipe that is part of base R, with some minor differences such as using . instead of _ to reference the data. (I mention it here because you might see it elsewhere, especially in code that predates the introduction of the native pipe operator.)

library(magrittr)
ches%>%
  lm(eu_position ~ immigrate_policy, data=.)

Call:
lm(formula = eu_position ~ immigrate_policy, data = .)

Coefficients:
     (Intercept)  immigrate_policy  
          6.8833           -0.3641  

Q1

Restate the following operation using a sequence of pipes

chesfam<-subset(ches, select=family) # retrieving only the families column

chesfam<-table(chesfam) # creating a frequency table

chesfam<-sort(chesfam) # sorting from lowest to highest
# type your code here...

Q2

Use one of the pipe operators to retrieve only the countries that are EU members (you’ll want to use the subset function for this)

# type your code here...

Q3

Use the pipe operator to get the square root of the variance for lrecon (which measures each party’s left-right position on economic issues)

# type your code here...

Subsetting

Remember that we have several options for subsetting data sets. The most basic is to index with a logical vector. For instance, the expression ches$family=="Confessional" returns TRUE for every element where the value of family is “Confessional” and FALSE for every element where it isn’t. I can then use this logical vector to subset my data frame, returning all rows for confessional parties:

confessional_parties<-ches[ches$family=="Confessional", ]


confessional_parties
country party_id party family electionyear vote seat epvote eu_position eu_salience eu_dissent eu_blur lrecon lrecon_blur lrecon_dissent lrecon_salience galtan galtan_blur galtan_dissent galtan_salience lrgen immigrate_policy immigrate_salience immigrate_dissent multiculturalism multicult_salience redistribution redist_salience climate_change climate_change_salience environment environment_salience spendvtax deregulation civlib_laworder womens_rights lgbtq_rights samesex_marriage religious_principles ethnic_minorities nationalism urban_rural protectionism regions executive_power judicial_independence corrupt_salience anti_islam people_v_elite anti_elite_salience eu_foreign eu_intmark eu_russia
Greece 419 Niki Confessional 2023 0.00 0 4.37 2.777778 3.555556 1.000000 6.4 5.666666 6.250000 1.500000 2.833333 9.500000 0.2500000 1.166667 9.300000 9.200000 9.500000 8.888889 0.3333333 9.625000 7.375000 5.750000 2.600000 8.000000 1.3333334 5.000000 1.000 6.000000 4.333334 9.000000 9.200000 9.500000 10.000000 9.600000 9.800000 9.714286 NA 10.000000 7.500000 8.000000 NA 3.50 6.666666 6.00 8.5000000 2.5 2.000000 9.50
Netherlands 1006 SGP Confessional 2023 2.08 3 3.70 2.750000 3.454546 2.250000 2.5 7.200000 1.000000 2.333333 4.083334 9.416667 0.3750000 0.500000 9.500000 8.000000 8.666667 5.750000 1.5000000 8.600000 6.875000 4.857143 3.625000 6.200000 3.0000000 6.000000 2.000 NA 6.666666 8.250000 9.800000 9.833333 9.800000 9.857142 8.000000 5.666666 8.571428 6.000000 3.500000 5.250000 1.000000 1.00 4.000000 1.50 1.7777778 1.5 3.666667 4.00
Netherlands 1016 CU Confessional 2023 2.04 3 2.89 4.583334 4.000000 4.000000 1.0 4.333334 1.600000 3.000000 5.666666 6.916666 1.5714285 2.500000 7.250000 4.583334 4.250000 5.166666 3.1666667 5.272728 5.222222 4.111111 4.666666 3.000000 4.7500000 3.800000 4.000 4.000000 4.000000 4.200000 4.600000 6.200000 5.400000 9.000000 4.000000 3.666667 6.333334 5.666666 4.333334 1.600000 0.500000 1.00 2.666667 1.50 0.8888889 3.5 5.000000 3.00
Finland 1409 KD Confessional 2023 4.20 5 4.12 4.000000 5.111111 3.142857 5.0 7.000000 5.166666 4.200000 5.818182 8.818182 2.2000000 1.333333 8.000000 7.545454 7.181818 4.909091 2.4000001 8.090909 4.800000 5.700000 5.090909 6.400000 3.5999999 5.800000 3.800 6.875000 5.000000 6.500000 5.000000 9.000000 10.000000 9.400000 6.833334 7.200000 5.833334 6.000000 5.428571 3.600000 3.400000 6.00 3.333333 2.25 2.2857144 3.0 3.750000 1.75
Turkey 3416 YRP Confessional 2023 2.81 5 NA 1.250000 1.538462 0.200000 1.0 6.166666 5.666666 0.800000 6.785714 9.785714 1.0000000 1.250000 8.714286 9.142858 6.750000 6.307692 1.9090909 7.416666 7.416666 4.428571 5.461538 6.000000 0.6666667 6.333334 2.375 3.875000 4.500000 8.285714 9.285714 10.000000 10.000000 10.000000 7.500000 8.571428 5.428571 7.000000 7.000000 8.666667 6.833334 4.75 7.333334 2.40 7.8571429 NA NA NA
Switzerland 3607 EVP/PEV Confessional 2023 1.95 2 NA 3.900000 3.800000 3.333333 4.5 5.090909 5.000000 NA 3.428571 7.250000 0.6666667 2.000000 7.900000 5.636363 4.750000 2.833333 2.4000001 7.375000 5.600000 4.111111 4.333334 5.333334 4.0000000 3.800000 6.000 3.666667 NA 4.400000 8.333333 6.800000 7.000000 8.285714 6.000000 6.600000 6.250000 6.000000 4.000000 4.500000 1.750000 2.00 2.000000 8.20 1.3333334 NA NA NA
Switzerland 3608 EDU/UDF Confessional 2023 1.23 2 NA 1.833333 4.000000 1.000000 3.0 7.222222 5.666666 NA 3.571429 9.250000 0.3333333 1.000000 8.700000 8.750000 8.636364 6.888889 0.8571429 8.727273 7.714286 7.000000 3.600000 6.000000 2.6666667 7.666666 3.000 7.250000 NA 8.000000 8.750000 9.500000 8.833333 8.714286 8.750000 9.000000 7.200000 8.000000 3.666667 7.000000 3.333333 2.50 2.000000 9.20 4.0000000 NA NA NA
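The logical vector that does the work here can be inspected on its own. A minimal sketch with a made-up four-row data frame (`toy`, its columns, and its values are invented for illustration):

```r
toy <- data.frame(
  party  = c("A", "B", "C", "D"),
  family = c("Green", "Confessional", "Liberal", "Confessional"),
  vote   = c(12.5, 3.1, 20.0, 1.8),
  stringsAsFactors = FALSE
)

is_conf <- toy$family == "Confessional"
is_conf       # FALSE  TRUE FALSE  TRUE
sum(is_conf)  # 2: TRUE counts as 1, so this is the number of matches

toy[is_conf, ]         # rows 2 and 4, all columns
toy[is_conf, "party"]  # just the party names: "B" "D"
```

The comma matters: the condition goes before it to select rows, and leaving the part after it empty keeps every column.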

We can also chain together multiple logical expressions to get a more specific slice of the data. For instance, maybe I only want agrarian parties whose position on economic issues is greater than or equal to 5. I could get these by combining two logical comparisons with an ampersand:

right_agrarians<-ches[ches$family == "Agrarian/Center" & ches$lrecon>=5,]



right_agrarians
country party_id party family electionyear vote seat epvote eu_position eu_salience eu_dissent eu_blur lrecon lrecon_blur lrecon_dissent lrecon_salience galtan galtan_blur galtan_dissent galtan_salience lrgen immigrate_policy immigrate_salience immigrate_dissent multiculturalism multicult_salience redistribution redist_salience climate_change climate_change_salience environment environment_salience spendvtax deregulation civlib_laworder womens_rights lgbtq_rights samesex_marriage religious_principles ethnic_minorities nationalism urban_rural protectionism regions executive_power judicial_independence corrupt_salience anti_islam people_v_elite anti_elite_salience eu_foreign eu_intmark eu_russia
Finland 1403 KESK Agrarian/Center 2023 11.3 23 11.76 4.818182 5.600000 3.857143 4.500000 5.818182 4.333334 4.600000 6.363637 6.454546 4.600 4.666666 4.636363 5.727272 5.636363 4.818182 4.9090910 6.272728 5.272728 5.090909 6.090909 5.800000 4.800000 5.500000 5.333334 5.25 5.333334 5.500000 3.833333 5.40 5.6666665 6.4 5.833334 5.8 9.166667 6.000000 2.285714 3.600000 1.8 6.000000 2.500000 1.750000 3.714286 3.5 4.400000 1.500000
Sweden 1603 C Agrarian/Center 2022 6.7 24 7.29 6.105263 4.526316 3.500000 3.153846 7.842105 2.090909 2.500000 7.000000 2.947368 2.375 2.400000 6.631579 5.947369 3.421053 6.157895 3.3157895 3.315790 6.000000 6.578947 5.368421 2.888889 7.222222 2.500000 7.500000 7.50 8.533334 3.100000 2.444444 1.25 0.8888889 2.5 3.090909 2.9 6.777778 1.222222 4.285714 4.333334 1.0 1.777778 2.142857 2.545454 2.166667 4.4 6.444445 2.666667
Iceland 4503 F Agrarian/Center 2021 7.8 5 NA 2.000000 3.000000 3.000000 2.800000 5.166666 3.333333 2.333333 6.333334 5.166666 3.200 6.000000 3.000000 5.500000 5.333334 4.666666 3.6666667 6.666666 4.166666 4.833334 5.333334 4.000000 4.333334 6.333334 2.666667 4.00 4.400000 5.333334 2.000000 2.00 0.5000000 0.0 4.000000 8.0 9.250000 8.333333 3.000000 5.666666 3.0 1.000000 0.500000 1.000000 0.000000 NA NA NA
Iceland 4504 M Agrarian/Center 2021 12.1 8 NA 1.166667 5.833334 3.000000 0.400000 8.333333 2.666667 1.000000 7.666666 8.833333 2.400 1.000000 8.166667 8.500000 9.000000 8.833333 0.6666667 9.500000 9.000000 7.200000 7.800000 9.666667 8.666667 6.333334 6.000000 9.00 7.200000 8.333333 3.333333 5.50 2.6666667 0.0 8.000000 9.5 8.500000 7.666666 3.500000 5.666666 4.0 3.000000 3.000000 5.000000 5.333334 NA NA NA

We can also use the subset function to perform the same operation as above. Note that this function expects a data set as its first argument, and, once we’ve specified that, we can reference columns within that data set without needing to use the $ notation:

subset(ches, family == "Confessional")
country party_id party family electionyear vote seat epvote eu_position eu_salience eu_dissent eu_blur lrecon lrecon_blur lrecon_dissent lrecon_salience galtan galtan_blur galtan_dissent galtan_salience lrgen immigrate_policy immigrate_salience immigrate_dissent multiculturalism multicult_salience redistribution redist_salience climate_change climate_change_salience environment environment_salience spendvtax deregulation civlib_laworder womens_rights lgbtq_rights samesex_marriage religious_principles ethnic_minorities nationalism urban_rural protectionism regions executive_power judicial_independence corrupt_salience anti_islam people_v_elite anti_elite_salience eu_foreign eu_intmark eu_russia
Greece 419 Niki Confessional 2023 0.00 0 4.37 2.777778 3.555556 1.000000 6.4 5.666666 6.250000 1.500000 2.833333 9.500000 0.2500000 1.166667 9.300000 9.200000 9.500000 8.888889 0.3333333 9.625000 7.375000 5.750000 2.600000 8.000000 1.3333334 5.000000 1.000 6.000000 4.333334 9.000000 9.200000 9.500000 10.000000 9.600000 9.800000 9.714286 NA 10.000000 7.500000 8.000000 NA 3.50 6.666666 6.00 8.5000000 2.5 2.000000 9.50
Netherlands 1006 SGP Confessional 2023 2.08 3 3.70 2.750000 3.454546 2.250000 2.5 7.200000 1.000000 2.333333 4.083334 9.416667 0.3750000 0.500000 9.500000 8.000000 8.666667 5.750000 1.5000000 8.600000 6.875000 4.857143 3.625000 6.200000 3.0000000 6.000000 2.000 NA 6.666666 8.250000 9.800000 9.833333 9.800000 9.857142 8.000000 5.666666 8.571428 6.000000 3.500000 5.250000 1.000000 1.00 4.000000 1.50 1.7777778 1.5 3.666667 4.00
Netherlands 1016 CU Confessional 2023 2.04 3 2.89 4.583334 4.000000 4.000000 1.0 4.333334 1.600000 3.000000 5.666666 6.916666 1.5714285 2.500000 7.250000 4.583334 4.250000 5.166666 3.1666667 5.272728 5.222222 4.111111 4.666666 3.000000 4.7500000 3.800000 4.000 4.000000 4.000000 4.200000 4.600000 6.200000 5.400000 9.000000 4.000000 3.666667 6.333334 5.666666 4.333334 1.600000 0.500000 1.00 2.666667 1.50 0.8888889 3.5 5.000000 3.00
Finland 1409 KD Confessional 2023 4.20 5 4.12 4.000000 5.111111 3.142857 5.0 7.000000 5.166666 4.200000 5.818182 8.818182 2.2000000 1.333333 8.000000 7.545454 7.181818 4.909091 2.4000001 8.090909 4.800000 5.700000 5.090909 6.400000 3.5999999 5.800000 3.800 6.875000 5.000000 6.500000 5.000000 9.000000 10.000000 9.400000 6.833334 7.200000 5.833334 6.000000 5.428571 3.600000 3.400000 6.00 3.333333 2.25 2.2857144 3.0 3.750000 1.75
Turkey 3416 YRP Confessional 2023 2.81 5 NA 1.250000 1.538462 0.200000 1.0 6.166666 5.666666 0.800000 6.785714 9.785714 1.0000000 1.250000 8.714286 9.142858 6.750000 6.307692 1.9090909 7.416666 7.416666 4.428571 5.461538 6.000000 0.6666667 6.333334 2.375 3.875000 4.500000 8.285714 9.285714 10.000000 10.000000 10.000000 7.500000 8.571428 5.428571 7.000000 7.000000 8.666667 6.833334 4.75 7.333334 2.40 7.8571429 NA NA NA
Switzerland 3607 EVP/PEV Confessional 2023 1.95 2 NA 3.900000 3.800000 3.333333 4.5 5.090909 5.000000 NA 3.428571 7.250000 0.6666667 2.000000 7.900000 5.636363 4.750000 2.833333 2.4000001 7.375000 5.600000 4.111111 4.333334 5.333334 4.0000000 3.800000 6.000 3.666667 NA 4.400000 8.333333 6.800000 7.000000 8.285714 6.000000 6.600000 6.250000 6.000000 4.000000 4.500000 1.750000 2.00 2.000000 8.20 1.3333334 NA NA NA
Switzerland 3608 EDU/UDF Confessional 2023 1.23 2 NA 1.833333 4.000000 1.000000 3.0 7.222222 5.666666 NA 3.571429 9.250000 0.3333333 1.000000 8.700000 8.750000 8.636364 6.888889 0.8571429 8.727273 7.714286 7.000000 3.600000 6.000000 2.6666667 7.666666 3.000 7.250000 NA 8.000000 8.750000 9.500000 8.833333 8.714286 8.750000 9.000000 7.200000 8.000000 3.666667 7.000000 3.333333 2.50 2.000000 9.20 4.0000000 NA NA NA
# OR:
# ches |> subset(family=="Confessional")
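One practical difference between bracket indexing and `subset` shows up when the column being tested has missing values. In this sketch (the data frame `toy` is made up for illustration), bracket indexing returns a row of NAs wherever the condition evaluates to NA, while `subset` drops those rows:

```r
toy <- data.frame(
  party  = c("A", "B", "C"),
  lrecon = c(7.2, NA, 3.0)
)

toy[toy$lrecon >= 5, ]    # 2 rows: party A, plus a row of NAs
subset(toy, lrecon >= 5)  # 1 row: party A only

# which() drops the NA positions, giving the same rows as subset()
toy[which(toy$lrecon >= 5), ]
```

If a bracket subset ever returns unexpected all-NA rows, a missing value in the condition is a likely cause.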

Another option we’ll explore more in the future is the filter function from the dplyr package. The syntax for this is almost identical to the syntax for subset, and it also takes a data set as its first argument:

library(dplyr)
ches|>filter(family=="Confessional")
country party_id party family electionyear vote seat epvote eu_position eu_salience eu_dissent eu_blur lrecon lrecon_blur lrecon_dissent lrecon_salience galtan galtan_blur galtan_dissent galtan_salience lrgen immigrate_policy immigrate_salience immigrate_dissent multiculturalism multicult_salience redistribution redist_salience climate_change climate_change_salience environment environment_salience spendvtax deregulation civlib_laworder womens_rights lgbtq_rights samesex_marriage religious_principles ethnic_minorities nationalism urban_rural protectionism regions executive_power judicial_independence corrupt_salience anti_islam people_v_elite anti_elite_salience eu_foreign eu_intmark eu_russia
Greece 419 Niki Confessional 2023 0.00 0 4.37 2.777778 3.555556 1.000000 6.4 5.666666 6.250000 1.500000 2.833333 9.500000 0.2500000 1.166667 9.300000 9.200000 9.500000 8.888889 0.3333333 9.625000 7.375000 5.750000 2.600000 8.000000 1.3333334 5.000000 1.000 6.000000 4.333334 9.000000 9.200000 9.500000 10.000000 9.600000 9.800000 9.714286 NA 10.000000 7.500000 8.000000 NA 3.50 6.666666 6.00 8.5000000 2.5 2.000000 9.50
Netherlands 1006 SGP Confessional 2023 2.08 3 3.70 2.750000 3.454546 2.250000 2.5 7.200000 1.000000 2.333333 4.083334 9.416667 0.3750000 0.500000 9.500000 8.000000 8.666667 5.750000 1.5000000 8.600000 6.875000 4.857143 3.625000 6.200000 3.0000000 6.000000 2.000 NA 6.666666 8.250000 9.800000 9.833333 9.800000 9.857142 8.000000 5.666666 8.571428 6.000000 3.500000 5.250000 1.000000 1.00 4.000000 1.50 1.7777778 1.5 3.666667 4.00
Netherlands 1016 CU Confessional 2023 2.04 3 2.89 4.583334 4.000000 4.000000 1.0 4.333334 1.600000 3.000000 5.666666 6.916666 1.5714285 2.500000 7.250000 4.583334 4.250000 5.166666 3.1666667 5.272728 5.222222 4.111111 4.666666 3.000000 4.7500000 3.800000 4.000 4.000000 4.000000 4.200000 4.600000 6.200000 5.400000 9.000000 4.000000 3.666667 6.333334 5.666666 4.333334 1.600000 0.500000 1.00 2.666667 1.50 0.8888889 3.5 5.000000 3.00
Finland 1409 KD Confessional 2023 4.20 5 4.12 4.000000 5.111111 3.142857 5.0 7.000000 5.166666 4.200000 5.818182 8.818182 2.2000000 1.333333 8.000000 7.545454 7.181818 4.909091 2.4000001 8.090909 4.800000 5.700000 5.090909 6.400000 3.5999999 5.800000 3.800 6.875000 5.000000 6.500000 5.000000 9.000000 10.000000 9.400000 6.833334 7.200000 5.833334 6.000000 5.428571 3.600000 3.400000 6.00 3.333333 2.25 2.2857144 3.0 3.750000 1.75
Turkey 3416 YRP Confessional 2023 2.81 5 NA 1.250000 1.538462 0.200000 1.0 6.166666 5.666666 0.800000 6.785714 9.785714 1.0000000 1.250000 8.714286 9.142858 6.750000 6.307692 1.9090909 7.416666 7.416666 4.428571 5.461538 6.000000 0.6666667 6.333334 2.375 3.875000 4.500000 8.285714 9.285714 10.000000 10.000000 10.000000 7.500000 8.571428 5.428571 7.000000 7.000000 8.666667 6.833334 4.75 7.333334 2.40 7.8571429 NA NA NA
Switzerland 3607 EVP/PEV Confessional 2023 1.95 2 NA 3.900000 3.800000 3.333333 4.5 5.090909 5.000000 NA 3.428571 7.250000 0.6666667 2.000000 7.900000 5.636363 4.750000 2.833333 2.4000001 7.375000 5.600000 4.111111 4.333334 5.333334 4.0000000 3.800000 6.000 3.666667 NA 4.400000 8.333333 6.800000 7.000000 8.285714 6.000000 6.600000 6.250000 6.000000 4.000000 4.500000 1.750000 2.00 2.000000 8.20 1.3333334 NA NA NA
Switzerland 3608 EDU/UDF Confessional 2023 1.23 2 NA 1.833333 4.000000 1.000000 3.0 7.222222 5.666666 NA 3.571429 9.250000 0.3333333 1.000000 8.700000 8.750000 8.636364 6.888889 0.8571429 8.727273 7.714286 7.000000 3.600000 6.000000 2.6666667 7.666666 3.000 7.250000 NA 8.000000 8.750000 9.500000 8.833333 8.714286 8.750000 9.000000 7.200000 8.000000 3.666667 7.000000 3.333333 2.50 2.000000 9.20 4.0000000 NA NA NA

Q5

Use one of the sub-setting operations above to create a subset of the CHES data that only includes parties that received at least 10% of the vote in the last election:

Frequency tables

R’s table command will generate frequency tables with one, two, or more dimensions. With a single variable, it counts how many times each value occurs:

table(ches$family)

       Radical Right        Conservatives              Liberal 
                  48                   26                   46 
Christian-Democratic            Socialist         Radical Left 
                  17                   38                   26 
               Green          Regionalist            No family 
                  31                   22                   11 
        Confessional      Agrarian/Center 
                   7                    7 

In some cases, it may be more useful to examine the percentages in each group, rather than the raw frequencies. You can use the prop.table function to generate proportions from an existing table object.

table_of_families<-table(ches$family)

prop.table(table_of_families)

       Radical Right        Conservatives              Liberal 
          0.17204301           0.09318996           0.16487455 
Christian-Democratic            Socialist         Radical Left 
          0.06093190           0.13620072           0.09318996 
               Green          Regionalist            No family 
          0.11111111           0.07885305           0.03942652 
        Confessional      Agrarian/Center 
          0.02508961           0.02508961 
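Proportions are often easier to read as rounded percentages. A small sketch, using a made-up vector `fam` rather than the CHES data:

```r
fam <- c("Green", "Liberal", "Green", "Socialist", "Green", "Liberal")

tab <- table(fam)
pct <- round(100 * prop.table(tab), 1)
pct
# Green 50.0, Liberal 33.3, Socialist 16.7
```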

Crosstabs

Adding a second categorical variable to the table function will generate a cross tab. The first variable will be presented in the rows, and the second variable in the columns. In this example, I’m creating a new variable called radical_right that is TRUE if a given party is part of the radical right family. Then I’m creating a cross tab that allows me to see how many parties were or were not radical right, broken down by the year in which the election occurred:

ches$radical_right <-  ches$family == "Radical Right"


table(ches$electionyear, ches$radical_right)
      
       FALSE TRUE
  2021    39    5
  2022    41   10
  2023    80   17
  2024    71   16
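Row and column totals can make a cross tab easier to sanity-check. The sketch below uses base R’s `addmargins` on a made-up pair of vectors (`year` and `rr` are illustrative stand-ins for `electionyear` and `radical_right`):

```r
year <- c(2021, 2021, 2022, 2022, 2022, 2023)
rr   <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

xt <- table(year, rr)
xt  # rows are years, columns are FALSE / TRUE

with_totals <- addmargins(xt)
with_totals  # same table with an extra "Sum" row and "Sum" column
```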

In many cases, I’m interested in getting percentages rather than just raw frequencies. I can use the prop.table function to get proportions from a frequency table. By default, prop.table will just give the cell proportions, meaning each cell divided by the grand total:

rr_table<-table(ches$electionyear, ches$radical_right)

prop.table(rr_table)
      
            FALSE       TRUE
  2021 0.13978495 0.01792115
  2022 0.14695341 0.03584229
  2023 0.28673835 0.06093190
  2024 0.25448029 0.05734767

…But this isn’t always a useful metric. More often, we’ll want to use a cross tab to answer a question like “what percentage of people in group X have characteristic Y?” For instance: “what percentage of the parties in each election year were from the radical right family?” To answer this question, I’ll need to calculate:

\[ \frac{\text{Number of radical right parties in the data for year } i}{\text{Total number of parties in year } i} \]

Q6

Use the table and prop.table functions to calculate the proportion of radical right parties in each country’s parliament. You’ll want to look at the help file for prop.table to figure out how to do this.

# Enter your code here

The Janitor package

There are a number of packages that (ostensibly) help you make nicer-looking cross tabs that can be easily exported to a report. It’s worth exploring these to see which ones are most useful for you, but one I want to highlight is the janitor package, which allows you to use the tabyl command to create tables:

library(janitor)

t1 <- tabyl(dat=ches, var1=electionyear, var2=radical_right)

t1
electionyear FALSE TRUE
2021 39 5
2022 41 10
2023 80 17
2024 71 16

On its own, this isn’t much to write home about, but it provides a lot of nice options for adorning the table with additional statistics:


t1|>
  adorn_totals("row")|> # add totals
  adorn_percentages("row")|> # calculate percentages
  adorn_pct_formatting() # format percentages
electionyear FALSE TRUE
2021 88.6% 11.4%
2022 80.4% 19.6%
2023 82.5% 17.5%
2024 81.6% 18.4%
Total 82.8% 17.2%

R packages for presentation-ready tables

The janitor package has the advantage of relatively simple syntax, which makes it great for exploring a data set. However, if you’re looking to include cross tabs in a report or presentation, you may want to consider one of the R packages that can create more aesthetically pleasing tables, such as gt, kableExtra, or modelsummary.

Aggregation and grouping

We can use aggregate to apply a function to each level of a categorical variable. To do this we’ll use R’s formula syntax, which in general will look something like this:

y ~ x

The variable on the left will be the outcome measure you’re aggregating, the variable on the right will be the category you’re aggregating over. So, if we want to get the average level of corruption salience in each country, we can run this:

# x = the formula, 
# FUN= should be a function you want to run for each group
corruption_salience <- aggregate(x = corrupt_salience ~ country , 
                                 FUN = mean, 
                                 data = ches )
corruption_salience
country corrupt_salience
Belgium 0.4583333
Denmark 2.6166666
Germany 4.2949736
Greece 5.5648148
Spain 4.7142857
France 2.6000000
Ireland 4.9541667
Italy 5.1000000
Netherlands 1.6666667
United Kingdom 5.8261905
Portugal 8.3333333
Austria 6.0400000
Finland 6.7222222
Sweden 1.8055556
Bulgaria 5.8909091
Czech Republic 5.1817198
Estonia 6.8750000
Hungary 6.6871693
Latvia 5.3833334
Lithuania 5.5749999
Poland 6.3730159
Romania 4.8809524
Slovakia 5.0462963
Slovenia 6.7708334
Croatia 6.8556547
Turkey 6.3571429
Norway 2.2777778
Switzerland 1.9880952
Iceland 4.5000000

dplyr method:

There’s also a dplyr method for this, which may be a little easier to parse. Here, we just use group_by to declare a grouping variable, and then use summarize to summarize over the members of each group.

library(dplyr) # if you haven't already loaded dplyr
newtable<-ches|>
  group_by(country)|>
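  # na.rm = TRUE skips missing values, which aggregate's formula interface drops by default
  summarize(avg_corruption_salience = mean(corrupt_salience, na.rm = TRUE))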

Going forward, we’ll mostly use the dplyr commands for this class, but knowing aggregate can be useful for interpreting other people’s code.
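One detail worth knowing about missing values: with the formula interface, `aggregate` drops rows with missing values by default, whereas `mean` returns NA unless `na.rm = TRUE` is set. A minimal base-R sketch with made-up data (`toy`, `country`, and `score` are illustrative names):

```r
toy <- data.frame(
  country = c("X", "X", "Y", "Y"),
  score   = c(2, 4, 6, NA)
)

# the formula interface uses na.action = na.omit by default
by_country <- aggregate(score ~ country, data = toy, FUN = mean)
by_country
# X: 3, Y: 6

# mean() on its own propagates the missing value
mean(toy$score[toy$country == "Y"])                # NA
mean(toy$score[toy$country == "Y"], na.rm = TRUE)  # 6
```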

Q7

Use either the dplyr commands or aggregate to get each party family’s average position on redistribution issues.

# your code here....

Functions

Remember that R functions are just snippets of code that we can re-use. If I have done a calculation once, I can probably write a function to apply it to more general cases.

For instance, here’s some code that I could use to get the top parties by vote share:

# this gives the row numbers of the 10 largest vote shares, ordered from highest to lowest
top_n<-order(ches$vote, decreasing=TRUE)[1:10] 

# this subsets the values of party by the index in top_n:

ches$party[top_n]
 [1] "PL"          "PN"          "ND"          "RN"          "AKP"        
 [6] "PiS"         "GS"          "HDZ"         "Fidesz-KDNP" "LAB"        

I could probably make this more generalized by creating a function.

topnFunction <- function(labels, values, n){ # the arguments 
  top_n <- order(values, decreasing = TRUE)[1:n] # positions of the n largest values
  result <- labels[top_n] # the labels at those positions
  return(result) # the return statement
}

Now I can use this in a bunch of different scenarios. Maybe I want to get the top parties by seat share instead of vote share, then all I need to do is change the values argument:

topnFunction(ches$party, ches$seat, n=10)
 [1] "LAB" "AKP" "SPD" "PiS" "CHP" "PO"  "CDU" "ND"  "RN"  "PP" 

Or I can get more observations by changing n:

topnFunction(ches$party, ches$seat, n=15)
 [1] "LAB"         "AKP"         "SPD"         "PiS"         "CHP"        
 [6] "PO"          "CDU"         "ND"          "RN"          "PP"         
[11] "Fidesz-KDNP" "PSOE"        "CONS"        "FDI"         "Grunen"     
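It can help to try a function on a tiny input where the answer is obvious, including an edge case. The sketch below re-declares the function so it runs on its own, uses made-up vectors `parties` and `votes`, and adds a hypothetical variant called `topnSafe` (not part of the original code) that uses `head` so that asking for more items than exist does not pad the result with NA:

```r
topnFunction <- function(labels, values, n){
  top_n <- order(values, decreasing = TRUE)[1:n]
  labels[top_n]
}

parties <- c("A", "B", "C")
votes   <- c(10, 30, 20)

topnFunction(parties, votes, n = 2)  # "B" "C"
topnFunction(parties, votes, n = 5)  # "B" "C" "A" NA NA

# hypothetical variant: head() stops at the end of the vector
topnSafe <- function(labels, values, n){
  labels[head(order(values, decreasing = TRUE), n)]
}
topnSafe(parties, votes, n = 5)      # "B" "C" "A"
```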

Q8

How could I modify the function above to return both the party names and the values themselves?

#...your code here