Classwork 1: Indexing and subsetting

This document was created using Quarto, a tool for combining code and text. When you’re following along in class, you can either copy and paste code directly from this page into an RStudio script file, or you can go to the GitHub site for this class, download the .qmd file for this classwork, and open it in RStudio to work with it as an interactive document.

Data and packages

You’ll want to start by loading the CHES data. We can do this using the read_csv function from the readr package. We could download the data set from the CHES website and import it from a local file, but for .csv files like this one we can skip that step and pass the web address, in quotation marks, directly to the function:

library(readr)

ches<-read_csv('https://www.chesdata.eu/s/CHES_2024_final_v2.csv')

After you’ve imported the data, take a second to View() the data frame you just imported and see how things are coded.

Note that in the raw csv data, the party family variable is initially a set of numbers instead of a text value.

ches$family[1:5]

[1] 5 5 7 7 3

In order to convert these to text, we’ll need to consult the codebook and then map each numeric code to its text label.

R has a special built-in data type for this exact scenario called a “factor variable”.

R factors are often a source of confusion for new users. Under the hood, a factor variable is just a set of numbers with an extra piece of data that maps those numeric values to text labels. We could also store variables like this as regular text values, but factor variables are helpful because they can make it easier to do things like fix the ordering of survey response items when we’re making a graph or table.
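To make the “numbers plus labels” idea concrete, here is a minimal sketch using a made-up vector of survey codes. The names codes and responses and the three labels are illustrative assumptions, not part of the CHES data:

```r
# Hypothetical survey responses stored as integer codes (not CHES data)
codes <- c(2, 1, 3, 2)

# levels = the codes that appear in the data; labels = the text for each code
responses <- factor(codes, levels = c(1, 2, 3),
                    labels = c("Low", "Medium", "High"))

responses              # prints the text labels
as.integer(responses)  # the integer codes stored under the hood
levels(responses)      # the lookup table of labels
```

Because the labels are stored in a fixed order, tables and plots built from responses will list Low, Medium, High rather than sorting alphabetically.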

We’ll talk more about working with factors later on, but for now it’s sufficient to know that we’re doing this to turn the numbers into readable text:

labels<-c("Radical Right",
          "Conservatives",
          "Liberal", 
          "Christian-Democratic",
          "Socialist",
          "Radical Left",
          "Green", 
          "Regionalist", 
          "No family",
          "Confessional",
          "Agrarian/Center")

ches$family<-factor(ches$family, labels=labels)

We’ll do the same thing for the “country” variable:

country_levels<-c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 20, 21, 22, 
          23, 24, 25, 26, 27, 28, 29, 31, 34, 35, 36, 37, 38, 40, 45)
country_labels<-c("Belgium", "Denmark", "Germany", "Greece", "Spain", "France", 
          "Ireland", "Italy", "Netherlands",  "United Kingdom", "Portugal", 
          "Austria", "Finland", "Sweden", "Bulgaria", "Czech Republic",  
          "Estonia", "Hungary", "Latvia", "Lithuania","Poland", "Romania", 
          "Slovakia", "Slovenia", "Croatia", "Turkey", "Norway", "Switzerland", 
          "Malta", "Luxembourg", "Cyprus", "Iceland")

ches$country<-factor(ches$country, levels=country_levels, labels=country_labels)
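The country codes skip some integers (there is no 9 or 15, for example), which is presumably why the levels are spelled out here instead of relying on labels alone. Here is a small sketch of how levels and labels pair up, using three of the codes from the list above plus one code (9) that is not in the list:

```r
# Three codes taken from country_levels above, plus 9, which is not in the list
codes <- c(1, 3, 8, 9)

countries <- factor(codes,
                    levels = c(1, 3, 8),
                    labels = c("Belgium", "Germany", "Italy"))

# Each code is matched to the label in the same position;
# a code that is missing from levels becomes NA
as.character(countries)
```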

Take a minute to View() the data set and see how things are coded now.

Using a pipe

A lot of code you’ll see in this course will use the “pipe” operator. Pipes are a way of making code more readable by chaining together a set of operations so that they read from left to right.

Consider a script that creates 100 random numbers, exponentiates them, sorts them from lowest to highest, and then plots the result. One way to do this would be with a set of nested commands like this:

set.seed(100)
# draw 100 samples from a normal distribution with mean zero and sd = 1, then exponentiate, sort, and plot them
plot(sort(exp(rnorm(100))))

Alternatively, we could write each step on a separate line and assign the results to a new variable:

set.seed(100)
# get 100 samples from a normal distribution with mean zero and sd =1 
x<-rnorm(100)
# exponentiate it
exp_x<-exp(x)
# sort it 
sorted_x<-sort(exp_x)
# plot the result
plot(sorted_x)

Using the pipe operator allows us to perform this same operation without the nested parentheses or the creation of intermediate variables:

set.seed(100)
rnorm(100)|>
  exp()|>
  sort()|>
  plot()
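If you want to convince yourself that the nested and piped versions really do the same thing, you can compare the sorted values directly. This is just a quick check, and the names nested and piped are for illustration only:

```r
# Nested version (re-seeding so both versions draw the same random numbers)
set.seed(100)
nested <- sort(exp(rnorm(100)))

# Piped version
set.seed(100)
piped <- rnorm(100) |>
  exp() |>
  sort()

identical(nested, piped)  # TRUE: the pipe only changes how the code is written
```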

Note that the pipe will default to using the left-hand-side object as the first argument of the function on the right-hand side, but you can explicitly reference the left-hand-side object using the _ placeholder (available in R 4.2 and later). This is especially useful for functions that don’t take “data” as their first argument. One example of this is the lm() command:

# This gives an error:
ches|>
  lm(eu_position ~ immigrate_policy)

Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame

# this works
ches|>
  lm(eu_position ~ immigrate_policy, data=_)


Call:
lm(formula = eu_position ~ immigrate_policy, data = ches)

Coefficients:
     (Intercept)  immigrate_policy  
          6.8833           -0.3641
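The same pattern works with any data frame. Here is a sketch using R’s built-in mtcars data, so it runs without the CHES file. The object names are illustrative, and note that the placeholder has to be passed to a named argument (here, data):

```r
# Without a pipe: the data frame is passed by name
fit_plain <- lm(mpg ~ wt, data = mtcars)

# With the native pipe: _ marks where the left-hand side should go
fit_piped <- mtcars |>
  lm(mpg ~ wt, data = _)

# Both calls estimate the same model
all.equal(coef(fit_plain), coef(fit_piped))
```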

The %>% operator is a version of the pipe from the magrittr package. It mostly works just like the |> pipe that is part of base R, but it has some minor differences, such as using . instead of _ to reference data. (I mention it here because you might see it elsewhere, especially in code that pre-dates the introduction of the native pipe operator.)

library(magrittr)
ches%>%
  lm(eu_position ~ immigrate_policy, data=.)


Call:
lm(formula = eu_position ~ immigrate_policy, data = .)

Coefficients:
     (Intercept)  immigrate_policy  
          6.8833           -0.3641

Q1

Restate the following operation using a sequence of pipes

chesfam<-subset(ches, select=family) # retrieving only the families column

chesfam<-table(chesfam) # creating a frequency table

chesfam<-sort(chesfam) # sorting from lowest to highest

# type your code here...

Q2

Use one of the pipe operators to retrieve only the countries that are EU members (you’ll want to use the subset function for this)

# type your code here...

Q3

Use the pipe operator to get the square root of the variance for lrecon (which measures each party’s left-right position on economic issues)

# type your code here...

Subsetting

Remember we have several options for sub-setting data sets. The most basic is to use a logical expression. For instance, the expression ches$family=="Confessional" returns TRUE for every element where the value of family is “Confessional” and FALSE for every element where it isn’t. I can then use this logical vector to subset my data frame, returning all rows for confessional parties:

confessional_parties<-ches[ches$family=="Confessional", ]


confessional_parties

country	party_id	party	family	electionyear	vote	seat	epvote	eu_position	eu_salience	eu_dissent	eu_blur	lrecon	lrecon_blur	lrecon_dissent	lrecon_salience	galtan	galtan_blur	galtan_dissent	galtan_salience	lrgen	immigrate_policy	immigrate_salience	immigrate_dissent	multiculturalism	multicult_salience	redistribution	redist_salience	climate_change	climate_change_salience	environment	environment_salience	spendvtax	deregulation	civlib_laworder	womens_rights	lgbtq_rights	samesex_marriage	religious_principles	ethnic_minorities	nationalism	urban_rural	protectionism	regions	executive_power	judicial_independence	corrupt_salience	anti_islam	people_v_elite	anti_elite_salience	eu_foreign	eu_intmark	eu_russia
Greece	419	Niki	Confessional	2023	0.00	0	4.37	2.777778	3.555556	1.000000	6.4	5.666666	6.250000	1.500000	2.833333	9.500000	0.2500000	1.166667	9.300000	9.200000	9.500000	8.888889	0.3333333	9.625000	7.375000	5.750000	2.600000	8.000000	1.3333334	5.000000	1.000	6.000000	4.333334	9.000000	9.200000	9.500000	10.000000	9.600000	9.800000	9.714286	NA	10.000000	7.500000	8.000000	NA	3.50	6.666666	6.00	8.5000000	2.5	2.000000	9.50
Netherlands	1006	SGP	Confessional	2023	2.08	3	3.70	2.750000	3.454546	2.250000	2.5	7.200000	1.000000	2.333333	4.083334	9.416667	0.3750000	0.500000	9.500000	8.000000	8.666667	5.750000	1.5000000	8.600000	6.875000	4.857143	3.625000	6.200000	3.0000000	6.000000	2.000	NA	6.666666	8.250000	9.800000	9.833333	9.800000	9.857142	8.000000	5.666666	8.571428	6.000000	3.500000	5.250000	1.000000	1.00	4.000000	1.50	1.7777778	1.5	3.666667	4.00
Netherlands	1016	CU	Confessional	2023	2.04	3	2.89	4.583334	4.000000	4.000000	1.0	4.333334	1.600000	3.000000	5.666666	6.916666	1.5714285	2.500000	7.250000	4.583334	4.250000	5.166666	3.1666667	5.272728	5.222222	4.111111	4.666666	3.000000	4.7500000	3.800000	4.000	4.000000	4.000000	4.200000	4.600000	6.200000	5.400000	9.000000	4.000000	3.666667	6.333334	5.666666	4.333334	1.600000	0.500000	1.00	2.666667	1.50	0.8888889	3.5	5.000000	3.00
Finland	1409	KD	Confessional	2023	4.20	5	4.12	4.000000	5.111111	3.142857	5.0	7.000000	5.166666	4.200000	5.818182	8.818182	2.2000000	1.333333	8.000000	7.545454	7.181818	4.909091	2.4000001	8.090909	4.800000	5.700000	5.090909	6.400000	3.5999999	5.800000	3.800	6.875000	5.000000	6.500000	5.000000	9.000000	10.000000	9.400000	6.833334	7.200000	5.833334	6.000000	5.428571	3.600000	3.400000	6.00	3.333333	2.25	2.2857144	3.0	3.750000	1.75
Turkey	3416	YRP	Confessional	2023	2.81	5	NA	1.250000	1.538462	0.200000	1.0	6.166666	5.666666	0.800000	6.785714	9.785714	1.0000000	1.250000	8.714286	9.142858	6.750000	6.307692	1.9090909	7.416666	7.416666	4.428571	5.461538	6.000000	0.6666667	6.333334	2.375	3.875000	4.500000	8.285714	9.285714	10.000000	10.000000	10.000000	7.500000	8.571428	5.428571	7.000000	7.000000	8.666667	6.833334	4.75	7.333334	2.40	7.8571429	NA	NA	NA
Switzerland	3607	EVP/PEV	Confessional	2023	1.95	2	NA	3.900000	3.800000	3.333333	4.5	5.090909	5.000000	NA	3.428571	7.250000	0.6666667	2.000000	7.900000	5.636363	4.750000	2.833333	2.4000001	7.375000	5.600000	4.111111	4.333334	5.333334	4.0000000	3.800000	6.000	3.666667	NA	4.400000	8.333333	6.800000	7.000000	8.285714	6.000000	6.600000	6.250000	6.000000	4.000000	4.500000	1.750000	2.00	2.000000	8.20	1.3333334	NA	NA	NA
Switzerland	3608	EDU/UDF	Confessional	2023	1.23	2	NA	1.833333	4.000000	1.000000	3.0	7.222222	5.666666	NA	3.571429	9.250000	0.3333333	1.000000	8.700000	8.750000	8.636364	6.888889	0.8571429	8.727273	7.714286	7.000000	3.600000	6.000000	2.6666667	7.666666	3.000	7.250000	NA	8.000000	8.750000	9.500000	8.833333	8.714286	8.750000	9.000000	7.200000	8.000000	3.666667	7.000000	3.333333	2.50	2.000000	9.20	4.0000000	NA	NA	NA
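To see the intermediate step, it can help to look at the logical vector on its own. Here is a toy sketch with a made-up data frame (parties_toy and its values are invented for illustration):

```r
# Hypothetical mini data set
parties_toy <- data.frame(
  party  = c("A", "B", "C", "D"),
  family = c("Green", "Confessional", "Green", "Liberal")
)

# One TRUE/FALSE value per row
is_green <- parties_toy$family == "Green"
is_green

# Rows where the vector is TRUE are kept;
# leaving the slot after the comma empty keeps all columns
parties_toy[is_green, ]
```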

We can also chain together multiple logical expressions to get a more specific slice of the data. For instance, maybe I only want agrarian parties whose position on economic issues is greater than or equal to 5. I could get this by combining two logical comparisons with an ampersand:

right_agrarians<-ches[ches$family == "Agrarian/Center" & ches$lrecon>=5,]



right_agrarians

country	party_id	party	family	electionyear	vote	seat	epvote	eu_position	eu_salience	eu_dissent	eu_blur	lrecon	lrecon_blur	lrecon_dissent	lrecon_salience	galtan	galtan_blur	galtan_dissent	galtan_salience	lrgen	immigrate_policy	immigrate_salience	immigrate_dissent	multiculturalism	multicult_salience	redistribution	redist_salience	climate_change	climate_change_salience	environment	environment_salience	spendvtax	deregulation	civlib_laworder	womens_rights	lgbtq_rights	samesex_marriage	religious_principles	ethnic_minorities	nationalism	urban_rural	protectionism	regions	executive_power	judicial_independence	corrupt_salience	anti_islam	people_v_elite	anti_elite_salience	eu_foreign	eu_intmark	eu_russia
Finland	1403	KESK	Agrarian/Center	2023	11.3	23	11.76	4.818182	5.600000	3.857143	4.500000	5.818182	4.333334	4.600000	6.363637	6.454546	4.600	4.666666	4.636363	5.727272	5.636363	4.818182	4.9090910	6.272728	5.272728	5.090909	6.090909	5.800000	4.800000	5.500000	5.333334	5.25	5.333334	5.500000	3.833333	5.40	5.6666665	6.4	5.833334	5.8	9.166667	6.000000	2.285714	3.600000	1.8	6.000000	2.500000	1.750000	3.714286	3.5	4.400000	1.500000
Sweden	1603	C	Agrarian/Center	2022	6.7	24	7.29	6.105263	4.526316	3.500000	3.153846	7.842105	2.090909	2.500000	7.000000	2.947368	2.375	2.400000	6.631579	5.947369	3.421053	6.157895	3.3157895	3.315790	6.000000	6.578947	5.368421	2.888889	7.222222	2.500000	7.500000	7.50	8.533334	3.100000	2.444444	1.25	0.8888889	2.5	3.090909	2.9	6.777778	1.222222	4.285714	4.333334	1.0	1.777778	2.142857	2.545454	2.166667	4.4	6.444445	2.666667
Iceland	4503	F	Agrarian/Center	2021	7.8	5	NA	2.000000	3.000000	3.000000	2.800000	5.166666	3.333333	2.333333	6.333334	5.166666	3.200	6.000000	3.000000	5.500000	5.333334	4.666666	3.6666667	6.666666	4.166666	4.833334	5.333334	4.000000	4.333334	6.333334	2.666667	4.00	4.400000	5.333334	2.000000	2.00	0.5000000	0.0	4.000000	8.0	9.250000	8.333333	3.000000	5.666666	3.0	1.000000	0.500000	1.000000	0.000000	NA	NA	NA
Iceland	4504	M	Agrarian/Center	2021	12.1	8	NA	1.166667	5.833334	3.000000	0.400000	8.333333	2.666667	1.000000	7.666666	8.833333	2.400	1.000000	8.166667	8.500000	9.000000	8.833333	0.6666667	9.500000	9.000000	7.200000	7.800000	9.666667	8.666667	6.333334	6.000000	9.00	7.200000	8.333333	3.333333	5.50	2.6666667	0.0	8.000000	9.5	8.500000	7.666666	3.500000	5.666666	4.0	3.000000	3.000000	5.000000	5.333334	NA	NA	NA
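One thing to watch for with bracket subsetting: if the variable in the comparison has missing values, the comparison returns NA for those rows, and the brackets keep them as rows filled with NA. The subset function treats NA as FALSE and drops those rows. A toy sketch (the data frame toy is made up):

```r
# Hypothetical data with one missing value
toy <- data.frame(party = c("A", "B", "C"), lrecon = c(6, NA, 3))

toy$lrecon >= 5               # TRUE NA FALSE

toy[toy$lrecon >= 5, ]        # two rows: party A and a row of NAs
subset(toy, lrecon >= 5)      # one row: party A

# which() keeps only the positions that are TRUE, so it also drops the NA
toy[which(toy$lrecon >= 5), ]
```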

We can also use the subset function to perform the same operation as above. Note that this function expects a data set as its first argument, and, once we’ve specified that, we can reference columns within that data set without needing to use the $ notation:

subset(ches, family == "Confessional")

country	party_id	party	family	electionyear	vote	seat	epvote	eu_position	eu_salience	eu_dissent	eu_blur	lrecon	lrecon_blur	lrecon_dissent	lrecon_salience	galtan	galtan_blur	galtan_dissent	galtan_salience	lrgen	immigrate_policy	immigrate_salience	immigrate_dissent	multiculturalism	multicult_salience	redistribution	redist_salience	climate_change	climate_change_salience	environment	environment_salience	spendvtax	deregulation	civlib_laworder	womens_rights	lgbtq_rights	samesex_marriage	religious_principles	ethnic_minorities	nationalism	urban_rural	protectionism	regions	executive_power	judicial_independence	corrupt_salience	anti_islam	people_v_elite	anti_elite_salience	eu_foreign	eu_intmark	eu_russia
Greece	419	Niki	Confessional	2023	0.00	0	4.37	2.777778	3.555556	1.000000	6.4	5.666666	6.250000	1.500000	2.833333	9.500000	0.2500000	1.166667	9.300000	9.200000	9.500000	8.888889	0.3333333	9.625000	7.375000	5.750000	2.600000	8.000000	1.3333334	5.000000	1.000	6.000000	4.333334	9.000000	9.200000	9.500000	10.000000	9.600000	9.800000	9.714286	NA	10.000000	7.500000	8.000000	NA	3.50	6.666666	6.00	8.5000000	2.5	2.000000	9.50
Netherlands	1006	SGP	Confessional	2023	2.08	3	3.70	2.750000	3.454546	2.250000	2.5	7.200000	1.000000	2.333333	4.083334	9.416667	0.3750000	0.500000	9.500000	8.000000	8.666667	5.750000	1.5000000	8.600000	6.875000	4.857143	3.625000	6.200000	3.0000000	6.000000	2.000	NA	6.666666	8.250000	9.800000	9.833333	9.800000	9.857142	8.000000	5.666666	8.571428	6.000000	3.500000	5.250000	1.000000	1.00	4.000000	1.50	1.7777778	1.5	3.666667	4.00
Netherlands	1016	CU	Confessional	2023	2.04	3	2.89	4.583334	4.000000	4.000000	1.0	4.333334	1.600000	3.000000	5.666666	6.916666	1.5714285	2.500000	7.250000	4.583334	4.250000	5.166666	3.1666667	5.272728	5.222222	4.111111	4.666666	3.000000	4.7500000	3.800000	4.000	4.000000	4.000000	4.200000	4.600000	6.200000	5.400000	9.000000	4.000000	3.666667	6.333334	5.666666	4.333334	1.600000	0.500000	1.00	2.666667	1.50	0.8888889	3.5	5.000000	3.00
Finland	1409	KD	Confessional	2023	4.20	5	4.12	4.000000	5.111111	3.142857	5.0	7.000000	5.166666	4.200000	5.818182	8.818182	2.2000000	1.333333	8.000000	7.545454	7.181818	4.909091	2.4000001	8.090909	4.800000	5.700000	5.090909	6.400000	3.5999999	5.800000	3.800	6.875000	5.000000	6.500000	5.000000	9.000000	10.000000	9.400000	6.833334	7.200000	5.833334	6.000000	5.428571	3.600000	3.400000	6.00	3.333333	2.25	2.2857144	3.0	3.750000	1.75
Turkey	3416	YRP	Confessional	2023	2.81	5	NA	1.250000	1.538462	0.200000	1.0	6.166666	5.666666	0.800000	6.785714	9.785714	1.0000000	1.250000	8.714286	9.142858	6.750000	6.307692	1.9090909	7.416666	7.416666	4.428571	5.461538	6.000000	0.6666667	6.333334	2.375	3.875000	4.500000	8.285714	9.285714	10.000000	10.000000	10.000000	7.500000	8.571428	5.428571	7.000000	7.000000	8.666667	6.833334	4.75	7.333334	2.40	7.8571429	NA	NA	NA
Switzerland	3607	EVP/PEV	Confessional	2023	1.95	2	NA	3.900000	3.800000	3.333333	4.5	5.090909	5.000000	NA	3.428571	7.250000	0.6666667	2.000000	7.900000	5.636363	4.750000	2.833333	2.4000001	7.375000	5.600000	4.111111	4.333334	5.333334	4.0000000	3.800000	6.000	3.666667	NA	4.400000	8.333333	6.800000	7.000000	8.285714	6.000000	6.600000	6.250000	6.000000	4.000000	4.500000	1.750000	2.00	2.000000	8.20	1.3333334	NA	NA	NA
Switzerland	3608	EDU/UDF	Confessional	2023	1.23	2	NA	1.833333	4.000000	1.000000	3.0	7.222222	5.666666	NA	3.571429	9.250000	0.3333333	1.000000	8.700000	8.750000	8.636364	6.888889	0.8571429	8.727273	7.714286	7.000000	3.600000	6.000000	2.6666667	7.666666	3.000	7.250000	NA	8.000000	8.750000	9.500000	8.833333	8.714286	8.750000	9.000000	7.200000	8.000000	3.666667	7.000000	3.333333	2.50	2.000000	9.20	4.0000000	NA	NA	NA

# OR:
# ches |> subset(family=="Confessional")

Another option we’ll explore more in the future is the filter function from the dplyr package. The syntax for this is almost identical to the syntax for subset, and it also takes a data set as its first argument:

library(dplyr)
ches|>filter(family=="Confessional")

country	party_id	party	family	electionyear	vote	seat	epvote	eu_position	eu_salience	eu_dissent	eu_blur	lrecon	lrecon_blur	lrecon_dissent	lrecon_salience	galtan	galtan_blur	galtan_dissent	galtan_salience	lrgen	immigrate_policy	immigrate_salience	immigrate_dissent	multiculturalism	multicult_salience	redistribution	redist_salience	climate_change	climate_change_salience	environment	environment_salience	spendvtax	deregulation	civlib_laworder	womens_rights	lgbtq_rights	samesex_marriage	religious_principles	ethnic_minorities	nationalism	urban_rural	protectionism	regions	executive_power	judicial_independence	corrupt_salience	anti_islam	people_v_elite	anti_elite_salience	eu_foreign	eu_intmark	eu_russia
Greece	419	Niki	Confessional	2023	0.00	0	4.37	2.777778	3.555556	1.000000	6.4	5.666666	6.250000	1.500000	2.833333	9.500000	0.2500000	1.166667	9.300000	9.200000	9.500000	8.888889	0.3333333	9.625000	7.375000	5.750000	2.600000	8.000000	1.3333334	5.000000	1.000	6.000000	4.333334	9.000000	9.200000	9.500000	10.000000	9.600000	9.800000	9.714286	NA	10.000000	7.500000	8.000000	NA	3.50	6.666666	6.00	8.5000000	2.5	2.000000	9.50
Netherlands	1006	SGP	Confessional	2023	2.08	3	3.70	2.750000	3.454546	2.250000	2.5	7.200000	1.000000	2.333333	4.083334	9.416667	0.3750000	0.500000	9.500000	8.000000	8.666667	5.750000	1.5000000	8.600000	6.875000	4.857143	3.625000	6.200000	3.0000000	6.000000	2.000	NA	6.666666	8.250000	9.800000	9.833333	9.800000	9.857142	8.000000	5.666666	8.571428	6.000000	3.500000	5.250000	1.000000	1.00	4.000000	1.50	1.7777778	1.5	3.666667	4.00
Netherlands	1016	CU	Confessional	2023	2.04	3	2.89	4.583334	4.000000	4.000000	1.0	4.333334	1.600000	3.000000	5.666666	6.916666	1.5714285	2.500000	7.250000	4.583334	4.250000	5.166666	3.1666667	5.272728	5.222222	4.111111	4.666666	3.000000	4.7500000	3.800000	4.000	4.000000	4.000000	4.200000	4.600000	6.200000	5.400000	9.000000	4.000000	3.666667	6.333334	5.666666	4.333334	1.600000	0.500000	1.00	2.666667	1.50	0.8888889	3.5	5.000000	3.00
Finland	1409	KD	Confessional	2023	4.20	5	4.12	4.000000	5.111111	3.142857	5.0	7.000000	5.166666	4.200000	5.818182	8.818182	2.2000000	1.333333	8.000000	7.545454	7.181818	4.909091	2.4000001	8.090909	4.800000	5.700000	5.090909	6.400000	3.5999999	5.800000	3.800	6.875000	5.000000	6.500000	5.000000	9.000000	10.000000	9.400000	6.833334	7.200000	5.833334	6.000000	5.428571	3.600000	3.400000	6.00	3.333333	2.25	2.2857144	3.0	3.750000	1.75
Turkey	3416	YRP	Confessional	2023	2.81	5	NA	1.250000	1.538462	0.200000	1.0	6.166666	5.666666	0.800000	6.785714	9.785714	1.0000000	1.250000	8.714286	9.142858	6.750000	6.307692	1.9090909	7.416666	7.416666	4.428571	5.461538	6.000000	0.6666667	6.333334	2.375	3.875000	4.500000	8.285714	9.285714	10.000000	10.000000	10.000000	7.500000	8.571428	5.428571	7.000000	7.000000	8.666667	6.833334	4.75	7.333334	2.40	7.8571429	NA	NA	NA
Switzerland	3607	EVP/PEV	Confessional	2023	1.95	2	NA	3.900000	3.800000	3.333333	4.5	5.090909	5.000000	NA	3.428571	7.250000	0.6666667	2.000000	7.900000	5.636363	4.750000	2.833333	2.4000001	7.375000	5.600000	4.111111	4.333334	5.333334	4.0000000	3.800000	6.000	3.666667	NA	4.400000	8.333333	6.800000	7.000000	8.285714	6.000000	6.600000	6.250000	6.000000	4.000000	4.500000	1.750000	2.00	2.000000	8.20	1.3333334	NA	NA	NA
Switzerland	3608	EDU/UDF	Confessional	2023	1.23	2	NA	1.833333	4.000000	1.000000	3.0	7.222222	5.666666	NA	3.571429	9.250000	0.3333333	1.000000	8.700000	8.750000	8.636364	6.888889	0.8571429	8.727273	7.714286	7.000000	3.600000	6.000000	2.6666667	7.666666	3.000	7.250000	NA	8.000000	8.750000	9.500000	8.833333	8.714286	8.750000	9.000000	7.200000	8.000000	3.666667	7.000000	3.333333	2.50	2.000000	9.20	4.0000000	NA	NA	NA

Q5

Use one of the sub-setting operations above to create a subset of the CHES data that only includes parties that received at least 10% of the vote in the last election:

Frequency tables

R’s table command will generate one- or two-dimensional frequency tables.

table(ches$family)


       Radical Right        Conservatives              Liberal 
                  48                   26                   46 
Christian-Democratic            Socialist         Radical Left 
                  17                   38                   26 
               Green          Regionalist            No family 
                  31                   22                   11 
        Confessional      Agrarian/Center 
                   7                    7

In some cases, it may be more useful to examine the percentages in each group, rather than the raw frequencies. You can use the prop.table function to generate proportions from an existing table object.

table_of_families<-table(ches$family)

prop.table(table_of_families)


       Radical Right        Conservatives              Liberal 
          0.17204301           0.09318996           0.16487455 
Christian-Democratic            Socialist         Radical Left 
          0.06093190           0.13620072           0.09318996 
               Green          Regionalist            No family 
          0.11111111           0.07885305           0.03942652 
        Confessional      Agrarian/Center 
          0.02508961           0.02508961
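If you prefer percentages to proportions, you can multiply by 100 and round. A toy sketch with an invented vector:

```r
# Hypothetical categorical variable
answers <- c("yes", "no", "yes", "yes")

counts <- table(answers)      # no: 1, yes: 3
props  <- prop.table(counts)  # no: 0.25, yes: 0.75

# Proportions sum to 1; multiply by 100 for percentages
round(100 * props, 1)
```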

Crosstabs

Adding a second categorical variable to the table function will generate a cross tab. The first variable will be presented in the rows, and the second in the columns. In this example, I’m creating a new variable called radical_right that is TRUE if a given party is part of the radical right family. Then I’m creating a cross tab that shows how many radical right parties appear in the data for each election year:

ches$radical_right <-  ches$family == "Radical Right"


table(ches$electionyear, ches$radical_right)

      
       FALSE TRUE
  2021    39    5
  2022    41   10
  2023    80   17
  2024    71   16
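Here is the same row and column logic on a toy example (both vectors are invented for illustration). The first argument defines the rows and the second defines the columns:

```r
# Hypothetical data: election year and a TRUE/FALSE flag for five parties
year <- c(2021, 2021, 2022, 2022, 2022)
flag <- c(TRUE, FALSE, FALSE, FALSE, TRUE)

tab <- table(year, flag)
tab

dim(tab)        # 2 rows (years) by 2 columns (FALSE, TRUE)
tab["2022", ]   # counts for a single year
```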

In many cases, I’m interested in percentages rather than raw frequencies. I can use the prop.table function to get proportions from a frequency table. By default, prop.table gives cell proportions, where each cell is divided by the grand total of the table:

rr_table<-table(ches$electionyear, ches$radical_right)

prop.table(rr_table)

      
            FALSE       TRUE
  2021 0.13978495 0.01792115
  2022 0.14695341 0.03584229
  2023 0.28673835 0.06093190
  2024 0.25448029 0.05734767

…But this isn’t always a useful metric. More often, we’ll want to use a cross tab to answer a question like “what percentage of people in group X have characteristic Y?” For instance: “what percentage of the parties in a given election year were from the radical right family?” To answer this question, I’ll need to calculate:

\[ \frac{\text{Number of radical right parties in the data for year } i}{\text{Total number of parties in the data for year } i} \]
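As a quick sanity check on the formula, here is the calculation done by hand for a single year, using the 2021 counts from the cross tab above (5 radical right parties and 39 others):

```r
# Counts copied from the 2021 row of the cross tab
rr_2021    <- 5    # radical_right == TRUE
other_2021 <- 39   # radical_right == FALSE

# Share of radical right parties among all parties coded for 2021
share_2021 <- rr_2021 / (rr_2021 + other_2021)
round(share_2021, 3)   # 0.114
```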

Q6

Use the table and prop.table functions to calculate the proportion of radical right parties in each country’s parliament. You’ll want to look at the help file for prop.table to figure out how to do this.

# Enter your code here

The Janitor package

There are a number of packages that (ostensibly) help you make nicer looking cross tabs that can be easily exported to a report. It’s worth exploring these to see which ones are most useful for you, but one I want to highlight is the janitor package, which allows you to use the tabyl command to create tables:

library(janitor)

t1 <- tabyl(dat=ches, var1=electionyear, var2=radical_right)

t1

electionyear	FALSE	TRUE
2021	39	5
2022	41	10
2023	80	17
2024	71	16

On its own, this isn’t much to write home about, but it provides a lot of nice options for adorning the table with additional statistics:


t1|>
  adorn_totals("row")|> # add totals
  adorn_percentages("row")|> # calculate percentages
  adorn_pct_formatting() # format percentages

electionyear	FALSE	TRUE
2021	88.6%	11.4%
2022	80.4%	19.6%
2023	82.5%	17.5%
2024	81.6%	18.4%
Total	82.8%	17.2%

R packages for presentation-ready tables

The janitor package has the advantage of having relatively simple syntax, which makes it great for exploring a data set. However, if you’re looking to include cross tabs in a report or presentation, you may want one of the R packages that can create more aesthetically pleasing tables, such as gt, kableExtra, or modelsummary.

Aggregation and grouping

We can use aggregate to apply a function to each level of a categorical variable. To do this we’ll use R’s formula syntax, which in general will look something like this:

y ~ x

The variable on the left will be the outcome measure you’re aggregating, the variable on the right will be the category you’re aggregating over. So, if we want to get the average level of corruption salience in each country, we can run this:

# x = the formula, 
# FUN= should be a function you want to run for each group
corruption_salience <- aggregate(x = corrupt_salience ~ country , 
                                 FUN = mean, 
                                 data = ches )
corruption_salience

country	corrupt_salience
Belgium	0.4583333
Denmark	2.6166666
Germany	4.2949736
Greece	5.5648148
Spain	4.7142857
France	2.6000000
Ireland	4.9541667
Italy	5.1000000
Netherlands	1.6666667
United Kingdom	5.8261905
Portugal	8.3333333
Austria	6.0400000
Finland	6.7222222
Sweden	1.8055556
Bulgaria	5.8909091
Czech Republic	5.1817198
Estonia	6.8750000
Hungary	6.6871693
Latvia	5.3833334
Lithuania	5.5749999
Poland	6.3730159
Romania	4.8809524
Slovakia	5.0462963
Slovenia	6.7708334
Croatia	6.8556547
Turkey	6.3571429
Norway	2.2777778
Switzerland	1.9880952
Iceland	4.5000000
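One detail worth knowing: when you use the formula interface, aggregate drops rows with missing values before computing anything (its default is na.action = na.omit), so a group whose values are all missing disappears from the output. A toy sketch with invented data:

```r
# Hypothetical data: group "c" has only a missing value
scores <- data.frame(
  group = c("a", "a", "b", "b", "c"),
  value = c(1, 3, 10, NA, NA)
)

group_means <- aggregate(value ~ group, data = scores, FUN = mean)
group_means
# group "a" averages 1 and 3, group "b" uses only the 10, group "c" is absent
```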

dplyr method:

There’s also a dplyr method for this that may be a little easier to parse. Here, we use group_by to declare a grouping variable, and then use summarize to summarize over the members of each group.

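library(dplyr) # if you haven't already loaded dplyr
newtable<-ches|>
  group_by(country)|>
  # na.rm = TRUE skips missing values; the aggregate() call above drops them by default
  summarize(avg_corruption_salience= mean(corrupt_salience, na.rm = TRUE))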

Going forward, we’ll mostly use the dplyr commands for this class, but knowing aggregate can be useful for interpreting other people’s code.

Q7

Use either the dplyr commands or aggregate to get each party family’s average position on redistribution issues.

# your code here....

Functions

Remember that R functions are just snippets of code that we can re-use. If I have done a calculation once, I can probably write a function to apply it to more general cases.

For instance, here’s some code that I could use to get the top parties by vote share:

# row numbers of the 10 largest vote shares, ordered from highest to lowest
top_n<-order(ches$vote, decreasing=TRUE)[1:10] 

# this subsets the values of party by the indices in top_n:

ches$party[top_n]

 [1] "PL"          "PN"          "ND"          "RN"          "AKP"        
 [6] "PiS"         "GS"          "HDZ"         "Fidesz-KDNP" "LAB"
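If order is new to you: it returns positions, not values, which is why it can be used to index a different vector. A toy sketch (both vectors are invented):

```r
votes_toy  <- c(10, 30, 20)
labels_toy <- c("A", "B", "C")

# Positions of the values from largest to smallest
idx <- order(votes_toy, decreasing = TRUE)
idx               # 2 3 1

# Use those positions to reorder another vector of the same length
labels_toy[idx]   # "B" "C" "A"
```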

I could probably make this more generalized by creating a function.

topnFunction <- function(labels, values, n){ # the arguments 
  top_n <- order(values, decreasing=TRUE)[1:n] # positions of the n largest values
  result<-labels[top_n] # the matching labels
  return(result) # the return statement
}

Now I can use this in a bunch of different scenarios. Maybe I want to get the top parties by seat share instead of vote share. In that case, all I need to do is change the values argument:

topnFunction(ches$party, ches$seat, n=10)

 [1] "LAB" "AKP" "SPD" "PiS" "CHP" "PO"  "CDU" "ND"  "RN"  "PP"

Or I can get more observations by changing n:

topnFunction(ches$party, ches$seat, n=15)

 [1] "LAB"         "AKP"         "SPD"         "PiS"         "CHP"        
 [6] "PO"          "CDU"         "ND"          "RN"          "PP"         
[11] "Fidesz-KDNP" "PSOE"        "CONS"        "FDI"         "Grunen"
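One caveat with the [1:n] approach: if n is larger than the number of observations, the extra positions come back as NA. One possible alternative is head(), which stops at the end of the vector. A toy sketch with invented vectors:

```r
labels_toy <- c("A", "B", "C")
values_toy <- c(3, 1, 2)

# Positions from largest to smallest value
ranked <- order(values_toy, decreasing = TRUE)

labels_toy[ranked[1:5]]       # "A" "C" "B" NA NA
labels_toy[head(ranked, 5)]   # "A" "C" "B"
```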

Q8

How could I modify the function above to return both the party names and the values themselves?

#...your code here