Classwork 2: Control flow and functions
Loops, functions and control flow
Part I. FPL data
The U.S. Department of Health and Human Services publishes new poverty guidelines each year. These guidelines are based on household size, and are used to determine who qualifies for federal benefits such as food stamps or Medicaid.
Here’s what the levels looked like in 2025 for the 48 contiguous states and DC:
Persons in family/household | Poverty guideline |
---|---|
1 | $15,650 |
2 | $21,150 |
3 | $26,650 |
4 | $32,150 |
5 | $37,650 |
6 | $43,150 |
7 | $48,650 |
8 | $54,150 |
9+ | add $5,500 for each additional person beyond the 8th |
We want to create a couple of R functions that will make it easy to run some FPL calculations without having to leave R, but we’ll work through that process a little bit at a time.
To save you some time, here’s code to get that 2025 data into R:
# FPL data frame
fpl <- data.frame("household_size" = c(1, 2, 3, 4, 5, 6, 7, 8),
                  "poverty_guideline" = c(15650, 21150, 26650, 32150, 37650, 43150, 48650, 54150)
)
# amount to add for each person beyond the max
add_beyond_max <- 5500
Question 1.
Write a function that takes a household size and an income level and returns TRUE if a household of that size and income is below the federal poverty level for 2025. The function should work even if the household contains more than 8 people, so you’ll probably need an “if” statement somewhere to manage how things are calculated when there are 9 or more household members.
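To make the 9+ row of the table concrete, here is the arithmetic for a 10-person household as a small R sketch. The names `max_size` and `guideline_at_max` are illustrative (they are not part of the starter code), and the dollar amounts are copied from the 2025 table. Comparing the result with an income is left to the exercise.

```r
# 2025 values copied from the table above
max_size <- 8                # largest household size listed in the table
guideline_at_max <- 54150    # guideline for 8 people
add_beyond_max <- 5500       # added for each person beyond the 8th

# Worked example: a household of 10 is 2 people beyond the 8th
household_size <- 10
extra_people <- household_size - max_size
guideline <- guideline_at_max + extra_people * add_beyond_max
guideline   # 54150 + 2 * 5500 = 65150
```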
# your code here
You can use `which` to get the index or indices where a boolean vector evaluates as TRUE. For instance:
vec <- c("A", "B", "C", "D", "A")
# boolean vector
vec == "D"
[1] FALSE FALSE FALSE  TRUE FALSE
# indices where the boolean returns true:
which(vec == "D")
[1] 4
You probably don’t need to use a loop here at all if you use a “which” statement.
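Applied to the FPL data frame, `which` finds the row for a given household size, and that index can then pull out the matching guideline. This sketch rebuilds the data frame so it runs on its own; `row_index` is just an illustrative name. Note that `which` returns an empty integer vector when nothing matches, which is exactly what happens for a household of 9 or more.

```r
fpl <- data.frame(
  household_size = c(1, 2, 3, 4, 5, 6, 7, 8),
  poverty_guideline = c(15650, 21150, 26650, 32150, 37650, 43150, 48650, 54150)
)

# Row where household_size equals 3
row_index <- which(fpl$household_size == 3)
row_index                           # 3
fpl$poverty_guideline[row_index]    # 26650

# No row matches a household of 9, so which() returns integer(0)
which(fpl$household_size == 9)
```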
Getting FPL data for specific years
HHS also provides an online service, called an API, that makes it easier for programmers to automate the process of retrieving poverty thresholds. We can get FPL data in a machine-readable format by visiting a URL with the following structure:
https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines/api/[YEAR]/[STATE]/[HOUSEHOLD_SIZE]
You’ll just replace `[YEAR]` with a specific year, replace `[STATE]` with `US` to get FPL for the lower 48 states and DC, and replace `[HOUSEHOLD_SIZE]` with the number of people living in a household.
So, visiting this link:
https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines/api/2024/US/3
… would give us the poverty level for a family of 3 in 2024 in the lower 48 states and DC.
Question 2.
Create a function that takes a year and a household size and constructs a valid URL like the one above. This will require you to do some string concatenation. The simplest way to do this will be with `paste0`, but you could also do it with `sprintf` or `glue` (from the `glue` package).
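For example, both `paste0` and base R’s `sprintf` can assemble the 2024, family-of-3 URL shown earlier from its parts. This is a sketch of the concatenation step only, not a finished function; `base_url`, `year`, and `size` are illustrative variable names.

```r
base_url <- "https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines/api/"
year <- 2024
size <- 3

# paste0 glues its arguments together with no separator
url_paste0 <- paste0(base_url, year, "/US/", size)

# sprintf fills placeholders; %d accepts whole-number values like 2024
url_sprintf <- sprintf("%s%d/US/%d", base_url, year, size)

url_paste0
identical(url_paste0, url_sprintf)   # TRUE
```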
# Your code here
Using a function to retrieve data
If you visit one of the URLs you created in the previous question, you’ll see some data that looks like this (maybe without the indentation, depending on your browser!):
{
  "data": {
    "year": "2024",
    "household_size": "3",
    "income": "25820",
    "state": "us"
  },
  "method": "GET",
  "status": 200
}
This is a data exchange format called JSON (JavaScript Object Notation) that’s common for this sort of online service. We’ll return to JSON objects in a later class, but for now it’s sufficient to know that JSON data consists of a set of `key:value` pairs, and that JSON objects often have a nested structure where a key-value pair contains more key-value pairs (think of it like a filing cabinet where a file folder might contain additional documents or even more folders).
We can read this kind of data into R directly from a URL using the `jsonlite` package:
library(jsonlite)
fpl <- fromJSON('https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines/api/2024/US/3')
# str tells us more about the structure of an object:
str(fpl)
List of 3
 $ data  :List of 4
  ..$ year          : chr "2024"
  ..$ household_size: chr "3"
  ..$ income        : chr "25820"
  ..$ state         : chr "us"
 $ method: chr "GET"
 $ status: int 200
You’ll notice that R interprets this data as a list object with 3 elements (`data`, `method`, and `status`), but the `data` element is itself another list with length 4.
We can use the `$` operator to retrieve named parts of this list, and we can chain together multiple `$` indices to dig down into those nested lists.
For this question, we’ll just need the value of `income`. We can retrieve that by writing:
fpl$data$income
[1] "25820"
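Because every value in this response arrives as a character string, it is worth converting `income` with `as.numeric` before comparing it with anything. The sketch below rebuilds the list by hand to mirror the `str(fpl)` output above, so it runs without contacting the HHS API (`response` is a made-up name), and it also shows the `[[` equivalent of the `$` chain.

```r
# Hand-built copy of the structure shown by str() above
response <- list(
  data = list(year = "2024", household_size = "3", income = "25820", state = "us"),
  method = "GET",
  status = 200L
)

response$data$income             # "25820" (character)
response[["data"]][["income"]]   # same value, using [[ ]] instead of $

income <- as.numeric(response$data$income)
income > 9000                    # TRUE: numeric comparison
response$data$income > 9000      # FALSE: compares the strings "25820" and "9000"
```

Without the conversion, R turns 9000 into the string `"9000"` and compares the two values as text, so `"2"` sorting before `"9"` makes the comparison FALSE.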
Question 3A.
Write a function that takes a year and household size and returns the poverty level (you can use the function you created for Question 2 as a jumping-off point here).
# Your code here
Question 3B.
FPL data is only available from 1983 until the current year, and household sizes must be numbers greater than zero. To prevent users from trying to retrieve invalid data, add a check to your function that ensures the values of `year` and `householdsize` are valid, and have it throw an error message if they’re not.
Hint: Use the `stop` function to make code fail with an error message. You can get the current year using a combination of `Sys.Date()` and `substr()`.
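Following that hint, one way to get the current year as a number and reject out-of-range years is sketched below. `check_year` is a hypothetical helper name; the household-size check would follow the same `if (...) stop(...)` pattern.

```r
# substr() converts the Date to text like "2025-06-01"; keep the first 4 characters
current_year <- as.numeric(substr(Sys.Date(), 1, 4))

# Hypothetical helper: stop with an informative message if year is invalid
check_year <- function(year) {
  if (!is.numeric(year) || length(year) != 1 || is.na(year) ||
      year < 1983 || year > current_year) {
    stop("year must be a single number between 1983 and ", current_year)
  }
  invisible(TRUE)
}

check_year(2000)   # passes silently
# check_year(1950) would fail with:
# Error in check_year(1950) : year must be a single number between 1983 and ...
```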
# Your code here
Question 4.
Use a loop and the function you defined in Question 3 to retrieve the FPL for families of sizes 1, 2, 3 and 4 for the years 2022, 2024, and 2025. You should have 12 different values: one for each combination of family size and year.
Setting up the for loop here might be a little tricky. One way to handle this is with a nested loop where the outer loop iterates over different family sizes and the inner loop iterates over different years (or vice versa).
Here’s an example of a nested loop that iterates over a vector of letters and over a vector of numbers. Notice that the inner loop runs multiple times for each iteration of the outer loop:
letters <- c('a', 'b')
numbers <- c(1, 2, 3)
for (i in letters) {
  for (j in numbers) {
    print(paste0(i, " and ", j))
  }
}
[1] "a and 1"
[1] "a and 2"
[1] "a and 3"
[1] "b and 1"
[1] "b and 2"
[1] "b and 3"
Alternatively, you can use `expand.grid` to create a data frame with all combinations of one or more vectors and then use a single loop that iterates over each row of the resulting data frame. Here’s an example of using `expand.grid`:
values <- expand.grid(
  letters = c('a', 'b'),
  numbers = c(1, 2, 3)
)
print(values)
  letters numbers
1       a       1
2       b       1
3       a       2
4       b       2
5       a       3
6       b       3
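A single loop over the rows of an `expand.grid` result might look like the sketch below. It indexes rows with `seq_len(nrow(values))`, builds one result per row, and stores it in a pre-allocated column (`label` is an illustrative name). Setting `stringsAsFactors = FALSE` keeps the letters as plain character strings instead of factors.

```r
values <- expand.grid(
  letters = c('a', 'b'),
  numbers = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Pre-allocate a column to hold one result per row
values$label <- NA_character_

for (i in seq_len(nrow(values))) {
  values$label[i] <- paste0(values$letters[i], " and ", values$numbers[i])
}
print(values)
```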
# Your code here
Note on sending requests
In the code above, the `fromJSON` function sends a request to the HHS website to retrieve data. Since requesting data from a remote server always carries some overhead, we want to be careful about writing code that sends lots of requests in quick succession. If we ran this function hundreds or thousands of times, there’s a good chance we would encounter significant slowdowns, or even find ourselves temporarily blocked from sending additional requests to the HHS servers.
At the very least, we want to avoid sending redundant requests. For instance, if we had 20 families that all had the same household size, we would want to avoid running `fromJSON` 20 times. Instead, we would send a single request for each unique household size, save the result to a variable within R, and then use that local copy every time we encountered a family of the same size instead of calling `fromJSON` again.
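One way to follow that advice is to request each unique household size once, store the results, and then look each family up in that local copy with `match`. In this sketch, `fake_fetch` is a hypothetical stand-in for a function that would call `fromJSON`: it only counts how many times it is called and returns the 2025 guideline from the table (which rises by $5,500 per person), so no requests are actually sent.

```r
# Hypothetical stand-in for a function that calls fromJSON();
# it counts "requests" instead of contacting the HHS server.
n_requests <- 0
fake_fetch <- function(size) {
  n_requests <<- n_requests + 1
  15650 + 5500 * (size - 1)   # 2025 guideline for sizes 1-8 (see table)
}

# Eight families, but only three distinct household sizes
sizes <- c(3, 3, 4, 3, 1, 4, 4, 3)
unique_sizes <- unique(sizes)

# One "request" per unique size, saved locally
guidelines <- numeric(length(unique_sizes))
for (i in seq_along(unique_sizes)) {
  guidelines[i] <- fake_fetch(unique_sizes[i])
}

# Reuse the local copy for every family
family_guidelines <- guidelines[match(sizes, unique_sizes)]
n_requests   # 3, not 8
family_guidelines
```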
Part II. Using a loop to calculate a Jackknife standard error
The jackknife is a method for calculating a standard error when the sampling distribution of an estimate is unknown. We know a lot about the sampling distribution of means, sums, or proportions because of the central limit theorem (CLT), but the CLT-based formulas we use for those don’t give us a standard error for measures like the median. So how would we get a confidence interval around an estimated median?
The jackknife method estimates this distribution using the variability in the sample itself. For a sample of n observations, the process creates n new data sets from the original sample, where each data set contains all but one of the original observations. The variability of the estimate across these data sets is then used to approximate its sampling variability.
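Here’s what the jackknife process for calculating a standard error looks like in pseudo-code:
1. Let n be the number of observations in X.
2. For each i from 1 to n: drop the i-th observation from X, calculate the statistic of interest (here, the median) on the remaining n - 1 observations, and store the result as v[i].
3. Calculate the standard error from the n values in v: SE = sqrt(((n - 1)/n) * sum((v - mean(v))^2)).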
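The leave-one-out procedure described above can be tried on a simpler statistic first. This sketch applies it to the sample mean rather than the median, using a small made-up vector `y`. For the mean, the jackknife standard error works out to exactly `sd(y) / sqrt(n)`, which gives an easy check that the loop is right.

```r
# Made-up sample (not the x from Question 5)
y <- c(2, 4, 4, 5, 7, 9)
n <- length(y)

# Leave-one-out estimates: v[i] is the mean of y without its i-th value
v <- numeric(n)
for (i in seq_len(n)) {
  v[i] <- mean(y[-i])
}

# Final jackknife step, as given in Hint 1 below
jack.se <- sqrt(((n - 1)/n) * sum((v - mean(v))^2))
jack.se
sd(y) / sqrt(n)   # same value for the mean
```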
Question 5.
Write code to calculate the standard error of the median of X using the jackknife method:
x <- c(7, 10, -8, -6, 1, 10, 9, 10, -1, 4, -1, 1, -6, -1, -4, 1, -2)
Hint 1: To simplify some of the coding here, I’m providing the R code for the final step of the jackknife algorithm. `n` should be the sample size, and `v` should be the vector of sample medians that you calculated in your loop.
# n is the sample size. v should be the vector of medians
jack.se <- sqrt(((n - 1)/n) * sum((v - mean(v))^2))
Hint 2: You can drop a single element from a vector using a negative index. For instance:
values <- c(3, 6, 8, 12)
# dropping the third element of values:
values[-3]
[1]  3  6 12