Since the pandemic, homelessness has become more visible in some major cities. The U.S. government just reported an 18% increase in the homeless population in 2024. This news raises an intriguing statistical question: how do they count the homeless?
Do they conduct a poll with some kind of random digit dialing, and compute the proportion of respondents who admit they are homeless? (Just kidding…)
HUD (Department of Housing and Urban Development) reports the homeless statistic as a count: 772K people are flagged as homeless, which is 18% higher than the count in 2023. How do they know this? Do they go around and find every homeless person?
A moment’s thought dispels that as impractical, well-nigh impossible. To start with, the number of homeless people changes from moment to moment. Is the 772K a snapshot or some kind of average? According to the Bloomberg article, it’s an “annual point-in-time estimate.” So, the bean-counters conduct a homeless tally once a year, on some specific date.
Even that should be impractical. Do they run around every neighborhood in all 50 states to find every homeless person hanging out on the streets that day? To learn more, I found my way to this HUD guide (link).
Reading this guide is like peeling an onion. Each answer leads to more questions but, to its credit, the guide covers most of what we need in order to understand the count.
First, it provides a clear definition of “homeless”. Only people who meet the inclusion criteria are counted.
A homeless person can be “sheltered” or “unsheltered”. The sheltered homeless are those who have checked into shelters temporarily, while the unsheltered are those sleeping outside, at train stations, etc.
Notably, if a community places homeless people in permanent housing, those people are no longer regarded as homeless. So hypothetically, the homeless problem could be “solved” by giving away free housing, i.e. by affluence. (Before you jump all over this, I put “solved” in quotes for a reason.)
The sheltered homeless are easier to count, as most organizations that offer temporary shelter to the homeless receive public funding, and thus must satisfy reporting requirements. To obtain the sizes of these populations, it is possible to just extract data from a database. There are still potential problems, such as incomplete or inaccurate reporting.
The unsheltered homeless are counted in a more labor-intensive way: workers approach them to conduct a survey. The survey isn’t strictly necessary if all we want is a count, but since the government also wants to study different subgroups of the homeless (e.g. homeless youth), the workers attempt to interview the homeless people they encounter. The guide also discloses that each interviewee receives “incentives” to participate, which is not shocking as many businesses also pay people to fill out questionnaires.
***
Having reduced the problem to counting the homeless who are sleeping outside, we’re still stuck with the prospect of running around whole regions to find every such person! This is still impractical for any sizable jurisdiction. We therefore need statistics. Counting everyone is called a “census”. Instead of counting everyone, we use statistical sampling to arrive at an estimate of the census tally.
Sampling means we don’t try to count everyone. Instead, we count some of the population, and then use math to extrapolate. How does HUD decide what to count?
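To make “count some, then extrapolate” concrete, here is a minimal sketch. It is not HUD’s procedure. I’m assuming a region carved into equal-sized areas and a simple random sample of those areas; the function name `estimate_total`, the 1,000 areas, and all the counts are invented for illustration.

```python
import random


def estimate_total(sampled_counts, n_areas_total):
    """Scale the average count per sampled area up to all areas."""
    if not sampled_counts:
        raise ValueError("need at least one sampled area")
    mean_per_area = sum(sampled_counts) / len(sampled_counts)
    return mean_per_area * n_areas_total


# Made-up "truth": 1,000 areas, most with nobody, a few with several people.
rng = random.Random(2024)
true_counts = [rng.choice([0, 0, 0, 0, 1, 2, 5]) for _ in range(1000)]

# Workers visit only 100 randomly chosen areas and count whoever is there.
visited = rng.sample(range(len(true_counts)), k=100)
sampled_counts = [true_counts[i] for i in visited]

estimate = estimate_total(sampled_counts, len(true_counts))
print("estimate from 100 areas:", round(estimate))
print("actual total (unknown in real life):", sum(true_counts))
```

The estimate will differ from the actual total by some sampling error, which is the price of visiting a tenth of the areas. The arithmetic is the easy part; deciding what goes into the sample is where the real choices are made.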
This topic is unexpectedly complicated. The first sampling dimension is time. We want to describe the extent of homelessness for the whole year of 2024 but we certainly won’t be counting people every day of the year. As mentioned above, HUD only counts the unsheltered homeless on a single day of 2024. Naturally, we should wonder why that single day (in late January) is selected as representative.
The HUD guide answers this question directly. It turns out they make no claims that the snapshot tally is “representative”. Even if it’s not the elusive number we’re chasing after, this estimate is still useful: if homelessness has worsened generally across the U.S., however you want to define it, we’d expect that deterioration to be reflected in the single-day tally as well.
Further, HUD makes this amusing claim: they picked the last week of January because they expect to find the smallest unsheltered homeless population during that time of the year. Is this a brazen attempt to under-report the total? Here’s how they explain it: in many regions, the cold winter weather drives the homeless to check into shelters rather than sleep outside. That’s why they expect the relative proportion of unsheltered homeless to reach a seasonal low. Because HUD staff believe that counting the sheltered homeless is easier and less prone to inaccuracy than counting the unsheltered homeless, they conduct the counting exercise in January each year.
The other benefit is labor saving. Counting the unsheltered homeless is labor intensive.
By restricting the counting to a single day, we have cut the workload by a factor of several hundred (one day instead of 365). But any labor saving comes with a cost: if someone wants to extrapolate this snapshot to the entire year, much more work is involved.
[A further nuance: it’s too hard to coordinate all jurisdictions to count on a single night, or for larger communities, to finish counting on one night. In practice, there is a week during which the counting takes place. Moving from one night to multiple nights creates the risk of duplicate counting as the homeless population is fluid. HUD spends quite a bit of energy to make sure the same person isn’t counted multiple times.]
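Here is a rough sketch of what guarding against duplicates could look like. The post doesn’t describe HUD’s actual mechanism, so everything here is an assumption: I’m supposing each interview yields a crude identifier (initials plus birth year), and the record layout and the function `count_unique_people` are hypothetical.

```python
def count_unique_people(records):
    """Count distinct people, treating identical identifiers as one person."""
    seen = set()
    for record in records:
        # Hypothetical identifier: initials (case-insensitive) plus birth year.
        key = (record["initials"].upper(), record["birth_year"])
        seen.add(key)
    return len(seen)


# Invented interviews collected over two nights of the counting week.
records = [
    {"night": 1, "initials": "JD", "birth_year": 1980},
    {"night": 2, "initials": "jd", "birth_year": 1980},  # likely the same person
    {"night": 2, "initials": "MK", "birth_year": 1975},
]

print("interviews:", len(records))
print("unique people:", count_unique_people(records))
```

A crude key like this cuts both ways: it merges repeat encounters, but it would also merge two different people who happen to share initials and a birth year, and it misses anyone who declines the interview.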
The task at hand is still too daunting. Just imagine having to run around every corner of Manhattan to count all homeless people on a single night. Thus, the task is further condensed by sampling along a second dimension: geography. I’ll discuss this topic in the next post.