In the previous post, I explored how the U.S. government counts the homeless population. We learned that they don't try to count everyone year-round but instead estimate the count on a single day of the year. But they don't really count every homeless person everywhere on that one day, either. In this post, I look into how they handle the geographic dimension of sampling.
Each jurisdiction (dubbed a Continuum of Care, or CoC, in the HUD Guide) divides its territory into sub-regions. The sub-regions are labeled high- or low-density based on the anticipated number of homeless people. The sub-regions are then "sampled": not all of them are counted, and the numbers for those without actual counts are estimated. All high-density sub-regions are chosen, plus a selection of low-density ones. This sampling scheme makes a lot of sense, as it expends the most effort in the places where most homeless people reside.
Elsewhere, the HUD Guide discloses that some regions are completely excluded, e.g. suburban sub-regions that have not reported any homeless people for a few years might be removed in subsequent years. Thus, there are really three sampling “strata”: high-density sub-regions are 100% sampled, zero-density sub-regions are 0% sampled, and the low-density sub-regions are partially sampled (the counts in those low-density areas that are not sampled will be subsequently estimated).
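To make the three strata concrete, here is a minimal sketch of how such a sample could be turned into a total. The Guide's actual estimation formula isn't described here, so the estimator below is an assumption: each unsampled low-density sub-region is imputed at the average of the sampled low-density ones (a simple expansion estimator). The function name `estimate_total` and its parameters are hypothetical, and `count_fn` stands in for the fieldwork.

```python
import random


def estimate_total(subregions, low_sample_size, count_fn, seed=0):
    """Hypothetical three-stratum estimate of a jurisdiction's homeless count.

    subregions: list of (name, stratum) pairs, stratum in {"high", "low", "zero"}.
    count_fn: returns the observed count for a visited sub-region.
    Assumption: unsampled low-density areas are imputed at the sampled low mean.
    """
    rng = random.Random(seed)
    high = [name for name, stratum in subregions if stratum == "high"]
    low = [name for name, stratum in subregions if stratum == "low"]
    # "zero" sub-regions are never visited and contribute nothing.

    # High-density stratum: every sub-region is counted (100% sampled).
    high_total = sum(count_fn(name) for name in high)

    # Low-density stratum: count a random subset...
    sampled_low = rng.sample(low, min(low_sample_size, len(low)))
    low_observed = sum(count_fn(name) for name in sampled_low)
    low_mean = low_observed / len(sampled_low) if sampled_low else 0.0

    # ...and estimate the sub-regions that were not visited.
    unsampled = len(low) - len(sampled_low)
    low_estimated = low_mean * unsampled

    return high_total + low_observed + low_estimated
```

Note that the zero-density sub-regions never reach `count_fn` at all, which is the labor saving (and the risk) of excluding them.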
Unfortunately, the authors of the HUD Guide call the above sampling scheme a "Random Sample of Areas". On page 12, for example, the Guide misrepresents the sampling scheme as:
An example of a random geographic sample is a CoC [what I’ve been calling a jurisdiction] that selects some geographic areas (e.g. according to Census tracts or, city blocks) to represent the entire CoC geography. The data collected from the homeless people counted and interviewed in these regions would then be adjusted to represent the overall homeless population.
A reader who skips the details of the sampling scheme would be misled into believing that every sub-region has an equal chance of being selected. That is neither an appropriate way to sample these sub-regions nor the sampling scheme the Guide recommends later on.
Now, your numbersense might have detected something I slipped past you a few paragraphs ago... how do they know which sub-regions have high or low numbers of homeless people before they start counting? It looks like a chicken-and-egg problem. And it is one: to reduce the workload, we want to send more workers to high-density areas, but if we know nothing about the homeless population, how can we designate some sub-regions as high-density?
The truth is that any statistical procedure involves some guesswork. In this case, because HUD isn't dealing with homelessness for the first time in 2024, they can use historical data as a guide; in addition, experts at organizations that work daily with homeless people have valuable insights. Imagine a sub-region that has experienced a recent surge in homelessness. The historical data would be misleading in this case, but the expert knowledge would likely be informative.
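One way to picture how historical data and expert knowledge might be combined is the small rule below. It is purely illustrative: the function `classify_subregion`, the `threshold` for "high", and the three-year window for "zero" are hypothetical choices, not rules taken from the Guide. The key design choice is that an expert's label, when given, overrides the historical data, which is exactly what you'd want for the sub-region with a recent surge.

```python
def classify_subregion(history, threshold, expert_label=None, zero_years=3):
    """Hypothetical density label for one sub-region.

    history: past annual counts, oldest first.
    threshold: assumed count at or above which a sub-region is "high".
    expert_label: optional override from people who work in the area.
    zero_years: assumed number of consecutive zero years before exclusion.
    """
    # Expert knowledge wins, e.g. when a recent surge isn't in the data yet.
    if expert_label is not None:
        return expert_label
    # No homeless reported for several years: drop from the count ("zero").
    recent = history[-zero_years:]
    if len(recent) == zero_years and all(c == 0 for c in recent):
        return "zero"
    # Otherwise fall back on the most recent historical count.
    latest = history[-1] if history else 0
    return "high" if latest >= threshold else "low"
```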
***
Now, consider what happens inside a sub-region. We ask the same question as before: do we just walk around the entire area and count (and survey) every homeless person we encounter? In the HUD Guide, this method of counting is called a "complete coverage count".
Most large jurisdictions probably use what the Guide calls the "known locations count" method. This involves compiling a list of all the locations where homeless people tend to stay overnight, and visiting only those locations. This labor-saving tactic, just like the sampling of sub-regions, carries the risk of under-counting. It also requires working with experts to identify all the spaces where homeless people stay.
The Guide also allows something called a "service-based" count. This is another fascinating concept: staff visit social service centers, such as food banks, that homeless people might visit during the day. This count differs materially from the other counts in two ways: the counting happens during the day rather than at night, and it does not happen on the night of the count but during a seven-day window that starts after that night.
This "service-based" count is a remedial count; it is used to correct the initial night count. The assumption is that some homeless people will be missed on the night of the count but might show up at social service centers during the day. Deduplication is really important here: the same person must not be counted twice. This is why the service-based count doesn't commence until the main count is finished.
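The deduplication step can be sketched as a set lookup: anyone seen at a service center is added only if they don't match someone already recorded. The identifying fields here (`initials`, `birth_year`, `gender`) are hypothetical stand-ins; the actual survey fields and matching rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical identifying fields; the real survey may record others.
    initials: str
    birth_year: int
    gender: str

    def key(self):
        # Normalize case so "ab"/"AB" or "M"/"m" are treated as the same person.
        return (self.initials.upper(), self.birth_year, self.gender.lower())


def combined_count(night_count, service_count):
    """Merge the night count with the later service-based count.

    Returns (total unique people, number added by the service-based count).
    """
    seen = {p.key() for p in night_count}
    added = 0
    for p in service_count:
        k = p.key()
        if k not in seen:  # only people missed on the night are added
            seen.add(k)
            added += 1
    return len(seen), added
```

Matching on a few coarse fields is a trade-off: two different people who share the same key would be merged (an under-count), while a typo in one record would let the same person through twice (an over-count).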
***
If you still think counting people is easy, I'll leave you with one further complication.
Because unsheltered homeless people usually aren't found at their sleeping locations during the day, the counting takes place on the designated night, and the counting window spans midnight. So, if left to our own devices, some of us would write down one date while the rest of us would write down the next date as the day of the count.
The Guide explicitly stipulates that every worker should use the date of the start of the counting window as the date of the count.
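The rule is simple enough to state in a few lines of code: whatever the clock says when a person is counted, the recorded date is the date on which the window opened. The function name `count_date` and the example window below (8 PM to 4 AM) are hypothetical.

```python
from datetime import date, datetime, timedelta


def count_date(observed_at: datetime, window_start: datetime,
               window_length: timedelta) -> date:
    """Return the date to record for an observation made during the count.

    Per the Guide's rule, this is always the date the counting window starts,
    even if the observation happens after midnight.
    """
    window_end = window_start + window_length
    if not (window_start <= observed_at < window_end):
        raise ValueError("observation falls outside the counting window")
    return window_start.date()
```

For a hypothetical window opening at 8 PM on January 24 and lasting eight hours, a person counted at 1:30 AM on January 25 is still recorded under January 24.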
***
I was expecting the whole exercise of counting the homeless to be a mess, but what I found instead was well-written, quite comprehensive documentation of the methodology HUD workers use to measure the homeless problem.