One of the persistent delusions among people who run numbers is that they produce "objective" data. This delusion is frequently delivered in a self-righteous manner, directed at anyone who advocates adjusting the raw data. No no no, they say, you can't change the data because you'd be injecting subjectivity into the sacred raw data!
They claim that any adjustment is like injecting additives into pure, wholesome, raw foods.
In fact, all data are subjective. Even if the data quality is perfect, all data are still subjective because measurement itself involves choices. This is one of the running themes in my second book Numbersense (link). The first chapter addresses how school administrators manipulate formulas to gain an edge in school rankings. The formulas are fixed, but an endless number of tactics delivers data that yield different values for the same underlying condition.
The latest demonstration of this malfeasance is trending news due to an investigation by a Columbia math professor (link). US News has now suspended Columbia from this year's rankings. What's shocking is their "one bad apple" framing. As documented in Chapter 1 of Numbersense (link), fraudulent data collection pervades rankings at all levels of education: in every jurisdiction that matters enough to merit investigation, fraud has been revealed.
One of my favorite examples of fraudulent data collection from that chapter also happens to be an example of a "professional foul", i.e. a foul that is obvious to any outsider but to which all the insiders who partake in and benefit from it remain deliberately oblivious.
This amusing example is the pushing of the "Common App", which lowers the barrier of application for any college applicant. Whereas in the past each college had its own idiosyncratic application, post Common App it takes about the same time to apply to 15 schools as to one.
What the emergence of the Common App does is "raise all boats": every participating school looks more selective. The acceptance rate (proportion of applicants admitted) falls at all schools that join the Common App because the denominator (number of applicants) is jacked up by the Common App. Even if schools are increasing their class sizes, enrollment does not grow anywhere near as fast as the Common-App-fueled growth in applicants.
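To make the arithmetic concrete, here is a minimal Python sketch with entirely hypothetical numbers: the admitted class doesn't change at all, yet the reported rate drops because the applicant pool (the denominator) swells.

```python
# Hypothetical numbers only: the admitted count stays flat, but the Common App
# inflates the applicant pool, so the acceptance rate falls and the school
# appears more selective.

def acceptance_rate(admitted: int, applicants: int) -> float:
    return admitted / applicants

before = acceptance_rate(2_000, 10_000)  # pre-Common App: 20%
after = acceptance_rate(2_000, 25_000)   # post-Common App: 8%

print(f"Acceptance rate before: {before:.0%}")
print(f"Acceptance rate after:  {after:.0%}")
```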
So - is the selectivity rate of colleges an objective data point? In my view, it is a subjective metric, easily manipulated, just like every other metric.
***
This post is inspired by a statement I read about Wework - the controversial startup that rents office space without long-term contracts. (Disclosure: I'm a fan of Wework. I think it markets a product that delivers clear value to users without doing harm, unlike many startups that either deliver zero value or hide the harm done behind "convenience" and other positives. Wework's problem has always been figuring out a business model.)
The CEO claims that Wework's occupancy rate in the last quarter matched its pre-pandemic level, defined as late 2019.
Occupancy rate sounds like one of those pure, raw metrics that have an official definition, so an analyst can report an objective value for it, beyond scrutiny.
Far from it. Let me point out a couple of ways to dress up this number. I'm not accusing anyone of doing anything. The following is merely a demonstration of possibility - as I disclosed above, I hope Wework finds a business model.
Like selectivity rate, occupancy rate is a fraction: the proportion of available office space that has a tenant. As with selectivity rate, let's focus on the denominator: the amount of available office space.
The occupancy rate may be 72% in both Q4 of 2019 and Q1 of 2022, while at the same time the number of tenants is lower in 2022 than in 2019. It all depends on the number of offices available. Since the pandemic, Wework has closed locations - incidentally, any business should want to get rid of its lowest performers, so this statement isn't about gaming numbers; I'm pointing out that a metric attaining the same value does not carry the same meaning.
Nevertheless, managers can choose to game the metric. If you get rid of the bottom 20% of locations instead of the bottom 10%, you'd push the occupancy rate even higher.
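Here is a hedged Python sketch, with entirely made-up numbers, illustrating both points: the same 72% headline can mask a shrinking tenant base, and culling a larger share of the worst-performing locations mechanically lifts the portfolio-wide rate.

```python
# Made-up numbers only. Part 1: the same 72% occupancy rate describes two very
# different businesses, because the denominator (available offices) shrank.

def occupancy_rate(occupied: int, available: int) -> float:
    return occupied / available

q4_2019 = occupancy_rate(7_200, 10_000)  # 72%, with 7,200 tenants
q1_2022 = occupancy_rate(5_760, 8_000)   # also 72%, but only 5,760 tenants
print(f"Q4 2019: {q4_2019:.0%}   Q1 2022: {q1_2022:.0%}")

# Part 2: closing more of the worst locations pushes the portfolio-wide rate
# higher. Each tuple is (occupied, available) for a hypothetical location,
# listed from worst to best occupancy.
locations = [(20, 100), (35, 100), (50, 100), (60, 100), (70, 100),
             (80, 100), (85, 100), (90, 100), (95, 100), (98, 100)]

def portfolio_rate(locs: list[tuple[int, int]]) -> float:
    return sum(o for o, _ in locs) / sum(a for _, a in locs)

print(f"Keep all locations:  {portfolio_rate(locations):.1%}")
print(f"Drop the bottom 10%: {portfolio_rate(locations[1:]):.1%}")
print(f"Drop the bottom 20%: {portfolio_rate(locations[2:]):.1%}")
```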
Now, let's take a tour of one of these buildings. You'll notice that some of the offices are being used for storage - chairs, desks, etc. stacked up in them. If I were the accountant, I'd remove these offices from "available" space.
There are also offices that have been converted to common areas for users who are just there for the day. An accountant has at least two options: a) remove these offices from available space, or b) keep them in available space and count them as "occupied". To legitimize option b), the accountant can also stipulate that such an office is counted as "occupied" so long as there is at least one user working in that space for x% of the days of the month.
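A quick Python illustration, with a hypothetical 100-office building and made-up category counts, shows how far these accounting choices can move the reported rate for the exact same building.

```python
# Hypothetical building: 60 offices have tenants, 10 hold stacked furniture
# (storage), 5 were converted to drop-in common areas, 25 are vacant.
TENANTED, STORAGE, COMMON, VACANT = 60, 10, 5, 25

def rate(occupied: int, available: int) -> float:
    return occupied / available

# Count every office as available; only tenanted offices count as occupied.
naive = rate(TENANTED, TENANTED + STORAGE + COMMON + VACANT)

# Subtract the storage offices from "available" space.
drop_storage = rate(TENANTED, TENANTED + COMMON + VACANT)

# Option a): also remove the converted common areas from "available".
option_a = rate(TENANTED, TENANTED + VACANT)

# Option b): keep common areas in "available" and count them as "occupied"
# (e.g., because drop-in users worked there often enough in the month).
option_b = rate(TENANTED + COMMON, TENANTED + COMMON + VACANT)

for label, value in [("count everything", naive), ("drop storage", drop_storage),
                     ("option a", option_a), ("option b", option_b)]:
    print(f"{label:>16}: {value:.1%}")
```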
***
If you came here because you've read Numbersense (link), those last two paragraphs might remind you of Chapter 6, in which I discuss the official unemployment rate. That chapter also portrays the subjective nature of data. Unemployment levels depend on how the government defines employed and unemployed. In the book, you learn that someone may have worked for only one hour in a month, and the US government counts him/her as "employed". In addition, the government removes people from the calculation altogether, classifying them as "not in the labor force", just like the accountant who subtracts offices used for storage as "not available".
tl;dr: If you don't know how something is measured, you don't know anything about the data.