The New York Times wrote about how the "Big Data" industry is trying to transform education (link). This is amusing and creepy by turns.
All of these efforts may be well-intentioned, but what strikes me is how unscientific the arguments given in favor of these data-driven methods are. You'd expect the same data-driven approach to be used to justify the new solutions, but you find almost none of that.
Arizona State’s initial results look promising. Of the more than 2,000 students who took the Knewton-based remedial course this past year, 75 percent completed it, up from an average of 64 percent in recent years.
What does "completing" the course mean? Is completion the same as competence? How do we know that the students were comparable from year to year? How much do completion rates vary from year to year? Were there any changes in the admission rules or in the criteria for completing the course? Were there any changes in the content of the course?
Where is the control group? Andrew Gelman has written a number of times about experimentation in education. It would seem like companies like Knewton should take the lead in this type of evidence-gathering.
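For what it's worth, here is a minimal sketch (in Python) of the sort of evidence one could start with: a two-proportion z-test on the completion rates. The article doesn't give the size of the historical comparison group, so the 2,000 per cohort below is an assumption.

```python
from math import sqrt

# Hypothetical cohort sizes: the article says "more than 2,000" took the
# Knewton course; the historical cohort size is unknown, so 2,000 is assumed.
n1, p1 = 2000, 0.75   # Knewton-based remedial course: 75% completion
n2, p2 = 2000, 0.64   # recent-years average: 64% completion

# Two-proportion z-test using the pooled completion rate
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"z = {z:.1f}")  # about 7.6 under these assumed sample sizes
```

A z-score this large would rule out chance, but it answers none of the comparability questions above; only a controlled experiment can do that.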
Mr. Lange and his colleagues had found that by the eighth day of class they could predict, with 70 percent accuracy, whether a student would score a “C” or better.
I don't know the distribution of grades at this school (Rio Salado), but grade inflation in US colleges has generally moved most if not all grades to "C" or better, so I'd consider 70 percent accuracy in predicting "C" or above to be poor. Also, the issue is not whether one can diagnose the cases but whether there is a solution that would improve the underperforming students' grades. That depends on the reason for underperforming. There will be cases where the students deserve a "C" or worse.
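To see why, consider the base rate. The figure below is made up for illustration: if, say, 70 percent of students end up with a "C" or better, a "predictor" that simply says everyone passes is 70 percent accurate before seeing a single day of class data.

```python
# Illustrative only: the actual grade distribution at Rio Salado is unknown.
# Suppose 70% of students finish with a "C" or better (an assumed figure,
# plausible given grade inflation).
base_rate = 0.70

# A trivial rule that always predicts "C or better" needs no data at all;
# it is correct exactly base_rate of the time.
trivial_accuracy = base_rate
print(f"Do-nothing baseline accuracy: {trivial_accuracy:.0%}")  # 70%

# Any model claiming 70% accuracy by day eight must beat this baseline
# to show it has learned anything from the class data.
```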
Reading the article, I feel that much deeper thinking is needed to figure out why we would want to change education in these ways.
Change is not always good. I have been teaching a course at NYU for many years. About two or three years ago, the course evaluation form went online. It used to be that I'd set aside 15 minutes of the last class to hand out evaluation forms, leave the classroom, and designate a student to collect the forms and drop them in the mail. Now, students are reminded by email towards the end of the semester to fill out an online survey.
Not surprisingly, the number of students responding has plunged. It was almost 100% when the form was filled in during class; now it's rarely above 30%. To encourage higher response rates, the emails that go out to students (and faculty) have become more frequent, and they start earlier and earlier in the semester. The first email that opens the survey window is now sent not long after the midpoint of the course. As a result, students can comment on a class based on half or two-thirds of the experience.
The nature of responses has also changed. I now see mostly extreme opinions. The people who care to write evaluations either love you or hate you. (The irony is that all students think they deserve an A, a standard they don't apply when evaluating professors.) Students who are in the middle don't bother to give feedback.
It is absolutely true that putting the form online is more efficient, saves class time, and creates a data source for future data mining, but the quality of the data has drastically declined.
These all go back to the issue of measuring intangible things. It's very difficult to do right. See my related post here.
The following throw-away lines in a Wall Street Journal article about the "return on investment" of getting into college debt (what an idea) are the most important ones:
The report [by the College Board] also doesn't account for dropouts or extra college years. Only 56% of students who enroll in a four-year college earn a bachelor's degree within six years, according to a report last year by the Harvard Graduate School of Education...
PayScale, a Seattle data firm, examines the links between pay and variables like colleges and majors. Its analysis, which also ignores dropouts but accounts for students who take longer to complete their degrees, ...
I cut that off since I've heard enough. How can they get away with ignoring dropouts when they are assessing the return on investment of college debt?
Imagine a cohort of 10,000 students starting college on debt. By year six, which apparently is when the counting stops, 4,400 have not graduated, either because they dropped out or because they are still in school. Both groups are likely to have the lowest return on investment in the cohort. Most of the dropouts won't be getting the higher-paying jobs that go to college graduates. Those still in school are probably struggling students; even if they graduate later and are equally qualified, they would earn less because of the lost time.
Given this reality, the analyses by the College Board and by PayScale "ignore dropouts" as if they never existed. In other words, they look only at the 5,600, not the 10,000. Whatever return on investment they compute will therefore be exaggerated.
Technically, this is an example of survivorship bias. The sample being studied does not contain "non-survivors", in this case dropouts, so it doesn't generalize properly.
Also, the data are censored in the sense that the observation window is not long enough for us to know what happens to those who stay in college longer than six years. This is a common feature of such data sets; you'd want to do something about it, not just ignore it.
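A toy calculation makes the size of the bias concrete. The 56/44 split comes from the Harvard report; every dollar figure below is invented purely to show the direction of the effect.

```python
# Toy cohort of 10,000 borrowers; only the 56/44 split is from the report.
cohort = 10_000
graduates = int(0.56 * cohort)       # 5,600 finish within six years
non_graduates = cohort - graduates   # 4,400 dropouts or still enrolled (censored)

premium_grad = 20_000     # assumed annual earnings premium for graduates
premium_nongrad = 2_000   # assumed premium for dropouts / late finishers
debt_cost = 5_000         # assumed annualized cost of the debt

# "Survivors only" analysis (the College Board / PayScale approach):
roi_survivors = (premium_grad - debt_cost) / debt_cost

# Whole-cohort analysis, counting everyone who took on the debt:
avg_premium = (graduates * premium_grad + non_graduates * premium_nongrad) / cohort
roi_cohort = (avg_premium - debt_cost) / debt_cost

print(f"ROI, graduates only: {roi_survivors:.0%}")  # 300%
print(f"ROI, whole cohort:   {roi_cohort:.0%}")     # about 142%
```

Under these made-up numbers, the survivors-only figure is roughly double the whole-cohort figure. The exact values don't matter; the direction of the bias does.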
There are in fact many other problems with this type of analysis. Here's another crucial one: the counterfactual for deciding whether debt causes higher future well-being is having no debt. In other words, any such analysis must tell us what would have happened had the same students been able to complete college without incurring debt. Based on what the WSJ reporter said, I don't think this is how they framed the problem.
The LA Times (link) made the following comment as it describes the shameful situation in which the Dean of Admissions at the prestigious Claremont McKenna College (#9 on US News ranking of Liberal Arts colleges) inflated the average SAT scores of incoming students in order to manipulate national college rankings:
The collective score averages often were hyped by about 10 to 20 points in sections of the SAT tests, Gann said. That is not a large increase, considering that the maximum score for each section is 800 points.
Not a large increase? Are they willfully ignorant, or just ignorant? I hope this is not a quote from CMC President Pamela Gann but an embellishment by the reporter. To interpret whether 10 to 20 points is a "large" increase, one must find the right reference distribution of scores.
The maximum of 800 applies to individual scores, but the 10-to-20-point manipulation is of the average score of the freshman class (about 300 students). The distribution of individual scores is much, much more variable than the distribution of average scores. So while 10 or 20 points may not be material for an individual, shifting the average score by 10 or 20 points is fraud on a massive scale.
Let's take a rough guess. According to the College Board, the standard deviation of individual scores is about 110 points (see the footnote on "Recentering" on this page). This means the standard deviation of the average score of a sample of 300 is 6.4 points (this is known as the standard error). A 10-point fraud is about 1.6 standard errors; a 20-point fraud is just over 3 standard errors.
It's easier to visualize the scale of this:
Imagine the college's true SAT score average to be at "Z Score" = 0. Think of that as the median value (50th percentile). A 10-point fraud moves the average to 1.6 on the Z-score scale, and as the diagram shows, that is a move from the 50th to the 95th percentile! And according to the LA Times, that is the lower bound of the alleged manipulation.
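For the record, here is the arithmetic behind that picture, using the population-wide standard deviation of 110 (which, per the PS below, likely overstates the school-level spread):

```python
from math import sqrt
from scipy.stats import norm

sd_individual = 110   # College Board SD of individual section scores
n = 300               # approximate size of the CMC freshman class

se = sd_individual / sqrt(n)         # standard error of the class average
print(f"standard error = {se:.1f}")  # about 6.4

for bump in (10, 20):
    z = bump / se
    print(f"{bump}-point inflation: z = {z:.1f}, "
          f"the {norm.cdf(z):.1%} percentile of plausible class averages")
# 10 points: z = 1.6, about the 94th percentile
# 20 points: z = 3.1, beyond the 99.9th percentile
```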
Another way to see the size of this manipulation is to look at the average SAT scores for the top colleges (I found some data here but it's from 2004.) For instance, there is only about a 10-point spread between Columbia, Penn, Duke and Rice. Even a few points will shift the SAT score rankings.
So, after failing ethics, maybe the College is failing statistics too.
PS. [2/1/2012] @rags and I have been discussing what value of standard deviation should be used in the standard error formula. The proper value should be the standard deviation of the SAT scores of typical freshmen at CMC (or similar schools). The number I found is the standard deviation for the entire SAT test taking population so it is an over-estimate. If you find a school-level standard deviation number, please let me know and I'll adjust the computation. I don't think the conclusion would change much though given what we see in the table of average scores by college shown above.
PS. [2/4/2012] If the standard error was over-estimated, then the distribution of average scores would be even tighter than stated. This would make a 10- or 20-point manipulation even more egregious.
I just finished Emanuel Derman's new book, "Models Behaving Badly", which is a good introduction to the philosophy of statistical models. The topic has been swirling in my head after also having read this article by economist Dani Rodrik, who reflected on the recent walkout by some Harvard students of their introductory economics course.
In Rodrik's view, the students were right to protest the economics profession because the economic models being taught in the classroom are too simplistic. He paints a particularly eye-opening - and damning - scenario: in the undergrad classroom, as well as in public, the economist admits no doubts about his ideologies (such as "free trade", "free market") but in his "advanced graduate seminar on ... theory", the same professor would debate with skeptics, leading to a "heavily hedged statement" after "a long and tortured exegesis". The statement would begin with "if the long list of conditions I have just described are satisfied, .."
I could imagine Derman entering that graduate seminar and declaring everything nonsense. (Derman currently teaches in the Financial Engineering program at Columbia, and previously worked on Wall Street as a "quant" building economic models, after spending his graduate career on models of the physical world.) "Models Behaving Badly" is about how economic models go off-track, how frequently they do, and why modelers must behave modestly. Derman would argue that Rodrik's "long list of conditions" is almost never satisfied.
There is a crucial difference between the assumptions made by the Black-Scholes Model and the assumptions made by a souffle recipe. Our knowledge about the behavior of the stock markets is much sparser than our knowledge about how egg whites turn fluffy.
He goes on to argue, perhaps unexpectedly, that the Black-Scholes Model is "the best model in all of economics". He aims his criticism squarely at the sacred cow of financial economics, the "Efficient Market Hypothesis".
Rodrik does not believe that the economics profession needs better models. He claims "Macroeconomics and finance did not lack the tools needed to understand how the crisis arose and unfolded." The fault of the profession was to have trusted the wrong models (ones assuming efficient and self-correcting markets). He believes that this bad choice of models is facilitated by "excessive confidence in particular remedies - often those that best accord with their own personal ideologies."
It isn't clear to me how Rodrik proposes to resolve the ideology problem. In fact, his citation of another economist Carlos Diaz-Alejandro perfectly captures the heart of the issue: "by now [1970s] any bright graduate student, by choosing his assumptions... carefully, can produce a consistent model yielding just about any policy recommendation he favored at the start."
The disease is more than ideological. Reading between the lines, I think these models are far too complex for their own good. They cannot be falsified with observed data. They can be made to support any ideology. This leads me to two observations:
Returning to the protesting Harvard students, Rodrik describes the discontent with the undergrad economics syllabus: "it is as if introductory physics courses assumed a world without gravity, because everything becomes so much simpler that way."
In making this analogy, Rodrik is giving economic models the status of models in physics. He's saying that there are simplified models in both disciplines which don't fit reality well, but there are complex models in both disciplines which work well.
Derman would beg to differ. Originally trained as a physicist, he now freely admits that "financial modeling is not the physics of markets". He spends a great portion of the book showing why economic models can never aspire to the status of physics models.
Reading Rodrik's analogy, one senses that he has yet to arrive at Derman's position. Rodrik continues to draw parallels between physics and economics. But I know of no introductory physics course that assumes a world without gravity - the major omission is Einstein's relativity. There is, in fact, a huge difference between Newton's mechanics and, say, the Capital Asset Pricing Model. Students who learn Newton's laws can explain how the world works without knowing any relativity theory; Newton's theory stands on its own. Not so the simplistic economics models. As Derman points out, simple economics models are easily invalidated by observed data.
My own view, informed by years of building statistical models for businesses, is more sympathetic to Derman than to Rodrik. There is no way that economic (and by extension, social science) models can ever be similar to physics models. Derman draws the comparison in order to disparage economics models; I prefer to avoid the comparison entirely.
The insurmountable challenge of social science models, which constrains their effectiveness, is that the real drivers of human behavior are not measurable. What causes people to purchase goods, or vote for a particular candidate, or become obese, or trade stocks is some combination of desire, impulse, guilt, greed, gullibility, inattention, curiosity, etc. We can't measure any of those quantities accurately.
What modelers can measure are things like age, income, education, past purchases, objects owned, and so on. Nowadays, we can log every keystroke you type on your smartphone (link). That models are even half-accurate is due to the correlation between these measured quantities and the hidden drivers of our behavior, but this correlation is only partial.
Now add to that the vagaries of human behavior.
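A small simulation shows the ceiling this places on any model. The correlation of 0.5 between the measurable proxy and the hidden driver is an arbitrary choice, there only to illustrate the cap:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# The hidden driver (desire, impulse, greed...) that truly determines behavior.
driver = rng.standard_normal(n)
behavior = driver + 0.1 * rng.standard_normal(n)  # behavior closely tracks the driver

# What we can measure (age, income, keystrokes...) correlates only partially
# with the driver; rho = 0.5 is an assumed value for illustration.
rho = 0.5
proxy = rho * driver + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# The best linear prediction of behavior from the proxy has R^2 capped near rho^2.
r = np.corrcoef(proxy, behavior)[0, 1]
print(f"R^2 from the proxy: {r**2:.2f}  (cap is roughly rho^2 = {rho**2:.2f})")
```

No amount of modeling skill recovers the variance that the proxies never carried in the first place.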
The following articles discuss the behind-the-scenes process of preparing data for analysis. They point to the "garbage in, garbage out" problem. One should always be aware of the potential hazards.
"The murky world of student-loan statistics", Felix Salmon (link)
At the end of his post, Felix finds it remarkable that the government doesn't have better access to the data. The same sentiment was expressed at a recent presentation by the data team at Bundle.com, who described the strenuous process by which they matched the names of merchants on credit card statements to a database of known merchants. One would think the credit card companies could pass along merchant identifiers, but they don't or can't.
"European debt: the big picture", Simon Johnson (link)
Simon points out that while the New York Times did a fantastic job with this visualization of the European debt linkages, one should notice what isn't on the chart: the murky world of derivatives. Not knowing those linkages denies us knowledge of the exposure of U.S. banks to this potentially devastating problem.
Krugman points to this plea from Robert Samuelson to save the U.S. Statistical Abstract. Under pressure from Congress to "save money", the Census Bureau will disband the small team that assembles publications on the statistics of the United States. Apparently, this move cuts 24 jobs and saves $2.9 million annually.
This development is disturbing in many ways.
The abstract of my talk is:
Recently, pop-stats books have captured the public's favor, overcoming the negative perception of the subject of statistics. The best known examples include the Malcolm Gladwell series; the Freakonomics franchise; Ian Ayres's Super Crunchers; and the speaker's Numbers Rule Your World. Readers find these books highly accessible as each author finds a way to balance readability and rigor. What can educators learn from this publishing phenomenon? What is the role of pop-stats books in statistics courses? (Partly based on joint work with Andrew Gelman)
A popular argument is circulating out there. You may have heard this one:
College graduates have a lower unemployment rate than non-college graduates; therefore, if there are more college grads, there will be more jobs and fewer unemployed.
This argument has been made by many, including Bernanke as I discussed here.
Unfortunately, this is a case of mistaking correlation for causation. Producing more college graduates will certainly produce more indebted young people. It will not by itself create new jobs, unless you're talking about university teaching jobs in certain departments. The argument would make sense if there were an under-supply of college grads today, but reports tell us that the unemployment rate among newly-minted college graduates is about 50%.
Warning: I now provide an unsolicited reply to Michael Kogan, who left the following comment on a blog by Paul Krugman:
Can you elaborate on why the structural unemployment story is false? The link you posted assumes the rate of "progress" is constant. But why should it be so? Many new industries today require half-way literate computer-savvy people. The unemployment among people with Master's degrees is very low - yes it is double what it used to be during Clinton years but it is still very low. And if you ask a more specific question: What is the unemployment rate of people with M.S. in Electrical, Computer or Mechanical Engineering from a top 100 University during Clinton years versus today - I bet you will come up with a very similar number - zero or close to it
This seems like a reasonable question, and I'm sure Michael is not the only one wanting answers.
To paraphrase his question: does the fact that people with Ivy-league technical degrees have close to zero unemployment rate (assuming he is right) prove that we have a "structural unemployment" problem?
In my book, I caution against carelessly throwing around labels (age groups, racial groups, etc.). This is a prime example.
The fallacy in this type of argument can be seen if we use different labels. Substitute "Ivy elite with technical degrees" with "white folks"; substitute "non-elites" with "colored folks". The fact is still true: white folks have a lower unemployment rate than colored folks. Now, are we going to say the key to solving unemployment is to make colored people white?
Pointing to the difference in unemployment rates between college grads and non-college grads and concluding that we should produce more college grads is not logically different from wanting to make colored people white. One is impossible; so is the other.
Here is another view of why the math can't work out. The belief is that we can take people from the bucket labeled "A LOT OF PEOPLE" and move them to the bucket labeled "A tiny Elite of people..."; if we do so, these people will magically find jobs. The more people we shift over, the more people will find jobs!
However, employment status is determined by many factors, of which the supply of college degrees is but one. There is a small, fixed number of Ivy League schools, and an even smaller number of technical degrees granted by them each year. The solution looks nice on paper, but it is not practical. It also ignores the fact that many of those who aren't going to college probably chose not to go. Just ask Bill Gates.
As I said before, producing more college grads will have two predictable outcomes: lower average salary for all college graduates, and more young people burdened with school loans. I think these same economists will label these "unintended consequences". Should we be implementing policies that have predictable "unintended" bad outcomes?
The following recent news items interest me because they relate back to previous posts on this blog.
The NYT reports, in an article titled "Many with New College Degree Find Job Market Humbling", that "Employment rates for new college graduates have fallen sharply in the last two years, as have starting salaries for those who can find work. What’s more, only half of the jobs landed by these new graduates even require a college degree."
This confirms that our economic policy makers have lost touch with reality. It wasn't long ago that Ben Bernanke put his weight behind the argument that the key to reducing our current unemployment crisis is more college education. I discussed this issue here, as well as here. Incidentally, the NYT also found data to support my other point about pushing youngsters into debt: in this article, they noted that for the first time ever, student debt outpaced credit card debt.
Further down the article on the dire job market for college grads, it says, "Among the members of the class of 2010, just 56 percent had held at least one job by this spring, when the survey was conducted." So a rough estimate of the unemployment rate among new graduates is 44 percent! That's why, in this post, I pointed out the fallacy of using the overall unemployment rate for college grads (5%) to talk about new college grads. This is almost a ten-fold error.
The NYT reporter didn't get the memo: the chart used to illustrate the story refers to the overall unemployment rate for college grads, a number that includes people who graduated from college 20 to 30 years ago and are in mid-career.
The Dealbook section of the NYT reports that the Treasury is winding down bailouts, and "trying not to lose money". To believe that it is possible to "not lose money" is to believe in fairies. In this post, I consider why banks can never repay taxpayers in full. To recap:
As the irrepressible Dean Baker constantly reminds us, our nation suffered a mammoth housing bubble, which wiped out $8 trillion of "homeowner" equity when it burst. This $8 trillion is not Monopoly money; one can't wish it away. If the government bailed out the banks at 100 cents on the dollar, and the government also "broke even", who suffered the loss?
Dealbook is written by business journalists who should know a thing or two about running businesses. When banks move in to rescue corporations on the verge of bankruptcy, they make loans at extremely high interest rates (say, 15%) and demand upfront payment of deal fees (and, I'm sure, many other onerous terms). On the day the bailout was negotiated, the government had already lost a huge amount of money by setting low interest rates and not collecting upfront "deal fees".
While such interest and fees may appear unseemly, almost like extortion at the weakest hour, they are compensation for the rescuer taking enormous risks. As I pointed out before, if one waits until the risks pass and it turns out these companies (banks) survive, then in hindsight it seems as if forgoing just compensation for taking the risk were fine. But that is hindsight; the lender could have lost everything had the bankrupt companies failed to recover. No one with any business sense would enter such an arrangement without proper compensation.
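To put a number on it, here is a minimal sketch of pricing such a rescue loan, with a default probability invented for illustration:

```python
# Hypothetical rescue loan: what rate does the lender need just to break even
# if the borrower might fail entirely? The default probability is assumed.
p_default = 0.25   # assumed chance the rescued firm repays nothing
risk_free = 0.03   # assumed return on a safe alternative

# Break-even condition: (1 - p_default) * (1 + r) = 1 + risk_free
r = (1 + risk_free) / (1 - p_default) - 1
print(f"Break-even lending rate: {r:.0%}")  # about 37%

# Against this benchmark, 15% plus deal fees looks less like extortion and
# more like modest compensation for the risk taken.
```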