Two cautionary tales appeared in the press recently, serving notice to all "data scientists" (as statisticians are fancifully called these days). It's hard work to earn the status of a "science".
Via the New York Times comes the story of Dr. Robert Spitzer (link). As a young psychiatrist in the 1970s, he successfully pushed the profession to narrow the definition of homosexuality as a disorder. He observed wryly that many gay people are happy, and therefore only those who are depressed should be diagnosed.
In 2001, he presented new findings claiming to show that homosexuality could be "cured" by reparative therapy. This was his method:
He recruited 200 men and women, from the centers that were performing the therapy, including Exodus International, based in Florida, and Narth. He interviewed each in depth over the phone, asking about their sexual urges, feelings and behaviors before and after having the therapy, rating the answers on a scale.
He then compared the scores on this questionnaire, before and after therapy. “The majority of participants gave reports of change from a predominantly or exclusively homosexual orientation before therapy to a predominantly or exclusively heterosexual orientation in the past year,” his paper concluded.
He strenuously defended the study for years after it was published in his friend's journal without going through the typical peer-review process. (The article was published with commentaries by peers, which according to the NYT were "merciless".)
At 80, he is coming forward to apologize and retract the study. Bravo to him for doing this. But one wonders how the industry of science failed to expose this failing much sooner. Is it because of the stature of the researcher? Is it conformity? Is it because he circumvented the usual peer-review process? ...
The reporter said the biggest problem with the study was self-interested subjects lying about sensitive issues like these. Actually, no. The biggest problem is the absence of a control group: gay men and women who did not receive such therapy. It boggles my mind that a study done in 2001 would have only cases and no controls. The case-control methodology has been in use since the 1950s and 1960s.
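To see why the missing control group matters, here is a toy simulation (entirely hypothetical numbers, not Spitzer's data). Suppose the "therapy" has no true effect, but subjects are recruited precisely because their "before" self-reports are extreme. Regression to the mean then guarantees that a second, equally noisy self-report will look like improvement. Only a comparison group reveals that the "change" happens without any therapy at all:

```python
import random

random.seed(42)

def recruit(n, threshold=9.0):
    """Recruit subjects whose noisy 'before' self-report exceeds a
    threshold (only the most distressed volunteer). The therapy has
    no true effect, so 'after' is just a second noisy report of the
    same unchanged underlying score."""
    subjects = []
    while len(subjects) < n:
        true_score = random.gauss(8, 1)           # stable underlying score
        before = true_score + random.gauss(0, 1)  # noisy self-report
        if before > threshold:                    # selection on extremes
            after = true_score + random.gauss(0, 1)  # no real change
            subjects.append((before, after))
    return subjects

def mean_change(group):
    return sum(before - after for before, after in group) / len(group)

therapy = recruit(200)
control = recruit(200)  # the comparison group the study lacked

naive = mean_change(therapy)            # looks like real improvement
adjusted = naive - mean_change(control) # difference-in-differences, near zero
print(f"naive effect: {naive:.2f}, controlled effect: {adjusted:.2f}")
```

With controls, the apparent effect nearly vanishes; without them, a before-and-after questionnaire will "find" change even when nothing changed. (And this is before adding any deliberate misreporting, which would only inflate the naive estimate further.)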
If you think that was bad, hold your nose before you read this Wall Street Journal article about cancer studies (link).
Here is a sample of the stinky sentences (my italics in all cases):
After publishing a paper on a rare head-and-neck cancer, [Dr. Mandic] learned the cells he had been studying were instead cervical cancer...
Dr. Mandic entered a largely secret fellowship of scientists whose work has been undermined by the contamination and misidentification of cancer cell lines...
Cell repositories in the U.S., U.K., Germany and Japan have estimated that 18% to 36% of cancer cell lines are incorrectly identified.
Dr. Tarin has spent 25 years working with that cell line--or so he thinks. A body of research suggests that MDA-MB-435 isn't breast cancer; many scientists now believe...[it's] melanoma... Dr. Tarin disagrees.
The prevailing attitude [among scientists] is that the other lab's cell line may be contaminated but not mine.
Nearly 40 years later, ... found 1,000 citations of the same contaminated cancer lines revealed in Dr. Gartler's 1966 findings, which have since been replicated many times using more advanced techniques. "They [the scientists] are either crooks or stupid."
As data scientists like to say, "garbage in, garbage out". But who among us is courageous enough to voluntarily consign decades of our own research to the dustbin?