In the popular science genre, one often comes across "published in a peer-reviewed journal" as a certificate of authenticity. Given that the authors of such reports or books typically do not have the technical chops to understand the materials deeply, it's not a surprise that they require third-party validation. However, "published in a peer-reviewed journal" is pretty weak.
I just read a paper published in a peer-reviewed journal that made me cringe. This is not just any journal but the California Management Review, which according to Wikipedia is, "along with other publications such as Harvard Business Review and MIT Sloan Management Review, among the most influential and viable sources of contemporary business research". It has an impact factor of 1.667.
The paper is called "Organizational Blueprints for Success in High-Tech Startups: Lessons from the Stanford Project on Emerging Companies" by James Baron and Michael Hannan (pdf here). This paper was referenced in the New York Times Upshot article titled "Yes, Silicon Valley, Sometimes You Need More Bureaucracy", and my friend Xan G. was unhappy with the way the reporter presented the paper's findings. I am not convinced by the original paper either.
First, let's talk about Xan's complaint. This sentence neatly summarizes the reporter's point of view:
Yet a human resource department is essential. The two [researchers] found that companies with bureaucratic personnel departments were nearly 40 percent less likely to fail than the norm, and nearly 40 percent more likely to go public — data that would strike many Silicon Valley entrepreneurs as heresy.
Going back to the source, we find the following (type of) chart from which the reporter extracted the evidence:
This chart shows the six "organizational blueprints", with the Engineering blueprint treated as the reference level. Only the Autocratic blueprint has a higher likelihood of failure than Engineering. Four of the five non-Engineering blueprints did better than Engineering, and among those four, Bureaucratic was the worst! So how can the reporter conclude that this chart supports more bureaucracy (which he defines as having a human resource department) for Silicon Valley? Why not go for the Commitment blueprint and enjoy a 100% lower rate of failure instead of 40%?
Despite the name "Bureaucratic", the researchers never said that having an HR department is a defining characteristic of this blueprint. Having HR departments is compatible with several of the other blueprints. In fact, on page 14, the researchers stated: "Commitment and Star firms tended to be the fastest to bring in HR expertise."
What's more, the reporter described the Engineering blueprint as "the norm", which is not a term the researchers used. The researchers used the word "modal" (page 11), meaning the most frequent; but even that claim is contradicted by Figure 3, in which the largest slice of the pie chart is the "Aberrant" type (which probably maps to "Non-type"), at 33% versus 31% for Engineering.
Next, the original paper in CMR is an instance of "story time". The data they collected are nice but very limited; they stretched their stories way beyond what the analysis could support. I'd consider most of the conclusions to be based on theory rather than evidence. Besides, in simplifying the technical content to suit CMR's target audience, so much is lost that what remains is impossible to interpret.
Just look at Figure 6 above. The reader might guess that some kind of regression model was run, with likelihood of failure as the response variable and the organizational blueprint as a predictor. The Engineering blueprint is selected as the reference level and given a value of 0%. What does it mean for the Commitment blueprint to be 100% less likely to fail? Does it mean no company using the Commitment blueprint has ever failed?
If you'd like to know what the actual failure rates are, you will not find them anywhere in the 26-page article. If Engineering failed at a one-percent rate, then Bureaucratic would have failed at 0.6 percent, hardly a concern. You'd also note that there is no indication of standard errors, error bars, or sample sizes. (They referenced a more technical paper in a footnote, but I didn't find actual failure rates there either.)
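A toy calculation shows why the missing base rate matters. The base rates below are invented for illustration; the paper discloses none of them.

```python
def absolute_rate(base_rate, relative_change):
    """Convert a relative change vs. the reference level into an absolute rate."""
    return base_rate * (1 + relative_change)

# "Nearly 40 percent less likely to fail" for Bureaucratic and "100% less"
# for Commitment, relative to Engineering, whose actual rate is never given.
# The same relative numbers are consistent with wildly different absolute gaps.
for base in (0.01, 0.30):  # hypothetical Engineering failure rates
    print(f"Engineering at {base:.0%}: "
          f"Bureaucratic {absolute_rate(base, -0.4):.1%}, "
          f"Commitment {absolute_rate(base, -1.0):.1%}")
```

Under a 1% base rate, "40% less likely" shrinks to a difference of 0.4 percentage points; under a 30% base rate, it is 12 points. The chart cannot distinguish the two.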
There are other problems with the CMR article. The authors said they had a sample of "nearly 200 technology start-ups" but made no mention of how these start-ups were selected. In the entire paper, not a single company is named.
Knowing how the sample was selected matters a lot here. There may be survivorship bias, for example: companies that failed fast would never make it into the sample.
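A toy simulation (not the paper's data) shows how this bias plays out. Suppose two blueprints have the same true failure rate, but one blueprint's failures happen quickly; if firms are recruited into the study only after they have survived a couple of years, the fast-failing blueprint looks artificially safe.

```python
import random

random.seed(0)

def observed_failure_rate(fast_failure, n=10_000, true_rate=0.5):
    """Simulate firms failing with probability true_rate; failures happen at
    year 1 (fast) or year 5 (slow). Recruitment at year 2 excludes firms
    that already died, biasing the observed failure rate downward."""
    sampled = sampled_failed = 0
    for _ in range(n):
        fails = random.random() < true_rate
        fail_year = 1 if fast_failure else 5
        if fails and fail_year < 2:
            continue  # failed before recruitment: never enters the sample
        sampled += 1
        sampled_failed += fails
    return sampled_failed / sampled

print(observed_failure_rate(fast_failure=True))   # near 0: failures invisible
print(observed_failure_rate(fast_failure=False))  # near 0.5: failures observed
```

Both blueprints fail half the time, yet the sample shows one failing almost never. Without the sampling methodology, the paper's failure comparisons are uninterpretable.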
Much is made of firms that shifted their organizational blueprint as they aged. A Founder's blueprint is contrasted with a CEO's blueprint. On page 15, we are told that "only 18 of the 165 firms in Figure 4 changed from one pure model type to another; of these, 14 moved between Engineering and Bureaucracy, the two closest pure type models". So somewhere along the way, we lost 35 of the nearly 200 firms without explanation.
Then on page 16, the researchers said: "One obvious question to ask is: Do changes in HR blueprints accompany changes in senior management within startups? The answer is yes." Later, on page 21, they asserted "we found compelling evidence that changing the HR model is destabilizing to high-tech start-ups".
The level of confidence in those statements is at odds with a sample of only 18 firms that changed blueprints, 14 of which moved between the two closest types.
Let's do a quick calculation. With 6 basic types, there are 30 possible A->B shifts, where B->A is counted as different from A->B. There were only 18 observed shifts, of which 14 moved between Engineering and Bureaucracy, occupying at most two of those 30 ordered pairs; so the data contain at most six unique shifts (five, if all 14 moved in the same direction). And yet they are able to draw conclusions about changes in HR blueprints in general?
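This counting can be sanity-checked in a few lines; whether the 14 Engineering-Bureaucracy moves occupy one ordered pair or two changes the bound only marginally.

```python
from math import perm

n_types = 6
possible_shifts = perm(n_types, 2)  # ordered pairs A->B, with B->A distinct
print(possible_shifts)              # 30

observed = 18
eng_bur = 14                 # moves "between Engineering and Bureaucracy"
others = observed - eng_bur  # 4 remaining shifts, each possibly unique

# The 14 E<->B moves occupy at most 2 ordered pairs (E->B and B->E),
# or just 1 if they all went the same way.
for pairs_used in (1, 2):
    print(pairs_used + others)  # 5, then 6
```

Either way, at most five or six of the 30 possible transitions were ever observed, far too few to generalize about blueprint changes.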
The subject of the research is worth investigating. I can't help but think the research would have been better without the pretense of being data-driven. Most if not all of the conclusions are supported by interviews, not so much by the data anyway.
I am annoyed by the presentation of a series of charts that have no meaning (all relative values, with no actual rates given for the reference level); the presentation of statistical results without sample sizes or error bars; and the presentation of results from manually gathered data without naming a single company or describing the sampling methodology.