Andrew and I warned you about "power poses" in Slate some time ago (link).
Breaking news is that Dana Carney, a co-author of the paper that claimed the benefits of the power pose, has now confirmed that she no longer believes in the power pose. She is actively discouraging researchers from this "waste of time and resources."
Here is her statement (PDF link), which is well worth reading in full. This is a courageous statement.
The statement discloses a variety of tricks used to game p-values so that they clear the publishable 0.05 threshold. Everyone suspects that others are playing such tricks, but it's rare for someone to actually confess to them.
The highlights are:
Initially, the primary DV of interest was risk taking. We ran subjects in chunks and checked the effect along the way. It was something like 25 subjects run, then 10, then 7, then 5. Back then this did not seem like p-hacking. It seemed like saving money (assuming your effect size was big enough and p-value was the only issue)
Unfortunately, I have witnessed this type of p-hacking in industry all too often. In fact, many, many people run so-called A/B tests until they reach significance. There are many problems with what Carney described above. Imagine an effect size that is essentially zero. As the samples accumulate, the measured effect fluctuates around zero; if you wait long enough, the p-value will dip below 0.05 by chance, and that is when you stop. Further, they were adding smaller and smaller batches as the experiment continued, which means they were peeking at the data more and more frequently, and every extra look is another chance for the test statistic to hit an extreme value by chance!
It's tough for me to believe that she wasn't aware that stopping when you hit p=0.05 is p-hacking but that's what she is saying.
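To make this concrete, here is a minimal simulation sketch: a two-sample t-test with no true effect, checked after batches loosely echoing the 25/10/7/5 pattern Carney describes (everything else in the sketch is made up), stopping as soon as p drops below 0.05.

```python
# Sketch of optional stopping: two-sample t-test, true effect = 0,
# batch sizes loosely echoing the 25/10/7/5 pattern (the rest is made up),
# stopping as soon as p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
batches = [25, 10, 7, 5, 5, 5, 5, 5]   # subjects added per group before each look
n_sims = 5000
false_positives = 0

for _ in range(n_sims):
    a = np.array([])
    b = np.array([])
    for n in batches:
        a = np.concatenate([a, rng.normal(0, 1, n)])   # "treatment" group, no real effect
        b = np.concatenate([b, rng.normal(0, 1, n)])   # "control" group
        if stats.ttest_ind(a, b).pvalue < 0.05:        # peek, stop if "significant"
            false_positives += 1
            break

print(f"False-positive rate with peeking: {false_positives / n_sims:.3f}")
# A single test at the final sample size would come in near 0.05;
# peeking after every batch pushes the rate well above that.
```

The point is not the exact number, which depends on how often you look, but that every additional look buys another lottery ticket on a spurious "effect".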
For the risk-taking DV: One p-value for a Pearson chi square was 0.052 and for the Likelihood ratio it was 0.05. The smaller of the two was reported... I had found evidence that it is more appropriate to use "Likelihood" when one has smaller samples and this was how I convinced myself it was OK.
She's focused here on the researcher-degrees-of-freedom issue. The larger problem is the magic dust that seems to sprinkle off p=0.05. If that is the chosen threshold for significance, and my result sits right on the cusp, I would be very skeptical of the result. It's not that 0.052 is the better number to report - they are both bad.
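For concreteness, here is how the two statistics Carney mentions can be computed side by side. The 2x2 counts below are hypothetical, not the actual study data; with small samples the two p-values typically differ a little.

```python
# Pearson chi-square vs. likelihood-ratio (G-test) p-values on the same table.
# The 2x2 counts below are hypothetical, not the actual study data.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[14, 7],    # e.g. high-power pose: took risk / did not
                  [8, 13]])   # low-power pose: took risk / did not

chi2_stat, chi2_p, _, _ = chi2_contingency(table, correction=False)
g_stat, g_p, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")

print(f"Pearson chi-square p = {chi2_p:.3f}")
print(f"Likelihood ratio   p = {g_p:.3f}")
# The two p-values usually sit close together; reporting whichever one
# clears 0.05 after the fact is a researcher degree of freedom.
```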
The self-reported DV was p-hacked in that many different power questions were asked and those chosen were the ones that "worked".
Many A/B testing platforms come with a battery of hundreds of metrics automatically computed for each test. No further comment needed.
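Well, maybe one comment, in the form of arithmetic. Assuming (unrealistically) independent metrics and a 0.05 threshold, the chance that at least one metric looks "significant" purely by chance grows quickly with the number of metrics:

```python
# With many metrics checked at alpha = 0.05 and nothing really going on,
# how likely is at least one "significant" result? Independence is assumed
# for simplicity; real metrics are correlated, which changes the exact number.
for n_metrics in (1, 10, 50, 100):
    p_at_least_one = 1 - 0.95 ** n_metrics
    print(f"{n_metrics:>3} metrics: P(at least one false positive) = {p_at_least_one:.1%}")
```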
As of today, the TED talk on "power poses" is still going strong. It has accumulated 36 million "views" and the official description does not mention Dana Carney's retraction.
One thing, though: power pose still works for horses: http://amycuddy.com/horses-can-benefit-from-power-posing-too/
Posted by: Andrew Gelman | 09/29/2016 at 03:21 PM
I particularly like this sentence: "Kathy and I discovered that trainers have been getting their horses to power pose for a long time — for more than two thousand years, in fact"
which we can juxtapose with this sentence from their response to the Ranehill et al. critique: "Participants in Ranehill et al.’s study held the poses 300% as long as participants in Carney et al.’s study. Duration and comfort of poses are very likely to be moderators."
Posted by: Kaiser | 09/29/2016 at 05:37 PM
I was recently reviewing an ethics application and it seemed like they were going to do a similar analysis, so obviously there are people who don't realise what they are doing. To someone who hasn't had an education in mathematical statistics, it probably makes sense: I have 100 subjects and it doesn't give a clear result, so I will get 100 more and see if that works.
It actually would make an interesting exercise for students to work out the actual distribution of the test statistic under this procedure, and therefore how the actual p-value compares with the calculated one.
Posted by: Ken | 10/03/2016 at 03:21 AM
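For anyone who wants to try the exercise Ken describes, here is a minimal simulation sketch. It assumes a two-sample t-test with no true effect and the look-twice rule he sketches: test at 100 subjects per group, and if that misses, add another 100 per group and test again. All numbers are illustrative.

```python
# Sketch of Ken's exercise: nominal alpha = 0.05, but the data are tested
# twice (at n = 100 and again at n = 200 per group) with no true effect.
# What fraction of such studies ends up "significant"?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, hits = 20000, 0

for _ in range(n_sims):
    a, b = rng.normal(0, 1, 100), rng.normal(0, 1, 100)
    if stats.ttest_ind(a, b).pvalue < 0.05:       # first look
        hits += 1
        continue
    a = np.concatenate([a, rng.normal(0, 1, 100)])
    b = np.concatenate([b, rng.normal(0, 1, 100)])
    if stats.ttest_ind(a, b).pvalue < 0.05:       # second look on the pooled data
        hits += 1

print(f"Actual type I error: {hits / n_sims:.3f}  (nominal: 0.05)")
# The two looks push the realised error rate to roughly 0.08, so the
# calculated p-value understates the chance of a false positive.
```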