
As to point 4, as I recall from every statistics course I ever took, all you need to achieve statistical significance is a sufficiently large N. It was always stressed that practical significance was up to the researcher to decide, based on considerations other than the p-value. If organic consistently has half as much pesticide contamination, but both are measured at some number below the threshold considered a safety risk, then they are both equally safe, because neither of them poses a risk to health. Where is the problem with this sort of interpretation?
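The large-N point can be sketched numerically. The contamination rates and sample sizes below are invented for illustration, not taken from the study: the same tiny difference (7.0% vs. 7.2%) is nowhere near significant at n = 1,000 but is highly significant at n = 1,000,000.

```python
# Sketch (hypothetical numbers): how sample size alone drives
# statistical significance in a two-proportion z-test.
import math

def two_prop_z(p1, p2, n):
    """Two-proportion z-test with equal group sizes n.

    Returns the z statistic and a two-sided p-value based on the
    normal approximation (fine at these sample sizes).
    """
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

# Same 0.2-percentage-point difference, two very different sample sizes.
for n in (1_000, 1_000_000):
    z, p = two_prop_z(0.072, 0.070, n)
    print(f"n={n:>9,}: z={z:.2f}, p={p:.4g}")
```

At n = 1,000 the p-value is far above any conventional cutoff; at n = 1,000,000 it is minuscule, even though the practical difference is identical in both cases.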

I expect the reasons for not running an RCT are that it costs a lot, can take a long time to return results, and the results may still be inconclusive. A food study is going to be complicated by people eating frozen foods or takeaways, or eating out, whether at commercial establishments or with friends. Plus, people just decide that they don't want to eat what they are supposed to. This happens a lot in studies of diet and cholesterol.

Joshua: Practical significance is really important, but like everything in statistics, there is a fine line between use and abuse. In the case you described, the proper conclusion (if we accept the interpretation) is that both food groups satisfy the safety standard. It isn't right to conclude that the two food groups are the same when the statistical test showed they had different levels of bacteria. It should always arouse suspicion when the researcher essentially changes the judging criteria after seeing the results.
From the researcher's perspective, this is also dangerous territory. It's very easy to lapse into finding stories that fit any conclusion, explaining away the unwanted cases.

I fail to see how previously determined safe upper limits of exposure are "chang[ing] the judging criteria after seeing the results." The threshold for practical significance was determined experimentally, presumably by a different group, and in advance of the trial that detected statistically significant differences below the level of practical significance. How would you prefer that practical significance be determined in this case?

I suspect that Ken is correct about the RCT. Anecdotally, a professor of mine related how she participated in a 6 month dietary comparison study, and most of her peers flat out lied about their eating habits because they got tired of following the prescribed nutritional regimen (and they were being paid to participate). That's one of the perks of being an animal researcher. My animals eat exactly what I give them, and nothing else.

As to point 7, the original impetus of the organic movement was food safety and health concerns. Many conflate organic with free-range or local, but that connection is flawed: organic does not require free-range, nor is it necessarily local. Organic fruits not native to the shopper's geographic region are by definition not local, regardless of their status as organic. In California, the companies that grow most of the available organic food and those that grow most of the non-organic food are the same companies. They simply target parts of their production at different markets.

Joshua: If the safety limit is the judging criterion, the null hypothesis should be "bacteria from organic food <= threshold," not "bacteria from organic food = bacteria from non-organic food." A test is only as good as how you conceptualize it. To set it up in the latter way and then interpret it in the former way is changing the metric after the fact.
If the safety limit is the criterion, then every single test in this study should adopt this setup, but it doesn't sound like that is the case here.

There are two different goals here.

First is to characterize any differences between the results of the two production strategies. This goes to the largely data-free argument that one is "better" than the other. Based on this review, organic food has less pesticide residue than conventionally produced food. That is important to understanding the downstream effects of production changes, regardless of what the safe exposure level is.

The second, and to my mind independent, question is whether each is "safe." People are only concerned about pesticide residues because of safety concerns. The EPA is responsible for determining safe exposure levels and making recommendations as to the safe upper limit of exposure. Whether something possesses unsafe concentrations of a specific pesticide residue is determined by comparing an analyzed value to a table value. To my knowledge there is no statistical test to determine if the sample concentration is statistically different from a table value. Maybe I'm wrong, and you can correct me here.
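For what it's worth, there is a standard test for comparing a sample mean to a fixed table value: the one-sample t-test. A minimal sketch, with made-up residue measurements and a made-up threshold (neither is from the study or from any EPA table):

```python
# Sketch: one-sample t statistic for H0: mean residue <= threshold,
# where the threshold is a fixed table value rather than a second sample.
# All numbers below are hypothetical.
import math
import statistics

def one_sample_t(sample, threshold):
    """t statistic comparing the sample mean to a fixed threshold."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation
    return (mean - threshold) / (sd / math.sqrt(n))

residues = [0.8, 1.1, 0.9, 1.0, 0.7, 1.2, 0.95, 1.05]  # hypothetical ppm
t = one_sample_t(residues, threshold=2.0)  # invented safety limit
print(f"t = {t:.2f}")  # strongly negative: sample mean is well below the limit
```

A large negative t (compared against the t distribution with n - 1 degrees of freedom) is evidence the mean residue sits below the limit; this is exactly the "bacteria <= threshold" null setup described above, applied to residues.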

This study answered question one. The authors then compared the values to the threshold and found them, regardless of source, to be below the threshold. If the EPA receives evidence that they were not conservative enough in their safety limit and revises it downward, then the difference may turn out to be practically significant. In which case it is good that the researchers presented this difference even if it is not of practical significance at the moment.

Cost is one reason not to do an RCT, compliance or lack thereof is another. I imagine that cumulative effects over longish periods of time are of the most interest here and that exacerbates both. I can't imagine a study design that wouldn't be susceptible to huge noncompliance over long periods and measuring that would be tough. Did you have a design in mind?

Ken and Jared: Cost and non-compliance are always present in any kind of RCT, so I wouldn't consider them valid reasons for not running one. While RCT results may be less than perfect because of noncompliance, as you rightly pointed out, I'd still trust that result more than any number of observational studies with convenience samples.
One possibility is to start with animal studies, looking at animals that have short generational cycles. I'd also trust such studies if they are RCTs more than any number of observational studies with convenience samples.
Can anyone trust an observational study with convenience samples that purports to measure long-term cumulative effects when you have no control over anything during the period of "observation"?

