This is the second post on the immigration paradox study, first discussed on the Gelman blog. My prior post on the graphing aspect is here; this post focuses on the statistical aspects. I am working backwards on Andrew's discussion points.
Which difference is most interesting?
5. I agree with Andrew: they should publish similar analyses of other minority groups as soon as possible. One thing that strikes me when looking at the interaction plot is that U.S.-born non-Latino whites have a much higher incidence of mental illness. The differences among the Latino subgroups pale in comparison with the difference between non-Latinos and Latinos. This latter difference is more acute among the U.S. born than among the immigrants. The importance of the Latino analysis hinges upon whether the "paradox" is also found among other minority groups.
(Chris P also pointed this out in his comment on the previous post.)
Disaggregation, Practical Significance, and the Meaning of Not Significant
2. Andrew is also right in expressing moderate skepticism about this sort of disaggregation exercise. He connects this to the subtle statistical point that "the difference between significant and not significant is not significant." A related but less abstruse issue is that as one disaggregates any data, the chance of seeing variations that stray from the average gets higher and higher. This is because each subgroup's sample size shrinks with every cut, and so the statistical estimates become less reliable.
(To give a flavor of the scale, there were a total of 2500 Latinos in the sample, 500 of them Puerto Rican. The analysis drilled down to the level of different types of mental disorders and subgroups of Latinos, and also adjusted for demographics. The details of the demographic adjustment are not available, but in any case one should be concerned about whether there were sufficient numbers of, say, male immigrant Puerto Rican Latinos age 18-25 with income < $10,000 living in a rental apartment, for such an elaborate exercise.)
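To make the sample-size point concrete, here is a rough sketch of how the margin of error around a 30% rate widens as a subgroup shrinks. The sizes 2500 and 500 are the ones quoted above; the 30% rate is an approximate figure, the sizes 100 and 25 are hypothetical deeper cuts, and the simple-random-sample formula is an assumption (the study's actual survey design and weighting are not known to me).

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)

BASE_RATE = 0.30  # assumed approximate incidence rate, for illustration only

# 2500 and 500 come from the post; 100 and 25 are hypothetical deeper cuts.
SUBGROUP_SIZES = [2500, 500, 100, 25]

# Approximate 95% margin of error (normal approximation) for each size.
half_widths = {n: 1.96 * standard_error(BASE_RATE, n) for n in SUBGROUP_SIZES}

for n, hw in half_widths.items():
    print(f"n = {n:>4}: 30% +/- {100 * hw:.1f} points (approx. 95% interval)")
```

Under these assumptions the margin grows from roughly 2 points at n = 2500 to about 4 points at n = 500 and about 18 points at n = 25, where the normal approximation is itself getting shaky.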
Expanding on this point, one observes that the measured gap between U.S.-born and immigrant Puerto Rican Latinos was about 5%. This 5% is probably of considerable practical significance, since the base rate of incidence is about 30% (I say probably because I am not an expert in mental illness). The current statistical analysis judged this gap to be not statistically significant -- with a larger sample, the same difference could conceivably be both statistically and practically significant.
Doesn't the significance test deal with the small sample size problem?
Yes, if the authors had merely described the Puerto Rican result as inconclusive. Here, as is done very commonly, insignificance is equated to "no difference": they said no differences were found in lifetime prevalence rates between migrant and U.S.-born Puerto Rican subjects.
In reality, a difference of 5% was found in the sample that was analyzed. The statistical procedure found that this difference could have been a result of chance -- notice "could", not "must". If the measured difference were 0.5% on 30%, I might be willing to accept a finding of "no difference"; at 5% on 30%, I would like to see a larger sample analyzed.
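A back-of-the-envelope check of this argument, as a sketch only: I read "5% on 30%" as 30% versus 35%, assume the 500 Puerto Rican respondents split evenly between immigrants and U.S. born (the actual split is not given here), and use a plain pooled two-proportion z-test that ignores the survey weights and demographic adjustments the authors used. The larger sample of 2000 per group is purely hypothetical.

```python
import math

def two_proportion_z_test(p1, p2, n1, n2):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumed rates: the "5% on 30%" gap read as 30% vs 35%.
IMMIGRANT_RATE, US_BORN_RATE = 0.30, 0.35

# Hypothetical even split of the 500 Puerto Rican respondents.
z_small, p_small = two_proportion_z_test(IMMIGRANT_RATE, US_BORN_RATE, 250, 250)

# Hypothetical larger study with the same observed rates.
z_large, p_large = two_proportion_z_test(IMMIGRANT_RATE, US_BORN_RATE, 2000, 2000)

print(f"n = 250 per group:  z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 2000 per group: z = {z_large:.2f}, p = {p_large:.4f}")
```

With these made-up splits, the same 5-point gap fails the 5% test at 250 per group and passes it comfortably at 2000 per group. The observed difference is identical in both cases; only the amount of evidence changes.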
The Meaning of Paradox
1. Andrew was perplexed as to why the phenomenon is known as a "paradox". I had the same issue until I read the paper. The authors were a bit sloppy in the abstract. In the paper itself, they explained that conventional wisdom has it that immigrants should be more likely to have mental illness because of the stress of the immigration process, and yet the statistics showed the exact opposite. That is the paradox.
I was a little shocked to see the data tables that gave all the estimates of the various effects at the various subgroup levels: shocked because the authors were allowed (or asked) to include only the p-values that were below some unspecified level (which I surmise is 10%, although a 5% level is used to judge significance as per convention). This is publication bias within publication bias. P-values that are not significant still provide valuable information and should not be omitted. They did provide confidence intervals, but for each subgroup separately rather than for the difference -- and as they noted, such intervals by themselves are inconclusive when they overlap moderately.
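Why separate intervals are a poor substitute for an interval on the difference can be seen in a small sketch. The numbers are hypothetical (two groups of 1000 with rates of 30% and 35%) and the intervals are simple normal-approximation intervals, not the ones the authors computed.

```python
import math

Z95 = 1.96  # normal quantile for an approximate 95% interval

def proportion_ci(p, n):
    """Approximate 95% confidence interval for a single proportion."""
    half = Z95 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def difference_ci(p1, n1, p2, n2):
    """Approximate 95% confidence interval for p2 - p1 (independent groups)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - Z95 * se, diff + Z95 * se

# Hypothetical groups: 1000 respondents each, rates of 30% and 35%.
ci_a = proportion_ci(0.30, 1000)
ci_b = proportion_ci(0.35, 1000)
ci_diff = difference_ci(0.30, 1000, 0.35, 1000)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

print(f"group A: {ci_a[0]:.3f} to {ci_a[1]:.3f}")
print(f"group B: {ci_b[0]:.3f} to {ci_b[1]:.3f}")
print(f"overlap: {intervals_overlap}")
print(f"difference: {ci_diff[0]:.3f} to {ci_diff[1]:.3f}")
```

In this made-up case the two subgroup intervals overlap, yet the interval for the difference excludes zero. A reader given only the separate intervals cannot tell which way the comparison goes, which is why the interval for the difference is the one worth reporting.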