
Floormaster Squeeze

I work with BMI data and related health implications. In our work we see a slightly U-shaped effect of BMI on the dependent variable (health outcomes). Because BMI is not that important to the work (though we do use it to adjust the impacts), I have simply used categorical BMI variables instead of a falsely linear continuous variable. Does that make sense?


Removing the data altogether does seem odd. Why not model the interactions with smoking and disease?


"Somehow, the field of evolutionary psychology has attracted many crazies."


Yes. Yes it has...


FMS: You are asking about discretizing predictor variables, which is often debated. My standard answer is to look at the analysis both ways, discretized and not. If they tell you a similar story, then it is okay to discretize, as you are not losing any valuable information. While you might think linearizing is arbitrary, discretizing is another kind of arbitrary! What you're doing is imposing a step function on the curve. That is fine so long as you set the right bounds.
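To make "look at the analysis both ways" concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are synthetic with a built-in U shape, the cut-offs are the commonly cited BMI boundaries (18.5, 25, 30, 40), and `rss_linear` and `rss_step` are hypothetical helper names. It fits a straight line and a step function to the same data and compares the residual sums of squares.

```python
import numpy as np

# Assumed cut-offs (commonly cited BMI boundaries): underweight < 18.5,
# normal < 25, overweight < 30, obese < 40, morbidly obese >= 40.
CUTOFFS = np.array([18.5, 25.0, 30.0, 40.0])


def rss_linear(bmi, y):
    """Residual sum of squares when BMI enters as a single linear term."""
    slope, intercept = np.polyfit(bmi, y, deg=1)
    resid = y - (slope * bmi + intercept)
    return float(np.sum(resid ** 2))


def rss_step(bmi, y, cutoffs=CUTOFFS):
    """Residual sum of squares when BMI enters as categories (a step function)."""
    group = np.digitize(bmi, cutoffs)  # 0 = underweight ... 4 = morbidly obese
    fitted = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        mask = group == g
        fitted[mask] = y[mask].mean()  # each category gets its own level
    return float(np.sum((y - fitted) ** 2))


# Synthetic data with a U-shaped relationship (illustrative only, not real data).
rng = np.random.default_rng(0)
bmi = rng.uniform(16.0, 45.0, size=2000)
outcome = 0.05 * (bmi - 26.0) ** 2 + rng.normal(0.0, 1.0, size=2000)

print("linear RSS:", round(rss_linear(bmi, outcome), 1))
print("step RSS:  ", round(rss_step(bmi, outcome), 1))
```

On data shaped like this the step function fits better than the line, but that says nothing about whether the chosen boundaries are the right ones; with real data, both fits should also be compared against a more flexible curve.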

Shampshire: Maybe it wasn't enough to prove their theory :)

Meic Goodyear

Several studies have concluded that life expectancy is greatest in the slightly overweight group. I believe the standard BMI definitions were developed before the Second World War, when it's thought that most of the population was malnourished (undernourished). The categories need revisiting, but there's a huge vested interest in some parts of the public health industry. Having spent their careers propagating one set of beliefs, many are reluctant to accept that they need to change their message.


Meic: and you're right, it's not that the BMI metric is bad, we can use the metric but interpret it differently.

All: The Typepad spam filter has been churning out false positives lately. If your comment doesn't show up, that means I have to fish it out of the spam folder. My own comment above was deemed "spam".

Floormaster Squeeze

Thanks for the response. You are right that it is objectively arbitrary and could make things worse; I think it works better for our adjustments.

Using BMI linearly for us just means weaker or smaller impacts (heavier, worse outcomes generally). I am sure it has some value in our adjustments. However, as noted in the Nature discussion above, the Overweight category generally has outcomes as good as (sometimes slightly better than) the Normal weight category. The categories allow us to adjust for the worse outcomes of the Underweight (in our data there are very few people in this group) as well as the slightly worse outcomes of the Obese and the markedly worse outcomes of the Morbidly Obese (we use the standard BMI categories and cut-offs).

Also, in one of our outcomes the Obese do slightly better than, or are pretty close to, the Normal and Overweight groups, and the categories allow the differences for the Morbidly Obese to be more stark (linearly, I believe this relationship is nearly flat).


FMS: Your reasoning seems sound. You need to look at the un-discretized analysis to make sure that there are indeed three groups and get an idea of where the boundaries are. The advantage of discretizing is in the presentation.
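One rough way to look at the un-discretized relationship before settling on groups is to average the outcome within many narrow BMI bins and see where the level changes. The sketch below is an assumption-laden illustration, not anyone's actual analysis: the data are synthetic, the 2-unit bin width is arbitrary, and `binned_means` is a hypothetical helper.

```python
import numpy as np


def binned_means(bmi, y, width=2.0):
    """Mean outcome within fixed-width BMI bins; empty bins give NaN."""
    edges = np.arange(np.floor(bmi.min()), bmi.max() + width, width)
    counts, _ = np.histogram(bmi, bins=edges)
    sums, _ = np.histogram(bmi, bins=edges, weights=y)
    means = np.divide(sums, counts,
                      out=np.full(len(counts), np.nan),
                      where=counts > 0)
    return edges[:-1], means, counts


# Synthetic U-shaped data (illustrative only, not real data).
rng = np.random.default_rng(1)
bmi = rng.uniform(16.0, 45.0, size=2000)
outcome = 0.05 * (bmi - 26.0) ** 2 + rng.normal(0.0, 1.0, size=2000)

for left, mean, n in zip(*binned_means(bmi, outcome)):
    print(f"BMI {left:4.0f}-{left + 2:4.0f}: mean outcome {mean:6.2f} (n={n})")
```

Runs of adjacent bins with similar means suggest where a category could reasonably span, and sharp changes suggest candidate boundaries. A smoother or spline would serve the same purpose.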


