
Break it down, build it up

Thought of the day:

While commuting today, I wondered why we use the terms "data analysis" and "data analyst". I recalled that in chemistry class, we learnt that analysis means breaking things down while synthesis means building things up.

With regard to data, we typically try to collect data at the most detailed level and build up messages and stories from the little pieces. We don't break things down. In fact, we can't break things down if the data come to us in aggregated form. (Think ecological fallacy.)
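The point about aggregated data can be made concrete with a small sketch. The two datasets below are invented for illustration (they are not from the post), and `aggregate` and `slope` are hypothetical helper names. Both datasets produce identical group means, so a table of aggregates cannot tell them apart, yet the relationship between x and y inside each group runs in opposite directions. That gap between the group-level and individual-level pattern is the ecological fallacy.

```python
from statistics import mean

# Invented individual-level records: (group, x, y).
dataset_a = [
    ("g1", 1.0, 1.0), ("g1", 3.0, 3.0),  # within g1, y rises with x
    ("g2", 5.0, 5.0), ("g2", 7.0, 7.0),  # within g2, y rises with x
]
dataset_b = [
    ("g1", 1.0, 3.0), ("g1", 3.0, 1.0),  # within g1, y falls as x rises
    ("g2", 5.0, 7.0), ("g2", 7.0, 5.0),  # within g2, y falls as x rises
]

def aggregate(records):
    """Collapse individual records to one (mean x, mean y) pair per group."""
    groups = {}
    for g, x, y in records:
        groups.setdefault(g, []).append((x, y))
    return {
        g: (mean([x for x, _ in pts]), mean([y for _, y in pts]))
        for g, pts in sorted(groups.items())
    }

def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) points."""
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

def within_group_slope(records, group):
    """Slope computed from the individual records of one group only."""
    return slope([(x, y) for g, x, y in records if g == group])

# The aggregates are identical, so they cannot be "broken down" uniquely.
print(aggregate(dataset_a) == aggregate(dataset_b))
# Group-level slope (from aggregates) versus the slope inside group g1.
print(slope(list(aggregate(dataset_b).values())), within_group_slope(dataset_b, "g1"))
```

Going from records to aggregates is a one-way street: many different individual-level datasets map to the same summary table, which is why the detail has to be collected up front.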

So why don't we say "data synthesis" rather than "data analysis"?




I want some of what you've been smoking.


Probably similar to the reason we say freedom-fighter and fire-fighter.


Because "synthesis" means "making things up" :-)


This is actually a valid question.

Analyzing data means finding the structures in a dataset or, in more statistical terms, separating signal from noise by applying models and distributional assumptions. This is usually a process in which you start to take your dataset apart.

Once you have succeeded with this process, you could actually sample a synthetic dataset using the model assumptions. Here you really do build up again, which relates directly to synthesis.
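The two steps in this comment can be sketched in a few lines. This is a minimal illustration under loud assumptions: the measurements are made up, the model is simply assumed to be a normal distribution, and `fit_normal` and `synthesize` are hypothetical helper names. Fitting the model summarizes the data (the "analysis"); drawing new values from the fitted model builds a dataset back up (the "synthesis").

```python
import random
from statistics import mean, stdev

def fit_normal(data):
    """'Analysis' step: summarize the data with a model (assumed normal here)."""
    return mean(data), stdev(data)

def synthesize(mu, sigma, n, seed=0):
    """'Synthesis' step: draw n synthetic values from the fitted normal model."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

observed = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0]  # made-up measurements
mu, sigma = fit_normal(observed)
synthetic = synthesize(mu, sigma, n=1000)

print(round(mu, 3), round(sigma, 3))
print(round(mean(synthetic), 2), round(stdev(synthetic), 2))
```

The synthetic sample reproduces the fitted mean and spread, but only what the model captured; anything the normal assumption ignores (skew, outliers, structure across records) is not rebuilt.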

More generally speaking: an analysis generates knowledge, a synthesis applies knowledge.

Jon Peltier

Derek - That's exactly what I was thinking.

When I worked as a metallurgist, I was involved with terms like "forge", "fabricate", and so forth. When my group all got Word with its first rudimentary thesaurus, we were very amused that most of the synonyms related to creating false statements.

Analysis means developing understanding about a data set. When you start to apply the analysis to a broader system, you are synthesizing a model.

Kelly O'Day

I liked your comment so much that I visited your site. I am extremely impressed with your work.

Your books look great, and your papers are a gold mine. I particularly like your paper on Trellis Displays vs. Interactive Statistical Graphics. The fact that you wrote it in 1996, 12 years ago, is both impressive and disturbing to me.

We Excel chart users are way behind in charting technology, and falling behind at a faster rate.

Adrien Rochereau

Well, I would say that we are analysts, as we have to drill down into the data and go as deep as needed to find the facts.

Going from the big picture to the small details is what makes an analyst.

Making a summary of it is only the end result; the biggest part is always finding the details.

(If you don't know the details, you cannot explain the big picture.)

Jan Schultink

Although I typically try to get the full detailed dataset, I usually start with the big picture, then ask myself how I would like to break it down, ignoring cuts that do not seem insightful. I guess I would call myself an analyst.

There are different types of people, though: the Myers-Briggs "S" type (Sensing: loves details first, then builds up) and "N" type (Intuitive: big picture first, and stops breaking down when the goal is reached). I usually score N, but am not a slam-dunk case. Background reading:


Maybe I had "data mining" in mind... the proverbial case of having a lot of data and trying to figure out what it all means. This concept is seeping into statistical modeling via variable selection, regularization, etc.


I've always thought that:

Analysis = "state of knowledge"
Synthesis = "state of wisdom"?

So to me, in "BI/BA" terms, even as I pull and correlate data I'm still "breaking it down" just in interesting ways. It isn't until I start performing what the industry calls "predictive analytics" that I can approach synthesis.


Data synthesis is usually referred to as "modeling". I'd use "data analysis" when analyzing models built from data, though not all models are synthesized from data.

Georgia Sam

I think it differs by occupation. I analyze demographic, survey, & test score data. When I speak of analyzing data, I'm not referring to a process that begins with aggregated data. My analyses begin with "raw" data: one record per individual, or whatever the unit of analysis is. When I take aggregated data & put it into an understandable & useful form for a particular audience, I usually call that simply reporting or presenting the data. "Synthesizing" is not a word I use often.


Georgia Sam: that's why I raised this issue. For any of us doing "data mining", the data already exist in the most disaggregated form, and the "modeling" process actually synthesizes the data. There is no analysis in the sense of breaking things down.

