What's a histogram?

Almost all graphing tools make histograms, and almost all dataviz books cover the subject. But I've always felt there are many unanswered questions. In my talk this Thursday in NYC, I'll provide some answers. You can reserve a spot here.

***

Here's the most generic histogram:

Salaries_count_histogram

Even Excel can make this kind of histogram. Notice that we have counts on the y-axis. Is this really a useful chart?

I have never found this type of histogram useful, since I don't do analyses in which I need to know the exact count of something - when I analyze data, I'm generalizing from the observed sample to a larger group.

Speaking of Excel, I have always felt that its developers hate histograms. Why is it so much harder to make a histogram than other basic charts?

***

Another question. We often think of histograms as a crude approximation to a probability density function (PDF). An example of a PDF is the famous bell curve. Textbooks sometimes show the concept like this:

Histogram_normal_pdf

This is true of only some types of histograms (and not the one shown in the first section!). Instead, we often face the following situation:

Normals_histogram50_undercurve

This isn't a trick. The data in the histogram above were generated by sampling the pink bell curve.
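To see why the scale of the y-axis matters when comparing a histogram to a PDF, here's a minimal sketch. It is not the code behind the charts above: the sample size of 50, the bin width of 0.5, the random seed, and the use of NumPy are all my assumptions for illustration. It draws a sample from a standard normal, then computes the same histogram twice, once as counts and once as densities.

```python
import numpy as np

# Assumed setup for illustration: 50 draws from a standard normal.
rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.normal(loc=0.0, scale=1.0, size=50)

# 16 equal-width bins of width 0.5 covering [-4, 4] (also an assumption).
edges = np.linspace(-4.0, 4.0, 17)
widths = np.diff(edges)
centers = (edges[:-1] + edges[1:]) / 2

# Same data, two vertical scales.
counts, _ = np.histogram(sample, bins=edges)
density, _ = np.histogram(sample, bins=edges, density=True)

# Standard normal PDF evaluated at the bin centers.
pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)

# Total area of the bars under each scaling.
count_area = float(np.sum(counts * widths))
density_area = float(np.sum(density * widths))

print("area of count histogram:  ", count_area)
print("area of density histogram:", density_area)
print("tallest count bar:   ", counts.max())
print("tallest density bar: ", density.max())
print("peak of the PDF:     ", pdf.max())
```

The bars of the density version enclose an area of 1, just like a PDF, so the two can share an axis. The bars of the count version enclose an area equal to the number of observations times the bin width, so a bell curve drawn on the same axis will not line up unless one of them is rescaled.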

***

If you've used histograms, you have probably also run into strange issues. I haven't found much material out there that addresses these questions, and they have been lingering, hidden, in my mind for a long time.

My Thursday talk will hopefully fill in some of these gaps.


My talk next week on histograms

Next Thursday (March 14), I'll be presenting at the Data Visualization New York Meetup, hosted by Naomi and Cameron. The event is in-person at Datadog's office. You can reserve your spot here.

Kfung_dataviznewyorkmeetup_mar2024

This talk is brand new, based on some work inspired by a blog post by Andrew Gelman. One of Andrew's correspondents asked about a particular type of histogram. While exploring this topic, I filled some of my own gaps in knowledge about this deceptively simple chart form. I'll be sharing this story.

Bits and pieces have appeared before on my blog. See this, this, and this for background.

If you're attending the talk, come up and say hi.

To register, click here.


Monday after Thanksgiving, see you at the Data Visualization Meetup

My little analytics training startup, Principal Analytics Prep, is proud to sponsor the next meeting of the Data Visualization New York Meetup, organized by the indefatigable Naomi Robbins. The Meetup, to be held on Monday, November 26, 2018, features NYU professor Kristen Sosulski, who will discuss the “business case for data visualization”. Click here to register.

Naomirobbins_bookcover

Naomi is a long-time friend. I reviewed and recommended her book on Creating More Effective Data Graphics in 2008. This is still a useful reference to some key concepts, presented in a clean, easily digestible format.

***

The keynote speaker, Kristen, has just published Data Visualization Made Simple, described as a “top book for computer science students.” I am very much looking forward to her talk because the subject speaks to the core of the mission of Principal Analytics Prep: placing data science & analytics in the context of the entire enterprise.

That’s why my bootcamp and training programs emphasize the Three Pillars: computing, statistics, and business. Our instructors are practitioners with 10 to 30 years of learning from real-world implementation of models and systems used by forward-thinking, data-driven organizations.

The Spring 2019 cohort of our Certified Data Specialist bootcamp is open for applications. If you're looking to transition your career into data science and advanced analytics, check us out. Here are some of the great things our alums have said about the program. Take advantage of the early admit deadline by December 10, 2018. Click here for more information.

***

Kristen_bookcover

I'm excited to hear what Kristen has to say about using data visualization in the business world. Kristen's website is here. She teaches at NYU's Stern School of Business, and describes herself as a computer scientist. Here's the link to the Amazon page. Unfortunately, the page is not very informative about the book's content. The table of contents is minimalist ("The Design", "The Audience", etc.). After the Meetup, I will report back on what she spoke about.

 

 


Webinar Wednesday

Lyon_onlinestreaming


I'm delivering a quick-fire Webinar this Wednesday on how to make impactful data graphics for communication and persuasion. Registration is free, at this link.

***

In the meantime, I'm preparing a guest lecture for the Data Visualization class at Yeshiva University Sims School of Management. The goal of the lecture is to emphasize the importance of incorporating analytics into the data visualization process.

Here is the lesson plan:

  1. Introduce the Trifecta checkup (link), which is the general framework for effective data visualizations
  2. Provide examples of Type D data visualizations, i.e. graphics that have good production values but fail due to issues with the data or the analysis
  3. Hands-on demo of an end-to-end data visualization process
  4. Lessons from the demo, including the iterative nature of analytics and visualization, and the role of sketching
  5. Overview of basic statistics concepts useful to visual designers

 


Upcoming talks and workshops, NYC, Seattle, Philadelphia, St. Louis

The following talks/events are all free and open to the public. If I'm in your neighborhood, please come by and say hi.

Kaiser_fung_talks_feb_mar_2018

 

You can register or learn more about the above talks at the following links:

Feb 20, 2018 (tonight, NYC) - Principal Analytics Prep Open House, with me and Tina Lowry, talking about the data analytics field and how to get a job in this space. More information here.

Feb 22, 2018 (Seattle) - A talk about best practices in data visualization for business presentations. Sign up here.

Feb 28, 2018 (NYC) - Analytics Resume Workshop, jointly hosted by Principal Analytics Prep and New York Public Library Job Search Central: we provide free advice on improving your resume to appeal to analytics and data science hiring managers. Register at our Meetup group here.

March 7, 2018 (Philadelphia) - A talk about best practices in data visualization for business presentations. Sign up here.

March 26, 2018 (St. Louis) - Workshop on data visualization: simple things you can do to make even Excel charts better! Sign up here (scroll to the bottom of the page).

This last event, part of the Midwest Digital Marketing Conference, has a small fee.


February talks, and exploratory data analysis using visuals

News:

In February, I am bringing my dataviz lecture to various cities: Atlanta (Feb 7), Austin (Feb 15), and Copenhagen (Feb 28). Click on the links for free registration.

I hope to meet some of you there.

***

On the sister blog about predictive models and Big Data, I have been discussing aspects of a dataset containing IMDB movie data. Here are previous posts (1, 2, 3).

The latest instalment contains the following chart:

Redo_scorebytitleyear_ans

The general idea is that the average rating of the average film on IMDB has declined from about 7.5 to 6.5... but this does not mean that IMDB users like oldies more than recent movies. The problem is a bias in the IMDB user base. Since IMDB's website launched only in 1990, users are much more likely to be reviewing movies released after 1990 than before. Further, if users are reviewing oldies, they are likely reviewing oldies that they like and go back to, rather than the horrible movie they watched 15 years ago.

Modelers should be exploring and investigating their datasets before building their models. Same thing for anyone doing data visualization! You need to understand the origin of the data and its biases in order to tell the proper story.

Click here to read the full post.

 

 


Hello to St. Louis readers

Stlouismo

I'll be hosting a Data Visualization workshop at the Digital Media Marketing Conference in St. Louis, Missouri on Thursday. Here is the link to their website.

The workshop is organized around three themes: Appreciating, Conceptualizing, and Improving. There will be several hands-on exercises.

If you are a reader in St. Louis, and would like to meet up, email me.

***

Posting this week will be light because of various commitments. I may put something up later this week.

One of my students pointed me to this Medium article about a NYT chart. Well worth reading.

 


Numbersense, in Chinese and Japanese

This is a cross-post on my two blogs.

The new year brings news that my second book, Numbersense: How to Use Big Data to Your Advantage, has been translated into Chinese (simplified) and Japanese. Here are the book covers:

Chinese_edition_cover

In Chinese, the title reads: "Say No to Fake Big Data". Captures the sentiment of the book pretty well, I must say.

Numbersense_japanese_cover

I have no idea what the Japanese title means. Perhaps a reader can help me out here.

***

The Japanese version is available here or here.

The Chinese version is here.

The English version is here.


The snow made me do it - California, here I come

Sunnysandiego_aforestfrolic

California readers: here's a chance to come meet me. I am giving talks in San Diego (Feb 3) and San Mateo (Feb 5) next week, courtesy of JMP. Free registration is here.

These talks are related to two ongoing projects of mine. The first project is to create a theory of data visualization criticism: how can we use precise language to describe our reactions - good and bad - to data visualization work? The second project concerns how to find stories in a mass of data.

 

I'd love to meet some of you on the West Coast who are fans of the blog. Please also forward this announcement to your friends or colleagues who might be interested.