I had the pleasure of attending the final presentations of this year's graduates from Parsons's MS in Data Visualization program. You can see the projects here.
A few of the projects caught my eye.
A project called "Authentic Food in NYC" explores where to find "authentic" cuisine in New York restaurants. The project is notable for plowing through millions of Yelp reviews, and organizing the information within. Reviews mentioning "authentic" or "original" were extracted.
During the live presentation, the student clicked on Authentic Chinese, and the name that popped up was Nom Wah Tea Parlor, which serves dim sum in Chinatown that often has lines out the door.
Curiously, the ranking is created from raw counts of authentic reviews, which favors restaurants with more reviews, such as restaurants that have been operating for a longer time. It's unclear what rule is used to transfer authenticity from reviews to restaurants: does a single review mentioning "authentic" qualify a restaurant as "authentic", or some proportion of reviews?
Later, we see a visualization of the key words found inside "authentic" reviews for each cuisine. Below are words for Chinese and Italian cuisines:
These are word clouds with a twist. Instead of encoding the word counts in the font sizes, she places each word inside a bubble, and uses bubble sizes to indicate relative frequency.
Curiously, almost all the words displayed come from menu items. There isn't any subjective words to be found. Algorithms that extract keywords frequently fail in the sense that they surface the most obvious, uninteresting facts. Take the word cloud for Taiwanese restaurants as an example:
The overwhelming keyword found among reviews of Taiwanese restaurants is... "taiwanese". The next most important word is "taiwan". Among the remaining words, "886" is the name of a specific restaurant, "bento" is usually associated with Japanese cuisine, and everything else is a menu item.
Getting this right is time-consuming, and understandably not a requirement for a typical data visualization course.
The most interesting insight is found in this data table.
It appears that few reviewers care about authenticity when they go to French, Italian, and Japanese restaurants but the people who dine at various Asian restaurants, German restaurants, and Eastern European restaurants want "authentic" food. The student concludes: "since most Yelp reviewers are Americans, their pursuit of authenticity creates its own trap: Food authenticity becomes an americanized view of what non-American food is."
This hits home hard because I know what authentic dim sum is, and Nom Wah Tea Parlor it ain't. Let me check out what Yelpers are saying about Nom Wah:
- Everything was so authentic and delicious - and cheap!!!
- Your best bet is to go around the corner and find something more authentic.
- Their dumplings are amazing everything is very authentic and tasty!
- The food was delicious and so authentic, and the staff were helpful and efficient.
- Overall, this place has good authentic dim sum but it could be better.
- Not an authentic experience at all.
- this dim sum establishment is totally authentic
- The onions, bean sprouts and scallion did taste very authentic and appreciated that.
- I would skip this and try another spot less hyped and more authentic.
- I would have to take my parents here the next time I visit NYC because this is authentic dim sum.
These are the most recent ten reviews containing the word "authentic". Seven out of ten really do mean authentic, the other three are false friends. Text mining is tough business! The student removed "not authentic" which helps. As seen from above, "more authentic" may be negative, and there may be words between "not" and "authentic". Also, think "not inauthentic", "people say it's authentic, and it's not", etc.
One thing I learned from this project is that "authentic" may be a synonym for "I like it" when these diners enjoy the food at an ethnic restaurant. I'm most curious about what inauthentic onions, bean sprouts and scallion taste like.
I love the concept and execution of this project. Nice job!
Another project I like is about tourism in Venezuela. The back story is significant. Since a dictatorship took over the country, the government stopped reporting tourism statistics. It's known that tourism collapsed, and that it may be gradually coming back in recent years.
This student does not have access to ready-made datasets. But she imaginatively found data to pursue this story. Specifically, she mentioned grabbing flight schedules into the country from the outside.
The flow chart is a great way to explore this data:
A map gives a different perspective:
I'm glad to hear the student recite some of the limitations of the data. It's easy to look at these visuals and assume that the data are entirely reliable. They aren't. We don't know that what proportion of the people traveling on those flights are tourists, how full those planes are, or the nationalities of those on board. The fact that a flight originated from Panama does not mean that everyone on board is Panamanian.
The third project is interesting in its uniqueness. This student wants to highlight the effect of lead in paint on children's health. She used the weight of lead marbles to symbolize the impact of lead paint. She made a dress with two big pockets to hold these marbles.
It's not your standard visualization. One can quibble that dividing the marbles into two pockets doesn't serve a visualziation purpose, and so on. But at the end, it's a memorable performance.