On social media, one comes across plenty of casual commentary on the outputs of AI tools like ChatGPT, of the type “Look, it wrote this beautiful poem” or “Look, it generated this amazing summary of the document”. Recently, I took a forensic look at some examples of text summarization in order to provide a balanced view of both the strengths of such tools and the directions in which they could improve.
The blog post is broken up into four parts, all of which are released today so you can binge if you want.
Part 1: Exploring the length of summary, and selection of contents
Part 2: Exploring selection of contents, grammar, and word choices
Part 3: Exploring word choices, and hallucinations
Part 4: Exploring hallucinations
This post is Part 3 of the series.
***
WORD CHOICES
At the end of the previous post, I noticed that the three LLM platforms treated the phrase "next year" differently. In some summaries, the phrase was directly quoted, while in others, the LLM inferred which year "next year" refers to. But the inference was sometimes wrong.
The one summary that correctly substituted "next year" with 2024 came from ChatGPT. Score one for ChatGPT?
It’s hard to say. Here’s why. I didn’t include the date of the article in the prompt, so it’s not clear how any LLM could figure out what year “next year” is. I dug into the article and found literally just one clue in its 1,416 words: a sentence that reads “ESPN’s operating income for fiscal 2023 fell 1.7% to $2.8 billion”.
Strictly speaking, this little clue is insufficient to conclude that “next year” is 2024. Because it’s “fiscal 2023”, not calendar 2023, the fiscal year could conceivably run from December 2023 to December 2024, in which case an article reporting its full-year results would have been written in late 2024 or afterwards. In other words, “fiscal 2023” does not rule out “next year” being 2025.
Further, we’ve subtly assumed that the article is being read in the year 2023. Imagine someone reading this same article in 2050. The only foolproof way of determining what “next year” means is to know the date the article was written, and that date is not found in the text itself.
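To make the point concrete, here is a minimal Python sketch of the resolution logic; the function name and the publication date are hypothetical, not taken from any of the tools I tested:

```python
from datetime import date
from typing import Optional

def resolve_next_year(reference_date: Optional[date]) -> str:
    """Resolve the phrase "next year" relative to a known reference date.

    Without a reference date (e.g., the article's publication date),
    the phrase cannot be resolved, so we refuse to guess.
    """
    if reference_date is None:
        return "next year"  # no safe inference: keep the original words
    return str(reference_date.year + 1)

# With a (hypothetical) publication date, the phrase resolves cleanly:
print(resolve_next_year(date(2023, 11, 9)))  # -> "2024"
# Without one, the only safe output is the original phrase:
print(resolve_next_year(None))               # -> "next year"
```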
This is an example of why sometimes it’s better to leave the original words alone.
***
Another word change found in most of these summaries is turning “annualized savings” into “annual savings”. As any financial analyst can tell you, they are not the same thing; if they were, we wouldn’t need two different terms!
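For a concrete illustration with made-up numbers: suppose a company’s cost cuts take effect in October. “Annualized savings” extrapolate the run rate to a full year, while the savings actually realized in the calendar year are far smaller:

```python
# Made-up numbers for illustration only.
monthly_savings = 10_000_000      # cost cuts begin in October
months_in_effect_this_year = 3    # October through December

# "Annualized savings": the full-year run rate implied by the cuts.
annualized_savings = monthly_savings * 12

# "Annual savings" actually realized this calendar year.
annual_savings_this_year = monthly_savings * months_in_effect_this_year

print(f"Annualized:         ${annualized_savings:,}")        # $120,000,000
print(f"Annual (this year): ${annual_savings_this_year:,}")  # $30,000,000
```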
Ditto with swapping “operating income” with “profit”.
One of the summaries substituted “board seats” with “board influence” when discussing what the activist investor was seeking. There are many ways to influence the board of directors, only one of which involves pushing for board seats.
Claude got creative when dealing with the reasons for better results in the Experiences segment. In one of its summaries, it used “travel demand” in place of the original’s “demand for in-person entertainment experiences”. In the other summary, it stated that “demand rebounds for in person entertainment”, while the original sentence said “Disney… invested heavily … in the hopes of capitalizing on rising demand for in-person…” One can call this a mild form of “hallucination”.
***

HALLUCINATIONS
In the LLM community, the term “hallucination” refers to details that the LLM tool invents, details that are not supported by the source text. In the last section, I pointed to a case in which the article said Disney expected rising demand for in-person entertainment experiences, which is not the same thing as observing a rebound in demand. It’s a subtle issue of expectation vs. observation, but an extremely important one to any reader who cares about the accuracy of the summarization task.
The good news is that, by and large, the problem of hallucination is not evident in my test. No numbers were invented, and the numbers that do appear seem to have survived paraphrasing intact.
For example, ChatGPT printed this:
Disney has made strides in its streaming sector, reducing losses from $1.47 billion to $387 million in the recent quarter,
instead of the original text:
The business, which also includes Hulu and ESPN+, lost $387 million in the most recent quarter, down from $1.47 billion a year earlier.
That's nice.
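If one wanted to check this mechanically, a rough approach is to extract every number-like token from the summary and verify that it also appears in the source. The regex and helper names below are my own sketch, and a real checker would need to normalize forms like “1.47 billion” vs “1,470,000,000”:

```python
import re

NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?(?:\s*(?:billion|million))?")

def numbers_in(text: str) -> set[str]:
    """Extract number-like tokens, lightly normalized."""
    return {m.group().replace(",", "").strip() for m in NUMBER.finditer(text)}

def unsupported_numbers(source: str, summary: str) -> set[str]:
    """Numbers in the summary that never appear in the source --
    candidates for invented (hallucinated) figures."""
    return numbers_in(summary) - numbers_in(source)

source = ("The business, which also includes Hulu and ESPN+, lost $387 million "
          "in the most recent quarter, down from $1.47 billion a year earlier.")
summary = ("Disney has made strides in its streaming sector, reducing losses "
           "from $1.47 billion to $387 million in the recent quarter.")

print(unsupported_numbers(source, summary))  # set() -- every figure checks out
```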
With respect to hallucinations, I found a few sentences to explore further. I'll cover these in the next post of the series.