On social media, one comes across much casual commentary on outputs of AI tools like ChatGPT of the type “Look, it wrote this beautiful poem” or “Look, it generated this amazing summary of the document”. Recently, I took a forensic look at examples of text summarization in order to provide a balanced view of the strengths of such tools and where they could improve.
The blog post is broken up into four parts, all released today so you can binge if you want.
Part 1: Exploring the length of summary, and selection of contents
Part 2: Exploring selection of contents, grammar, and word choices
Part 3: Exploring word choices, and hallucinations
Part 4: Exploring hallucinations
This post is Part 2 of the series.
***
In the previous post, I classified ideas from the original Wall Street Journal article into three groups: Essential ideas, Primary ideas and Secondary ideas. Primary ideas are those considered important by a specific platform, showing up in both runs, but not essential, meaning that at least one platform excluded them in both runs.
Because Mistral uses fewer words, its summaries must have ignored some of the primary ideas retained by Claude and ChatGPT. What are these?
- The entertainment giant said on Wednesday in its earnings report that it is seeking $7.5 billion in annualized cost efficiencies, up from the $5.5 billion it targeted at the beginning of this year.
- Disney […] generated sales of $21.2 billion for its fiscal fourth quarter
- The company’s main streaming service, Disney+, added 6.9 million “core” subscribers…
- Overall, Disney+ ended the quarter with 150.2 million global subscribers,
- ESPN’s operating income for fiscal 2023 fell
- Excluding ESPN, Disney’s traditional TV networks saw revenue fall
Mistral's omissions relative to the other platforms fall into three types: overall performance indicators (revenues, operating profit, etc.), specific business divisions (ESPN, TV networks), and further details on key business divisions (Disney+ subscriber numbers). Of these, Mistral treated overall performance indicators as "secondary" while completely omitting news about ESPN, TV networks, and Disney+ subscriber numbers.
Ponder the first bullet for a second. Five of the six summaries did not mention “earnings report” or “Wednesday,” that is to say, they excluded the context of the article. The one summary that did came from ChatGPT, in which it said, in its last sentence, “Disney’s stock rose after hours following the earnings report, even as it deals with possible new activist campaigns for board influence.” I’m giving ChatGPT a merit, even though I suspect the inclusion of “earnings report” is more accidental than intentional.
Score one for Mistral for being the only platform that considered narrowing streaming losses as a primary idea. Both Claude and ChatGPT dropped what was described in the opening line as one of the “two areas … crucial to the company’s future”. While all summaries addressed the reduction of losses and the projected breakeven in the streaming business at some point, only Mistral elevated the streaming business challenges as a key area of concern.
GRAMMAR
One of the most exciting areas of progress in text processing is the computer’s ability to write grammatical sentences. The six summaries were largely free of grammatical mistakes. Here are several minor mistakes:
ChatGPT #1 wrote:
“Disney+ lost subscribers in India and is considering selling its unit there after failing to secure cricket streaming rights.”
In the original paragraph, it’s clear that Disney is the entity considering a sale of the service in India, while Disney+ is the entity that lost subscribers.
In compound sentences, ChatGPT appears to have trouble keeping track of the subjects. Another example is:
“The company is also seeking strategic partners for ESPN as it transitions to a direct-to-consumer platform.”
What is the “it”? In reality, the company is transitioning ESPN to something. (Besides, the article never associated the strategic partner search with the direct-to-consumer transition.)
ChatGPT #2 also went off the rails when talking about the India service. It included:
“Despite losing subscribers in India due to lost cricket streaming rights, Disney is considering selling its India unit.”
Instead of “Despite”, it should have read “After” or “As a result of subscriber attrition in India due to …”
An inference mistake was also found in one of the Mistral summaries, which included:
“The company's Experiences segment, which includes theme parks, cruise ships, and merchandise licensing, also saw strong growth.”
Note the erroneous insertion of “also”. The previous two sentences were about the losses in the subscription business so “also” sounds out of place. Either omit it or use something like “however”.
WORD CHOICES
Another dimension in which the platforms differ is the degree of paraphrasing (vs direct citation). Mistral summaries tend to lift entire phrases out of the original article, with few changes to wording and phrasing. At the other extreme, ChatGPT likes to rewrite as much as it can.
I favor quotation as word changes often create subtle (and sometimes unwanted) shifts in meaning. However, there are real concerns about plagiarism and copyright that must be addressed with this type of writing aid.
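The degree of paraphrasing described above can be put on a rough numeric scale. The following is a minimal sketch, not the method used for this series: it assumes that the share of a summary's three-word sequences that also appear in the source is a fair proxy for verbatim lifting. The function names `word_ngrams` and `copy_rate` are hypothetical, and the two short "summaries" are made up for illustration; they are not actual platform output.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copy_rate(source, summary, n=3):
    """Share of the summary's n-grams that also occur in the source.

    Values near 1 suggest phrases lifted from the source;
    values near 0 suggest heavy paraphrasing.
    """
    summary_grams = word_ngrams(summary, n)
    if not summary_grams:
        return 0.0
    shared = summary_grams & word_ngrams(source, n)
    return len(shared) / len(summary_grams)


# Fragment of the article sentence quoted in this post.
source = ("reiterated that it believes streaming will break even "
          "by September of next year")

# Hypothetical summaries, written only to illustrate the two extremes.
lifted = "Disney believes streaming will break even by September of next year"
reworded = "Disney expects its streaming unit to turn a profit by late 2024"

print("lifted:  ", round(copy_rate(source, lifted), 2))
print("reworded:", round(copy_rate(source, reworded), 2))
```

A score like this only captures surface overlap. It says nothing about whether a rewording preserved the meaning, which is the risk the next example illustrates.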
Let me just give a colorful example of word changes to illustrate the risks.
Recall one of the essential ideas of the article:
“The company […] reiterated that it believes streaming will break even by September of next year.”
The Mistral summaries pretty much print this line verbatim. Both of Claude’s summaries proactively replaced “next year” with “2023”. But that is the wrong inference! Next year is 2024.
ChatGPT gives different results in each summary: in the first run, it retains “next year” like Mistral; in the second run, it replaces “next year” with an explicit year, as Claude does. Unlike Claude, however, ChatGPT got the year right.
Score one for ChatGPT?
Not so fast, as I'll explain in the next post of the series.