The New York Times covers several software companies whose business is to analyze databases of documents and emails in support of litigation. Large corporations are often sued, and they have to submit troves of documents and emails to lawyers who charge hourly rates to inspect and summarize the information. Bills can run into the millions in large lawsuits. These technology upstarts claim to be able to replace humans with computers.
I recommend reading this article... with a critical eye. Don't expect help from the journalist (John Markoff), who seems blind to, or incapable of evaluating, the limitations of the technology. The tone and content of this article are typical of anything in the technology pages of our news media: one senses awe, and unbounded optimism, as if the articles were press releases issued by the companies making the technologies.
Here are a few points to ponder:
- Computers apparently don't make any mistakes. People can get tired and overlook things, we are told. But when a computer tells us that the "sentiment" of someone's email is "positive", it never makes a mistake. The truth is that all statistical models are imperfect, and their mistakes range from tolerably few to numerous. Without proper studies of the false positive and false negative rates of these computer algorithms, there is no way for readers to judge whether these technologies represent progress or not.
- Errors are particularly frequent in analyzing "unstructured" data such as text. Time of day is a "structured" type of data while the sentiment expressed in an email is "unstructured". If the computer tells you that the CEO never sends emails after 9 pm, there is little reason to doubt the accuracy of that computation. If the computer tells you that emails about the CEO became increasingly "negative", how much confidence can one have in the accuracy of this statement? What is a negative email? If the same email contains both positive and negative comments, how is it determined whether it is positive overall? How does the computer recognize sarcasm or irony? A simple comment like "good job" can be positive; it can also be negative (if the job is accounting tricks); it can be ironic; it can be completely irrelevant.
- Much noise is made about technology displacing workers, and how this might affect the economy by producing lower-skill jobs. Two things are curiously absent from the article. First, the new technology generates new jobs in new companies, and many of these are high-skill jobs like software development, product management, algorithm design, and so on. Second, the computers can surface information, but this software does not replace the need for human beings (analysts) to look at the output and interpret the information. This key point is often lost: reports and dashboards are useless until someone looks at them. Business analyst positions are also high-skill jobs.
- Finally, I have direct experience with this sort of legal discovery process. A large share of the billings involves people manually scanning all printed documents collected from your office, including every copy of identical documents you might possess. If you distribute copies of a presentation to 20 people in a meeting, and these 20 people are all considered people of interest, all 20 copies will be scanned eventually. These billings, I suspect, won't be replaceable by computers.
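To make the first point above concrete, here is a minimal sketch of the kind of study I have in mind: take a sample of documents that human reviewers have already labeled, run the software on the same sample, and count how often the two disagree. Everything in it is hypothetical, including the function name `error_rates`, the "relevant"/"irrelevant" labels, and the eight made-up documents. It also assumes the human labels are the ground truth, which is itself debatable.

```python
from collections import Counter


def error_rates(human_labels, model_labels, positive="relevant"):
    """Return (false_positive_rate, false_negative_rate).

    The human labels are treated as the truth (an assumption) and the
    model labels as the predictions being evaluated.
    """
    if len(human_labels) != len(model_labels):
        raise ValueError("both label lists must have the same length")

    counts = Counter()
    for truth, predicted in zip(human_labels, model_labels):
        truly_positive = truth == positive
        flagged = predicted == positive
        if truly_positive and flagged:
            counts["tp"] += 1   # correctly flagged
        elif truly_positive:
            counts["fn"] += 1   # missed by the software
        elif flagged:
            counts["fp"] += 1   # flagged in error
        else:
            counts["tn"] += 1   # correctly ignored

    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    fpr = counts["fp"] / negatives if negatives else float("nan")
    fnr = counts["fn"] / positives if positives else float("nan")
    return fpr, fnr


# Made-up labels for eight documents, for illustration only.
human = ["relevant", "relevant", "irrelevant", "irrelevant",
         "relevant", "irrelevant", "irrelevant", "relevant"]
model = ["relevant", "irrelevant", "irrelevant", "relevant",
         "relevant", "irrelevant", "irrelevant", "irrelevant"]

fpr, fnr = error_rates(human, model)
print(f"false positive rate: {fpr:.2f}")
print(f"false negative rate: {fnr:.2f}")
```

In this toy sample the software wrongly flags one of four irrelevant documents and misses two of four relevant ones. Numbers like these, measured on real cases, are what the article would need to report before anyone could call the technology progress.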
I am not saying these companies are hawking vaporware. For some tasks, computers clearly can do a much better job than humans. I just think that our technology reporters can serve us better by covering both the promise and the limitations of new tools. They can start by interviewing users of such software, both satisfied users and unhappy users.
Actually, de-duplication software has gone a long way to eliminate the issue you mention with reviewing 20 copies of the same document. It's easy to group them together (even taking OCR errors into account if they were paper originally) and review them in much less than 20 times the time for one copy.
The Markoff article got quite a bit wrong, not least on the "computers don't make mistakes" front. But the basic message that artificial intelligence technology has made massive inroads into document review, and is displacing some entry level legal jobs, is certainly true.
Posted by: Dave Lewis | 03/06/2011 at 11:33 PM
Dave: your point is relevant but different from mine. I'm talking about the manual labor of xeroxing and/or scanning those 20 copies taken from 20 offices, multiplied by the number of large-group meetings held at said company through many years.
The other howler I didn't even bother to mention was Markoff's claim (or more likely, his parroting someone else's claim) that the IBM computer mimics how humans play Jeopardy.
Posted by: Kaiser | 03/07/2011 at 12:11 PM
Physical copies? How quaint. Most stuff will exist in electronic form and the de-duping software will work pretty well.
Forgetting the overblown claims -- if this can just filter down the mountain of documents to the molehill of stuff that's interesting, a huge amount of hours go away. This is all good.
Posted by: zbicyclist | 03/07/2011 at 05:25 PM
zbicyclist: They were hoping to find hand-written notes. And you're making a molehill out of a mountain of a job to locate "interesting"-ness. I wish them good luck.
Posted by: Kaiser | 03/08/2011 at 12:59 AM
I work for a Big 4 accounting firm and have, in the past, worked on litigation support, building ad hoc databases and simple applications to create virtual case files. It frightens me if the article is accurate regarding how "hands off" the lawyers/paralegals/support have become. Do computers have a place? Of course. Firstly, I think that using computers to prioritize rather than analyze is a very good thing - you might produce damning evidence much sooner, leading to a settlement and thereby lower legal fees. And secondly, software can be very helpful in identifying patterns that humans don't see - but this analysis should be in addition to human review, not instead of it. The comment by Tom Mitchell makes me chuckle. Ever since I was a university undergrad in the 1980s, we have been on the cusp of a 10-year explosion in the real-world applicability of AI.
Posted by: Robert Waters | 03/16/2011 at 05:31 AM