Happy Lunar New Year! And greetings to Orlando people who are coming to my dataviz seminar this morning.
***
What’s going on with digit recognition, one of the signature applications of machine learning?
Before self-driving cars, before image recognition, before machine translation, there was digit recognition: computers are trained to read and recognize hand-written numbers. This problem shares several of the key components of problems tailor-made for machine learning methods:
- The correct answer is unambiguous for each item (i.e. image of a digit). The author of the digit has a particular number in mind.
- The range of possible answers across all items is finite. In a decimal system, each image can only be one of 0, 1, 2, ... , 9.
- The end-user only cares about how accurately the digit can be predicted. Causality is not of interest here.
- A massive dataset of labeled images (i.e. images that have been correctly recognized) for training computers is easily obtained.
- Live application generates more data, which feeds back into the system in a positively-reinforcing manner.
This digit-recognition technology is widely deployed. Check deposit at the ATM is one obvious example. In 2016, about 16 billion checks were deposited in the U.S. (source). So what’s wrong with the current state of the art?
This snapshot I took at an ATM illustrates the problem:
Recently, I noticed that the ATM has refused to recognize the digits on several checks, asking me to enter the amounts manually instead.
From this evidence, I infer the following:
- Even after all these years, the error rate is higher than these banks can absorb. Assuming 10 billion checks read each year at ATMs, even a 0.01% error rate amounts to 1 million errors per year, or about 2,700 errors per day!
- Banks would rather err on the side of caution – when in doubt, ask users to enter the amount. This behavior implies humans make fewer errors than machines, even after including mischief as a source of human error.
- What would a teller do if s/he can't make out the scripted digits? The human would look at the handwritten words "six thousand," solving the problem. Apparently, the ATM does not have handwriting recognition technology, or perhaps its accuracy rate is not high enough. It's a harder problem, though of a similar nature.
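For readers who want to check the arithmetic, here is a minimal sketch of the error-volume calculation, plus the "when in doubt, ask the user" rule expressed as a confidence cutoff. The function names and the 0.999 threshold are my own illustrative assumptions; I have no knowledge of how any bank's system actually works.

```python
def expected_errors(checks_per_year, error_rate):
    """Return (errors per year, errors per day) for a per-check error rate."""
    per_year = checks_per_year * error_rate
    return per_year, per_year / 365


def route_check(confidence, threshold=0.999):
    """Hypothetical fallback rule: accept the machine's reading only when its
    confidence clears the threshold; otherwise ask the user to type the amount.
    The 0.999 default is an assumed value for illustration only."""
    return "auto_accept" if confidence >= threshold else "ask_user"


# 10 billion checks a year at a 0.01% error rate
per_year, per_day = expected_errors(10e9, 0.0001)
print(round(per_year), round(per_day))  # prints: 1000000 2740

print(route_check(0.9999))  # prints: auto_accept
print(route_check(0.62))    # prints: ask_user
```

Raising the threshold trades convenience for accuracy: more checks get sent back to the user, but fewer misread amounts slip through.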
***
Why are the banks risk averse? As a victim of one of these errors, I think I understand. Last year, I spent four to six weeks chasing after $20. In this case, the machine read the 2 as a 0. I didn't catch the mistake while at the ATM, but later noticed it on the bank statement.
I learned that convenience comes at a price. The bank's process to verify the amount and correct the mistake is convoluted. It's like missing that exit on the highway, and you now have to go five miles before the next exit. It's a pain for the bank as well as for the client.
One reason why cheques are being less used. The banks are continuing to increase the ease of doing electronic transactions which will eventually mean the end of cheques and most cash use.
Posted by: Ken | 02/10/2019 at 02:55 AM
Just found your blog and site...
Great and informative reading; thank you. And Happy Lunar New Year!
Posted by: Ian Holder | 02/11/2019 at 07:00 PM