This past week, the New York City subway lines were suffering from egregious delays. The worst was waiting almost half an hour for a 6 train at Canal Street. The ventilation inside the station was so poor that it felt like being steamed alive.
Previously, I have written about, and praised, the count-down clocks in the subways (link). NYC was late to the game but better late than never!
***
When I entered Canal Street, the display only showed the impending arrivals for the downtown trains. I was going uptown. The count-down clocks on the uptown platform displayed no information. After waiting about 20 minutes, all of a sudden, the clocks woke up, showing the next train was arriving in 5 minutes.
You could sense the excitement on the platform. Everyone woke up from the heat stupor. An announcement drifted through the station, adding a measure of assurance.
Three minutes passed. The clock now went from "2 min" to "Delay".
Eventually, a train approached. And it went right past us. This was a local train, which is supposed to stop at Canal Street, but it ran through without warning.
The clock acted as if nothing happened. A new entry showed up, indicating that the next train would arrive in 6 minutes.
And this time, the train did stop, to the relief of dozens of sweating passengers.
***
Imagine you are the analyst who pulled down the data to analyze subway wait times at Canal Street. The dataset contained all notices that showed up on the count-down clocks. This is an event log, so when there are no notices, there will be no entries.
For the half hour that I waited, the following entries would have been added to the log:
- Train arriving, 5 min
- Train arriving, 4 min
- Train arriving, 3 min
- Train arriving, 2 min
- Train arriving, Delay
- Train arriving, 6 min
- Train arriving, 5 min
- Train arriving, 4 min
- Train arriving, 3 min
- Train arriving, 2 min
- Train arriving, 1 min
- Train arriving, 0 min
The analyst is in danger of drawing erroneous "insights" such as:
- that the longest wait during this period was 6 minutes
- that there were two pickups during this period
- that the passengers who finally got on board the train waited a maximum of 6 minutes
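To make the danger concrete, here is a minimal Python sketch of what a naive pass over this log would produce. The list encoding (plain integers, with `None` standing in for the "Delay" notice) and the function names are my own illustrative assumptions, not the format of any real transit feed. The 20 minutes of silence comes from my own rough estimate on the platform; it never appears in the log, and neither does the unknown length of the delay.

```python
# Countdown notices from the list above, in order.
# None stands for the "Delay" notice (an assumed encoding, for illustration).
log = [5, 4, 3, 2, None, 6, 5, 4, 3, 2, 1, 0]


def naive_summary(entries):
    """Summarize the log the way an unwary analyst might."""
    minutes = [m for m in entries if m is not None]
    trains = 0
    previous = None
    for m in minutes:
        # A countdown that jumps upward looks like a new train.
        if previous is None or m > previous:
            trains += 1
        previous = m
    return {"longest_wait_min": max(minutes), "trains": trains}


def elapsed_lower_bound(entries, silent_minutes):
    """Rough lower bound on time spent on the platform.

    Adds the minutes that ticked off during each countdown run to the
    silent period that left no rows in the log. The duration of the
    "Delay" notice is unknown, so it is not counted at all.
    """
    total = silent_minutes
    run_start = run_last = None
    for m in entries:
        if m is None:
            continue
        if run_last is None or m > run_last:
            if run_start is not None:
                total += run_start - run_last
            run_start = m
        run_last = m
    if run_start is not None:
        total += run_start - run_last
    return total


print(naive_summary(log))
# About 20 silent minutes, per the narrative above (an estimate, not logged data).
print(elapsed_lower_bound(log, silent_minutes=20))
```

The naive summary reports a longest wait of 6 minutes and two trains, exactly the mistaken "insights" listed above. Once the silent period is added back in, even the conservative lower bound is close to half an hour.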
***
Further, I am worried that the operators are gaming the system. Why was there no display when I first showed up? It should have said next train arriving in 20 minutes. If I had seen that, I would have walked to the other platform and taken the other line. The count-down clocks are most useful in dealing with outlier situations like this and so the outlier data should never be suppressed.
In outlier situations, they are probably wary of committing to any particular time, even if it's long. The uncertainty might be too great.
In which case, maybe we can get them to report confidence intervals on the arrival times?
Posted by: Mike | 07/18/2016 at 08:17 AM
The simple answer to why they didn't display anything is that they had no data. If there is a problem, then they have to wait an indeterminate amount of time for it to clear. Then they realised that the first train through would be packed, so rather than running it as an all-stations service, they made it an express: it didn't stop, but that did get it out of the way.
Hopefully when they calculate delays they use the difference from when a train should have arrived until when the next arrived, but it wouldn't surprise me if trains just disappeared.
Posted by: Ken | 07/26/2016 at 02:39 AM
Always assume anyone compensated as a result of a metric is likely to try to game it.
Posted by: zbicyclist | 07/26/2016 at 11:48 AM