All data scientists are taught that they must use validation datasets to evaluate their models. This advice often doesn't mean much until one of our models fails in real life. It's one of those lessons that doesn't come across well in text. So I'm glad to present a case study, inspired by some recent analysis I did of Covid-19 data from the epicenter of Italy's epidemic.
We are about to get inundated with mathematical models projecting cases and deaths in the U.S., so it's a good time to learn about a key aspect of building good models. A good model is one that explains the past well and also predicts the future reasonably well.
A model that explains the past but fails to predict the future is not useful. What I explain in this new video is what people who work in finance call "back-testing."
The key graphic is this:
Notice the divergence between the two lines after March 9. That corresponds to a split in the timeline, shown as yellow and green blocks. I explain the huge importance of these two segments in the video.
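To make the split concrete, here is a minimal sketch in Python of what back-testing looks like in code. The case counts are made up, the March 9 cutoff just mirrors the split described above, and the log-linear growth model is one simple choice among many; the point is that the model is fit only on the earlier (yellow) block and judged on the later (green) block.

```python
# A minimal back-testing sketch with synthetic data (not the Italy dataset).
import numpy as np

# Hypothetical daily case counts for days 1..20: growth that slows
# near the end, so a curve fit to the early days will diverge later.
days = np.arange(1, 21)
cases = np.array([  2,   3,   5,   8,  13,  21,  33,  52,  80, 120,
                  170, 230, 300, 380, 460, 540, 610, 670, 720, 760])

# The cutoff plays the role of March 9: train on the yellow block
# (days 1..10), validate on the green block (days 11..20).
cutoff = 10
train_days, train_cases = days[:cutoff], cases[:cutoff]
valid_days, valid_cases = days[cutoff:], cases[cutoff:]

# Fit a simple exponential-growth model, log(cases) = a + b * day,
# using ONLY the training window.
b, a = np.polyfit(train_days, np.log(train_cases), 1)

# Project the fitted curve over the validation window and compare
# against the held-out actuals -- this gap is the divergence in the chart.
predicted = np.exp(a + b * valid_days)
for d, actual, pred in zip(valid_days, valid_cases, predicted):
    print(f"day {d:2d}: actual {actual:4d}, predicted {pred:8.0f}")
```

A model that looks perfect on the yellow block can still blow up on the green block, which is exactly why the validation period must be held out of the fitting step.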
Click here to go directly to the video.
***
You can access other videos in my Data Science: The Missing Pieces playlist. For this series, I select the little things that data mining / statistics / machine learning textbooks fail to explain adequately, areas that many students have trouble fully understanding.
In video #6, I explain the idea behind standardizing your data.
In video #3, I discuss the concept of "not statistically significant".
In video #1, I talk about functions as a computer programming concept.
***
Please subscribe to my YouTube channel to get notified of future videos. Feedback is appreciated; as you can see, it takes a lot more effort to make one video than to write one blog post. I'm happy to take requests for topics you'd like me to cover in future videos as well. Just leave a comment below.