Just posted a short video that explains one of the techniques used to work with observational data (or "found" data). This type of data is extremely common in the Big Data world. The data are collected by some operational process, and, in the memorable words of the late Hans Rosling, they are a bag of numerators without denominators. In this case, the database covers car crash fatalities: it contains only crashes that resulted in deaths, with no information about crashes without fatalities, let alone about safe driving.
As in most scientific studies, the original researchers made a claim of statistical significance, i.e. they found something out of the normal. (There was an excess of fatalities on April 20.) However, a second research group took a different look at the data and showed that what seemed unusual was in fact more common than first thought.
How do statisticians measure how common something is? One takeaway is how to define the reference (control) group. Another is replication: repeating the same style of analysis over different slices of the data, as in the sketch below.
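Here is a minimal sketch of those two ideas, not the researchers' actual method. It assumes a hypothetical file `fatal_crashes.csv` with one row per fatal crash and a `date` column; the choice of reference days (the same weekday one week before and after) and the comparison dates are illustrative assumptions.

```python
import pandas as pd

# Hypothetical input: one row per fatal crash, with a 'date' column.
crashes = pd.read_csv("fatal_crashes.csv", parse_dates=["date"])
daily = crashes.groupby(crashes["date"].dt.normalize()).size()

def excess_vs_reference(daily_counts, month, day, window_weeks=1):
    """Compare fatalities on a target calendar day to a reference group:
    the same weekday one week before and one week after, in every year.
    Returns a list of (year, target_count, mean_reference_count)."""
    results = []
    years = sorted({d.year for d in daily_counts.index})
    for year in years:
        target = pd.Timestamp(year=year, month=month, day=day)
        refs = [target - pd.Timedelta(weeks=w) for w in range(1, window_weeks + 1)]
        refs += [target + pd.Timedelta(weeks=w) for w in range(1, window_weeks + 1)]
        t = daily_counts.get(target, 0)
        r = [daily_counts.get(d, 0) for d in refs]
        results.append((year, t, sum(r) / len(r)))
    return results

# Replication: run the same comparison for April 20 and for other dates,
# to see how often an "excess" of this size shows up by chance.
for month, day in [(4, 20), (4, 13), (4, 27), (7, 4)]:
    rows = excess_vs_reference(daily, month, day)
    excess = sum(t - r for _, t, r in rows)
    print(f"{month:02d}-{day:02d}: total excess over reference = {excess:.1f}")
```

The point of looping over several dates is the replication idea: if many ordinary days show "excesses" of a similar size, the April 20 result is less remarkable than it first appears.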
Click here to see the video. Don't forget to subscribe to the channel to see future videos.
For a long-form discussion of what is covered in the video, see this blog post.