Teaching Statistics With Baseball

A baseball on the dirt near home plate
(Image credit: Pixabay)

Teaching statistics with baseball can be a home run.

For students who already like baseball, the sport offers a lot of learning opportunities, says Benjamin S. Baumer, an associate professor in the Statistical & Data Sciences program at Smith College in Northampton, Massachusetts. Baumer has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets.

“For baseball specifically, but other sports as well, there are a lot of ways in which the game is a natural experiment,” Baumer says. “You get lots and lots of trials with very good data.” 

This data can help students learn how to use statistics to analyze events and make predictions, and baseball fans might be more interested in looking at numbers this way. However, Baumer cautions that not every student has an inherent interest in the game, so you don’t want to force it on all students and strike out with your lesson. 

“Public health is something that you may or may not be interested in, but you probably have some context for understanding it because that affects all of us, but baseball is not like that,” he says. So teachers should be inclusive in their lesson plan and not assume students will be interested in the sport or will be knowledgeable enough about it to help them understand statistics. “If you have that domain knowledge, it's gonna help because it helps to contextualize the data,” he says. However, for students who do not watch baseball, using the sport as an example can make things more difficult to understand.

For example, batting average is one of baseball’s best-known statistics, and it can be read as an estimate of probability. If a player gets three hits in ten at-bats, that’s a batting average of .300, which suggests the player is likely to get a hit in about 30 percent of their at-bats.
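For readers who want to see the arithmetic spelled out, here is a minimal Python sketch of the batting average calculation. The function names are made up for illustration, and treating the average as a fixed probability for future at-bats is a simplifying assumption rather than a claim about how real hitters perform.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return hits divided by at-bats, or 0.0 when there are no at-bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


def format_average(avg: float) -> str:
    """Format in the conventional baseball style, e.g. 0.3 becomes '.300'."""
    text = f"{avg:.3f}"
    # Baseball convention drops the leading zero for averages below 1.
    return text[1:] if text.startswith("0") else text


# The three-hits-in-ten-at-bats case from the text.
avg = batting_average(hits=3, at_bats=10)
print(format_average(avg))  # .300

# Under the simplifying assumption that the average acts as a fixed
# probability, the expected number of hits in 50 more at-bats:
expected_hits = avg * 50
print(round(expected_hits))  # 15
```

Students can change the inputs to see how a few extra hits move the average much more over ten at-bats than over several hundred, which is one way to open a discussion about sample size.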

With this in mind, Baumer says there are many ways in which baseball can be effectively incorporated into statistics lessons for all students. 

Lahman’s Baseball Database 

This is an open-source database of batting and pitching statistics from 1871 to 2020, and as such, provides students with a wealth of data to explore. “I think the database is just really interesting for helping students understand what relational data is and how it works, independent of baseball,” Baumer says. “In this context, it's okay if you don't know anything about baseball, because you can still understand how to join these two tables together.” 
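To make the idea of joining two tables concrete, here is a small sketch using Python's built-in sqlite3 module. The table names, column names, and rows below are invented for illustration and are only loosely modeled on a player table and a batting table; they are not taken from the Lahman database itself.

```python
import sqlite3

# In-memory database holding two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (player_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE batting (
        player_id TEXT, year INTEGER, at_bats INTEGER, hits INTEGER
    );
""")

# Illustrative rows only; these are not real players or real seasons.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("p01", "Player One"), ("p02", "Player Two")],
)
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [
        ("p01", 2019, 500, 150),
        ("p01", 2020, 200, 50),
        ("p02", 2020, 180, 45),
    ],
)

# Join the tables on the shared key, then total each player's seasons.
rows = conn.execute("""
    SELECT p.name, SUM(b.hits) AS hits, SUM(b.at_bats) AS at_bats
    FROM people AS p
    JOIN batting AS b ON b.player_id = p.player_id
    GROUP BY p.player_id, p.name
    ORDER BY p.name
""").fetchall()
conn.close()

for name, hits, at_bats in rows:
    print(name, hits, at_bats, round(hits / at_bats, 3))
```

The point of the exercise is the shared key: names live in one table, season totals live in another, and the join lines them up through `player_id`. A student can follow that logic without knowing anything about the sport.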

Analyzing Baseball Data with R 

Baumer co-authored the second edition of this textbook and says baseball can be a great tool for teaching the programming language R. While this might not work for every student, teaching R in terms of baseball can be effective for sports fans.

Teaching Statistics Using Baseball First Edition 

This book by Jim Albert explores how to build statistics lessons around baseball, and Baumer says it remains a great resource for educators looking to do that. 

Baseball Shows How New Data Gives Us New Conclusions 

Baseball can also provide concrete examples of how knowledge improves over time based on new and better data. “There are a couple of examples where we believed this to be the case. And then we studied it, and we thought that it wasn't the case, and then we got better data, and we studied it again, and it turned out it was the case after all,” Baumer says. 

For instance, the thinking on catchers’ ability to frame pitches has come full circle. Scouts, coaches, and players long believed that the way a catcher held his glove after a pitch could increase the likelihood that it would be called a strike, but beginning in the 1970s and for several decades afterward, statisticians questioned that belief. “In the ’70s, ’80s, ’90s, the data that we had about pitch location simply wasn't granular enough to have any effect show up,” Baumer says. “Many people studied catcher defense, and most of them concluded that there wasn't that much to it in terms of pitch framing. Then we got better data, and it turned out there was a lot to it.”

This type of reassessment occurs in health and many other fields. Baseball can provide an easy-to-follow example of how and why this happens and can get kids asking important questions about the limitations of data and better experiment design. 

Correction, 4/8/22: The original version of this story incorrectly stated that Baumer advised using baseball to teach about the correlation coefficient, which is often denoted with the letter R. However, Baumer was talking about using baseball to teach the programming language R.

Erik Ofgang

Erik Ofgang is a Tech & Learning contributor. A journalist, author and educator, his work has appeared in The New York Times, the Washington Post, the Smithsonian, The Atlantic, and Associated Press. He currently teaches at Western Connecticut State University’s MFA program. While a staff writer at Connecticut Magazine he won a Society of Professional Journalism Award for his education reporting. He is interested in how humans learn and how technology can make that more effective.