Analyzing Test Match Bowlers: My Cricket Analytics Project

Posted July 15, 2025

In this project, I'm building a data-driven system to compare and rank Test match bowlers across eras. The goal is to go beyond raw career stats and use normalization, era adjustment, and clustering to identify the most complete bowlers in cricket history.

What I'm Doing

Collecting a comprehensive dataset of Test bowlers and their career stats.
Normalizing key metrics (average, strike rate, economy, wickets per match, 5-fors, 10-fors, career span).
Adjusting for era so bowlers from different periods can be compared fairly.
Defining an "ideal" bowler using the top 10% in each metric.
Scoring each bowler by their distance from this ideal using Euclidean distance.
Using k-means clustering to find natural groups of similar bowlers.
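
To make the steps above concrete, here is a minimal sketch of the normalize, score, and cluster stages. It is not the code from the repository. The bowler names and numbers are made up, and several choices are my assumptions for illustration: min-max scaling as the normalization, the "ideal" taken as the mean of the top 10% of values in each metric, and a small hand-rolled k-means so the example needs only the standard library. Era adjustment is left out to keep it short.

```python
import math
import random

# Hypothetical sample rows: (name, bowling average, strike rate, wickets per match).
# The numbers are invented for illustration, not real career figures.
BOWLERS = [
    ("Bowler A", 21.0, 48.0, 5.2),
    ("Bowler B", 24.5, 55.0, 4.6),
    ("Bowler C", 29.0, 62.0, 3.9),
    ("Bowler D", 33.5, 70.0, 3.1),
    ("Bowler E", 38.0, 80.0, 2.4),
]
# True where a lower raw value is better (average, strike rate).
LOWER_IS_BETTER = [True, True, False]


def normalize(rows, lower_is_better):
    """Min-max scale each metric to [0, 1], flipped so 1.0 is always best."""
    scaled_columns = []
    for values, flip in zip(zip(*rows), lower_is_better):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        scaled = [(v - lo) / span for v in values]
        scaled_columns.append([1.0 - s for s in scaled] if flip else scaled)
    return [list(row) for row in zip(*scaled_columns)]


def ideal_point(scaled_rows, top_fraction=0.10):
    """Assumed definition: mean of the top `top_fraction` values per metric."""
    n_top = max(1, math.ceil(len(scaled_rows) * top_fraction))
    ideal = []
    for values in zip(*scaled_rows):
        best = sorted(values, reverse=True)[:n_top]
        ideal.append(sum(best) / n_top)
    return ideal


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = [row[0] for row in BOWLERS]
scaled = normalize([row[1:] for row in BOWLERS], LOWER_IS_BETTER)
ideal = ideal_point(scaled)

# Smaller distance to the ideal means a better overall score.
ranking = sorted(
    zip(names, (euclidean(row, ideal) for row in scaled)),
    key=lambda pair: pair[1],
)
labels = kmeans(scaled, k=2)

for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

Flipping the "lower is better" metrics during scaling means every dimension points the same way, so a single Euclidean distance to the ideal is meaningful. With real data you would likely swap the hand-rolled k-means for a library implementation and choose the number of clusters more carefully.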

Why This Matters

Cricket stats are often skewed by era, match conditions, and longevity. By normalizing and clustering, I hope to provide a more objective, nuanced view of bowling greatness.

See the Code

The full analysis is in test/analyze_bowlers.py in the test repository. Check it out if you're interested in cricket analytics or want to try improving it.

View the code and data on GitHub →