ncaa26
A predictive modeling pipeline for NCAA Division I basketball tournament outcomes that derives custom team ratings directly from raw box scores and tournament data, without relying on external rating systems. Built with XGBoost and calibrated probability models, it finished in the top 20% of the Kaggle March Machine Learning Mania competition using engineered features such as Elo ratings, Simple Rating System (SRS) scores, efficiency metrics, and momentum indicators. The project demonstrates spec-driven development practices, with automated feature engineering, cross-validation, and model orchestration written entirely in Python.
NCAA26 is a predictive modeling pipeline for NCAA basketball tournament outcomes that builds custom team ratings from raw box scores without external rating systems and finished in the top 20% on Kaggle. The project demonstrates spec-driven development with Claude Code, with engineered metrics such as Elo ratings, SRS scores, and momentum indicators orchestrated through automated Python pipelines.
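As a rough illustration of the kind of Elo rating feature mentioned above, the standard Elo update can be sketched as follows. This is a minimal sketch, not the project's code: the function names, the K-factor of 20, and the 400-point scale are conventional defaults assumed here for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Win probability for team A under the standard Elo logistic curve."""
    # The 400-point scale is the conventional Elo constant (an assumption here).
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return the updated (rating_a, rating_b) after one game.

    k is a hypothetical K-factor; the project's actual value is not stated.
    """
    exp_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Replaying a season's games in chronological order through `update_elo` yields a pre-game rating for each team, and the rating difference between two opponents can then serve as a model feature.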
- ✓ Beats KenPom ceiling on men's data (CV Brier 0.1920 vs 0.1927) using only raw box scores without external rating dependencies
- ✓ Exemplary spec-driven development workflow with numbered pipeline scripts, scenario tests, and clear separation of concerns across ingestion/features/models/competition modules
- ✓ Comprehensive feature engineering including haversine distance calculations for tournament travel, momentum indicators, and sophisticated SRS implementation via stable linear solve
- → Add unit tests and integration tests beyond the scenario tests to ensure code reliability and make refactoring safer
- → Implement a configuration management system to replace the hardcoded parameters scattered across modules and make experimentation easier