The Idea Sparked
So, I was watching some hockey, you know, Sharks versus Penguins. I thought, "Hey, why not try to predict the winner using some data?" Seemed like a fun way to kill time and maybe even learn something. I’m no sports analyst or anything, just a regular dude who likes messing around with data.
Data Gathering – The Grunt Work
First things first, I needed data. I scoured the internet for past game stats. Sites like ESPN and a bunch of hockey stats websites became my best friends. I was looking for things like:
- Goals scored
- Shots on goal
- Power play success rate
- Penalty minutes
- Face-off win percentage
I tried to grab as much historical data as I could get my hands on, going back a few seasons. Manually copy-pasting this stuff into a spreadsheet was a real pain, but you gotta do what you gotta do.
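If you go the spreadsheet route too, getting it into Python is the easy part. Something like this works, assuming you export it as a CSV (the filename and columns here are made up, adjust to your own data):

```python
import pandas as pd

# Load the spreadsheet export; "games.csv" is a hypothetical filename.
games = pd.read_csv("games.csv")

# Quick sanity check: peek at the first few rows and the column names.
print(games.head())
print(games.columns.tolist())
```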

Data Cleaning – The Necessary Evil
Once I had the data, it was a mess. Dates formatted all kinds of different ways, missing values, typos… you name it. I spent a good chunk of time cleaning it up, making sure everything was consistent and in the right format. This part is never fun, but crucial.
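To give you an idea, here's roughly the kind of pandas cleanup I mean. It's just a sketch, and the column names are placeholders for whatever your spreadsheet actually has:

```python
import pandas as pd

games = pd.read_csv("games.csv")

# Parse dates no matter how they were typed; anything unparseable becomes NaT.
games["date"] = pd.to_datetime(games["date"], errors="coerce")

# Strip stray whitespace and fix inconsistent team-name capitalization.
games["team"] = games["team"].str.strip().str.title()

# Fill missing numeric stats with 0 (or drop the row, depending on the stat).
stat_cols = ["goals_for", "goals_against", "shots", "pp_pct", "pim", "faceoff_pct"]
games[stat_cols] = games[stat_cols].fillna(0)

# Drop rows where the date couldn't be recovered at all.
games = games.dropna(subset=["date"])
```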
Feature Engineering – Making Things Interesting
Now for the fun part – feature engineering! I started creating new columns based on the existing data. For example:
- Goal differential (goals scored minus goals allowed)
- Win percentage over the last 10 games
- Head-to-head record between the two teams
I figured these might give the model a bit more to chew on than just raw stats.
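Here's a rough sketch of the first two, reusing the made-up column names from before and assuming one row per team per game (I'll spare you the head-to-head logic, which got messier):

```python
# Sort so the rolling window actually runs in chronological order.
games = games.sort_values("date")

# Goal differential: goals scored minus goals allowed.
games["goal_diff"] = games["goals_for"] - games["goals_against"]

# Rolling win percentage over the previous 10 games, per team.
# shift(1) keeps the current game out of its own feature (no leakage).
games["win_pct_last10"] = (
    games.groupby("team")["won"]
    .transform(lambda s: s.shift(1).rolling(10, min_periods=1).mean())
)
```

That `shift(1)` bit matters more than it looks: without it, the model gets to peek at the outcome of the game it's supposed to be predicting.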
Model Building – Time to Get Nerdy
I decided to go with a simple logistic regression model. I know, not super fancy, but it's easy to understand and implement. I used Python with scikit-learn. Here's roughly what I did:
- Split the data into training and testing sets.
- Trained the logistic regression model on the training data.
- Made predictions on the testing data.
- Evaluated the model's performance using metrics like accuracy and precision.
I messed around with different features and hyperparameters to see what would give me the best results. It was a lot of trial and error.
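For the curious, here's a bare-bones version of those steps, reusing the hypothetical feature columns from earlier plus a made-up `home_win` label for whether the home team won:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Hypothetical feature columns and 0/1 target label.
feature_cols = ["goal_diff", "win_pct_last10", "pp_pct", "faceoff_pct"]
X = games[feature_cols]
y = games["home_win"]

# Hold out 20% of games for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the logistic regression and score it on the held-out games.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("accuracy: ", accuracy_score(y_test, preds))
print("precision:", precision_score(y_test, preds))
```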
The Results – Not Too Shabby
Okay, so the model wasn't perfect, but it was surprisingly accurate. I think it got around 65-70% of the games right on the test set. Not enough to quit my day job, but still, pretty cool. I even tried to predict a few upcoming games, just for kicks.
Lessons Learned – The Takeaways
This whole thing was a learning experience. I realized:
- Data cleaning is the most time-consuming part (and the most important).
- Feature engineering can make a big difference in model performance.
- You don't need a super complex model to get decent results.
Overall, it was a fun project and a good reminder that you can learn a lot by just diving in and getting your hands dirty. Maybe I'll try a more sophisticated model next time, or even add some external data sources like weather conditions or player injuries. Who knows? It's all about experimenting and having fun!