First things first, gathering the data. I scoured the web, Baseball Reference mostly, for historical Blue Jays stats. I'm talking batting averages, earned run averages, wins, losses, all that good stuff. Basically, I needed a solid base to work with. I downloaded a bunch of CSV files and started cleaning them up. This part was tedious, but crucial. Missing data? Had to fill it in or toss the row. Inconsistent formatting? Fixed it. You know the drill.
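To give a flavor of that cleanup, here's a rough pandas sketch. The column names and values are made up for illustration, not the actual Baseball Reference schema:

```python
import pandas as pd
import numpy as np

# Hypothetical mini-frame standing in for one of the downloaded CSVs.
df = pd.DataFrame({
    "date": ["2023-04-01", "2023-04-02", "2023-04-03"],
    "runs_scored": [5, np.nan, 3],
    "batting_avg": [0.251, 0.260, np.nan],
})

# Fill numeric gaps with a column median...
df["runs_scored"] = df["runs_scored"].fillna(df["runs_scored"].median())

# ...or toss rows where a key stat is simply missing.
df = df.dropna(subset=["batting_avg"])

# Normalize inconsistent formatting, e.g. parse date strings into a real dtype.
df["date"] = pd.to_datetime(df["date"])
```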
Next up, choosing a model. I messed around with a few different options. Initially, I thought about using a simple linear regression. But then I figured, nah, let's get a little fancier. So, I decided to try a random forest regressor. Seemed like a good fit for the type of data I had. Plus, I wanted to learn something new.

Then came the fun part: feature engineering. This is where I got to play around with the data and create new features that I thought might be predictive. Things like win percentage over the last 10 games, average runs scored per game, and even some more complex stuff like moving averages of key stats. I tried to think like a baseball analyst and figure out what factors would actually influence a game's outcome.
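Those rolling features are easy to build with pandas. A toy version (the game log here is invented, and the key detail is the `shift(1)` so a game never gets to peek at its own result):

```python
import pandas as pd

# Toy game log standing in for the real one: 1 = win, 0 = loss.
games = pd.DataFrame({
    "win":  [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
    "runs": [4, 2, 6, 5, 1, 7, 3, 8, 2, 5, 6, 4],
})

# Win percentage over the last 10 games, excluding the current game.
games["win_pct_10"] = games["win"].shift(1).rolling(10).mean()

# Moving average of runs scored over the last 5 games.
games["runs_ma_5"] = games["runs"].shift(1).rolling(5).mean()
```

The shift-before-rolling pattern matters: without it, the feature leaks the outcome you're trying to predict into its own inputs.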
Training the model. Split my data into training and testing sets. Used the training data to fit the random forest model. Cranked up the number of estimators and messed around with the hyperparameters until I got something that seemed reasonable. Of course, I used cross-validation to make sure I wasn't overfitting to the training data.
Evaluating the model. Now for the moment of truth! I ran the test data through the model and compared the predictions to the actual results. I used metrics like mean squared error and R-squared to get a sense of how well the model was performing. It wasn't perfect, of course. There was definitely room for improvement. But it was good enough to get a general sense of how the Blue Jays were likely to perform.
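Both metrics are one-liners in scikit-learn. The win totals below are made up purely to show the calls:

```python
from sklearn.metrics import mean_squared_error, r2_score

# Toy predicted-vs-actual season win totals, just for illustration.
y_true = [85, 90, 76, 88, 95]
y_pred = [82, 91, 80, 85, 93]

# Mean squared error: average of the squared prediction errors.
mse = mean_squared_error(y_true, y_pred)

# R-squared: fraction of the variance in the actuals the model explains.
r2 = r2_score(y_true, y_pred)
```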
Visualizing the results. Threw together some quick plots to see what was going on. Scatter plots of predicted vs. actual wins, histograms of the prediction errors, that kind of thing. It helped me get a better handle on where the model was succeeding and where it was struggling.
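Both plots are a few lines of matplotlib. Again, the numbers here are placeholders, and the output filename is just something I picked for the sketch:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Toy predicted-vs-actual values standing in for real model output.
actual = [85, 90, 76, 88, 95]
predicted = [82, 91, 80, 85, 93]
errors = [p - a for p, a in zip(predicted, actual)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter of predicted vs. actual, with a dashed y = x reference line.
ax1.scatter(actual, predicted)
ax1.plot([75, 100], [75, 100], linestyle="--")
ax1.set(xlabel="Actual wins", ylabel="Predicted wins", title="Predicted vs. actual")

# Histogram of the prediction errors.
ax2.hist(errors, bins=5)
ax2.set(xlabel="Prediction error", title="Error distribution")

fig.savefig("blue_jays_eval.png")
```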
Finally, made some predictions. I used the model to predict the outcome of a few upcoming Blue Jays games. Obviously, it's just a prediction, and baseball is unpredictable. But it was cool to see the model in action and get a sense of its capabilities. I even tracked the predictions against the actual results to see how well it was performing in the real world.
It was a cool little project, and it definitely taught me a lot about data analysis and machine learning. I'm already thinking about ways I can improve it in the future, maybe by adding more data sources or trying out different models. But for now, I'm happy with what I accomplished.