The Eagles are going to win the Super Bowl this weekend.[1] At least, that’s according to my open-source, LLM-powered pick-em’s bot, which outperformed 94.5% of players in ESPN’s 2024 Pick-Em’s competition.[2]
To quote my bot directly:
The Eagles are the logical choice
At the end of this post, I’ll share the agent’s full in-depth reasoning behind that statement. First, let’s talk about what improvements we made to the 2024 pick-em’s bot versus the 2023 version (which ranked in the top 15%).
<think>
Thinking is all the rage now with DeepSeek R1 being in every headline, but our pick-em’s bot has been thinking for two football seasons now! For the 2024 season, we really ramped it up though.
In 2023, we used a pattern that unconditionally wrapped all of our prompts in Chain-of-Thought, splitting out the bot’s analysis and conclusion as two different steps.
In 2024, we evolved this substantially – with massive credit to the broader open-source LLM community. As test-time compute and reasoning were becoming popular, the community started sharing prompts that encouraged deeper thinking from existing models. These prompts contained explicit “thinking steps” and “thinking budgets” and permission to alter course, backtrack, etc. I’ve seen many variants, but I believe the original specific prompt I forked came from Harish, an AI security engineer at Cisco.
We did make some tweaks to this prompt, though. Since we leverage structured output, we turned many of the prompt’s suggestions into mandates. For example, we can force the LLM to think for 15 steps,[3] whether it wants to or not. And we can force each thought to follow a very structured pattern.
For each of those 15 steps, we force the LLM to:
Come up with a topic to think about.
Spend time thinking on that topic.
Reflect on the current approach and reasoning up until this point.
Give the current thought a score based on how good it thinks the current line of thinking is.
Make a decision to continue, backtrack, or adjust its thinking in the next thought.
Rinse and repeat 15 times.
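To make the shape of a single step concrete, here is a minimal sketch of how the five parts above could be modelled and validated once the model’s JSON comes back. The names (`ThoughtStep`, `NextStep`, `parse_step`) and the assumption that the score lives in [0, 1] are mine for illustration; they are not taken from the bot’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(str, Enum):
    """The three decisions a step is allowed to end with."""
    CONTINUE = "continue"
    BACKTRACK = "backtrack"
    ADJUST = "adjust"


@dataclass(frozen=True)
class ThoughtStep:
    # Field order mirrors the order in which the model is asked to produce them.
    topic: str           # what this step is about
    thinking: str        # the analysis itself
    reflection: str      # critique of the approach so far
    reward: float        # self-assigned score (assumed here to lie in [0, 1])
    next_step: NextStep  # continue, backtrack, or adjust

    def __post_init__(self) -> None:
        if not 0.0 <= self.reward <= 1.0:
            raise ValueError(f"reward must be in [0, 1], got {self.reward}")


def parse_step(raw: dict) -> ThoughtStep:
    """Validate one decoded step; raises KeyError/ValueError on bad input."""
    return ThoughtStep(
        topic=raw["topic"],
        thinking=raw["thinking"],
        reflection=raw["reflection"],
        reward=float(raw["reward"]),
        next_step=NextStep(raw["next_step"].strip().lower()),
    )
```

Making `next_step` an enum rather than free text is the kind of "suggestion into mandate" conversion described above: the model can only pick one of three values.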
Incorporating this into our prompt allowed us to immediately leverage test-time compute and approximate the same behavior as reasoning LLMs, even though API access to gpt-o1, at the time, was still months away.
Even better, unlike gpt-o1, this approach gives us full transparency into the thinking process (much like DeepSeek-R1).[4]
You can read the full prompt and schema we use here.
Structured Output
One of the biggest superpowers that foundation models started supporting this year, imo, is structured output. It’s likely the most underutilized tool that LLM practitioners have at their disposal.
OpenAI’s implementation is not without gotchas (limitations around schema depth, field nullability, additional properties, and others), but it lets us control both the type of the LLM’s output and the order in which it is generated.
That is, it is not just a means to get data back in a certain format. It has a second important property: OpenAI’s LLMs generate their output in the same order that you define the fields in your schema. This allows you to hardcode a workflow that the LLM must follow.
In our pick-em’s bot, we use it to force the LLM to think for 15 steps. We also require that the final answer is the last thing it generates.
Without structured output forcing a final answer at the end, the LLM might produce its conclusion up front and spend the entire reasoning chain justifying that conclusion rather than genuinely exploring alternatives (akin to rolling a D20 in DnD and then coming up with the scenario that justifies the roll).[5]
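As a rough illustration of "thinking first, answer last", the sketch below builds a JSON Schema whose properties are declared in exactly that order. It only constructs the schema dictionary; sending it to a model is not shown. The property names (`step_1` … `step_N`, `winning_team`) and the helper functions are hypothetical and are not the bot’s real schema. The 100-property cap is the figure quoted in the post’s footnote.

```python
import json

MAX_PROPERTIES = 100  # limit quoted in the post's footnote


def step_schema() -> dict:
    """Schema for a single thinking step (field names are illustrative)."""
    props = {
        "topic": {"type": "string"},
        "thinking": {"type": "string"},
        "reflection": {"type": "string"},
        "reward": {"type": "number"},
        "next_step": {"type": "string", "enum": ["continue", "backtrack", "adjust"]},
    }
    return {
        "type": "object",
        "properties": props,
        "required": list(props),        # every field is mandatory
        "additionalProperties": False,  # nothing outside the schema is allowed
    }


def build_schema(num_steps: int) -> dict:
    """Declare num_steps thinking steps first and the final answer last."""
    # Python dicts keep insertion order, so the declared order survives json.dumps.
    props = {f"step_{i}": step_schema() for i in range(1, num_steps + 1)}
    props["winning_team"] = {"type": "string"}
    return {
        "type": "object",
        "properties": props,
        "required": list(props),
        "additionalProperties": False,
    }


def count_properties(schema: dict) -> int:
    """Count object properties at every nesting level."""
    props = schema.get("properties", {})
    return len(props) + sum(count_properties(child) for child in props.values())


schema = build_schema(15)
assert count_properties(schema) <= MAX_PROPERTIES
print(json.dumps(schema)[:80])  # peek at the serialized schema
```

With this made-up layout, 15 steps come to 91 properties and 20 steps come to 121, so the same kind of ceiling appears. The exact counts will not match the real schema, which has more fields than this sketch.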
Over the holidays, my co-founder wrote a post that touched on structured output. In a different post, he does some fun shenanigans to distort the LLM’s probabilities.
Model Upgrades
One other major thing that changed between years is the foundation model itself. In 2023 we used gpt-4 and switched to gpt-4-turbo when that became available in the middle of the season.
This year, we were 100% on gpt-4o from start-to-finish. It’s a far more capable model than its predecessor, and is actually likely the high-order bit in the performance improvement this year.
What’s Next?
Who knows what 2025 will bring for LLMs, but what’s top of mind for me right now is measuring the robustness of the predictions. I’m thinking of things like:
Self-consistency across multiple models (e.g. running the analysis with OpenAI, Gemini, Anthropic, LLaMa, DeepSeek and taking the majority vote)
Measuring how many times the LLM backtracks in its thinking versus continuing. I suspect that results where the thinking chain is a series of “continues” would be more robust than chains that have many “backtracks”, but it’d be good to measure that.
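Both ideas above are cheap to prototype. Here is a minimal sketch, with hypothetical helper names, of a majority vote across models and a tally of the per-step decisions in a thinking chain. How each model would actually be queried is left out.

```python
from collections import Counter


def majority_vote(picks: dict[str, str]) -> tuple[str, float]:
    """Return the most common pick and the share of models that agreed with it.

    picks maps a model name to the team it chose. On a tie, the pick that
    appeared first wins.
    """
    if not picks:
        raise ValueError("need at least one pick")
    winner, votes = Counter(picks.values()).most_common(1)[0]
    return winner, votes / len(picks)


def decision_counts(decisions: list[str]) -> Counter:
    """Tally the per-step decisions, ignoring case and surrounding whitespace."""
    return Counter(d.strip().lower() for d in decisions)


def backtrack_rate(decisions: list[str]) -> float:
    """Fraction of steps that ended in a backtrack (0.0 for an empty chain)."""
    if not decisions:
        return 0.0
    return decision_counts(decisions)["backtrack"] / len(decisions)


# Made-up picks, purely to show the call shape.
winner, agreement = majority_vote(
    {"model_a": "Eagles", "model_b": "Eagles", "model_c": "Chiefs"}
)
print(winner, round(agreement, 2))
```

Applied to the 15-step Super Bowl chain in the Full Analysis section, the tally is 13 continues, 2 adjusts, and 0 backtracks, so `backtrack_rate` is 0.0 for that run.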
Also, 2025 is shaping up to be a big year for local LLMs, so maybe I’ll be running this year’s bot locally?
Finally, I’m curious if the deep research LLMs popping up right now will obviate the need for the pick-em’s agent. Will they be able to retrieve similar team stats, news articles, etc. and do the same analysis that our bot orchestrates explicitly today?
We’ll have come a long way if next year’s pick-em’s bot is nothing but a series of questions sent to a general purpose deep research LLM.
Full Analysis
Here is the full line of reasoning that the bot followed to come to the conclusion that the Eagles will win.[6] Note that by the time we get to this point, the LLM’s context already contains summaries of all the relevant news articles and stats we crawled from ESPN.
Step 1: Offensive Comparison
Thinking
The Philadelphia Eagles have a stronger offensive performance compared to the Kansas City Chiefs. The Eagles have a higher net total yards per game (367.2) compared to the Chiefs (327.6). Additionally, the Eagles excel in rushing yards per game (179.3) compared to the Chiefs (105.3). This indicates a more balanced offensive strategy, with the ability to both pass and rush effectively. The Eagles also score more points per game (27.2) than the Chiefs (22.6), suggesting a more potent offense overall.
Reflection
The offensive stats clearly favor the Eagles, especially in terms of rushing and overall scoring. This gives them an edge in controlling the game tempo and potentially wearing down the Chiefs' defense.
Reward: 0.9
Next Step: Continue
Step 2: Defensive Comparison
Thinking
Defensively, the Philadelphia Eagles also have an advantage. They allow fewer net total yards per game (278.4) compared to the Chiefs (320.6). The Eagles are particularly strong against the pass, allowing only 174.2 passing yards per game, while the Chiefs allow 218.8. This could be crucial in limiting the effectiveness of Patrick Mahomes and the Chiefs' passing attack. The Eagles also allow fewer points per game (17.8) compared to the Chiefs (19.2), indicating a more robust defense overall.
Reflection
The Eagles' defensive stats suggest they are better equipped to handle the Chiefs' offensive threats, particularly in the passing game. This defensive strength could be a deciding factor in the game.
Reward: 0.85
Next Step: Continue
Step 3: Turnover Analysis
Thinking
The Eagles have a superior turnover ratio (+11) compared to the Chiefs (+6). This indicates that the Eagles are better at taking care of the ball and creating turnovers. The Eagles have fewer giveaways (15) compared to the Chiefs (14), and they have more takeaways (26) than the Chiefs (20). This ability to win the turnover battle could be crucial in a high-stakes game like this, as turnovers can quickly change the momentum.
Reflection
Turnovers are often a critical factor in determining the outcome of games. The Eagles' ability to generate turnovers and protect the ball gives them an additional edge.
Reward: 0.8
Next Step: Continue
Step 4: Special Teams Analysis
Thinking
Special teams performance is relatively even between the two teams. The Chiefs have a slight edge in kick return yards per return (27.1) compared to the Eagles (26.5), and in punt return yards per return (9.9) compared to the Eagles (9.2). However, neither team has scored a touchdown from kick or punt returns, indicating that while special teams may play a role, it is unlikely to be a decisive factor in this matchup.
Reflection
While the Chiefs have a slight advantage in special teams, it is not significant enough to outweigh the Eagles' advantages in other areas. Special teams are unlikely to be the deciding factor in this game.
Reward: 0.7
Next Step: Adjust
Step 5: Impact of Key Players
Thinking
Patrick Mahomes is a key player for the Chiefs, known for his ability to make big plays, especially in crucial moments. However, the Eagles' defense is well-equipped to limit his effectiveness, particularly with their strong pass defense. On the other hand, Jalen Hurts has shown the ability to manage games effectively and make plays when needed, especially with his dual-threat capability. His ability to avoid turnovers and contribute to the rushing attack could be pivotal.
Reflection
While Mahomes is a significant threat, the Eagles' defense is well-suited to counter his strengths. Hurts' ability to manage the game and contribute in multiple ways gives the Eagles a balanced attack.
Reward: 0.8
Next Step: Continue
Step 6: Recent Performance and Momentum
Thinking
The Eagles have been gaining significant support from bettors, indicating confidence in their recent performances. The shift in betting lines towards the Eagles suggests that they are perceived as the stronger team heading into this matchup. Additionally, the Eagles have shown resilience and the ability to perform under pressure, as highlighted in recent articles.
Reflection
The momentum and confidence in the Eagles' recent performances add to their overall advantage. This psychological edge could play a role in their performance during the game.
Reward: 0.85
Next Step: Continue
Step 7: Critique of Initial Analysis
Thinking
While the Eagles have advantages in several statistical categories, it's important to consider the Chiefs' experience in high-pressure games, particularly with Mahomes at the helm. The Chiefs have been in multiple Super Bowls recently, which could give them an edge in terms of experience and composure. However, the Eagles' statistical advantages and recent momentum seem to outweigh this factor.
Reflection
The Chiefs' experience is a valid consideration, but the Eagles' statistical advantages and momentum are compelling. The analysis still leans towards the Eagles as the likely winners.
Reward: 0.75
Next Step: Adjust
Step 8: Final Decision
Thinking
After considering all factors, including offensive and defensive stats, turnover ratios, special teams, key players, and recent performance, the Philadelphia Eagles appear to have the upper hand. Their balanced offense, strong defense, and ability to generate turnovers give them a comprehensive advantage over the Chiefs.
Reflection
The decision to favor the Eagles is well-supported by the data and analysis. While the Chiefs are a formidable opponent, the Eagles' strengths in multiple areas make them the likely winners.
Reward: 0.9
Next Step: Continue
Step 9: Summary of Analysis
Thinking
The Philadelphia Eagles have a stronger overall profile compared to the Kansas City Chiefs. Their offensive and defensive stats are superior, particularly in rushing and pass defense. The Eagles also have a better turnover ratio, which is crucial in high-stakes games. While the Chiefs have experience and a strong quarterback in Mahomes, the Eagles' balanced attack and recent momentum make them the favorites.
Reflection
The analysis is comprehensive and considers all relevant factors. The Eagles' advantages in key areas are clear, making them the logical choice for the winner.
Reward: 0.9
Next Step: Continue
Step 10: Final Reflection
Thinking
Reflecting on the entire analysis, the Philadelphia Eagles are the more well-rounded team. Their strengths in both offense and defense, combined with their ability to control turnovers, give them a significant edge. The Chiefs' experience and Mahomes' talent are notable, but the Eagles' overall profile is more compelling.
Reflection
The analysis has been thorough and considers all angles. The Eagles' advantages are clear, and the decision to choose them as the winners is well-founded.
Reward: 0.9
Next Step: Continue
Step 11: Conclusion
Thinking
Based on the analysis, the Philadelphia Eagles are the likely winners against the Kansas City Chiefs. Their superior stats in key areas, combined with recent momentum and betting trends, support this conclusion.
Reflection
The conclusion is consistent with the analysis and reflects a well-rounded consideration of all factors. The Eagles are the logical choice based on the data.
Reward: 0.9
Next Step: Continue
Step 12: Final Answer
Thinking
The Philadelphia Eagles are chosen as the winning team based on their superior offensive and defensive stats, better turnover ratio, and recent momentum. The analysis supports this decision comprehensively.
Reflection
The final answer is consistent with the analysis and reflects a thorough consideration of all relevant factors. The Eagles are the logical choice.
Reward: 0.9
Next Step: Continue
Step 13: Re-evaluation of Key Points
Thinking
Re-evaluating the key points, the Eagles' advantages in rushing, pass defense, and turnovers are significant. These factors are critical in determining the outcome of the game. The Chiefs' experience is a factor, but not enough to outweigh the Eagles' strengths.
Reflection
The re-evaluation confirms the initial analysis. The Eagles' strengths are decisive, and the choice remains valid.
Reward: 0.9
Next Step: Continue
Step 14: Consideration of External Factors
Thinking
External factors such as weather, injuries, and home-field advantage could play a role. However, there are no significant injuries reported, and the game is played in a neutral location, minimizing these factors' impact.
Reflection
External factors do not significantly alter the analysis. The Eagles' advantages remain compelling.
Reward: 0.85
Next Step: Continue
Step 15: Final Thought
Thinking
The Philadelphia Eagles are the likely winners based on a comprehensive analysis of offensive, defensive, and turnover stats, as well as recent performance and momentum. The Chiefs are a strong team, but the Eagles' balanced attack and defensive prowess give them the edge.
Reflection
The final thought aligns with the entire analysis. The Eagles are the logical choice, and the decision is well-supported by the data.
Reward: 0.9
Next Step: Continue
Final Thought
The Philadelphia Eagles are favored to win against the Kansas City Chiefs based on superior offensive and defensive stats, better turnover management, and recent momentum. While the Chiefs have experience and a strong quarterback, the Eagles' balanced attack and defensive strengths make them the likely winners.
Final Reflection
The analysis has been thorough and considers all relevant factors. The Eagles' advantages in key areas are clear, making them the logical choice for the winner.
Conclusion
Winning Team: Philadelphia Eagles
[1] I was born and raised in Philly. If my bot hadn’t chosen the birds, I’d have deleted it, tbh.
[2] My agent correctly predicted 70% of matches. So, if you’re a frequentist, I guess there’s a 30% chance that the Chiefs win.
[3] Why 15? I tried 20 steps and our schema had 125 fields. OpenAI’s max is 100.
[4] Fortunately, according to a recent AMA with Sam Altman, the success of transparency in open-source reasoning LLMs has forced them to rethink hiding their own thinking tokens.
[5] See also: Autoregressive Models
[6] It hopefully goes without saying, but don’t wager any money based on anything you’ve read in this post.