Scope
This article reports probabilistic model performance metrics. It is not betting advice and does not promise outcomes. The goal is to evaluate predictive signal and calibration quality.
Quick read in 30 seconds
- across 474 matches, the model gets about 302 top outcomes right
- the pure Elo baseline gets about 262 right
- practical gap: about 40 extra correctly classified matches
- probability quality also improves (log loss 0.87 vs 1.02)
So the gain is not only cosmetic. It translates into more matches correctly read while keeping probabilities more statistically coherent.
1) Evaluated volume: 474 usable matches
- 474 Champions League matches analyzed
- multiple seasons included
- group stage and knockout stage covered
- 100% of matches considered usable under the protocol
For an inter-league competition with large tactical diversity, this sample size is enough to extract stable performance patterns.
In practical terms, this reduces the risk of over-reading short hot/cold streaks. The sample includes very different match profiles: structural mismatches in group stage, tighter game states in knockouts, and varying tactical identities across leagues.
2) 1X2 accuracy: clear uplift versus pure Elo
- Top-prediction accuracy: 63.7%
- Pure Elo baseline: 55.3%
- Absolute gain: +8.4 points
Reaching the 63-64% range on a multi-league elite tournament is strong. The uplift confirms the model is not just a dressed-up Elo table: it contributes real predictive information beyond the rating baseline.
Another way to read it: per 100 matches, Elo gets around 55 right, while the model gets around 64. That is roughly 9 additional correct calls per 100 matches, or about 40 extra over the full 474-match benchmark.
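The back-of-envelope arithmetic above can be reproduced directly. This minimal sketch only restates the figures reported in this article; it does not recompute anything from raw match data:

```python
# Sanity check of the accuracy uplift using the figures quoted above.
N_MATCHES = 474
MODEL_ACC = 0.637   # Foresportia top-prediction accuracy
ELO_ACC = 0.553     # pure Elo baseline

model_correct = round(N_MATCHES * MODEL_ACC)  # ~302 matches
elo_correct = round(N_MATCHES * ELO_ACC)      # ~262 matches
extra = model_correct - elo_correct           # ~40 extra correct calls

print(model_correct, elo_correct, extra)
```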
3) Log loss: better probability quality too
- Foresportia average log loss: 0.87
- Pure Elo log loss: 1.02
The improvement goes beyond hard classification: probabilities are also better calibrated against observed outcomes, which is critical for a probabilistic AI product.
Concretely, log loss punishes overconfident wrong calls the most. A model can look decent on hit rate yet still be poorly calibrated if it overstates favorites. Moving from 1.02 to 0.87 signals better probability discipline.
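To make the penalty structure concrete, here is a minimal log loss implementation for 1X2 outputs. The function name and the toy probabilities are illustrative, not Foresportia internals:

```python
import math

def log_loss_1x2(probs, outcomes):
    """Mean negative log-likelihood over 1X2 predictions.

    probs: list of (p_home, p_draw, p_away) tuples, each summing to ~1
    outcomes: list of indices, 0 = home win, 1 = draw, 2 = away win
    """
    eps = 1e-15  # clip to avoid log(0) on hard-zero probabilities
    total = 0.0
    for p, y in zip(probs, outcomes):
        total -= math.log(max(p[y], eps))
    return total / len(probs)

# Toy example: an overconfident wrong call is punished far harder
# than a confident right one.
confident_right = log_loss_1x2([(0.80, 0.12, 0.08)], [0])  # ≈ 0.22
confident_wrong = log_loss_1x2([(0.80, 0.12, 0.08)], [2])  # ≈ 2.53
```

This asymmetry is why a model can post a decent hit rate while still scoring poorly on log loss: overstated favorites are exactly what the metric penalizes.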
4) Where the model performs best
Elo gap quartiles
- very low gap: about 52%
- low gap: about 59%
- high gap: about 68%
- very high gap: about 74%
As expected, performance rises when structural strength differences increase. The key point is that the model still holds up in middle-gap matches, rather than only working on obvious favorites.
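A quartile breakdown of this kind can be sketched as follows. This is a generic bucketing routine with synthetic demo data, not the benchmark's actual pipeline:

```python
def quartile_accuracy(gaps, correct):
    """Bucket matches into quartiles of Elo gap, lowest to highest,
    and report the top-prediction hit rate inside each bucket.

    gaps: Elo-point gap per match
    correct: 1 if the top prediction was right for that match, else 0
    """
    order = sorted(range(len(gaps)), key=lambda i: gaps[i])
    q = len(order) // 4
    buckets = [order[:q], order[q:2 * q], order[2 * q:3 * q], order[3 * q:]]
    return [sum(correct[i] for i in b) / len(b) for b in buckets]

# Synthetic demo: accuracy rises with the gap in this toy data.
demo = quartile_accuracy(
    gaps=[10, 20, 30, 40, 50, 60, 70, 80],
    correct=[0, 1, 0, 1, 1, 1, 1, 1],
)
```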
Entropy quartiles
- low entropy (clear match): about 72%
- mid entropy: about 64%
- high entropy: about 55%
- very high entropy: about 49%
This is the most actionable product insight: low entropy aligns with clearer probability structure, while high entropy flags higher variance environments. Exposing entropy helps users read match risk better.
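The entropy signal referred to here is standard Shannon entropy over the 1X2 probability triple. A minimal sketch (function name and toy triples are ours):

```python
import math

def entropy_1x2(p_home, p_draw, p_away):
    """Shannon entropy (in nats) of a 1X2 probability triple.

    0 for a certain outcome; the maximum log(3) ≈ 1.10 occurs when
    all three outcomes are equally likely.
    """
    return -sum(p * math.log(p) for p in (p_home, p_draw, p_away) if p > 0)

clear = entropy_1x2(0.75, 0.15, 0.10)    # clear favorite: ≈ 0.73
unclear = entropy_1x2(0.34, 0.33, 0.33)  # close to log(3) ≈ 1.10
```

Bucketing matches by this value is what produces the entropy quartiles above: low entropy means the model sees a sharply peaked outcome distribution, high entropy flags a coin-flip environment.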
Home, away, draw
- home wins: about 69%
- away wins: about 61%
- draws: about 42%
Draws remain the hardest class, which is standard in 1X2 modeling. The stronger home-win performance suggests the model captures contexts where structural and situational edges align.
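A per-class breakdown like the one above answers, for each true outcome, how often the model's top prediction called it. A minimal sketch with invented demo data:

```python
def per_class_hit_rate(preds, outcomes, labels=("home", "draw", "away")):
    """Hit rate per true outcome class: of all actual draws, how many
    did the top prediction call as draws, and likewise for wins.

    preds / outcomes: indices, 0 = home win, 1 = draw, 2 = away win
    """
    rates = {}
    for k, name in enumerate(labels):
        idx = [i for i, y in enumerate(outcomes) if y == k]
        rates[name] = (
            sum(preds[i] == k for i in idx) / len(idx) if idx else None
        )
    return rates

# Toy run on six fictional matches.
demo = per_class_hit_rate(
    preds=[0, 0, 1, 2, 2, 0],
    outcomes=[0, 0, 2, 2, 1, 0],
)
```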
Competition phase
- group stage: about 65%
- knockout phase: about 60%
Global reading: the model captures strong structural imbalances well and remains useful in tighter contexts.
The 5-point drop in knockouts is expected: cautious game plans, score management, two-leg dynamics, and higher tactical variance. The relevant signal is that performance remains robust instead of collapsing.
5) Flat-stake simulated ROI and caution
- naive simulation with fixed stake (1 unit)
- odds proxy: 1 / predicted probability
- average outcome: ROI around +3% to +5%
This is consistent with slight internal underpricing in some spots. It remains a simplified stress test, not a market execution result and not betting advice.
To make that number tangible: with 100 selections at 1 unit each, this corresponds to roughly +3 to +5 units in this theoretical setup. It is a calibration signal, not a real-world return forecast.
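The naive simulation described above can be sketched in a few lines. This reproduces the stated protocol (flat 1-unit stake, proxy odds of 1 / predicted probability, no margin, no real market prices); it is a calibration probe, not a backtest:

```python
def flat_stake_roi(probs, outcomes, stake=1.0):
    """Flat-stake simulation: back the top-probability outcome of each
    match at proxy odds 1 / predicted probability.

    probs: list of (p_home, p_draw, p_away) tuples
    outcomes: indices, 0 = home win, 1 = draw, 2 = away win
    Returns ROI as a fraction of total stakes.
    """
    staked, returned = 0.0, 0.0
    for p, y in zip(probs, outcomes):
        pick = max(range(3), key=lambda k: p[k])  # top prediction
        staked += stake
        if pick == y:
            returned += stake / p[pick]  # proxy-odds payout
    return (returned - staked) / staked

# Toy run: when the hit rate exactly matches the quoted probability
# (2 wins out of 4 at p = 0.5), ROI is exactly zero. A positive ROI in
# this setup therefore signals internal underpricing, nothing more.
demo = flat_stake_roi(
    probs=[(0.5, 0.3, 0.2)] * 4,
    outcomes=[0, 0, 2, 1],
)
```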
6) What this validates for Foresportia strategy
- Champions League is structurally model-friendly (Elo dispersion, fewer middle-tier profiles)
- Elo + Poisson + xG + calibration produces measurable incremental signal
- entropy can be exposed as a practical product-side risk indicator
- no aggressive inter-league recalibration is required to maintain robustness in UCL
Product conclusion: this benchmark shows Foresportia is not a pure Elo proxy and provides better-calibrated probabilities with measurable predictive gain.
Strategic summary: Elo provides structural hierarchy, Poisson + xG refine scenario shape, and calibration converts model output into readable probabilities. The observed uplift comes from this combined stack, not from one isolated feature.