
Inside Foresportia

Goal markets rebuilt: why Foresportia started from scratch

Published on April 14, 2026

TL;DR

Goal markets (BTTS, Over/Under) are back on Foresportia — not because we flipped a switch, but because we built a separate engine dedicated to goals, decoupled from the match-outcome model. This article walks you through the why, the how, and the numbers — with full transparency.

Why goal markets were pulled in the first place

For a long time, Foresportia's BTTS and Over/Under probabilities were computed from the same engine that powers the 1X2 (home win / draw / away win) predictions.

That sounds logical — after all, goals determine the result. But in practice, the more we tuned the engine to predict who wins, the more we injected outcome-specific adjustments: Elo corrections, draw recalibration, dynamic confidence shifters. Each of those made the 1X2 more accurate, yet they quietly warped the underlying goal distribution.

The consequence was subtle but damaging: goal-market probabilities still looked plausible, but they no longer tracked reality. A displayed 65 % for BTTS might only materialise 50 % of the time. Rather than ship misleading numbers, we took them offline and set out to fix the root cause.

How goal markets are actually computed

The maths behind every goal market starts with one object: a score grid. Each cell in the grid holds the probability of a specific scoreline — 0–0, 1–0, 2–1, and so on.

Let $P(i,j)$ be the probability that the home side scores $i$ goals and the away side scores $j$. Every common goal market is simply a sum over the right cells:

$$P(\text{BTTS}) = \sum_{i \ge 1}\;\sum_{j \ge 1} P(i,j)$$

In plain English: add up every cell where both teams find the net at least once.

$$P(\text{Over 2.5}) = \sum_{i+j \;\ge\; 3} P(i,j)$$

Here, we sum every cell where the total reaches 3 goals or more. Under 2.5 is the complement: every cell where the total stays at 2 or below.

The same logic extends to Over/Under 1.5, 3.5, clean sheets (one team concedes zero), win-to-nil, and so on. Every goal market is derived from the same grid — just by summing different regions.

The coupling problem

In the old setup, the score grid was built by the 1X2 engine and then reweighted so its marginals matched the published match-outcome probabilities.

That guaranteed internal consistency: if we displayed "60 % home win", the most probable scorelines were compatible. But the consistency was misleading for goals.

Every time we improved the 1X2 with outcome-centric tweaks — ranking signals, Elo corrections, a sharper draw prior — the reweighted grid shifted in ways that had nothing to do with how many goals would actually be scored. The engine was optimised to answer "who wins?", not "how many goals will there be?".

This is the key insight: what helps predict the winner does not automatically help predict the number of goals. Coupling the two led to a model that improved on one task while quietly degrading on the other.

The fix: a dedicated goal engine

Rather than patching the old system, we separated the two responsibilities entirely:

  • A match-outcome engine, optimised to predict who wins (1X2)
  • A goal engine, with its own score grid, optimised to model the distribution of goals

The goal engine shares some useful signals with the match engine — advanced form, league-level scoring pace, a dampened home advantage — but it does not inherit the outcome-specific adjustments: no draw recalibration, no dynamic confidence shifters, no Elo overrides designed to sharpen match results.

The principle is straightforward: predicting who wins and predicting how many goals are scored are two different problems. They deserve two different engines, each free to learn from the signals that actually matter for its own task.

Going further: the selective filter

A dedicated goal engine is necessary but not sufficient. A raw probability — even from a well-designed model — is not always actionable.

On top of the goal engine sits a selective filter. Its job: work out whether the model genuinely "knows something" about a given match, or whether it is producing a bland, baseline probability that carries little information.

In practice, the filter cross-references several signals to gauge how confident the goal engine really is. It only publishes a probability when the signal is strong enough to be meaningful.

The trade-off is deliberate: fewer matches covered, but markedly better reliability when a number is shown.
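The article does not disclose which signals the filter cross-references, so the following is only a hypothetical sketch of the publish-or-withhold idea: the signal names (`model_agreement`, `data_coverage`) and thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GoalSignal:
    """Hypothetical filter inputs -- the real signal set is not public."""
    probability: float      # raw goal-engine probability for the market
    model_agreement: float  # 0..1, agreement across internal sub-models
    data_coverage: float    # 0..1, completeness of the input data

def publish(signal: GoalSignal,
            min_edge: float = 0.10,
            min_agreement: float = 0.7,
            min_coverage: float = 0.8) -> Optional[float]:
    """Return the probability only when every bar is cleared.

    Distance from 50% stands in for "the model knows something";
    the thresholds here are illustrative, not Foresportia's values.
    """
    informative = abs(signal.probability - 0.5) >= min_edge
    if informative and signal.model_agreement >= min_agreement \
            and signal.data_coverage >= min_coverage:
        return signal.probability
    return None  # withhold: fewer matches covered, higher reliability

print(publish(GoalSignal(0.66, 0.85, 0.9)))  # clears every bar
print(publish(GoalSignal(0.52, 0.85, 0.9)))  # too close to the baseline
```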

Measuring quality: a quick guide to the metrics

Before diving into the results, here is a plain-language tour of the four metrics we use. No data-science background required.

Brier Score (average error)

The Brier Score captures the average squared gap between what the model announced and what actually happened:

$$\text{Brier} = \frac{1}{N}\sum_{n=1}^{N} (p_n - y_n)^2$$

where $p_n$ is the announced probability and $y_n$ is 1 if the event occurred, 0 otherwise.

  • Lower is better.
  • A Brier of 0 is perfect — every call was spot-on.
  • A Brier of 0.25 is what you get by always predicting 50/50 — zero information.
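The formula is short enough to verify by hand:

```python
def brier_score(probs, outcomes):
    """Mean squared gap between announced probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Always predicting 50/50 yields the 0.25 "zero information" baseline:
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
```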

LogLoss (penalises overconfidence)

LogLoss is harsher than Brier when the model is confident and wrong. Announcing 90 % and being wrong costs far more than announcing 55 % and being wrong.

  • Lower is better.
  • It complements Brier by zeroing in on costly overconfidence.
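A minimal implementation makes the asymmetry visible:

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood; confident misses are punished hard."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Being wrong at 90% costs far more than being wrong at 55%:
print(round(log_loss([0.9], [0]), 3))   # 2.303
print(round(log_loss([0.55], [0]), 3))  # 0.799
```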

ECE (Expected Calibration Error)

ECE checks whether probabilities mean what they say. Group all predictions in bins (say, everything between 55 % and 65 %), then compare the average announced probability to the actual hit rate.

  • Lower is better.
  • A high ECE means the displayed percentages are misleading: a stated 60 % does not correspond to a 60 % success rate.
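The binning logic described above, in code:

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Bin predictions, then compare announced probability to hit rate.

    ECE = sum over bins of (bin size / N) * |avg announced - hit rate|.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(avg_p - hit_rate)
    return ece

# A model that says 60% but hits only 30% of the time is badly calibrated:
print(expected_calibration_error([0.6] * 10, [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]))
```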

BSS (Brier Skill Score)

BSS benchmarks the model against a naive baseline — for instance, always predicting the league-wide average hit rate for that market.

  • Positive BSS = the model outperforms the baseline.
  • Negative BSS = the model does worse than simply predicting the average.
  • This is the toughest test: a positive BSS means the model genuinely adds information.
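As a concrete check, BSS is one minus the ratio of the model's Brier to the baseline's Brier — so a model that merely echoes the sample average scores exactly zero:

```python
def brier_skill_score(probs, outcomes):
    """Compare the model's Brier to always predicting the sample average."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    brier_model = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    brier_base = sum((base_rate - y) ** 2 for y in outcomes) / n
    return 1 - brier_model / brier_base

print(brier_skill_score([0.9, 0.2, 0.8, 0.7], [1, 0, 1, 1]))  # positive: beats baseline
print(brier_skill_score([0.75, 0.75, 0.75, 0.75], [1, 0, 1, 1]))  # ~0: adds nothing
```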

Results: three approaches compared

We evaluated three approaches on the same set of recent matches:

  • Legacy system: goal markets derived from the 1X2 engine (reweighted grid)
  • Dedicated engine alone: new goal grid, no selective filter
  • Dedicated engine + selective filter: the full system as published

| Market | Method | Brier | LogLoss | ECE | BSS |
|---|---|---|---|---|---|
| BTTS | Legacy | 0.268 | 0.731 | 0.105 | -0.073 |
| BTTS | Dedicated engine | 0.282 | 0.768 | 0.131 | -0.129 |
| BTTS | Engine + filter | 0.248 | 0.689 | 0.039 | +0.008 |
| Over 2.5 | Legacy | 0.267 | 0.732 | 0.100 | -0.071 |
| Over 2.5 | Dedicated engine | 0.288 | 0.788 | 0.152 | -0.156 |
| Over 2.5 | Engine + filter | 0.236 | 0.666 | 0.065 | +0.052 |
| Under 2.5 | Legacy | 0.267 | 0.732 | 0.100 | -0.071 |
| Under 2.5 | Dedicated engine | 0.288 | 0.788 | 0.152 | -0.156 |
| Under 2.5 | Engine + filter | 0.236 | 0.666 | 0.065 | +0.052 |

Key takeaways

  • Engine + selective filter dominates across all three markets and all four metrics.
  • It is the only approach with a positive BSS — the only one that genuinely outperforms a naive baseline.
  • The legacy and standalone-engine approaches both have negative BSS: on this sample, they were worse than always predicting the market average.
  • The full system's ECE of 0.039 for BTTS is excellent: when it says 60 %, observed reality is very close to 60 %.

The real test: do high-confidence calls actually hit more often?

Aggregate metrics are useful, but the question that matters most is: when the model is confident, does the hit rate follow?

This is where the gap becomes stark.

BTTS

  • Legacy → almost no high-confidence predictions. Above 60 %, only 2 matches.
  • Dedicated engine alone → at the 65 % threshold: 48 % hit rate. Not actionable.
  • Engine + filter → at 60 %: 65 % hit rate across 20 matches.

Over 2.5

  • Dedicated engine alone → 65 % threshold: 39 % hit rate. Calibration broken.
  • Engine + filter → 60 %: 69 %. At 65 %: 80 %.

Under 2.5

  • Legacy → 60 % threshold: 51 %. Overconfident.
  • Engine + filter → 55 %: 78 %. At 60 %: 82 %.

What this means

The old system was unable to make hit rates climb as the displayed probability climbed. Showing 70 % was no better than showing 50 %. The new system finally produces probabilities that genuinely discriminate — higher confidence corresponds to higher observed accuracy.
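The threshold analysis above boils down to one question per cut-off: among predictions at or above it, how often did the event occur? A minimal helper (the data here is synthetic, not Foresportia's evaluation set):

```python
def hit_rate_above(probs, outcomes, threshold):
    """Hit rate among predictions at or above the threshold.

    Returns (hit_rate, count); for a well-calibrated model the hit rate
    should track -- or exceed -- the threshold as it rises.
    """
    picked = [y for p, y in zip(probs, outcomes) if p >= threshold]
    if not picked:
        return None, 0
    return sum(picked) / len(picked), len(picked)

# Synthetic example: three calls at 60%+, all of which hit.
rate, n = hit_rate_above([0.7, 0.65, 0.55, 0.9, 0.4], [1, 1, 0, 1, 1], 0.60)
print(rate, n)
```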

Staying honest: what the model does not do yet

The numbers above are encouraging, but intellectual honesty demands a clear statement of limits.

  • The sample is still young. Results come from a few hundred matches. Enough to validate a direction, not enough to declare the problem solved.
  • Strongest evidence is on BTTS, Over 2.5 and Under 2.5. For other markets (Over 1.5, Over 3.5, clean sheets, etc.) proof is thinner at this stage.
  • Coverage is intentionally limited. The selective filter does not publish on every match. That is by design: fewer but more reliable predictions beats high volume with low signal.
  • The model is a work in progress. It provides a solid foundation for use, but it will keep evolving.

What comes next

The new architecture opens concrete avenues for further improvement:

Per-market calibration

Learn a dedicated calibration curve for each market (BTTS, Over 2.5, Under 2.5, etc.) so that a "70 % displayed" always corresponds to a "70 % observed".
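One simple way to learn such a curve is histogram binning: map each raw probability to the hit rate historically observed in its bin. This is only a sketch of the idea — isotonic regression or Platt scaling would be natural alternatives, and the article does not say which method Foresportia will adopt.

```python
def fit_histogram_calibrator(probs, outcomes, n_bins=10):
    """Learn a per-market map: raw probability -> observed hit rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append(y)
    # Empty bins fall back to the bin midpoint (identity mapping).
    table = [
        (sum(b) / len(b)) if b else (i + 0.5) / n_bins
        for i, b in enumerate(bins)
    ]

    def calibrate(p):
        return table[min(int(p * n_bins), n_bins - 1)]

    return calibrate
```

Fitted per market, such a map turns a raw "70 % displayed" into whatever that bin has actually delivered — for example, if past 60–70 % calls hit only 60 % of the time, the calibrator would display 60 % instead.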

League-level adjustments

Some leagues are structurally high-scoring (Eredivisie), others are tactically tight (Serie A). Tuning goal-engine parameters per league is a natural next step.

Automated quality monitoring

Integrate continuous tracking of Brier, ECE and hit rates by threshold directly into the production pipeline, so any quality drop on a market or league is flagged immediately.

Expanding to more markets

Next candidates: Over/Under 1.5, Over/Under 3.5, clean sheets and win-to-nil.

Conclusion

Goal markets are back on Foresportia — not because the match-outcome engine got better, but because we stopped making goals depend on it.

The recipe:

  • A dedicated goal engine with its own signals
  • A selective filter that only publishes when the signal is clear
  • Transparent evaluation with metrics you can verify

The model is not perfect. The sample is still young. But for the first time, when the system displays a high probability on BTTS or Over/Under 2.5, the observed hit rate follows. That is the foundation everything else can be built on.

Quick FAQ

Why were goal markets taken offline?

Because they were derived from the 1X2 engine. Improving match-outcome predictions inadvertently distorted goal estimates. The displayed probabilities no longer reflected reality.

What counts as a good Brier Score?

Lower is better. Below 0.25 (the coin-flip baseline) the model adds information. The current system sits at 0.236 for Over 2.5 and 0.248 for BTTS.

Why doesn't the filter cover every match?

Because the model does not have a reliable signal for every fixture. Rather than publishing a tepid default probability, we prefer showing nothing when the signal is too weak.

Will goal markets keep improving?

Yes. The current system is a solid but evolving foundation. We are working on per-league calibration, additional markets and automated quality monitoring.
