Quick summary
A threshold is a simple rule: only matches whose probability exceeds a given value (e.g. 60%) are retained. Lowering the threshold increases coverage (more matches), but usually increases noise. Raising the threshold reduces volume but tends to improve accuracy. The goal is not to “guess”, but to choose a coherent trade-off.
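As a minimal sketch of that rule, the filter below keeps only the predictions whose probability is strictly above the threshold. The `Prediction` type, the `select_matches` helper and the sample values are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    match_id: str
    probability: float  # model probability of the predicted outcome, in [0, 1]
    correct: bool       # whether the predicted outcome actually occurred


def select_matches(predictions, threshold):
    """Keep only predictions whose probability strictly exceeds the threshold."""
    return [p for p in predictions if p.probability > threshold]


# Illustrative data, not real results.
sample = [
    Prediction("m1", 0.52, False),
    Prediction("m2", 0.61, True),
    Prediction("m3", 0.67, True),
    Prediction("m4", 0.58, False),
    Prediction("m5", 0.73, True),
]

kept_55 = select_matches(sample, 0.55)
kept_65 = select_matches(sample, 0.65)
print(len(kept_55), len(kept_65))  # 4 2
```

Raising the threshold from 55% to 65% halves the number of retained matches in this toy sample, which is the coverage side of the trade-off.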
Coverage vs accuracy: the real dilemma
A common belief is that selecting only the highest probabilities is enough. In practice, raising the threshold keeps the matches the model considers clearest, but it also shrinks the sample: there are fewer opportunities, and the measured accuracy fluctuates more.
- Coverage: number of matches kept after filtering.
- Accuracy: proportion of correct outcomes among selected matches.
- Variance: degree of fluctuation due to small sample sizes.
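The three quantities above can be computed together for a given threshold. The sketch below is one possible reading: it assumes "variance" can be approximated by the binomial standard error of the observed accuracy, which is a choice made here and not something the definitions prescribe. The function name and sample data are hypothetical.

```python
import math


def selection_metrics(probabilities, outcomes, threshold):
    """Return coverage, accuracy and a rough standard error for one threshold.

    probabilities: model probabilities in [0, 1]
    outcomes: booleans, True when the predicted outcome occurred
    """
    kept = [ok for p, ok in zip(probabilities, outcomes) if p > threshold]
    n = len(kept)
    if n == 0:
        return {"coverage": 0, "accuracy": None, "std_error": None}
    accuracy = sum(kept) / n
    # Binomial standard error (an assumption of this sketch): it shrinks
    # roughly as 1/sqrt(n), so small samples fluctuate more.
    std_error = math.sqrt(accuracy * (1 - accuracy) / n)
    return {"coverage": n, "accuracy": accuracy, "std_error": std_error}


# Illustrative data, not real results.
probs = [0.52, 0.61, 0.67, 0.58, 0.73, 0.66, 0.57, 0.70]
hits = [False, True, True, False, True, False, True, True]

low = selection_metrics(probs, hits, 0.55)
high = selection_metrics(probs, hits, 0.65)
print(low["coverage"], high["coverage"])  # 7 4
```

With only four matches above 65%, the standard error is about 0.22, so an observed accuracy of 75% says very little on its own.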
A simple five-step method
- Define the scope: by league or globally, over a sufficiently long period.
- Test multiple thresholds (50%, 55%, 60%, 65%, 70%).
- Compare curves: where does coverage collapse? where do gains flatten?
- Stabilize: choose a threshold that holds over time, not a short spike.
- Adapt by league: identical thresholds do not behave the same everywhere.
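The steps above can be sketched as a small sweep: group the history by league, apply each candidate threshold, and record how many matches remain and how accurate they were. The record layout, league names and numbers are hypothetical, and a real evaluation would need far more matches per league than this toy data.

```python
from collections import defaultdict

THRESHOLDS = (0.50, 0.55, 0.60, 0.65, 0.70)


def sweep_thresholds(records, thresholds=THRESHOLDS):
    """records: iterable of (league, probability, correct) tuples.

    Returns {league: [(threshold, kept, accuracy_or_None), ...]}.
    """
    by_league = defaultdict(list)
    for league, probability, correct in records:
        by_league[league].append((probability, correct))

    table = {}
    for league, rows in by_league.items():
        curve = []
        for t in thresholds:
            kept = [ok for p, ok in rows if p > t]
            # Accuracy is undefined when nothing passes the threshold.
            accuracy = sum(kept) / len(kept) if kept else None
            curve.append((t, len(kept), accuracy))
        table[league] = curve
    return table


# Illustrative data, not real results.
records = [
    ("league_a", 0.53, False),
    ("league_a", 0.58, True),
    ("league_a", 0.63, True),
    ("league_a", 0.72, True),
    ("league_b", 0.56, False),
    ("league_b", 0.62, False),
    ("league_b", 0.68, True),
]

table = sweep_thresholds(records)
for league, curve in table.items():
    for threshold, kept, accuracy in curve:
        print(league, threshold, kept, accuracy)
```

Reading the rows for each league side by side shows where coverage collapses (here `league_b` has no matches left above 70%) and where accuracy stops improving.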
To understand why probability reliability matters, see probability calibration.
Practical rules
- Avoid very high thresholds if you want representative samples: too few matches pass the filter.
- Avoid very low thresholds if you seek stability.
- Always consider both success rate and match count.
- Do not search for a universal threshold: leagues differ structurally.
Concrete scenarios
Moving from a 55% to a 65% threshold typically reduces match volume while improving accuracy. The key question is whether the gain justifies the loss of coverage.
- Learning / exploration: moderate threshold.
- Selective analysis: higher threshold, fewer matches.
- Multi-league context: league-specific thresholds.
A useful safeguard is the confidence index.
Checklist before fixing a threshold
- Is the sample large enough that the threshold is not overfitted to noise?
- Is the threshold stable across months or seasons?
- Are thresholds adapted by league?
- Is uncertainty clearly visible (form, absences, weak signals)?
- Is the meaning of “60%” properly documented as an expected frequency?
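The last item can be checked empirically. If "60%" is meant as an expected frequency, then matches predicted at around 60% should succeed about 60% of the time. The sketch below groups predictions into probability bins and compares the mean predicted probability with the observed frequency. The bin edges, function name and sample data are assumptions made for illustration.

```python
from bisect import bisect_right

BIN_EDGES = (0.5, 0.6, 0.7, 0.8, 0.9)  # lower edges; the last bin runs to 1.0


def reliability_table(probabilities, outcomes, edges=BIN_EDGES):
    """Return (bin_lower_edge, count, mean_predicted, observed_frequency) rows."""
    bins = {}
    for p, ok in zip(probabilities, outcomes):
        index = bisect_right(edges, p) - 1
        if index < 0:
            continue  # below the first edge: ignored in this sketch
        bins.setdefault(index, []).append((p, ok))

    rows = []
    for index in sorted(bins):
        members = bins[index]
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(ok for _, ok in members) / len(members)
        rows.append((edges[index], len(members), mean_p, observed))
    return rows


# Illustrative data, not real results.
probs = [0.62, 0.64, 0.66, 0.68, 0.60, 0.72, 0.78]
hits = [True, False, True, True, False, True, True]

rows = reliability_table(probs, hits)
for lower, count, mean_p, observed in rows:
    print(f"{lower:.1f}+  n={count}  predicted={mean_p:.2f}  observed={observed:.2f}")
```

A large gap between the predicted and observed columns in a well-populated bin suggests the probabilities cannot be read as frequencies there. With only a handful of matches per bin, as in this toy data, the comparison is too noisy to conclude anything.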
Conclusion
Choosing a threshold is not about finding a magic number. It is about defining a consistent balance between volume and stability, ideally league by league, and maintaining it over time.