Glicko-2

Glicko-2 is a probabilistic rating system designed to quantify the strength of players in competitive environments where results are uncertain and players’ true strengths evolve over time. Developed as an improvement over the classic Elo framework, Glicko-2 adds a more dynamic treatment of uncertainty and change, making it a practical tool for both traditional board games and modern online competition. At its core, the system captures not just an overall rating, but also how confident we are in that rating and how much a player’s strength is expected to change as they compete.

The system was created by Mark E. Glickman as an evolution of the original Glicko framework. By introducing a volatility parameter in addition to the rating and rating deviation, Glicko-2 better models real-world performance: players can become significantly stronger or weaker over time, and the model adjusts accordingly rather than treating all players as if their skill were fixed.

Core concepts

  • Rating (mu): A numerical measure of a player’s estimated underlying strength. Higher values indicate stronger play.
  • Rating deviation (phi): An uncertainty measure around the rating. A larger phi means less confidence in the rating, often due to inactivity or limited data.
  • Volatility (sigma): A parameter that captures how much a player’s true strength is expected to change over time. Higher volatility signals greater potential improvement or decline.

These components work together to produce updated estimates after a set of games. The algorithm evaluates the expected score against each opponent and combines that information to update mu, phi, and sigma. The result is a rating that reflects both observed performance and the confidence in that estimate.
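
Although most platforms display ratings on the familiar 1500-centred scale, the Glicko-2 computation itself runs on an internal scale in which mu is centred at 0 and phi is the rating deviation divided by the constant 173.7178. The following Python sketch of the player state and the scale conversion is illustrative rather than taken from any particular library; the defaults (rating 1500, RD 350, volatility 0.06) follow the values Glickman's published description uses for unrated players.

    from dataclasses import dataclass

    GLICKO2_SCALE = 173.7178  # conversion constant between the display and internal scales

    @dataclass
    class Player:
        mu: float = 0.0                    # internal rating; 0 corresponds to a displayed 1500
        phi: float = 350 / GLICKO2_SCALE   # internal rating deviation; 350 is the unrated default
        sigma: float = 0.06                # volatility; 0.06 is a common starting value

    def to_display(p: Player) -> tuple[float, float]:
        # Convert the internal (mu, phi) pair to the familiar rating / RD scale.
        return 1500 + GLICKO2_SCALE * p.mu, GLICKO2_SCALE * p.phi

    def from_display(rating: float, rd: float, sigma: float = 0.06) -> Player:
        # Convert a displayed rating and RD to the internal Glicko-2 scale.
        return Player(mu=(rating - 1500) / GLICKO2_SCALE, phi=rd / GLICKO2_SCALE, sigma=sigma)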

How the update works

  • Expected score (E): For each game, the model computes the player’s expected score against the opponent (a win counts as 1, a draw as 0.5, a loss as 0), based on the player’s mu and the opponent’s mu and phi. This expectation is compared with the actual result.
  • Rating deviation and variance: The uncertainty in the outcomes is aggregated across all games in the rating period into an estimated variance (v), a measure of how much information the results provide about the player’s strength (these two steps are sketched immediately after this list).
  • Volatility adjustment: The sigma value is adjusted to reflect how much the player’s strength is believed to have changed since the last update. If results deviate from expectation by more than the current uncertainty can explain, volatility rises, prompting larger updates to mu.
  • New rating (mu', phi', sigma'): The combination of expected scores, variance, and volatility yields a new set of parameters that represent the player’s updated strength and associated uncertainty (the volatility adjustment and final update are sketched after the following paragraph).
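
The expected-score and variance steps can be written down directly from Glickman's formulas: g(phi) discounts games against opponents whose own ratings are uncertain, E is the expected score against a given opponent, v is the estimated variance of the rating implied by the game outcomes, and delta is the improvement suggested by the results alone. The Python sketch below is illustrative; the function names and the (mu_j, phi_j, score) tuple layout are choices made here, not part of a standard interface.

    import math

    def g(phi: float) -> float:
        # Weighting factor: a large opponent deviation shrinks g toward 0,
        # so games against uncertain opponents carry less information.
        return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

    def expected_score(mu: float, mu_j: float, phi_j: float) -> float:
        # Expected score against opponent j on the internal scale (draw = 0.5).
        return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))

    def variance_and_delta(mu: float, games: list[tuple[float, float, float]]) -> tuple[float, float]:
        # games: (opponent mu_j, opponent phi_j, score) with score in {0, 0.5, 1};
        # the list must be non-empty for the rating period.
        v_inv = 0.0
        delta_sum = 0.0
        for mu_j, phi_j, score in games:
            e = expected_score(mu, mu_j, phi_j)
            gj = g(phi_j)
            v_inv += gj ** 2 * e * (1.0 - e)
            delta_sum += gj * (score - e)
        v = 1.0 / v_inv
        return v, v * delta_sum   # (variance v, improvement estimate delta)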

The process is designed to be transparent in its logic while being robust to the noise inherent in competitive play. The time component matters: inactivity increases phi, signaling that the player’s strength is less certain without recent data, while active play tends to reduce uncertainty as more information accumulates.
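
The volatility step has no closed-form solution; Glickman's specification finds the new sigma by solving a one-dimensional equation with an Illinois-style bracketing iteration, then blends the result with the pre-period deviation to produce the new phi and mu. The sketch below is a hedged rendering of that procedure, taking v and delta as computed above; the system constant tau (defaulted here to 0.5, with values between roughly 0.3 and 1.2 being typical) and the convergence tolerance are configuration choices, not fixed by the algorithm.

    import math

    def update_player(mu: float, phi: float, sigma: float,
                      v: float, delta: float,
                      tau: float = 0.5, eps: float = 1e-6) -> tuple[float, float, float]:
        # One rating-period update on the internal scale, given the variance v and
        # improvement estimate delta computed from the period's games.
        a = math.log(sigma ** 2)

        def f(x: float) -> float:
            ex = math.exp(x)
            return (ex * (delta ** 2 - phi ** 2 - v - ex)
                    / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

        # Bracket the root of f, then converge with the Illinois variant of regula falsi.
        A = a
        if delta ** 2 > phi ** 2 + v:
            B = math.log(delta ** 2 - phi ** 2 - v)
        else:
            k = 1
            while f(a - k * tau) < 0.0:
                k += 1
            B = a - k * tau
        fA, fB = f(A), f(B)
        while abs(B - A) > eps:
            C = A + (A - B) * fA / (fB - fA)
            fC = f(C)
            if fC * fB < 0.0:
                A, fA = B, fB
            else:
                fA /= 2.0
            B, fB = C, fC

        sigma_new = math.exp(A / 2.0)                    # updated volatility
        phi_star = math.sqrt(phi ** 2 + sigma_new ** 2)  # deviation inflated by volatility
        phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
        mu_new = mu + phi_new ** 2 * (delta / v)         # delta / v is the summed score surplus
        return mu_new, phi_new, sigma_new

Chained with the earlier sketch, a full update for one rating period computes (v, delta) from the games with variance_and_delta and passes them to update_player together with the player's current mu, phi, and sigma; the returned triple is the state carried into the next period.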

Inactivity and time factors

Glicko-2 explicitly accounts for the time between results. If a player sits out for one or more rating periods, the system widens the rating deviation to reflect growing uncertainty about their current form. When the player returns to competition, the prior parameters and the new results are combined to produce fresh estimates. Because the widened deviation permits larger adjustments, a returning player’s rating catches up to their actual form quickly, and recent results carry appropriate weight relative to older achievements, so a player cannot coast indefinitely on past form.
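
For a rating period in which a player has no games, the specification keeps mu and sigma unchanged and only inflates the deviation, which is exactly the widening described above. A minimal sketch (the function name is illustrative):

    import math

    def skip_rating_period(phi: float, sigma: float) -> float:
        # No games this period: the rating and volatility stay put,
        # but uncertainty about the player's current form grows.
        return math.sqrt(phi ** 2 + sigma ** 2)

Applied over successive idle periods, this widening compounds, so a long-absent player returns with a large phi and their first new results move the rating substantially.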

Adoption and impact

  • Chess and other strategy games: Glicko-2 has been adopted by various competitive communities and online platforms seeking a more responsive and informative rating system than Elo alone. By modeling how skill drifts and changes, it provides a more nuanced picture of a player’s progression or decline.
  • Online gaming and e-sports: The system is well-suited to settings where players compete asynchronously and the pace of match results can vary, allowing ratings to adapt quickly to genuine changes in skill while maintaining comparability across players with different schedules.

These strengths align with a merit-based framework that rewards demonstrable improvement and consistent performance, while recognizing the realities of changing skill with time.

Controversies and debates

  • Transparency vs. complexity: Supporters argue that Glicko-2’s mathematics are straightforward and well-founded, offering a principled way to measure skill with explicit uncertainty. Critics point to the algorithm’s complexity and to parameter choices (such as the system constant that governs how quickly volatility can change) that users do not see in the same way as a simple wins-and-losses record. From a perspective that prioritizes accessible evaluation, the concern is that opaque mechanics could make players skeptical about how ratings are derived, even if the results are statistically sound.
  • Manipulation and gaming the system: Any rating system based on results can be subject to strategic behavior. Sandbagging (losing intentionally to depress one’s own rating to gain an eventual edge) or collusion among players to trade results can distort true strength signals. Proponents counter that a well-calibrated model, data auditing, and platform governance can reduce such distortions, while the core premise—rewarding proven performance—remains intact.
  • Activity bias and inflation risk: Critics worry that bursts of heavy play could push the volatility parameter upward, producing ratings that swing with short-term performance. Proponents argue that this responsiveness is precisely what makes the system useful in dynamic competition, ensuring the rating reflects current form rather than outdated achievements.
  • Access and fairness: Some observers worry that any rating metric, including Glicko-2, may embed biases related to access to competition, scheduling advantages, or disparities in opportunity. A practical stance is that a transparent, well-documented framework helps ensure fair competition by quantifying strength in a consistent, repeatable way, while acknowledging that no single metric perfectly captures all dimensions of skill.

From a pragmatic, market-oriented viewpoint, a robust rating system like Glicko-2 is valuable because it emphasizes measurable performance, continuous improvement, and competitive merit. Critics who focus on broader social critiques may argue for additional considerations beyond pure measurement, but the core advantage remains: a principled, quantitative method for tracking who performs best under pressure, subject to transparent updates as players compete over time.

See also