How to ‘beat’ xG using simpler quantities

Introduction

This week I want to test a variety of different metrics to compare their predictive power. I have always focused on the predictive, and I think providing value as an analyst is almost solely a function of predictive rather than descriptive analysis. For this reason I am less interested in a descriptive idea such as xG (a measure of how often a chance is scored) and more interested in what value a chance should be given today to best predict the success of the team tomorrow.

Most xG models use complex calculations to arrive at values for each shot. A lot of data is used such as shot type, distance from goal, distance from centre line and even parameters such as speed of attack or defensive coverage. This could present a problem for analysts – this data may not be readily available, or they may not have the expertise required to turn the data (e.g. X/Y coordinates) into information (xG value). In this post I am going to be comparing the predictive performance of some simpler shot count metrics with some xG metrics.

The metrics I will be comparing today are as follows:

Understat.com npxG difference (non penalty expected goal difference)

Understat.com xPts

(Expected points is how many points a team would have scored on average over the season so far, based on simulations of the xG value of their chances.)
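To make the idea of xPts concrete, here is a minimal sketch of how a single match's expected points could be derived from shot xG values. It is not Understat's implementation: I am assuming every shot is an independent event that is scored with probability equal to its xG, and I enumerate the scorelines exactly rather than running random simulations. The function names are my own.

```python
def goal_distribution(shot_xgs):
    """Return [P(0 goals), P(1 goal), ...] for a list of shot xG values.

    Assumption: each shot is an independent trial that is scored with
    probability equal to its xG.
    """
    dist = [1.0]  # before any shot, 0 goals is certain
    for p in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1.0 - p)   # shot missed
            nxt[goals + 1] += prob * p       # shot scored
        dist = nxt
    return dist


def expected_points(team_xgs, opp_xgs):
    """Expected league points (3 for a win, 1 for a draw) for one match."""
    team = goal_distribution(team_xgs)
    opp = goal_distribution(opp_xgs)
    win = draw = 0.0
    for g_team, p_team in enumerate(team):
        for g_opp, p_opp in enumerate(opp):
            if g_team > g_opp:
                win += p_team * p_opp
            elif g_team == g_opp:
                draw += p_team * p_opp
    return 3.0 * win + 1.0 * draw


# Hypothetical match: three chances for us, two for the opponent.
match_xpts = expected_points([0.4, 0.1, 0.05], [0.3, 0.2])
```

Summing this value over a team's matches gives a season xPts total in the spirit of the definition above.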

Total shot difference per game

(The difference of shots attempted and shots conceded)

Total shot difference per game (game-state adjusted)

(While leading, teams have a shot difference per 90 of -5 or worse, so while trailing they have a shot difference of +5 or better. This makes teams that spend a lot of time leading/trailing appear worse/better than their real level. I compare total minutes leading to minutes trailing for each team to adjust for this. For example, a team that has spent 5.5 more 90s leading than trailing in the first half of the season will on average have had their shot difference reduced by ~28 shots (5.5 × 5 ≈ 28) owing to game state. As this is due to game state and not their ability, I add 28 shots on to their total shot difference. I have written a few articles on game state on my blog, so check them out if you want to read more:)

https://syzygyanalytics.co.uk/2025/08/01/soccer-team-rating-iii-game-states-i/ (part 2 linked at the bottom)
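The game-state adjustment described above can be sketched in a few lines. This is a simplified illustration rather than my exact calculation: it assumes a flat effect of 5 shots per 90 minutes of net time spent leading, and the function and parameter names are made up for the example.

```python
def game_state_adjusted_shot_diff(shots_for, shots_against,
                                  minutes_leading, minutes_trailing,
                                  effect_per_90=5.0):
    """Add back the shots a team 'loses' by spending time in the lead.

    effect_per_90 is an assumed flat value: the shot difference swing per
    90 minutes spent leading rather than trailing.
    """
    raw_diff = shots_for - shots_against
    # Net number of full matches' worth of minutes spent leading.
    net_90s_leading = (minutes_leading - minutes_trailing) / 90.0
    # A team that mostly leads gets shots added; one that mostly trails
    # gets shots taken away (net_90s_leading is negative).
    return raw_diff + net_90s_leading * effect_per_90


# Hypothetical team: +50 raw shot difference, 900 minutes leading and
# 405 minutes trailing, i.e. 5.5 more 90s leading than trailing.
adjusted = game_state_adjusted_shot_diff(250, 200, 900, 405)
```

With these made-up inputs the adjustment is 5.5 × 5 = 27.5 shots, which is the ~28 shots from the example in the text. Divide by games played to get the per-game version.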

Total shot difference (game-state & big chance adjusted)

(A shot that Opta has tagged as a big chance counts double)

Total shot difference (game-state, big chance & goals adjusted)

(A goal is worth a bonus shot)
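The big chance and goal adjustments amount to a weighted shot count. Below is a rough sketch, assuming the shot total already includes big chances and goals, and that the bonuses stack (so a goal scored from a big chance counts as three shots). The class and function names are hypothetical, and the game-state term reuses an assumed flat 5 shots per 90.

```python
from dataclasses import dataclass


@dataclass
class ShotTotals:
    shots: int        # all shots, including big chances and goals
    big_chances: int  # shots tagged as big chances
    goals: int        # goals scored


def weighted_shots(totals):
    """Each shot counts once, a big chance counts double, a goal earns a bonus shot."""
    return totals.shots + totals.big_chances + totals.goals


def fully_adjusted_shot_diff(team, opponents, net_90s_leading=0.0,
                             effect_per_90=5.0):
    """Weighted shot difference plus the (assumed flat) game-state correction."""
    diff = weighted_shots(team) - weighted_shots(opponents)
    return diff + net_90s_leading * effect_per_90


# Hypothetical half-season totals for one team and its opponents.
team = ShotTotals(shots=250, big_chances=40, goals=30)
opponents = ShotTotals(shots=200, big_chances=25, goals=20)
rating = fully_adjusted_shot_diff(team, opponents, net_90s_leading=5.5)
```

Setting `net_90s_leading=0.0` gives the version without the game-state adjustment, which makes it easy to test each adjustment on its own.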

Market implied team ability

(The betting odds at kick off are potentially an excellent measure of every team’s ability as a lot of smart money has shaped these markets. This makes the team ratings we can derive from betting odds an interesting metric for comparison)
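As a starting point for anyone who wants to work with odds themselves, the sketch below converts decimal 1X2 odds into implied probabilities by removing the bookmaker margin proportionally. This is only the first step and is an assumption on my part for illustration: it is one common way of removing the margin, not necessarily how the ratings in this post were derived, and the function name is made up.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities that sum to 1.

    Assumption: the bookmaker margin is spread proportionally across the
    three outcomes, so dividing by the total removes it.
    """
    odds = (home_odds, draw_odds, away_odds)
    if any(o <= 1.0 for o in odds):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = [1.0 / o for o in odds]
    total = sum(raw)  # greater than 1 because of the margin
    return tuple(p / total for p in raw)


# Hypothetical kick-off odds for home win, draw, away win.
p_home, p_draw, p_away = implied_probabilities(2.0, 3.5, 4.0)
```

Turning these match probabilities into a single rating per team is a further modelling step that is not covered here.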

My own team rating metric

(I have developed this over about 10 years and include it for the sake of comparison.)

***Thanks to Joseph Buchdahl at football-data.co.uk for the betting market odds, WhoScored for shot data and understat.com for xG data.***

I will be measuring the performance of the above 8 metrics in 2 different ways:

How well does performance in the 1st half of the season predict goal difference in the 2nd half of the season?
How well does performance in the 1st half of the season predict betting odds at kick-off in the 2nd half of the season?
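For anyone wanting to run the same kind of test on their own metric, the comparison boils down to an R² between a first-half value and a second-half outcome for each team. Here is a minimal sketch using the squared Pearson correlation; the function name and the numbers in the example are placeholders, not real data.

```python
def r_squared(first_half_metric, second_half_outcome):
    """Squared Pearson correlation between two equal-length lists (one value per team)."""
    n = len(first_half_metric)
    if n != len(second_half_outcome) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 teams")
    mean_x = sum(first_half_metric) / n
    mean_y = sum(second_half_outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_half_metric, second_half_outcome))
    sxx = sum((x - mean_x) ** 2 for x in first_half_metric)
    syy = sum((y - mean_y) ** 2 for y in second_half_outcome)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Placeholder values for four teams: first-half shot difference per game
# and second-half goal difference.
example = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Running this once per metric, against second-half goal difference and then against second-half market ratings, gives one R² per metric for each of the two tests.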

Results

Figure 1

Figure 2

Interpreting the results

Figure 1 shows the R² value when each metric for the 1st half of the season is correlated with each team’s goal difference in the 2nd half of the season. Figure 2 shows the same for each team’s market-derived rating in the 2nd half of the season.

xPts and npxGD hold similar value. Personally, I slightly prefer npxGD as every chance is credited with its xG value. xPts counts similar chances differently, because a chance created in a close game is worth more than a chance created in a game where one team is dominant (think about how many xPts a 0.5xG chance is worth in a game with an xG total of 3.0xG – 0.2xG vs. a game that’s 0.9xG – 0.9xG). This is potentially a complex point and not the topic of today’s article.
It is up to you whether you consider Figure 1 (goal difference) or Figure 2 (market-derived ratings) a better indicator of team strength. I think goal difference is more susceptible to randomness but it’s also more authentic.
As the 1st half metrics are based on 17-19 games (quite a big sample), we see goal difference alone predict to a respectable level.
Each adjustment we make to the raw shot difference is valuable, although each extra adjustment may have diminishing returns because the adjustments overlap: playing in leading game states makes it easier to create a higher proportion of big chances, and more big chances mean more goals. I tried making the adjustments in a different order and found the game-state adjustment was the best single adjustment, followed by big chances, then goals.
All 3 adjustments combined outperform the Understat xG metric.
My metric shows what extra is still possible (although its calculation complexity is on a par with xG; for example, it includes shot location).

Conclusion

The conclusion here is fairly clear: you can make a very solid metric with simple quantities like shots, goals, big chances and minutes leading/trailing. This test was based on a sample of 17-19 games. If we were to use a smaller sample, I would expect xG to fare even worse against these shot count metrics. I think xG is a fairly disappointing performer in terms of complexity vs. performance, and arguably even in outright performance.

Please send feedback if you found this interesting today, have any questions or any requests on what content you’d like to see!
