Developing A Sport Trading Strategy Part 3: Exposing value bets with Poisson Distribution

In the last post, we worked out the 'expected' result of a game using averages. In this post we are going to take those averages and feed them into the Poisson Distribution equation to get the probability of different scorelines occurring. This is a great way of finding value bets, where the odds on offer don't match the probable outcome, so even if you're not interested in developing a sports trading strategy, you should still take a look at this post and the last one.

So carrying on from the last post, we worked out that a game between Hoffenheim and Wolfsburg had a scoring average of:
Hoffenheim = 2.000 (Home)
Wolfsburg = 0.411 (Away)
Resulting in a scoreline of 2-0
If this is the average outcome, then what is the likelihood of other scores?
In the last post I said we would look at Poisson Distribution, but we ended up spending more time than I expected looking at how to get the 'likely' scoreline of a game using historical data. This time we have a whole post focusing on Poisson Distribution.

Poisson Distribution

First, it's a good idea to understand what Poisson Distribution is used for. Poisson Distribution helps us predict the probability of certain events happening when we know how often, on average, the event has occurred. Essentially, it gives us the probability of a given number of events happening.

How can we apply this to football?

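Well, this is what we were working towards in the last post. In order to use this system you need to know what the 'expected' number of goals is. So looking at the equation again:
p(k; λ) = (λ^k × e^(-λ)) / k!
Here k is the number of goals we want the probability for, λ is the expected number of goals, and e is Euler's number (roughly 2.718).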
We can now use this to get the implied probability of each goal count for each team. So starting with Hoffenheim:
λ = 2.000 goals in 90 minutes
Working through each goal count from 0 to 4 gives:
0 goals p(k = 0) = 9.01%
1 goal  p(k = 1) = 21.69%
2 goals p(k = 2) = 26.10%
3 goals p(k = 3) = 20.94%
4 goals p(k = 4) = 12.60%
I would stop at 4, but you can continue if you wish. Remember, more goals means you cover more of the book. Then we do the same for the away team, changing the lambda to represent the away team's number of goals, like so:
λ = 0.411 goals in 90 minutes
0 goals p(k = 0) = 55.73%
1 goal  p(k = 1) = 32.58%
2 goals p(k = 2) = 9.53%
3 goals p(k = 3) = 1.86%
4 goals p(k = 4) = 0.27%
There's a pretty strong chance Wolfsburg will score 0 goals, with 1 goal a strong second favourite. Hoffenheim, however, has a fairly even spread between 1, 2 and 3 goals, with 2 edging it by a few percent.
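
If you would rather script this than work it out by hand, here is a minimal Python sketch of the Poisson calculation. The function names are made up for illustration, and the lambdas are the 2.000 and 0.411 averages quoted above, assumed to be expected goals per 90 minutes. One thing to be aware of: the percentages listed in this post back-solve to somewhat different lambdas (roughly 2.41 and 0.58, going by the 0-goal figures), so the printed values will not match the lists digit for digit.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when lam goals are expected."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def goal_distribution(lam: float, max_goals: int = 4) -> list[float]:
    """Probabilities for 0, 1, ..., max_goals goals."""
    return [poisson_pmf(k, lam) for k in range(max_goals + 1)]


# Lambdas as quoted in the post (assumed: expected goals per 90 minutes).
for team, lam in [("Hoffenheim", 2.000), ("Wolfsburg", 0.411)]:
    print(team)
    for goals, p in enumerate(goal_distribution(lam)):
        print(f"  {goals} goals: {p:.2%}")
```

Swapping in a different lambda is just a matter of changing the numbers in the list at the bottom.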

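Using this information we can build a matrix to determine the implied probability of the correct score.

Correct Score Probability (columns are Hoffenheim's goals, rows are Wolfsburg's goals; the figure in brackets is that team's own probability of scoring that many)

| Wolfsburg \ Hoffenheim | 0 (9.01%) | 1 (21.69%) | 2 (26.10%) | 3 (20.94%) | 4 (12.60%) | 5 (6.06%) |
|---|---|---|---|---|---|---|
| 0 (55.73%) | 5.02% | 12.08% | 14.54% | 11.67% | 7.02% | 3.38% |
| 1 (32.58%) | 2.94% | 7.07% | 8.50% | 6.82% | 4.10% | 1.98% |
| 2 (9.53%) | 0.86% | 2.07% | 2.49% | 1.99% | 1.20% | 0.58% |
| 3 (1.86%) | 0.17% | 0.40% | 0.48% | 0.39% | 0.23% | 0.11% |
| 4 (0.27%) | 0.02% | 0.06% | 0.07% | 0.06% | 0.03% | 0.02% |
| 5 (0.03%) | 0.00% | 0.01% | 0.01% | 0.01% | 0.00% | 0.00% |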

We work out the rest of the correct score combinations by multiplying Wolfsburg's probability for their number of goals by Hoffenheim's probability for theirs. For example, 1-0 would be:

Let h be Hoffenheim's probability
h = p(k = 1) = 21.69%

Let w be Wolfsburg's probability
w = p(k = 0) = 55.73%
h * w = 0.12
Converted to percentage = 12.08%.

Therefore there is an implied probability of 12.08% that this scoreline will occur based on historical data.
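
The multiplication can be sketched in Python as below. This is only an illustration: the two lists are the per-team probabilities copied from this post (rounded to two decimal places, so results can differ from the matrix in the last digit), the function name is made up, and multiplying the two numbers treats each team's goal count as independent of the other's.

```python
# Per-team goal probabilities copied from the post, as fractions (0 to 5 goals).
hoffenheim = [0.0901, 0.2169, 0.2610, 0.2094, 0.1260, 0.0606]
wolfsburg = [0.5573, 0.3258, 0.0953, 0.0186, 0.0027, 0.0003]


def score_matrix(home, away):
    """matrix[a][h] = P(home scores h) * P(away scores a).

    Assumes the two teams' goal counts are independent.
    """
    return [[p_home * p_away for p_home in home] for p_away in away]


matrix = score_matrix(hoffenheim, wolfsburg)

# Row = Wolfsburg goals, column = Hoffenheim goals, so 1-0 is matrix[0][1].
print(f"Hoffenheim 1-0 Wolfsburg: {matrix[0][1]:.2%}")
print(f"Hoffenheim 2-0 Wolfsburg: {matrix[0][2]:.2%}")
```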

These correct scores cover more than 90% of the book; we determine this by summing all of the scores represented in the matrix. The rest of the book is spread over higher score combinations like 6-0, 0-6, 6-1, 1-6, 6-2 etc.
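
As a quick check on the coverage, the sum can be sketched like this. The probabilities are the rounded per-team figures from this post and the function name is hypothetical. Using those figures, scores up to 4-4 come to about 90.3% of the book and scores up to 5-5 to about 96.4%.

```python
# Per-team goal probabilities copied from the post, as fractions (0 to 5 goals).
hoffenheim = [0.0901, 0.2169, 0.2610, 0.2094, 0.1260, 0.0606]
wolfsburg = [0.5573, 0.3258, 0.0953, 0.0186, 0.0027, 0.0003]


def coverage(home, away, max_goals):
    """Total probability of every scoreline where neither side exceeds max_goals."""
    return sum(
        p_home * p_away
        for p_home in home[: max_goals + 1]
        for p_away in away[: max_goals + 1]
    )


print(f"Scores up to 4-4: {coverage(hoffenheim, wolfsburg, 4):.1%}")
print(f"Scores up to 5-5: {coverage(hoffenheim, wolfsburg, 5):.1%}")
```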

Note that some of the probabilities show as 0.00%. In reality these are around 0.003%; I only show two decimal places, as it's too many numbers for my eyes otherwise.

We can now use this implied probability to determine the odds of the correct score (without the overround, which was described in a previous post).

We can do this by using the equation:
Let d be the decimal odds and p the probability
d = 1 / p
In less nerdy terms, the decimal odds equal 1 divided by the probability, which is the percentage shown in the matrix above written as a fraction (so 12.08% becomes 0.1208). Applied to all outcomes, we get the following decimal odds table:

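Correct Score Decimal Odds (columns are Hoffenheim's goals, rows are Wolfsburg's goals; the figure in brackets is the odds of that team scoring that many)

| Wolfsburg \ Hoffenheim | 0 (11.10) | 1 (4.61) | 2 (3.83) | 3 (4.78) | 4 (7.94) |
|---|---|---|---|---|---|
| 0 (1.79) | 19.92 | 8.27 | 6.88 | 8.57 | 14.24 |
| 1 (3.07) | 34.06 | 14.15 | 11.76 | 14.66 | 24.36 |
| 2 (10.50) | 116.53 | 48.41 | 40.23 | 50.14 | 83.33 |
| 3 (53.87) | 597.91 | 248.42 | 206.42 | 257.29 | 427.59 |
| 4 (368.54) | 1000+ | 1000+ | 1000+ | 1000+ | 1000+ |

The code driving my table replaces any odds over 1000 with 1000+, as I'm not interested in them.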
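
The conversion from probability to decimal odds, including the cut-off at 1000, might look something like the sketch below. This is not the actual code behind the table, just an illustration with made-up function names, using two probabilities taken from the matrix earlier in the post.

```python
def decimal_odds(p: float) -> float:
    """Fair decimal odds for a probability given as a fraction (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be greater than 0 and at most 1")
    return 1 / p


def format_odds(p: float, cap: float = 1000) -> str:
    """Odds to two decimal places, with anything above the cap shown as '1000+'."""
    odds = decimal_odds(p)
    return f"{cap:g}+" if odds > cap else f"{odds:.2f}"


print(format_odds(0.1454))  # Hoffenheim 2-0 Wolfsburg, 14.54% in the matrix
print(format_odds(0.0002))  # Hoffenheim 0-4 Wolfsburg, 0.02% in the matrix
```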

We can see that the most probable scoreline is Hoffenheim 2, Wolfsburg 0. We can compare these odds to the bookmakers' odds to find out whether a particular scoreline is under or over valued, giving us the ability to find value bets on the scoreline. However, keep in mind we have not factored in the bookmaker's overround.
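
One common way to make that comparison concrete is to work out the expected profit per unit staked: the probability multiplied by the bookmaker's decimal odds, minus 1. A positive number suggests the price is higher than our fair odds. The sketch below assumes that approach; the function names and the two bookmaker prices are hypothetical and purely for illustration, not real quotes.

```python
def expected_value(p: float, bookmaker_odds: float) -> float:
    """Expected profit per 1 unit staked at the given decimal odds."""
    return p * bookmaker_odds - 1


def is_value_bet(p: float, bookmaker_odds: float) -> bool:
    """True when the bookmaker's price is higher than our fair odds of 1 / p."""
    return expected_value(p, bookmaker_odds) > 0


p_two_nil = 0.1454  # Hoffenheim 2-0 Wolfsburg, from the probability matrix

# Hypothetical bookmaker prices, made up for this example.
for price in (6.00, 8.00):
    ev = expected_value(p_two_nil, price)
    print(f"price {price:.2f}: EV per unit {ev:+.3f}, value bet: {is_value_bet(p_two_nil, price)}")
```

Remember that a real bookmaker price already includes the overround, so the bar for a genuine value bet is higher than it looks here.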

This is great: we now have enough information to determine the overs/unders market, the 1x2 market and a whole heap of other markets.
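
As a rough idea of how those markets fall out of the correct score matrix, the sketch below adds up the cells for a home win, draw and away win, and for over/under a goal line. The probabilities are the rounded per-team figures from this post, the function name is made up, and the 2.5 goal line is just an assumed example. Because the matrix stops at 5 goals each, the totals come to a little under 100%.

```python
# Per-team goal probabilities copied from the post, as fractions (0 to 5 goals).
hoffenheim = [0.0901, 0.2169, 0.2610, 0.2094, 0.1260, 0.0606]
wolfsburg = [0.5573, 0.3258, 0.0953, 0.0186, 0.0027, 0.0003]


def market_probabilities(home, away, line=2.5):
    """Sum correct score probabilities into 1x2 and over/under totals."""
    totals = {"home": 0.0, "draw": 0.0, "away": 0.0, "over": 0.0, "under": 0.0}
    for h, p_home in enumerate(home):
        for a, p_away in enumerate(away):
            p = p_home * p_away  # probability of the scoreline h-a
            if h > a:
                totals["home"] += p
            elif h == a:
                totals["draw"] += p
            else:
                totals["away"] += p
            totals["over" if h + a > line else "under"] += p
    return totals


markets = market_probabilities(hoffenheim, wolfsburg)
for name, p in markets.items():
    print(f"{name}: {p:.2%}")
```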

What's next?

I hope this has been useful. One additional tip: Google Sheets and Excel both have a Poisson function. Provided you give it the right variables (as described in the previous posts), it will do a lot of the leg work for you.

Right then. So we have our strategy, and we have a spreadsheet backed up by historical data to get unbiased probabilities (which could be tweaked to factor in team form, play style, weather, foul data etc., but let's save that for another post). Now we need to backtest our strategy, firstly by testing the unders market against old data, then by dry running against real data to get more accurate test results. We can then analyse this information to tweak entry and exit points, as well as determine the feasibility of the bank management and limitation processes we've put in place. Till next time!