Conditional Logistic Regression for Traders

Dr. Silverman introduces logistic regression modelling using the example of horse racing and discusses the benefits and applications of conditional logistic regression.

Categories: Horse Racing, Prices, Professional, Statistical models, STN Journal, The basics

Logistic regression (or the “logit model”) is a fundamental tool for modelling and predicting the outcome of 0/1 events (win or lose).  In this article, I’m going to take things a bit further and explain how the conditional logistic regression model applies to contests where there can be more than two outcomes (e.g. a horse race).

First, I’ll quickly review linear regression, move to a logistic regression, and then finally cover conditional logistic regression.  We’ll use a fictitious horse race for all our examples here.


All models start with some assumptions and beliefs about how the world works.  These assumptions will become important later, but we’re going to skip them for now.  At a basic level, we’re trying to predict something based on data.  The thing we’re trying to predict is formally called the “dependent variable”, and the data we’re using to make that prediction are called the “independent variables” or “factors”.  In traditional statistical notation, “Y” represents the dependent variable and “X” represents the factors.  So, we want to learn how Y is related to X.  More formally, we want to know how much of the variance in Y can be explained by the variance in X.  Variance is a VERY, VERY important concept, but beyond the scope of this article; I’ll address it in the future.  Also, X can represent a single variable or a matrix of hundreds of variables.  For this article, in the interest of simplicity, we’ll just use one.

Simple Linear Models

As the name implies, linear models assume that the relationship between Y and X is linear.  The notation we use is:

 Y = XB + e

The B in this formula represents the weight of X (formally called the “coefficient”).  In simple terms, B answers, “How much does a change in X cause a change in Y?”  We then use some relatively simple math (built into Excel, R, Matlab, etc.) to solve for the best B in our model.  This is a common theme in all regression: a model is defined, and then we use mathematical or computing techniques to find the best weights for the model.  Often, different factors, transformations of the data, or model structures are tried to find the one that best fits the empirical data.  One important note, and more of an advanced topic, is that no model fits the data perfectly.  What we are estimating is known as “BLUE”: the Best Linear Unbiased Estimator.

The plot below illustrates this perfectly.  The red dots are the data and the black line is the best linear estimator.  Even someone with no math background can quickly see that this nicely represents the relationship between X and Y.  However, notice that the line doesn’t pass through many of the red points.  So, while the line represents the relationship well, it is actually wrong for any individual point.  (This is what the e at the end of the equation represents: the wrongness or “noise”.)  The amount of wrongness will become a very important factor in predictions of future events, and is something I’ll delve into in a future article.

[Graph 1: data points (red) with the best-fit linear estimator (black line)]
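As a minimal sketch (my illustration, not part of the original article), assuming made-up data where the true weight is 2.0 and the intercept is 1.0, solving for the best B by least squares looks like this in Python:

```python
import numpy as np

# Made-up data: one factor x and a noisy linear response y.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=x.size)  # true intercept 1.0, true B 2.0

# Solve for the best weights by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])  # add an intercept column to X
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)  # roughly [1.0, 2.0]: the fit recovers the true weights, up to noise
```

The residuals y - X @ b are the e in the equation: the fitted line is right on average but wrong for any individual point.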

Logistic Model

Linear models are fine when you want to predict something numeric and continuous such as speed, time, weight, etc.  However, they don’t work well when you want to look at phenomena that have a binary outcome such as: win/lose, live/die, complete/fail, etc.  With binary outcome events, what we are most interested in is the probability of an event happening given the data.  Something called an “inverse logit” can represent this relationship well.  I’m going to skip the formal derivation and math here, but a quick Google search will provide more than you want to know.  The form of logistic regression, using the same nomenclature as above is:

P(Y = 1 | X) = e^(XB) / (1 + e^(XB))   (Equation 2)

This will give us a smooth curve, demonstrating how the probability of Y happening is a function of X.  The plot below demonstrates how the logistic model fits the data.  Notice that some points are outside of the curve.  That is another example of “wrongness” that all models have.

[Graph 2: binary outcome data with the fitted logistic curve]
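A minimal sketch of the inverse logit itself (my own illustration, not the article’s code): it squashes the linear part XB, which can be any real number, into a probability between 0 and 1.

```python
import math

def inv_logit(z):
    """Inverse logit: maps any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative XB gives a probability near 0; large positive, near 1.
for z in (-4, -1, 0, 1, 4):
    print(z, round(inv_logit(z), 3))  # -4 -> 0.018, 0 -> 0.5, 4 -> 0.982
```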

Conditional Logistic Regression

Finally, we’re at the point of this article.  Hopefully, you have a general understanding of regression models by now.

One area I’ve studied a lot is that of horse racing.  I’ve modeled horse races using a number of advanced methods, but the fundamental structure remains the same.  What we ultimately want to know is the probability of a horse winning a race.  If the public has mispriced that horse, then we have a betting opportunity with positive expected value.

A subtle but critical distinction needs to be made.  We don’t care about the “probability of the horse winning”; we care about the “probability of the horse winning THIS race”.  Since this is a horse race, we have to estimate each horse’s probability of winning relative to all the other horses in the race, and that probability depends on all the other horses’ performances as well.  For example, if I race my neighbor down the street, there is a 90% chance that I’ll win.  If I race Usain Bolt, there is a 0.00001% chance that I’ll win.  So, winning is relative to the other competitors.

This is where conditional logistic regression (CLR) comes in.  The “conditional” part means that winning probabilities are conditional on the set of competitors in the race.  Additionally, to follow the laws of probability, all the probabilities for a race must sum to 1.0.

Transforming a list of values so that they sum to 1.0 is a trivial mathematical operation: just divide each by the sum.  For example, take 1, 2, 3, 4, 5 – divide each number by their sum, 15, and you get 0.067, 0.133, 0.200, 0.267, 0.333.  However, this will NOT let us learn the best factor weights.  To do that, we need a formal statistical model that we can fit using correct mathematical techniques.  The equation is:

P(horse i wins) = e^(X_i B) / Σ_j e^(X_j B),  summing over all horses j in the race   (Equation 3)

Each horse has a “strength”, represented by the exponential of the linear function (the top half of the fraction).  Each strength is then divided by the sum of the strengths of all the horses in the same race (the bottom half of the fraction).  Looking closely, it is easy to see that this is similar to the toy example I provided above.  The tricky part is learning the weights.  There is no closed-form analytical solution, so an iterative technique, often gradient descent, is used to find the best weights.  Some software packages will handle this well for basic models with a reasonable number of factors.  Fancier varieties of this model will require custom computer code to be written.  (I use C++ and GPU parallel computing to fit this general form with 186 factors over 40,000 races.)
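As an illustrative sketch (made-up numbers, not the article’s model), here is that fraction in code for a hypothetical four-horse race with two factors and hand-picked weights; finding the best weights is the iterative part described above.

```python
import numpy as np

def win_probabilities(X, b):
    """CLR win probabilities for one race.
    X: (n_horses, n_factors) factor matrix; b: factor weights."""
    strength = np.exp(X @ b)          # e^(X_i B): the top half of the fraction
    return strength / strength.sum()  # divided by the summed strengths: the bottom half

# Made-up four-horse race with two factors per horse, and hand-picked weights.
X = np.array([[0.9, 1.2],
              [0.4, 0.8],
              [1.1, 0.3],
              [0.2, 0.5]])
b = np.array([1.0, 0.5])

p = win_probabilities(X, b)
print(p.round(3), p.sum())  # the four probabilities sum to 1 (up to floating point)
```

In a real fit, b would be chosen to maximize the summed log-probability of the actual winners across many races, typically by gradient descent.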


While brief, this article demonstrated the rationale for both logistic regression and conditional logistic regression.  The goal was not to create working models or explain model-fitting procedures, but to give you a general understanding of the three models and when to apply them.  For events with a single binary outcome, use logistic regression.  For contests where one of several competitors wins, use conditional logistic regression.

In future articles, I’ll discuss variable screening, transformation, prediction variance, and a host of other tools needed to properly fit a predictive model.

Join the discussion




  • #835

    Hi Noah,

    This is a great article. I was just wondering how this can be used in handicapping basketball with an in-play model, where the handicap changes every time points are scored, relative to the time left and the pre-match handicap.



  • #836

    Thanks for the article – it is something I have wrestled with for some time.

    For horse racing, I presume you are putting horses that finish 2nd, 3rd, 4th etc. in separate strata (case controls?).
    But what do you put as the Y value? (1 for the winner and zero for the rest?)
    How do you deal with a different number of runners in each race (just take the first 6 finishers, say)?

    I have philosophical reservations about calling the winner of a race Y=1.
    The horse with the best past data (the favourite) only wins 3 times in 10.
    If the race were to be imagined as run 10 times, the winner would be different in 7 of those imaginary races.
    Can you give the real race winner a Y of 0.3?

    If you use 186 factors then there must be huge inter-correlation errors as there may be only about 5 truly independent race factors that count.

    I am asking from a position of not knowing any answers – just unresolved questions.

    Best wishes,

  • #837


    I don’t think that a CLR model would be your best choice for Basketball. Generally, for in-play predictions, I would look at a Bayesian model of either points scored or score differential.

    Perhaps that would make a good topic for a future article.

  • #838


    1) You’re confusing the strata direction. Each *race* is a stratum. Remember, with a CLR model, the probabilities within each stratum must sum to 1.

    2) The goal of a CLR is to accurately model win probabilities. So, the winner of a race is Y=1, and the losers of the race are Y=0. We are trying to learn the proper weighting of factors to best estimate the probability of Y=1. So, we’re not modeling a horse, but modeling the factors.

    3) There is high covariance among many of the factors. My actual model uses something called the LASSO to manage that, but that topic was too complicated for this article. Search on Google and you can probably find a paper I published with all the gory details.

    Good luck!

  • #839

    What would the CLR equation look like with more than one variable?

  • #840


    The formula with multiple variables is exactly the same. You just add everything up before passing the sum through the inverse logit transform. For example: Y = x_1 * b_1 + x_2 * b_2 + x_3 * b_3 + e
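    For illustration only (made-up factor values and weights), summing three factor terms and then applying the inverse logit:

```python
import math

def inv_logit(z):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up factors and weights for a single observation.
x = [1.2, -0.5, 3.0]
b = [0.8, 1.1, -0.2]

# Add everything up first, then pass the sum through the inverse logit.
z = sum(xi * bi for xi, bi in zip(x, b))  # x_1*b_1 + x_2*b_2 + x_3*b_3 = -0.19
p = inv_logit(z)
print(round(p, 3))  # 0.453
```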

  • #841

    Dr. Silverman,

    I’m having real trouble figuring out which R package to use to do conditional logistic regression… do you have a preference? Many of the ones I see have some fairly arcane explanatory PDFs, and I can’t see how the glm function can be used for that particular form of logistic regression.

    Thank you for your time. I read the study you refer to – great stuff (especially the use of GPU cores!).

  • #842

    Hi Noah,
    Interesting article. I have a sports-betting model that uses linear regression. It churns out ratings for tennis players. I was wondering, does it make sense to run a logistic regression on the ratings difference from previous matches and use that to calculate the probability of a player winning? It’s something I can do with Solver in Excel, but it doesn’t seem to work well when I’ve tested it. Or would it make more sense to calculate the probability directly from the data using logistic regression?

  • #1512
      Dr. Noah Silverman 


    Try looking at the “survival” package. It has a function named “clogit”.

  • #1513
      Dr. Noah Silverman 


    There isn’t a simple answer to your question. What you’re discussing relates to model design, and the possible use of ensemble models. It might make an interesting future article. Thanks for bringing it up.

  • #2042
      Craig Higginson 

    Quite a few of the image links appear to be broken… some of them are either graphs or equations which I think would be handy to see. Is there any chance these could be fixed?

  • #2835
      adam cox 

    In the model, you are using a binomial outcome; that is, first place = 1, other places = 0 for your coding.
    Instead of resorting to, say, a second model with second place = 1 and all other places zero, could you set up a multinomial model? In this way, prediction odds could be inferred, such as the odds of a horse finishing 3rd or better, 2nd or 3rd, worse than 3rd place, etc.?


  • #3398
      al sandoz 

    In order to give a few ideas to those wishing to model using multinomial logistic regression, below are some extracts from a piece of code I wrote using the R language (through the RStudio interface).
    This code is inspired by a Smartsigger article written by Dr Alun Owen, to which I am much indebted.

    win (“yes”, “no”) is the dependent variable
    noChev = horse number, raceid = race identification (1, 2, …, 8000)
    8000 races are used for training.

    Below is the result:

    Coefficients :
    Estimate Std. Error t-value Pr(>|t|)
    sexe2 -4.0103e-01 4.4405e-02 -9.0312 < 2.2e-16 ***
    sexe3 -1.4602e-01 3.7419e-02 -3.9021 9.535e-05 ***
    age -2.7558e-01 2.7615e-02 -9.9794 < 2.2e-16 ***
    nbVic 4.6575e-02 6.3964e-03 7.2814 3.304e-13 ***
    nbPlac -2.0636e-02 4.8074e-03 -4.2926 1.766e-05 ***
    pcVict 7.3494e-01 1.1072e-01 6.6377 3.185e-11 ***
    deferre1 3.9375e-01 1.1473e-01 3.4320 0.0005991 ***
    deferre2 8.4097e-02 1.3288e-01 0.6329 0.5267977
    deferre3 7.6946e-01 1.1356e-01 6.7758 1.237e-11 ***
    derPlace11 5.8635e-01 4.2478e-02 13.8034 < 2.2e-16 ***
    derPlace12 3.1694e-01 4.2785e-02 7.4078 1.283e-13 ***
    derPlace13 1.7902e-01 4.4969e-02 3.9810 6.864e-05 ***
    derPlace14 1.0673e-01 4.8369e-02 2.2066 0.0273404 *
    derPlace21 9.4670e-02 4.1326e-02 2.2908 0.0219758 *
    derPlace22 1.3285e-01 4.2088e-02 3.1564 0.0015972 **
    derPlace23 5.3403e-02 4.4769e-02 1.1929 0.2329273
    derPlace24 3.2888e-02 4.7528e-02 0.6920 0.4889619
    derPlaceAvecD 4.2558e-02 7.1123e-03 5.9837 2.181e-09 ***
    pcVictJockDriv 9.1752e-01 2.6846e-01 3.4178 0.0006314 ***
    pc3PremJockDriv 1.8384e+00 1.9750e-01 9.3083 < 2.2e-16 ***
    pcVictTrainer 1.2045e+00 3.0174e-01 3.9919 6.555e-05 ***
    pc3PremTrainer 7.5247e-01 1.9036e-01 3.9528 7.725e-05 ***
    nbjDerC -5.0150e-03 4.2957e-04 -11.6745 < 2.2e-16 ***
    derDeferre11 -3.6594e-01 5.4456e-02 -6.7198 1.820e-11 ***
    derDeferre12 -3.0667e-01 9.4199e-02 -3.2556 0.0011315 **
    derDeferre13 -5.4594e-01 4.6576e-02 -11.7216 < 2.2e-16 ***
    derDeferre21 2.6222e-01 1.1184e-01 2.3445 0.0190518 *
    derDeferre22 3.9472e-01 1.3433e-01 2.9385 0.0032982 **
    derDeferre23 2.2824e-01 1.1124e-01 2.0519 0.0401826 *
    nbjSansDeferre 3.8234e-04 1.4803e-04 2.5828 0.0098006 **
    crif2 4.7437e-02 5.3816e-03 8.8148 < 2.2e-16 ***
    rapgainsNbC 1.1755e-05 6.0246e-06 1.9512 0.0510332 .
    rapDerCoteNbP -6.9145e-02 8.3078e-03 -8.3229 < 2.2e-16 ***
    DiffAlloc 1.5420e-06 7.1182e-07 2.1663 0.0302863 *
    DiffReu1 -1.0855e-01 3.8078e-02 -2.8507 0.0043624 **
    ochrD1 4.2473e-01 3.8289e-02 11.0925 < 2.2e-16 ***
    ochrD2 2.7920e-01 4.0970e-02 6.8147 9.446e-12 ***
    ochrD3 2.8409e-01 4.1711e-02 6.8109 9.696e-12 ***
    ochrD4 1.9616e-01 4.3937e-02 4.4646 8.021e-06 ***
    ochrM1 2.4849e-01 4.1686e-02 5.9609 2.509e-09 ***
    ochrM2 2.0392e-01 4.3236e-02 4.7164 2.401e-06 ***
    ochrM3 1.5436e-01 4.5850e-02 3.3666 0.0007609 ***
    ochrM4 1.1590e-01 4.8042e-02 2.4124 0.0158490 *
    ogains1 2.5529e-01 4.1213e-02 6.1943 5.853e-10 ***
    ogains2 1.7777e-01 4.2159e-02 4.2167 2.479e-05 ***
    ogains3 9.9420e-02 4.3388e-02 2.2914 0.0219387 *
    ogains4 1.1688e-01 4.3399e-02 2.6932 0.0070778 **
    opcJoDr1 -1.0721e-02 5.3944e-02 -0.1987 0.8424667
    opcJoDr2 7.8654e-02 4.6202e-02 1.7024 0.0886820 .
    opcJoDr3 1.6089e-01 4.3982e-02 3.6581 0.0002541 ***
    opcJoDr4 1.7657e-01 4.3966e-02 4.0160 5.921e-05 ***
    ocrif21 7.1151e-02 5.8123e-02 1.2241 0.2208984
    ocrif22 1.3088e-01 4.6470e-02 2.8164 0.0048566 **
    ocrif23 1.1489e-01 4.4533e-02 2.5798 0.0098856 **
    ocrif24 1.3130e-01 4.4721e-02 2.9360 0.0033248 **
    orapgainsNbC1 8.2299e-02 4.9943e-02 1.6479 0.0993812 .
    orapgainsNbC2 1.4059e-01 4.3553e-02 3.2279 0.0012470 **
    orapgainsNbC3 1.1790e-01 4.2735e-02 2.7589 0.0057994 **
    orapgainsNbC4 -1.4084e-02 4.5170e-02 -0.3118 0.7551841
    oDiffAlloc1 2.1629e-01 4.3969e-02 4.9191 8.694e-07 ***
    oDiffAlloc2 1.4822e-01 4.5561e-02 3.2531 0.0011414 **
    oDiffAlloc3 1.4053e-01 4.7122e-02 2.9823 0.0028613 **
    oDiffAlloc4 1.5389e-01 4.8095e-02 3.1998 0.0013752 **

    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Log-Likelihood: -17661

    I only retain the relevant ** and *** coefficients.
