3 Binary Models – Logit and Probit

Binary dependent variables are frequent in social science research…

  • … why does somebody vote or not?
  • … why does a country go to war or not?
  • … why does a legislator vote yes or no?
  • … why do some countries have the death penalty and others not?

3.1 The Linear Probability Model

The linear probability model relies on linear regression to analyze binary variables.

\begin{eqnarray} y_i & = & \beta_0 + \beta_1 \cdot x_{1i} + \beta_2 \cdot x_{2i}+ \dots + \beta_k \cdot x_{ki} + \varepsilon_{i}\\ \Pr(y_i=1|x_{1i}, x_{2i}, \dots) & = & \beta_0 + \beta_1 \cdot x_{1i} + \beta_2 \cdot x_{2i}+ \dots + \beta_k \cdot x_{ki} \end{eqnarray}

3.1.1 Advantages

  • We can use a well-known model for a new class of phenomena
  • Easy to interpret the marginal effects of \(x\) variables

3.1.2 Disadvantages

The linear model is designed for a continuous dependent variable. If the dependent variable is binary, we run into problems:

  • Predictions, \(\hat y\), are interpreted as the probability that \(y=1\), but they are not bounded
    \(\rightarrow\) \(P(y=1) = \hat y = \beta_0 + \beta_1 x\) can be above 1 if \(x\) is large enough (taking \(\beta_1 > 0\))
    \(\rightarrow\) \(P(y=0) = 1 - \hat y = 1 - \beta_0 - \beta_1 x\) is then below 0; if \(x\) is small enough, \(\hat y\) itself falls below 0

  • The errors will not have a constant variance.
    \(\rightarrow\) For a given \(x\) the residual can take only two values: \((1-\beta_0-\beta_1 x)\) if \(y=1\), or \((-\beta_0-\beta_1 x)\) if \(y=0\)

  • The linear function might be wrong
    \(\rightarrow\) Imagine you buy a car. Having an additional £1000 has a very different effect if you are broke or if you already have another £12,000 for a car.

Predictions can lie outside \(I=[0,1]\)
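A minimal sketch of this problem, assuming a small made-up data set (the values of `xs` and `ys` are hypothetical and chosen only for illustration). It fits the linear probability model with the closed-form OLS formulas for one predictor and lists the fitted values that leave the unit interval.

```python
# Toy data (hypothetical): one predictor x and a binary outcome y.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(xs, ys):
    """Closed-form OLS for a single predictor; returns (beta0, beta1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


beta0, beta1 = ols_fit(xs, ys)

# Fitted values are read as P(y = 1), but nothing keeps them inside [0, 1].
fitted = [beta0 + beta1 * x for x in xs]
outside = [(x, round(p, 3)) for x, p in zip(xs, fitted) if p < 0 or p > 1]
print(outside)
```

With these toy values the fitted "probability" is negative at the smallest \(x\) and larger than 1 at the largest \(x\).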

Residuals if the dependent variable is binary:
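As a short worked derivation (the shorthand \(p_i = \beta_0 + \beta_1 x_i\) is introduced here only for brevity), the two possible residual values imply that the error variance changes with \(x_i\):

\begin{eqnarray} \varepsilon_i & = & \left\{ \begin{array}{ll} 1 - p_i & \mbox{with probability } p_i \\ -p_i & \mbox{with probability } 1 - p_i \end{array} \right. \\ E(\varepsilon_i | x_i) & = & p_i (1 - p_i) - (1 - p_i) p_i = 0 \\ Var(\varepsilon_i | x_i) & = & p_i (1 - p_i)^2 + (1 - p_i) p_i^2 = p_i (1 - p_i) \end{eqnarray}

Because \(p_i\) depends on \(x_i\), the variance is largest where \(p_i\) is close to 0.5 and shrinks towards 0 as \(p_i\) approaches 0 or 1.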

3.2 Building a Model from Probability Theory

  • We want to make predictions in terms of probability
  • We can have a model like this: \(P(y_i=1)={F(\beta_0 + \beta_1 x_i)}\) where \(F(\cdot)\) should be a function which never returns values below 0 or above 1
  • Two common choices for \(F(\cdot)\) are the cumulative normal (\(\Phi\)) and the cumulative logistic (\(\Delta\)) distribution functions

3.3 Logit and Probit

  • We now have a model where \(\hat y \in [0,1]\)
    \(\rightarrow\) All predictions are probabilities

  • We have two possible models to use
    \(\rightarrow\) The logit model is based on the cumulative logistic distribution (\(\Delta\))
    \(\rightarrow\) The probit model is based on the cumulative normal distribution (\(\Phi\))

We will use logit more often because its distribution function has a closed form, \(\Delta(x) = \frac{1}{1 + \exp(-x)}\),
while the probit model requires an integral with no closed-form solution: \(\Phi(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-t^2}{2}\right) dt\)
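Both functions are easy to evaluate numerically. The sketch below uses only Python's standard library; the function names `logistic_cdf` and `normal_cdf` are illustrative choices, and \(\Phi\) is computed through the error function `math.erf` rather than by integrating directly.

```python
import math


def logistic_cdf(x):
    """Cumulative logistic distribution, written Delta(x) in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Cumulative standard normal distribution Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Both functions map any real number into the interval (0, 1).
for x in (-3, -1, 0, 1, 3):
    print(x, round(logistic_cdf(x), 4), round(normal_cdf(x), 4))
```

Both curves pass through 0.5 at \(x = 0\) and are s-shaped; the normal curve approaches 0 and 1 faster than the logistic curve for the same \(x\).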

3.4 Logit Model

The logit model is then: \(P(y_i=1)=\frac{1}{1 + \exp(-\beta_0 - \beta_1 x_i)}\)

For \(\beta_0 = 0\) and \(\beta_1=2\) we get:
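A minimal sketch that tabulates this curve, assuming an arbitrary grid of \(x\) values from -2 to 2 (the grid and the name `logit_prob` are illustrative choices, not part of the model):

```python
import math


def logit_prob(x, beta0=0.0, beta1=2.0):
    """P(y = 1) under the logit model for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))


# Evaluate the s-curve on a small grid of x values.
curve = [(x, round(logit_prob(x), 3)) for x in (-2, -1, 0, 1, 2)]
for x, p in curve:
    print(x, p)
```

The probability is exactly 0.5 at \(x = 0\), about 0.88 at \(x = 1\), and stays strictly between 0 and 1 everywhere.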

3.4.1 Logit Model: Example 1

  • We can make a prediction by calculating: \(P(y=1) = \frac{1}{1+\exp(-\beta_0 - \beta_1\cdot x)}\)
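As a worked illustration with hypothetical values (\(\beta_0 = -3\), \(\beta_1 = 0.5\) and \(x = 4\) are assumed here, not estimates from any data):

\begin{eqnarray} P(y=1) & = & \frac{1}{1+\exp(-\beta_0 - \beta_1 \cdot x)} = \frac{1}{1+\exp(3 - 0.5 \cdot 4)} = \frac{1}{1+\exp(1)} \approx 0.27 \end{eqnarray}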

3.4.2 Logit Model: Example 2

  • Adding £10,000 produces a different marginal effect depending on the starting point, because the functional form is s-shaped rather than linear
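A minimal sketch of this point, assuming hypothetical coefficients (\(\beta_0 = -4\), \(\beta_1 = 0.2\)) and \(x\) measured in thousands of pounds. It compares the change in the predicted probability when the same £10,000 is added at three different starting budgets.

```python
import math

# Hypothetical coefficients; x is a budget in thousands of pounds.
BETA0 = -4.0
BETA1 = 0.2


def logit_prob(x):
    """P(y = 1) under the logit model with the assumed coefficients."""
    return 1.0 / (1.0 + math.exp(-BETA0 - BETA1 * x))


def gain_from_10k(start):
    """Change in P(y = 1) when 10 (thousand pounds) is added to `start`."""
    return logit_prob(start + 10) - logit_prob(start)


for start in (0, 15, 40):
    print(start, round(gain_from_10k(start), 3))
```

Under these assumed values the same increment raises the probability by roughly 0.10 from a budget of 0, by roughly 0.46 from 15, and by less than 0.02 from 40: the effect is largest in the steep middle of the s-curve and small in both tails.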

3.4.3 Logit Model: Example 3

  • A positive \(\beta_1\) makes the s-curve increase.
  • A smaller \(\beta_0\) shifts the s-curve to the right (for positive \(\beta_1\)).
  • A negative \(\beta_1\) makes the s-curve decrease.
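These three statements can be checked numerically. The sketch below uses hypothetical coefficient values and an illustrative helper, `midpoint`, which returns the \(x\) at which \(P(y=1) = 0.5\), that is, where \(\beta_0 + \beta_1 x = 0\).

```python
import math


def logit_prob(x, beta0, beta1):
    """P(y = 1) under the logit model for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))


def midpoint(beta0, beta1):
    """x at which P(y = 1) = 0.5, i.e. where beta0 + beta1 * x = 0."""
    return -beta0 / beta1


grid = [-3, -2, -1, 0, 1, 2, 3]

# Positive slope: the curve rises with x. Negative slope: it falls.
rising = [logit_prob(x, 1.0, 2.0) for x in grid]
falling = [logit_prob(x, 1.0, -2.0) for x in grid]

# Lowering beta0 from 1 to -1 (with beta1 = 2) moves the midpoint
# from x = -0.5 to x = 0.5, i.e. to the right.
print(midpoint(1.0, 2.0), midpoint(-1.0, 2.0))
```

The midpoint \(-\beta_0/\beta_1\) makes the direction of the shift explicit: with a positive \(\beta_1\) a smaller \(\beta_0\) moves the curve to the right, while with a negative \(\beta_1\) the same change moves it to the left.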