Wednesday, August 31, 2011

Percentage Change in Logistic Regression Coefficients



Here's a rather dry topic for all you stats geeks out there.


This is something that has bothered me for years, yet I cannot find a citation to back me up. Hopefully, there is someone more versed in logistic regression than I who can comment.


I frequently read articles that interpret coefficients from logistic regression as percentage change. The formula typically used is:

$\% = (e^{b} - 1) \times 100$

Where:

% indicates percentage change, and

$e^{b}$ is the odds ratio (exp(b))

(Taken from Pampel (2000), Logistic Regression: A Primer. Sage)


So, if your findings looked like this (from Pampel, p. 35):


Table 2.1

Partial SPSS Logistic Regression Results: Variable Coefficients

Variable          B         S.E.     Wald       Df    Sig      R         Exp(B)
Education         -.2085    .0382    29.8742    1     .0000    -.2153    .8118
Age               -.0341    .0067    26.1222    1     .0000    -.2003    .9665
Marital Status    -.3746    .2112    3.1443     1     .0762    -.0436    .6876
Sex (f=1)         .0964     .2126    .2056      1     .6502    .0000     1.1012
Constant          3.3666    .6478    27.0112    1     .0000



DV: Self-reported smoking (0:no, 1:yes)



You could use the percentage change formula to conclude:

· Each additional year of education decreases the odds of smoking by 18.82%

(.8118 - 1) * 100 = -18.82%

· Each additional year of age decreases the odds of smoking by 3.35%

· Being married decreases the odds of smoking by 31% (but the effect is not statistically significant)

· Being female increases the odds of smoking by 10% (but the effect is not statistically significant)
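As a quick check on the arithmetic, here is a minimal Python sketch that applies the percentage change formula to the B values in Table 2.1. The dictionary and the function name `pct_change_in_odds` are my own illustrative choices, not anything from Pampel or SPSS.

```python
import math

# B coefficients copied from Table 2.1 (Pampel, p. 35).
coefficients = {
    "Education": -0.2085,
    "Age": -0.0341,
    "Marital Status": -0.3746,
    "Sex (f=1)": 0.0964,
}


def pct_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase: (exp(b) - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0


for name, b in coefficients.items():
    odds_ratio = math.exp(b)
    print(f"{name:15s} exp(B) = {odds_ratio:.4f}   % change = {pct_change_in_odds(b):+.2f}")
```

A negative result is a percentage decrease in the odds and a positive result is an increase, so the sign carries the direction.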


And this is fairly common in the literature. But here’s my question:

Odds are constrained between 0 and +infinity. So the maximum possible decrease in odds is 100%, but an increase in odds can be infinitely large. It doesn't make sense to me to compare percentage decreases and percentage increases as if they were on the same scale.
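The asymmetry is easy to see numerically. The sketch below takes a few coefficients of equal size and opposite sign (values I picked purely for illustration) and prints the percentage change each one implies. The function name `pct_change_in_odds` is hypothetical.

```python
import math


def pct_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase: (exp(b) - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0


# Illustrative coefficient magnitudes, not taken from any real model.
for b in (0.5, 1.0, 2.0, 4.0):
    decrease = pct_change_in_odds(-b)  # approaches -100% but never reaches it
    increase = pct_change_in_odds(b)   # grows without bound
    print(f"|b| = {b}: negative sign gives {decrease:.1f}%, positive sign gives {increase:+.1f}%")
```

On the coefficient (log odds) scale the two effects in each row are the same size. Only the percentage scale makes them look different.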


In addition, the researcher’s decision about coding greatly affects the percentage change. For example, assume a variable indicating gender is measured as M=1, F=0 and the resulting coefficient (B) is -.8 with an exp(B) of .449. We would conclude:

  • Being male decreases the odds of the DV occurring by 55.1%.


If we switch the coding to F=1, M=0, the resulting B becomes .8 with an exp(B) of 2.23 (that is, 1/.449). We would conclude:

  • Being female increases the odds of the DV occurring by 123%.
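To make the coding issue concrete, here is a small Python sketch of the same example. It assumes only that reversing a 0/1 coding flips the sign of the coefficient. The variable names are mine.

```python
import math

b_male = -0.8        # coefficient with M=1, F=0 (from the example above)
b_female = -b_male   # reversing the 0/1 coding flips the sign of B

or_male = math.exp(b_male)      # odds ratio, males relative to females
or_female = math.exp(b_female)  # odds ratio, females relative to males

pct_male = (or_male - 1.0) * 100.0
pct_female = (or_female - 1.0) * 100.0

print(f"M=1 coding: exp(B) = {or_male:.3f}, % change = {pct_male:+.1f}")
print(f"F=1 coding: exp(B) = {or_female:.3f}, % change = {pct_female:+.1f}")
print(f"product of the two odds ratios = {or_male * or_female:.3f}")
```

The two odds ratios are reciprocals, so they describe exactly the same difference between the groups, yet the two percentage figures are very different in size.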


Either I’m missing a fundamental assumption of the distribution of odds, or this doesn’t make a lot of sense. I’ve been spending years telling my students not to do this (rather, they can just talk about the exp(b) and leave out discussion of % change), but it would be nice to have some literature to back me up. Or, have someone tell me I’m completely wrong. Any ideas?


1 comment:

herkydahawkeye said...

Basically, it is because it is a relative probability. Just because X = 0, it doesn't mean X has no predicted probability in relation to Y. From http://www.upa.pdx.edu/IOA/newsom/da2/ho_logistic.pdf

"When they are both dichotomous (IV and DV), the odds ratio is the probability that Y is 1 when X is 1 compared to the probability that Y is 1 when X is 0."

Therefore, you can't have a relative predicted probability greater than 100% less than another, because the underlying predicted probabilities between the reference and comparison are bound between 0 and 1. If we take your example on the relative predicted probability of DV, males and females in the sample have an underlying predicted probability (^p) bound between 0 and 1. Even if the ^p of males = .01 and females = .99, your relative odds comparison cannot go below zero, but can be quite large depending on your reference/comparison equation (e.g., .01/.99 or .99/.01). So the interpretation in your scenario should actually be "Being female increases the odds of the DV occurring by 125% relative to the predicted probability of men."