## non-convergent models and the failure to look at the whole picture when fitting models

Very few models in statistics have nice, clean closed-form solutions. Usually, the coefficients must be estimated iteratively: you take an initial guess and keep improving it (e.g., with the Newton-Raphson method). If your estimates stabilize, then you say “Mission accomplished, the coefficient is X!” Sometimes, your statistical software will instead say “I stopped because the model does not converge: the estimates bounce around.”
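To make "guess and improve" concrete, here is a minimal sketch of Newton-Raphson for a one-coefficient Poisson regression. Everything in it is an illustration rather than what any particular package does: the function name `newton_raphson_poisson`, the tolerance, the iteration cap, and the tiny data set are all made up for the example.

```python
import math

def newton_raphson_poisson(x, y, beta=0.0, tol=1e-8, max_iter=50):
    """Fit y ~ Poisson(exp(beta * x)) by Newton-Raphson.

    Returns (estimate, converged flag, list of every iterate).
    Hypothetical helper for illustration only.
    """
    history = [beta]
    for _ in range(max_iter):
        mu = [math.exp(beta * xi) for xi in x]
        # First derivative of the log-likelihood (the score).
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Second derivative (always negative here, so the surface is concave).
        hessian = -sum(xi * xi * mi for xi, mi in zip(x, mu))
        step = score / hessian
        beta -= step
        history.append(beta)
        # "Converged" just means the last step was smaller than an arbitrary cutoff.
        if abs(step) < tol:
            return beta, True, history
    return beta, False, history

# Made-up toy data, not from any real study.
x = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0]
y = [1, 2, 3, 4, 6, 9]

beta_hat, converged, history = newton_raphson_poisson(x, y)
print("converged:", converged)
print("estimate:", round(beta_hat, 4))
print("iterates:", [round(b, 3) for b in history])
```

Note that the `converged` flag depends entirely on `tol` and `max_iter`. Printing `history` shows how the estimate actually moved, which is the information the error message throws away.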

Normally, people throw up their hands and say “too bad, this model is inconclusive” and they move on. This is wrong. Why? The convergence/non-convergence of a model estimate is the result of completely arbitrary choices. Simple example:

I am estimating the effect of a new drug on the number of days that people live after treatment. Assume that I have nice data from a clean experiment. I will model the number of days using a negative binomial regression, since I have count data that may or may not be over-dispersed. Stata says “sorry, the likelihood function is not concave, the model won’t converge.” So I ask Stata to show me the likelihood function, and it bounces around by about 3%, which is more than the default settings allow. My coefficient estimates also bounce around a little. The effect of treatment is about two months +/- a week, depending on the settings.

As you can see, the data clearly supports the hypothesis that the treatment works (i.e., extra days alive >0). All “non-convergence” means is that there might be multiple likelihood function maxima and they are all close in terms of practical significance, or that the ML surface is very “wiggly” around the likely maximum point.

Does that mean you can ignore convergence issues in maximum likelihood estimation? No! Another example:

Same setup as above: I am trying to measure the effectiveness of a drug and I get “non-convergence” from Stata. But in this case, I look at the ML estimates and notice they bounce around a lot. Then I ask Stata to estimate with different sensitivity settings and discover that the coefficients are often near zero and sometimes far from zero.

The evidence here supports the null hypothesis. Same error message, but different substantive conclusions.
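Here is a rough sketch of what "looking at the whole picture" could mean in practice: refit from many starting values and look at the range of answers instead of the convergence flag. The two log-likelihood surfaces below are invented toy functions (not a negative binomial fit, and not Stata's optimizer), and `hill_climb` is a deliberately crude stand-in for a real maximizer. `theta` plays the role of the treatment effect in extra days alive.

```python
import math

def hill_climb(loglik, start, step=1.0, tol=1e-6):
    """Crude local maximizer: step while the value improves, otherwise halve the step."""
    theta, best = start, loglik(start)
    while step >= tol:
        for candidate in (theta + step, theta - step):
            value = loglik(candidate)
            if value > best:
                theta, best = candidate, value
                break
        else:
            step /= 2  # neither neighbor improved, so refine the search
    return theta

def wiggly_but_tight(theta):
    # Toy surface: many small bumps, all packed around 60 days.
    return -((theta - 60.0) / 20.0) ** 2 + 0.03 * math.cos(theta)

def wiggly_and_flat(theta):
    # Toy surface: nearly flat, with bumps scattered on both sides of zero.
    return -(theta / 200.0) ** 2 + 0.05 * math.cos(theta / 3.0)

# Deterministic grid of starting values (an arbitrary choice for the sketch).
starts = range(-100, 201, 25)

def spread(loglik):
    """Smallest and largest estimate reached across all starting values."""
    estimates = [hill_climb(loglik, float(s)) for s in starts]
    return min(estimates), max(estimates)

tight_lo, tight_hi = spread(wiggly_but_tight)
flat_lo, flat_hi = spread(wiggly_and_flat)

print(f"tight surface: estimates range from {tight_lo:.1f} to {tight_hi:.1f} days")
print(f"flat surface:  estimates range from {flat_lo:.1f} to {flat_hi:.1f} days")
```

Both surfaces have many local maxima, so a picky optimizer could complain about either one. On the first, every starting value lands in a narrow band well above zero. On the second, the answers straddle zero and depend on where you started, which is the case where the "effect" should not be trusted.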

The lesson is simple. In applied statistics, we get lazy and rely on simple answers: p-values, r-squared, and error messages. What they all have in common is that they are arbitrary rules. To really understand your model, you need to actually look at the full range of information and not just rely on cut-offs. This makes publication harder (referees can’t just look for asterisks in tables) but it’s better thinking.
