(Related to Over-encapsulation and Subtext is not invariant under linear transformation)

Between 2004 and 2007, Goran Bjelakovic et al. published 3 famous meta-analyses of vitamin supplements, concluding that vitamins don't help people but instead kill people. This is now the accepted dogma; and if you ask your doctor about vitamins, she's likely to tell you not to take them, based on reading either one of these articles, or one of the many summaries of these articles made in secondary sources like The Mayo Clinic Journal.

The 2007 study claims that beta-carotene and vitamins A and E are positively correlated with death - the more you take, the more likely you are to die. Therefore, vitamins kill. The conclusion on E requires a little explanation, but the data on beta-carotene and A is simple and specific:

Univariate meta-regression analyses revealed significant influences of dose of beta carotene (Relative Risk (RR), 1.004; 95% CI, 1.001-1.007; P = .012), dose of vitamin A (RR, 1.000006; 95% CI, 1.000002-1.000009; P = .003), ... on mortality.

This appears to mean that, for each mg of beta carotene that you take, your risk of death increases by a factor (RR) of 1.004; for each IU of vitamin A that you take, by a factor of 1.000006. "95% CI, 1.001-1.007" means that the standard deviation of the sample indicates a 95% probability that the true RR lies somewhere between 1.001 and 1.007. "P = .012" means that there's only a 1.2% chance that you would be so unlucky as to get a sample giving that result, if in fact the true RR were 1.

A risk factor of 1.000006 doesn't sound like much; but I'm taking 2,500 IU of vitamin A per day. That gives a 1.5% increase in my chance of death! (Per 3.3 years.) And look at those P-values: .012, .003!
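The 1.5% figure can be checked with a few lines of Python. This is a minimal sketch that assumes the per-IU relative risk compounds multiplicatively with dose (the RR^{dosage} reading discussed later in this post); the function name `compounded_rr` is made up for illustration.

```python
# Per-unit relative risk for vitamin A, as quoted from the 2007 paper.
RR_PER_IU_VITAMIN_A = 1.000006
DAILY_DOSE_IU = 2500  # the daily dose mentioned in the text


def compounded_rr(rr_per_unit: float, dose: float) -> float:
    """Relative risk at `dose`, ASSUMING each unit multiplies risk by rr_per_unit."""
    return rr_per_unit ** dose


rr = compounded_rr(RR_PER_IU_VITAMIN_A, DAILY_DOSE_IU)
print(f"RR at {DAILY_DOSE_IU} IU/day: {rr:.4f}")
print(f"Increase in risk: {100 * (rr - 1):.2f}%")
```

Under that assumption the relative risk at 2,500 IU/day comes out at roughly 1.015, which matches the 1.5% quoted above.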

So why do I still take vitamins?

What all of these articles do, in excruciating detail with regard to sample selection (though not so much with regard to the math), is to run a linear regression on a lot of data from studies of patients taking vitamins. A linear regression takes a set of data where each datapoint looks like this:

Y = a_{1}X_{1} + c

and a multiple linear regression takes a set of data where each datapoint usually looks like this:

Y = a_{1}X_{1} + a_{2}X_{2} + ... a_{n}X_{n} + c

where Y and all the X_{i}'s are known. In this case, Y is a 1 for someone who died and a 0 for someone who didn't, and each X_{i} is the amount of some vitamin taken. In either case, the regression finds the values for a_{1}, ... a_{n}, c that best fit the data (meaning they minimize the sum, over all data points, of the squared error of the value predicted for Y, (Y - (a_{1}X_{1} + a_{2}X_{2} + ... a_{n}X_{n} + c))^{2}).
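For the single-variable case, the least-squares fit has a closed form. The sketch below is only an illustration: the dose and outcome numbers are invented, not taken from any of the studies, and `fit_line` and `sum_squared_error` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a*X + c; returns (a, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = mean_y - a * mean_x
    return a, c


def sum_squared_error(xs, ys, a, c):
    """The quantity the regression minimizes: sum of (Y - (a*X + c))^2."""
    return sum((y - (a * x + c)) ** 2 for x, y in zip(xs, ys))


# Invented data: X is a dose, Y is 1 for "died" and 0 for "survived".
doses = [0, 10, 20, 30, 40, 50]
died = [0, 0, 1, 0, 1, 1]

a, c = fit_line(doses, died)
best = sum_squared_error(doses, died, a, c)
print(f"slope a = {a:.3f}, intercept c = {c:.3f}, squared error = {best:.3f}")

# Nudging either coefficient away from the fitted value makes the error worse.
print(sum_squared_error(doses, died, a + 0.005, c) > best)
print(sum_squared_error(doses, died, a, c + 0.05) > best)
```

Note that the fitted line happily predicts fractional values of Y even though every observed Y is 0 or 1; that is one reason a logistic model is usually preferred for binary outcomes.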

Scientists love linear regression. It's simple, fast, and mathematically pure. There are lots of tools available to perform it for you. It's a powerful hammer in a scientist's toolbox.

But not everything is a nail. And even for a nail, not every hammer is the right hammer. You shouldn't use linear regression just because it's the "default regression analysis". When a paper says they performed "a regression", beware.

A linear analysis assumes that if 10 milligrams is good for you, then 100 milligrams is ten times as good for you, and 1000 milligrams is one-hundred times as good for you.

This is not how vitamins work. Vitamin A is toxic in doses over 15,000 IU/day, and vitamin E is toxic in doses over 400 IU/day (Miller et al. 2004, Meta-Analysis: High-Dosage Vitamin E Supplementation May Increase All-Cause Mortality; Berson et al. 1993, Randomized trial of vitamin A and vitamin E supplementation for retinitis pigmentosa.). The RDA for vitamin A is 2500 IU/day for adults. Good dosage levels for vitamin A appear to be under 10,000 IU/day, and for E, less than 300 IU/day. (Sadly, studies rarely discriminate in their conclusions between dosage levels for men and women. Doing so would give more useful results, but make it harder to reach the coveted P < .05 or P < .01.)

Quoting from the 2007 JAMA article:

The dose and regimen of the antioxidant supplements were: beta carotene 1.2 to 50.0 mg (mean, 17.8 mg) , vitamin A 1333 to 200 000 IU (mean, 20 219 IU), vitamin C 60 to 2000 mg (mean, 488 mg), vitamin E 10 to 5000 IU (mean, 569 IU), and selenium 20 to 200 μg (mean 99 μg) daily or on alternate days for 28 days to 12 years (mean 2.7 years).

The *mean* values used in the study of both A and E are in ranges known to be toxic. The maximum values used were ten times the known toxic levels, and about 20 times the beneficial levels.

17.8 mg of beta-carotene translates to about 30,000 IUs of vitamin A, if it were converted to vitamin A. This is also a toxic value. It is surprising that beta-carotene showed toxicity, though, since common wisdom is that beta-carotene is converted to vitamin A only as needed.

Vitamins, like any medicine, have an inverted-J-shaped response curve. If you graph their health effects, with dosage on the horizontal axis, and some measure of their effects - say, change to average lifespan - on the vertical axis, you would get an upside-down J. (If you graph the death rate on the vertical axis, as in this study, you would get a rightside-up J.) That is, taking a moderate amount has some good effect; taking a huge amount has a large bad effect.

If you then try to draw a straight line through the J that best-matches the J, you get a line showing detrimental effects increasing gradually with dosage. The results are exactly what we expect. Their conclusion, that "Treatment with beta carotene, vitamin A, and vitamin E may increase mortality," is technically correct. Treatment with anything may increase mortality, if you take ten times the toxic dose.
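To see the effect concretely, here is a small sketch that fits a straight line to a J-shaped death-rate curve. Everything in it is an assumption made for illustration: the curve `death_rate`, its breakpoint at dose 3, and the dose units are invented, not estimated from the trials.

```python
def death_rate(dose: float) -> float:
    """Invented J-shaped curve: risk falls up to dose 3, then climbs faster."""
    return 0.10 - 0.01 * min(dose, 3) + 0.03 * max(0, dose - 3)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Doses spread from zero to well past the (invented) toxic threshold.
doses = list(range(11))
rates = [death_rate(d) for d in doses]

slope, intercept = fit_line(doses, rates)
print(f"fitted slope per unit dose: {slope:+.4f}")
print(f"true change from dose 0 to dose 3: {death_rate(3) - death_rate(0):+.4f}")
```

The fitted slope is positive ("more supplement, more death") even though, by construction, a moderate dose lowers the death rate relative to taking nothing.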

For a headache, some people take 4 200mg tablets of aspirin. 10 tablets of aspirin might be toxic. If you made a study averaging in people who took from 1 to 100 tablets of aspirin for a headache, you would find that "aspirin increases mortality".

(JAMA later published 4 letters criticizing the 2007 article. None of them mentioned the use of linear regression as a problem. They didn't publish my letter - perhaps because I didn't write it until nearly 2 months after the article was published.)

Anyone reading the study should have been alerted to this by the fact that all of the water-soluble vitamins in the study showed no harmful effects, while all of the fat-soluble vitamins "showed" harmful effects. Fat-soluble vitamins are stored in the fat, so they build up to toxic levels when people take too much for a long time.

A better methodology would have been to use piecewise (or "hockey-stick") regression, which assumes the data is broken into 2 sections (typically one sloping downwards and one sloping upwards), tries to find the right breakpoint, and performs a separate linear regression on each side of the break, constrained so that the two lines meet at the break. (I almost called this "The case of the missing hockey-stick", but thought that would give the answer away.)
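A rough sketch of hockey-stick regression follows. It is one simple way to do it, not what any particular statistics package does: it tries each candidate breakpoint from a grid, fits two line segments that are forced to meet at the break, and keeps the breakpoint with the smallest squared error. The data come from an invented J-shaped curve, and the names `fit_hockey_stick` and `solve` are hypothetical.

```python
def death_rate(dose):
    """Invented J-shaped curve with its minimum at dose 3."""
    return 0.10 - 0.01 * min(dose, 3) + 0.03 * max(0, dose - 3)


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [x - factor * p for x, p in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_hockey_stick(xs, ys, candidate_breaks):
    """Grid-search the breakpoint k; for each k, least-squares fit
    y = level + left_slope*min(x-k, 0) + right_slope*max(x-k, 0),
    so both segments pass through (k, level)."""
    best = None
    for k in candidate_breaks:
        rows = [[1.0, min(x - k, 0.0), max(x - k, 0.0)] for x in xs]
        # Normal equations (A^T A) beta = A^T y for the three coefficients.
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        level, left, right = solve(ata, aty)
        sse = sum((y - (level + left * r[1] + right * r[2])) ** 2
                  for r, y in zip(rows, ys))
        if best is None or sse < best["sse"]:
            best = {"sse": sse, "break": k, "left_slope": left, "right_slope": right}
    return best


doses = list(range(11))
rates = [death_rate(d) for d in doses]
fit = fit_hockey_stick(doses, rates, candidate_breaks=range(1, 10))
print(fit["break"], round(fit["left_slope"], 4), round(fit["right_slope"], 4))
```

On this noise-free toy data the search recovers the breakpoint and both slopes: a downward slope below the break and an upward slope above it, which a single straight line cannot express. Real trial data would be noisy, and the breakpoint estimate would need its own uncertainty analysis.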

Would these articles have been accepted by the most-respected journals in medicine if they evaluated a pharmaceutical in the same way? I doubt it; or else we wouldn't have any pharmaceuticals. Bias against vitamins? You be the judge.

### Meaningful results have meaningful interpretations

The paper states the mortality risk in terms of "relative risk" (RR). But relative risk is used for studies of 0/1 conditions, like smoking/no smoking, not for studies that use regression on different dosage levels. How do you interpret the RR value for different dosages? Is it RR x dosage? Or RR^{dosage} (each unit multiplies risk by RR)? The difference between these interpretations is trivial for standard dosages. But can you say you understand the paper if you can't interpret the results?

To answer this question, you have to ask exactly what type of regression the authors used. Even if a linear non-piecewise regression were correct, the best regression analysis to use in this case would be a logistic regression, which estimates the probability of a binary outcome conditioned on the regression variables. The authors didn't consider it necessary to report what type of regression analysis they performed; they reported only the computer program (STATA) and the command ("metareg"). The STATA metareg manual is not easy to understand, but three things are clear:

- It doesn't use the word "logistic" anywhere, and it doesn't use the logistic function, so it isn't logistic regression.
- It does regression on the log of the risk ratio between two binary cases, a "treatment" case and a "no-treatment" case; and computes regression coefficients for possibly-correlated continuous treatment variables (such as vitamin doses).
- It doesn't directly give relative risk for the correlated variables. It gives regression coefficients telling the change in log relative risk per unit of (in this case) beta carotene or vitamin A. If anything, the reported RR is probably e^{r}, where *r* is the computed regression coefficient. This means the interpretation is that risk is proportional to RR^{dosage}.
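The two readings of the reported RR can be compared numerically. The sketch below assumes the reported RR is e^{r}, so that risk at a given dose is e^{r x dose} = RR^{dose}, and it takes "RR x dosage" to mean 1 + (RR - 1) x dose; that second formula is my reading of the linear interpretation, not something stated in the paper. The doses are the daily dose mentioned earlier and the mean and maximum vitamin A doses quoted from the study.

```python
import math

RR_PER_IU = 1.000006  # vitamin A relative risk per IU, as quoted from the paper

# ASSUMPTION: the reported RR is exp(r), where r is the regression
# coefficient (change in log relative risk per IU).
r = math.log(RR_PER_IU)


def rr_exponential(dose):
    """Risk multiplies per unit: exp(r * dose), i.e. RR_PER_IU ** dose."""
    return math.exp(r * dose)


def rr_linear(dose):
    """ASSUMED linear reading: excess risk simply adds up per unit."""
    return 1 + (RR_PER_IU - 1) * dose


for dose in (2500, 20219, 200000):
    print(f"{dose:>7} IU/day: exponential {rr_exponential(dose):.4f}, "
          f"linear {rr_linear(dose):.4f}")
```

At 2,500 IU/day the two readings agree to about four decimal places, which is the sense in which the difference is trivial for standard dosages. At the study's maximum of 200,000 IU/day they diverge substantially, so the choice of interpretation matters exactly where the toxic doses are.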

Since there is no "treatment/no treatment" case for this study, but only the variables that would be correlated with treatment/no treatment, it would have been impossible to put the data into a form that metareg can use. So what test, exactly, did the authors perform? And what do the results mean? It remains a mystery to me - and, I'm willing to bet, to every other reader of the paper.

### References

Bjelakovic et al. 2007, "Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: Systematic review and meta-analysis", Journal of the American Medical Association, Feb. 28 2007. See a commentary on it here.

Bjelakovic et al. 2006, "Meta-analysis: Antioxidant supplements for primary and secondary prevention of colorectal adenoma", Alimentary Pharmacology & Therapeutics 24, 281-291.

Bjelakovic et al. 2004, "Antioxidant supplements for prevention of gastrointestinal cancers: A systematic review and meta-analysis," The Lancet 364, Oct. 2 2004.

I don't understand. Why is "they used the wrong statistical formula" worth 47 upvotes on the main article? Because people here are interested in supplementation? Because it's a fun math problem?

In the other comments, people are discussing which algorithm would be more appropriate, and debating the nuances of each particular method. To someone not willing to take the time to understand the math, it comes across as, "This could be right, or wrong, depending on such-and-such, and boy isn't that stupid..."

I run into this problem every time I read anything on health or medicine (it seems limited to these topics). Someone says it's good for you, someone says it's bad for you, both sides attack the other's (complex, expert) methods, and the non-expert is left even more confused than when they first started looking into the matter. And it doesn't help that personal outcomes can be drastically different regardless of the normal result.

To me, this topic is still confusing, with a slight update toward "take more vitamins." Without taking classes in statistics and/or medicine, how can I become less wrong on problems like this? Who can I trust, and why?

47 votes doesn't mean "This is a great article". It means 47 more people liked it than disliked it. Peanut butter gets more karma than caviar.

Both. It's instrumental in that vitamin supplementation is a concern many here have. It's also useful as an example of how studies can have flaws, and how these flaws can be found with surprisingly little analysis. Dissections of bad studies help us avoid similar flaws in our own conclusions. And there are indeed researchers on LessWrong, as well as motivated laymen who can follow the math, and even run their own mathematical regressions. This truly is valuable.

Depends. I could become less wrong about mathematical questions by learning to listen to people who are less wrong about math. (More generally: I may be able to improve my chance of answering a question correctly even if I can't directly answer it myself.)

The general advice here is

And an alternate alternate explanation: Poor priorities. Doctors want to hear all the clinical details, and are mentally worn out by the time they finish with those. There's just no time or energy to do the math too.

When I used to work for NASA in theoretical air traffic management, I'd try to explain some abstract point about turbulent or chaotic traffic flow to operational FAA guys, and they would get bogged-down in details about what kind of planes we were talking about, what altitudes they were flying at, which airlines they belonged to, and on and on.

Wow, I just read Robin's writeup on this and it caused me to significantly lower the amount of credence I place on his other positions (but very slightly lower my opinion of supplements). It just struck me as overwhelmingly sloppy and rhetorical. Particularly his justification attempt in response to this thread. (But I suppose Robin's responses to criticism have never impressed me anyway.)

Another possibility is that cancers have higher nutritional needs than normal cells, and some vitamins might be feeding cancer more than they're feeding the person.

They do the same kind of thing with ionizing radiation: a lot of organizations assume that the health effects of radiation are completely linear, even far below the range where we've been able to measure, despite the lack of evidence for this (and some evidence suggesting a J-shaped curve). Other organizations refuse to extrapolate to extremely low doses, citing the lack of evidence.

The issue is just way too politicized.

There's a general principle that very small doses of toxins or stresses of any kind - vaccines, radiation, oxidants, poisons, alcohol, heat, cold, exercise - are beneficial, because they provoke the body to a protective overreaction. One of the talks at the 2007 DC conference on cognitive aging even suggested that this is responsible for why people who think more have fewer memory problems as they age.

(This suggests that our bodies are lazy - they could maintain themselves better than they do on every dimension. Or it might be that, if we measured all the responses simultaneously, we'd find that mounting a protective response to radiation made us more vulnerable to infection, alcohol, and all the rest.)

Thanks Phil. I am suitably outraged that both the authors and the journal published this.

I'm not sure whether 'benefit of the doubt' in this instance suggests 'political motivation' or 'incompetence'. I'll give them whichever benefit of the doubt they prefer. The most basic knowledge of the field suggests that the prior probability that a fat-soluble vitamin has a linear response with dosage is negligible.

I think the simplest hypothesis is that this was a case of pushbutton statistics - get a statistics package, read the documentation, and feed it numbers until it gives you numbers back.

The papers overwhelm the reader with so many details about how to categorize and treat the different samples in the meta-study, that it's easy to feel like they've "done enough" and just wave the math through.

It might be that, in order to pay more attention to statistical correctness, you've got to pay less attention to other details. A person has only so much mental energy! So it may reflect not poor statistics skills so much as poor priorities. Doctors want to hear all the clinical details; but there's little time and mental energy left for anything else.

So why do you still take vitamins? If you look at their Figure 2, there aren't many studies that 'favored antioxidants', and some of those studies had low doses.

"A linear analysis assumes that if 10 milligrams is good for you, then 100 milligrams is ten times as good for you, and 1000 milligrams is one-hundred times as good for you." That's only true if the range of data included both 10 milligrams and 1000 milligrams. Linearity is only assumed within the range of data of the data sets.

The hockey stick approach seems too restrictive as well. ...

An even better methodology would be to allow for higher order terms in the regression model. Adding square...

For anyone interested, here is a decent algorithm for getting the "correct" number of lines in your linear regression.

http://www.cs.princeton.edu/~wayne/kleinberg-tardos/06dynamic-programming-2x2.pdf

Pages 5 and 6.

One can complain about empirical studies in dozens of ways. Yes, for any linear regression one can complain that they should have included higher order moments for all of the variables. But if readers can feel justified in ignoring any analysis for which one can make such a complaint, then readers can feel justified in ignoring pretty much any such data analysis. That is way too low a standard for ignoring data.

If you suspect that this lack has seriously skewed the results of some particular study, then you should get the data and do your own analysis the way you think it should be done, and then publish that. Then readers can at least compare the prestige of the two publications in deciding who is right.

I think the complaint here is less that higher order moments would've produced higher quality results, and more that when testing for adverse effects on health, they used mean dosages *already known to be toxic*, which is a pretty thorough screening out of any evidence collected.

It would be confidence-inspiring to see the raw data, and some better analyses of it, of course.

But Phil isn't saying we can ignore the study just because it uses a linear regression. He's giving good, and what should be obvious-to-experts, reasons why a linear regression will be deceptive on this question. Once you know dosage matters and that

then linear regression looks like a really bad choice.

Another beauty. (The logistic regression thing isn't that big a deal, though -- the logistic function only makes a difference at the extremes, and the fact that the RR is very close to one means it's right in the middle.)

What vitamins does everyone take? I take a no-iron multi-vitamin, extra vitamin D, and fish oil, all from cheap sources. I would be especially curious if anyone takes/has evidence for more expensive vitamins that are better absorbed.

One supplement that is now being widely prescribed is Vitamin D. Testing to see if one is Vitamin D deficient is common here in the Boston area. It is being suggested that insufficient Vitamin D is linked to multiple health ailments- autoimmune diseases and increased risk of heart attacks in particular. My doctor prescribed Vitamin D supplements for me.

Would the silliness of this linear model have been visible in a scatter plot? Is there any point in using linear regression, when lines are a subset of more complex curves? (I haven't read the papers, no access.)

It's possible that better mathematical tools would improve conclusions in studies of this kind. But I increasingly believe that the problem lies not in the mathematics but in the very nature of the inquiry: questions of the type "does vitamin A improve health" simply *cannot be answered* on the basis of the information obtainable through these kinds of small sample size studies. The information content of the empirical data is far smaller than the compl...

Thank you for a very nice article.

Then vitamins are not evil, as the paper claims.

Roughly speaking, can we assume that the right thing they should have written as a conclusion in the paper would have been the weaker claim:

"Vitamins X and Y are evil under these daily doses; further studies are needed to confirm if they are beneficial in some other dosage, and if so, which is the optimal one."

?

It would have been had that been the only problem with the study. See the comments by myself, Dr Steve Hickey, Len Noriega etc here http://www.cochranefeedback.com/cf/cda/feedback.do?DOI=10.1002/14651858.CD007176&reviewGroup=HM-LIVER

Meta-analyses in general are not to be trusted - *at all*...

Very, very briefly (I'm preparing a very long blog post on this, but I want to post it when Dr Hickey, my uncle, releases his book on this, which won't be for a while yet) - meta-analysis is essentially a method for magnifying the biases of the analyst. When collating the papers, nobody is blinded to anything so it's very, very easy to remove papers that the people doing the analysis disagree with (approx 1% or fewer of papers that turn up in initial searches end up getting used in most meta-analyses, and these are hand-picked). On top of this, many of them include additional unpublished (and therefore unreviewed) data from trials included in the analysis. You can easily see how this could cause problems, I'm sure. There are many, *many* problems of this nature. I'd strongly recommend everyone do what I did (for a paper analysing these problems) - go to the Cochrane or JAMA sites, and just read every meta-analysis published in a typical year, without any previous prejudice as to the worth or otherwise of the technique. If you can find a *single one* that appears to be good science, I'd be astonished...

Actually only synthetic beta carotene and things like retinol are implicated.

Beta carotene comes in multiple forms - and it is widely recognised that the form in carrots and greens can't be overdosed on - even if you make it into tablets. It *does* make your skin turn orange - but seems otherwise harmless.

There's a broadly similar story for retinol - and scientists have known about this for quite a long time, now.

This would have been a much better title. Otherwise, great post.