The Accidental Statistician: Apparently the Editor is not the only stupid one, it is the general opinion of everyone in phylogenetics so I have to be wrong ...

I appealed the very stupid editor's comments on the bootstrap and, guess what, another methods man agrees with him, so it gets rejected again. So here is a little response.

IT IS ABOUT THE BIOLOGY.

I really don't care about the methods, as they are all heuristic approximations with more incorrect assumptions than you can shake a stick at. The trees are right; putting bootstraps on them does not make them any more right. I am telling a story about biology, not about maths.

Now I can do a story about maths.

Bootstraps were invented by Efron, who wrote a nice book about them with Tibshirani that maybe some of the editors and "experts" in phylogenetics might like to read. We all know that the bootstrap is a resampling method where you resample a set of data with replacement. We do this in order to construct confidence intervals for complicated functions by simulation, rather than by an analytical solution, which is often too complex or does not exist.
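As a reminder of what that looks like in practice, here is a minimal sketch of a percentile bootstrap confidence interval for a median. The data values, the function name `bootstrap_ci` and the choice of 2000 replicates are made up for illustration; this is the simplest percentile interval, not the only kind.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate draws n values from the data with replacement
    # and recomputes the statistic on that resample.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Read the interval off the empirical quantiles of the replicates.
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
low, high = bootstrap_ci(sample)
print(f"95% interval for the median: ({low}, {high})")
```

No formula for the standard error of the median is needed; the resampling does the work.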

Now there are two key points:

1) If your sample is biased your bootstrap will still be biased and your confidence interval will still be wrong.

2) Resampling creates an independent and identically distributed (i.i.d.) sample (you lose all correlation between variables when you resample).
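Point 1 is easy to see in a toy simulation. The sketch below assumes a made-up population of the integers 0 to 999 and a hypothetical convenience sample that only ever reaches the upper half of it; the bootstrap interval for the mean comes out nice and tight, and completely misses the true mean.

```python
import random
import statistics

rng = random.Random(1)

population = list(range(1000))          # toy population, true mean 499.5
true_mean = statistics.mean(population)

# Hypothetical convenience sample: only values >= 500 were ever collected.
reachable = [x for x in population if x >= 500]
sample = rng.sample(reachable, 50)

# Percentile bootstrap interval for the mean of the biased sample.
boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(2000)
)
low, high = boot_means[50], boot_means[1949]

# Every resampled value is >= 500, so the interval cannot reach 499.5.
covers_truth = low <= true_mean <= high
print(low, high, covers_truth)
```

The bootstrap only ever sees the sample, so it has no way of knowing what the sample left out.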

Extending these two points to phylogenetics:

In phylogenetics they use convenience samples, i.e. the set of sequences that someone happens to have collected, and they have no idea whether it is a representative or good sample of any kind. If I tried to get research based on convenience sampling published in almost any field (except phylogenetics), I would have my work rejected by most statisticians as wrong. So we suspect that the samples are biased, and if this is true then the bootstrap is not going to tell us much about this bias. In fact Efron and Tibshirani discuss this very problem on p. 138 of their book, where they say bias estimation is an interesting but very tricky problem.

If you are using a technique where those correlations define your output, like say tree building in phylogenetics, bootstrapping is a fairly stupid thing to do. Why do I want to create bootstraps which lose the correlated properties of my data? To do it properly you can read chapter 9 of Efron and Tibshirani, which says the bootstraps are on the covariate vectors, not on the data itself. As far as I can tell, none of the bootstrap implementations in phylogenetics do this.
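For reference, here is a minimal sketch of the site resampling that phylogenetic bootstraps are usually described as doing: alignment columns are drawn with replacement to build a pseudo-alignment of the same length. The toy alignment and the function name `bootstrap_alignment` are made up for illustration, and no tree building is included.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by resampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw column indices with replacement; the same indices are used
    # for every taxon, so each sampled column stays whole across taxa.
    cols = rng.choices(range(n_sites), k=n_sites)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Hypothetical toy alignment, for illustration only.
toy = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTTCGT",
    "taxonC": "ACCTTCGA",
}
replicate = bootstrap_alignment(toy, random.Random(42))
for name, seq in replicate.items():
    print(name, seq)
```

Each column is treated as an independent draw, so whatever dependence exists between sites along the sequence is not carried into the replicates, which is the concern raised above.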

This is a rejection of a paper where I am talking about the interesting bit of biology, not methods. The interesting bit is that not all H5N8 influenza virus is from the same ancestor. This means that viral subtypes are spontaneously recreated multiple times by reassortment of the viral neuraminidases and hemagglutinins. So when you create a tree of a subtype it might not be homologous, as was found recently in a paper published in Science about Dengue, which argues that the whole serotype argument for Dengue does not hold water. You see, that is interesting biology, but you miss it while you get tied up in your bootstraps.

As time goes on I am even more convinced that I am right, that the biology I discovered is important and significant, and that the method pedants, like the grammar pedants, are defending an empty palace. They try to defend beautiful methods that actually have no relationship to reality.


Friday, 2 October 2015
