The Accidental Statistician: H5N8 in Taiwan - Poor methods and not the best peer review.

I was critical of how the data from the Taiwanese outbreak were shared, but I have a few more problems with the paper that reports the analysis of those data. The methods section of the paper says:

Phylogenetic analysis, as described previously (Lee et al., 2014a and Lee et al., 2014b), was performed using these full genome sequences and closely related sequences from GenBank, GISAID and the publicly available government website (http://ai.gov.tw/index.php?id=720), which gave the sequences of the 16 H5 viruses isolated by the Council of Agriculture (COA), Taiwan during the recent outbreaks.

Now let's look at those two 2014 papers by Lee and the methods in them. The first one is a letter, so it does not even have a methods section. The methods are mentioned only in the figure legend.

Phylogenetic tree of hemagglutinin (HA) genes of influenza A(H5N8) viruses, South Korea, 2014. Triangles indicate viruses characterized in this study. Other viruses detected in South Korea are indicated in boldface. Subtypes are indicated in parentheses. A total of 72 HA gene sequences were ≥1,600 nt. Multiple sequence alignment was performed by using ClustalW (www.ebi.ac.kr/Tolls/clustalw2). The tree was constructed by using the neighbor-joining method with the Kimura 2-parameter model and MEGA version 5.2 (www.megasoftware.net/) with 1,000 bootstrap replicates. H5, hemagglutinin 5; Gs/Gd, Goose/Guangdong; LPAI, low pathogenic avian influenza; HPAI, highly pathogenic avian influenza. Scale bar indicates nucleotide substitutions per site.

So the first paper builds a neighbor-joining (NJ) tree in MEGA 5.2, even though MEGA 6.06 was already available.
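To make clear how simple this first method is, here is a minimal sketch of the Kimura 2-parameter distance that neighbor-joining takes as input. It is not the MEGA implementation; the function name `k2p_distance` and the choice to skip gaps and ambiguity codes are my own assumptions for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Hypothetical helper for illustration only. Sites where either sequence
    has a gap or an ambiguity code are skipped (an assumption, not
    necessarily what MEGA does with its default settings).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = 0
    transversions = 0
    compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)).
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of these pairwise distances is all that NJ clusters, so there is no explicit model of time or sampling dates anywhere in this method.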

The second paper does have a methods section which says:

Molecular clock analysis. For the HA and NA genes, the genetic distance from the common ancestral node of the lineage to each viral isolate was measured from the ML tree and plotted against the sample collection dates. Linear regression was used to indicate the rate of accumulation of mutations over time. A more detailed evolutionary time scale for each virus gene phylogeny, with confidence limits, was obtained using relaxed molecular clocks under uncorrelated lognormal (UCLD) and exponential (UCED) rate distributions, implemented in a Bayesian Markov chain Monte Carlo (BMCMC) statistical framework (27), using BEAST, version 1.8 (28). The SRD06 nucleotide substitution model (29) and Bayesian Skyride demographic model (30) were used. Multiple runs were performed for each data set, giving a total of 6 × 10^7 states (with 1 × 10^7 states discarded as burn-in) that were summarized to compute statistical estimates of the parameters. Convergence of the BMCMC analysis was assessed in Tracer, version 1.6 (A. Rambaut, M. Suchard, and A. J. Drummond, 2013 [http://tree.bio.ed.ac.uk/software/tracer/]).

So this analysis was carried out with BEAST in a Bayesian framework. Which of these totally different methods was used in the current paper? It has to be the BEAST analysis, because of the way the trees appear. But this also raises questions, as they talk about the Bayesian Skyride model. I think they mean the Bayesian Skyline model and are confused by this paper. Anyway, you should not be using the Skyline unless you are interested in hypotheses about viral demographics and phylodynamics. And what do they mean by multiple runs? This shows a naivety in using BEAST. So while they might get the right results, they could have got them faster and more easily using a simpler coalescent model.
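For comparison, the first step in the quoted methods, the regression of genetic distance against sampling date, needs nothing like BEAST. A minimal sketch, assuming you already have a distance and a decimal sampling date for each isolate; the function name and the numbers in the usage example are made up for illustration and are not taken from either paper.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of genetic distance against sampling date.

    Hypothetical helper for illustration only.
    dates: decimal years, one per isolate.
    distances: distance from the ancestral node to each isolate
               (substitutions per site).
    Returns (rate, intercept, ancestor_date), where rate is the slope in
    substitutions per site per year and ancestor_date is the date at which
    the fitted line crosses zero distance.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two matched (date, distance) pairs")

    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")

    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    if rate == 0:
        raise ValueError("zero slope: no temporal signal to date the ancestor")
    ancestor_date = -intercept / rate
    return rate, intercept, ancestor_date


# Made-up example data, not from either paper.
example_dates = [2010.0, 2011.0, 2012.0, 2013.0, 2014.0]
example_distances = [0.008, 0.012, 0.016, 0.020, 0.024]
rate, intercept, ancestor_date = root_to_tip_regression(
    example_dates, example_distances
)
```

This gives a point estimate of the rate with no confidence limits, which is why the authors then move on to the relaxed clock analysis.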

What is of larger concern is that in both the Taiwanese outbreak paper and the second Lee paper the referees did not notice the error of citing two different methodologies or the incorrect use of the Bayesian Skyline. So much for the peer review process improving science.

Saturday, 19 December 2015