The Accidental Statistician: An essential test: checking for reassortment in influenza viral phylogenetics

It is standard practice when computing a viral phylogenetic tree to collect sequences from a single strain and then carry out the phylogenetic analysis. The problem is that if there has been reassortment between strains, you will get a misleading tree, because restricting the data to one strain selectively ignores sequences from other parts of the tree. This sampling bias is a serious issue.

I once did an analysis of H5N8 evolution in the recent Korean outbreak. The paper had been delayed by submission to and rejection by Science, and then by submission to Emerging Infectious Diseases. EID rejected it and then published 3 papers on H5N8 with pretty much the same analysis within the next year, but let's ignore that. The paper went to PeerJ, where it was further delayed by an anonymous referee who asked for a complete analysis of all the H5 and N8 sequences to check for reassortment in the Korean outbreak. I got pretty annoyed at this delay, as I knew competitors were going to publish the same analysis soon (the EID papers). I wrote a very hot-headed reply to the Editor in Chief and questioned the editor's requirement for me to do the analysis. Anyway, the editor, Claus Wilke, was very helpful in suggesting that I use MAFFT and FastTree to build the trees for all H5 and N8 sequences and show that there was a single clade containing all of the Korean sequences, which there was. This went in the supporting materials.
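For anyone wanting to repeat that check, here is a minimal sketch of the last step in Python. It is not the script from the paper. It assumes you have already aligned all the H5 (or N8) sequences and built a tree, for example with something like `mafft --auto all_H5.fasta > all_H5.aln` followed by `FastTree -nt -gtr all_H5.aln > all_H5.nwk`; those file names and flag choices are my assumptions for illustration. The function names `parse_newick` and `forms_single_clade` are hypothetical helpers, and the parser assumes tip labels contain no whitespace, quotes, or brackets. Because FastTree output is rooted arbitrarily, the sketch treats the outbreak sequences as a single clade if either they or all the remaining sequences sit under one internal node.

```python
import re

DELIMITERS = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a Newick string into nested lists of tip names.

    Branch lengths and support values are discarded. Assumes simple,
    unquoted labels without whitespace (an assumption of this sketch).
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            # Skip an optional internal label, support value or branch length.
            if pos < len(tokens) and tokens[pos] not in DELIMITERS:
                pos += 1
            return children
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]  # drop the branch length

    return parse_node()


def leaf_sets(node, out):
    """Return the tips under `node`, appending each internal node's tip set to `out`."""
    if isinstance(node, str):
        return frozenset([node])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in node))
    out.append(leaves)
    return leaves


def forms_single_clade(newick, target):
    """True if `target` tips are separated from all other tips by one branch."""
    splits = []
    all_leaves = leaf_sets(parse_newick(newick), splits)
    target = frozenset(target)
    if not target or not target <= all_leaves:
        raise ValueError("target tips must be a non-empty subset of the tree's tips")
    complement = all_leaves - target
    if len(target) == 1 or len(complement) <= 1:
        return True  # trivial split: a single tip on one side
    # The tree is arbitrarily rooted, so accept the split from either side.
    return any(s == target or s == complement for s in splits)


# Toy example with made-up tip names (not real data).
toy = "((K1:0.1,K2:0.1)0.98:0.2,(A:0.3,(B:0.1,C:0.2)0.7:0.1)0.9:0.2);"
print(forms_single_clade(toy, {"K1", "K2"}))
print(forms_single_clade(toy, {"K1", "A"}))
```

In practice you would read the Newick string from the FastTree output file and pass in the names of the outbreak sequences. The parser is recursive, so for trees with many thousands of tips a dedicated tree library or an iterative parser would be the safer choice.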

I was still fuming at the delay and watching several other groups publish the same analysis when I had done it months before them and when at least one of them was likely a referee who rejected the EID paper. Anyway, today I decided to look at how often people are asked to check for reassortment of hemagglutinin and neuraminidase within a viral strain tree. Only FastTree can cope with the many thousands of sequences involved. FastTree has been cited in influenza research 173 times; of those, only 23 papers mention reassortment and NONE mention novel subtypes or strains.

Therefore this "standard practice" that the anonymous referee made me carry out is far from standard. In fact I have NEVER seen anyone carry out that analysis in any other published paper. I am not disputing that it is actually very important. I am questioning why anyone is allowed to publish a viral strain tree without including this analysis in at least the methods and supplementary materials.

Sunday, 3 January 2016
