The Accidental Statistician: January 2016

Sunday, 31 January 2016

Second referee's comments - Do lineage and subtype have meaning any more?

The second referee is a bit pedantic, which is perhaps important. There are good arguments for a very strict use of terms, but this is a minor correction. I am a bit hand-waving and inexact, so he has some points, but still, why anonymous? What are you afraid of?

Basic reporting

The manuscript suffers from sub-standard writing. There's a typo in the text ("creating phylogenetic trees oh H5N8" on line 112), as well as grammar mistakes (line 182, line 229). Some very unusual language is also employed throughout the manuscript, such as references to H5N8 trees on line 63(unclear whether trees are of the HA segment, NA segment or the inadvisable concatenation of the two), hemagglutinin and neuraminidase subunits on line 69 and 70 (they are segments, subunits are what proteins have), sequence degeneracy on line 93 (the opposite of saturation is low diversity, not degeneracy), information content of HA and NA trees on line 121 (two trees are always sufficient to infer reassortment), consistency of trees as strong evidence of phylogenetic analysis validity on line 131 (tree consistency indicates that the segments have a similar history and says nothing about the validity of the analysis), envelope segments on line 133 (to my knowledge only Retroviruses possess a surface protein called envelope) and reassortment in Flaviviruses on line 231 (Flaviviruses cannot reassort because their genomes are on a single RNA strand). The conclusion is rather short, the second paragraph of which is basically the same thing repeated over and over again.

Experimental design

The reporting of trees is extremely unhelpful. All trees are shown as cladograms and thus only indicate the topology of the tree. Dalby writes that this was done for clarity on line 83 but it achieves the opposite effect. Branch lengths allow everyone to see how much evolution has occurred on each branch and thus how robust some of the inferences are, especially in light of reporting on how much evolutionary change has occurred in trees on line 107 without supporting evidence. It is never made clear whether the trees have been rooted or not and without branch lengths it is impossible to tell whether they are. Although not a major of flaw of the study, nor a problem unique to this manuscript, the use of a parameter rich GTR+I+G nucleotide substitution model is questionable. Model testing, as it is done today, is based on a circular argument (the tree with a given model has the highest likelihood, therefore the model is used to reconstruct the tree) and ignores identifiability problems when it comes to the combination of Gamma-distributed rate heterogeneity AND invariant sites. Gamma-distributed rate heterogeneity takes care of slowly (or non-) evolving sites, so the addition of invariant site estimation combines two models that are explaining the same variation.

Validity of the findings

In the manuscript Dalby describes the rise of an avian influenza A virus subtype H5N8, which has recently caused a sustained outbreak in Korea. The author finds that the combination of H5 and N8 segments in avian influenza A viruses has arisen multiple times independently rather than circulated cryptically in birds as a single genomic lineage. I have no problems with the overall findings - I think the divergence between the H5s and the N8s that have ended up reassorting together is sufficient to infer numerous origins of the subtype. What I disagree with are the details surrounding each independent origin of the subtype. Some very bold claims are made in the absence of any clear evidence that would be available to the reader, for example that the origin of the Californian quail H5N8 subtype is unambiguous when it is actually quite the opposite, given the phylogenetic position of the sequence or that the Thailand 2012 H5N8 neuraminidase clusters with H3N8 neuraminidases when it does nothing of the sort.

Comments for the author

I think this manuscript could easily be improved by:

1. Showing maximum likelihood trees with clear rooting and actual branch lengths.

2. The direction and context of each reassortment should be explicitly tested using an appropriate model - e.g. BEAST with discrete traits of location, host and subtype (as appropriate) - to support the various proposed hypotheses for the origins of subtype H5N8.

3. Clean up the language - use the correct terms agreed upon in the literature.

4. Show full trees of all HA and NA sequences indicating where H5N8 viruses are.

I would strongly advise the author to implement these suggestions before attempting to submit this manuscript elsewhere.

So from the comments to the author:

1) This is trivial and OK. Actually, with branch lengths the trees are a whole lot harder to read, and the key argument of the paper is about reassortment, which depends on clades and not on branch lengths, but this is a minor point. It is cosmetic and not grounds for more than revision.
2) This is not going to happen. There are 4007 sequences, so this would take large amounts of computer time and give you nothing new or significant in identifying which clades H5N8 can be found in. Putting in subtypes and locations would actually be over-fitting the data to the model, and a very bad statistical error, because you leave no variables to test your model against. This would be an example of Bode's Law: put all the empirical data into the model and you have no free variables left.
3) Agreed, but again these are minor changes.
4) They are in the supplementary materials and always were - but referees don't look. Figures 5-13 are parts of this complete tree. Version 3 will just have the full H5 and N8 trees and go to F1000. There will be no anonymous referees and it will be published first.

Regarding the point on the California quail sequence: it is ambiguous if you think that the H5N8 trees are telling you anything, but the point of the paper is that they aren't. So it is completely unambiguous that this sequence does not contain the H5 from Goose Guangdong, and it is in NO WAY connected to the H5N8 sequences from Korea, regardless of what the location and chronology suggest (that is why doing what is suggested in comment 2 is a very bad idea).

The point about Quang Ninh is partially true: it is part of an amorphous clade that includes H10N8 isolated at the same but also mixed types. The ancestral sequence to this clade is most definitely an H3N8 from Vietnam, and H3N8 or H6N8 are the sources for almost all of the N8 sequences.

Flaviviruses do not reassort, as they are not segmented, but they definitely undergo recombination, which is equivalent. It is an analogy and not a homology, but sometimes metaphors are not clear. Again, this is easily removed. The point of the analogy is the wider consideration that lineage has no meaning if there are multiple subtypes with the same lineage and subtypes with multiple lineages. What does the word lineage mean? How are we going to define it other than in some arbitrary way based on distances in a phylogenetic tree?

Constructing Trees based on a single influenza subtype is not a good idea as it introduces sampling bias (amended and toned down)

Version 2 of the paper about H5N8 has been rejected, even though it is still right. What it says is not desperately controversial, but it is important. https://peerj.com/preprints/1489/

The paper says that building trees from only the H5N8 sequences, or from only the sequences of any other single subtype, is not a good idea, because this is a biased sample that misses the reassortment events that give rise to alternative subtypes. An H5N8 sequence can be next to an H5N1 sequence in the true tree, and then H5N8 can appear again in another place in the H5 tree.
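
The effect can be illustrated with a toy example. The sketch below is only an illustration, not the analysis from the paper: the tree, the tip names and the helper `count_pure_clades` are all hypothetical. It walks a small made-up H5 tree and counts how many separate clades consist purely of H5N8 tips. More than one such clade means the subtype turns up in more than one place in the H5 tree, which a tree built from H5N8 sequences alone could never show.

```python
# Toy H5 (hemagglutinin) tree written as nested tuples.
# Tip labels are hypothetical and follow the pattern "name|subtype".
toy_h5_tree = (
    (("A|H5N1", "B|H5N8"), "C|H5N1"),
    (("D|H5N2", ("E|H5N8", "F|H5N8")), "G|H5N3"),
)


def subtype_of(tip):
    """Extract the subtype from a 'name|subtype' tip label."""
    return tip.split("|")[1]


def _walk(node, target):
    """Return (is_pure, n_clades) for the subtree rooted at node.

    is_pure is True when every tip below node has the target subtype.
    n_clades is the number of maximal pure clades found below node.
    """
    if isinstance(node, str):  # a tip
        pure = subtype_of(node) == target
        return pure, (1 if pure else 0)
    results = [_walk(child, target) for child in node]
    if all(pure for pure, _ in results):
        return True, 1  # the children merge into one larger pure clade
    return False, sum(count for _, count in results)


def count_pure_clades(tree, target):
    """Count the maximal clades made up only of the target subtype."""
    return _walk(tree, target)[1]


print(count_pure_clades(toy_h5_tree, "H5N8"))
```

On this toy tree H5N8 occupies two separate clades. This is a crude count of where a subtype sits in the full tree; it does not by itself test for reassortment.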

I gave a very clear tree to show this is absolutely true and even posted it on this blog and repost it again here.

There is no doubt. Anything other than complete sampling of ALL of the H5 sequences will not give you the correct sampling for the hemagglutinin tree. I have done this in BOTH versions of the paper.

I put them in the supplementary materials because they are large: the hemagglutinin tree contains over 4000 sequences, and this is not easy to deal with. I just cut out the clades with H5N8 to make it easier to understand and to focus on them. For some unknown reason the referees fail to grasp this, and one even commented that my method and sampling were wrong because I showed a tree calculated just from the H5N8 sequences.

This comment from a referee just drives me crazy. I am lost for words as to how deliberately obstructive this person is.

In this paper the author is attempting to explain the evolutionary and reassortment history of H5N8 influenza A virus. However, the dataset design ignores what is already known about the emergence and reassortment history of these multiple virus lineages. In particular, the H5-HA of the recent North American high path H5N8 virus is derived from the Goose Guangdong HPAI H5N1 lineage circulating since 1996. This reassortment history has been well studied and published. The author wants to determine if H5N8 has been circulating cryptically in avian hosts or if emerges repeatedly through reassortment. But this has been shown - the highly pathogenic H5N8 virus emerged through reassortment (see Lee et al, 2014 EID for example). In fact, this has been show for every avian virus subtype in the MANY MANY publications investigating the reassortment history of avian influenza A virus in both wild and domestic populations.

The paper is poorly referenced and has not included important citations relevant to the study presented. I believe this has lead to incorrect understanding of influenza A ecology and evolution by the author and subsequently a poorly designed dataset to shed light on the questions he is attempting to address. The figures are completely inappropriate and not in line with the standards of phylogenetic studies or influenza research. It is unfortunate that the author has decided to show cladograms instead of phylograms. Branch lengths in a cladogram are meaningless. However, long branchs are indicative of poor sampling and missing data. This would be obvious from phylograms, but they are conveniently obscured in cladograms. The most informative analysis was of all available H5-HA and N8-NA phylogenies available from the supporting material link. By highlighting only H5N8 viruses in these trees it is evident that the other datasets presented in the main text of the study are poorly sampled.

Experimental design

As stated above, this is a poorly designed investigation. While I admire the effort to understand influenza ecology and evolution, the work presented here ignores much of what is already known about this lineage and influenza A virus in general. The assumptions of the analyses conducted are not appropriate. The analysis conducted by this author assumes a direct lineage connecting all H5N8 viruses that have been sampled (Figure 1-4). This is not true and that is evident from the supporting material presented by the author. The HA-H5 lineage has associated with multiple different virus genotypes and only a handful of lineages have emerged as highly pathogenic. The dataset design does not address the questions posed and ecological or evolutionary inference is questionable.

Validity of the findings

The inferences made from Figure 1-4 are dubious. The author acknowledges this in the manuscript when he states “These trees show that the apparently simple H5N8 phylogenetic trees for the two envelope segments (figures 1-4) are actually more complex and that multiple reassortment events have occurred resulting in the creation of novel H5N8 subtype lineages. These events cannot be seen in the structure of the H5N8 only trees but they need to be taken into account if the phylogenetic trees are going to be calculated correctly, especially if coalescent methods are going to be used.” This is an appropriate warning. I wish the author had heard it! This is evident and known to the influenza field. Regardless, at this point in the paper the author suggest that the reader ignore all previous results. Figure 5-13 are sections from the supporting material. The author attempts to determine source of HA and NA virus subtypes. The author has determined his reading is better than a probabilistic approach to assess reassortment history. However, this reading is in absence of informative branch lengths or assessment of sampling. Any inference presented here is either dubious, in contrast to other studies (not cited) or meaningless.

Comments for the author

I can't endorse publication of this manuscript. It does not serve the influenza field, nor does it add to the current body of knowledge. The quality of the research is not up to standards in the field. I believe that this manuscript should be rejected.

This person is trying to use my own findings to say why my findings are wrong. I heard my warning; that is why I wrote it. That is in fact why I wrote the paper, because all of that extensive literature that I did not cite, which annoyed the referee into his snide "MANY MANY" comment, is nonsense carried out by someone who needs to read about statistics and sampling. The referee agrees completely with what I am trying to do and with the results that I find and have in the supplementary material, but argues that I am saying the exact opposite of the entire argument of the paper in order to reject it. This is a classic example of creating a straw man.

The entire point of figures 1-4 is to show that they are wrong, and thus that the prevailing dogma, which always analyses influenza strains like this, is wrong. The experimental design is exactly correct. First you do what is done by everyone; this is the control. Then you do something new, the tree of ALL of the H5 and N8 sequences, to show what should be done. That is why there are figures 5-13, which show how sampling has to be done.

I could think that this referee is sufficiently confused not to be able to understand, but I think that they do understand and this is just malevolence: they want to block publication.

How can I be sampling incorrectly when I include every known sequence, all of them, none excluded?

If that is not a valid sample, then there are no valid samples in H5 influenza research, ever. To know for sure who it is, I will wait for a couple of months and see who tries to publish the view that sampling is wrong if we focus on a single influenza subtype. I expect to see it in something like Emerging Infectious Diseases or PLoS Pathogens, with a fairly big name as the submitting author.

Finally, the last lines are NOT permitted in a referee's comments. Under the instructions to reviewers you are not allowed to put that sort of response in the comments to the author. That is for the editor to decide. It is not constructive or useful.

This is someone with an axe to grind who is annoyed that their work has not been cited. Boo hoo to you. It is appalling behaviour for a so-called professional scientist. What the paper says is still true and it will still be published, and whoever you are, since you chose to remain anonymous (for good reason), you will eventually be exposed for the dishonest person that you are.

Tuesday, 5 January 2016

From the past: The EID paper referees' comments from 30/1/2015. For the future: monitoring birds in Russia.

Reviewer: 1

Comments to the Author

In their manuscript, ‘The European and Japanese outbreaks of H5N8 derive from a single source population that has been dispersed along the long distance migratory bird migratory flyways’, Dalby and Iqbal use Bayesian coalsescent methods to infer ancestral of previously detected H5N8 subtype influenza A viruses and estimate time since divergence for isolates derived from recent Eurasian outbreaks.  Furthermore, the authors provide generalized and anecdotal informal on bird migration patterns in Eurasia to gain inference on possible origins and dispersal patterns.

First, I would like to applaud the authors for investigating a topic of great interest and importance to both human and animal population health.  Few studies combine genetic data with information on bird migration which can be a useful methodology for understanding the global dispersal of avian pathogens. I personally believe that such studies are relatively rare on account of the difficulty of combining disparate types of data and in obtaining relevant information for wild birds at locations along migratory pathways.  Regrettably, this is where I think this particular investigation falls short.

Aside from a few potential (minor?) issues (see specific comments below) I do not have any problems with genetic analyses included in this manuscript and the conclusions drawn therefrom per se; however, I do think that the authors ‘epidemiological data’ on migratory birds falls short of supporting conclusions.  For example, in the abstract alone, I would argue that the following claims are not sufficiently supported by empirical data: ‘traced to a single source population, which has been spread by migratory birds’, we can show when and where the outbreak originated’, ‘This population was located in the Siberian summer breeding grounds of long-range migratory birds’.  Because recent outbreaks of H5N8 influenza A viruses also have occurred in poultry throughout Eurasia, I feel that one could use the same genetic analyses conducted by the authors and provide similar generalized/anecdotal information regarding Eurasian poultry trade patterns to reach the conclusion that this virus has been dispersed through bird trade.  That is, the authors fail to provide any convincing evidence to demonstrate that wild birds have been solely responsible for viral dispersal as implied.

Unfortunately, I feel as though I cannot be more supportive of the authors’ current submission to Emerging Infectious Diseases at this time.  Given what I believe to the critical flaw in the manuscript as written, I might suggest that the authors shorten their submission by focusing on ancestry of H5N8 viruses and time of divergence.  By formulating a short communication (i.e. Dispatch) on this more focused topic, I think that a few speculative sentence could be included in the discussion in support of the authors’ thesis, that wild birds are dispersing H5N8 viruses throughout Eurasia.  In hopes that the authors will pursue a revision in the future, I’ve appended numerous specific comments below which I hope will ultimately prove useful towards this end.

Specific comments:

Introduction

Lines 28-29: By ‘cases’, do you mean ‘detections’?  There have certainly been undetected cases, no?

Line 43: By ‘the virus’, which specific strain are you referring to?  Different strains of HP H5N8 probably have differences in pathogenicity and host adaptation.

Lines 46-47: (‘Ducks and…’) I’m not sure that this is supported as written.  If you are referring to the laboratory study conducted by Kang et al., specify (e.g. ‘in a laboratory challenge study…’) and provide citation.

Line 51: Have antibodies specific to this strain been demonstrated?  If so, I presume this was through experimental challenge?  Clarify.

General comment: Considerable text is included in the introduction presenting information from very distantly related viruses of the same serotype (H5N8).  Would the introduction be better focused on the evolution of the reassortant HP H5N8 viruses currently causing poultry outbreaks?

Materials and methods

Line 62: I don’t believe that ‘flu resource’ is correct nomenclature.  Change throughout.

Lines 74-75: By ‘different reassortment events’ do you mean ‘different ancestral lineages’?

Lines 79-80: Please justify why this model and molecular clock assumption.

Lines 82-83: How long were runs?  Burn-in period?

Lines 88-92: Methodology here is insufficient for evaluating contributions of wild birds in viral dispersal.

Results and discussion

Lines 96-97: Is support (i.e. posterior probability values) presented on trees?  If not, I cannot assess support for the topology presented.

Line 101: It is a bit confusing as to results for which dataset you are referring here.  Some clarification for the reader in this section would be helpful.

Line 102: Add ‘gene segment’ after ‘hemagglutinin’.

Line 115: ‘in Korea in Korea’;

Lines 199-121: This scenario is plausible but is not demonstrated by results.

Lines 123-124: Couldn’t one argue that wide dispersal is reflective of the extent of poultry trade in Eurasia?

Lines 126-128: Please provide genus/species for bird species.

Line 128: By ‘carrying the virus’, do you mean ‘infected with H5N8 viruses’?

Lines 131-132: Please provide genus/species for bird species.

Lines 133-134: Prevalence for these species has been reported as being much more variable than implied here.  Also, I’m not sure birds ‘are funneled’; rather, birds ‘congregate’.

Lines 138: Add ‘of Eurasia’ after ‘flyways’ to reflect recent detection in North America.

Lines 140-143: This anecdotal information provides weak support for your thesis.

Line 145: Is this referring to H5N1?

Lines 150-153: Speculative statements here need appropriate caveats.

Lines 157-159: I tend to disagree with this statement.  Estimated evolutionary divergence does provide information on host, location, or ‘events’.

Lines 167-169: (‘However the small…’) This statement is not supported.

Line 175: Define ‘vigilance’.  By ‘avian flu’ do you mean ‘HP H5N8’?

Line 185: Define ‘vigilance’.  Are you referring to surveillance?

Lines 185-187: How will this help if infections are asymptomatic in wild birds?  If the thesis put forward re. potential for transmission to humans in the prior paragraph holds true, might your recommendation here actually increase the probability for bird to human transmission?

Reviewer: 2

Comments to the Author

The manuscript by Dalby and Iqbal reports a phylogenetic study performed on sequences of influenza A virus (IAV), H5N8 subtype. The analysis is timely and has the potential to provide significant information to the understanding of the H5N8 IAV outbreak. The manuscript, unfortunately, suffers from a lack of clarity in both the methodology and the presentation of the results. Below is a list of both minor and major issues that the authors may want to address:

- The article has been submitted as a “full research manuscript” but is very short and could have rather been formatted as a “dispatch”.

- The use of “seroptype” is not appropriate; the use of “subtype” for the HA and NA gene segments is widely used and I recommend the author to do so in their manuscript to avoid confusion for the readers.

- line 28: “was identified in a wild bird”. More information on the bird species is needed as well as the context in which it has been found (active/passive surveillance, outbreaks in poultry in the same area, with other H5 virus subtypes, etc.).

- line 47: “mallards are often asymptomatic but can still be carriers of the virus”: What virus: H5 in general ? HP/LP ? Please be more specific.

- line 57: “this study identify the source of the outbreaks”. Define “source” and the specific objectives of the study. It is somewhat vague as the phylogenetic analyses provide information on the relatedness and evolutionary history of isolated viruses, but not on the exact identification of the donor of the virus circulating in Europe in November/December. Indeed, the lack of sequence available between the spring/summer and the winter of 2014 precludes conclusion on the exact origin of the virus.

- line 62: “all of the available H5N8 sequences were downloaded”. Provide details on the number of downloaded sequences per segment, whether if they were nucleotide or protein sequences, etc. Also, why not including H5 sequences of non-H5N8 virus subtypes ? This would have given more power to the analyses and strengthened the conclusions regarding the global evolutionary history and origin of the H5N8 virus. Since IAV gene segments have an – almost – independent evolution, it would have been more appropriate to not restrict the data to segment that belong to the H5N8 subtype only, in particular for the internal gene segments. I understand that it may not has been the initial objective of the study but I believe that including only H5N8 sequences gives biased results and a limited understanding of the global evolutionary history of the virus genes (reassortment events, etc.).

- line 69 “no editing at the 3' end”. Why that ? Why not trimming the sequences to the stop codon ?

- lines 71-75: I'm confused by this part. Based on the results it looks like it had affected the estimates of the TMRCA. Sequence selection procedure overall needs more clarity.

- lines 77-80: Why use a strict molecular clock and a nucleotide-based substitution model while it has been shown that more appropriate models exits for IAV, in particular for Bayesian analyses performed with BEAST ? (Shapiro et al. Mol Biol Evol. 2006;23:7–9; Bahl et al. Virology 2009;390:289-297; etc.). Where tips dates coded by year only, or years/month/day ? Was that the case for all sequences ? How did you dealt with missing information ?

- lines 82-84: Where several runs combined ? What were the chain lengths and sampling frequency ?

- line 96: “produces a consistent gene tree”: How that is consistent ? Also, there is a major information missing on the phylogenetic trees: the posterior probabilities. It is overall difficult to evaluate the methodology but since no posterior are indicated on the trees, it also makes the interpretation of the results somewhat complicated.

- line 103 and throughout the manuscript: “outbreaks diverged between 1.58 and 5.53 months ago”. I suggest to use exact dates (e.g. 2014.XX) rather than some time in the paste, given that the time reference used is not clearly stated (December 2014 ?).

- line 119: “this result indicate”. This statement requires additional support. The fact that two events occurred at the same time does not supports that the two events were related... I agree that it is likely that these particular two events were linked, but it would be welcomed to have a more comprehensive discussion rather than a statement that is not fully supported by the data and analyses.

- line 124: “that migratory birds”. This is a very general statement and it is very unlikely that migratory birds, in general, were the source. Please be more specific (species involved or not, etc.). Overall, the manuscript critically lacks strong arguments and discussion of the ornithological aspects.

- line 133: “Mallard... high prevalence”. Prevalence of infected ducks strongly depends on the time of the year and geographic location. This again is a general statement that does not provide strong support to the conclusion. I suggest the author to have a look at recent publications on the ecology and epidemiology of IAV in wild ducks to strengthen the discussion (e.g. Latorre-Margalef et al. 2014. Proc B.281:20140098)

- lines 134-138: How would you explain this absence the virus spread in other migratory flyways ? Southwards ?

- lines 141-143: Not all bird species migrate at the same time, even for closely related ones (e.g. ducks). Again, although this information is interesting and valuable, concluding that the two events were related needs more support or at least to be better discussed.

- lines 149-153. Is there any molecular evidence of this change in virulence and selection ?

- lines 155-159: “the evolutionary events responsible for the European and Japanese cases must have occurred in migratory birds”. This statement is very vague. What evolutionary events ? In which migratory birds species and populations ? Where ? How ? etc.

- Line 169: Based on which segment ? If changes in population sizes were investigated then the results should be presented. Also, why not using a Bayesian Skyline priors to investigate such changes instead of exponential population growth ?

- line 178: “the longer outbreaks in wild birds and poultry persist...”. Not sure we can use “outbreak” for wild birds if they are asymptomatic. Is it expected that the H5N8 virus will persist in wild birds ? Are there other evidence of poultry-origin virus spillover to wild birds with subsequent long-time maintenance ?

- line 186: “bird watchers”. Bird watchers I doubt, especially if birds are asymptomatic. There is certainly a need for better surveillance and for the implication of trained ornithologists and veterinarian but this is somewhat different to bird watchers.

- Figures 1-8: The trees are poorly formatted (very difficult to read). The red clade is missing on Fig. 1. As they all seem to provide the same information I would suggest to place them together in an online supplementary file, and select one tree (e.g. HA) for detailed presentation in the manuscript. Slight editing (colors, simplified taxa names, etc.) could also help to read the figures.

- Figure legend: double check figure numbers: looks like there is a Fig. 9 missing.

- Figure 10. The map is not informative as presented. It shows very general migratory flyways and could have rather focused on the migratory routes of the species that are implicated in the spread of the H5N8 IAV subtype. Birdlife international and other websites provide maps that could be much more informative that the one presented in Figure 10: http://www.birdlife.org/datazone/species/factsheet/22680317

Reviewer: 3

Comments to the Author

In this manuscript, authors describe the recent H5N8 outbreaks of Europe and Japan might be a single source population, which has been spread by migratory birds by combining genetic methods and epidemiological data. It seems to be probable theory to explain current H5N8 outbreaks in European countries and Japan at the same time by similar viruses. Although the topic is of interest, the manuscript will need revising.

Comments

1. Line 36-39: Reference is inadequate. The contents of this sentence were derived from reference 17 rather than reference 4.

2. Line 44-45: The H5N8 virus of Ireland was genetically quite different from recent Asian H5N8 viruses, so “the original H5N8” is improper expression. These were just same serotype of viruses.

3. Line 45-47: The authors have to identify a quotation.

4. Line 115: Delete the duplicated word (in Korea)

5. Line 149-152: The authors have to identify a quotation.

6. Figure Legends and Figures: All figure legends are same except for each gene name. However, there were different scale in figures between NA gene and the other genes. Please correct the scale in figures or rewrite figure legends.

The bits highlighted in red are a fairly standard reviewer's tool. First you give the praise: this is really good work, but ...

Then the "but" kills off the paper. The "but" here is that the data could fit with the movement of poultry between farms. Except that it cannot. For that to happen, British, Dutch and German farmers would all have to have obtained eggs from Korea. These would then have to have infected a limited number of farms, and the virus would have to have spread from the farms to the local wild bird populations in small enough numbers not to be detected by a screen but in large enough numbers to give sporadic cases. This is absolute nonsense. A simple likelihood model says that wild bird transmission is much more likely than these multiple domestic bird transmissions. This referee has another agenda: blocking the paper until their own paper is ready for the same journal, or using this idea to write a paper for Science. EID went on to publish three more papers about the H5N8 outbreak and its spread. The extra data in those papers included cases from Russia, which I had predicted must be there but which were unavailable at the time.
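
The shape of the likelihood argument can be sketched roughly as follows. This is a minimal illustration and not a fitted model: the function name and every probability in it are placeholder assumptions, not estimates from data. It also treats the wild bird explanation as a single shared event, which is itself a simplification. The only point is that an explanation needing several independent introductions by trade multiplies several small probabilities together, while a single source population needs only one.

```python
import math


def log_likelihood_ratio(p_wild_source, p_trade_per_country, n_countries):
    """Log of P(data | one wild bird source) / P(data | independent trade imports).

    All three arguments are illustrative assumptions supplied by the caller.
    """
    log_wild = math.log(p_wild_source)
    # Independent imports: the per-country probabilities multiply,
    # so their logarithms add.
    log_trade = n_countries * math.log(p_trade_per_country)
    return log_wild - log_trade


# Placeholder values chosen only to show the calculation (assumptions, not data).
ratio = log_likelihood_ratio(
    p_wild_source=0.1, p_trade_per_country=0.1, n_countries=3
)
print(f"log likelihood ratio in favour of wild birds: {ratio:.2f}")
```

A positive value favours the wild bird explanation. Even when a single trade import is taken to be as probable as the wild bird source, needing three of them independently makes the trade explanation less likely.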

The summer breeding grounds in Russia are essential for monitoring bird influenza epidemics. We need a well-funded international effort to get the best possible sampling of the virus so that we can predict future outbreaks.

Sunday, 3 January 2016

An essential test: checking for reassortment in influenza viral phylogenetics

When computing a viral phylogenetic tree, it is standard practice to collect sequences from a single strain and then carry out the phylogenetic analysis. The problem is that if there has been reassortment between strains then you will get a misleading tree, because sampling this way selectively ignores data from certain parts of the tree. This sampling bias is a serious issue.

I once did an analysis of H5N8 evolution in the recent Korean outbreak. The paper had been delayed by submission to, and rejection by, Science and then by submission to Emerging Infectious Diseases. EID rejected it and then published 3 papers on H5N8 with pretty much the same analysis within the next year, but let's ignore that. The paper went to PeerJ, where it was further delayed by an anonymous referee who asked for a complete analysis of all the H5 and N8 sequences to check for reassortment in the Korean outbreak. I got pretty annoyed at this delay as I knew competitors were soon going to publish the same analysis (the EID papers). I wrote a very hot-headed reply to the Editor in Chief and questioned the editor's requirement for me to do the analysis. Anyway, the editor Claus Wilke was very helpful in suggesting that I use MAFFT and FastTree to build the trees for all H5 and N8 and show that there was a single clade containing all of the Korean sequences, which there was. This went in the supporting materials.
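For anyone who wants to repeat that check, here is a minimal sketch of the last step: given the Newick tree that comes out of the tree builder, test whether a chosen set of tips forms a single clade. It is not the script I used. The function names, the toy trees and the `Korea_` tip labels are all made up for illustration, and the parser assumes simple unquoted labels with no parentheses or commas in them (real strain names such as those ending in "(H5N8)" would need renaming first). Because FastTree output is effectively unrooted, a group counts as a single clade if either the group or its complement sits under one node.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, children) tuples.

    Assumption: labels are unquoted and contain no '(', ')' or ','.
    Branch lengths after ':' are discarded.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0]
        return (label, children)

    return node()


def leaf_sets(tree):
    """Return (all tip labels, list of tip sets found under every node)."""
    found = []

    def walk(n):
        label, children = n
        if children:
            leaves = frozenset().union(*(walk(c) for c in children))
        else:
            leaves = frozenset([label])
        found.append(leaves)
        return leaves

    return walk(tree), found


def forms_single_clade(newick, is_target):
    """True if the tips selected by is_target sit on one side of one branch."""
    all_leaves, found = leaf_sets(parse_newick(newick))
    target = frozenset(name for name in all_leaves if is_target(name))
    if not target:
        return False
    # Unrooted view: the group, or everything else, must lie under one node.
    return target in found or (all_leaves - target) in found


def is_korean(name):
    # Hypothetical labelling scheme for the toy trees below.
    return name.startswith("Korea_")


# Toy trees with invented tips and support values.
together = "((Korea_1:0.1,Korea_2:0.1)0.98:0.05,(Japan_1:0.1,China_1:0.2)0.80:0.05,Ireland_1:0.3);"
split_up = "((Korea_1:0.1,Japan_1:0.1)0.98:0.05,(Korea_2:0.1,China_1:0.2)0.80:0.05,Ireland_1:0.3);"

print(forms_single_clade(together, is_korean))
print(forms_single_clade(split_up, is_korean))
```

The check says nothing about how well supported the clade is; the support values on the relevant branch still have to be read off the tree.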

I was still fuming at the delay and at watching several other groups publish the same analysis when I had done it months before them, and when at least one of them was likely a referee who rejected the EID paper. Anyway, today I decided to look at how often people are asked to check the reassortment of hemagglutinin and neuraminidase within a viral strain tree. Only FastTree can cope with the many thousands of sequences involved. FastTree has been cited for influenza research 173 times; of those, only 23 papers mention reassortment and NONE mention novel subtypes or strains.

Therefore this "standard practice" that the anonymous referee made me carry out is far from standard. In fact I have NEVER seen anyone carry out that analysis in any other published paper. I am not disputing that it is actually very important. I am questioning why anyone is allowed to publish a viral strain tree without including this analysis in at least the methods and supplementary materials.

Saturday, 2 January 2016

Subtype vs Strain - Prof Stronzo Bestiale.

I just did a search on Google for reassortment leading to novel subtypes of influenza or novel strains of influenza, and the results gave me 481 for strain and 219 for subtype. Which is interesting considering the Italian referee who objected to one of my papers because I did not use enough commas and used the term strain and not subtype. I affectionately call this referee Prof Stronzo Bestiale.

Friday, 1 January 2016

What I have learned from the rejected submissions

New year, new start in thinking about how science works and in particular trying to understand the mind of the viral phylogeneticist. These are deductions based on the negative referee comments I have received recently.

The field is very compartmentalised.
Two editors who are significant in the field, at least one of whom uses FastTree, have NO IDEA whatsoever about how it works to give local bootstraps.
Condensed trees with bootstrap cut-offs are not widely understood in the community (they need to read more and use their brains).
Most of the community do not read very much, especially theory, and the level of maths is often poor.
If you want to publish you need to keep it simple with a single idea and single question as referees do not seem to be able to manage two ideas at the same time.
Make what you want to say clear - being discreet and tactful and not shouting the implications allows others to deliberately misinterpret your words to create a straw man that they can demolish.
Think like a martial arts master. Use your opponent's attack against them. Turn their comments into your strengths. There is nothing better than the referee who suggested that my experiment was badly designed because of sampling. If they had understood what I was saying in the paper they would know that was exactly my point. Anyone sampling based on influenza subtype is doing something stupid. It is a shame that most of that referee's papers will have done the same.
Be prepared for a long slow fight. They are entrenched. You know that they are wrong and that all they have done is flawed. You have to wait for time and the evidence to grow for their entire edifice to crash around their ears.