The clades in the H5 tree are so intermingled that it is often impossible to assign a clade to a single subtype: subtypes mix extensively throughout the complete H5 tree. Constructing a tree from a single H5-containing subtype, such as H5N1 or H5N8, will therefore introduce sampling bias.
An example is presented below.
This clade contains H5N2, H5N1, H5N8, H5N5, H5N9 and mixed H5-containing subtypes. The H5N8 and H5N9 sequences are scattered, seemingly at random, within the tree. The likelihood ratios for the nodes/splits are high, which indicates that this is a reliable reconstruction of the evolutionary tree and that the seemingly random mixing of subtypes is not an artefact of tree construction.
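One way to put a number on this kind of mixing is to tally the subtypes among the tip labels of a clade and compute the Shannon entropy of that tally. The sketch below is only an illustration: it assumes the subtype is written in parentheses at the end of each strain name, and the labels in `clade_tips` are invented placeholders, not sequences from the tree discussed here.

```python
import math
import re
from collections import Counter

# Assumption: each tip label ends with its subtype in parentheses, e.g. "...(H5N2)".
SUBTYPE_RE = re.compile(r"\((H\d+N\d+|mixed)\)\s*$", re.IGNORECASE)


def subtype_of(label):
    """Return the subtype parsed from a tip label, or 'UNKNOWN' if none is found."""
    match = SUBTYPE_RE.search(label)
    return match.group(1).upper() if match else "UNKNOWN"


def subtype_composition(labels):
    """Count how many tips of each subtype a clade contains."""
    return Counter(subtype_of(label) for label in labels)


def shannon_entropy(counts):
    """Entropy in bits: 0 for a single-subtype clade, higher for more even mixing."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n)


# Hypothetical tip labels, made up purely for illustration.
clade_tips = [
    "A/duck/Guatemala/0001/2010(H5N2)",
    "A/mallard/Alaska/0002/2006(H5N2)",
    "A/teal/BritishColumbia/0003/2014(H5N1)",
    "A/goose/Alaska/0004/2012(H5N8)",
    "A/duck/Guatemala/0005/2008(H5N9)",
    "A/gull/Alaska/0006/2009(H5N5)",
]

composition = subtype_composition(clade_tips)
for subtype, n in sorted(composition.items()):
    print(f"{subtype}: {n}")
print(f"entropy: {shannon_entropy(composition):.2f} bits")
```

A clade made up of a single subtype scores 0 bits, so a clearly positive value on a well-supported clade is a quick flag that a single-subtype tree would miss part of it.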
This unambiguously shows that, to correctly estimate the evolutionary history of H5, you need to sample across all subtypes, and that even geographical or chronological selection criteria will not produce an unbiased sample. In this case all of the sequences are from the Americas, but they range from Guatemala to Alaska and British Columbia and were collected between 2005 and 2014, which is a very wide spread of locations and times.
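As a rough sketch of what sampling across all subtypes could look like in practice, the snippet below draws up to a fixed number of sequences from every subtype present in a dataset. The function name `stratified_sample`, the `(sequence_id, subtype)` record layout and the example identifiers are all hypothetical. It guarantees only that no subtype is left out; it does not by itself make the sample unbiased.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_subtype, seed=0):
    """Pick up to `per_subtype` sequence ids from every subtype.

    `records` is an iterable of (sequence_id, subtype) pairs (assumed layout).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_subtype = defaultdict(list)
    for seq_id, subtype in records:
        by_subtype[subtype].append(seq_id)

    chosen = []
    for subtype in sorted(by_subtype):
        ids = by_subtype[subtype]
        # Rare subtypes contribute everything they have.
        chosen.extend(rng.sample(ids, min(per_subtype, len(ids))))
    return chosen


# Hypothetical records, invented for illustration only.
records = [
    ("seq01", "H5N1"), ("seq02", "H5N1"), ("seq03", "H5N1"), ("seq04", "H5N1"),
    ("seq05", "H5N2"), ("seq06", "H5N2"), ("seq07", "H5N2"),
    ("seq08", "H5N8"),
    ("seq09", "H5N9"),
]

selection = stratified_sample(records, per_subtype=2)
print(selection)
```

Capping the count per subtype keeps heavily sequenced subtypes such as H5N1 from dominating the alignment while still retaining the rare ones.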
The conclusion from this result is that we can no longer accept that single-subtype trees of influenza represent an unbiased sample of lineages or of evolution, and that every published paper that selects sequences this way for phylogenetic analysis has to be questioned. If the analysis covers a short time span, such as a single influenza season, then the trees are likely to be unbiased because the sampling will come from a specific sub-clade, although such analyses have limited value for making inferences about viral evolution.
However, many previously generated trees, and a large part of the existing influenza literature, are likely to be flawed because of these sampling issues, and these papers urgently need to be revisited with a more complete analysis of the data.