Monday, 27 April 2015

Quick to Block

I generally find those who are swift to block aren't worth talking to. In that I include James Delingpole, Guido Fawkes and Damian Thompson. They are the fingers-in-their-ears debaters who like their own voice and their own opinions much more than anybody else's.

If you read the book Emotional Vampires you will recognise them clearly. You will also know that they will never recognise this about themselves and that arguing with them is a waste of breath. They are always right, always perfect, they never make a mistake. I have worked around people like this, where you walk on eggshells so as not to say the wrong thing or do the wrong thing, and it was the most miserable experience of my life.

It is always disappointing interacting with those you admire.

I have to remind myself that I should not overly admire people as they often prove to be as flawed as everyone else. So I believe strongly in rights for sex workers and I was disappointed to learn that Tina Fey does a lot of anti-sex worker "comedy". Today I was being generous and giving her the benefit of the doubt and so I was thinking aloud, well she may just be delivering the material, she might not be the writer and if you have a show you can just be there to deliver the lines and not think too much about the content.

So anyway Dr Brooke Magnanti had made the allegation and there was a link to the Saturday Night Live routine. I had sent her a tweet suggesting that maybe Fey is just a performer. To which she responded strongly. I have seen interactions on Twitter before and I would say she often responds pretty strongly. Often this is with justification and my tweet certainly annoyed her. So I carefully wrote another saying it is not an excuse but implying that nobody is perfect and that if Fey was a writer as well she had no excuses at all.

So Brooke Magnanti's response was:

Dude. I get you want to argue this but fuck off, she makes jokes about dead bodies of people like me. Begone.

Along of course with a block. So I am very disappointed with Tina Fey. However I am also disappointed with Brooke Magnanti, and this is a bigger disappointment to me personally because she was the person I admired that I was alluding to in the title.

Now with more time to do my research and get an informed opinion rather than just living off tweets I know that Fey was the chief writer for SNL when they did the French Hooker sketches and that is just one of a long line of offences (

Stoya's article is great as usual. So Fey is bad and didn't deserve the benefit of the doubt. It is a sort of whatever, as I am not that big a fan. I just wanted to be sure she was not being done an injustice, as I have jumped on too many bandwagons on Twitter and I was being balanced and thoughtful today. I make plenty of mistakes, and sending Magnanti my other tweet was one. Fey is definitely a slut shamer and sex worker hater.

Friday, 10 April 2015

False Discovery Rate vs p-values

The problem with p-values is that they are just based on an aside made by Fisher, who said that any data more than 2 standard deviations away from the mean would be unusual. So the cult of the p-value was born.

If you look at Bayesian analysis of a rare disease (rare is less than 1 case in 10,000) where you have a test that is correct 99% of the time for true positives and also has a false positive rate of 1% (p-value) then you will still have a large number of cases where you identify the disease where it isn't actually present.

So for example in a population of 1,000,000 you may have 100 cases, of which your very good test finds 99 and only misses one. But it also finds 9999 cases that are not actually real. Your false cases vastly outnumber your real cases. So if you are diagnosed your probability of having the disease is 99/10098, or just under 1%, and that is with a p-value of 0.01!
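The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are my own labels for the numbers in the example, with 1 in 10,000 taken as the prevalence.

```python
def positive_predictive_value(population, prevalence, sensitivity, false_positive_rate):
    """Return (true positives, false positives, chance a positive result is real)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity          # real cases the test finds
    false_positives = healthy * false_positive_rate  # healthy people flagged anyway
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


tp, fp, ppv = positive_predictive_value(
    population=1_000_000,
    prevalence=1 / 10_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.2%}")
```

With these inputs the chance that a positive result is real works out at 99/10098, roughly 0.98%.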

The same thing happens if I do multiple tests. If I set a p-value of 0.05 which is quite typical there is a 1/20 chance of seeing a result when it is not there. So if I do 20 tests on average 1 of them will show significance. This is easily corrected by Bonferroni's method amongst others.
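A rough sketch of the multiple testing sums, assuming the 20 tests are independent (that assumption is mine, as are the function names):

```python
def expected_false_positives(n_tests, alpha):
    """Average number of 'significant' results when there is no real effect."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold under Bonferroni's correction."""
    return alpha / n_tests


n_tests, alpha = 20, 0.05
print(expected_false_positives(n_tests, alpha))
print(family_wise_error_rate(n_tests, alpha))
corrected = bonferroni_threshold(n_tests, alpha)
print(corrected)
print(family_wise_error_rate(n_tests, corrected))
```

With 20 tests at 0.05 the chance of at least one false positive is about 64%. Bonferroni's method drops the per-test threshold to 0.0025, which brings that chance back under 5%.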

In genetic analysis you get complex problems with tens of thousands of variables that are all tested simultaneously. You also have massively under-powered studies because your sample numbers are small and so, sample size << number of variables. So you will always be doing many more tests than are justified by the resulting data and you will almost always be over-fitting the model.

This is why the false discovery rate is so important in genetic analysis, but really when you look deeply at this it will still mean that, even with careful use of FDR, most genetic results from big data experiments will turn out to be wrong.
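For anyone who has not met FDR control before, here is a minimal sketch of the Benjamini-Hochberg procedure, which is one common way of doing it. I am assuming that method for illustration, and the p-values below are made up rather than taken from any real experiment.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value is <= (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values, purely illustrative
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = sum(p <= 0.05 for p in p_values)
bonferroni = sum(p <= 0.05 / len(p_values) for p in p_values)
bh = sum(benjamini_hochberg(p_values, q=0.05))
print(naive, bonferroni, bh)
```

On these toy numbers the uncorrected threshold calls 5 results significant, Bonferroni keeps 1 and Benjamini-Hochberg keeps 2, so FDR sits between the two extremes.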

Here are the professionals talking about false discovery rates.
Selective Inference and False Discovery Rate I
Selective Inference and False Discovery Rate II
Estimating Local False Discovery Rate in Differential Expression
Interpreting p and q values in Genetic Analysis

Friday, 20 March 2015

Science free Science

When I started protein modelling we studied one protein at a time and tried to work out in detail how it functioned and what it did and each lab was a world leader in that protein. Then along came high-throughput and structural genomics.

In some ways this was good, as we had missed all the connections, but it also meant that a lot of specialist expertise was lost and detailed studies that did not have the high-throughput angle were not funded. Losing that expertise was bad. We lost expertise on transcription factors, and individual protein family databases disappeared to be replaced by the EBI and NCBI monoliths.

Then once these early big data adopters, bioinformaticians as we called them then, found that they were still missing the connections that high-throughput was supposed to give them, the new buzz became systems biology. Really this was just dusting off the work of Bertalanffy and others who had tried before but were data poor and so couldn't find any solutions. I was at a meeting about Systems Biology at Exeter and there was one glum group: the biological phenotype researchers. They had been doing systems biology all the time and now they were going to be bulldozed by the bioinformatics revolution as the big data people with no local expertise demolished their field and took all the funding. The perfect analogy is the local stall (Mom and Pop store for US readers) when Walmart arrives. So another field got concreted over by the bioinformatics juggernaut. The real sense of systems biology was lost and it failed because what was needed was a paradigm shift in the way of thinking, and big data is rooted in reductionism and computability.

Now they want to get big data from health services because despite all of this analysis and all the work of the last 2 decades we still have no idea what we are doing and we haven't cured cancer. So now the bioinformaticians are moving into health-care data. Now they are going to concrete over the public health specialists and epidemiologists as all the big money is going to be directed into these monolithic projects that will yet again fail to find a cure for cancer or understand how living things work.

So I feel sad for the loss of all those disciplines and all those experts and I have to include myself among the bioinformaticians. At least I can say I was never a very pushy or successful one. I admire a lot of the people driving the steam-roller. They are good and honest scientists most of them (there are always exceptions more interested in themselves than science). But I wish they would just stop the Big Data juggernaut for a quiet coffee break and have a think about what we can do not to wipe these fields out, but to learn from them. The real article that said we had all gone mad was Chris Anderson's The End of Theory in Wired. Now I also like Peter Norvig and his work and ideas, but in his quotes it goes too far. Big data is good and important and I am glad we live in the Google world where I no longer have to memorise endless tedious facts. I am glad we have machines that can deal with massive amounts of data, using whatever your favourite result finding algorithm is, be it SVMs, neural nets, solitons or whatever else gets you excited.

But a former student of mine, who was doing a post-doc at a very prestigious US University after completing a PhD at Cambridge, asked one of the leading bioinformaticians (he has many hundred publications) about the biological meaning of the patterns found, about how the data had been collected and what the limitations were, and he shrugged his shoulders and said he had no idea. That is what we are missing and that is why Big Data fails, because it has no context and no big picture. If I show you a picture of Madonna from the 1990s Erotica Tour and ask you to say what you see, I will say Madonna in her silly pointy bra outfit. Another generation will not recognise her: some might say Lady Gaga, some might say a singer, some might say a woman in underwear. Knowledge out of context does not work and Big Data is leading us to Science free Science. It will give us answers like 42 is the answer to Life, the Universe and Everything, but we won't know why.

So let's stop concreting over the epidemiologists, the systemists, the experts on a single protein, the public health specialists and let's start listening to them rather than the white noise of high throughput data.

Monday, 2 March 2015

Amazon Vine and helpful votes.

I am a reviewer for Amazon Vine and I also review outside the Vine program. Since joining Vine I have seen my helpful review percentage fall and I have dropped out of the Top 1000 reviewers. So I thought I would check if there is a statistical difference between the helpful review percentage for Vine items and non-Vine items. So I did a simple chi-squared test.

The Chi-square statistic, P value and statement of significance appear beneath the table. Each cell shows the observed count, the expected count in round brackets and the cell's contribution to the chi-square statistic in square brackets.

                        positive               negative               Marginal Row Totals
vine                    176 (195.58) [1.96]    52 (32.42) [11.82]     228
non-vine                777 (757.42) [0.51]    106 (125.58) [3.05]    883
Marginal Column Totals  953                    158                    1111 (Grand Total)

The Chi-square statistic is 17.3343. The P value is 3.1E-05. This result is significant at p < 0.05.
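For anyone wanting to check the sums, here is a small sketch of the same test in plain Python. It assumes no continuity correction, which is what matches the statistic quoted above, and the function name is my own.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: vine, non-vine. Columns: positive (helpful), negative (unhelpful) votes.
votes = [[176, 52], [777, 106]]
chi2, p_value = chi_square_2x2(votes)
print(f"chi-square = {chi2:.2f}, p = {p_value:.1e}")
```

The erfc shortcut only works for a 2x2 table, where there is a single degree of freedom. Bigger tables need the general chi-square distribution.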

That is a very significant difference. There is quite a lot of bias, because I have many more non-Vine reviews that have been there for much longer than the Vine reviews and my most helpful review is a non-Vine review. But this does still support what a lot of Vine members say, that some people systematically go around clicking on the unhelpful button for any Vine review.

Friday, 20 February 2015

Why I will never trust Science again.

On the 7th of December I submitted a paper about the spread of H5N8 bird flu via bird migration to Science. A pdf version of the file can be found here. Then I waited and went off for my Christmas vacation. That is why I did not see the final reply from Science until the beginning of January.


Biomedical Science
University of Westminster
Westminster None W1W 6UW

Dear Dr. Dalby

Manuscript number: aaa3940

Thank you for submitting your manuscript "The European and Japanese outbreaks of H5N* derive from a single source population that has been dispersed along the long distance bird migratory flyways. " to Science. Because your manuscript was not given a high priority rating during the initial screening process, we have decided not to proceed to in-depth review. The overall view is that the scope and focus of your paper make it more appropriate for a more specialized journal. We are therefore notifying you so that you can seek publication elsewhere.

We now receive many more interesting papers than we can publish. We therefore send for in-depth review only those papers most likely to be ultimately published in Science. Papers are selected on the basis of discipline, novelty, and general significance, in addition to the usual criteria for publication in specialized journals. Therefore, our decision is not necessarily a reflection of the quality of your research but rather of our stringent space limitations.


Caroline Ash, Ph.D.
Senior Editor

That was fine but the timing was a bit unfortunate and so delayed the paper being sent out to another more specific journal. I was happy with the paper but it was borderline in significance and Science has a lot more important manuscripts to publish.

So I sent it to Emerging Infectious Diseases on the 9th of January in a modified form, with some typos removed and a switch of emphasis to the epidemiology as that is what they need. The new manuscript for EID I sent is here. Again the paper was rejected, on the 30th of January, because it does not really fit with EID, which wants manuscripts about diseases that affect human health, and in this case it looks like it will only be an avian disease.

I was reading the Science weekly e-mail and saw that they were going to publish a paper on the spread of H5N8 by migratory birds in their Insights column. It is available here. This was even reported by the BBC. So it seems it was a more important story than I had thought.

Reading this I was rather angry that this was published and that my paper had been rejected as it comes to the same conclusions and so I wrote a short and quite angry e-mail to the editor of Science complaining about ethics and precedence. This was the reply which is in the name of Caroline Ash.

Dear Dr Dalby
Thank you for your message. I understand your concern that we should publish an item on the same topic as yours shortly after having  rejected your report. However, I should clarify. The Verhagen piece is published in the Insights section of Science and is therefore intended as commentary without data. Your paper was submitted as a formal research report with data that would normally be subject to peer review.
We receive many excellent papers, but we are limited in the number and subject areas we can pursue in each section of the journal and find ourselves rejecting the majority. Although we decided against in-depth review of your paper we enjoyed reading it and unless you have submitted elsewhere would encourage you to try our new journal Science Advances:
I hope this information is of some help and I am sorry your experience at Science was disappointing.
Kind regards
Caroline Ash
Caroline Ash
Senior Editor, Science;
ASI Science International, 82-88 Hills Road, Cambridge, UK, CB2 1LQ
+44 1223 326500;

So therefore anything we read in the Insights section of Science should be taken with a pinch of salt because it does not contain data and it is only a commentary. Anyway I went through the Science paper with a fine-toothed comb and found a few errors that raise concerns. First there is a different lineage in circulation in North America, and more seriously the reference cited to support the figure, and in fact the main conclusion about migration, was wrong. So I submitted a letter to Science pointing out these faults.

So I expected them to take this seriously as the error in the reference fundamentally undermines the paper and there is no alternative reference that collects the data that supports the figure and conclusion other than my own paper which is in PeerJ preprints. So I was fairly astounded by their reply.
Dear Dr. Dalby,

Manuscript number: aaa8769

Thank you for sending a Letter to Science. We have read your contribution but will not be able to publish it.  We invite you to leave an online comment instead.  To leave a comment, go and find the published paper to which your comment refers.  Then click Leave a Comment to submit.  Online comments should be no more than 300 words.  Excerpts from comments are occasionally published in the print Letters section of Science.

Note that we will post a correction to the reference you mention.

Please do not reply to this email, as it will not be read by Science. Unfortunately, the volume of submissions precludes specific discussions about individual submitted letters.


Jennifer Sills

I had asked for Caroline Ash to act as Editor on the letter submission as she had been the one who responded to my earlier e-mail and had been the signatory on the original rejection. I would say that my experience with Science goes beyond disappointing.  I would say that at this moment I am extremely angry with their behaviour. I would encourage anyone reading this to only support open access publication and transparency in peer review.

Sunday, 18 January 2015

Books I have read

When Bacon wrote his dictum about books there were so few that you could not be overpowered by their number. Now more than ever we are overwhelmed by writing and his dictum has become much more significant.

Now we have to distinguish the mundane from the profound, we have to distinguish the books that have an impact on the reader from the general background noise.  Not only are some books more significant, but they also take on a life of their own, moulding the experiences and beliefs of the reader.  Each book has its own time, it has order and it has age.

I keep a list of all of the books that I have read so that I can try and unpick the influences that they have on me. Now I have realised that just knowing what I have read is not enough. I need to know when I read it and in what order. For the last three years I have kept an ordered list, and pushed my reading to 50 books a year.

For example I read Brave New World in my 30s and for me it was a profound book because it struck a chord with my age, my experiences and the world in which I lived, but I doubt that the experiences of anyone else would put it into the same context. Reflecting on it, I think it is a book that is more likely to resonate with older readers with a wider range of experiences and I think that I would have appreciated it less if I had read it as a teenager. The same is true of Borges' Labyrinths. Now for me it is an amazing book, but I do not think I would have grasped its many different layers and themes if I had read it in my teens or 20s. Reading it later I find that it has so many hidden ideas that make it a greater work of thought than many works of philosophy.