Saturday, 29 April 2017

Let's go back to the beginning.

I need to tell a long story and so I need to go back to the beginning. Part of this story I have already told, but not very well, and so this is an attempt to put everything into context.

I did a degree in Chemistry and Law at the University of Exeter. When it came to choosing what to do next I wanted to stay in research. I had met some lawyers at the recruitment fairs and they had convinced me that I did not want to be a lawyer. My drive was wanting to change the world; their motivations were purely financial. I was not going to be Perry Mason and I was not going to write environmental protection legislation, and so I turned back to science, which had been my dream since my teens.

What caught my eye was protein molecular modelling. I had not done much biological chemistry or biochemistry as an undergraduate but the beauty of the computer models captivated me. It was like the best video game I had ever seen (at that time they used the best graphics computers you could buy and they cost tens of thousands). I had applied for PhDs elsewhere in physical chemistry, including with P.W. Atkins (his response was that he didn't supervise students, so why was he in the graduate prospectus?). But nothing compared to those ribbon images of proteins.

I received the Norman Rydon scholarship from the Chemistry Department at Exeter. This allowed me to pick my supervisor and the money would come with me. It was an incredible stroke of luck and so I got to follow my dream and study molecular modelling of proteins. My PhD was in homology modelling of FBP aldolase and also included using molecular dynamics to study the conformations of peptide inhibitors. Unfortunately, about the time I finished, Swiss-Model appeared and what had taken me 3 years to do could now be done in 10 minutes on a server ... That was the end of homology modelling research for me. What I also learned was that modellers depend on the quality of the data they are given. The FBP structure that I used had some limitations and so my models shared those limitations. So I went back a step to become a protein crystallographer.

While I was doing my PhD and protein crystallography post-doc I was the general computational biologist or bioinformatician on call for the research group. This was the mid-1990s and so bioinformatics did not really exist as a subject. I did sequence alignment, BLAST searches and phylogenetic analysis for projects where we tried to understand the evolution of protein structure and function.

What I realised from building these alignments and trees was that if your data is very partial and contains mostly sequences from related species and only a few sequences from more distant species, then this will bias the tree towards your data and possibly away from a better representation of reality. The problem is how do you select sequences to include and exclude? An even bigger issue is the irregularity of the sampling across the "tree of life" (we did not call it that then either; we just said across the kingdoms). We worked with the recently discovered Archaea and they are dramatically different to the bacteria and the eukaryotes, and putting the Archaeal sequences into trees was difficult.

From alignments I also learned that making secondary structure predictions on all the sequences in the alignment is better than just making predictions on a single instance. They should all have the same structure and so this sequence level variation should disappear in predictions. This turned out to be a major discovery (made by someone else) and that ended investigation into secondary structure prediction (OK, I have missed out neural nets etc. but I have serious doubts that they contribute anything more than the use of multiple alignment and using GOR or even Chou and Fasman).

I continued as a post-doc in protein crystallography but also dabbled in bioinformatics until, in 1999, Exeter set up an MSc in Bioinformatics. As one of the local experts I helped to set up the course and I taught the sequence and structure modules. I was made a lecturer in 2000 and I remained at Exeter for five years until I got caught up in the departmental politics of the closing of the Chemistry Department (I was a lecturer in Biological Sciences and Engineering and Computer Science at the time). Exeter made me redundant but the atmosphere had soured for me there anyway, because they disapproved of me trying to have a work-life balance and putting my wife and newly born children first. They also did not like my involvement in politics. I was a city councillor in Exeter for five years.

One of my students asked whether I had seen the advert for a bioinformatics lecturer at Oxford. I hadn't, but the closing date hadn't passed and so I applied. My curriculum vitae was okay. I was much better at teaching than research. Setting myself up as an independent researcher had also been made difficult by having to separate my research from that of my PhD and post-doctoral supervisor. Luckily I had the support of Dr Ron Yang and we had worked together on some projects. He did the computing (most of it) and I gave the biological input (a bit at the end to make sure that it actually worked). This meant that I had my four publications and that is what matters in UK academia.

I went to the interview at Oxford. I thought it went well. The head of the course was young, bright and very unexpected. Dr (now Prof.) Charlotte Dean would go on to be head of the department of statistics. What was unexpected was how relaxed and casual she was. She was not the serious, unapproachable Oxford don. They offered me the job on the same day and I started a few months later. We moved from Devon to Oxfordshire. I led the teaching in the modules Charlotte was not leading. In Oxford the Bioinformatics MSc was in the Department of Statistics and I became a Departmental Lecturer in Statistics. This also meant I taught statistics, Perl programming and the biology courses. I had gone from being a lecturer in Biological Sciences and Engineering and Computer Science to a lecturer in Statistics. I had degrees in none of these subjects but I am 100% a computational biologist. This is the curse of being interdisciplinary. I did start to think about systems biology at Exeter. We had a meeting there where I met Kitano, and his work was a major influence on my thinking.

This is when I became an accidental statistician and I am glad that it happened, because apart from the stunning molecular images I found that data is what fascinates me. I just can never get enough data. I think that if I had been introduced to statistics earlier on then I might have been a statistician from the start, but at school it is never taught well and people fear statistics. At Oxford I learnt to love it. When the lecturer who taught statistical data mining left I took over his module. Now I was teaching master's-level statistics to people who had degrees in maths, some of them from Oxford. It was an amazing experience, although I have to admit to spending the entire summer reading the textbooks from cover to cover (thanks to Hastie, Tibshirani and Friedman and also to Brian Ripley).

Charlotte went on to other things and I became acting head of the MSc, teaching more and more. My interests now were systems biology and trying to put together data from different experiments and perspectives. What really troubled me, and what still sits in the back of my mind, is entropy and how it works in living systems. I was a bookworm for systems biology.