I began in bioinformatics when secondary structure prediction was considered an interesting problem. Now it is considered "solved".
All of the early methods used sliding windows (from Scheraga) and statistical propensities of the amino acids (Chou and Fasman, GOR, etc.). The big step forward was realising that predicting from alignments, which account for positional variation, works better than predicting from a single sequence. With the massive growth of the databases this has only got better.
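To make the windowed propensity idea concrete, here is a minimal sketch in the Chou-Fasman spirit: average a per-residue helix propensity over a window and call the centre residue helical if the average clears a threshold. The propensity values, window size, and threshold are illustrative choices of mine, not the published parameters.

```python
# Windowed propensity predictor in the Chou-Fasman spirit.
# Propensity values are illustrative, not the published table.
HELIX_PROPENSITY = {
    "A": 1.42, "E": 1.51, "L": 1.21, "M": 1.45,  # helix formers
    "G": 0.57, "P": 0.57, "N": 0.67, "Y": 0.69,  # helix breakers
}

def predict_helix(seq, window=7, threshold=1.0):
    """Call a residue helical if the mean propensity over a window
    centred on it exceeds the threshold."""
    half = window // 2
    calls = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        scores = [HELIX_PROPENSITY.get(aa, 1.0) for aa in seq[lo:hi]]
        calls.append("H" if sum(scores) / len(scores) > threshold else "-")
    return "".join(calls)

print(predict_helix("MAELLEAGPNAGYM"))
```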
Neural networks also supposedly improved predictions, up to the point where we could do no better. The problem is: are the NNs actually detecting any patterns that the straight linear statistics were not? They should help find amphipathic helices, which the window methods struggle with because the signal is periodic, but they cannot deal with beta sheets, whose contacts are non-local and lie outside the window. I had a student who constructed a neural network for prediction without any hidden layers and got the same prediction accuracy, which suggests that the neural networks are contributing nothing to the predictions.
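The student's control is easy to restate as code. Below is a minimal sketch, assuming scikit-learn: the same one-hot encoded sequence window is fed to a purely linear model (no hidden layer) and to a network with one hidden layer. The random arrays are placeholders for real windowed training data (e.g. windows labelled with DSSP states). If the two cross-validated accuracies match on real data, the hidden layer is adding nothing beyond the linear statistics.

```python
# Control experiment: linear model vs. one hidden layer on the same
# one-hot encoded sequence windows. Arrays below are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
WINDOW, N_AA = 13, 20
X = rng.integers(0, 2, size=(1000, WINDOW * N_AA)).astype(float)
y = rng.integers(0, 3, size=1000)  # stand-in for H/E/C labels

models = {
    "no hidden layer": LogisticRegression(max_iter=1000),
    "one hidden layer": MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=3).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```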
We know that the codon distribution is optimised so that mutations have a minimal effect on the resulting proteins (Baldi and Brunak; Andreas Wagner).
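A quick way to see the flavour of this claim is to count, for each codon, how many of its nine single-nucleotide mutants still encode the same amino acid. This is only a toy check of my own: the cited analyses weigh mutants by physico-chemical similarity, not just synonymy.

```python
# For each codon, count the fraction of its nine single-nucleotide
# mutants that still encode the same amino acid (standard code).
bases = "TCAG"
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
code = {b1 + b2 + b3: aas[i]
        for i, (b1, b2, b3) in enumerate(
            (a, b, c) for a in bases for b in bases for c in bases)}

def synonymous_fraction(codon):
    """Fraction of single-nucleotide mutants encoding the same residue."""
    hits = total = 0
    for pos in range(3):
        for b in bases:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            total += 1
            hits += code[mutant] == code[codon]
    return hits / total

avg = sum(synonymous_fraction(c) for c in code) / len(code)
print(f"mean synonymous fraction over all codons: {avg:.2f}")
```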
We need to repeat the student's experiment to check whether the NNs actually make any difference. We also need to change the amino acid coding scheme to give realistic distances between amino acids rather than Hamming distances.
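One obvious replacement coding is to use the rows of a substitution matrix as the input vectors, so that similar amino acids sit close together in input space. A sketch, assuming Biopython for the BLOSUM62 matrix: under one-hot coding every distinct pair of residues is equidistant, while under the BLOSUM coding Leu sits much closer to Ile than to Asp.

```python
# One-hot (Hamming) coding vs. substitution-matrix coding.
# Assumes Biopython is installed for the BLOSUM62 matrix.
import numpy as np
from Bio.Align import substitution_matrices

blosum = substitution_matrices.load("BLOSUM62")
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def encode(aa, scheme="blosum"):
    """One-hot gives every distinct pair the same distance; the
    BLOSUM row places similar residues close in input space."""
    if scheme == "onehot":
        v = np.zeros(len(ALPHABET))
        v[ALPHABET.index(aa)] = 1.0
        return v
    return np.array([blosum[aa, b] for b in ALPHABET], dtype=float)

for scheme in ("onehot", "blosum"):
    d_LI = np.linalg.norm(encode("L", scheme) - encode("I", scheme))
    d_LD = np.linalg.norm(encode("L", scheme) - encode("D", scheme))
    print(f"{scheme}: |L-I| = {d_LI:.1f}, |L-D| = {d_LD:.1f}")
```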