Ardi Tampuu, PhD thesis "Neural networks for analyzing biological data"

Video production: Merlin Pastak, 13.10.2020, Computer Science


Supervisor:
Prof. Raul Vicente, Institute of Computer Science, University of Tartu
 
Opponents:
Dr. Oliver Stegle, European Molecular Biology Laboratory (Germany)
Prof. Aušra Saudargienė, Lithuanian University of Health Sciences (Lithuania)

Artificial neural networks (ANNs) are a class of machine learning algorithms that has gained popularity in recent years. Different subtypes of ANNs are used in various fields of computer science. For example, convolutional networks are useful in object and face recognition systems, whereas recurrent neural networks are effective in speech recognition and natural language processing. These are not the only possible applications of neural networks, however: in this thesis we demonstrate the benefits of ANNs in analyzing two biological datasets.

First, we investigated whether it is possible to predict, based only on the information contained within a DNA snippet, whether the snippet originates from a viral genome. In two publications we demonstrated that machine learning algorithms can make this prediction, with convolutional neural networks (CNNs) proving the most accurate. The resulting recommendation system helps virologists identify as yet unknown viral species, which may have important effects on human health.

The second biological dataset comes from neuroscience. The mammalian hippocampus contains so-called place cells, which activate only when the animal is in a specific location in space. We showed that recurrent neural networks (RNNs) can predict the animal's location with ~10 cm precision from the activity of only a few dozen place cells. RNNs proved more effective than the most commonly used Bayesian methods. These networks use past neuronal activity as context that helps fine-tune the location predictions. In many other neural datasets, too, prior brain activity may carry important information about the current behaviour. Hence, RNNs might turn out to be very useful in making sense of brain signals. Similarly, CNNs are likely to prove more efficient than the currently used methods on many other bioinformatics datasets. We hope this thesis encourages more scientists to try neural networks on their own datasets.
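To give a rough idea of how a convolutional network can read a DNA snippet, the sketch below one-hot encodes a sequence and slides a single filter along it, followed by global max pooling. It is an illustration under stated assumptions, not the architecture or data used in the thesis: the function names, the example snippet, and the hand-made "TATA" filter are all hypothetical, and a real CNN would learn many such filters from labelled viral and non-viral sequences.

```python
# Illustrative sketch only: not the model, filters, or data from the thesis.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Symbols outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, without padding."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


def max_pool(scores):
    """Global max pooling: keep the strongest filter response in the snippet."""
    return max(scores)


# Hypothetical hand-made filter that responds to the motif "TATA".
# In a trained CNN these weights would be learned, not written by hand.
motif_filter = one_hot("TATA")

snippet = "GGCTATAAGC"  # made-up example sequence
scores = conv1d(one_hot(snippet), motif_filter)
strongest = max_pool(scores)
print(scores, strongest)
```

Each score counts how many positions of the window match the filter, so the pooled value is highest when the motif occurs somewhere in the snippet. A classifier would combine many such pooled responses to estimate whether the snippet looks viral.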