Tauno Metsalu PhD defence - Statistical analysis of multivariate data in bioinformatics

Clip produced by: Hendrik Türk, 04.03.2016, Computer Science


Proteins are among the most important building blocks of an organism. By investigating the abundances of different proteins and the relationships between them, it is possible to obtain information about the current state of the organism. Modern technologies make it possible to collect large amounts of protein-related data in a short period of time. Analysing such data is complicated and has helped give rise to a new field of science, bioinformatics.

The aim of the dissertation is to describe problems and solutions related to the statistical analysis of multivariate data. It is shown how this type of data can be represented as a matrix. An overview of data sources and analysis methods is given, together with examples of how they can be used in practice.

The pan-European project PREDECT is described, in which many organizations are contributing to the development of better cancer models. An overview is given of collecting metadata from multiple partners and of the web tools created for initial data analysis. An analysis of a novel breast cancer model is described, and tissue slices cultivated under different conditions are compared. A freely available web tool that allows users to perform exploratory data analysis is introduced.

The following chapters describe data analysis in various projects. Multiple novel genes with allele-specific expression were found in the human placenta. The molecular mechanisms of atopic dermatitis were examined, more specifically the influence of the protein interferon-gamma. MicroRNAs were identified that can be used as markers for endometriosis, and a classifier was built to distinguish people with endometriosis from healthy people.
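The abstract mentions representing multivariate data as a matrix and exploring it with a web tool. As a rough illustration only (not the dissertation's own code or tool), the sketch below stores hypothetical protein abundances as a samples × variables matrix and applies principal component analysis (PCA), one common exploratory method. The function name `pca` and the toy data are assumptions made up for this example.

```python
import numpy as np


def pca(data: np.ndarray, n_components: int = 2):
    """Principal component analysis via SVD of the column-centred matrix.

    data: samples in rows, variables (e.g. protein abundances) in columns.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    # Centre each variable (column) so that PCA describes variation around the mean.
    centred = data - data.mean(axis=0)
    # Economy-size SVD: centred = U @ diag(s) @ Vt
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Sample coordinates on the principal components.
    scores = u[:, :n_components] * s[:n_components]
    # Squared singular values are proportional to the variance along each component.
    variance = s ** 2
    ratio = variance / variance.sum()
    return scores, ratio[:n_components]


# Hypothetical toy matrix: 6 samples x 4 proteins (made-up numbers, not thesis data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[5, 5, 1, 1], scale=0.3, size=(3, 4))
group_b = rng.normal(loc=[1, 1, 5, 5], scale=0.3, size=(3, 4))
matrix = np.vstack([group_a, group_b])

scores, ratio = pca(matrix)
print(np.round(scores, 2))
print(np.round(ratio, 3))
```

Because the two simulated groups differ strongly in the same four variables, the first principal component is expected to separate them, which is the kind of pattern an exploratory plot of PCA scores is meant to reveal.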