Machine Learning Based Bioinformatics as a Tool for Big-Data Analytics on Molecular Biology Datasets
Dr. Seferina Mavroudi, Lecturer,
Department of Social Work,
Technological Institute of Western Greece,
Megalou Alexandrou 1, 263 34, Koukouli Patra
Deciphering the underlying biological mechanisms that lead to disease could pave the way for personalized medicine, hopefully leading to early prevention of disease and to drugs with minimal side effects. Fulfilling this promise, however, is very demanding, since biology is complex, with thousands of key players interacting with each other in systems at various scales. In light of the curse of dimensionality, only the advent of big data in modern molecular biology provides the ground for building meaningful models that could formulate novel hypotheses. Moreover, extracting valuable biological knowledge in such environments is usually not feasible with simple statistical methods, and sophisticated machine learning paradigms have to be employed.
In the present talk we will briefly introduce the systems biology perspective, according to which all essential biological entities, from genes, proteins and metabolites to cells and organs, form “a network of networks”. We will mention the genomic, proteomic and other heterogeneous medical data sources that produce big data, and we will ultimately elaborate on the analysis of these kinds of data with modern machine learning techniques. The challenges, pitfalls and perspectives of the analysis will be discussed.
Specific case studies concerning proteomic and transcriptomic data analysis aiming at biomarker discovery will be presented. The first case study is related to big data proteomics analysis, specifically the analysis of TMT-based mass spectrometry datasets, which is not only a big data problem but also involves complex analysis steps. Due to the huge volume of data to be processed, standard approaches and serial implementations fail to deliver high-quality biomarkers while being extremely time-consuming. For this task, machine learning methods, and more specifically meta-heuristic methods, were deployed in combination with high-performance parallel computing techniques to provide biomarkers of increased predictive accuracy within feasible and realistic time requirements.
The second case study concerns big data analytics on transcriptomics data related to the diagnosis of early-stage Parkinson disease. Specifically, a unique network medicine pipeline has been used to combine multiple gene expression datasets created from both microarray and RNA-sequencing experiments. The proposed methodology not only uncovered significantly fewer biomarkers than the standard approach but also yielded a set of biomarkers that show higher predictive performance and are highly relevant to the underlying mechanisms of Parkinson disease. Cloud computing technology has been used to ease the application of the proposed pipeline to multiple datasets.
Dr. Seferina Mavroudi is a lecturer in the Department of Social Work of the TEI of Western Greece and worked as an adjunct lecturer (407/80) in the Department of Computer Engineering and Informatics of the University of Patras, Greece. Her research interests include computational intelligence, bioinformatics, and scientific computing.
She graduated in 1998 from the Department of Electrical and Computer Engineering, School of Engineering of the Aristotle University of Thessaloniki. In 2000 she received a Master’s degree from the European Postgraduate Program on Biomedical Engineering, organized by the Faculty of Medicine of the University of Patras, the Faculty of Mechanical Engineering and the Faculty of Electrical and Computer Engineering of the National Technical University of Athens, in collaboration with more than 20 European universities. In the same program, in February 2003, she completed her Ph.D. thesis, titled “Development of advanced computational intelligence models for complex bioinformatics – and biosignal processing applications”. During her Ph.D. studies she visited the Bioinformatics Center of the University of Pennsylvania as a visiting researcher. She has over 40 publications in international scientific journals, proceedings of international conferences and book chapters.