Submission note: A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy to the Department of Computer Science and Computer Engineering, School of Engineering and Mathematical Sciences, Faculty of Science, Technology and Engineering, La Trobe University, Bundoora.
Recent years have seen novel high-throughput technologies for measuring protein-protein interactions (PPIs), which have produced large-scale interaction data for humans and most other species. These data are commonly represented as networks, with nodes representing proteins and edges representing the directed PPIs. A fundamental challenge for bioinformatics is to interpret this wealth of data in order to elucidate the interaction patterns and biological characteristics of the proteins. Current PPI algorithms are based mainly on the topological structure of the PPI network: similarity or distance between proteins is measured from their interaction patterns rather than from their functional semantics. In reality, the interactions among proteins should be weighted according to their functional semantics. Establishing protein similarity or distance measures from functional semantics is therefore the key to discovering the biological patterns and characteristics of proteins more precisely and reasonably. A protein similarity measure is introduced in chapter 1 and developed to completion in the later chapters.

Another problem is that the functional semantics (i.e. functional annotations) of some proteins, and in some cases many proteins, in a PPI network are currently unknown. A model is therefore needed to describe how protein functions propagate within the network structure. To address this problem, we developed five algorithms that semantically predict unknown protein functions and that have been shown to achieve higher efficiency on real datasets.

PPI networks are currently derived from several sources, and numerous databases store PPI data. PPI data have been observed to contain a large amount of negative information (PPIs that may not actually occur), which might distort the final analysis results or conclusions. How to validate the reliability of PPI data, or how to filter negative information out of existing datasets, remains an open problem in this field. It is nevertheless necessary to manage the data by some means, such as integrating multiple datasets or filtering a single dataset. Every chapter of this thesis therefore describes in detail the data handling on which our experiments are based.

Finally, a review of computational developments in miRNA regulation is presented. Studies related to tumorigenesis are being heavily explored at the current stage. In future work, we will focus mainly on basic studies of the correlation between miRNAs and general PPI networks, and on identifying miRNA-regulated signal transduction pathways related to tumorigenesis, in order to explore the internal relationships between miRNA-targeted proteins during their interactions. This work is expected to contribute to cancer treatment and drug discovery.
Center or Department
Faculty of Science, Technology and Engineering. School of Engineering and Mathematical Sciences.
Thesis type
Ph.D.
Awarding institution
La Trobe University
Year Awarded
2013
Rights Statement
The thesis author retains all proprietary rights (such as copyright and patent rights) over the content of this thesis, and has granted La Trobe University permission to reproduce and communicate this version of the thesis. The author has declared that any third party copyright material contained within the thesis made available here is reproduced and communicated with permission. If you believe that any material has been made available without permission of the copyright owner please contact us with the details.