La Trobe

Machine learning strategies for data analysis and applications in forecasting and diagnosis studies

thesis
posted on 2023-01-19, 09:37 authored by Hao-Fan Yang
Submission note: Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Department of Computer Science and Information Technology, School of Engineering and Mathematical Sciences, La Trobe University, Bundoora.

Rapid advances in the technology of sensing, collecting, and storing data have enabled users to obtain vast amounts of data. However, extracting useful information effectively and efficiently has proven extremely challenging, especially in scientific domains. As scientists and researchers gather data, they draw on techniques and theories from many disciplines, such as statistics, machine learning, and data mining, to extract knowledge and information from a wide array of data. Because this process is interdisciplinary, practitioners with expertise across several disciplines are in high demand for data analysis and applications, yet attracting enough such experts to the field remains a significant concern. Traditional data analysis techniques and tools cannot handle data of massive size, and data of a non-traditional nature cannot be analysed by traditional approaches even when the dataset is relatively small. Machine learning has therefore become a key research area of computer science, since it is expected to address these issues. The basic concept of machine learning is to simulate human learning behaviour on computational devices and to automate the building of analytic and forecasting models whose algorithms learn iteratively from data. Through this inductive learning, analysis and forecasting results can improve over time with little or no human intervention. This thesis applies machine learning approaches to two problems: traffic forecasting and lung cancer diagnosis. Its goals are to develop deep insights from datasets in a timely fashion, to extract hidden knowledge from the collected data with greater precision, and to reduce the demand for practitioners with specialist expertise.
Building on different methods of data analysis in traffic forecasting, three novel models are proposed: exponential smoothing and Levenberg-Marquardt neural networks (E-L-NN), stacked auto-encoders Levenberg-Marquardt (SAE-LM), and hybrid empirical mode decomposition and stacked auto-encoders (EMD-SAE). The proposed models are applied to real-world data obtained from many highways in the United Kingdom and evaluated against several current forecasting models. The experimental results demonstrate that the proposed models generate forecasts with high accuracy and efficiency. In particular, the SAE-LM model shows superior performance in traffic flow forecasting (an accuracy rate of about 90 percent) and is the most suitable approach for lumpy data.
For lung cancer diagnosis, several machine learning and data mining approaches were applied. Many attributes in the clinical data were obtained and classified using the decision tree method, and data association mining was employed to extract knowledge from the correlations between pathology reports and clinical information. The Apriori algorithm was used to extract association rules, and the significance of each generated rule was evaluated using support, confidence, and lift. The proposed framework was applied to real-world data collected by The Cancer Genome Atlas (TCGA). Many interesting rules were generated and evaluated, and the evaluation results demonstrate that the framework can provide insights that support the diagnosis of lung cancer pathologic staging.
This thesis contributes more accurate and effective models for traffic flow forecasting, which can help address traffic congestion and improve transportation mobility. It also shows how clinical information can be used effectively in place of pathology reports to further support lung cancer diagnosis.
Our exploration of the advantages of many machine learning approaches for data science analysis and applications will help raise awareness among scientific researchers who are unacquainted with the potential benefits of machine learning.
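
The abstract does not say which form of exponential smoothing the E-L-NN model uses. Purely as an illustration, the sketch below assumes simple (single) exponential smoothing, with s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1), applied to a traffic-flow series before it would be passed to a forecasting network. The function name `exponential_smoothing` and the parameter `alpha` are hypothetical and may not match the thesis.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing (an assumed variant, not necessarily the thesis's).

    s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1).
    A larger alpha tracks recent observations more closely; a smaller
    alpha produces a smoother series.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

For example, `exponential_smoothing([10, 20, 30], alpha=0.5)` returns `[10.0, 15.0, 22.5]`. In a hybrid model of this kind, the smoothed values (rather than the raw counts) would then form the network's inputs or targets, which is an assumption about how the two stages connect.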
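
The Levenberg-Marquardt algorithm named in E-L-NN and SAE-LM is a damped least-squares method. Each step solves (J^T J + lambda * I) delta = J^T r, where J is the Jacobian of the model output and r is the residual vector. After a step that lowers the error, lambda shrinks (moving towards Gauss-Newton); after a step that raises it, lambda grows (moving towards gradient descent). The thesis uses it to train neural networks. To keep the update rule visible, the minimal sketch below instead fits a hypothetical two-parameter curve y = a * exp(b * x). It uses Levenberg's plain lambda * I damping rather than Marquardt's scaling by diag(J^T J), and every name in it is illustrative.

```python
import math


def levenberg_marquardt(xs, ys, params, lam=1e-3, max_iter=200, tol=1e-12):
    """Fit y = a * exp(b * x) to (xs, ys) by Levenberg-Marquardt; returns (a, b).

    Illustrative sketch: a toy curve fit, not neural-network training.
    """
    a, b = params

    def cost(a, b):
        # Sum of squared residuals r_i = y_i - f(x_i)
        return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

    current = cost(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (2x2) and J^T r, where J holds df/da and df/db
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            grad = (e, a * x * e)  # (df/da, df/db) at this sample
            r = y - a * e
            for i in range(2):
                jtr[i] += grad[i] * r
                for j in range(2):
                    jtj[i][j] += grad[i] * grad[j]
        # Solve (J^T J + lam * I) delta = J^T r for the 2x2 case (Cramer's rule)
        m00, m01, m11 = jtj[0][0] + lam, jtj[0][1], jtj[1][1] + lam
        det = m00 * m11 - m01 * m01
        da = (m11 * jtr[0] - m01 * jtr[1]) / det
        db = (m00 * jtr[1] - m01 * jtr[0]) / det
        candidate = cost(a + da, b + db)
        if candidate < current:
            # Error decreased: accept the step and lean towards Gauss-Newton
            a, b, current = a + da, b + db, candidate
            lam /= 10
        else:
            # Error increased: reject the step and lean towards gradient descent
            lam *= 10
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

Starting from `(1.0, 0.1)` on noise-free samples of `2 * exp(0.5 * x)` at x = 0, 0.5, 1, 1.5, 2, the fit should recover parameters very close to a = 2 and b = 0.5. The first few full Gauss-Newton steps overshoot and are rejected until lambda is large enough. This adaptive damping is what makes the method robust from poor starting points.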
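
Support, confidence, and lift have standard definitions. For a rule X → Y over N records, support = count(X ∪ Y) / N, confidence = support(X ∪ Y) / support(X), and lift = confidence / support(Y). A lift above 1 means X and Y co-occur more often than independence would predict. The sketch below is a generic level-wise Apriori with rule scoring, not the thesis's framework. The function names, thresholds, and toy records (letters standing in for discretised clinical or pathology attribute values) are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    records = [frozenset(t) for t in transactions]
    if not records:
        return {}

    def support(itemset):
        # Fraction of records containing every item in the itemset
        return sum(1 for r in records if itemset <= r) / len(records)

    level = {frozenset([item]) for r in records for item in r}
    frequent = {}
    k = 1
    while level:
        scored = {c: support(c) for c in level}
        current = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence, lift) for qualifying rules."""
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for items in combinations(itemset, size):
                antecedent = frozenset(items)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    yield antecedent, consequent, sup, confidence, lift
```

On the toy records `[{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"C"}, {"C"}]` with `min_support=0.4`, the frequent itemsets are {A}, {B}, {C}, and {A, B}. With `min_confidence=0.8`, the rules A → B and B → A each have support 0.6, confidence 1.0, and lift of about 1.67. Support filters out rare patterns, confidence measures how reliably the antecedent predicts the consequent, and lift corrects for consequents that are simply common.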

Center or Department

School of Engineering and Mathematical Sciences. Department of Computer Science and Information Technology.

Thesis type

  • Ph. D.

Awarding institution

La Trobe University

Year Awarded

2016

Rights Statement

The thesis author retains all proprietary rights (such as copyright and patent rights) over the content of this thesis, and has granted La Trobe University permission to reproduce and communicate this version of the thesis. The author has declared that any third party copyright material contained within the thesis made available here is reproduced and communicated with permission. If you believe that any material has been made available without permission of the copyright owner please contact us with the details.

Data source

arrow migration 2023-01-10 00:15. Ref: latrobe:42418 (9e0739)
