Ranking queries in uncertain databases
thesis
posted on 2023-01-19, 09:24 authored by Huynh Thanh Ha Nguyen
Submission note: A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy to the School of Engineering and Mathematical Sciences, College of Science, Health and Engineering, La Trobe University, Bundoora.
In recent years, uncertain (imprecise) data has emerged in many real-world application domains, including sensor networks, moving object tracking, data integration, data cleaning and information extraction. A research area has consequently emerged to develop advanced techniques for efficiently managing and exploring such uncertain databases. Ranking queries (also known as top-k queries) are among the most important analytics techniques and are widely used in data exploration, data analytics and decision-making scenarios. Compared with ranking on traditional databases, ranking uncertain data to provide meaningful answers is more complicated because the interplay between scores and probabilities complicates the semantics of the queries. The problem is even more challenging when multiple conflicting ranking criteria are involved. The main contribution of this thesis is the development of several novel ranking approaches that effectively and efficiently retrieve the truly interesting top-k results on multidimensional and partially ordered domains over uncertain data. First, we define a novel approach, called the Dominating Top-k Aggregate query, which overcomes the weaknesses of several existing ranking approaches to provide trustworthy and useful knowledge from uncertain Big Data to support data analytics and decision making. We guarantee the reliability of our ranking results by demonstrating the satisfaction of data correlations that constrain six fundamental ranking properties of our method. Second, because the top-k representative skyline query is another important method for multi-criteria decision-making applications, we study it in the context of uncertain data and are the first to do so. We handle both discrete and continuous cases. We also personalize the query by employing user preferences regarding the priority of individual ranking criteria.
Finally, we study ranking queries under the extended uncertain data model, where the attribute values of data objects are expressed as continuous ranges.
Center or Department
College of Science, Health and Engineering. School of Engineering and Mathematical Sciences. Department of Computer Science and Computer Engineering.
Thesis type
- Ph. D.