La Trobe

Keyword search in XML data

thesis
posted on 2023-01-18, 17:40 authored by Tuan Khanh Nguyen
Submission note: A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy to the School of Engineering and Mathematical Sciences, Faculty of Science, Technology and Engineering, La Trobe University, Bundoora.

Keyword search over XML data has attracted much attention because it frees users from the steep learning curve of query languages and from having to know the schemas of the XML data. However, because keyword queries are inherently ambiguous, using them to query XML data poses several challenges. This thesis proposes approaches to three main problems in XML keyword search: identifying the relevant results of a query, ranking results for top-k queries, and keyword searching over multiple XML databases. First, the thesis introduces a novel semantics called dominant lowest common ancestor (DLCA) to define the relevant results of an XML keyword search. Ranking criteria are proposed, and algorithms are introduced to rapidly identify the dominant results. Experiments compare this work with several state-of-the-art approaches in the literature and demonstrate its superiority. Second, the thesis proposes an approach to the many-result problem, which is caused by short and ambiguous keyword queries. It proposes a ranking function that applies principles of probabilistic models while taking into account data dependencies in XML data. An algorithm based on the well-known threshold algorithm is developed to retrieve the top-k results efficiently. The empirical results show that the proposed approach outperforms existing approaches in a variety of situations. Finally, the thesis proposes an approach to searching over multiple XML data sources that avoids the high cost of searching numerous, potentially irrelevant sources. The approach summarizes the data sources as succinct synopses so that non-promising sources can be filtered out rapidly. A ranking function is introduced to rank the relevance of each data source to the given query. Experiments are conducted to confirm the superiority of the proposed approach.
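For readers unfamiliar with the threshold algorithm mentioned in the abstract, the following is a minimal sketch of the generic technique, not the thesis's own algorithm. It assumes each keyword yields a list of (result, score) pairs sorted by descending non-negative score, that a result missing from a list scores 0 there, and that the overall score is a plain sum. The function name `threshold_top_k` and the summation are illustrative assumptions; the thesis's probabilistic ranking function is not given on this page.

```python
import heapq
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]  # (result id, score), sorted by score descending


def threshold_top_k(lists: List[ScoredList], k: int) -> List[Tuple[str, float]]:
    """Generic threshold-algorithm sketch with sum as the (assumed) aggregate."""
    if k <= 0 or not lists:
        return []
    # Random access: per-list lookup tables from result id to score.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k totals so far
    depth = max(len(lst) for lst in lists)
    for pos in range(depth):
        threshold = 0.0
        # Sorted access: read one entry from every list at the current depth.
        for lst in lists:
            if pos >= len(lst):
                continue  # an exhausted list adds 0 to the threshold
            obj, score = lst[pos]
            threshold += score
            if obj in seen:
                continue
            seen.add(obj)
            total = sum(table.get(obj, 0.0) for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen result can score above the threshold, so stop early
        # once the k-th best total already reaches it.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(((obj, total) for total, obj in top), key=lambda p: (-p[1], p[0]))


# Hypothetical per-keyword score lists for three candidate results.
by_keyword_a = [("r1", 1.0), ("r2", 0.75), ("r3", 0.25)]
by_keyword_b = [("r2", 0.75), ("r3", 0.5), ("r1", 0.25)]
best_two = threshold_top_k([by_keyword_a, by_keyword_b], 2)
```

In this toy input the scan can stop after two positions per list, because the second-best total found so far already equals the threshold.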

History

Center or Department

Faculty of Science, Technology and Engineering. School of Engineering and Mathematical Sciences.

Thesis type

  • Ph. D.

Awarding institution

La Trobe University

Year Awarded

2012

Rights Statement

The thesis author retains all proprietary rights (such as copyright and patent rights) over the content of this thesis, and has granted La Trobe University permission to reproduce and communicate this version of the thesis. The author has declared that any third party copyright material contained within the thesis made available here is reproduced and communicated with permission. If you believe that any material has been made available without permission of the copyright owner please contact us with the details.

Data source

arrow migration 2023-01-10 00:15. Ref: latrobe:38049 (9e0739)
