Posted on 2023-01-19, 11:22. Authored by Murtadha Talib Neamah AL-Sharuee
Submission note: A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy to the Department of Computer Science and Information Technology, School of Engineering and Mathematical Science, College of Science, Health and Engineering, La Trobe University, Bundoora, Victoria, Australia.
Web development has drastically changed human interaction and communication and has led to enormous and rapid growth in user-generated data. As a result, a very large and continuously growing number of product reviews is now available. The existing literature on review sentiment analysis mostly utilizes supervised models, which usually suffer from domain dependency and require expensive manual labelling to provide training data. In addition, the increased availability of web reviews calls for a solution that can handle review streams. Therefore, this thesis introduces an automatic contextual analysis and ensemble clustering approach (ACAEC). Using effective contextual procedures and modifying the base learning component (the k-means algorithm) results in an approach that overcomes both the domain-dependency and labelling-cost problems. In addition to already available datasets, two new sets of reviews on Australian airlines and home builders are scraped to evaluate the method. We then adapt this method to process a stream of reviews by proposing temporal sentiment analysis (TSA) methods. Two methods are suggested: window sequential clustering (WSC), which performs dynamic learning and temporal analysis, and segregated window clustering (SWC), which performs temporal analysis. An unsupervised review selection is incorporated with TSA to obtain insights into the discovered sentiment patterns. To evaluate the proposed TSA methods, review series for four airlines and an Australian property agent are scraped and used in the experiments. Moreover, a new label-free measure, consistency, is proposed to assess clustering quality. The experiments show that the average accuracy rates of SWC and WSC reach 87.54 percent and 83.87 percent, respectively. The suggested solutions are unsupervised, i.e. domain-independent, and suitable for the analysis of large quantities of data.
Experiments on several data mining algorithms using different data representations show that the proposed methods are effective in processing reviews compared to supervised and unsupervised approaches in terms of accuracy, stability and generalizability.
Center or Department
College of Science, Health and Engineering. School of Engineering and Mathematical Science. Department of Computer Science and Information Technology.
Thesis type
Ph.D.
Awarding institution
La Trobe University
Year Awarded
2018
Rights Statement
The thesis author retains all proprietary rights (such as copyright and patent rights) over the content of this thesis, and has granted La Trobe University permission to reproduce and communicate this version of the thesis. The author has declared that any third party copyright material contained within the thesis made available here is reproduced and communicated with permission. If you believe that any material has been made available without permission of the copyright owner please contact us with the details.