posted on 2023-01-19, 09:25, authored by Aiden Nibali
Submission note: A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy to the Department of Computer Science and Information Technology, College of Science, Health and Engineering, La Trobe University, Victoria, Australia.
To the human eye, an image of another person contains a wealth of information. The recent emergence of deep neural networks has enabled researchers to train models which imitate “human intuition”, accelerating progress on problems such as human pose estimation and tracking. Computer vision systems for solving human-centric tasks often require locating keypoints from visual cues. Regularised soft-argmax is proposed as a technique which can be applied to any neural network architecture that locates points of interest. This approach was implemented as a part of three different systems for solving distinct human-centric analysis problems. For 2D pose estimation, regularised soft-argmax was found to outperform existing techniques; for 3D pose estimation it was a key contributor to state-of-the-art results; and for action monitoring it outperformed alternatives for subject tracking.
An important technological trend is the ever-improving camera quality and storage capacity of commodity handheld devices. This trend makes mobile devices convenient deployment targets for computer vision applications. In a handheld environment it is desirable to employ computational techniques with less demanding resource requirements, since the target device has limited processor speed, memory, and battery capacity. In this regard, regularised soft-argmax allows for a better trade-off between accuracy and resource usage compared to existing techniques. Empirical results show that a model trained using regularised soft-argmax was able to achieve triple the inference speed and half the memory usage, whilst matching 99 percent of the accuracy of a popular state-of-the-art 2D pose estimation model.
Additionally, a case study is investigated for applying computer vision techniques to the real-world action monitoring problem of athletic diving analysis. This complex vision task involves solving the when, where, and what of human activity in continuous video footage. Using regularised soft-argmax and a novel combination of traditional and deep learning techniques, a solution is presented with compelling practical results.
Center or Department
College of Science, Health and Engineering. Department of Computer Science and Information Technology.
Thesis type
Ph.D.
Awarding institution
La Trobe University
Year Awarded
2019
Rights Statement
The thesis author retains all proprietary rights (such as copyright and patent rights) over the content of this thesis, and has granted La Trobe University permission to reproduce and communicate this version of the thesis. The author has declared that any third party copyright material contained within the thesis made available here is reproduced and communicated with permission. If you believe that any material has been made available without permission of the copyright owner please contact us with the details.