La Trobe

Effect of data preprocessing and machine learning hyperparameters on mass spectrometry imaging models

Version 3 2023-11-08, 05:34
Version 2 2023-09-22, 01:08
Version 1 2023-09-22, 00:59
journal contribution
posted on 2023-11-08, 05:34 authored by Wil Gardner, David Winkler, David L. J. Alexander, Davide Ballabio, Benjamin Muir, Paul Pigram

ABSTRACT: The self-organizing map (SOM) is a nonlinear machine learning algorithm that is particularly well suited to visualizing and analyzing high-dimensional, hyperspectral time-of-flight secondary ion mass spectrometry (ToF-SIMS) imaging data. Previously, we compared the capabilities of the SOM with those of more traditional linear techniques using ToF-SIMS imaging data. Although SOMs perform well with minimal data preprocessing and negligible hyperparameter optimization, it is important to understand how different data preprocessing methods and hyperparameter settings influence their performance. While such investigations have been reported outside the ToF-SIMS field, no such study has been reported for hyperspectral mass spectrometry imaging (MSI) data. To address this, we used two labelled ToF-SIMS imaging data sets: a polymer microarray data set and a semi-synthetic hyperspectral data set, the latter generated using a novel algorithm that we describe. A grid search was used to evaluate which data preprocessing methods and SOM hyperparameters had the largest impact on SOM performance. This was assessed using multiple linear regression, in which performance metrics were regressed onto each variable defining the preprocessing-hyperparameter space. We found that preprocessing was generally more important than hyperparameter selection. We also found statistically significant interactions between several of the parameters studied, suggesting a complex interplay between preprocessing and hyperparameter selection. Finally, we identified trends, both data set specific and data set agnostic, which we describe and discuss in detail.
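The abstract describes regressing performance metrics onto the variables that define the preprocessing-hyperparameter grid, including interaction terms. The Python sketch below illustrates that kind of analysis for a two-level full-factorial grid. The factor names (`normalise`, `sqrt_transform`, `map_size`), the -1/+1 coding and the use of ordinary least squares via NumPy are assumptions for illustration only; this is not the authors' code, and the actual preprocessing methods and hyperparameter levels are not listed in this record.

```python
import itertools

import numpy as np

# Hypothetical two-level factors; the paper's real preprocessing methods and
# SOM hyperparameter values are not given in this abstract.
FACTORS = ["normalise", "sqrt_transform", "map_size"]


def full_factorial(n_factors):
    """Return every combination of coded levels (-1 = low/off, +1 = high/on)."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))


def design_matrix(grid, factor_names):
    """Build intercept, main-effect and pairwise-interaction columns."""
    n_runs, n_factors = grid.shape
    cols, names = [np.ones(n_runs)], ["intercept"]
    for j in range(n_factors):
        cols.append(grid[:, j])
        names.append(factor_names[j])
    for j, k in itertools.combinations(range(n_factors), 2):
        cols.append(grid[:, j] * grid[:, k])
        names.append(f"{factor_names[j]}:{factor_names[k]}")
    return np.column_stack(cols), names


def fit_effects(grid, scores, factor_names=FACTORS):
    """Regress a performance metric onto the coded factors by least squares."""
    X, names = design_matrix(grid, factor_names)
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return dict(zip(names, coef))


# Usage with synthetic placeholder scores (not results from the paper):
grid = full_factorial(len(FACTORS))
scores = 0.7 + 0.10 * grid[:, 0] - 0.05 * grid[:, 2] + 0.02 * grid[:, 0] * grid[:, 1]
for name, value in fit_effects(grid, scores).items():
    print(f"{name:>28s} {value:+.3f}")
```

In a real study, each row of `grid` would correspond to one SOM trained with that preprocessing and hyperparameter combination and scored against the data set labels. Coefficients with larger magnitudes suggest factors with more influence on the metric. Testing whether effects and interactions are statistically significant would additionally require replicate runs or residual degrees of freedom and a tool such as an OLS routine that reports p-values.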

Funding

Office of National Intelligence, National Intelligence and Security Discovery Research Grant (NI210100127)

Australian National Fabrication Facility (ANFF)

History

School

  • School of Computing, Engineering and Mathematical Sciences

Publication Date

2023-12-01

Journal

Journal of Vacuum Science and Technology Part A: International Journal Devoted to Vacuum, Surfaces, and Films

Volume

41

Issue

6

Article Number

063204

Pagination

12p.

Publisher

AIP Publishing

ISSN

0734-2101

Rights Statement

© 2023 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
