Raman spectroscopy, a non-invasive analytical method, offers insights into molecular structures and interactions in liquid and solid samples, with applications ranging from materials science and chemical analysis to medical diagnostics. Preprocessing of Raman spectra is vital to remove interferences such as background signals and calibration errors, ensuring precise data extraction. Artificial intelligence, particularly machine learning (ML), aids in extracting valuable information from complex datasets. However, effective data preprocessing is crucial because it influences model robustness. This study addresses the integration of preprocessing and ML algorithms, often treated as distinct entities despite their intrinsic interconnection, for Raman spectra of blood samples from patients with ovarian cancer. The optimal preprocessing configuration is not always evident because of the complexity of spectral data: numerous options are available for background correction, normalization, outlier removal, noise filtering, and dimension reduction, and each step has hyperparameters that must be tuned. In this work, we present a pipeline that co-optimizes preprocessing techniques and ML classification methods to promote objective selection and minimize processing time. In our approach, preprocessing methods are not chosen arbitrarily but are systematically evaluated to enhance the robustness of the models. The evaluation criteria focus on ensuring that the model performs well not only on the training data but also on unseen data, thus reducing the risk of overfitting and improving generalization. This systematic approach would reduce the time needed for new studies by identifying the most suitable preprocessing steps and hyperparameters and building a robust model for the task.
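The idea of scoring preprocessing choices and classifier choices together can be illustrated with a small cross-validated grid search. The sketch below is not the pipeline from the study: the synthetic spectra, the preprocessing options (minimum-subtraction baseline, moving-average smoothing, max or vector normalization), the two toy classifiers, and every function name are assumptions chosen only to keep the example self-contained in the Python standard library.

```python
import itertools
import math
import random

# Hypothetical search space: three preprocessing choices plus one classifier choice.
GRID = {"baseline": [False, True], "window": [1, 5],
        "norm": ["none", "max", "vector"], "model": ["centroid", "1nn"]}


def smooth(spectrum, window):
    """Centered moving average; window=1 leaves the spectrum unchanged."""
    half, n = window // 2, len(spectrum)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def normalize(spectrum, kind):
    """Scale by the maximum or the Euclidean norm; 'none' is a pass-through."""
    if kind == "max":
        scale = max(abs(v) for v in spectrum)
    elif kind == "vector":
        scale = math.sqrt(sum(v * v for v in spectrum))
    else:
        scale = 1.0
    return [v / scale for v in spectrum] if scale else list(spectrum)


def preprocess(spectrum, params):
    """Apply baseline shift, smoothing, and normalization, in that order."""
    if params["baseline"]:
        floor = min(spectrum)
        spectrum = [v - floor for v in spectrum]
    return normalize(smooth(spectrum, params["window"]), params["norm"])


def predict(train_x, train_y, test_x, model):
    """Nearest-centroid or 1-nearest-neighbour, fitted on the training fold only."""
    if model == "centroid":
        refs = []
        for label in sorted(set(train_y)):
            rows = [x for x, y in zip(train_x, train_y) if y == label]
            refs.append(([sum(col) / len(rows) for col in zip(*rows)], label))
    else:  # "1nn": every training spectrum is its own reference
        refs = list(zip(train_x, train_y))
    return [min(refs, key=lambda r: math.dist(x, r[0]))[1] for x in test_x]


def cv_accuracy(spectra, labels, params, folds=5):
    """Mean k-fold accuracy for one configuration (interleaved, unshuffled folds)."""
    data = [preprocess(s, params) for s in spectra]
    scores = []
    for k in range(folds):
        test = list(range(k, len(data), folds))
        train = [i for i in range(len(data)) if i % folds != k]
        preds = predict([data[i] for i in train], [labels[i] for i in train],
                        [data[i] for i in test], params["model"])
        scores.append(sum(p == labels[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / folds


def make_synthetic(n=40, points=60, seed=0):
    """Two classes that differ in peak position, with random offset and gain."""
    rng = random.Random(seed)
    spectra, labels = [], []
    for i in range(n):
        center = 20 if i % 2 == 0 else 35
        offset, gain = rng.uniform(0.0, 5.0), rng.uniform(0.5, 2.0)
        spectra.append([offset + gain * math.exp(-((p - center) ** 2) / 18.0)
                        + rng.gauss(0.0, 0.2) for p in range(points)])
        labels.append(i % 2)
    return spectra, labels


def co_optimize(spectra, labels, grid=GRID):
    """Score every combination in the grid and return the best one."""
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((cv_accuracy(spectra, labels, params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


spectra, labels = make_synthetic()
best_params, best_score, results = co_optimize(spectra, labels)
print(best_params, round(best_score, 3))
```

Because each configuration is scored on held-out folds rather than on the training data, the selection criterion reflects performance on unseen spectra. A real study would likely use a nested or separate test split to avoid an optimistic estimate for the winning configuration.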
Revision total hip arthroplasty suffers from low visibility, with intra-body navigation hinging primarily on auditory and tactile cues. Consequently, the risk of surgical injury increases. One proposal to increase surgical precision is to integrate into the surgical tools an algorithm that classifies encountered tissues based on their reflectance spectra. Previous works have developed machine learning applications for the automatic binary classification of tissue based on diffuse reflectance spectroscopy (DRS) signals, and exploratory investigations have successfully integrated DRS probes into surgical devices, including surgical drills. However, one problem with these studies is a lack of transparency in the algorithms, which is important to increase practitioners’ trust and prevent bias. This study developed four machine learning algorithms that simultaneously classified broadband DRS signals (355–1850 nm) of six ovine tissue classes. The algorithms were Linear Discriminant Analysis (LDA), Random Forest, Convolutional Neural Network (CNN), and a Transformer model. Class-wise wavelength importance was visualized using model-based methods to understand classification mechanisms and increase model explainability. It is concluded that CNNs hold the potential for successful initial device design and medical integration.
Significance: Wavelength selection from a large diffuse reflectance spectroscopy (DRS) dataset enables removal of spectral multicollinearity and thus leads to improved understanding of the feature domain. Feature selection (FS) frameworks are essential to discover the optimal wavelengths for tissue differentiation in DRS-based measurements, which can facilitate the development of compact multispectral optical systems with suitable illumination wavelengths for clinical translation. Aim: The aim was to develop an FS methodology to determine wavelengths with optimal discriminative power for orthopedic applications, while providing the frameworks for adaptation to other clinical scenarios. Approach: An ensemble framework for FS was developed, validated, and compared with frameworks incorporating conventional algorithms, including principal component analysis (PCA), linear discriminant analysis (LDA), and backward interval partial least squares (biPLS). Results: Via the one-versus-rest binary classification approach, a feature subset of 10 wavelengths was selected from each framework, yielding balanced accuracy scores (PCA: 94.8 ± 3.47%, LDA: 98.2 ± 2.02%, biPLS: 95.8 ± 3.04%, and ensemble: 95.8 ± 3.16%) comparable to that obtained using all features (100%) for cortical bone versus the rest class labels. One hundred percent balanced accuracy scores were generated for bone cement versus the rest. Different feature subsets achieving similar outcomes could be identified due to spectral multicollinearity. Conclusions: Wavelength selection frameworks provide a means to explore domain knowledge and discover important contributors to classification in spectroscopy. The ensemble framework generated a model with improved interpretability and preserved physical interpretation, which serves as the basis to determine illumination wavelengths in optical instrumentation design.
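For readers unfamiliar with the metric, balanced accuracy in a one-versus-rest setting is the mean of sensitivity (recall on the target class) and specificity (recall on everything else). The following minimal sketch computes it in plain Python. The function name and the toy labels are illustrative assumptions, not data or code from the study.

```python
def balanced_accuracy_one_vs_rest(y_true, y_pred, positive):
    """Mean of sensitivity and specificity, treating `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both the target class and the rest must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels (hypothetical): 4 cortical bone, 2 bone cement, 2 other samples.
y_true = ["cortical_bone"] * 4 + ["bone_cement"] * 2 + ["other"] * 2
y_pred = ["cortical_bone", "cortical_bone", "cortical_bone", "other",
          "bone_cement", "bone_cement", "cortical_bone", "other"]

bone_score = balanced_accuracy_one_vs_rest(y_true, y_pred, "cortical_bone")
cement_score = balanced_accuracy_one_vs_rest(y_true, y_pred, "bone_cement")
print(bone_score, cement_score)  # 0.75 1.0
```

Averaging the two recalls keeps the score meaningful when the target class is much smaller than the combined rest class, which is the usual situation in one-versus-rest evaluation.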