Gamma Spectroscopy Data Augmentation for Self-Supervised Machine Learning Applications to Nuclear Nonproliferation on Measured Data with Limited Ground-Truth
The timely detection of special nuclear material (SNM) transfers is an important monitoring objective in nuclear nonproliferation. Labeling enough radiation data for successful supervised machine learning can be prohibitively costly when it relies on manual analysis. This work therefore develops a machine learning model built on semi-supervised learning, which uses both labeled and unlabeled data to reduce the cost of labeling. As a preliminary experiment, radiation measurements collected with sodium iodide (NaI) detectors at the Multi-Informatics for Nuclear Operating Scenarios (MINOS) testbed at Oak Ridge National Laboratory (ORNL) are used. Anomalous measurements are identified using statistical hypothesis testing. After background estimation, an energy-dependent spectroscopic analysis characterizes each anomaly by its radiation signatures, which serves as a noisy labeling heuristic. These noisily labeled spectra are used to train and test classification models that estimate a binary label: SNM transfer or other anomalous measurement. Supervised logistic regression, trained only on the limited labeled data, serves as a baseline for comparing three semi-supervised machine learning models, each trained on the same limited labeled data plus a larger volume of unlabeled data: co-training, Label Propagation, and a Convolutional Neural Network (CNN). In each case, the semi-supervised model outperforms logistic regression, suggesting that unlabeled data can be valuable in training and demonstrating the performance benefit of semi-supervised implementations in nonproliferation. This work uses a self-supervised contrastive learning framework to efficiently extract information from unlabeled data. A contrastive model learns patterns by perturbing data instances with a set of label-invariant data augmentations, meaning that augmented samples preserve the labeling information present in the original measurement.
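As a rough illustration of what label-invariant augmentations of a gamma spectrum might look like, the following minimal sketch applies three perturbations to a synthetic spectrum. The function names, the toy spectrum, and the choice of perturbations (Poisson resampling, binomial thinning of counts, and a small gain shift) are assumptions made for illustration; they are not necessarily the transformations designed in this work.

```python
import numpy as np


def resample_counts(spectrum, rng):
    """Redraw every channel from a Poisson distribution centered on its observed count."""
    return rng.poisson(spectrum).astype(float)


def thin_counts(spectrum, keep_fraction, rng):
    """Keep each recorded count with probability keep_fraction (binomial thinning),
    roughly imitating a shorter measurement of the same source."""
    return rng.binomial(spectrum.astype(int), keep_fraction).astype(float)


def gain_shift(spectrum, shift_fraction):
    """Stretch (positive) or compress (negative) the channel axis by shift_fraction,
    rebin by linear interpolation, and rescale to the original total counts."""
    channels = np.arange(spectrum.size, dtype=float)
    source = channels / (1.0 + shift_fraction)  # old channel feeding each new channel
    shifted = np.interp(source, channels, spectrum, left=0.0, right=0.0)
    total = shifted.sum()
    return shifted * (spectrum.sum() / total) if total > 0 else shifted


# Hypothetical toy spectrum: exponential continuum plus one Gaussian photopeak.
rng = np.random.default_rng(0)
channels = np.arange(128)
expected = 50.0 * np.exp(-channels / 40.0) + 200.0 * np.exp(-0.5 * ((channels - 60) / 3.0) ** 2)
toy_spectrum = rng.poisson(expected).astype(float)

# Two augmented "views" of the same measurement, as a contrastive model would consume.
view_a = gain_shift(resample_counts(toy_spectrum, rng), 0.02)
view_b = thin_counts(toy_spectrum, 0.8, rng)
```

Each perturbation changes the counts or the channel calibration slightly while leaving the photopeak structure recognizable, which is the sense in which an augmented sample is expected to keep the label of the original measurement.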
A set of transformations is designed for gamma spectra, tailored to specific principles of radiation detection. MINOS measurements are augmented, and an encoder is contrastively trained to produce meaningful high-dimensional representations of spectra. A supervised classifier then uses these encoded representations to assign a label estimating whether a given transfer spectrum was of tracked nuclear material or not. Even a simple linear model built on these representations and trained on limited labeled data can achieve a balanced accuracy score of 80.30%. Several tools are employed to evaluate the efficacy of the augmentations, representations, and classification models. Principal Component Analysis (PCA) is used to demonstrate that the representations provide a richer feature space for detecting nuclear material transfers by embedding distributional information from unlabeled data. Integrated Gradients connects a classifier’s decision boundary to spectral features, suggesting the framework learns relevant patterns in spectra that can be used for detecting transfers. When labeled data are scarce, this work suggests that, to maximize detection accuracy, training a supervised classifier should be prioritized over semi-supervised (as opposed to self-supervised) contrastive training of an encoder. Hyperparameter optimization was conducted and found a locally optimal cross-validated balanced accuracy score. Overall, a methodology has been established for using semi-supervision to accurately classify SNM transfers without the prohibitive cost of labeling.
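For readers unfamiliar with the metric quoted above, balanced accuracy in the binary case is the mean of the recall on each class, which keeps a rare positive class (such as SNM transfers) from being swamped by the majority class. The sketch below is a minimal, assumed implementation with made-up labels; it does not reproduce any result from this work.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative).
    Assumes both classes appear at least once in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return 0.5 * (sensitivity + specificity)


# Hypothetical imbalanced example: 2 positives, 8 negatives, one positive missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the plain accuracy is 0.9 while the balanced accuracy is 0.75, because missing one of only two positive examples halves the sensitivity.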