Detection, Quantification, and Mitigation of Robustness Vulnerabilities in Deep Neural Networks
Machine learning (ML) has made enormous progress in the last two decades. Specifically, Deep Neural Networks (DNNs) have led to several breakthroughs. The applications range from synthesizing high-resolution images that are indistinguishable from real photos to large-scale language models that achieve human-level performance on various tasks.
Yet, while humans can apply learned knowledge to new situations after seeing only a few examples, neural networks often fail at this task. As a result, real-world distribution shifts in the environment, demographics, or data collection process pose severe safety risks to humans: autonomous cars may fail to adapt to unknown road conditions, and medical systems may provide incorrect diagnoses for minorities underrepresented in the training data. A further security threat is the vulnerability of neural networks to small, adversarially crafted perturbations, where even imperceptible changes to the data can cause erroneous model behavior. In this cumulative dissertation, I identify a gap in the literature regarding methods that simultaneously address real-world and adversarial distribution shifts, and I propose three objectives to increase the robustness of neural networks against both threats.
The first objective is the detection of potentially harmful model decisions caused by distribution shifts in the data. Here, we showed that the input-gradient geometry of neural networks can be used to detect both real-world and adversarial distribution shifts [P1]. Unlike prior work, we demonstrated the flexibility of our method by showing its effectiveness on both image and time series classification tasks.
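To illustrate the general idea of using input gradients for shift detection, the following is a minimal sketch: it scores a sample by the norm of the loss gradient with respect to the input, under a simple logistic model. The function name, the logistic model, and the use of a plain gradient norm are illustrative assumptions; the geometry-based detector in [P1] is more involved.

```python
import numpy as np

def input_grad_norm(w, b, x):
    """Score a sample by the norm of the loss gradient w.r.t. the input.

    Illustrative sketch only: assumes a logistic model p = sigmoid(w.x + b)
    and uses the gradient of -log p for the positive class as the score.
    """
    z = float(w @ x + b)
    p = 1.0 / (1.0 + np.exp(-z))
    # For the positive class, d(-log p)/dx = -(1 - p) * w
    grad = -(1.0 - p) * w
    return float(np.linalg.norm(grad))
```

Intuitively, confident in-distribution samples sit far from the decision boundary and yield small input gradients, while shifted or ambiguous inputs yield larger ones, so thresholding this score gives a crude detector.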
The second objective is the accurate quantification of network robustness against adversarial distribution shifts (attacks), which is essential for assessing the worst-case risk in safety-critical applications. Toward this objective, we improved two critical components of gradient-based adversarial attacks. First, we accelerated the convergence of gradient-descent-based optimization by retaining past gradient information across optimization steps [P2]. Second, we introduced a novel optimization objective that increases the attack success rate while simultaneously reducing the perturbation magnitude of adversarial attacks [P3].
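A momentum-style iterative attack is one common way to exploit past gradient information, and the sketch below shows that pattern in miniature (in the spirit of MI-FGSM). The function name, the L1 gradient normalization, and all hyperparameters are illustrative assumptions, not the method of [P2].

```python
import numpy as np

def momentum_attack(grad_fn, x, eps=0.3, steps=10, mu=0.9):
    """Illustrative momentum-accelerated L_inf attack sketch.

    grad_fn(x) returns the gradient of the loss to be maximized;
    the accumulated history g smooths the update direction.
    """
    alpha = eps / steps          # per-step size
    g = np.zeros_like(x)         # gradient history
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv)
        # Accumulate L1-normalized gradients into the momentum buffer
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)           # ascend the loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)    # project to L_inf ball
    return x_adv
```

The history term keeps the iterates from oscillating across narrow loss ravines, which is why such accumulation tends to improve convergence of iterative attacks.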
The third objective is the mitigation of vulnerabilities to both real-world and adversarial distribution shifts [P4]. To this end, we theoretically motivate how properties of local extrema in the loss landscape can be used to identify spurious predictions. Based on these findings, we propose the Decision Region Quantification (DRQ) algorithm, which analyzes the robustness of local decision regions in the vicinity of a given data point to find its most robust prediction.
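The core intuition of analyzing local decision regions can be conveyed with a toy sketch: probe random points in a small neighborhood of the input and return the label that is most stable under perturbation. The function name, uniform sampling scheme, and majority vote are illustrative assumptions; the actual DRQ algorithm of [P4] analyzes decision regions far more carefully.

```python
import numpy as np

def robust_prediction(predict, x, radius=0.1, n_samples=200, seed=0):
    """Toy neighborhood-voting sketch of decision-region analysis.

    predict(x) returns an integer class label; the returned label is
    the one predicted most often inside an L_inf ball of the given radius.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
    labels = np.array([predict(x + d) for d in noise])
    values, counts = np.unique(labels, return_counts=True)
    return int(values[np.argmax(counts)])
```

A prediction that flips under most small perturbations is treated as spurious, while the label occupying most of the surrounding region is taken as the robust one.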