2026 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]
The empirical success of deep learning across a wide range of applications over the last decade has been remarkable. Neural networks can now achieve human-like or superhuman performance at tasks such as image recognition and segmentation, speech recognition, and natural language generation.
Despite decades of research dedicated to understanding how such models learn, there are still many unresolved questions. For instance, neural networks are often severely overparameterized, sometimes with many more parameters than training samples, which according to intuition from classical theory should lead to high sensitivity to noise and poor performance on new data. Yet with enough parameters or training, this issue can be overcome, even without explicit regularization. Understanding implicit biases in training, and the behavior they induce in neural networks, is an important piece of the puzzle of how these models learn so efficiently.
This thesis emphasizes the ‘network’ part of neural networks, and uses tools from graph theory to view this class of models from a new perspective that adds to our understanding of their inner workings.
The first paper treats deep linear neural networks, which are neural networks with the nonlinear activations removed. The gradient flow equations describing the network's learning process form an analytically treatable dynamical system, and although it is a simplified model, a deep linear network shares several interesting features with its nonlinear counterpart, such as a non-convex loss function and nonlinear dynamics induced by the overparameterization. The network is considered as a directed acyclic graph and the learning dynamics are described in terms of its adjacency matrix. This reformulation simplifies the gradient flow equations and provides insight into the system's properties. For instance, it allows us to highlight an equivalence relation among adjacency matrices, and to investigate stable and unstable manifolds at the critical points of the system without needing to compute the Hessian of the loss function.
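As a sketch of the setting described above, using standard notation for deep linear networks (not necessarily the thesis's own): with weight matrices $W_1,\dots,W_L$, data $X$, and targets $Y$, the squared loss and its gradient flow are

\[
f(x) = W_L W_{L-1}\cdots W_1 x,
\qquad
\mathcal{L}(W_1,\dots,W_L) = \tfrac{1}{2}\,\lVert W_L\cdots W_1 X - Y\rVert_F^2,
\]
\[
\dot W_i = -\frac{\partial \mathcal{L}}{\partial W_i}
= -\,(W_L\cdots W_{i+1})^{\top}\,\bigl(W_L\cdots W_1 X - Y\bigr)\,X^{\top}\,(W_{i-1}\cdots W_1)^{\top},
\qquad i = 1,\dots,L.
\]

The coupling between the factors $W_i$ is what makes these dynamics nonlinear even though the model $f$ itself is linear in $x$.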
The second paper uses the concept of frustration from statistical physics in the context of deep neural networks, and relates frustration to monotonicity of the network viewed as a function. It is shown that state-of-the-art convolutional neural networks trained on image classification tasks are less frustrated, and thus closer to monotone functions, than expected from null models. This suggests an implicit bias in the kind of function that they learn.
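To illustrate the underlying graph-theoretic notion (a minimal sketch, not the thesis's method): a signed graph is frustration-free, or balanced, exactly when every cycle contains an even number of negative edges, which by Harary's theorem holds iff the vertices can be split into two groups with positive edges within groups and negative edges between them. This can be checked with a BFS two-colouring. The function name and edge-list format here are illustrative choices.

```python
from collections import deque

def is_balanced(n, signed_edges):
    """Check whether a signed graph on n vertices is balanced
    (frustration-free). Edges are (u, v, s) with sign s in {+1, -1}."""
    adj = [[] for _ in range(n)]
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # positive edge: same side; negative edge: opposite sides
                want = colour[u] * s
                if colour[v] is None:
                    colour[v] = want
                    queue.append(v)
                elif colour[v] != want:
                    return False  # found a cycle with an odd number of negative edges
    return True

# A triangle with exactly one negative edge is frustrated:
print(is_balanced(3, [(0, 1, 1), (1, 2, 1), (2, 0, -1)]))   # False
# With two negative edges the triangle is balanced:
print(is_balanced(3, [(0, 1, 1), (1, 2, -1), (2, 0, -1)]))  # True
```

A frustration *index* (the minimum number of edge-sign flips needed to reach balance) is NP-hard to compute in general; the balance check above is the tractable all-or-nothing version of the same idea.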
Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2026. p. 36
Series
Linköping Studies in Science and Technology. Licentiate Thesis, ISSN 0280-7971 ; 2028
National Category
Computer Sciences Control Engineering
Identifiers
urn:nbn:se:liu:diva-221215 (URN)
10.3384/9789181184822 (DOI)
9789181184815 (ISBN)
9789181184822 (ISBN)
Presentation
2026-03-13, Ada Lovelace, B-huset, Campus Valla, Linköping, 10:15
Opponent
Supervisors
2026-02-13 Bibliographically approved