Publications (3 of 3)
Wendin, J. (2026). A Graph-Based Perspective on Neural Networks. (Licentiate dissertation). Linköping: Linköping University Electronic Press
2026 (English). Licentiate thesis, with papers (Other academic)
Abstract [en]

The empirical success of deep learning in a wide range of applications over the last decade has been remarkable. Neural networks can now achieve human-like or superhuman performance at tasks such as image recognition and segmentation, speech recognition, and natural language generation.

Despite decades of research dedicated to understanding how such models learn, there are still many unresolved questions. For instance, neural networks are often severely overparameterized, sometimes with many more parameters than training samples, which according to intuition from classical theory should lead to high sensitivity to noise and poor performance on new data. Yet with enough parameters or training, one can overcome this issue, even without explicit regularization. Understanding implicit biases in training and the induced behavior of neural networks is an important piece of the puzzle of how these models learn so efficiently.

This thesis emphasizes the ‘network’ part of neural networks, and uses tools from graph theory to view this class of models from a new perspective that adds to our understanding of their inner workings.

The first paper treats deep linear neural networks, which are neural networks where the nonlinear activations have been removed. The gradient flow equations describing the network’s learning process form an analytically treatable dynamical system, and although it is a simplified model, a deep linear network shares several interesting features with its nonlinear counterpart, such as a non-convex loss function and nonlinear dynamics induced by the overparameterization. The network is considered as a directed acyclic graph and the learning dynamics are described in terms of its adjacency matrix. This reformulation simplifies the gradient flow equations and provides insight into the system properties. For instance, it allows us to highlight an equivalence relation among adjacency matrices, and to investigate stable and unstable manifolds at the critical points of the system without needing to compute the Hessian of the loss function.
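As a minimal illustration of the adjacency-matrix view, the sketch below places the layer weights of a small deep linear network on the block subdiagonal of one square matrix. The block layout, the function name `adjacency_matrix`, and the layer sizes are assumptions made for this example and are not taken from the thesis. Under that layout the matrix is nilpotent, and its h-th power carries the end-to-end product of the h layers in its bottom-left block.

```python
import numpy as np

def adjacency_matrix(weights):
    """Place W_1..W_h on the block subdiagonal of a square matrix (assumed layout)."""
    # Node counts per layer: input width, then the output width of each layer.
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    A = np.zeros((offsets[-1], offsets[-1]))
    for i, W in enumerate(weights):
        # Edges go from layer i (columns) to layer i + 1 (rows).
        A[offsets[i + 1]:offsets[i + 2], offsets[i]:offsets[i + 1]] = W
    return A, sizes

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2, 2]  # hypothetical widths
weights = [rng.standard_normal((layer_sizes[i + 1], layer_sizes[i]))
           for i in range(len(layer_sizes) - 1)]

A, sizes = adjacency_matrix(weights)
h = len(weights)

# A^(h+1) vanishes because no path in the layered DAG is longer than h edges.
nilpotent = np.allclose(np.linalg.matrix_power(A, h + 1), 0)

# The bottom-left block of A^h is the input-output map W_h ... W_1.
end_to_end = weights[2] @ weights[1] @ weights[0]
bottom_left = np.linalg.matrix_power(A, h)[-sizes[-1]:, :sizes[0]]
product_matches = np.allclose(bottom_left, end_to_end)
```

Both `nilpotent` and `product_matches` evaluate to true for this layout, which is one way to see that a single matrix encodes both the graph and the function it computes.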

The second paper uses the concept of frustration from statistical physics in the context of deep neural networks, and relates frustration to monotonicity of the network when viewed as a function. It is shown that state-of-the-art convolutional neural networks trained on image classification tasks are less frustrated, and thus closer to monotone functions, than what is expected from null models. This suggests an implicit bias in the kind of function that they learn.
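The link between sign structure and monotonicity can be illustrated with the simplest possible case, which is an assumption chosen for this sketch and not an experiment from the thesis: a bias-free ReLU network whose weights are all nonnegative. Such a network has no negative edges, so its signed graph has zero frustration, and the function it computes is monotone in every input coordinate. The helper `relu_net` and the layer sizes are hypothetical.

```python
import numpy as np

def relu_net(weights, x):
    """Bias-free ReLU network: apply each weight matrix, then ReLU."""
    for W in weights:
        x = np.maximum(W @ x, 0.0)
    return x

rng = np.random.default_rng(2)
# All weights nonnegative: every edge of the signed graph is positive.
weights = [np.abs(rng.standard_normal((4, 3))),
           np.abs(rng.standard_normal((2, 4)))]

x_low = rng.standard_normal(3)
x_high = x_low + np.abs(rng.standard_normal(3))  # x_high >= x_low componentwise

# Increasing the input never decreases any output (small tolerance for rounding).
monotone = np.all(relu_net(weights, x_low) <= relu_net(weights, x_high) + 1e-12)
```

Trained networks contain weights of both signs, so they sit somewhere between this fully monotone case and a sign pattern with no structure at all.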

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2026. p. 36
Series
Linköping Studies in Science and Technology. Licentiate Thesis, ISSN 0280-7971 ; 2028
Identifiers
urn:nbn:se:liu:diva-221215 (URN), 10.3384/9789181184822 (DOI), 9789181184815 (ISBN), 9789181184822 (ISBN)
Presentation
2026-03-13, Ada Lovelace, B-huset, Campus Valla, Linköping, 10:15
Available from: 2026-02-13. Created: 2026-02-13. Last updated: 2026-05-18. Bibliographically approved
Wendin, J. & Altafini, C. (2026). Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective. SIAM Review, 68(2), 293-345
2026 (English). In: SIAM Review, ISSN 0036-1445, E-ISSN 1095-7200, Vol. 68, no. 2, p. 293-345. Article in journal (Refereed). Published
Abstract [en]

This paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated with deep linear neural networks, i.e., the gradient descent training dynamics (in the limit when the step size goes to 0) of deep neural networks missing the activation functions and subject to quadratic loss functions. When formulated in terms of the adjacency matrix of the neural network, as is done in this paper, these gradient flow equations form a class of converging matrix ODEs that is nilpotent, polynomial, and isospectral, and that admits conservation laws. The loss landscape is described in detail: it is characterized by infinitely many global minima and saddle points, both strict and nonstrict, but it lacks local minima and maxima. The loss function itself is a positive semidefinite Lyapunov function for the gradient flow, and its level sets are unbounded invariant sets of critical points with critical values that correspond to the number of singular values of the input-output data learnt by the gradient along a certain trajectory. The adjacency matrix representation we use in the paper allows us to highlight the existence of a quotient space structure in which each critical value of the loss function is represented only once, while all other critical points with the same critical value belong to the fiber associated to the quotient space. It also allows us to easily determine stable and unstable submanifolds at the saddle points, even when the Hessian fails to obtain them.
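The abstract does not say which conservation laws it refers to. As a hedged illustration, the sketch below checks one law that is known for deep linear networks under quadratic loss: along the gradient flow, the difference between W_2^T W_2 and W_1 W_1^T of consecutive layers does not change in time. The code assumes the loss 0.5 * ||W_2 W_1 X - Y||_F^2, layer-wise weight matrices instead of the paper's adjacency-matrix form, and made-up sizes and data; the function name `gradients` is hypothetical.

```python
import numpy as np

def gradients(weights, X, Y):
    """Gradients of 0.5 * ||W_h ... W_1 X - Y||_F^2 with respect to each W_i."""
    # forward[i] is the signal entering layer i (forward[0] is the input X).
    forward = [X]
    for W in weights:
        forward.append(W @ forward[-1])
    back = forward[-1] - Y  # residual, propagated backwards layer by layer
    grads = [None] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = back @ forward[i].T
        back = weights[i].T @ back
    return grads

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((2, 5))

# Gradient flow velocity: dW_i/dt = -dL/dW_i.
V1, V2 = (-g for g in gradients([W1, W2], X, Y))

# Time derivative of W2^T W2 - W1 W1^T along the flow, by the product rule.
drift = (W2.T @ V2 + V2.T @ W2) - (V1 @ W1.T + W1 @ V1.T)
conserved = np.allclose(drift, 0)
```

The check is algebraic and needs no numerical integration: `drift` vanishes at every point of weight space, so the quantity stays constant along any trajectory of the flow.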

Place, publisher, year, edition, pages
Society for Industrial & Applied Mathematics (SIAM), 2026
Keywords
deep learning, deep linear neural networks, matrix ODEs, gradient systems, dynamical systems
Identifiers
urn:nbn:se:liu:diva-223987 (URN), 10.1137/24m1715519 (DOI)
Available from: 2026-05-18. Created: 2026-05-18. Last updated: 2026-05-18
Wendin, J., Larsson, E. G. & Altafini, C. Computing frustration and near-monotonicity in deep neural networks.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

For the signed graph associated to a deep neural network, one can compute the frustration level, i.e., test how close or distant the graph is to structural balance. For all the pretrained deep convolutional neural networks we consider, we find that the frustration is always less than expected from null models. From a statistical physics point of view, and in particular in reference to an Ising spin glass model, the reduced frustration indicates that the amount of disorder encoded in the network is less than in the null models. From a functional point of view, low frustration (i.e., proximity to structural balance) means that the function representing the network behaves near-monotonically, i.e., more similarly to a monotone function than in the null models. Evidence of near-monotonic behavior along the partial order determined by frustration is observed for all networks we consider. This confirms that the class of deep convolutional neural networks tends to have a more ordered behavior than expected from null models, and suggests a novel form of implicit regularization.
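To make the notion of frustration concrete, the sketch below computes it by brute force for two signed triangles. It assumes the common definition of the frustration index as the smallest number of edges violated by any assignment of +1/-1 labels to the nodes, where an edge is violated when the product of its sign and the labels of its two endpoints is negative. The function name and the toy graphs are hypothetical, and exhaustive search is only feasible for tiny graphs; the abstract does not describe the method used for full-size networks.

```python
from itertools import product

def frustration_index(n, signed_edges):
    """Minimum number of violated edges over all +/-1 labelings of n nodes."""
    best = len(signed_edges)
    for tail in product((1, -1), repeat=n - 1):
        s = (1,) + tail  # fix node 0 to +1: a global sign flip changes nothing
        violated = sum(1 for i, j, sign in signed_edges
                       if s[i] * s[j] * sign < 0)
        best = min(best, violated)
    return best

# Edges are (node, node, sign) triples.
balanced_triangle = [(0, 1, 1), (1, 2, -1), (0, 2, -1)]   # two negative edges
frustrated_triangle = [(0, 1, 1), (1, 2, 1), (0, 2, -1)]  # one negative edge

f_balanced = frustration_index(3, balanced_triangle)      # 0: structurally balanced
f_frustrated = frustration_index(3, frustrated_triangle)  # 1: one edge always violated
```

A cycle with an even number of negative edges can be satisfied completely, while a cycle with an odd number cannot, which is why the second triangle has frustration 1.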

Keywords
Disordered Systems and Neural Networks, Machine Learning
Identifiers
urn:nbn:se:liu:diva-221214 (URN), 10.48550/arXiv.2510.05286 (DOI)
Available from: 2026-02-13. Created: 2026-02-13. Last updated: 2026-05-06
Identifiers
ORCID iD: orcid.org/0009-0008-3946-1409