Spherical NeurO(n)s for Geometric Deep Learning
Melnyk, Pavlo. Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6091-861X
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Felix Klein’s Erlangen Programme of 1872 introduced a methodology to unify non-Euclidean geometries. Similarly, geometric deep learning (GDL) constitutes a unifying framework for various neural network architectures. GDL is built from the first principles of geometry—symmetry and scale separation—and enables tractable learning in high dimensions. Symmetries play a vital role in preserving structural information of geometric data and allow models (i.e., neural networks) to adjust to different geometric transformations. 

In this context, spheres exhibit a maximal set of symmetries compared to other geometric entities in Euclidean space. The orthogonal group O(n) fully encapsulates the symmetry structure of an nD sphere, including both rotational and reflection symmetries. In this thesis, we focus on integrating these symmetries into a model as an inductive bias, which is a crucial requirement for addressing problems in 3D vision as well as in natural sciences and their related applications. 

In Paper A, we focus on 3D geometry and use the symmetries of spheres as geometric entities to construct neurons with spherical decision surfaces—spherical neurons—using a conformal embedding of Euclidean space. We also demonstrate that spherical neuron activations are non-linear due to the inherent non-linearity of the input embedding, and thus, do not necessarily require an activation function. In addition, we show graphically, theoretically, and experimentally that spherical neuron activations are isometries in Euclidean space, which is a prerequisite for the equivariance contributions of our subsequent work. 
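
As a concrete illustration of the conformal embedding and the resulting spherical decision surface, the following is a minimal numerical sketch. It assumes one common sign convention for the hypersphere-neuron embedding (a point x maps to (x, -1, -||x||^2/2), a sphere with center c and radius r to (c, (||c||^2 - r^2)/2, 1)); the exact convention in Paper A may differ, but the key identity, that the scalar product of the two embeddings equals -1/2 (||x - c||^2 - r^2), and hence the isometry property, is the same.

```python
import numpy as np

def embed_point(x):
    """Conformal embedding of a Euclidean point x in R^3 into R^5 (one common sign convention)."""
    return np.concatenate([x, [-1.0, -0.5 * np.dot(x, x)]])

def embed_sphere(center, radius):
    """Embedding of a sphere with the given center and radius into R^5."""
    return np.concatenate([center, [0.5 * (np.dot(center, center) - radius**2), 1.0]])

def spherical_neuron(x, center, radius):
    """Activation of a spherical neuron: scalar product of the two embeddings."""
    return embed_point(x) @ embed_sphere(center, radius)

rng = np.random.default_rng(0)
x, c, r = rng.normal(size=3), rng.normal(size=3), 1.3

# The activation equals -0.5 * (||x - c||^2 - r^2): positive inside the sphere,
# zero on it, negative outside, so no extra activation function is required.
assert np.isclose(spherical_neuron(x, c, r),
                  -0.5 * (np.linalg.norm(x - c) ** 2 - r ** 2))

# Isometry check: rotating the point and the sphere center by the same
# orthogonal matrix leaves the activation unchanged.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
assert np.isclose(spherical_neuron(R @ x, R @ c, r), spherical_neuron(x, c, r))
```

Because the activation depends on x only through its distance to the sphere center, jointly transforming the point and the center by any isometry leaves it unchanged, which is the property the subsequent equivariance results build on.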

In Paper B, we closely examine the isometry property of the spherical neurons in the context of equivariance under 3D rotations (i.e., SO(3)-equivariance). Focusing on 3D, we construct a spherical filter bank from a minimal set of four spherical neurons: one learned spherical decision surface and three copies, whose centers are rotated onto the corresponding vertices of a regular tetrahedron. We call it a steerable 3D spherical neuron because, as we verify later, it constitutes a steerable filter. Finally, we derive a 3D steerability constraint for a spherical neuron (i.e., a single spherical decision surface). 
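
The quadruplication step can be sketched as follows. This is a hedged illustration only: the specific rotations and the derived steerability constraint in Paper B carry more structure than this generic construction, and `rotation_aligning` as well as the example center `c` are illustrative choices, not quantities from the paper.

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix mapping unit vector a onto unit vector b (Rodrigues formula).
    Assumes a and b are not antiparallel."""
    v = np.cross(a, b)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + np.dot(a, b))

# Unit vectors pointing to the vertices of a regular tetrahedron.
tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

c = np.array([0.4, -0.2, 0.9])        # hypothetical learned sphere center (illustrative)
c_hat = c / np.linalg.norm(c)

# Quadruplicate: rotate the learned center onto each tetrahedron direction,
# giving a bank of four congruent spherical decision surfaces.
bank_centers = np.stack([rotation_aligning(c_hat, v) @ c for v in tetra])

# All four copies keep the learned distance to the origin and point along the tetrahedron.
assert np.allclose(np.linalg.norm(bank_centers, axis=1), np.linalg.norm(c))
assert np.allclose(bank_centers / np.linalg.norm(c), tetra)
```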

In Paper C, we present a learnable point-cloud descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the steerable 3D spherical neurons we introduced previously, as well as vector neurons from related work. Specifically, we propose an embedding of the 3D steerable neurons into 4D vector neurons, which enables end-to-end training of the model. The resulting model, termed TetraSphere, sets a new state of the art in classifying randomly rotated real-world object scans. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space. 
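
To make the O(3)-invariance requirement concrete, here is a small test harness in the spirit of randomly rotated evaluation; it is not the TetraSphere model. The descriptor used, the sorted pairwise-distance spectrum, is only a stand-in that is trivially O(3)-invariant, and `random_o3` and `toy_invariant_descriptor` are illustrative helpers introduced here.

```python
import numpy as np

def random_o3(rng):
    """Sample a random element of O(3), i.e., a rotation or a rotoreflection, via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.diag(r))      # sign fix; determinant may be +1 or -1

def toy_invariant_descriptor(points):
    """Stand-in O(3)-invariant descriptor: the sorted pairwise-distance spectrum.
    TetraSphere learns a far richer descriptor; this only illustrates the property."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d[np.triu_indices(len(points), k=1)])

rng = np.random.default_rng(1)
pts = rng.normal(size=(64, 3))          # a toy point cloud
g = random_o3(rng)                      # random rotation/reflection applied at test time

# An O(3)-invariant descriptor must be identical for the original and transformed cloud.
assert np.allclose(toy_invariant_descriptor(pts @ g.T), toy_invariant_descriptor(pts))
```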

In Paper D, we generalize to nD the concepts we previously established in 3D and propose O(n)-equivariant neurons with spherical decision surfaces, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. 
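
The Gram-matrix remark can be checked directly: stacking the points as rows of X, the matrix G = X X^T contains only inner products between points, so it is unchanged when every point is multiplied by the same R in O(n). A quick numerical confirmation (the dimension n = 7 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n, num_points = 7, 10                          # any dimension n works
X = rng.normal(size=(num_points, n))           # points as rows

R, _ = np.linalg.qr(rng.normal(size=(n, n)))   # a random element of O(n)

G = X @ X.T                                    # Gram matrix of the original points
G_transformed = (X @ R.T) @ (X @ R.T).T        # Gram matrix after applying R to every point

assert np.allclose(G, G_transformed)           # invariance under O(n)
```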

In summary, this thesis introduces techniques based on spherical neurons that enhance the GDL framework, with a specific focus on equivariant and invariant learning on point sets. 

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024, p. 37
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2393
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-207304
DOI: 10.3384/9789180756808
ISBN: 9789180756792 (print)
ISBN: 9789180756808 (electronic)
OAI: oai:DiVA.org:liu-207304
DiVA id: diva2:1894492
Public defence
2024-09-27, Ada Lovelace, B-building, Campus Valla, Linköping, 10:15 (English)
Opponent
Supervisors
Note

Funding:  Wallenberg AI, Autonomous Systems and Software Program (WASP); National Academic Infrastructure for Supercomputing in Sweden (NAISS) partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2024-09-03 Created: 2024-09-03 Last updated: 2024-09-06 Bibliographically approved
List of papers
1. Embed Me If You Can: A Geometric Perceptron
2021 (English) In: Proceedings 2021 IEEE/CVF International Conference on Computer Vision ICCV 2021, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 1256-1264. Conference paper, Published paper (Refereed)
Abstract [en]

Solving geometric tasks involving point clouds by using machine learning is a challenging problem. Standard feed-forward neural networks combine linear or, if the bias parameter is included, affine layers and activation functions. Their geometric modeling is limited, which motivated the prior work introducing the multilayer hypersphere perceptron (MLHP). Its constituent part, i.e., the hypersphere neuron, is obtained by applying a conformal embedding of Euclidean space. By virtue of Clifford algebra, it can be implemented as the Cartesian dot product of inputs and weights. If the embedding is applied in a manner consistent with the dimensionality of the input space geometry, the decision surfaces of the model units become combinations of hyperspheres and make the decision-making process geometrically interpretable for humans. Our extension of the MLHP model, the multilayer geometric perceptron (MLGP), and its respective layer units, i.e., geometric neurons, are consistent with the 3D geometry and provide a geometric handle of the learned coefficients. In particular, the geometric neuron activations are isometric in 3D, which is necessary for rotation and translation equivariance. When classifying the 3D Tetris shapes, we quantitatively show that our model requires no activation function in the hidden layers other than the embedding to outperform the vanilla multilayer perceptron. In the presence of noise in the data, our model is also superior to the MLHP.
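
The "geometric handle of the learned coefficients" can be made concrete: under one common normalization of the conformal embedding (a sketch, not necessarily the exact convention of the paper), a learned 5D weight vector is a scaled copy of the sphere vector (c, (||c||^2 - r^2)/2, 1), so the center and radius of the decision surface can be read back from the weights. The helper `decode_sphere` below is illustrative.

```python
import numpy as np

def decode_sphere(s):
    """Recover the center and radius encoded by a (scaled) 5D hypersphere-neuron
    weight vector, assuming the convention s ~ (c, 0.5 * (||c||^2 - r^2), 1)."""
    center = s[:3] / s[4]
    r_squared = np.dot(center, center) - 2.0 * s[3] / s[4]
    return center, np.sqrt(r_squared)   # r_squared < 0 would mean an imaginary sphere

# Round trip: encode a sphere, scale the vector arbitrarily (the neuron is defined
# only up to scale), and decode the same geometry back from the "learned" weights.
c_true, r_true = np.array([0.3, -1.2, 0.5]), 0.8
weights = 2.7 * np.concatenate([c_true, [0.5 * (np.dot(c_true, c_true) - r_true**2), 1.0]])

c_decoded, r_decoded = decode_sphere(weights)
assert np.allclose(c_decoded, c_true) and np.isclose(r_decoded, r_true)
```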

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Conference on Computer Vision. Proceedings, ISSN 1550-5499, E-ISSN 2380-7504
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-183312 (URN), 10.1109/iccv48922.2021.00131 (DOI), 000797698901044 (), 9781665428125 (ISBN), 9781665428132 (ISBN)
Conference
IEEE/CVF International Conference on Computer Vision (ICCV), 10-17 October 2021 (Virtual Event), Montreal, QC, Canada
Note

Funding: Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council [2018-04673]; strategic research environment ELLIIT

Available from: 2022-03-02 Created: 2022-03-02 Last updated: 2024-09-03 Bibliographically approved
2. Steerable 3D Spherical Neurons
2022 (English) In: Proceedings of the 39th International Conference on Machine Learning / [ed] Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato, PMLR, 2022, Vol. 162, p. 15330-15339. Conference paper, Published paper (Refereed)
Abstract [en]

Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and, thus, obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.

Place, publisher, year, edition, pages
PMLR, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-187149 (URN), 000900064905021 ()
Conference
International Conference on Machine Learning, Baltimore, Maryland, USA, 17-23 July 2022
Note

Funding: Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council [2018-04673]; strategic research environment ELLIIT

Available from: 2022-08-08 Created: 2022-08-08 Last updated: 2024-09-03
3. TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
2024 (English) In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, IEEE Computer Society, 2024, p. 5620-5630. Conference paper, Published paper (Refereed)
Abstract [en]

In many practical applications, 3D point cloud analysis requires rotation invariance. In this paper, we present a learnable descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we propose an embedding of the 3D spherical neurons into 4D vector neurons, which enables end-to-end training of the model. In our approach, we perform the TetraTransform, an equivariant embedding of the 3D input into 4D constructed from the steerable neurons, and extract deeper O(3)-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, increases the number of parameters only negligibly, by less than 0.0002%. TetraSphere sets a new state of the art in classifying randomly rotated real-world object scans from the challenging subsets of ScanObjectNN. Additionally, TetraSphere outperforms all equivariant methods on randomly rotated synthetic data: classifying objects from ModelNet40 and segmenting parts of the ShapeNet shapes. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
Series
Proceedings: IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-207318 (URN), 10.1109/CVPR52733.2024.00537 (DOI), 9798350353006 (ISBN), 9798350353013 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June 2024
Available from: 2024-09-04 Created: 2024-09-04 Last updated: 2024-09-17
4. On Learning Deep O(n)-Equivariant Hyperspheres
2024 (English) In: Proceedings of the 41st International Conference on Machine Learning / [ed] Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix, PMLR, 2024, Vol. 235, p. 35324-35339. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In this paper, we utilize hyperspheres and regular n-simplexes and propose an approach to learning deep features equivariant under the transformations of nD reflections and rotations, encompassed by the powerful group O(n). Namely, we propose O(n)-equivariant neurons with spherical decision surfaces that generalize to any dimension n, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in nD, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods on O(n)-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off. The code is available on GitHub.
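
As a side note on the regular n-simplex ingredient: the n + 1 vertices of a regular simplex in R^n can be chosen as unit vectors whose pairwise inner products are all -1/n (for n = 3 this recovers the tetrahedron used in the 3D construction). The following is one standard construction, not necessarily the one used in the paper; `regular_simplex` is an illustrative helper.

```python
import numpy as np

def regular_simplex(n):
    """Vertices of a regular n-simplex: n + 1 unit vectors in R^n whose
    pairwise inner products are all equal to -1/n."""
    E = np.eye(n + 1)
    centered = E - E.mean(axis=0)                 # centered standard basis vectors
    # The centered vectors span an n-dimensional subspace of R^(n+1);
    # express them in an orthonormal basis of that subspace to land in R^n.
    _, _, vt = np.linalg.svd(centered)
    V = centered @ vt[:n].T                       # shape (n + 1, n)
    return V / np.linalg.norm(V, axis=1, keepdims=True)

V = regular_simplex(5)                            # e.g., a regular 5-simplex in R^5
gram = V @ V.T
off_diagonal = gram[~np.eye(len(V), dtype=bool)]
assert np.allclose(np.diag(gram), 1.0) and np.allclose(off_diagonal, -1.0 / 5)
```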

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 235
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-206460 (URN)
Conference
41st International Conference on Machine Learning, Vienna, Austria, 21-27 July 2024
Available from: 2024-08-14 Created: 2024-08-14 Last updated: 2024-09-03

Open Access in DiVA

fulltext (10731 kB), 183 downloads
File information
File name: FULLTEXT02.pdf. File size: 10731 kB. Checksum: SHA-512
0a4fd15d6a6369e53b7185716fdb15b16c74ec6967a617d9311759a3ed86d2e5ca0fcddeb214163d1bdb8e2faf9a42f4a486c29ed720fb91f24898728cfeaa0f
Type: fulltext. Mimetype: application/pdf