Remote Sensing 3D Scene Retrieval: Multi-modal Alignment of Text, Images, and Digital Elevation Models
Linköping University, Department of Electrical Engineering, Computer Vision.
2025 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 28 HE credits
Student thesis
Abstract [en]

Multi-modal retrieval has traditionally focused on combining diverse query inputs, such as text and sketches, in remote sensing and computer vision. However, retrieval involving multi-modal target representations, such as paired RGB and depth data, remains largely unaddressed. This work investigates whether incorporating a depth modality can improve the performance of vision-language models in the context of satellite image retrieval. To explore this, a novel dataset, RSITDD, was constructed, combining orthophotos and digital height models, and used to train a CLIP-based remote sensing depth encoder. Experimental results show that models augmented with a depth encoder outperform their text-image-only counterparts across multiple benchmark settings. These findings highlight the potential of depth-enhanced models for remote sensing applications and demonstrate that even simple fusion techniques can yield measurable performance improvements.
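As an illustration of what a "simple fusion technique" for a multi-modal target could look like, the sketch below ranks scenes against a text query by fusing an image embedding and a depth embedding into a single vector. It is not taken from the thesis: the weighted-average fusion rule, the `alpha` weight, the function names, and the three-dimensional toy vectors are all assumptions made for illustration, and a real system would obtain the embeddings from trained text, image, and depth encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(image_emb, depth_emb, alpha=0.5):
    """Hypothetical late fusion: weighted average of the two unit-length
    embeddings, renormalized. alpha=0 ignores depth, alpha=1 ignores the image."""
    img = l2_normalize(image_emb)
    dep = l2_normalize(depth_emb)
    mixed = [(1 - alpha) * i + alpha * d for i, d in zip(img, dep)]
    return l2_normalize(mixed)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_scenes(text_emb, scenes, alpha=0.5):
    """Return scene ids ordered from most to least similar to the text query.

    scenes maps a scene id to an (image_embedding, depth_embedding) pair.
    """
    scored = [
        (cosine(text_emb, fuse(img, dep, alpha)), scene_id)
        for scene_id, (img, dep) in scenes.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [scene_id for _, scene_id in scored]


# Toy data (made up): the depth embedding of "mountain" matches the query,
# while its image embedding alone matches slightly worse than "flat_field".
toy_query = [1.0, 0.0, 0.0]
toy_scenes = {
    "flat_field": ([0.9, 0.1, 0.0], [0.0, 1.0, 0.0]),
    "mountain": ([0.8, 0.2, 0.0], [1.0, 0.0, 0.0]),
}

image_only = rank_scenes(toy_query, toy_scenes, alpha=0.0)
with_depth = rank_scenes(toy_query, toy_scenes, alpha=0.5)
```

In this toy setup the ranking changes once depth contributes to the fused vector, which is the kind of effect the abstract attributes to adding a depth encoder; the numbers themselves carry no empirical meaning.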

Place, publisher, year, edition, pages
2025, p. 62
Keywords [en]
multi-modal retrieval, remote sensing retrieval, dual-encoder
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-219595
ISRN: LiTH-ISY-EX--25/5794--SE
OAI: oai:DiVA.org:liu-219595
DiVA, id: diva2:2015058
External cooperation
Maxar
Subject / course
Computer Engineering
Presentation
2025-08-28, Linköping, 09:00 (English)
Supervisors
Examiners
Available from: 2025-11-21
Created: 2025-11-20
Last updated: 2025-11-21
Bibliographically approved

Open Access in DiVA

fulltext (28580 kB), 38 downloads
File information
File name FULLTEXT01.pdf
File size 28580 kB
Checksum SHA-512
63e09dfc74ea07292ebd01ec6142b3a2761829a877c3bcb288f36d2a125bed1ca76a08b00af9cb89b07693c42646117438933fee1a7e343423251b9a9dafa67a
Type fulltext
Mimetype application/pdf

Search in DiVA

By author/editor
Adam, Samuelsson
By organisation
Computer Vision
Computer graphics and computer vision
