Automatic Detection, unpacking of untagged compressed data
Linköping University, Department of Computer and Information Science, Cybersecurity.
Linköping University, Department of Computer and Information Science, Cybersecurity.
2026 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Automatisk detektion, uppacking av otaggad komprimerad data (Swedish)
Abstract [en]

Modern digital systems rely heavily on firmware updates that are frequently distributed as compressed binary blobs. In forensic investigations and security audits, these blobs often appear without file headers or metadata, rendering standard signature-based extraction tools ineffective. This thesis presents BinSift, a modular Python-based framework designed for the automatic detection, classification, and “blind” decompression of untagged compressed data.

To calibrate the system, a large-scale statistical analysis was conducted on the FirmSec dataset, profiling approximately 34,136 firmware images totaling over 200 GB of binary data. Results indicate that an average Shannon entropy threshold of 7.1 bits per byte provides an optimal balance for capturing modern compression formats like LZMA and SquashFS while minimizing false positives from high-density uncompressed code.

The BinSift framework was evaluated against industrial firmware samples, achieving a 59.0% success rate in “True Blind” mode without any prior knowledge of file headers. This approach maintained an 81.5% fidelity retention compared to metadata-assisted baselines. When excluding mathematically unrecoverable encrypted payloads, the effective success rate rose to 84.4%. These findings demonstrate that entropy-based stream identification and bit-level refinement are viable solutions for bypassing obfuscation in embedded systems forensics.
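
The entropy-based detection step described in the abstract can be illustrated with a minimal sketch. This is not BinSift's actual code: the function names, the 4096-byte window size, and the use of non-overlapping windows are assumptions made for illustration. Only the 7.1 bits-per-byte threshold comes from the abstract.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 7.1  # bits per byte, the threshold reported in the abstract
WINDOW_SIZE = 4096       # assumption: the abstract does not state a window size


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_windows(blob: bytes, window: int = WINDOW_SIZE,
                         threshold: float = ENTROPY_THRESHOLD):
    """Yield (offset, entropy) for each non-overlapping window at or above threshold."""
    for offset in range(0, len(blob), window):
        chunk = blob[offset:offset + window]
        entropy = shannon_entropy(chunk)
        if entropy >= threshold:
            yield offset, entropy


if __name__ == "__main__":
    # Synthetic blob: one window of zero padding followed by one window in
    # which every byte value occurs equally often (maximum entropy).
    blob = b"\x00" * 4096 + bytes(range(256)) * 16
    for offset, entropy in high_entropy_windows(blob):
        print(f"candidate at offset {offset}: {entropy:.2f} bits/byte")
        # prints: candidate at offset 4096: 8.00 bits/byte
```

A scan like this only marks candidate regions. High entropy alone does not distinguish compressed from encrypted data, which is consistent with the abstract's separate treatment of encrypted payloads.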

Place, publisher, year, edition, pages
2026. p. 66
Keywords [en]
Firmware Forensics, Blind Decompression, Shannon Entropy, Embedded Systems Security, Binary Blob Analysis, Signatureless Extraction, Heuristic Stream Detection, Reverse Engineering
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-224113
ISRN: LITH-EX-A--26/018--SE
OAI: oai:DiVA.org:liu-224113
DiVA, id: diva2:2060854
Presentation
2026-05-13, Charles Babbage, Linköping, 14:15 (English)
Supervisors
Examiners
Available from: 2026-05-27
Created: 2026-05-19
Last updated: 2026-05-27
Bibliographically approved

Open Access in DiVA

fulltext (4222 kB), 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 4222 kB
Checksum (SHA-512): 6d8a8edeccae0c3442899d454a16f2b5caeea9ff8c209b10ba8ae55a5b614f7e94c1c2882aa2d19bf912a5b33f2d9b3f69a49f460cef30d6756c2d08d6e3f920
Type: fulltext
Mimetype: application/pdf

By author/editor
Attin, Arvid
Christensson, Martin