Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying
Xi An Jiao Tong University, Peoples R China.
Xi An Jiao Tong University, Peoples R China.
Inspur Beijing Elect Informat Ind Co Ltd, Peoples R China.
Linköping University, Department of Science and Technology, Communications and Transport Systems. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-0019-8411
2016 (English). In: Computing and informatics, ISSN 1335-9150, Vol. 35, no 3, p. 586-614. Article in journal (Refereed), Published
Abstract [en]

A data de-duplication system pursues not only a high de-duplication rate, which refers to the aggregate reduction in storage requirements gained from de-duplication, but also a high de-duplication speed. To solve the problem of random parameter setting in Content Defined Chunking (CDC), a self-adaptive data chunking algorithm is proposed. The algorithm improves the de-duplication rate by running a pre-processing de-duplication pass over samples of the classified files and then selecting the appropriate algorithm parameters. Meanwhile, FastCDC, a content-based fast data chunking algorithm, is adopted to address the low de-duplication speed of CDC. By introducing a de-duplication factor and an acceleration factor, and by adjusting these two parameters, FastCDC can significantly boost de-duplication speed without sacrificing the de-duplication rate. The experimental results demonstrate that the proposed method improves the de-duplication rate by about 5 %, while FastCDC increases de-duplication speed by 50 % to 200 % at the expense of less than 3 % loss in de-duplication rate.
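
To make the ideas in the abstract concrete, the following is a minimal Python sketch of generic content-defined chunking, followed by a sample-based parameter choice. It is not the paper's algorithm: the Gear-style rolling hash, the names `cdc_chunks`, `dedup_ratio` and `pick_avg_bits`, and all default sizes are assumptions made for illustration. The de-duplication factor and acceleration factor of the paper's FastCDC are not defined in the abstract and are not modeled here.

```python
import hashlib
import random


def _gear_table(seed=0):
    # Hypothetical table of 256 pseudo-random 32-bit values, one per byte value.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


GEAR = _gear_table()


def cdc_chunks(data, min_size=64, avg_bits=8, max_size=1024):
    """Yield content-defined chunks of `data` (a bytes object).

    A boundary is declared where the low `avg_bits` bits of a rolling
    hash are zero, subject to minimum and maximum chunk sizes.
    """
    mask = (1 << avg_bits) - 1
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style update: older bytes are shifted out of the low bits.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < min_size:
            continue
        if (h & mask) == 0 or length >= max_size:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing chunk, may be shorter than min_size


def dedup_ratio(streams, **params):
    """Fraction of bytes saved by storing each distinct chunk only once."""
    seen = set()
    total = stored = 0
    for data in streams:
        for chunk in cdc_chunks(data, **params):
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return 1 - stored / total if total else 0.0


def pick_avg_bits(samples, candidates=(6, 8, 10)):
    """Pick the candidate boundary-mask width that de-duplicates the samples best."""
    return max(candidates, key=lambda bits: dedup_ratio(samples, avg_bits=bits))
```

Because boundaries depend on content rather than on byte offsets, inserting a byte near the start of a file typically changes only the first chunk, and later chunks still match previously stored ones. `pick_avg_bits` mirrors, in a very reduced form, the idea of running a trial de-duplication on samples before fixing the chunking parameters.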

Place, publisher, year, edition, pages
Slovak Academy of Sciences Institute of Informatics, 2016. Vol. 35, no 3, p. 586-614
Keywords [en]
Data de-duplication, self-adaptive, FastCDC
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:liu:diva-131589; ISI: 000382272600004; OAI: oai:DiVA.org:liu-131589; DiVA, id: diva2:974633
Note

Funding Agencies: National Key Technology R&D Program [2011BAH04B03, 2016YFB1000303]; NSFC [61572394]; Marie Curie IRSES Actions of the European Union Seventh Framework Program (EU-FP7 Contract) [318906]

Available from: 2016-09-27. Created: 2016-09-27. Last updated: 2017-11-21. Bibliographically approved

By author/editor
Fowler, Scott