Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying
Xi'an Jiaotong University, People's Republic of China.
Xi'an Jiaotong University, People's Republic of China.
Inspur Beijing Electronic Information Industry Co., Ltd., People's Republic of China.
Linköping University, Department of Science and Technology, Communications and Transport Systems. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-0019-8411
2016 (English). In: Computing and Informatics, ISSN 1335-9150, Vol. 35, no. 3, pp. 586-614. Article in journal (Refereed), Published.
Abstract [en]

A data de-duplication system should achieve not only a high de-duplication rate (the aggregate reduction in storage requirements gained from de-duplication) but also a high de-duplication speed. To address the arbitrary parameter settings that Content Defined Chunking (CDC) relies on, a self-adaptive data chunking algorithm is proposed. The algorithm improves the de-duplication rate by running a pre-processing de-duplication pass on samples of each class of files and then selecting suitable algorithm parameters for that class. In addition, FastCDC, a fast content-based data chunking algorithm, is adopted to address the low de-duplication speed of CDC. FastCDC introduces a de-duplication factor and an acceleration factor; by tuning these two parameters, it can significantly increase de-duplication speed with little loss in de-duplication rate. The experimental results show that the proposed self-adaptive method improves the de-duplication rate by about 5%, while FastCDC increases de-duplication speed by 50% to 200% at the cost of less than 3% loss in de-duplication rate.
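To make the abstract's terms concrete, here is a minimal sketch of generic content-defined chunking and of the sample-then-select idea. It is not the authors' algorithm: the gear-style rolling hash, the chunk-size limits, the candidate parameter values, and the definition of de-duplication rate as "fraction of input bytes eliminated" are all assumptions. The names `cdc_chunks`, `dedup_rate` and `select_mask_bits` are hypothetical. The paper's de-duplication and acceleration factors are not modelled.

```python
import hashlib
import random

# Hypothetical gear table: 256 pseudo-random 64-bit values (fixed seed).
_gear_rng = random.Random(42)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, mask_bits: int = 13,
               min_size: int = 2048, max_size: int = 65536) -> list[bytes]:
    """Split data at content-defined boundaries (generic CDC sketch).

    A gear-style rolling hash is updated for every byte. Its top bits
    depend only on the last 64 bytes, so boundaries follow the content
    and realign after an insertion. A cut is made when the top
    `mask_bits` bits are zero (probability 2**-mask_bits per byte),
    subject to the assumed min_size / max_size limits.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if size < min_size:
            continue  # never cut a chunk shorter than min_size
        if (h >> (64 - mask_bits)) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be short
    return chunks


def dedup_rate(data: bytes, **params) -> float:
    """Fraction of bytes saved by storing each distinct chunk once.

    Assumed metric: 1 - unique_bytes / total_bytes; chunks are
    fingerprinted with SHA-1.
    """
    if not data:
        return 0.0
    seen = set()
    unique_bytes = 0
    for chunk in cdc_chunks(data, **params):
        fp = hashlib.sha1(chunk).digest()
        if fp not in seen:
            seen.add(fp)
            unique_bytes += len(chunk)
    return 1.0 - unique_bytes / len(data)


def select_mask_bits(sample: bytes, candidates=(11, 12, 13, 14)) -> int:
    """Hypothetical self-adaptive step: de-duplicate a sample with each
    candidate parameter and keep the one with the best rate."""
    return max(candidates, key=lambda b: dedup_rate(sample, mask_bits=b))
```

For example, two copies of the same random block separated by a few inserted bytes should share most of their chunks, because the boundaries realign shortly after the insertion point. Fixed-size chunking would lose that alignment. Calling `select_mask_bits` on a representative sample of one file class mirrors, in simplified form, the per-class parameter selection described above.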

Place, publisher, year, edition, pages
Slovak Academy of Sciences, Institute of Informatics, 2016. Vol. 35, no. 3, pp. 586-614.
Keyword [en]
Data de-duplication, self-adaptive, FastCDC
National Category
Signal Processing
URN: urn:nbn:se:liu:diva-131589
ISI: 000382272600004
OAI: diva2:974633

Funding agencies: National Key Technology R&D Program [2011BAH04B03, 2016YFB1000303]; NSFC [61572394]; Marie Curie IRSES Actions of the European Union Seventh Framework Program (EU-FP7 Contract) [318906]

Available from: 2016-09-27. Created: 2016-09-27. Last updated: 2016-10-03. Bibliographically approved.

Open Access in DiVA: no full text available.

By author/editor: Fowler, Scott