Storage and Transformation for Data Analysis Using NoSQL
Linköping University, Department of Computer and Information Science.
Linköping University, Department of Computer and Information Science.
2017 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Lagring och transformation för dataanalys med hjälp av NoSQL (Swedish)
Abstract [en]

It can be difficult to choose the right NoSQL DBMS, and some systems lack sufficient research and evaluation. There are also tools for moving and transforming data between DBMSs, which make it possible to combine systems or use different systems for different use cases. We describe a use case based on requirements related to the quality attributes Consistency, Scalability, and Performance. For the Performance attribute, the focus is on fast insertions and full-text search queries on a large dataset of forum posts. The evaluation was performed on two NoSQL DBMSs, MongoDB and Elasticsearch, and two tools for transforming data between them, NotaQL and Compose's Transporter. The purpose is to evaluate three different NoSQL system configurations: pure MongoDB, pure Elasticsearch, and a combination of the two. The results show that MongoDB is faster when performing simple full-text search queries, but otherwise slower. This means that Elasticsearch is the primary choice regarding insertion and complex full-text search query performance. MongoDB is, however, regarded as a more stable and well-tested system. When it comes to scalability, MongoDB is better suited for a system where the dataset increases over time, because adding more shards is simple, whereas Elasticsearch is better for a system that starts off with a large amount of data, since it has faster insertion speeds and a more effective process for distributing data among existing shards. In general, NotaQL is not as fast as Transporter, but it can handle aggregations and nested fields, which Transporter does not support. A combined system using MongoDB as the primary data store and Elasticsearch as the secondary data store could be used to achieve fast full-text search queries for all types of expressions, simple and complex.

Place, publisher, year, edition, pages
2017. p. 73
Keywords [en]
NoSQL, MongoDB, Elasticsearch, NotaQL, Transporter, DBMS, Database
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-142004
ISRN: LIU-IDA/LITH-EX-A--17/049--SE
OAI: oai:DiVA.org:liu-142004
DiVA, id: diva2:1149908
External cooperation
Imatrics AB
Subject / course
Information Technology
Available from: 2017-10-18 Created: 2017-10-17 Last updated: 2018-01-13 Bibliographically approved

Open Access in DiVA

fulltext (1317 kB), 368 downloads
File information
File name: FULLTEXT01.pdf
File size: 1317 kB
Checksum (SHA-512): baff9d821ea4f9630fd0f4391d8de23615480c3c6fde1a15196c50049fd86a28665af21f75b08eba63f050d3b146ae87fd8cb3642106fd0e2785ef4504f7cfc8
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Nilsson, Christoffer; Bengtson, John
By organisation
Department of Computer and Information Science
Computer and Information Sciences

Total: 368 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 348 hits