Data over ordered domains such as time or linear positions are termed sequence data. Sequence data require special treatments which are not provided by traditional DBMSs. Modelling sequence data in traditional (relational) database systems often results in awkward query expressions and bad performance. For this reason, considerable research has been dedicated to supporting sequence data in DBMSs in the last decade. Unfortunately, some important requirements from applications are neglected, i.e., how to support sequence data viewed as continuous under arbitrary user-defined interpolation assumptions, and how to perform sub-sequence extraction efficiently based on the conditions on the value domain. We term these kind of queries as value queries (in contrast to shape queries that look for general patterns of sequences).
This thesis presents pioneering work on supporting value queries on 1-D sequence data based on arbitrary user-defined interpolation functions. An innovative indexing technique, termed the IP-index, is proposed. The motivation for the IP-index is to support efficient calculation of implicit values of sequence data under user-defined interpolation functions. The IP-index can be implemented on top of any existing ordered indexing structure such as a B+-tree. We have implemented the IP-index in both a disk-resident database system (SHORE) and a main-memory database system (AMOS). The highlights of the IP-index - fast insertion, fast search, and space efficiency are verified by experiments. These properties of the IP-index make it particularly suitable for large sequence data.
Based on the work of the IP-index, we introduce an extended SELECT operator, σ*, for sequence data. The σ* operator, σ*cond(TS), retrieves sub-sequences (time intervals) where the values inside those intervals satisfy the condition cond. Experiments made on SHORE using both synthetic and real-life time sequences show that the σ* operator (supported by the IP-index) dramatically improve the performance of value queries. A cost model for the σ* operator is developed in order to be able to optimize complex queries. Optimizations of time window queries and sequence joins are investigated and verified by experiments.
Another contribution of this thesis is on physical organization of sequence data. We propose a multi-level dynamic array structure for dynamic, irregular time sequences. This data structure is highly space efficient and meets the challenge of supporting both efficient random access and fast appending. Other relevant issues such as management of large objects in DBMS, physical organization of secondary indexes, and the impact of main-memory or disk-resident DBMS on sequence data structures are also investigated.
A thorough application study on "terrain-aided navigation" is presented to show that the IP-index is applicable to other application domains.
Linköping: Linköpings universitet , 1999. , 202 p.
1999-03-11, Hörsal Planck, IFM, Linköpings universitet, Linköping, 13:15 (Swedish)