Indexing strategies for time series data
2002 (English). Doctoral thesis, monograph (Other academic)
Traditionally, databases have stored textual data and have been used for administrative information. The computers in use, and more specifically the storage available, have been neither large enough nor fast enough to allow databases to be used for more technical applications. In recent years these two bottlenecks have started to disappear, and there is increasing interest in using databases to store non-textual data such as sensor measurements or other process-related data. In a database, a sequence of sensor measurements can be represented as a time series. The database can then be queried to find, for instance, subsequences, extrema, or the points in time at which the time series had a specific value. To make such searches efficient, indexing methods are required. Finding appropriate indexing methods is the focus of this thesis.
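To make the cost of unindexed search concrete, the following minimal Python sketch answers the three kinds of query mentioned above (subsequence match, extrema, and time at a given value) by scanning every sample. The function names and the matching tolerance `tol` are illustrative assumptions, not part of the thesis. Each query costs time proportional to the length of the series, which is exactly the cost an index structure is meant to avoid.

```python
from typing import List, Sequence, Tuple


def find_subsequence(series: Sequence[float], pattern: Sequence[float],
                     tol: float = 1e-9) -> List[int]:
    """Return every start index where pattern occurs in series (within tol). O(n*m)."""
    m = len(pattern)
    hits = []
    for start in range(len(series) - m + 1):
        if all(abs(series[start + k] - pattern[k]) <= tol for k in range(m)):
            hits.append(start)
    return hits


def local_extrema(series: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (maxima, minima): indices of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            maxima.append(i)
        elif cur < prev and cur < nxt:
            minima.append(i)
    return maxima, minima


def times_at_value(timestamps: Sequence[float], series: Sequence[float],
                   value: float, tol: float = 1e-9) -> List[float]:
    """Return the timestamps at which the series equals value (within tol)."""
    return [t for t, v in zip(timestamps, series) if abs(v - value) <= tol]
```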
There are two major problems with existing time series indexing strategies: the size of the index structures, and the lack of general, application-independent indexing strategies. For text indexing, these problems have been thoroughly researched and solved. We have examined the extent to which text indexing methods can be used to index time series.
We have investigated a method for transforming time series into text sequences, and then examined how text indexing methods can be applied to these sequences. Two well-known text indexing methods, signature files and the B-tree, were studied to determine how they can be modified to index time series. We have also developed two new index structures: the signature tree and the paged trie. For each index structure we have constructed cost and size models, which allow the different approaches to be compared.
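One way to picture this pipeline is a sketch that first maps a numeric series to a string and then indexes that string with a signature file. The symbol alphabet (`u`, `d`, `s` for rise, fall, steady) and the threshold `eps` are hypothetical stand-ins, since the abstract does not say which transformation the thesis uses. The signature file follows the general superimposed-coding idea (each n-gram sets `k` hashed bits in an `f_bits`-wide block signature); the names `to_symbols` and `SignatureFile` and all parameter values are illustrative assumptions, not the author's implementation. Because signatures can produce false drops, every candidate block is verified against the text.

```python
import zlib
from typing import List, Sequence, Tuple


def to_symbols(series: Sequence[float], eps: float = 0.5) -> str:
    """Hypothetical transformation: one symbol per step, by the sign of the change.

    'u' = rise larger than eps, 'd' = fall larger than eps, 's' = roughly steady.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        symbols.append("u" if diff > eps else "d" if diff < -eps else "s")
    return "".join(symbols)


class SignatureFile:
    """Block signatures built by superimposing k hashed bits per n-gram."""

    def __init__(self, text: str, n: int = 3, block_len: int = 16,
                 max_pattern: int = 8, f_bits: int = 64, k: int = 3) -> None:
        if not n <= max_pattern < block_len:
            raise ValueError("need n <= max_pattern < block_len")
        self.text, self.n, self.f_bits, self.k = text, n, f_bits, k
        self.block_len, self.max_pattern = block_len, max_pattern
        # Overlap consecutive blocks by max_pattern - 1 symbols so that every
        # occurrence of a pattern up to max_pattern long fits inside some block.
        step = block_len - (max_pattern - 1)
        self.blocks: List[Tuple[int, int]] = [
            (start, self._signature(text[start:start + block_len]))
            for start in range(0, len(text), step)
        ]

    def _gram_bits(self, gram: str) -> int:
        """Set k pseudo-random bit positions for one n-gram (deterministic via CRC32)."""
        bits = 0
        for seed in range(self.k):
            bits |= 1 << (zlib.crc32(f"{seed}:{gram}".encode()) % self.f_bits)
        return bits

    def _signature(self, s: str) -> int:
        """OR together the bit patterns of every n-gram in s."""
        sig = 0
        for i in range(len(s) - self.n + 1):
            sig |= self._gram_bits(s[i:i + self.n])
        return sig

    def search(self, pattern: str) -> List[int]:
        """Return sorted start positions of pattern in the indexed text."""
        if not self.n <= len(pattern) <= self.max_pattern:
            raise ValueError("pattern length must be between n and max_pattern")
        query = self._signature(pattern)
        hits = set()
        for start, sig in self.blocks:
            if sig & query != query:
                continue  # some query n-gram is certainly absent from this block
            block = self.text[start:start + self.block_len]
            i = block.find(pattern)  # verify: the candidate may be a false drop
            while i != -1:
                hits.add(start + i)
                i = block.find(pattern, i + 1)
        return sorted(hits)
```

Overlapping the blocks by `max_pattern - 1` symbols keeps the filter free of false dismissals for patterns up to that length, at the cost of indexing some symbols twice; this trade-off between signature size and filtering accuracy is the kind of question a cost and size model has to answer.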
Our tests indicate that the indexing method we have developed, together with the B-tree structure, produces good results: subsequences of very large time series can be searched for and found efficiently.
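Subsequence search over a symbol string can also be served by an ordered index keyed on n-grams, which is the kind of range lookup a B-tree provides. In the minimal sketch below, a sorted Python list searched with `bisect` stands in for a paged, disk-based B-tree; the class name `NGramIndex`, the n-gram length `n`, and the strategy of probing the rarest n-gram of the pattern are assumptions for illustration, not the structure evaluated in the thesis.

```python
from bisect import bisect_left
from typing import List, Tuple


class NGramIndex:
    """Ordered (n-gram, position) index; a sorted list stands in for a B-tree."""

    def __init__(self, text: str, n: int = 4) -> None:
        self.text, self.n = text, n
        # Keys sorted by n-gram, then by position, as B-tree leaves would be.
        self.entries: List[Tuple[str, int]] = sorted(
            (text[i:i + n], i) for i in range(len(text) - n + 1)
        )

    def _postings(self, gram: str) -> List[int]:
        """Range lookup: all positions whose n-gram equals gram, ascending."""
        lo = bisect_left(self.entries, (gram, -1))
        hi = bisect_left(self.entries, (gram, len(self.text)))
        return [pos for _, pos in self.entries[lo:hi]]

    def search(self, pattern: str) -> List[int]:
        """Return sorted start positions of pattern (len(pattern) must be >= n)."""
        if len(pattern) < self.n:
            raise ValueError("pattern shorter than the n-gram length")
        # Probe each n-gram of the pattern and keep the one with the fewest postings.
        best_offset, best = 0, self._postings(pattern[:self.n])
        for off in range(1, len(pattern) - self.n + 1):
            postings = self._postings(pattern[off:off + self.n])
            if len(postings) < len(best):
                best_offset, best = off, postings
        # Verify each candidate against the full text.
        return [pos - best_offset for pos in best
                if pos >= best_offset
                and self.text.startswith(pattern, pos - best_offset)]
```

For example, `NGramIndex("uusddsuusdd", n=3).search("uusd")` returns `[0, 6]`. Probing only the rarest n-gram keeps the number of candidates to verify small, while the ordered keys let each probe be a single range scan.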
The thesis also discusses which issues remain to be investigated before these techniques can be used in a control system that relies on time-series indexing to identify control modes.
Place, publisher, year, edition, pages
Linköping: Linköpings universitet, 2002. 210 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524; 757
Identifiers
URN: urn:nbn:se:liu:diva-30968
Local ID: 16647
ISBN: 91-7373-346-6
OAI: oai:DiVA.org:liu-30968
DiVA: diva2:251791
Public defence: 2002-05-21, Planck, Hus F, Linköpings universitet, Linköping, 13:15 (Swedish)