Publications (9 of 9)
Helske, S. & Helske, J. (2019). Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R. Journal of Statistical Software, 88(3), 1-32
2019 (English). In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660, Vol. 88, no 3, p. 1-32. Article in journal (Refereed) Published
Abstract [en]

Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariates. The seqHMM package in R is designed for the efficient modeling of sequences and other categorical time series data containing one or multiple subjects with one or multiple interdependent sequences using HMMs and MHMMs. Also other restricted variants of the MHMM can be fitted, e.g., latent class models, Markov models, mixture Markov models, or even ordinary multinomial regression models with suitable parameterization of the HMM. Good graphical presentations of data and models are useful during the whole analysis process from the first glimpse at the data to model fitting and presentation of results. The package provides easy options for plotting parallel sequence data, and proposes visualizing HMMs as directed graphs.
Comment: 33 pages, 8 figures

Place, publisher, year, edition, pages
Alexandria, VA, United States: American Statistical Association, 2019. p. 32
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-154355 (URN); 10.18637/jss.v088.i03 (DOI); 000457019000001 ()
Available from: 2019-02-07 Created: 2019-02-07 Last updated: 2019-03-07. Bibliographically approved
Helske, S., Helske, J. & Eerola, M. (2018). Combining Sequence Analysis and Hidden Markov Models in the Analysis of Complex Life Sequence Data. In: Gilbert Ritschard, Matthias Studer (Eds.), Sequence Analysis and Related Approaches (pp. 185-200). Switzerland: Springer
2018 (English). In: Sequence Analysis and Related Approaches / [ed] Gilbert Ritschard, Matthias Studer, Switzerland: Springer, 2018, p. 185-200. Chapter in book (Refereed)
Abstract [en]

Life course data often consists of multiple parallel sequences, one for each life domain of interest. Multichannel sequence analysis has been used for computing pairwise dissimilarities and finding clusters in this type of multichannel (or multidimensional) sequence data. Describing and visualizing such data is, however, often challenging. We propose an approach for compressing, interpreting, and visualizing the information within multichannel sequences by finding (1) groups of similar trajectories and (2) similar phases within trajectories belonging to the same group. For these tasks we combine multichannel sequence analysis and hidden Markov modelling. We illustrate this approach with an empirical application to life course data but the proposed approach can be useful in various longitudinal problems.

Place, publisher, year, edition, pages
Switzerland: Springer, 2018
Series
Life Course Research and Social Policies, ISSN 2211-7776, E-ISSN 2211-7784 ; 10
Keywords
life course, longitudinal data, sequence analysis, family and work trajectories, Markov models, hidden Markov models, latent Markov models, population dynamics
National Category
Probability Theory and Statistics; Social Sciences Interdisciplinary
Identifiers
urn:nbn:se:liu:diva-152155 (URN); 10.1007/978-3-319-95420-2_11 (DOI); 978-3-319-95420-2 (ISBN); 978-3-319-95419-6 (ISBN)
Available from: 2018-10-19 Created: 2018-10-19 Last updated: 2018-10-19. Bibliographically approved
Helske, J. (2017). KFAS: Exponential Family State Space Models in R. Journal of Statistical Software, 78(10)
2017 (English). In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660, Vol. 78, no 10. Article in journal (Refereed) Published
Abstract [en]

State space modeling is an efficient and flexible method for statistical inference of a broad class of time series and other data. This paper describes the R package KFAS for state space modeling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions. After introducing the basic theory behind Gaussian and non-Gaussian state space models, an illustrative example of Poisson time series forecasting is provided. Finally, a comparison to alternative R packages suitable for non-Gaussian time series modeling is presented.

Place, publisher, year, edition, pages
Foundation for Open Access Statistics, 2017
Keywords
R, exponential family, state space models, time series, forecasting, dynamic linear models
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144911 (URN); 10.18637/jss.v078.i10 (DOI)
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-02-06
Helske, S., Helske, J. & Eerola, M. (2016). Analysing Complex Life Sequence Data with Hidden Markov Modelling. In: G. Ritschard and M. Studer (Eds.), Proceedings of the International Conference on Sequence Analysis and Related Methods, Lausanne, June 8-10, 2016 (pp. 209-240). Paper presented at International Conference on Sequence Analysis and Related Methods, Lausanne, June 8-10, 2016. LaCOSA II
2016 (English). In: Proceedings of the International Conference on Sequence Analysis and Related Methods, Lausanne, June 8-10, 2016, pp. 209-240 / [ed] G. Ritschard and M. Studer, LaCOSA II, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

When analysing complex sequence data with multiple channels (dimensions) and long observation sequences, describing and visualizing the data can be a challenge. Hidden Markov models (HMMs) and their mixtures (MHMMs) offer a probabilistic model-based framework where the information in such data can be compressed into hidden states (general life stages) and clusters (general patterns in life courses). We studied two different approaches to analysing clustered life sequence data with sequence analysis (SA) and hidden Markov modelling. In the first approach we used SA clusters as fixed and estimated HMMs separately for each group. In the second approach we treated SA clusters as suggestive and used them as a starting point for the estimation of MHMMs. Even though the MHMM approach has advantages, we found it to be unfeasible in this type of complex setting. Instead, using separate HMMs for SA clusters was useful for finding and describing patterns in life courses.

Place, publisher, year, edition, pages
LaCOSA II, 2016
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144920 (URN)
Conference
International Conference on Sequence Analysis and Related Methods, Lausanne, June 8-10, 2016
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-02-06Bibliographically approved
Luukko, P. J., Helske, J. & Rasanen, E. (2016). Introducing libeemd: a program package for performing the ensemble empirical mode decomposition. Computational statistics (Zeitschrift), 31(2), 545-557
2016 (English). In: Computational statistics (Zeitschrift), ISSN 0943-4062, E-ISSN 1613-9658, Vol. 31, no 2, p. 545-557. Article in journal (Refereed) Published
Abstract [en]

The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite amount of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series data, and provides a way to separate short time-scale events from a general trend. We present a free software implementation of EMD, EEMD and CEEMDAN and give an overview of the EMD methodology and the algorithms used in the decomposition. We release our implementation, libeemd, with the aim of providing a user-friendly, fast, stable, well-documented and easily extensible EEMD library for anyone interested in using (E)EMD in the analysis of time series data. While written in C for numerical efficiency, our implementation includes interfaces to the Python and R languages, and interfaces to other languages are straightforward.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2016
Keywords
Hilbert-Huang transform; Intrinsic mode function; Time series analysis; Adaptive data analysis; Noise-assisted data analysis; Detrending
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144918 (URN); 10.1007/s00180-015-0603-9 (DOI); 000374375800008 ()
Note

Funding agencies: Finnish Cultural Foundation; Emil Aaltonen Foundation; Academy of Finland; European Community's FP7 through the CRONOS project [280879]

Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-03-06
Helske, J. & Nyblom, J. (2015). Improved frequentist prediction intervals for autoregressive models by simulation. In: Siem Jan Koopman and Neil Shephard (Eds.), Unobserved Components and Time Series Econometrics (pp. 291-309). Oxford: Oxford University Press
2015 (English). In: Unobserved Components and Time Series Econometrics / [ed] Siem Jan Koopman and Neil Shephard, Oxford: Oxford University Press, 2015, p. 291-309. Chapter in book (Other academic)
Abstract [en]

It is well known that the so-called plug-in prediction intervals for autoregressive processes, with Gaussian disturbances, are too short, i.e. the coverage probabilities fall below the nominal ones. However, simulation experiments show that the formulas borrowed from the ordinary linear regression theory yield one-step prediction intervals, which have coverage probabilities very close to that claimed. From a Bayesian point of view the resulting intervals are posterior predictive intervals when uniform priors are assumed for both autoregressive coefficients and logarithm of the disturbance variance. This finding enables one to see how to treat multi-step prediction intervals that are obtained by simulation either directly from the posterior distribution or using importance sampling. An application of the method to forecasting the annual gross domestic product growth in the United Kingdom and Spain is given for the period 2002 to 2011 using the estimation period 1962 to 2001.

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2015
Keywords
prediction interval, coverage probabilities, Bayesian estimation, multi-step forecasting, gross domestic product
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144913 (URN); 10.1093/acprof:oso/9780199683666.003.0013 (DOI); 9780199683666 (ISBN); 9780191763298 (ISBN)
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-02-14. Bibliographically approved
Helske, J. & Nyblom, J. (2014). Improved frequentist prediction intervals for ARMA models by simulation. In: Knif, Johan; Pape, Bernd (Eds.), Contributions to Mathematics, Statistics, Econometrics, and Finance: essays in honour of professor Seppo Pynnönen (pp. 71-86). Vaasa, Finland: University of Vaasa
2014 (English). In: Contributions to Mathematics, Statistics, Econometrics, and Finance: essays in honour of professor Seppo Pynnönen / [ed] Knif, Johan; Pape, Bernd, Vaasa, Finland: University of Vaasa, 2014, p. 71-86. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Vaasa, Finland: University of Vaasa, 2014
Series
Acta Wasaensia, ISSN 0355-2667, E-ISSN 2342-1282 ; 296
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144914 (URN); 9789524765220 (ISBN); 9789524765237 (ISBN)
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-02-06
Helske, J., Nyblom, J., Ekholm, P. & Meissner, K. (2013). Estimating aggregated nutrient fluxes in four Finnish rivers via Gaussian state space models. Environmetrics, 24(4), 237-247
2013 (English). In: Environmetrics, ISSN 1180-4009, E-ISSN 1099-095X, Vol. 24, no 4, p. 237-247. Article in journal (Refereed) Published
Abstract [en]

Reliable estimates of the nutrient fluxes carried by rivers from land-based sources to the sea are needed for efficient abatement of marine eutrophication. Although nutrient concentrations in rivers generally display large temporal variation, sampling and analysis for nutrients, unlike flow measurements, are rarely performed on a daily basis. The infrequent data calls for ways to reliably estimate the nutrient concentrations of the missing days. Here, we use the Gaussian state space models with daily water flow as a predictor variable to predict missing nutrient concentrations for four agriculturally impacted Finnish rivers. Via simulation of Gaussian state space models, we are able to estimate aggregated yearly phosphorus and nitrogen fluxes, and their confidence intervals. The effect of model uncertainty is evaluated through a Monte Carlo experiment, where randomly selected sets of nutrient measurements are removed and then predicted by the remaining values together with re-estimated parameters. Results show that our model performs well for rivers with long-term records of flow. Finally, despite the drastic decreases in nutrient loads on the agricultural catchments of the rivers over the last 25 years, we observe no corresponding trends in riverine nutrient fluxes.

Keywords
simulation, sparse data, interpolation, Kalman filter, Kalman smoother, PHOSPHORUS LOAD, FINLAND, STREAMS, SERIES
National Category
Oceanography, Hydrology and Water Resources
Identifiers
urn:nbn:se:liu:diva-144915 (URN); 10.1002/env.2204 (DOI); 000319414200004 ()
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-03-06
Helske, J., Eerola, M. & Tabus, I. (2010). Minimum description length based hidden Markov model clustering for life sequence analysis. In: Proceedings of the Third Workshop on Information Theoretic Methods in Science and Engineering, August 16-18, 2010, Tampere, Finland. Paper presented at 2010 Workshop on Information Theoretic Methods in Science and Engineering, August 16-18, 2010, Tampere, Finland.
2010 (English). In: Proceedings of the Third Workshop on Information Theoretic Methods in Science and Engineering, August 16-18, 2010, Tampere, Finland, 2010. Conference paper, Published paper (Refereed)
Abstract [en]

In this article, a model-based method for clustering life sequences is suggested. In the social sciences, model-free clustering methods are often used in order to find typical life sequences. The suggested method, which is based on hidden Markov models, provides principled probabilistic ranking of candidate clusterings for choosing the best solution. After presenting the principle of the method and algorithm, the method is tested with real life data, where it finds eight descriptive clusters with clear probabilistic structures.

National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-144916 (URN)
Conference
2010 Workshop on Information Theoretic Methods in Science and Engineering, August 16-18, 2010, Tampere, Finland
Available from: 2018-02-06 Created: 2018-02-06 Last updated: 2018-02-06
Identifiers
ORCID iD: orcid.org/0000-0001-7130-793x