liu.se: Search for publications in DiVA
1 - 50 of 610
  • 1.
    Ahlinder, Jon
    et al.
    Totalförsvarets Forskningsinstitut, FOI, Stockholm, Sweden.
    Nordgaard, Anders
    Swedish National Forensic Centre (NFC), Linköping, Sweden.
    Wiklund Lindström, Susanne
    Totalförsvarets Forskningsinstitut, FOI, Stockholm, Sweden.
    Chemometrics comes to court: evidence evaluation of chem–bio threat agent attacks2015In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 29, no 5, p. 267-276Article in journal (Refereed)
    Abstract [en]

    Forensic statistics is a well-established scientific field whose purpose is to statistically analyze evidence in order to support legal decisions. It traditionally relies on methods that assume small numbers of independent variables and multiple samples. Unfortunately, such methods are less applicable when dealing with highly correlated multivariate data sets such as those generated by emerging high throughput analytical technologies. Chemometrics is a field that has a wealth of methods for the analysis of such complex data sets, so it would be desirable to combine the two fields in order to identify best practices for forensic statistics in the future. This paper provides a brief introduction to forensic statistics and describes how chemometrics could be integrated with its established methods to improve the evaluation of evidence in court.

    The paper describes how statistics and chemometrics can be integrated by analyzing a previously known forensic data set composed of bacterial communities from fingerprints. The presented strategy can be applied in cases where chemical and biological threat agents have been illegally disposed of.

  • 2.
    Ahlqvist, Max
    et al.
    Linköping University, Department of Management and Engineering, Solid Mechanics. Linköping University, Faculty of Science & Engineering. Epiroc Rock Drills AB, Sweden.
    Weddfelt, Kenneth
    Epiroc Rock Drills AB, Sweden.
    Norman, Viktor
    Linköping University, Department of Management and Engineering, Engineering Materials. Linköping University, Faculty of Science & Engineering.
    Leidermark, Daniel
    Linköping University, Department of Management and Engineering, Solid Mechanics. Linköping University, Faculty of Science & Engineering.
    Probabilistic evaluation of the Step-Stress fatigue testing method considering cumulative damage2023In: Probabilistic Engineering Mechanics, ISSN 0266-8920, E-ISSN 1878-4275, Vol. 74, article id 103535Article in journal (Refereed)
    Abstract [en]

    A general testing and analysis framework for the Step-Stress fatigue testing method is identified, utilizing interval-censored data and maximum likelihood estimation in an effort to improve the estimation of fatigue strength distribution parameters. The Step-Stress method's limitations are characterized using a simple material model that considers cumulative damage to evaluate load history effects. In this way, the performance including cumulative damage was evaluated and quantified using a probabilistic approach with Monte-Carlo simulations, benchmarked against the Staircase method throughout the work. It was found that the Step-Stress method, even when cumulative damage occurs to a wide extent, outperforms the Staircase method, especially for small sample sizes. Furthermore, the positive results reach further than the increased performance in estimating fatigue strength distribution parameters: improvements in secondary information, i.e. S-N data gained from failed specimens, are shown to be distributed more closely to the fatigue life region of interest.
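The Staircase (up-and-down) method that the abstract uses as a benchmark can be sketched with a small Monte-Carlo simulation. The normal fatigue-strength distribution, step size, sample size, and the simple mean-of-levels estimator below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def staircase_trial(mu=500.0, sigma=25.0, step=25.0, n=30):
    """Simulate one Staircase (up-and-down) fatigue test.

    Each specimen's strength is drawn from N(mu, sigma); the applied
    load moves one step down after a failure and one step up after a
    survival. Returns the mean of the tested load levels, a simple
    point estimate of the median fatigue strength.
    """
    load = mu  # starting load level (assumed near the true mean)
    levels = []
    for _ in range(n):
        strength = rng.normal(mu, sigma)
        levels.append(load)
        load += -step if strength < load else step  # fail -> down, survive -> up
    return float(np.mean(levels))

# Monte-Carlo study: distribution of the estimator over many simulated tests
estimates = [staircase_trial() for _ in range(2000)]
print(round(float(np.mean(estimates)), 1))  # should sit near the true mu = 500
```

Repeating the whole test many times, as above, is the kind of probabilistic benchmarking the paper performs when comparing estimator quality across methods and sample sizes.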

  • 3.
    Ahmad, M. Rauf
    et al.
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, The Institute of Technology.
    Ohlson, Martin
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, The Institute of Technology.
    von Rosen, Dietrich
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, The Institute of Technology.
    A U-statistics Based Approach to Mean Testing for High Dimensional Multivariate Data Under Non-normality2011Report (Other academic)
    Abstract [en]

    A test statistic is considered for testing a hypothesis for the mean vector for multivariate data, when the dimension of the vector, p, may exceed the number of vectors, n, and the underlying distribution need not necessarily be normal. With n, p large, and under mild assumptions, the statistic is shown to asymptotically follow a normal distribution. A by-product of the paper is the approximate distribution of a quadratic form, based on a reformulation of the well-known Box approximation, under a high-dimensional setup.

  • 4.
    Ahmad, M. Rauf
    et al.
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, The Institute of Technology.
    Ohlson, Martin
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, The Institute of Technology.
    von Rosen, Dietrich
    Department of Energy and Technology, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden.
    Some Tests of Covariance Matrices for High Dimensional Multivariate Data2011Report (Other academic)
    Abstract [en]

    Test statistics for sphericity and identity of the covariance matrix are presented, when the data are multivariate normal and the dimension, p, can exceed the sample size, n. Using the asymptotic theory of U-statistics, the test statistics are shown to follow an approximate normal distribution for large p, also when p >> n. The statistics are derived under very general conditions, in particular avoiding any strict assumptions on the traces of the unknown covariance matrix; nor is any relationship between n and p assumed. The accuracy of the statistics is shown through simulation results, particularly emphasizing the case when p can be much larger than n. The validity of the commonly used assumptions for the high-dimensional setup is also briefly discussed.

  • 5.
    Ahmad, M. Rauf
    et al.
    Swedish University of Agricultural Sciences, Uppsala, Sweden and Department of Statistics, Uppsala University, Sweden.
    von Rosen, Dietrich
    Linköping University, Department of Mathematics, Mathematical Statistics. Linköping University, The Institute of Technology.
    Singull, Martin
    Linköping University, Department of Mathematics, Mathematical Statistics. Linköping University, The Institute of Technology.
    A note on mean testing for high dimensional multivariate data under non-normality2013In: Statistica Neerlandica, ISSN 0039-0402, E-ISSN 1467-9574, Vol. 67, no 1, p. 81-99Article in journal (Refereed)
    Abstract [en]

    A test statistic is considered for testing a hypothesis for the mean vector for multivariate data, when the dimension of the vector, p, may exceed the number of vectors, n, and the underlying distribution need not necessarily be normal. With n, p → ∞, and under mild assumptions, but without assuming any relationship between n and p, the statistic is shown to asymptotically follow a chi-square distribution. A by-product of the paper is the approximate distribution of a quadratic form, based on a reformulation of the well-known Box approximation, under a high-dimensional setup. Using a classical limit theorem, the approximation is further extended to an asymptotic normal limit under the same high-dimensional setup. Simulation results, generated under different parameter settings, are used to show the accuracy of the approximation for moderate n and large p.

    Download full text (pdf)
    fulltext
  • 6.
    Aitken, Colin
    et al.
    School of Mathematics, University of Edinburgh, Edinburgh, United Kingdom.
    Nordgaard, Anders
    Swedish Police Authority, National Forensic Centre (NFC), Linköping, Sweden.
    Taroni, Franco
    School of Criminal Justice, Université de Lausanne, Lausanne, Switzerland.
    Biedermann, Alex
    School of Criminal Justice, Université de Lausanne, Lausanne, Switzerland.
    Commentary: Likelihood Ratio as Weight of Forensic Evidence: A Closer Look: A commentary on Likelihood Ratio as Weight of Forensic Evidence: A Closer Look by Lund, S. P., and Iyer, H. (2017). J. Res. Natl. Inst. Stand. Technol. 122:272018In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 9, article id 224Article in journal (Other academic)
    Download full text (pdf)
    fulltext
  • 7.
    Alenlöv, Johan
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Doucet, Arnaud
    Univ Oxford, England.
    Lindsten, Fredrik
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Pseudo-Marginal Hamiltonian Monte Carlo2021In: Journal of machine learning research, ISSN 1532-4435, E-ISSN 1533-7928, Vol. 22Article in journal (Refereed)
    Abstract [en]

    Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables, or pseudo-marginal Metropolis-Hastings (MH) schemes. The latter mimic a MH algorithm targeting the marginal posterior of the parameters by approximating the intractable likelihood unbiasedly. However, in scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly and the pseudo-marginal MH algorithm, as any other MH scheme, will be inefficient for high-dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which combines the advantages of both HMC and pseudo-marginal schemes. Specifically, the PM-HMC method is governed by a precision parameter N that controls the approximation of the likelihood and, for any N, it samples the marginal posterior of the parameters. Additionally, as N tends to infinity, its sample trajectories and acceptance probability converge to those of an ideal, but intractable, HMC algorithm which would have access to the intractable likelihood and its gradient. We demonstrate through experiments that PM-HMC can outperform significantly both standard HMC and pseudo-marginal MH schemes.

    Download full text (pdf)
    fulltext
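The pseudo-marginal idea the abstract builds on — replace an intractable likelihood in the Metropolis-Hastings acceptance ratio with an unbiased estimate, and the chain still targets the exact marginal posterior — can be sketched on a toy latent-variable model. The model, the flat prior, and the random-walk step size below are illustrative assumptions, not the paper's PM-HMC algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik_hat(theta, y, N=10):
    """Unbiased importance-sampling estimate of p(y | theta).

    Toy model: z ~ N(0, 1), y | z, theta ~ N(theta + z, 1), so the exact
    marginal likelihood is N(theta, 2). Averaging p(y | z_i, theta) over
    z_i ~ N(0, 1) gives an unbiased estimate of that marginal.
    """
    z = rng.normal(0.0, 1.0, size=N)
    w = np.exp(-0.5 * (y - theta - z) ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.log(np.mean(w)))

def pm_mh(y, iters=5000, step=0.8):
    """Pseudo-marginal random-walk Metropolis-Hastings (flat prior on theta)."""
    theta, ll = 0.0, loglik_hat(0.0, y)
    chain = []
    for _ in range(iters):
        prop = theta + step * rng.normal()
        ll_prop = loglik_hat(prop, y)  # fresh likelihood estimate for the proposal
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop  # crucial: recycle the accepted estimate
        chain.append(theta)
    return np.array(chain)

chain = pm_mh(y=1.5)
print(round(float(chain[2500:].mean()), 2))  # posterior mean should be near y = 1.5
```

The key design point, which PM-HMC shares, is that the likelihood estimate is stored together with the state and reused until the next acceptance; re-estimating it for the current state would break the exactness of the scheme.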
  • 8.
    Alhasan, Ahmed
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Generating Geospatial Trip Data Using Deep Neural Networks2022Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Synthetic data provides a good alternative to real data when the latter is not sufficient or limited by privacy requirements. In spatio-temporal applications, generating synthetic data is generally more complex due to the existence of both spatial and temporal dependencies. Recently, with the advent of deep generative modeling such as Generative Adversarial Networks (GAN), synthetic data generation has seen a lot of development and success. This thesis uses a GAN model based on two Recurrent Neural Networks (RNN), as a generator and a discriminator, to generate new trip data for transport vehicles, where the data is represented as a time series. This model is compared with a standalone RNN network that does not have an adversarial counterpart. The result shows that the RNN model (without the adversarial counterpart) performed better than the GAN model, due to the difficulty involved in training and tuning GAN models.

    Download full text (pdf)
    Master Thesis
  • 9.
    Alnervik, Jonna
    et al.
    Linköping University, Department of Computer and Information Science, Statistics.
    Nord Andersson, Peter
    Linköping University, Department of Computer and Information Science, Statistics.
    En retrospektiv studie av vilka patientgrupper som erhåller insulinpump2010Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE creditsStudent thesis
    Abstract [sv]

    Objective

    To investigate differences in access to insulin pumps between different patient groups, and what causes a switch to an insulin pump.

    Method

    Data from 7,224 individuals with type 1 diabetes at ten different care units were analyzed to investigate the effects of kidney function, sex, long-term blood glucose, insulin dose, diabetes duration, and age. The comparison between patient groups was carried out with logistic regression, as a cross-sectional study, and with Cox regression to investigate what preceded a switch to a pump.

    Results

    Logistic regression gave a picture of the current differences between patients who use an insulin pump and patients who do not. The Cox regression adds a time perspective and thereby answers what preceded a switch to an insulin pump. These analyses gave similar results for variables that are constant over time. Women use pumps to a greater extent than men, and the proportion of pump users differs between care units. At present, high age lowers the probability of using an insulin pump, which is confirmed by the time-dependent study, which showed that the probability of switching to a pump is considerably lower at high age. Long-term blood glucose also has a clear effect on the probability of switching to a pump: a high long-term blood glucose entails a high probability of switching to an insulin pump.

    Conclusions

    At present there are differences in the proportion of insulin pump users between different patient groups, and differences also exist in the groups' propensity to switch from other insulin treatments to an insulin pump. Depending on their kidney function, sex, long-term blood glucose, insulin dose, diabetes duration, and age, patients have different probabilities of switching to an insulin pump.

    Download full text (pdf)
    FULLTEXT01
  • 10.
    Alsén, Simon
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Åkesson, Andreas
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Jämförelse av metoder för hantering av partiellt bortfall vid logistisk regressionsanalys2021Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    Missing data is a common problem in research and can lead to loss of statistical power and bias in parameter estimates. Numerous methods have been developed for dealing with missing data, and the aim of this thesis is to evaluate how a number of these methods affect the parameter estimates in a logistic regression model, and whether these methods are suitable for the data in question. The methods included in this study are complete case analysis, MICE and missForest.

    For the purpose of evaluating the methods, missing values in varying proportions and under different missing mechanisms are generated in a real dataset consisting of 2987 observations and five variables. The performance of the methods is assessed by normalized root mean squared error (NRMSE), and by comparing the regression coefficients estimated using the original, true data set with the regression coefficients estimated using imputed data sets.

    missForest results in the lowest NRMSE. In the subsequent logistic regression analysis, however, MICE results in considerably lower bias than missForest.

    Download full text (pdf)
    fulltext
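The evaluation design the abstract describes, masking known values, imputing them, and scoring by NRMSE, can be illustrated with a minimal numpy sketch. The toy data, the column-mean baseline, and the single-variable regression imputation (a one-step stand-in for how MICE models each variable conditionally on the others) are illustrative assumptions, not the thesis's data set or methods.

```python
import numpy as np

rng = np.random.default_rng(7)

def nrmse(imputed, true, mask):
    """Normalized root mean squared error over the masked (missing) entries."""
    diff = (imputed - true)[mask]
    return float(np.sqrt(np.mean(diff ** 2)) / np.std(true[mask]))

# Toy data: two correlated columns, ~20% of column 1 missing completely at random
n = 1000
x = rng.normal(size=n)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=n)])
mask = np.zeros_like(data, dtype=bool)
mask[rng.random(size=n) < 0.2, 1] = True  # entries we pretend are missing

# Baseline: column-mean imputation ignores the correlation with column 0 ...
mean_imp = data.copy()
mean_imp[mask] = data[~mask[:, 1], 1].mean()

# ... while regressing column 1 on column 0 over the observed rows
# (one conditional model, as MICE would fit per variable) exploits it
obs = ~mask[:, 1]
slope, intercept = np.polyfit(data[obs, 0], data[obs, 1], 1)
reg_imp = data.copy()
reg_imp[mask] = intercept + slope * data[mask[:, 1], 0]

print(nrmse(mean_imp, data, mask) > nrmse(reg_imp, data, mask))  # → True
```

The same pattern scales to the thesis's comparison: generate missingness under a chosen mechanism, impute with each candidate method, then compare both NRMSE and the downstream regression coefficients against those from the complete data.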
  • 11.
    Amundsson, Martin
    Linköping University, Department of Computer and Information Science.
    Långtidscovid: symptomförlopp och mönster över tid: En explorativ analys av crowdsource-insamlat enkätdata2022Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    Two years after the first recorded outbreak of Covid-19, its long-term effects are still not completely understood. An unknown proportion of all covid patients go on to develop post-acute covid syndrome and suffer long-term symptoms and health effects long after the initial infection subsides. In the summer of 2021, Project Crowdsourcing Långtidscovid-Sverige sent out an open online survey, recruiting respondents through crowdsourcing, to gather information about people in Sweden with prolonged health effects lasting at least three months after confirmed or suspected Covid-19 infection.

    In this thesis an explorative analysis of the aforementioned survey is conducted, with its initial focus placed on the progression of symptoms. Descriptive statistics are provided for the survey sample; hierarchical clustering on principal components is performed; and association rule mining as well as sequence rule mining is used to extract frequently co-occurring symptoms.

    Women stand for 85.2% of all respondents, possibly indicating a skewed gender distribution in the sample. The average age of a respondent is 50 years old, but ranges between 18 and 80 years of age. The number of reported symptoms tend to diminish over time and symptoms within the 'air passages' category diminish on average quicker than other categories.

    Hierarchical clustering with Ward’s criterion revealed 4 clusters with an average silhouette coefficient of 0.246. The resulting clusters are not well-separated from each other and have some overlap in their bordering regions, and should therefore be interpreted with caution. Broadly speaking, individuals from cluster 1, 3 and 4 are distinguished primarily by their total number of symptoms reported, meanwhile cluster 2 is characterized by individuals that experience many symptoms early on and fewer symptoms later on.

    The most prevalent symptom over the entire period is fatigue (90.2%), closely followed by worsening symptoms after physical activity (87.1%), problems with concentration (82.3%), headaches (79.5%), and brain fog (77.9%). There are several strong associations between various symptoms, especially for symptoms within the same category. Most symptoms have a sequential correlation with themselves and an increased tendency to occur several times.

    Download full text (pdf)
    fulltext
  • 12.
    Anders, Erik
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Classification of Corporate Social Performance2021Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Over the past few years there has been exponentially increasing attention in finance towards socially responsible investments, which creates a need to determine whether a company is socially responsible or not. The ESG ratings often used to do this are based on Environmental, Social and Governance related data about the companies and have many flaws. This thesis proposes to instead model companies by their controversies discussed in the media. It tries to answer the question whether it is possible to predict future controversies of a company from its controversies and ESG indicators in the past, and to isolate the predictors which influence these. This has not been done before and offers a new way of rating companies without falling for the biases of conventional ESG ratings. The chosen method to approach this issue is Zero-Inflated Poisson Regression with Random Intercepts. A selection of variables was determined by Lasso and projection predictive variable selection. This method discovered new connections in the data between ESG indicators and the number of controversies, but also made it apparent that it is difficult to make predictions for future years. Nevertheless, the coefficients of the selected indicators can give valuable insight into the potential risk of an investment.

    Download full text (pdf)
    fulltext
  • 13.
    Anderskär, Erika
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Thomasson, Frida
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    Inkrementell responsanalys av Scandnavian Airlines medlemmar: Vilka kunder ska väljas vid riktad marknadsföring?2017Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    Scandinavian Airlines has a large database containing their Eurobonus members. In order to analyze which customers they should target with direct marketing, such as emails, uplift models have been used. With a binary response variable that indicates whether the customer has bought or not, and a binary dummy variable that indicates whether the customer has received the campaign or not, conclusions can be drawn about which customers are persuadable. That means that the customers who buy when they receive a campaign, and not otherwise, are spotted. Analyses have been done with one campaign for Sweden and Scandinavia. The methods that have been used are logistic regression with Lasso and logistic regression with Penalized Net Information Value (PNIV). The best method for predicting purchases is Lasso regression when comparing with a confusion matrix. The variable that best describes persuadable customers in logistic regression with PNIV is Flown (customers that have flown with SAS within the last six months). In Lasso regression the variable that describes a persuadable customer in Sweden is membership level 1 (the first level of membership), and in Scandinavia customers that receive campaigns with delivery code 13, which is a form of dispatch, are persuadable.

    Download full text (pdf)
    fulltext
  • 14.
    Andersson Hagiwara, Magnus
    et al.
    University of Borås, Sweden.
    Andersson Gare, Boel
    Jönköping University, Sweden.
    Elg, Mattias
    Linköping University, Department of Management and Engineering, Logistics & Quality Management. Linköping University, Faculty of Science & Engineering. Linköping University, HELIX Vinn Excellence Centre.
    Interrupted Time Series Versus Statistical Process Control in Quality Improvement Projects2016In: Journal of Nursing Care Quality, ISSN 1057-3631, E-ISSN 1550-5065, Vol. 31, no 1, p. E1-E8Article in journal (Refereed)
    Abstract [en]

    To measure the effect of quality improvement interventions, it is appropriate to use analysis methods that measure data over time. Examples of such methods include statistical process control analysis and interrupted time series with segmented regression analysis. This article compares the use of statistical process control analysis and interrupted time series with segmented regression analysis for evaluating the longitudinal effects of quality improvement interventions, using an example study on an evaluation of a computerized decision support system.

  • 15.
    Andersson, Henrik
    et al.
    Linköping University, Department of Mathematics, Applied Mathematics. Linköping University, Faculty of Science & Engineering.
    Bakke Cato, Robin
    Linköping University, Department of Mathematics, Applied Mathematics. Linköping University, Faculty of Science & Engineering.
    Stokastisk modellering och prognosticering inom livförsäkring: En dödlighetsundersökning på Länsförsäkringar Livs bestånd2023Independent thesis Advanced level (degree of Master (Two Years)), 28 HE creditsStudent thesis
    Abstract [en]

    Studies of life expectancy and death probabilities are crucial for life insurance. Payments for life insurance are completely dependent on whether an individual is alive or not, or is in various health conditions. In order to be able to price premiums correctly and set aside reserves, it is therefore of great importance to model life expectancy in the most accurate way possible. The insurance industry today uses historically proven well-functioning models that go as far back in time as 200 years. There are models even further back in time, but the models used today are mainly Gompertz (1826), Makeham (1860) and Lee-Carter (1992). Although these models perform well, it is always necessary to investigate whether there may be alternative models that model mortality better.

    In this thesis, affine short-term interest rate models are applied to model the force of mortality, which forms the basis for most actuarial variables of interest. As these models introduce stochastic mortality, the uncertainty and dependence over time can be described. The three short-term interest rate models examined in this project, which are common in financial theory, are Ornstein-Uhlenbeck, Feller and Hull-White. These models are then compared against each other in terms of the modeled force of mortality as well as the expected remaining life expectancy and the one-year probability of death. One aspect of stochastic mortality modeling that is not found in the existing literature, but is examined in this thesis, is the modeling of mortality over time, as this is one of the most important aspects in life insurance mathematics. Finally, for validation purposes, all short-term interest rate models are evaluated using back-testing. The second main part of the work consists of generating results for the same quantities as above based on the DUS method, in order to compare a commercial method with more theoretical and less established ones.

    The results show great potential in several of the short-term interest rate models versus DUS, both in terms of modeling over ages and over calendar years. However, the results are not impeccable for individual calendar years, where large spikes occur due to inaccurate parameter calibration. The satisfactory modeling of the short-term interest rate models over time exceeded expectations, as the models are not designed to capture decreasing trends. This can be considered a great flexibility of the short-term interest rate models, as they are more or less as accurate as the Lee-Carter model used in DUS, both in terms of age and time modeling of mortality.

    Download full text (pdf)
    Stokastisk modellering och prognosticering inom livförsäkring - Robin Bakke Cato - Henrik Andersson
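Of the classical mortality models the abstract names, Gompertz-Makeham has a closed form that is easy to sketch: the force of mortality is mu(x) = A + B·c^x, from which one-year death probabilities and remaining life expectancy follow. The parameter values below are illustrative, not the thesis's calibration.

```python
import numpy as np

# Illustrative Makeham parameters (not the thesis's fitted values)
A, B, c = 0.00022, 2.7e-6, 1.124

def mu(x):
    """Makeham force of mortality: age-independent term A plus
    the Gompertz exponential term B * c**x."""
    return A + B * c ** x

def q(x):
    """One-year death probability q_x = 1 - exp(-integral of mu over
    [x, x+1]), using the closed-form Makeham integral."""
    integral = A + B * c ** x * (c - 1.0) / np.log(c)
    return 1.0 - np.exp(-integral)

def e_curtate(x, max_age=120):
    """Curtate expected remaining lifetime: sum of k-year survival probs."""
    ages = np.arange(x, max_age)
    surv = np.cumprod(1.0 - np.array([q(a) for a in ages]))
    return float(surv.sum())

print(round(q(65), 4), round(e_curtate(65), 1))
```

The stochastic short-rate approach in the thesis replaces the deterministic mu(x) above with a diffusion process, but the downstream quantities (one-year death probability, remaining life expectancy) are computed from the force of mortality in the same way.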
  • 16.
    Andersson, Kasper
    Linköping University, Department of Mathematics, Mathematical Statistics . Linköping University, Faculty of Science & Engineering.
    A Review of Gaussian Random Matrices2020Independent thesis Basic level (degree of Bachelor), 14 HE creditsStudent thesis
    Abstract [en]

    While many university students are introduced to the concept of statistics early in their education, random matrix theory (RMT) usually first arises (if at all) in graduate-level classes. This thesis serves as a friendly introduction to RMT, which is the study of matrices with entries following some probability distribution. Fundamental results, such as the Gaussian and Wishart ensembles, are introduced, and a discussion of how their corresponding eigenvalues are distributed is presented. Two well-studied applications, namely neural networks and PCA, are discussed, where we present how RMT can be applied.

    Download full text (pdf)
    fulltext
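The Gaussian ensembles mentioned in the abstract are easy to experiment with in numpy. A minimal sketch: sample a Gaussian Orthogonal Ensemble (GOE) matrix and check that its spectrum follows Wigner's semicircle law, whose support is approximately [-2, 2] under the scaling used here.

```python
import numpy as np

rng = np.random.default_rng(42)

def goe_eigenvalues(n=400):
    """Sample the eigenvalues of an n x n GOE matrix: symmetrize a
    matrix of standard normals and rescale by sqrt(n) so that the
    empirical spectrum converges to the semicircle law on [-2, 2]."""
    a = rng.normal(size=(n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)
    return np.linalg.eigvalsh(h)

ev = goe_eigenvalues()
# Wigner's semicircle law: eigenvalues concentrate on [-2, 2]
print(round(float(ev.min()), 1), round(float(ev.max()), 1))
```

Increasing n tightens the agreement: the extreme eigenvalues fluctuate around ±2 on the order of n^(-2/3), which is one of the fundamental results such an introduction covers.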
  • 17.
    Andersson, Kasper
    Linköping University, Department of Mathematics, Applied Mathematics. Linköping University, Faculty of Science & Engineering.
    Classification of Repeated Measurement Data Using Growth Curves and Neural Networks2022Independent thesis Advanced level (degree of Master (Two Years)), 28 HE creditsStudent thesis
    Abstract [en]

    This thesis focuses on statistical and machine learning methods designed for sequential and repeated measurement data. We start off by considering the classic general linear model (MANOVA) followed by its generalization, the growth curve model (GMANOVA), designed for analysis of repeated measurement data. By considering a binary classification problem of normal data together with the corresponding maximum likelihood estimators for the growth curve model, we demonstrate how a classification rule based on linear discriminant analysis can be derived which can be used for repeated measurement data in a meaningful way.

    We proceed to the topics of neural networks which serve as our second method of classification. The reader is introduced to classic neural networks and relevant subtopics are discussed. We present a generalization of the classic neural network model to the recurrent neural network model and the LSTM model which are designed for sequential data.

    Lastly, we present three types of data sets, with a total of eight cases, where the discussed classification methods are tested.

    Download full text (pdf)
    fulltext
  • 18.
    Andersson Naesseth, Christian
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Nowcasting using Microblog Data2012Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE creditsStudent thesis
    Abstract [en]

    The explosion of information and user-generated content made publicly available through the internet has made it possible to develop new ways of inferring interesting phenomena automatically. Some interesting examples are the spread of a contagious disease, earthquake occurrences, rainfall rates, box office results, stock market fluctuations, and many more. To this end, a mathematical framework based on theory from machine learning has been employed to show how frequencies of relevant keywords in user-generated content can estimate daily rainfall rates in different regions of Sweden using microblog data.

    Microblog data are collected using a microblog crawler. Properties of the data and data collection methods are both discussed extensively. In this thesis three different model types are studied for regression: linear and nonlinear parametric models, as well as a nonparametric Gaussian process model. Using cross-validation and optimization, the relevant parameters of each model are estimated and the model is evaluated on independent test data. All three models show promising results for nowcasting rainfall rates.

    Download full text (pdf)
    fulltext
  • 19.
    Andersson Naesseth, Christian
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Lindsten, Fredrik
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Schon, Thomas B.
    Uppsala Univ, Sweden.
    High-Dimensional Filtering Using Nested Sequential Monte Carlo2019In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 67, no 16, p. 4177-4188Article in journal (Refereed)
    Abstract [en]

    Sequential Monte Carlo (SMC) methods comprise one of the most successful approaches to approximate Bayesian filtering. However, SMC without a good proposal distribution can perform poorly, in particular in high dimensions. We propose nested sequential Monte Carlo, a methodology that generalizes the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. This way, we can compute an "exact approximation" of, e.g., the locally optimal proposal, and extend the class of models for which we can perform efficient inference using SMC. We show improved accuracy over other state-of-the-art methods on several spatio-temporal state-space models.
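    The abstract's starting point, a standard (bootstrap) particle filter that uses the transition density as its proposal, can be sketched as follows for a scalar linear-Gaussian model. This is the baseline that nested SMC improves on in high dimensions, not the nested algorithm itself; the model parameters are invented for illustration.

```python
import math, random

def bootstrap_pf(ys, n_particles=500, q=1.0, r=1.0, phi=0.9):
    """Bootstrap particle filter for x_t = phi*x_{t-1} + N(0, q),
    y_t = x_t + N(0, r): propagate, weight by the likelihood, resample."""
    random.seed(0)
    parts = [random.gauss(0, 1) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate with the transition density (the "proposal" here).
        parts = [phi * x + random.gauss(0, math.sqrt(q)) for x in parts]
        # Weight by the observation likelihood N(y; x, r).
        w = [math.exp(-0.5 * (y - x) ** 2 / r) for x in parts]
        tot = sum(w)
        w = [wi / tot for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, parts)))
        # Multinomial resampling to avoid weight degeneracy.
        parts = random.choices(parts, weights=w, k=n_particles)
    return means

# Simulate a short trajectory and filter it.
random.seed(42)
xs, x = [], 0.0
for _ in range(50):
    x = 0.9 * x + random.gauss(0, 1)
    xs.append(x)
ys = [x + random.gauss(0, 1) for x in xs]
est = bootstrap_pf(ys)
```

    With a poor proposal in high dimensions the weights collapse onto a few particles; the nested construction replaces exact proposal sampling with properly weighted approximate samples to mitigate exactly this.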

    Download full text (pdf)
    fulltext
  • 20.
    Andersson Naesseth, Christian
    et al.
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Lindsten, Fredrik
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Schön, Thomas
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Capacity estimation of two-dimensional channels using Sequential Monte Carlo2014In: 2014 IEEE Information Theory Workshop, 2014, p. 431-435Conference paper (Refereed)
    Abstract [en]

    We derive a new Sequential-Monte-Carlo-based algorithm to estimate the capacity of two-dimensional channel models. The focus is on computing the noiseless capacity of the 2-D (1, ∞) run-length limited constrained channel, but the underlying idea is generally applicable. The proposed algorithm is profiled against a state-of-the-art method, yielding more than an order of magnitude improvement in estimation accuracy for a given computation time.
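    For very small grids the noiseless capacity of the 2-D (1, ∞) run-length limited constraint can be brute-forced, which makes the target quantity concrete. The count below follows the standard definition (no two adjacent 1s, horizontally or vertically); the finite-n values log2(count)/n² converge only slowly toward the true capacity (≈ 0.5879), which is why Monte Carlo estimators such as the one proposed are needed for larger grids.

```python
from itertools import product
from math import log2

def count_rll_arrays(n):
    """Count n-by-n binary arrays with no two adjacent 1s, horizontally
    or vertically (the 2-D (1, inf) RLL constraint). Rows are encoded as
    n-bit integers; pure brute force, so keep n small."""
    # A row is valid if it has no two adjacent set bits.
    valid_rows = [r for r in range(1 << n) if r & (r >> 1) == 0]
    total = 0
    for rows in product(valid_rows, repeat=n):
        # Vertically adjacent rows must not share any set bit.
        if all(a & b == 0 for a, b in zip(rows, rows[1:])):
            total += 1
    return total

# Counts for n = 1..4: 2, 7, 63, 1234 (OEIS A006506).
for n in range(1, 5):
    c = count_rll_arrays(n)
    print(n, c, round(log2(c) / n**2, 4))
```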

    Download full text (pdf)
    fulltext
  • 21.
    Andersson Naesseth, Christian
    et al.
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science & Engineering.
    Lindsten, Fredrik
    The University of Cambridge, Cambridge, United Kingdom.
    Schön, Thomas
    Uppsala University, Uppsala, Sweden.
    Nested Sequential Monte Carlo Methods2015In: Proceedings of The 32nd International Conference on Machine Learning / [ed] Francis Bach, David Blei, Journal of Machine Learning Research (Online) , 2015, Vol. 37, p. 1292-1301Conference paper (Refereed)
    Abstract [en]

    We propose nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. NSMC generalises the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. Furthermore, NSMC can in itself be used to produce such properly weighted samples. Consequently, one NSMC sampler can be used to construct an efficient high-dimensional proposal distribution for another NSMC sampler, and this nesting of the algorithm can be done to an arbitrary degree. This allows us to consider complex and high-dimensional models using SMC. We show results that motivate the efficacy of our approach on several filtering problems with dimensions in the order of 100 to 1 000.

    Download full text (pdf)
    fulltext
  • 22.
    Andersson Naesseth, Christian
    et al.
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, The Institute of Technology.
    Lindsten, Fredrik
    University of Cambridge, Cambridge, UK.
    Schön, Thomas
    Uppsala University, Uppsala, Sweden.
    Sequential Monte Carlo for Graphical Models2014In: Advances in Neural Information Processing Systems, 2014, p. 1862-1870Conference paper (Refereed)
    Abstract [en]

    We propose a new framework for how to use sequential Monte Carlo (SMC) algorithms for inference in probabilistic graphical models (PGM). Via a sequential decomposition of the PGM we find a sequence of auxiliary distributions defined on a monotonically increasing sequence of probability spaces. By targeting these auxiliary distributions using SMC we are able to approximate the full joint distribution defined by the PGM. One of the key merits of the SMC sampler is that it provides an unbiased estimate of the partition function of the model. We also show how it can be used within a particle Markov chain Monte Carlo framework in order to construct high-dimensional block-sampling algorithms for general PGMs.

    Download full text (pdf)
    fulltext
  • 23.
    Andersson, Niklas
    et al.
    Linköping University, Department of Computer and Information Science, Statistics.
    Hansson, Josef
    Linköping University, Department of Computer and Information Science, Statistics.
    Metodik för detektering av vägåtgärder via tillståndsdata2010Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    The Swedish Transport Administration has, and manages, a database containing information on the condition of all paved, government-operated Swedish roads. The purpose of the database is to support the Pavement Management System (PMS). The PMS is used to identify road sections in need of treatment, to allocate resources and to get a general picture of the condition of the road network. All major treatments should be reported, which has not always been done.

    The road condition is measured using a number of indicators of, e.g., the road's unevenness. Rut depth is an indicator of the road's transverse unevenness. When a treatment has been carried out the condition changes drastically, which is reflected by these indicators.

    The purpose of this master's thesis is to use existing indicators to detect points in time when a road has been treated.

    We have created a SAS program based on simple linear regression to analyze rut depth changes over time. The program searches for level changes in the rut depth trend; a drastic negative change indicates that a treatment has been made.

    The proportion of roads with a reported date for the latest treatment earlier than the program's latest detected date was 37 percent. It turned out that the proportions of possible treatments found by the software and of actually reported treatments differ between regions. The regions North and Central have the highest proportion of differences. There are also differences between road groups with various amounts of traffic. The differences between the regions do not depend entirely on the fact that the proportion of heavily trafficked roads is greater in some regions.
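    A minimal sketch of the detection idea, using window means rather than the thesis's regression-based SAS program: a sharp drop in the rut depth level between the preceding and following window suggests a treatment. The series, window length and drop threshold are all invented for illustration.

```python
def detect_treatments(rut, window=5, drop=3.0):
    """Flag indices where the mean rut depth falls by more than `drop` mm
    between the preceding and following window: a drastic negative level
    change suggests a treatment (resurfacing)."""
    hits = []
    for t in range(window, len(rut) - window):
        before = sum(rut[t - window:t]) / window
        after = sum(rut[t:t + window]) / window
        if before - after > drop:
            hits.append(t)
    return hits

# Synthetic series: rut depth grows ~0.5 mm/year, resurfacing at year 10.
rut = [2 + 0.5 * t for t in range(10)] + [1 + 0.5 * t for t in range(10)]
print(detect_treatments(rut))  # → [10]
```

    Comparing such detected dates against the treatment dates reported in the database is what yields the kind of discrepancy proportions discussed above.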

    Download full text (pdf)
    FULLTEXT01
  • 24.
    Ansell, Ricky
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Biology. Linköping University, Faculty of Science & Engineering. Polismyndigheten - Nationellt Forensiskt Centrum.
    Nordgaard, Anders
    Linköping University, Department of Computer and Information Science, Statistics. Linköping University, Faculty of Arts and Sciences. Polismyndigheten - Nationellt Forensiskt Centrum.
    Hedell, Ronny
    Polismyndigheten - Nationellt Forensiskt Centrum.
    Interpretation of DNA Evidence: Implications of Thresholds Used in the Forensic Laboratory2014Conference paper (Other academic)
    Abstract [en]

    Evaluation of forensic evidence is a process lined with decisions and balancing, not infrequently with a substantial degree of subjectivity. Already at the crime scene many decisions have to be made about search strategies, the amount of evidence and traces recovered, later prioritised and sent further to the forensic laboratory, etc. Within the laboratory there must be several criteria (often in terms of numbers) on how much and what parts of the material should be analysed. In addition there is often a restricted timeframe for delivery of a statement to the commissioner, which in reality might influence the work done. The path of DNA evidence from the recovery of a trace at the crime scene to the interpretation and evaluation made in court involves several decisions based on cut-offs of different kinds. These include quality assurance thresholds like limits of detection and quantitation, but also less strictly defined thresholds like upper limits on the prevalence of alleles not observed in DNA databases. In a verbal scale of conclusions there are lower limits on likelihood ratios for DNA evidence above which the evidence can be said to strongly support, very strongly support, etc. a proposition about the source of the evidence. Such thresholds may be arbitrarily chosen or based on logical reasoning with probabilities. However, likelihood ratios for DNA evidence depend strongly on the population of potential donors, and this may not be understood among the end-users of such a verbal scale. Even apparently strong DNA evidence against a suspect may be reported on each side of a threshold in the scale depending on whether a close relative is part of the donor population or not. In this presentation we review the use of thresholds and cut-offs in DNA analysis and interpretation and investigate the sensitivity of the final evaluation to how such rules are defined. In particular we show the effects of cut-offs when multiple propositions about alternative sources of a trace cannot be avoided, e.g. when there are close relatives to the suspect with high propensities to have left the trace. Moreover, we discuss the possibility of including costs (in terms of time or money) for a decision-theoretic approach in which expected values of information could be analysed.
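    The role of verbal-scale thresholds can be illustrated with a toy likelihood-ratio calculation. The cut-offs and match probabilities below are hypothetical, not those of any laboratory, but they show how the same evidence can land on different sides of a threshold depending on whether a close relative is the alternative donor.

```python
def verbal_scale(lr):
    """Map a likelihood ratio to a verbal conclusion. The cut-offs below
    are illustrative only; scales differ between laboratories."""
    for limit, phrase in [(1_000_000, "extremely strong support"),
                          (10_000, "very strong support"),
                          (100, "strong support"),
                          (1, "some support")]:
        if lr >= limit:
            return phrase
    return "support for the alternative proposition"

# Hypothetical match probabilities: an unrelated donor vs. a sibling
# (siblings share DNA profiles far more often than unrelated people).
p_random, p_sibling = 1e-9, 1e-3
print(verbal_scale(1 / p_random))   # alternative donor is unrelated
print(verbal_scale(1 / p_sibling))  # alternative donor is a sibling
```

    The same trace thus yields "extremely strong support" against an unrelated alternative but only "strong support" when a sibling is a plausible donor, which is the sensitivity to the donor population discussed above.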

  • 25.
    Argillander, Joakim
    et al.
    Linköping University, Department of Electrical Engineering, Information Coding. Linköping University, Faculty of Science & Engineering.
    Alarcon, Alvaro
    Linköping University, Department of Electrical Engineering, Information Coding. Linköping University, Faculty of Science & Engineering.
    Xavier, Guilherme B.
    Linköping University, Department of Electrical Engineering, Information Coding. Linköping University, Faculty of Science & Engineering.
    A tunable quantum random number generator based on a fiber-optical Sagnac interferometer2022In: Journal of Optics, ISSN 2040-8978, E-ISSN 2040-8986, Vol. 24, no 6, article id 064010Article in journal (Refereed)
    Abstract [en]

    Quantum random number generators (QRNGs) are based on naturally random measurement results performed on individual quantum systems. Here, we demonstrate a branching-path photonic QRNG implemented using a Sagnac interferometer with a tunable splitting ratio. The fine-tuning of the splitting ratio allows us to maximize the entropy of the generated sequence of random numbers and effectively compensate for tolerances in the components. By producing single photons from attenuated telecom laser pulses, and employing commercially available components, we are able to generate a sequence of more than 2 gigabytes of random numbers with an average entropy of 7.99 bits/byte directly from the raw measured data. Furthermore, our sequence passes randomness tests from both the NIST and Dieharder statistical test suites, thus certifying its randomness. Our scheme shows an alternative design of QRNGs based on the dynamic adjustment of the uniformity of the produced random sequence, which is relevant for the construction of modern generators that rely on independent real-time testing of their performance.
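    The reported figure of 7.99 bits/byte is an empirical Shannon entropy of the byte sequence. A minimal way to compute that quantity (a standard definition, not the authors' code) is:

```python
from collections import Counter
from math import log2

def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of a byte sequence, in bits/byte.
    A perfectly uniform source attains the maximum of 8 bits/byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * log2(c / n) for c in counts.values())

# A sequence containing every byte value equally often is exactly uniform.
uniform = bytes(range(256)) * 4
print(entropy_bits_per_byte(uniform))  # → 8.0
```

    Real QRNG output is scored the same way on the raw measured bytes; values just below 8 indicate slight residual bias, which the tunable splitting ratio is used to minimize.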

    Download full text (pdf)
    fulltext
  • 26.
    Arvid, Odencrants
    et al.
    Linköping University, Department of Computer and Information Science, Statistics. Linköping University, The Institute of Technology.
    Dennis, Dahl
    Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
    Utvärdering av Transportstyrelsens flygtrafiksmodeller2014Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    The Swedish Transport Agency has for a long time collected monthly data on different variables that are used to make short as well as longer projections. It has used SAS to produce statistical models for air transport, and the model with the largest coefficient of determination is the one that has long been used. The Agency felt it was time for an evaluation of its models and of the methods by which projections are estimated, and also wanted to explore the possibility of using completely new models for forecasting air travel. This Bachelor's thesis examines how the Holt-Winters method compares with SARIMA; error measures such as RMSE, MAPE, R2, AIC and BIC are compared between the methods.

    The results indicate a risk that the Holt-Winters models adapt a bit too well to a few of the variables to which the method has been fitted, but overall the Holt-Winters method generates better forecasts.
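    The error measures used in such a comparison can be stated compactly. The functions and the four-point example below are illustrative only, not the thesis data.

```python
from math import sqrt

def rmse(actual, forecast):
    """Root mean squared error of a forecast."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error; assumes no zero actuals."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly passenger counts (thousands) vs. forecasts.
actual = [120, 135, 150, 160]
forecast = [118, 140, 148, 165]
print(round(rmse(actual, forecast), 2), round(mape(actual, forecast), 2))  # → 3.81 2.46
```

    RMSE penalizes large errors more heavily, while MAPE is scale-free, which is why comparisons like the one above typically report both.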

    Download full text (pdf)
    UtvarderingavTransportstyrelsensflygtrafiksmodeller
  • 27.
    Asokan, Mowniesh
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    A study of forecasts in Financial Time Series using Machine Learning methods2022Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Forecasting financial time series is one of the most challenging problems in economics and business. Markets are highly complex due to non-linear factors in the data and to uncertainty; prices move up and down without any clear pattern. Based on historical univariate close prices from the S&P 500, SSE, and FTSE 100 indexes, this thesis forecasts future values using two different approaches: one classical, with a Seasonal ARIMA model and a hybrid ARIMA-GARCH model, and one based on an LSTM neural network. Each method is evaluated at different forecast horizons. Experimental results show that the LSTM and the hybrid ARIMA-GARCH model perform better than the SARIMA model. Model performance is measured using the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE).

    Download full text (pdf)
    A study of forecasts in Financial Time Series using Machine Learning methods
  • 28.
    Bacharach, Lucien
    et al.
    Univ Toulouse Isae Supaero, France.
    Fritsche, Carsten
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science & Engineering.
    Orguner, Umut
    Middle East Tech Univ, Turkey.
    Chaumette, Eric
    Univ Toulouse Isae Supaero, France.
    A TIGHTER BAYESIAN CRAMER-RAO BOUND2019In: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE , 2019, p. 5277-5281Conference paper (Refereed)
    Abstract [en]

    It has been shown lately that any "standard" Bayesian lower bound (BLB) on the mean squared error (MSE) of the Weiss-Weinstein family (WWF) admits a "tighter" form which upper bounds the "standard" form. Applied to the Bayesian Cramer-Rao bound (BCRB), this result suggests to redefine the concept of efficient estimator relatively to the tighter form of the BCRB, an update supported by a noteworthy example. This paper lays the foundation to revisit some Bayesian estimation problems where the BCRB is not tight in the asymptotic region.

  • 29.
    Barakat, Arian
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
    What makes an (audio)book popular?2018Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Audiobook reading has traditionally been used for educational purposes but has in recent times grown into a popular alternative to more traditional means of consuming literature. In order to differentiate themselves from other players in the market, but also to provide their users with enjoyable literature, several audiobook companies have lately directed their efforts toward producing their own content. Creating highly rated content is, however, no easy task, and one recurring challenge is how to make a bestselling story. In an attempt to identify latent features shared by successful audiobooks and to evaluate proposed methods for literary quantification, this thesis employs an array of frameworks from the fields of Statistics, Machine Learning and Natural Language Processing on data and literature provided by Storytel - Sweden’s largest audiobook company.

    We analyze and identify important features from a collection of 3077 Swedish books concerning their promotional and literary success. By considering features from the aspects Metadata, Theme, Plot, Style and Readability, we found that popular books are typically published as part of a book series, cover 1-3 central topics, and write about, e.g., daughter-mother relationships and human closeness, but also that they hold, on average, a higher proportion of verbs and a lower degree of short words. Despite successfully identifying these and other factors, we recognized that none of our models predicted “bestseller” adequately and that future work may study additional factors, employ other models or even use different metrics to define and measure popularity.

    From our evaluation of the literary quantification methods, namely topic modeling and narrative approximation, we found that these methods are, in general, suitable for Swedish texts but that they require further improvement and experimentation to be successfully deployed for Swedish literature. For topic modeling, we recognized that the sole use of nouns provided more interpretable topics and that the inclusion of character names tended to pollute the topics. We also identified and discussed the possible problem of word inflections when modeling topics for more morphologically complex languages, and noted that additional preprocessing treatments such as word lemmatization or post-training text normalization may improve the quality and interpretability of topics. For the narrative approximation, we discovered that the method currently suffers from three shortcomings: (1) unreliable sentence segmentation, (2) unsatisfactory dictionary-based sentiment analysis and (3) the possible loss of sentiment information induced by translations. Despite only examining a handful of literary works, we further found that books written originally in Swedish had narratives that were more cross-language consistent compared to books written in English and then translated to Swedish.

    Download full text (pdf)
    what_makes_an_audiobook_popular
  • 30.
    Barkhagen, Mathias
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Risk-Neutral and Physical Estimation of Equity Market Volatility2013Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The overall purpose of the PhD project is to develop a framework for making optimal decisions on the equity derivatives markets. Making optimal decisions refers e.g. to how to optimally hedge an options portfolio or how to make optimal investments on the equity derivatives markets. The framework for making optimal decisions will be based on stochastic programming (SP) models, which means that it is necessary to generate high-quality scenarios of market prices at some future date as input to the models. This leads to a situation where the traditional methods, described in the literature, for modeling market prices do not provide scenarios of sufficiently high quality as input to the SP model. Thus, the main focus of this thesis is to develop methods that improve the estimation of option implied surfaces from a cross-section of observed option prices compared to the traditional methods described in the literature. The estimation is complicated by the fact that observed option prices contain a lot of noise and possibly also arbitrage. This means that in order to be able to estimate option implied surfaces which are free of arbitrage and of high quality, the noise in the input data has to be adequately handled by the estimation method.

    The first two papers of this thesis develop a non-parametric optimization based framework for the estimation of high-quality arbitrage-free option implied surfaces. The first paper covers the estimation of the risk-neutral density (RND) surface and the second paper the local volatility surface. Both methods provide smooth and realistic surfaces for market data. Estimation of the RND is a convex optimization problem, but the result is sensitive to the parameter choice. When the local volatility is estimated the parameter choice is much easier but the optimization problem is non-convex, even though the algorithm does not seem to get stuck in local optima. The SP models used to make optimal decisions on the equity derivatives markets also need generated scenarios for the underlying stock prices or index levels as input. The third paper of this thesis deals with the estimation and evaluation of existing equity market models. The third paper gives preliminary results which show that, out of the compared models, a GARCH(1,1) model with Poisson jumps provides a better fit compared to more complex models with stochastic volatility for the Swedish OMXS30 index.

    List of papers
    1. Non-parametric estimation of the option implied risk-neutral density surface
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    Accurate pricing of exotic or illiquid derivatives which is consistent with noisy market prices presents a major challenge. The pricing accuracy will crucially depend on using arbitrage free inputs to the pricing engine. This paper develops a general optimization based framework for estimation of the option implied risk-neutral density (RND), while satisfying no-arbitrage constraints. Our developed framework is a generalization of the RNDs implied by existing parametric models such as the Heston model. Thus, the method considers all types of realistic surfaces and is hence not constrained to a certain function class. When solving the problem the RND is discretized making it possible to use general purpose optimization algorithms. The approach leads to an optimization model where it is possible to formulate the constraints as linear constraints making the resulting optimization problem convex. We show that our method produces smooth local volatility surfaces that can be used for pricing and hedging of exotic derivatives. By perturbing input data with random errors we demonstrate that our method gives better results than the Heston model in terms of yielding stable RNDs.

    Keywords
    Risk-neutral density surface, Non-parametric estimation, Optimization, No-arbitrage constraints, Implied volatility surface, Local volatility
    National Category
    Economics and Business Probability Theory and Statistics
    Identifiers
    urn:nbn:se:liu:diva-94357 (URN)
    Available from: 2013-06-25 Created: 2013-06-25 Last updated: 2023-12-28. Bibliographically approved
    2. Non-parametric estimation of local variance surfaces
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    In this paper we develop a general optimization-based framework for estimation of the option implied local variance surface. Given a specific level of consistency with observed market prices there exist an infinite number of possible surfaces. Instead of assuming shape constraints for the surface, as in many traditional methods, we seek the solution in the subset of realistic surfaces. We select local volatilities as variables in the optimization problem since this makes it easy to ensure absence of arbitrage, and realistic local volatilities imply realistic risk-neutral density (RND), implied volatility and price surfaces. The objective function combines a measure of consistency with market prices and a weighted integral of the squared second derivatives of local volatility in the strike and time-to-maturity directions. Derivatives prices in the optimization model are calculated efficiently with a finite difference scheme on a non-uniform grid. The framework has previously been successfully applied to the estimation of RND surfaces. Compared to modeling the RND, it is much easier to choose the parameters in the model for local volatility. Modeling the RND produces a convex optimization problem, which is not the case when modeling local volatility, but empirical tests indicate that the solution does not get stuck in local optima. We show that our method produces local volatility surfaces of very high quality which are consistent with observed option quotes. Thus, unlike many methods described in the literature, our method does not produce a local volatility surface with irregular shape and many spikes, or a non-smooth and multimodal RND, for input data with a lot of noise.

    Keywords
    Local volatility surface; Non-parametric estimation; Optimization; No-arbitrage conditions
    National Category
    Economics and Business Probability Theory and Statistics
    Identifiers
    urn:nbn:se:liu:diva-94358 (URN)
    Available from: 2013-06-25 Created: 2013-06-25 Last updated: 2023-12-28. Bibliographically approved
    3. Statistical tests for selected equity market models
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    In this paper we evaluate which of four candidate equity market models provides the best fit to observed closing data for the OMXS30 index from 30 September 1986 to 6 May 2013. The candidate models are two GARCH-type models and two stochastic volatility models. The stochastic volatility models are estimated with the help of Markov Chain Monte Carlo methods. We provide the full derivations of the posterior distributions for the two stochastic volatility models, which to our knowledge have not been provided in the literature before. With the help of statistical tests we conclude that, out of the four candidate models, a GARCH model which includes jumps in the index level provides the best fit to the observed OMXS30 closing data.

    Keywords
    GARCH models, stochastic volatility models, Markov Chain Monte Carlo methods, statistical tests
    National Category
    Economics and Business Probability Theory and Statistics
    Identifiers
    urn:nbn:se:liu:diva-94359 (URN)
    Available from: 2013-06-25 Created: 2013-06-25 Last updated: 2013-06-26. Bibliographically approved
    Download full text (pdf)
    Risk-Neutral and Physical Estimation of Equity Market Volatility
    Download (pdf)
    omslag
  • 31.
    Barkhagen, Mathias
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Statistical tests for selected equity market modelsManuscript (preprint) (Other academic)
    Abstract [en]

    In this paper we evaluate which of four candidate equity market models provides the best fit to observed closing data for the OMXS30 index from 30 September 1986 to 6 May 2013. The candidate models are two GARCH-type models and two stochastic volatility models. The stochastic volatility models are estimated with the help of Markov Chain Monte Carlo methods. We provide the full derivations of the posterior distributions for the two stochastic volatility models, which to our knowledge have not been provided in the literature before. With the help of statistical tests we conclude that, out of the four candidate models, a GARCH model which includes jumps in the index level provides the best fit to the observed OMXS30 closing data.

  • 32.
    Barkhagen, Mathias
    et al.
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, Faculty of Science & Engineering.
    Blomvall, Jörgen
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, Faculty of Science & Engineering.
    Modeling and evaluation of the option book hedging problem using stochastic programming2016In: Quantitative finance (Print), ISSN 1469-7688, E-ISSN 1469-7696, Vol. 16, no 2, p. 259-273Article in journal (Refereed)
    Abstract [en]

    Hedging of an option book in an incomplete market with transaction costs is an important problem in finance that many banks have to solve on a daily basis. In this paper, we develop a stochastic programming (SP) model for the hedging problem in a realistic setting, where all transactions take place at observed bid and ask prices. The SP model relies on a realistic modeling of the important risk factors for the application, the price of the underlying security and the volatility surface. The volatility surface is unobservable and must be estimated from a cross section of observed option quotes that contain noise and possibly arbitrage. In order to produce arbitrage-free volatility surfaces of high quality as input to the SP model, a novel non-parametric estimation method is used. The dimension of the volatility surface is infinite and, in order to be able to solve the problem numerically, we use discretization and principal component analysis to reduce the dimensions of the problem. Testing the model out-of-sample for options on the Swedish OMXS30 index, we show that the SP model is able to produce a hedge that has both a lower realized risk and cost compared with dynamic delta and delta-vega hedging strategies.

  • 33.
    Barkhagen, Mathias
    et al.
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Blomvall, Jörgen
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Non-parametric estimation of local variance surfacesManuscript (preprint) (Other academic)
    Abstract [en]

    In this paper we develop a general optimization-based framework for estimation of the option implied local variance surface. Given a specific level of consistency with observed market prices there exist an infinite number of possible surfaces. Instead of assuming shape constraints for the surface, as in many traditional methods, we seek the solution in the subset of realistic surfaces. We select local volatilities as variables in the optimization problem since this makes it easy to ensure absence of arbitrage, and realistic local volatilities imply realistic risk-neutral density (RND), implied volatility and price surfaces. The objective function combines a measure of consistency with market prices and a weighted integral of the squared second derivatives of local volatility in the strike and time-to-maturity directions. Derivatives prices in the optimization model are calculated efficiently with a finite difference scheme on a non-uniform grid. The framework has previously been successfully applied to the estimation of RND surfaces. Compared to modeling the RND, it is much easier to choose the parameters in the model for local volatility. Modeling the RND produces a convex optimization problem, which is not the case when modeling local volatility, but empirical tests indicate that the solution does not get stuck in local optima. We show that our method produces local volatility surfaces of very high quality which are consistent with observed option quotes. Thus, unlike many methods described in the literature, our method does not produce a local volatility surface with irregular shape and many spikes, or a non-smooth and multimodal RND, for input data with a lot of noise.

  • 34.
    Barkhagen, Mathias
    et al.
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Blomvall, Jörgen
    Linköping University, Department of Management and Engineering, Production Economics. Linköping University, The Institute of Technology.
    Non-parametric estimation of the option implied risk-neutral density surface. Manuscript (preprint) (Other academic)
    Abstract [en]

    Accurate pricing of exotic or illiquid derivatives that is consistent with noisy market prices presents a major challenge. The pricing accuracy crucially depends on using arbitrage-free inputs to the pricing engine. This paper develops a general optimization-based framework for estimating the option-implied risk-neutral density (RND) while satisfying no-arbitrage constraints. The developed framework is a generalization of the RNDs implied by existing parametric models such as the Heston model. Thus, the method considers all types of realistic surfaces and is not constrained to a certain function class. To solve the problem, the RND is discretized, making it possible to use general-purpose optimization algorithms. The approach leads to an optimization model in which the constraints can be formulated as linear constraints, making the resulting optimization problem convex. We show that our method produces smooth local volatility surfaces that can be used for pricing and hedging of exotic derivatives. By perturbing input data with random errors, we demonstrate that our method gives better results than the Heston model in terms of yielding stable RNDs.

  • 35.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    A Central Limit Theorem for punctuated equilibrium. 2020. In: Stochastic Models, ISSN 1532-6349, E-ISSN 1532-4214, Vol. 36, no 3, p. 473-517. Article in journal (Refereed)
    Abstract [en]

    Current evolutionary biology models usually assume that a phenotype undergoes gradual change. This is in stark contrast to biological intuition, which indicates that change can also be punctuated: the phenotype can jump. Such a jump can especially occur at speciation, i.e., dramatic change that drives the species apart. Here we derive a Central Limit Theorem for punctuated equilibrium. We show that, if adaptation is fast, then for weak convergence to normality to hold, the variability in the occurrence of change has to disappear with time.

  • 36.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Exact and approximate limit behaviour of the Yule tree's cophenetic index. 2018. In: Mathematical Biosciences, ISSN 0025-5564, E-ISSN 1879-3134, Vol. 303, p. 26-45. Article in journal (Refereed)
    Abstract [en]

    In this work we study the limit distribution of an appropriately normalized cophenetic index of the pure-birth tree conditioned on n contemporary tips. We show that this normalized phylogenetic balance index is a sub-martingale that converges almost surely and in L^2. We link our work with studies on trees without branch lengths and show that in this case the limit distribution is a contraction-type distribution, similar to the Quicksort limit distribution. In the continuous-branch case we suggest approximations to the limit distribution. We propose heuristic methods of simulating from these distributions and observe that the resulting algorithms have reasonable tails. We therefore propose a quantile-based hypothesis test, using the derived distributions, of whether an observed phylogenetic tree is consistent with the pure-birth process. Simulating a sample by the proposed heuristics is rapid, whereas exact simulation (simulating the tree and then calculating the index) is time-consuming. We conduct a power study to investigate how well the cophenetic indices detect deviations from the Yule tree, and apply the methodology to empirical phylogenies.

  • 37.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Limit distribution of the quartet balance index for Aldous's (β ≥ 0)-model. 2020. In: Applicationes Mathematicae, ISSN 1233-7234, E-ISSN 1730-6280, Vol. 6, p. 29-44. Article in journal (Refereed)
    Abstract [en]

    This paper builds on T. Martínez-Coronado, A. Mir, F. Rosselló and G. Valiente's 2018 work, introducing a new balance index for trees. We show that this balance index, in the case of Aldous's (β ≥ 0)-model, converges weakly to a distribution that can be characterized as the fixed point of a contraction operator on a class of distributions.

  • 38.
    Bartoszek, Krzysztof
    Department of Mathematics, Uppsala University, Uppsala, Sweden.
    Phylogenetic effective sample size. 2016. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 407, p. 371-386. Article in journal (Refereed)
    Abstract [en]

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes: the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case, and compare these two definitions to an existing concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or the effective number of species. Lastly I discuss how the concept of phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades, and deciding on the importance of phylogenetic correlations.
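    The mean effective sample size that the abstract compares against can be computed directly from the correlation matrix R of the tip measurements as 1' R^{-1} 1. A minimal sketch, assuming this standard definition from the comparative-methods literature (the helper name `mean_ess` is ours, not the paper's):

```python
import numpy as np

def mean_ess(R):
    """Mean effective sample size 1' R^{-1} 1 for a correlation
    matrix R of the tip measurements."""
    ones = np.ones(R.shape[0])
    return float(ones @ np.linalg.solve(R, ones))

n = 4
ess_star = mean_ess(np.eye(n))  # star phylogeny: independent tips, ESS = n

# Exchangeable correlation rho between every pair of tips gives
# mESS = n / (1 + (n - 1) * rho): shared history shrinks the sample.
rho = 0.5
R = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
ess_shared = mean_ess(R)  # n / (1 + (n - 1) * rho) = 4 / 2.5 = 1.6
```

    The point mirrored from the abstract: strongly correlated tips carry far less than n independent observations' worth of information.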

  • 39.
    Bartoszek, Krzysztof
    Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Gothenburg, Sweden.
    Quantifying the effects of anagenetic and cladogenetic evolution. 2014. In: Mathematical Biosciences, ISSN 0025-5564, E-ISSN 1879-3134, Vol. 254, p. 42-57. Article in journal (Refereed)
    Abstract [en]

    An ongoing debate in evolutionary biology is whether phenotypic change occurs predominantly around the time of speciation or whether it instead accumulates gradually over time. In this work I propose a general framework incorporating both types of change, quantify the effects of speciational change via the correlation between species, and attribute the proportion of change to each type. I discuss results of parameter estimation of Hominoid body size in this light. I derive mathematical formulae related to this problem: the probability generating functions of the number of speciation events along a randomly drawn lineage, and from the most recent common ancestor of two randomly chosen tip species, for a conditioned Yule tree. Additionally I obtain in closed form the variance of the distance from the root to the most recent common ancestor of two randomly chosen tip species.

  • 40.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Revisiting the Nowosiółka skull with RMaCzek. 2023. In: Mathematica Applicanda, ISSN 1730-2668, Vol. 50, no 2, p. 255-266. Article in journal (Refereed)
    Abstract [en]

    One of the first fully quantitative distance matrix visualization methods was proposed by Jan Czekanowski at the beginning of the previous century. Recently, a software package, RMaCzek, was made available that allows for producing such diagrams in R. Here we reanalyze the original data that Czekanowski used for introducing his method, and in the accompanying code show how the user can specify their own custom distance functions in the package.

  • 41.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Simulating an infinite mean waiting time. 2019. In: Mathematica Applicanda, ISSN 1730-2668, Vol. 47, no 1, p. 93-102. Article in journal (Refereed)
    Abstract [en]

    We consider a hybrid method for simulating the return time to the initial state in a critical-case birth-death process. The expected value of this return time is infinite, but its distribution asymptotically follows a power law. Hence, the simulation approach is to simulate the process directly, unless the simulated time exceeds a threshold, in which case the return time is drawn from the tail of the power law.
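    As a concrete illustration of such a hybrid scheme, consider the extinction time of a critical continuous-time branching process started from one individual (per-capita birth and death rates both lam), for which P(T > t) = 1/(1 + lam*t): the mean is infinite, and the tail beyond the threshold can be drawn by inverse transform. A sketch under these assumptions, not the paper's exact process or code:

```python
import numpy as np

def return_time(lam=1.0, t_max=10.0, rng=None):
    """Hybrid draw of the extinction time of a critical birth-death
    process started from one individual (birth rate = death rate = lam).
    Direct simulation up to roughly t_max; beyond that, an inverse-transform
    draw from the conditional power-law tail P(T > t) = 1/(1 + lam*t)."""
    rng = np.random.default_rng() if rng is None else rng
    n, t = 1, 0.0
    while n > 0 and t <= t_max:
        t += rng.exponential(1.0 / (2.0 * lam * n))  # next event time
        n += 1 if rng.random() < 0.5 else -1          # birth or death
    if n == 0:
        return t  # absorbed: exact direct draw
    # Tail draw: P(T > t | T > t_max) = (1 + lam*t_max) / (1 + lam*t).
    u = rng.random()
    return ((1.0 + lam * t_max) / u - 1.0) / lam

rng = np.random.default_rng(42)
sample = np.array([return_time(rng=rng) for _ in range(2000)])
```

    With lam = 1 the true tail gives P(T > 1) = 1/2, which the empirical sample reproduces, while the sample mean never stabilizes, exactly the infinite-mean behaviour the abstract describes.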

  • 42.
    Bartoszek, Krzysztof
    Gdansk University of Technology, Poland.
    The Bootstrap and Other Methods of Testing Phylogenetic Trees. 2007. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2007, p. 103-108. Conference paper (Refereed)
    Abstract [en]

    The final step of a phylogenetic analysis is the test of the generated tree. This is not an easy task, and there is no obvious methodology for it, because we do not know the full probabilistic model of evolution. A number of methods have been proposed, but there is wide debate concerning the interpretation of the results they produce.

  • 43.
    Bartoszek, Krzysztof
    Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg Sweden.
    The Laplace Motion in Phylogenetic Comparative Methods. 2012. In: Proceedings of the 18th National Conference on Applications of Mathematics in Biology and Medicine, 2012, p. 25-30. Conference paper (Refereed)
    Abstract [en]

    The majority of current phylogenetic comparative methods assume that the stochastic evolutionary process is homogeneous over the phylogeny, or offer relaxations of this assumption in rather limited and usually parameter-expensive ways. Here we make a preliminary investigation, by means of a numerical experiment, of whether the Laplace motion process can offer an alternative approach.

  • 44.
    Bartoszek, Krzysztof
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Trait evolution with jumps: illusionary normality. 2017. In: Proceedings of the XXIII National Conference on Applications of Mathematics in Biology and Medicine, 2017, p. 23-28. Conference paper (Refereed)
    Abstract [en]

    Phylogenetic comparative methods for real-valued traits usually make use of stochastic processes whose trajectories are continuous. This is despite biological intuition that evolution is punctuated rather than gradual. On the other hand, there have been a number of recent proposals of evolutionary models with jump components. However, as we are only beginning to understand the behaviour of branching Ornstein-Uhlenbeck (OU) processes, the asymptotics of branching OU processes with jumps is an even greater unknown. In this work we build on a previous study concerning OU-with-jumps evolution on a pure-birth tree. We introduce an extinction component and explore, via simulations, its effects on the weak convergence of such a process. We furthermore use this work to illustrate the simulation and graphics-generation possibilities of the mvSLOUCH package.

  • 45.
    Bartoszek, Krzysztof
    et al.
    Gdansk University of Technology, Poland.
    Bartoszek, Wojciech
    Gdansk University of Technology, Poland.
    On the Time Behaviour of Okazaki Fragments. 2006. In: Journal of Applied Probability, ISSN 0021-9002, E-ISSN 1475-6072, Vol. 43, no 2, p. 500-509. Article in journal (Refereed)
    Abstract [en]

    We find explicit analytical formulae for the time dependence of the probability of the number of Okazaki fragments produced during the process of DNA replication. This extends a result of Cowan on the asymptotic probability distribution of these fragments.

  • 46.
    Bartoszek, Krzysztof
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Analysis and Probability Theory.
    Domsta, Joachim
    State Univ Appl Sci Elblag, Krzysztof Brzeski Inst Appl Informat, Ul Wojska Polskiego 1, PL-82300 Elblag, Poland.
    Pulka, Malgorzata
    Gdansk Univ Technol, Dept Probabil & Biomath, Ul Narutowicza 11-12, PL-80233 Gdansk, Poland.
    Weak Stability of Centred Quadratic Stochastic Operators. 2019. In: Bulletin of the Malaysian Mathematical Sciences Society, ISSN 0126-6705, Vol. 42, no 4, p. 1813-1830. Article in journal (Refereed)
    Abstract [en]

    We consider the weak convergence of iterates of so-called centred quadratic stochastic operators. These iterations allow us to study the discrete-time evolution of probability distributions of vector-valued traits in populations of inbreeding or hermaphroditic species, whenever the offspring's trait is equal to an additively perturbed arithmetic mean of the parents' traits. It is shown that for the existence of a weak limit, it is sufficient that the distributions of the trait and the perturbation have a finite variance or have tails controlled by a suitable power function. In particular, probability distributions from the domain of attraction of stable distributions find an application here, although in general the limit is not stable.
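    The evolution described, each offspring trait being the arithmetic mean of two parents' traits plus an additive perturbation, halves the trait variance per generation while the perturbation adds its own, so with unit perturbation variance the stationary variance solves v = v/2 + 1, i.e. v = 2. A Monte Carlo sketch of this scalar special case (illustrative, not the paper's operators in full generality):

```python
import numpy as np

def evolve(pop, sigma, generations, rng):
    """One centred-QSO-style step per generation: each offspring is the
    arithmetic mean of two randomly chosen parents plus N(0, sigma^2)."""
    n = len(pop)
    for _ in range(generations):
        parents1 = pop[rng.integers(0, n, n)]
        parents2 = pop[rng.integers(0, n, n)]
        pop = (parents1 + parents2) / 2.0 + sigma * rng.standard_normal(n)
    return pop

rng = np.random.default_rng(1)
pop = rng.standard_normal(100_000)   # initial trait variance 1
out = evolve(pop, sigma=1.0, generations=30, rng=rng)
# Variance recursion V_{g+1} = V_g / 2 + sigma^2 converges to 2 * sigma^2.
```

    With a finite-variance perturbation the iterates converge weakly, matching the sufficient condition in the abstract; heavy-tailed perturbations would instead require the power-function tail control described there.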

  • 47.
    Bartoszek, Krzysztof
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Erhardsson, Torkel
    Linköping University, Department of Mathematics, Applied Mathematics. Linköping University, Faculty of Science & Engineering.
    Normal approximation for mixtures of normal distributions and the evolution of phenotypic traits. 2021. In: Advances in Applied Probability, ISSN 0001-8678, E-ISSN 1475-6064, Vol. 53, no 1, p. 162-188. Article in journal (Refereed)
    Abstract [en]

    Explicit bounds are given for the Kolmogorov and Wasserstein distances between a mixture of normal distributions, by which we mean that the conditional distribution given some sigma-algebra is normal, and a normal distribution with properly chosen parameter values. The bounds depend only on the first two moments of the first two conditional moments given the sigma-algebra. The proof is based on Stein's method. As an application, we consider the Yule-Ornstein-Uhlenbeck model, used in the field of phylogenetic comparative methods. We obtain bounds for both distances between the distribution of the average value of a phenotypic trait over n related species and a normal distribution. The bounds imply and extend earlier limit theorems by Bartoszek and Sagitov.

  • 48.
    Bartoszek, Krzysztof
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Fuentes Gonzalez, Jesualdo
    Florida International University, Miami, USA..
    Mitov, Venelin
    IntiQuan GmbH, Basel, Switzerland..
    Pienaar, Jason
    Florida International University, Miami, USA..
    Piwczyński, Marcin
    Nicolaus Copernicus University, Toruń, Poland..
    Puchałka, Radosław
    Nicolaus Copernicus University, Toruń, Poland..
    Spalik, Krzysztof
    University of Warsaw, Warszawa, Poland..
    Voje, Kjetil
    University of Oslo, Oslo, Norway..
    Fast mvSLOUCH: Model comparison for multivariate Ornstein–Uhlenbeck-based models of trait evolution on large phylogenies. 2023. Data set
    Abstract [en]

    These are the Supplementary Material, R scripts and numerical results accompanying Bartoszek, Fuentes Gonzalez, Mitov, Pienaar, Piwczyński, Puchałka, Spalik and Voje "Model Selection Performance in Phylogenetic Comparative Methods under multivariate Ornstein–Uhlenbeck Models of Trait Evolution".

    The four data files concern two datasets. Ungulates: measurements of muzzle width, unworn lower third molar crown height, unworn lower third molar crown width, and feeding style, and their phylogeny. Ferula: measurements of ratio of canals, periderm thickness, wing area, wing thickness, and fruit mass, and their phylogeny.

    Methods

    Ungulates

    The compiled ungulate dataset involves two key components: phenotypic data (Data.csv) and a phylogenetic tree (Tree.tre), which consist of the following (full references for the citations presented below are provided in the paper linked to this repository, which also provides further details on the compiled dataset). The phenotypic data include three continuous variables and one categorical variable. The continuous variables (MZW: muzzle width; HM3: unworn lower third molar crown height; WM3: unworn lower third molar crown width), measured in cm, come from Mendoza et al. (2002; J. Zool.). The categorical variable (FS, i.e. feeding style: B=browsers, G=grazers, M=mixed feeders) is based on Pérez–Barbería and Gordon (2001; Proc. R. Soc. B: Biol. Sci.). Taxonomic mismatches between these two sources were resolved based on Wilson and Reeder (2005; Johns Hopkins University Press). Only taxa with full entries for all these variables were included (i.e. no missing data allowed).

    The phylogenetic tree is pruned from the unsmoothed mammalian timetree of Hedges et al. (2015; MBE) to only include the 104 ungulate species for which there is complete phenotypic data available. Wilson and Reeder (2005; Johns Hopkins University Press) was used again to resolve taxonomic mismatches with the phenotypic data. The branch lengths of the tree are scaled to unit height and thus informative of relative time.

    Ferula

    1) The phenotypic data are divided into two data sets: the first contains five continuous variables (no_ME) measured on mericarps (the dispersal unit of the fruit in Apiaceae), whereas the second has the same variables together with measurement error (ME; see the paper for computational details), for 75 species of Ferula and three species of Leutea. Three continuous variables were measured on anatomical cross-sections (ratio_canals_ln – the proportion of oil ducts covering the space between the median and lateral ribs [dimensionless], mean_gr_peri_ln_um – periderm (fruit wall) thickness [μm], thick_wings_ln_um – wing thickness [μm]); the remaining two on whole mericarps (Wings_area_ln_mm – wing area [mm2], Seed_mass_ln_mg – seed mass [mg]).

    2) The phylogenetic tree was pruned from the tree obtained in the recent taxonomic revision of the genus (Panahi et al. 2018) to only include the 78 species for which the phenotypic data were generated. This tree and the associated alignment, composed of one nuclear and three plastid markers (Panahi et al. 2018), constituted the input to the mcmctree software (Yang 2007) to obtain a dated tree, using a secondary calibration point for the root based on Banasiak et al.'s (2013) work. The branch lengths of the final tree (Ferula_fruits_tree.txt) were scaled to unit height and are thus informative of relative time.

    The R setup for the manuscript was as follows:

    R version 3.6.1 (2019-09-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: openSUSE Leap 42.3

    The exact output can depend on the random seed. However, the scripts include the option of rerunning the analyses exactly as they were run for the manuscript, i.e. the random seeds that were used to generate the results are saved, included, and can be read in.

    The code is divided into several directories with scripts, random seeds and result files.

    1) LikelihoodTesting

    Directory contains the script test_rotation_invariance_mvSLOUCH.R, which demonstrates that mvSLOUCH's likelihood calculations are rotation invariant.

    2) Carnivorans

    Directory contains files connected to the Carnivorans' vignette in mvSLOUCH.

    2.1) Carnivora_mvSLOUCH_objects_Full.RData

    Full output of running the R code in the vignette. The file distributed with mvSLOUCH is a very bare-minimum subset of this file that allows for the creation of the vignette.

     2.2) Carnivora_mvSLOUCH_objects.RData              

    Reduced objects from Carnivora_mvSLOUCH_objects_Full.RData that are included with mvSLOUCH's vignette.                            

    2.3) Carnivora_mvSLOUCH_objects_remove_script.R               

    R script to reduce Carnivora_mvSLOUCH_objects_Full.RData to Carnivora_mvSLOUCH_objects.RData.     

    2.4) mvSLOUCH_Carnivorans.Rmd               

    The vignette itself.           

    2.5) refs_mvSLOUCH.bib               

    Bib file for the vignette.           

    2.6) ScaledTree.png, ScaledTree2.png, ScaledTree3.png, ScaledTree4.png   

    Plots of phylogenetic trees for vignette.

    3) SimulationStudy

    Directory contains all the output of the simulation study presented in the manuscript, together with scripts that allow for replication (the random number generator seeds are also provided) or for running one's own simulation study, and scripts to generate graphs and a model comparison summary. This study was done using version 2.6.2 of mvSLOUCH. If one reruns it using mvSLOUCH >= 2.7, then one will obtain different (corrected) values of R2 and an additional R2 version.

    4) Ungulates

    Directory contains files connected to the "Feeding styles and oral morphology in ungulates" analyses performed for the manuscript.       

    4.1) Data.csv       

    The phenotypic data include three continuous variables and one categorical variable. The continuous variables (MZW: muzzle width; HM3: unworn lower third molar crown height; WM3: unworn lower third molar crown width), measured in cm, come from Mendoza et al. (2002). The categorical variable (FS, i.e. feeding style: B=browsers, G=grazers, M=mixed feeders) is based on Pérez–Barbería and Gordon (2001). The phylogeny is pruned from Hedges et al. (2015).

    Taxonomic mismatches among these sources were resolved based on Wilson and Reeder (2005).

    Hedges, S. B., J. Marin, M. Suleski, M. Paymer, and S. Kumar. 2015. Tree of life reveals clock-like speciation and diversification. Molecular Biology and Evolution 32:835-845.

    Mendoza, M., C. M. Janis, and P. Palmqvist. 2002. Characterizing complex craniodental patterns related to feeding behaviour in ungulates: a multivariate approach. Journal of Zoology 258:223-246.

    Pérez–Barbería, F. J., and I. J. Gordon. 2001. Relationships between oral morphology and feeding style in the Ungulata: a phylogenetically controlled evaluation. Proceedings of the Royal Society of London. Series B: Biological Sciences 268:1023-1032.

    Wilson, D. E., and D. M. Reeder. 2005. Mammal species of the world: A taxonomic and geographic reference. Johns Hopkins University Press, Baltimore, Maryland.

    4.2) Tree.tre       

    Ungulates' phylogeny, extracted from the mammalian phylogeny of Hedges, S. B., J. Marin, M. Suleski, M. Paymer, and S. Kumar. 2015. Tree of life reveals clock–like speciation and diversification. Mol. Biol. Evol. 32:835–845.           

    4.3) OUB.R, OUF.R, OUG.R       

    R scripts for the analyses performed in the manuscript. Different files correspond to different regime setups of the feeding style variable.           

    4.4) OU1.txt, OUB.txt, OUF.txt, OUG.txt       

    Outputs of the model comparison conducted under the R scripts presented above (4.3). Different files correspond to different regime setups of the feeding style variable.        

    5) Ferula analyses

    In the models_ME directory there are input and output files from the mvSLOUCH analyses of the Ferula data with measurement error included, while the models_no_ME directory holds the analyses of the data without measurement error. In each directory, one can find the following files:

    - input files: Data_ME.csv (with measurement error) or Data_no_ME.csv (without measurement error) and a tree file in Newick format (Ferula_fruits_tree.txt); the trait names in the data files are abbreviated as follows: ratio_canals – the proportion of oil ducts covering the space between the median and lateral ribs, mean_gr_peri – periderm thickness, wings_area – wing area, thick_wings – wing thickness and seed_mass – seed mass,

    - the results for the 8 analyzed models (see Fig. 2 in the main text), each in a separate directory named model1, model2 and so on,

    - each model directory comprises the following files: two R scripts (for analyses with a diagonal and with an upper triangular matrix Σyy; each model was run 1000 times), two csv files containing information such as the repetition number (i), the seed for the preliminary analyses generating the starting point (seed_start_point), the seed for the main analyses (seed), and the AIC, AICc, SIC, BIC, R2 and loglik for each model run (these csv files are sorted according to AICc values), two directories containing the results for the 1000 analyses, and two files extracted from these directories showing the parameter estimates for the best models (with UpperTri and Diagonal matrix Σyy).

  • 49.
    Bartoszek, Krzysztof
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Gonzalez, Jesualdo Fuentes
    Department of Biological Sciences, Florida International University, Miami, Fl 33199, USA.
    Mitov, Venelin
    IntiQuan GmbH, Basel, Switzerland.
    Pienaar, Jason
    Department of Biological Sciences and the Institute of Environment, Florida International University, Miami, Fl 33199, USA.
    Piwczyński, Marcin
    Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Toruń, Poland.
    Puchałka, Radosław
    Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Toruń, Poland.
    Spalik, Krzysztof
    Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warszawa, Poland.
    Voje, Kjetil Lysne
    Natural History Museum, University of Oslo, Oslo, Norway.
    Model Selection Performance in Phylogenetic Comparative Methods Under Multivariate Ornstein–Uhlenbeck Models of Trait Evolution. 2023. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 72, no 2, p. 275-293. Article in journal (Refereed)
    Abstract [en]

    The advent of fast computational algorithms for phylogenetic comparative methods allows for considering multiple hypotheses concerning the co-adaptation of traits, and also for studying whether it is possible to distinguish between such models based on contemporary species measurements. Here we demonstrate how one can perform a study with multiple competing hypotheses using mvSLOUCH by analyzing two data sets, one concerning feeding styles and oral morphology in ungulates, and the other concerning fruit evolution in Ferula (Apiaceae). We also perform simulations to determine if it is possible to distinguish between various adaptive hypotheses. We find that Akaike's information criterion corrected for small sample size has the ability to distinguish between most pairs of considered models. However, in some cases there seems to be a bias towards Brownian motion or simpler Ornstein-Uhlenbeck models. We also find that measurement error, and forcing the sign of the diagonal of the drift matrix of an Ornstein-Uhlenbeck process, influence identifiability capabilities. It is a cliché that some models, despite being imperfect, are more useful than others. Nonetheless, having a much larger repertoire of models will surely lead to a better understanding of the natural world, as it will allow for dissecting in what ways they are wrong. [Adaptation; AICc; model selection; multivariate Ornstein-Uhlenbeck process; multivariate phylogenetic comparative methods; mvSLOUCH.]

  • 50.
    Bartoszek, Krzysztof
    et al.
    Gdansk University of Technology.
    Izydorek, Bartosz
    Gdansk University of Technology.
    Ratajczak, Tadeusz
    Gdansk University of Technology, Poland.
    Skokowski, Jaroslaw
    Medical University of Gdansk, Poland.
    Szwaracki, Karol
    Gdansk University of Technology, Poland.
    Tomczak, Wiktor
    Gdansk University of Technology, Poland.
    Neural Network Breast Cancer Relapse Time Prognosis. 2006. In: ASO Summer School 2006 abstract book, Ostrzyce 30.06-2.07.2006 / [ed] J. Skokowski and K. Drucis, 2006, p. 8-10. Conference paper (Other academic)
    Abstract [en]

    This paper is the result of a project at the Faculty of Electronics, Telecommunications and Computer Science (Technical University of Gdansk). The aim of the project was to create a neural network to predict the relapse time of breast cancer. The neural network was to be trained on data collected over the past 20 years by Dr. Jarosław Skokowski. The data include 439 patient records described by about 40 parameters. For our neural network we only considered the six medically most significant parameters: the number of nodes showing evidence of cancer, the size of the tumour (in mm), age, Bloom score, estrogen receptors and progesterone receptors, with the relapse time as the outcome. Our neural network was created in the MATLAB environment.
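    The described setup, six clinical inputs and relapse time as the regression target, can be sketched as a minimal single-hidden-layer network trained by full-batch gradient descent. The data below are synthetic stand-ins (the clinical records are not public), and NumPy is used here instead of the original MATLAB environment:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic stand-in for the 6 clinical inputs (nodes, tumour size, age,
# Bloom score, estrogen and progesterone receptors) and the relapse time.
X = rng.standard_normal((439, 6))
w_true = rng.standard_normal(6)
y = np.tanh(X @ w_true) + 0.1 * rng.standard_normal(439)

# One hidden layer with tanh activation, squared-error loss.
H = 8
W1 = 0.5 * rng.standard_normal((6, H))
b1 = np.zeros(H)
W2 = 0.5 * rng.standard_normal(H)
b2 = 0.0

def forward(X):
    hidden = np.tanh(X @ W1 + b1)
    return hidden, hidden @ W2 + b2

_, pred0 = forward(X)
loss0 = float(np.mean((pred0 - y) ** 2))  # loss before training

lr = 0.05
for _ in range(500):
    hidden, pred = forward(X)
    err = pred - y                        # residuals (gradient scale
    gW2 = hidden.T @ err / len(y)         # absorbed into the step size)
    gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - hidden ** 2)   # backprop through tanh
    gW1 = X.T @ dh / len(y)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss = float(np.mean((pred - y) ** 2))    # training loss after descent
```

    The architecture, activation, and training schedule here are illustrative choices only; the paper does not specify the network used beyond its six inputs.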
