Prediction Methods for High Dimensional Data with Censored Covariates
Svahn, Caroline
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
ORCID iD: 0000-0003-4271-6683
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Prediktionsmetoder för högdimensionella data med censurerade kovariater (Swedish)
Abstract [en]

While access to data steadily increases, not all data are straightforward to use for prediction. Censored data are common in several industrial scenarios and typically arise when measuring equipment has limitations, for instance concentration-measuring equipment in chemistry or signal receivers in signal processing.
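
As a small illustration of the censoring described above (not taken from the thesis), the sketch below left-censors simulated readings at a detection limit and shows how the recorded values differ from the true ones. The signal distribution, the limit of -105, and the function name `censor_below` are all assumptions chosen for the example.

```python
import random
from statistics import mean


def censor_below(values, limit):
    """Left-censor readings: anything below `limit` is recorded as `limit`.

    Returns the recorded values and a parallel list of censoring flags.
    """
    observed = [max(v, limit) for v in values]
    is_censored = [v < limit for v in values]
    return observed, is_censored


# Hypothetical signal-strength-like readings (values are illustrative only).
rng = random.Random(0)
true_signal = [rng.gauss(-100.0, 10.0) for _ in range(10_000)]

observed, is_censored = censor_below(true_signal, limit=-105.0)

share = sum(is_censored) / len(is_censored)
print(f"share censored: {share:.2f}")
print(f"true mean: {mean(true_signal):.2f}")
print(f"mean of recorded values: {mean(observed):.2f}")
```

Because every censored reading is recorded at the limit, treating the recorded values as if they were exact shifts summary statistics upward, which is one reason imputing the censored values can matter for downstream prediction.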

In this thesis, we approach prediction with censored covariate data from several angles. We explore the impact on both the covariates and the response when the censored covariates are imputed. We consider linear as well as non-linear approaches, and we explore how both frequentist and Bayesian models perform with censored covariate data. While the focus is on using the imputed covariate data for prediction, we also investigate model parameter inference and the uncertainty introduced by the imputations.

We use real, censored covariate telecommunications data for prediction with some of the most commonly used prediction models and evaluate the performance when single imputations are made. We propose a selective multiple imputation approach that is suitable for high dimensional data and performs well under heavy censoring. We take a Bayesian linear regression approach that leverages information from auxiliary variables using multivariate regression, and we introduce multivariate draws from conditional distributions to update censored values in the covariates. Finally, we offer a bridge between the flexibility of neural networks and the probabilistic nature of Bayesian methods by taking a Variational Autoencoder approach and introducing Zero-Inflated Truncated Gaussian likelihoods for the covariates to better fit the censored distributions.
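
The abstract does not define the Zero-Inflated Truncated Gaussian likelihood, so the following is only one plausible reading, written as a minimal sketch: with probability `pi` a covariate is recorded exactly at the censoring point `limit` (a point mass), and otherwise it follows a Gaussian truncated below at `limit`. The parameterization and the function name `zitg_logpdf` are assumptions, not the thesis's definition.

```python
import math
from statistics import NormalDist


def zitg_logpdf(x, pi, mu, sigma, limit=0.0):
    """Log-density of an assumed zero-inflated, lower-truncated Gaussian.

    pi    : probability of the point mass at `limit` (censored value)
    mu    : mean of the untruncated Gaussian
    sigma : standard deviation of the untruncated Gaussian
    limit : censoring point; values below it have zero density
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if x < limit:
        return -math.inf
    if x == limit:
        return math.log(pi)
    # Gaussian log-density, written out to avoid underflow in pdf().
    z = (x - mu) / sigma
    log_phi = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    # Renormalize by the probability mass above the truncation point.
    tail_mass = 1.0 - NormalDist(mu, sigma).cdf(limit)
    return math.log1p(-pi) + log_phi - math.log(tail_mass)


# Example evaluation with illustrative parameter values.
for value in (0.0, 0.5, 2.0):
    print(value, zitg_logpdf(value, pi=0.3, mu=1.0, sigma=2.0))
```

Under this reading the point mass absorbs the spike of censored values, while the truncated component only has to describe the observed part of the distribution.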

Abstract [sv] (English translation)

In many industrial settings, large amounts of data are available. These data are often incomplete, however, and strategies are needed to make the best use of them for prediction. Much research has addressed missing data in the response variable, the variable to be predicted, while less has been directed at missing values in covariates, the variables used to predict the response. Even less research has focused on so-called censored data. Censored data are a special case of missing data in which the data are partially observed but cannot be fully observed because, for example, values below a specific threshold cannot be measured. This is common in signal data, for example, where the receiver of the signal has a lower limit of detectability.

In this thesis, we contribute to research on censored covariates in prediction models by introducing strategies that are faster and can handle more complex dependencies in the data than existing methods. We approach the problem from several angles, and this work presents methods for predicting data and for recovering both the censored values and the parameters of the data-generating process with good precision.

We compare various traditional methods against one another and evaluate how simple methods for replacing (imputing) censored values affect the uncertainty in predictions, and we present alternatives to making specific decisions under great uncertainty. We show that, under heavy censoring, it can be advantageous not to impute all censored values, thereby achieving shorter computation times. We present how dependencies between covariates can be used to achieve more efficient computations and more precise imputations. Finally, we show how the assumptions on the probability distributions for censored data can be changed in order to impute with better precision. We do this with a method that is fast, flexible for complex data, and able to generate estimates of uncertainty.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022, p. 26
Series
Linköping Studies in Arts and Sciences, ISSN 0282-9800 ; 839
Linköping Studies in Statistics, ISSN 1651-1700 ; 16
Keywords [en]
statistics, machine learning, censored covariates
Keywords [sv]
statistik, maskininlärning, censurerade kovariater
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-187763
DOI: 10.3384/9789179293994
ISBN: 9789179293987 (print)
ISBN: 9789179293994 (electronic)
OAI: oai:DiVA.org:liu-187763
DiVA, id: diva2:1689469
Public defence
2022-10-04, Ada Lovelace, B-huset, Campus Valla, Linköping, 13:15
Opponent
Supervisors
Available from: 2022-08-23 Created: 2022-08-23 Last updated: 2022-09-08 Bibliographically approved
List of papers
1. Inter-Frequency Radio Signal Quality Prediction for Handover, Evaluated in 3GPP LTE
2019 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Radio resource management in cellular networks is typically based on device measurements reported to the serving base station. Frequent measuring of signal quality on available frequencies would allow for highly reliable networks and optimal connection at

Series
IEEE VEHICULAR TECHNOLOGY CONFERENCE, ISSN 1550-2252
National Category
Computer Sciences; Communication Systems
Identifiers
urn:nbn:se:liu:diva-159307 (URN)
10.1109/VTCSpring.2019.8746369 (DOI)
000482655600080 ()
2-s2.0-85068961000 (Scopus ID)
Conference
2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia, 28 April-1 May 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation

Available from: 2019-08-06 Created: 2019-08-06 Last updated: 2022-08-23 Bibliographically approved
2. Selective Imputation of Covariates in High Dimensional Censored Data
2022 (English)
In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 31, no 4, p. 1397-1405
Article in journal (Refereed) Published
Abstract [en]

Efficient modeling of censored data, that is, data which are restricted by some detection limit or truncation, is important for many applications. Ignoring the censoring can be problematic, as valuable information may be missing, and restoration of these censored values may significantly improve the quality of models. There are many scenarios where one may encounter censored data: survival data, interval-censored data, or data with a lower limit of detection. Strategies for handling censored data are plentiful; however, little effort has been made to handle censored data of high dimension. In this article, we present a selective multiple imputation approach for predictive modeling when a large number of covariates are subject to censoring. Our method allows for iterative, subject-wise selection of covariates to impute in order to achieve a fast and accurate predictive model. The algorithm furthermore selects values for imputation which are likely to provide important information if imputed. In contrast to previously proposed methods, our approach is fully nonparametric and therefore very flexible. We demonstrate that, in comparison to previous work, our model achieves faster execution and often comparable accuracy in a simulated example as well as in predicting signal strength in radio network data. Supplementary materials for this article are available online.
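
The abstract gives no algorithmic detail, so the sketch below only illustrates the general idea of subject-wise selective multiple imputation and is not the article's method. It assumes hypothetical per-covariate importance scores and a fixed threshold, imputes only the censored covariates that clear the threshold, and uses a uniform draw below the detection limit as a stand-in for whatever imputation model the article actually uses. All names (`selective_impute`, `floor`, `threshold`) are illustrative.

```python
import random


def selective_impute(row, censored, importance, limit, floor, threshold, m=5, seed=0):
    """Create m imputed copies of one subject's covariate row.

    Only censored covariates whose (assumed) importance score is at least
    `threshold` are imputed; the remaining censored entries keep their
    recorded value. Draws are uniform on [floor, limit], a placeholder
    for a real imputation model.
    """
    rng = random.Random(seed)
    selected = [j for j, is_cens in enumerate(censored)
                if is_cens and importance[j] >= threshold]
    copies = []
    for _ in range(m):
        filled = list(row)
        for j in selected:
            filled[j] = rng.uniform(floor, limit)
        copies.append(filled)
    return selected, copies


# One subject with two censored covariates recorded at the limit.
row = [-105.0, -92.3, -105.0]
censored = [True, False, True]
importance = [0.60, 0.30, 0.05]  # hypothetical scores, e.g. from a fitted model

selected, copies = selective_impute(
    row, censored, importance, limit=-105.0, floor=-125.0, threshold=0.1
)
print("imputed columns:", selected)
for c in copies:
    print(c)
```

Skipping low-importance censored entries is what saves computation in a scheme like this: the number of draws per subject scales with the number of selected covariates rather than with all censored ones.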

Place, publisher, year, edition, pages
Taylor & Francis Inc, 2022
Keywords
Censored covariates; Nonparametric model; Random forest; Wireless networks
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-184118 (URN)
10.1080/10618600.2022.2035233 (DOI)
000772776700001 ()
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation

Available from: 2022-04-07 Created: 2022-04-07 Last updated: 2023-03-06

Open Access in DiVA

Prediction Methods for High Dimensional Data with Censored Covariates (3937 kB), 286 downloads
File information
File name: FULLTEXT01.pdf
File size: 3937 kB
Checksum (SHA-512): d3e9b6f5225b8918d2f003379a88cb9ea4cf48b2290276575fe5463da424e257fd1cc0b868f251d586067fcac29d110d3e6649dd7c50d3e514d0e9359f83ab10
Type: fulltext
Mimetype: application/pdf

Authority records

Svahn, Caroline

Total: 286 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1735 hits