Exploring cancer register data to find risk factors for recurrence of breast cancer: Application of Canonical Correlation Analysis
Linköpings universitet, Institutionen för medicinsk teknik, Medicinsk informatiK. Linköpings universitet, Tekniska högskolan.
Linköpings universitet, Institutionen för medicinsk teknik, Medicinsk informatiK. Linköpings universitet, Tekniska högskolan.
Linköpings universitet, Institutionen för klinisk och experimentell medicin, Onkologi. Linköpings universitet, Hälsouniversitetet.
Department of Surgery, County Hospital, Kalmar, Sweden.
2005 (English). In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, Vol. 5, no. 29, p. 29-35. Article in journal (Refereed). Published
Abstract [en]

Background

A common approach in exploring register data is to find relationships between outcomes and predictors by using multiple regression analysis (MRA). If there is more than one outcome variable, the analysis must then be repeated, and the results combined in some arbitrary fashion. In contrast, Canonical Correlation Analysis (CCA) has the ability to analyze multiple outcomes at the same time.

One essential outcome after breast cancer treatment is recurrence of the disease. It is important to understand the relationship between different predictors and recurrence, including the time interval until recurrence. This study describes the application of CCA with two aims. The first is to find important predictors for two different outcomes for breast cancer patients: loco-regional recurrence and occurrence of distant metastasis. The second is to decrease the number of variables in the sets of predictors and outcomes without decreasing the predictive strength of the model.

Methods

Data for 637 malignant breast cancer patients admitted in the south-east region of Sweden were analyzed. By using CCA and looking at the structure coefficients (loadings), relationships between tumor specifications and the two outcomes during different time intervals were analyzed and a correlation model was built.
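As a rough illustration of the technique named above, the following sketch computes canonical correlations and structure coefficients (loadings) with NumPy. It is not the authors' implementation: the function name `cca_with_loadings`, the QR/SVD formulation, and the synthetic data are all assumptions made for this example, and no registry data are used.

```python
import numpy as np


def cca_with_loadings(X, Y):
    """Canonical correlations plus structure coefficients (loadings).

    X: (n, p) predictor matrix, Y: (n, q) outcome matrix. Both are
    assumed to have full column rank and more rows than columns.
    """
    n = X.shape[0]
    # Standardize every column (mean 0, sample standard deviation 1).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

    # QR gives orthonormal bases of the two column spaces; the singular
    # values of Qx^T Qy are the canonical correlations.
    Qx, _ = np.linalg.qr(Xs)
    Qy, _ = np.linalg.qr(Ys)
    U, corrs, Vt = np.linalg.svd(Qx.T @ Qy)

    k = min(X.shape[1], Y.shape[1])
    # Canonical variates (scores): zero-mean columns of unit length.
    scores_x = Qx @ U[:, :k]
    scores_y = Qy @ Vt.T[:, :k]

    # A structure coefficient is the correlation between an original
    # variable and a canonical variate. Standardized columns have length
    # sqrt(n - 1) and the scores have length 1, hence the scaling.
    load_x = Xs.T @ scores_x / np.sqrt(n - 1)
    load_y = Ys.T @ scores_y / np.sqrt(n - 1)
    return corrs[:k], load_x, load_y


# Synthetic stand-in data (hypothetical): one predictor and one outcome
# share a common signal, the remaining columns are pure noise.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
X = np.column_stack([
    shared + 0.5 * rng.normal(size=n),  # informative predictor
    rng.normal(size=n),                 # noise predictor
    rng.normal(size=n),                 # noise predictor
])
Y = np.column_stack([
    shared + 0.5 * rng.normal(size=n),  # outcome linked to the predictor
    rng.normal(size=n),                 # unrelated outcome
])

corrs, load_x, load_y = cca_with_loadings(X, Y)
print("canonical correlations:", np.round(corrs, 3))
print("predictor loadings on first variate:", np.round(load_x[:, 0], 3))
```

In this toy setting the informative predictor should carry the largest absolute loading on the first canonical variate, which is the kind of signal one would use to rank predictors and drop unimportant ones.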

Results

The analysis successfully detected known predictors for breast cancer recurrence during the first two years and distant metastasis 2–4 years after diagnosis. Nottingham Histologic Grading (NHG) was the most important predictor, while age of the patient at the time of diagnosis was not an important predictor.

Conclusion

In cancer registers with high dimensionality, CCA can be used for identifying the importance of risk factors for breast cancer recurrence. This technique can result in a model ready for further processing by data mining methods through reducing the number of variables to important ones.

Place, publisher, year, edition, pages
2005. Vol. 5, no. 29, p. 29-35
Identifiers
URN: urn:nbn:se:liu:diva-12706
DOI: 10.1186/1472-6947-5-29
OAI: oai:DiVA.org:liu-12706
DiVA, id: diva2:16890
Available from: 2009-02-22 Created: 2008-10-24 Last updated: 2009-03-10 Bibliographically approved
In thesis
1. Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline
2007 (English). Doctoral thesis, comprising articles (Other academic)
Abstract [en]

In medicine, data are produced from different sources and continuously stored in data repositories. Examples of these growing databases are quality registries. In Sweden, there are many cancer registries where data on cancer patients are gathered and recorded, and these data are used mainly for reporting survival analyses to high-level health authorities.

In this thesis, a breast cancer quality registry operating in the south-east of Sweden is used as the data source for newer analytical techniques, i.e. data mining as a part of knowledge discovery in databases (KDD) methodology. Analyses are done to sift through these data in order to find interesting information and hidden knowledge. KDD consists of multiple steps, starting with gathering data from different sources and preparing them in data pre-processing stages prior to data mining.

Outliers and noise were removed from the data, and missing values were handled. Then a proper subset of the data was chosen by canonical correlation analysis (CCA) in a dimensionality reduction step. This technique was chosen because there were multiple outcomes and the variables had complex relationships to one another.

After the data were prepared, they were analyzed with a data mining method. Decision tree induction, a simple and efficient method, was used to mine the data. To show the benefits of proper data pre-processing, results from data mining with pre-processing of the data were compared with results from data mining without data pre-processing. The comparison showed that data pre-processing results in a more compact model with better performance in predicting the recurrence of cancer.
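To make the idea of decision tree induction concrete, here is a minimal sketch of an information-gain (ID3-style) tree on categorical features. The abstract does not say which induction algorithm or software was used, so this is only one possible variant. The feature names `grade` and `nodes`, the labels, and the six toy rows are invented for illustration and are not registry data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature whose split gives the largest information gain."""
    base = entropy(labels)

    def gain(feature):
        remainder = 0.0
        for value in set(row[feature] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, labels, features):
    """Return a leaf label, or a (feature, branches, majority_label) node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in set(row[feature] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return (feature, branches, majority)


def predict(tree, row):
    """Walk the tree; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        tree = branches.get(row[feature], majority)
    return tree


# Invented toy data (hypothetical feature names and values).
rows = [
    {"grade": "high", "nodes": "positive"},
    {"grade": "high", "nodes": "negative"},
    {"grade": "low", "nodes": "positive"},
    {"grade": "low", "nodes": "negative"},
    {"grade": "medium", "nodes": "positive"},
    {"grade": "medium", "nodes": "negative"},
]
labels = [
    "recurrence", "recurrence",
    "no recurrence", "no recurrence",
    "recurrence", "no recurrence",
]

tree = build_tree(rows, labels, ["grade", "nodes"])
predictions = [predict(tree, row) for row in rows]
print("root split:", tree[0])
```

With fewer candidate features after a reduction step, a tree of this kind has fewer possible splits to consider, which is one way a pre-processed data set can lead to a more compact model.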

An important part of knowledge discovery in medicine is to increase the involvement of medical experts in the process. This starts with enquiry about current problems in their field, which leads to finding areas where computer support can be helpful. The experts can suggest potentially important variables and should then approve and validate new patterns or knowledge as predictive or descriptive models. If it can be shown that the performance of a model is comparable to that of domain experts, it is more probable that the model will be used to support physicians in their daily decision-making. In this thesis, we validated the model by comparing predictions made by data mining with those made by domain experts, without finding any significant difference between them.
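The abstract does not state which statistical test was used for the comparison with domain experts. As one possible approach for paired binary predictions on the same cases, the sketch below applies an exact McNemar test to the discordant cases. The function names and the example vectors are hypothetical.

```python
from math import comb


def discordant_counts(truth, pred_a, pred_b):
    """Count cases where exactly one of the two predictors is correct."""
    a_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a != t and b == t)
    return a_only, b_only


def mcnemar_exact_p(a_only, b_only):
    """Two-sided exact McNemar p-value (binomial test with p = 0.5)."""
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented example: 1 = recurrence, 0 = no recurrence.
truth = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
expert = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]

model_only, expert_only = discordant_counts(truth, model, expert)
p_value = mcnemar_exact_p(model_only, expert_only)
print("discordant cases:", model_only, expert_only, "p-value:", p_value)
```

A large p-value here means only that the data give no evidence of a difference between the two sets of predictions. It does not by itself establish that they are equivalent.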

Breast cancer patients who are treated with mastectomy are recommended to receive radiotherapy. This treatment is called postmastectomy radiotherapy (PMRT) and there is a guideline for prescribing it. A history of this treatment is stored in breast cancer registries. We analyzed these datasets using rules from a clinical guideline and identified cases that had not been treated according to the PMRT guideline. Data mining revealed some patterns of non-compliance with the PMRT guideline. Further analysis with data mining revealed some reasons for guideline non-compliance. These patterns were then compared with reasons acquired from manual inspection of patient records. The comparisons showed that patterns resulting from data mining were limited to the stored variables in the registry. A prerequisite for better results is availability of comprehensive datasets.
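The first step described above, applying guideline rules to registry records to flag deviating cases, could be sketched as follows. The criteria in `pmrt_indicated` are placeholders chosen for illustration and are not taken from the guideline used in the thesis. The record fields and patient records are likewise invented, and only one direction of non-compliance (treatment indicated but not recorded) is checked.

```python
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    # Hypothetical registry fields; real registries will differ.
    patient_id: str
    mastectomy: bool
    positive_nodes: int
    tumor_size_mm: float
    received_pmrt: bool


def pmrt_indicated(rec: RegistryRecord) -> bool:
    """Placeholder rule, NOT the actual clinical guideline."""
    return rec.mastectomy and (rec.positive_nodes >= 4 or rec.tumor_size_mm > 50)


def find_non_compliant(records):
    """Return records where PMRT was indicated by the rule but not given."""
    return [r for r in records if pmrt_indicated(r) and not r.received_pmrt]


# Invented example records.
records = [
    RegistryRecord("P1", True, 5, 22.0, True),    # indicated and treated
    RegistryRecord("P2", True, 6, 30.0, False),   # indicated, not treated
    RegistryRecord("P3", True, 0, 15.0, False),   # not indicated
    RegistryRecord("P4", False, 7, 60.0, False),  # no mastectomy
]

flagged = find_non_compliant(records)
print("flagged:", [r.patient_id for r in flagged])
```

A check like this can only use variables that are stored in the registry, which matches the limitation noted above: reasons for deviation that are documented only in patient records remain invisible to it.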

Medicine can take advantage of KDD methodology in different ways. The main advantage is being able to reuse information and explore hidden knowledge that can be obtained using advanced analysis techniques. The results depend on good collaboration between medical informaticians and domain experts and the availability of high quality data.

Place, publisher, year, edition, pages
Institutionen för medicinsk teknik, 2007. p. 58
Series
Linköping University Medical Dissertations, ISSN 0345-0082 ; 1018
Keywords
Breast cancer, Clinical guidelines, Canonical correlation analysis, Data Mining, Data pre-processing, Decision tree induction, Knowledge Discovery in Databases
Identifiers
urn:nbn:se:liu:diva-10142 (URN)
978-91-85895-81-6 (ISBN)
Public defence
2007-11-22, Elsa Brändström, Campus US, Linköpings universitet, Linköping, 09:00 (English)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-05-12

Open Access in DiVA

Full text (306 kB), 1389 downloads
File information
File: FULLTEXT01.pdf. File size: 306 kB. Checksum: SHA-512
5eb14fc4a5bac29e3fb996440b3306dc6c0a781e8a72d1356a128b50f9a6ccd1dfa6ff87f0aaa121900f118218c0e6ee57ab30c34bc46d73e9b35084e850784b
Type: fulltext. Mimetype: application/pdf

Authors

Razavi, Amir Reza; Gill, Hans; Stål, Olle; Sundquist, Marie; Thorstenson, Sten; Åhlfeldt, Hans; Shahsavar, Nosrat
