Predicting metastasis in breast cancer: comparing a decision tree with domain experts
Linköping University, Department of Biomedical Engineering, Medical Informatics. Linköping University, The Institute of Technology. (Affiliation of all four authors.)
2007 (English). In: Journal of Medical Systems, ISSN 0148-5598, Vol. 31, no. 4, pp. 263-273. Journal article (Refereed). Published
Abstract [en]

Breast malignancy is the second most common cause of cancer death among women in Western countries. Identifying high-risk patients is vital in order to provide them with specialized treatment. In some situations, such as when access to experienced oncologists is not possible, decision support methods can be helpful in predicting the recurrence of cancer. Three thousand six hundred ninety-nine breast cancer patients admitted in south-east Sweden from 1986 to 1995 were studied. A decision tree was trained with all patients except for 100 cases and tested with those 100 cases. Two domain experts were asked for their opinions about the probability of recurrence of a certain outcome for these 100 patients. ROC curves, area under the ROC curves, and calibration for predictions were computed and compared. After comparing the predictions from a model built by data mining with predictions made by two domain experts, no significant differences were noted. In situations where experienced oncologists are not available, predictive models created with data mining techniques can be used to support physicians in decision making with acceptable accuracy.
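
The evaluation described in the abstract (predictions for 100 held-out cases, compared by ROC area and calibration) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the equal-size binning used for the calibration table, and the toy scores are all assumptions, since the abstract does not say how calibration was computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(labels, scores, n_bins=5):
    """Group cases by predicted score and compare with observed outcomes.

    Returns one (mean predicted probability, observed event rate, size)
    tuple per non-empty bin. Equal-size bins are an assumption here.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Toy data only (hypothetical): 1 = recurrence, 0 = no recurrence.
labels = [0, 0, 1, 1]
tree_scores = [0.1, 0.4, 0.35, 0.8]    # stand-in for decision tree output
expert_scores = [0.2, 0.3, 0.6, 0.7]   # stand-in for an expert's estimates

print("tree AUC:", roc_auc(labels, tree_scores))
print("expert AUC:", roc_auc(labels, expert_scores))
print("tree calibration:", calibration_table(labels, tree_scores, n_bins=2))
```

With real data, the same two functions would be applied to the model's and each expert's probability estimates for the held-out cases, so that all predictors are compared on identical patients.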

Place, publisher, year, edition, pages
2007. Vol. 31, no. 4, pp. 263-273
Keywords [en]
Data mining, Decision tree induction (DTI), Breast cancer, Classification, Prediction, Domain expert, Decision support
National subject category
Biomedical Laboratory Science/Technology
Identifiers
URN: urn:nbn:se:liu:diva-12708
DOI: 10.1007/s10916-007-9064-1
OAI: oai:DiVA.org:liu-12708
DiVA, id: diva2:16892
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-05-12
Included in thesis
1. Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In medicine, data are produced from different sources and continuously stored in data repositories. Examples of these growing databases are quality registries. In Sweden, there are many cancer registries in which data on cancer patients are gathered and recorded; they are used mainly for reporting survival analyses to high-level health authorities.

In this thesis, a breast cancer quality registry operating in south-east Sweden is used as the data source for newer analytical techniques, namely data mining as part of the knowledge discovery in databases (KDD) methodology. The data are analyzed in order to find interesting information and hidden knowledge. KDD consists of multiple steps, starting with gathering data from different sources and preparing them in pre-processing stages prior to data mining.

Outliers and noise were removed from the data, and missing values were handled. A suitable subset of the data was then chosen by canonical correlation analysis (CCA) in a dimensionality reduction step. This technique was chosen because there were multiple outcomes and the variables had complex relationships to one another.

After the data were prepared, they were analyzed with a data mining method. Decision tree induction, a simple and efficient method, was used to mine the data. To show the benefits of proper data pre-processing, results from data mining with pre-processing were compared with results from data mining without it. The comparison showed that data pre-processing results in a more compact model with better performance in predicting the recurrence of cancer.
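
As a rough illustration of what decision tree induction does at each node, the sketch below picks the attribute whose split best separates the outcome classes. It uses entropy-based information gain (ID3 style), which is an assumption: the abstract does not name the induction algorithm or splitting criterion. The field names and the four toy records are hypothetical and carry no clinical meaning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain; a tree grows by
    applying this choice recursively to each resulting subset."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records; field names are illustrative only.
rows = [
    {"nodes_positive": "yes", "grade": "high", "recurrence": 1},
    {"nodes_positive": "yes", "grade": "low", "recurrence": 1},
    {"nodes_positive": "no", "grade": "high", "recurrence": 0},
    {"nodes_positive": "no", "grade": "low", "recurrence": 0},
]

print(best_split(rows, ["nodes_positive", "grade"], "recurrence"))
```

Removing noisy or redundant variables before this step leaves fewer candidate splits, which is one plausible reason a pre-processed dataset could yield a more compact tree.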

An important part of knowledge discovery in medicine is to increase the involvement of medical experts in the process. This starts with inquiring about current problems in their field, which leads to finding areas where computer support can be helpful. The experts can suggest potentially important variables and should then approve and validate new patterns or knowledge as predictive or descriptive models. If it can be shown that the performance of a model is comparable to that of domain experts, it is more probable that the model will be used to support physicians in their daily decision-making. In this thesis, we validated the model by comparing predictions made by data mining with those made by domain experts, without finding any significant difference between them.

Breast cancer patients who are treated with mastectomy are recommended to receive radiotherapy. This treatment is called postmastectomy radiotherapy (PMRT) and there is a guideline for prescribing it. A history of this treatment is stored in breast cancer registries. We analyzed these datasets using rules from a clinical guideline and identified cases that had not been treated according to the PMRT guideline. Data mining revealed some patterns of non-compliance with the PMRT guideline. Further analysis with data mining revealed some reasons for guideline non-compliance. These patterns were then compared with reasons acquired from manual inspection of patient records. The comparisons showed that patterns resulting from data mining were limited to the stored variables in the registry. A prerequisite for better results is availability of comprehensive datasets.
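
The first step described above, applying guideline rules to registry records to flag cases not treated according to the guideline, might look like the sketch below. Everything in it is an assumption for illustration: the record fields, the `example_rule` predicate, and the toy records are hypothetical, and the rule is a placeholder rather than the actual PMRT guideline criteria, which the abstract does not give.

```python
from collections import Counter


def find_noncompliant(records, is_recommended):
    """Return records where the recorded treatment disagrees with the rule.

    is_recommended is any function mapping a record to True/False; the
    guideline logic is passed in rather than hard-coded.
    """
    return [
        r for r in records
        if bool(is_recommended(r)) != bool(r["pmrt_given"])
    ]


def example_rule(record):
    """Placeholder rule (hypothetical), NOT the real PMRT guideline."""
    return record["mastectomy"] and record["risk_group"] == "high"


# Hypothetical registry rows for illustration only.
records = [
    {"id": 1, "mastectomy": True, "risk_group": "high", "pmrt_given": True},
    {"id": 2, "mastectomy": True, "risk_group": "high", "pmrt_given": False},
    {"id": 3, "mastectomy": True, "risk_group": "low", "pmrt_given": False},
    {"id": 4, "mastectomy": False, "risk_group": "high", "pmrt_given": False},
    {"id": 5, "mastectomy": True, "risk_group": "low", "pmrt_given": True},
]

flagged = find_noncompliant(records, example_rule)
print([r["id"] for r in flagged])

# A first look for patterns: how flagged cases distribute over a stored variable.
print(Counter(r["risk_group"] for r in flagged))
```

Any pattern found this way can only involve variables that the registry actually stores, which matches the limitation noted in the abstract.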

Medicine can take advantage of KDD methodology in different ways. The main advantage is being able to reuse information and explore hidden knowledge that can be obtained using advanced analysis techniques. The results depend on good collaboration between medical informaticians and domain experts and on the availability of high-quality data.

Place, publisher, year, edition, pages
Department of Biomedical Engineering, 2007. p. 58
Series
Linköping University Medical Dissertations, ISSN 0345-0082 ; 1018
Keywords
Breast cancer, Clinical guidelines, Canonical correlation analysis, Data Mining, Data pre-processing, Decision tree induction, Knowledge Discovery in Databases
National subject category
Biomedical Laboratory Science/Technology
Identifiers
URN: urn:nbn:se:liu:diva-10142
ISBN: 978-91-85895-81-6
Public defence
2007-11-22, Elsa Brändström, Campus US, Linköping University, Linköping, 09:00 (English)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2020-03-29

Open Access in DiVA

No full text in DiVA

Authors

Razavi, Amir Reza; Gill, Hans; Åhlfeldt, Hans; Shahsavar, Nosrat