Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline
Linköping University, Department of Biomedical Engineering, Medical Informatics. Linköping University, The Institute of Technology.
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In medicine, data are produced from different sources and continuously stored in data repositories. Quality registries are examples of such growing databases. In Sweden, there are many cancer registries in which data on cancer patients are gathered and recorded; these data are used mainly for reporting survival analyses to high-level health authorities.

In this thesis, a breast cancer quality registry operating in south-east Sweden is used as the data source for newer analytical techniques, namely data mining as part of the knowledge discovery in databases (KDD) methodology. The analyses sift through these data to find interesting information and hidden knowledge. KDD consists of multiple steps, starting with gathering data from different sources and preparing them in data pre-processing stages prior to data mining.

Outliers and noise were removed from the data, and missing values were handled. A proper subset of the data was then chosen by canonical correlation analysis (CCA) in a dimensionality reduction step. This technique was chosen because there were multiple outcomes and the variables had complex relationships to one another.
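The cleaning and missing-value steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the field names (`age`, `tumor_size_mm`, `positive_nodes`, `nodes_examined`) and the plausibility limits are hypothetical, not the rules used in the thesis. Missing values are filled with the column mean as a simple stand-in; paper 2 in the list of papers reports using the Expectation Maximization algorithm instead.

```python
from statistics import mean

# Hypothetical logical rules (illustrative only): each returns True when
# the record's value is plausible or already missing.
RULES = {
    "age": lambda r: r["age"] is None or 18 <= r["age"] <= 100,
    "tumor_size_mm": lambda r: r["tumor_size_mm"] is None or r["tumor_size_mm"] > 0,
    # A record cannot have more positive nodes than nodes examined.
    "positive_nodes": lambda r: (
        r["positive_nodes"] is None
        or r["nodes_examined"] is None
        or r["positive_nodes"] <= r["nodes_examined"]
    ),
}

def clean(records):
    """Return copies of the records with implausible values set to None (missing)."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not mutated
        for field, is_plausible in RULES.items():
            if not is_plausible(rec):
                rec[field] = None
        cleaned.append(rec)
    return cleaned

def impute_mean(records, fields):
    """Replace missing values in the given numeric fields with the column mean.

    Simple stand-in for EM-based imputation; modifies and returns `records`.
    """
    for field in fields:
        observed = [r[field] for r in records if r[field] is not None]
        fill = mean(observed) if observed else None
        for r in records:
            if r[field] is None:
                r[field] = fill
    return records

# Made-up records: the second has an impossible age and more positive
# nodes than nodes examined; the third lacks a tumor size.
data = [
    {"age": 54, "tumor_size_mm": 22, "positive_nodes": 1, "nodes_examined": 10},
    {"age": 230, "tumor_size_mm": 15, "positive_nodes": 12, "nodes_examined": 8},
    {"age": 61, "tumor_size_mm": None, "positive_nodes": 0, "nodes_examined": 12},
]
prepared = impute_mean(clean(data), ["age", "tumor_size_mm", "positive_nodes"])
print(prepared[1])
```

Treating rule violations as missing values, rather than deleting whole records, keeps the rest of each patient's data available for the imputation and mining steps.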

After the data were prepared, they were analyzed with a data mining method: decision tree induction, a simple and efficient method, was used to mine the data. To show the benefits of proper data pre-processing, results from data mining with pre-processing were compared with results from data mining without it. The comparison showed that data pre-processing results in a more compact model with better performance in predicting the recurrence of cancer.
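Decision tree induction grows a tree by repeatedly choosing the attribute that best separates the outcome classes. The sketch below shows one common criterion, ID3-style information gain (entropy reduction), for choosing a single split on categorical attributes. The abstract does not state which induction algorithm or splitting criterion was used, and the attributes and labels in the example are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Attribute with the highest information gain (the next node's test)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Made-up toy data: recurrence depends on grade, not on age group.
rows = [
    {"grade": 3, "age_group": "<50"},
    {"grade": 3, "age_group": ">=50"},
    {"grade": 1, "age_group": "<50"},
    {"grade": 1, "age_group": ">=50"},
]
labels = ["yes", "yes", "no", "no"]
print(best_split(rows, labels, ["grade", "age_group"]))  # grade
```

A full induction algorithm applies this choice recursively to each resulting subset until the subsets are pure or a stopping rule (minimum leaf size, pruning) applies.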

An important part of knowledge discovery in medicine is increasing the involvement of medical experts in the process. This starts with an enquiry into current problems in their field, which leads to identifying areas where computer support can be helpful. The experts can suggest potentially important variables and should then approve and validate new patterns or knowledge expressed as predictive or descriptive models. If the performance of a model can be shown to be comparable to that of domain experts, the model is more likely to be used to support physicians in their daily decision-making. In this thesis, we validated the model by comparing predictions made by data mining with those made by domain experts, and found no significant difference between them.

Radiotherapy after mastectomy is recommended for some breast cancer patients. This treatment is called postmastectomy radiotherapy (PMRT), and a guideline specifies when it should be prescribed. A history of this treatment is stored in breast cancer registries. We analyzed these datasets using rules from the clinical guideline and identified cases that had not been treated according to the PMRT guideline. Data mining revealed some patterns of non-compliance with the guideline, and further analysis with data mining revealed some reasons for it. These patterns were then compared with reasons obtained from manual inspection of patient records. The comparison showed that the patterns resulting from data mining were limited to the variables stored in the registry. A prerequisite for better results is the availability of comprehensive datasets.
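Identifying cases not treated according to a guideline amounts to comparing the treatment each patient actually received with what the guideline's rules recommend. The sketch below assumes a hypothetical recommendation rule (four or more positive lymph nodes, or a tumor larger than 50 mm); these thresholds and field names are illustrative only and do not reproduce the PMRT guideline analyzed in the thesis.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    patient_id: str
    positive_nodes: int
    tumor_size_mm: float
    received_pmrt: bool

def pmrt_recommended(p: Patient) -> bool:
    """Hypothetical recommendation rule; thresholds are illustrative only."""
    return p.positive_nodes >= 4 or p.tumor_size_mm > 50

def non_compliant(patients):
    """Patients whose actual treatment differs from the rule's recommendation."""
    return [p for p in patients if pmrt_recommended(p) != p.received_pmrt]

# Made-up cases.
patients = [
    Patient("A", positive_nodes=5, tumor_size_mm=30, received_pmrt=True),   # compliant
    Patient("B", positive_nodes=0, tumor_size_mm=20, received_pmrt=True),   # treated, not recommended
    Patient("C", positive_nodes=1, tumor_size_mm=60, received_pmrt=False),  # recommended, not treated
]
print([p.patient_id for p in non_compliant(patients)])  # ['B', 'C']
```

The flagged cases then form the dataset that data mining searches for recurring non-compliance patterns.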

Medicine can take advantage of KDD methodology in different ways. The main advantage is the ability to reuse information and to uncover hidden knowledge using advanced analysis techniques. The results depend on good collaboration between medical informaticians and domain experts, and on the availability of high-quality data.

Place, publisher, year, edition, pages
Institutionen för medicinsk teknik (Department of Biomedical Engineering), 2007. 58 p.
Series
Linköping University Medical Dissertations, ISSN 0345-0082 ; 1018
Keyword [en]
Breast cancer, Clinical guidelines, Canonical correlation analysis, Data Mining, Data pre-processing, Decision tree induction, Knowledge Discovery in Databases
National Category
Biomedical Laboratory Science/Technology
Identifiers
URN: urn:nbn:se:liu:diva-10142; ISBN: 978-91-85895-81-6 (print); OAI: oai:DiVA.org:liu-10142; DiVA: diva2:16895
Public defence
2007-11-22, Elsa Brändström, Campus US, Linköpings universitet, Linköping, 09:00 (English)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-05-12
List of papers
1. Exploring cancer register data to find risk factors for recurrence of breast cancer: Application of Canonical Correlation Analysis
2005 (English). In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, Vol. 5, no 29, 29-35 p. Article in journal (Refereed), Published
Abstract [en]

Background

A common approach in exploring register data is to find relationships between outcomes and predictors by using multiple regression analysis (MRA). If there is more than one outcome variable, the analysis must then be repeated, and the results combined in some arbitrary fashion. In contrast, Canonical Correlation Analysis (CCA) has the ability to analyze multiple outcomes at the same time.

One essential outcome after breast cancer treatment is recurrence of the disease. It is important to understand the relationship between different predictors and recurrence, including the time interval until recurrence. This study describes the application of CCA to find important predictors for two different outcomes in breast cancer patients (loco-regional recurrence and occurrence of distant metastasis), and to decrease the number of variables in the sets of predictors and outcomes without decreasing the predictive strength of the model.
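For reference, the standard formulation of CCA is sketched below. Here x denotes the vector of predictor variables, y the vector of outcome variables, Sigma_xx and Sigma_yy their covariance matrices, and Sigma_xy (the transpose of Sigma_yx) their cross-covariance matrix; this notation is introduced for illustration and is not taken from the paper. CCA seeks weight vectors a and b whose linear combinations are maximally correlated:

```latex
\[
\rho_1 = \max_{\mathbf{a},\,\mathbf{b}}
  \operatorname{corr}\!\left(\mathbf{a}^\top\mathbf{x},\ \mathbf{b}^\top\mathbf{y}\right)
  = \max_{\mathbf{a},\,\mathbf{b}}
  \frac{\mathbf{a}^\top \Sigma_{xy}\, \mathbf{b}}
       {\sqrt{\mathbf{a}^\top \Sigma_{xx}\, \mathbf{a}}\;
        \sqrt{\mathbf{b}^\top \Sigma_{yy}\, \mathbf{b}}}
\]
% Setting the derivatives to zero gives \Sigma_{xy} b = \rho \Sigma_{xx} a and
% \Sigma_{yx} a = \rho \Sigma_{yy} b; eliminating b yields the eigenproblem
\[
\Sigma_{xx}^{-1}\, \Sigma_{xy}\, \Sigma_{yy}^{-1}\, \Sigma_{yx}\, \mathbf{a} = \rho^2\, \mathbf{a}
\]
```

Further canonical pairs are obtained under the constraint that each is uncorrelated with the earlier ones. With a single outcome variable, rho_1 reduces to the multiple correlation coefficient R of multiple regression, which is why CCA can be viewed as a generalization of MRA to several outcomes analyzed at once.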

Methods

Data on 637 patients with malignant breast cancer admitted in the south-east region of Sweden were analyzed. The relationships between tumor characteristics and the two outcomes during different time intervals were analyzed with CCA by examining the structure coefficients (loadings), and a correlation model was built.
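A structure coefficient (loading) is the correlation between an original variable and a canonical variate, that is, the weighted sum of the variables in its set. The sketch below assumes the canonical weights have already been obtained from a CCA routine (the weights and data in the example are made up) and computes each variable's loading; it then keeps variables whose absolute loading reaches a cut-off. The 0.3 default is a commonly used rule of thumb assumed here, not a value stated in the abstract.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def canonical_variate(rows, weights):
    """Variate score per observation: weighted sum of that row's variables.

    Assumes the weights apply to the variables on the scale given
    (CCA weights are usually reported for standardized variables).
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

def structure_coefficients(rows, weights):
    """Loading of each variable: its correlation with the canonical variate."""
    u = canonical_variate(rows, weights)
    return [pearson(list(column), u) for column in zip(*rows)]

def select_variables(names, loadings, cutoff=0.3):
    """Keep variables whose absolute loading is at least `cutoff` (assumed rule of thumb)."""
    return [name for name, r in zip(names, loadings) if abs(r) >= cutoff]

# Made-up data: 4 observations of 3 hypothetical predictors, made-up weights.
names = ["var_1", "var_2", "var_3"]
rows = [[1, 2, 5], [2, 1, 3], [3, 4, 5], [4, 3, 3]]
weights = [0.7, 0.3, 0.1]
loadings = structure_coefficients(rows, weights)
print([round(r, 2) for r in loadings])
print(select_variables(names, loadings, cutoff=0.5))
```

Loadings are preferred over raw weights for interpretation because weights can be unstable when predictors are correlated, whereas a loading directly shows how much a variable shares with the variate.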

Results

The analysis successfully detected known predictors for breast cancer recurrence during the first two years and distant metastasis 2–4 years after diagnosis. Nottingham Histologic Grading (NHG) was the most important predictor, while age of the patient at the time of diagnosis was not an important predictor.

Conclusion

In cancer registers with high dimensionality, CCA can be used to identify the importance of risk factors for breast cancer recurrence. By reducing the number of variables to the important ones, this technique can produce a model ready for further processing by data mining methods.

National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:liu:diva-12706 (URN); 10.1186/1472-6947-5-29 (DOI)
Available from: 2009-02-22 Created: 2008-10-24 Last updated: 2009-03-10. Bibliographically approved
2. A Data Pre-processing Method to Increase Efficiency and Accuracy in Data Mining
2005 (English). In: 10th Conference on Artificial Intelligence in Medicine, AIME 2005, Aberdeen, UK, 2005, 434-443 p. Conference paper, Published paper (Other academic)
Abstract [en]

In medicine, data mining methods such as Decision Tree Induction (DTI) can be trained for extracting rules to predict the outcomes of new patients. However, incompleteness and high dimensionality of stored data are a problem. Canonical Correlation Analysis (CCA) can be used prior to DTI as a dimension reduction technique to preserve the character of the original data by omitting non-essential data. In this study, data from 3949 breast cancer patients were analysed. Raw data were cleaned by running a set of logical rules. Missing values were replaced using the Expectation Maximization algorithm. After dimension reduction with CCA, DTI was employed to analyse the resulting dataset. The validity of the predictive model was confirmed by ten-fold cross validation and the effect of pre-processing was analysed by applying DTI to data without pre-processing. Replacing missing values and using CCA for data reduction dramatically reduced the size of the resulting tree and increased the accuracy of the prediction of breast cancer recurrence.
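Ten-fold cross validation splits the data into ten folds, trains on nine of them, tests on the held-out fold, and averages the ten test scores. The sketch below shows plain (unstratified) splitting logic; `train_and_score` is a hypothetical caller-supplied function standing in for decision-tree training and evaluation, and the abstract does not say whether the folds were stratified by outcome.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, train_and_score, k=10, seed=0):
    """Mean score over k train/test splits.

    `train_and_score(train_idx, test_idx)` is hypothetical: it should fit a
    model on the training indices and return its score on the test indices.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = every index that is not in the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k

# Usage with a dummy scorer that just reports the test-fold size.
print(cross_validate(3949, lambda train, test: len(test)))
```

Every patient appears in exactly one test fold, so each prediction used in the accuracy estimate comes from a model that did not see that patient during training.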

Series
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online); 3581
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-12707 (URN); 10.1007/11527770_59 (DOI); 978-3-540-27831-3 (ISBN)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-05-29
3. Predicting metastasis in breast cancer: comparing a decision tree with domain experts
2007 (English). In: Journal of Medical Systems, ISSN 0148-5598, Vol. 31, no 4, 263-273 p. Article in journal (Refereed), Published
Abstract [en]

Breast malignancy is the second most common cause of cancer death among women in Western countries. Identifying high-risk patients is vital in order to provide them with specialized treatment. In some situations, such as when access to experienced oncologists is not possible, decision support methods can be helpful in predicting the recurrence of cancer. A total of 3699 breast cancer patients admitted in south-east Sweden from 1986 to 1995 were studied. A decision tree was trained on all patients except 100 cases and tested on those 100 cases. Two domain experts were asked for their opinions about the probability of a given recurrence outcome for these 100 patients. ROC curves, the areas under the ROC curves, and the calibration of the predictions were computed and compared. No significant differences were found between the predictions of the model built by data mining and those made by the two domain experts. In situations where experienced oncologists are not available, predictive models created with data mining techniques can be used to support physicians in decision making with acceptable accuracy.
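The area under the ROC curve used in this comparison equals the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it (the Mann-Whitney formulation). The sketch below computes it directly from paired predictions and outcomes, counting ties as one half. The scores in the example are invented, and neither the statistical test for comparing two AUCs nor the calibration analysis is shown.

```python
def auc(scores, outcomes):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. `outcomes` holds 1 (outcome
    occurred) or 0 for each case, aligned with `scores`.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: 5 patients, predicted risks from a model and an expert.
outcomes = [1, 1, 0, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
expert_scores = [0.8, 0.8, 0.8, 0.3, 0.1]
print(round(auc(model_scores, outcomes), 3))   # 0.833
print(round(auc(expert_scores, outcomes), 3))  # 0.833
```

The pairwise loop is quadratic in the number of cases, which is unproblematic for a 100-case test set; larger sets would normally use a rank-based computation.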

Keyword
Data mining, Decision tree induction (DTI), Breast cancer, Classification, Prediction, Domain expert, Decision support
National Category
Biomedical Laboratory Science/Technology
Identifiers
urn:nbn:se:liu:diva-12708 (URN); 10.1007/s10916-007-9064-1 (DOI)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-05-12
4. A Data Mining Approach to Analyze Non-compliance with a Guideline for the Treatment of Breast Cancer
2007 (English). In: Studies in Health Technology and Informatics, ISSN 0926-9630, Vol. 129, 591-597 p. Article in journal (Refereed), Published
Abstract [en]

Postmastectomy radiotherapy (PMRT) is prescribed in order to reduce the local recurrence of breast cancer and improve overall survival. A guideline supports the trade-off between the benefits and adverse effects of PMRT. However, this guideline is not always followed in practice. This study seeks a method for revealing patterns of non-compliance of the actual treatment with the PMRT guideline.

Data from breast cancer patients admitted to Linköping University Hospital between 1990 and 2000 were analyzed in this study. Cases that were not treated in accordance with the guideline were selected and analyzed by decision tree induction (DTI). Thereafter, four resulting rules, as representations for groups of patients, were compared to the guideline.

Finding patterns of non-compliance with guidelines by means of rules can be an appropriate alternative to manual methods (i.e. case-by-case comparison) when studying very large datasets. The resulting rules can be used in the knowledge base of a guideline-based decision support system to issue alerts when inconsistencies with the guideline may appear.

National Category
Biomedical Laboratory Science/Technology
Identifiers
urn:nbn:se:liu:diva-12709 (URN)
Available from: 2007-10-30 Created: 2007-10-30 Last updated: 2009-03-16
5. Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis
2008 (English). In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, Vol. 8, no 41. Article in journal (Refereed), Published
Abstract [en]

Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance is crucial for maintaining the quality of care in oncology.

Methods: Decision tree induction (DTI) was applied to data from 759 patients with malignant breast cancer to find patterns of non-compliance with the guideline. The PMRT guideline was used to separate cases according to the recommendation to receive or not receive PMRT, and the two groups of patients were analyzed separately. The resulting patterns were transformed into rules, which were then compared with the reasons extracted by manual inspection of the records of the non-compliant cases.
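Transforming a decision tree into rules means turning each root-to-leaf path into one IF-THEN rule whose conditions are the tests along that path. The sketch below assumes a simple nested-dictionary tree representation; the tree in the example is made up for illustration and is not the tree reported in the paper.

```python
# A leaf is a class label (str); an internal node is a dict of the form
# {"attribute": name, "branches": {test_text: subtree, ...}}.
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(tree, str):
        return [(list(conditions), tree)]
    rules = []
    for test, subtree in tree["branches"].items():
        condition = f"{tree['attribute']} {test}"
        rules.extend(tree_to_rules(subtree, conditions + (condition,)))
    return rules

def format_rule(conditions, label):
    """Render a path as a readable IF-THEN rule."""
    premise = " AND ".join(conditions) if conditions else "TRUE"
    return f"IF {premise} THEN {label}"

# Hypothetical example tree (not the one reported in the paper).
example_tree = {
    "attribute": "perigland_growth",
    "branches": {
        "= yes": "non-compliant",
        "= no": {
            "attribute": "positive_nodes",
            "branches": {"<= 3": "compliant", "> 3": "non-compliant"},
        },
    },
}

# Keep only the leaves that represent non-compliance.
for conds, label in tree_to_rules(example_tree):
    if label == "non-compliant":
        print(format_rule(conds, label))
```

Rules in this form can be compared one by one with the reasons found by manual record inspection, or stored in a rule-based knowledge base.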

Results: Analyzing the group of patients who should receive PMRT according to the guideline did not result in a robust decision tree. However, classification of the other group, patients who should not receive PMRT according to the guideline, resulted in a tree with nine leaves, three of which represented non-compliance with the guideline. Comparing the rules derived from these three non-compliant patterns with manual inspection of patient records showed the following:

In the decision tree, presence of perigland growth is the most important variable, followed by the number of malignantly invaded lymph nodes and the progesterone receptor level. DNA index, age, tumor size and estrogen receptor level are also involved, but with less importance. In the manual inspection of the cases, the most frequent pattern of non-compliance is age above the threshold, followed by risk factors with values near the cut-off, and unknown reasons.

Conclusion: Comparison of patterns of non-compliance acquired from data mining and manual inspection of patient records demonstrates that not all of the non-compliances are repetitive or important. There are some overlaps between important variables acquired from manual inspection of patient records and data mining but they are not identical. Data mining can highlight non-compliance patterns valuable for guideline authors and for medical audit. Improving guidelines by using feedback from data mining can improve the quality of care in oncology.

National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:liu:diva-15222 (URN)
Note
Original publication: Amir R Razavi, Hans Gill, Hans Åhlfeldt and Nosrat Shahsavar, Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis, 2008, BMC Medical Informatics and Decision Making, (8), 41. http://dx.doi.org/10.1186/1472-6947-8-41. Copyright: The authors.
Available from: 2008-10-24 Created: 2008-10-24 Last updated: 2012-02-22. Bibliographically approved

Author: Razavi, Amir Reza
