Automation of a Data Analysis Pipeline for High-content Screening Data
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
High-content screening is a part of the drug discovery pipeline dealing with the identification of substances that affect cells in a desired manner. Biological assays with a large set of compounds are developed and screened and the output is generated with a multidimensional structure. Data analysis is performed manually by an expert with a set of tools and this is considered to be too time consuming and unmanageable when the amount of data grows large. This thesis therefore investigates and proposes a way of automating the data analysis phase through a set of machine learning algorithms. The resulting implementation is a cloud based application that can support the user with the selection of which features that are relevant for further analysis. It also provides techniques for automated processing of the dataset and training of classification models which can be utilised for predicting sample labels. An investigation of the workflow for analysing data was conducted before this thesis. It resulted in a pipeline that maps the different tools and software to what goal they fulfil and which purpose they have for the user. This pipeline was then compared with a similar pipeline but with the implemented application included. This comparison demonstrates clear advantages in contrast to previous methodologies in that the application will provide support to work in a more automated way of performing data analysis.
Place, publisher, year, edition, pages
2015. , 77 p.
Machine learning, Datateknik
Media and Communication Technology
IdentifiersURN: urn:nbn:se:liu:diva-122913ISRN: LIU-ITN-TEK-A--15/053--SEOAI: oai:DiVA.org:liu-122913DiVA: diva2:874880
Subject / course