Bioinformatic methods for characterization of viral pathogens in metagenomic samples
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Virus infections impose a huge disease burden on humanity, and new viruses are continuously found. As most studies of viral disease are limited to the investigation of known viruses, it is important to characterize all circulating viruses. Thus, a broad and unselective exploration of the virus flora would be the most productive development of modern virology. Fueled by the reduction in sequencing costs and the unbiased nature of shotgun sequencing, viral metagenomics has rapidly become the strategy of choice for this exploration.

This thesis mainly focuses on improving key methods used in viral metagenomics, as well as on the complete viral characterization of two sets of samples using these methods. The major methods developed are an efficient automated analysis pipeline for metagenomics data and two novel, more accurate alignment algorithms for 454 sequencing data. The automated pipeline facilitates rapid, complete and effortless analysis of metagenomics samples, which in turn enables detection of potential pathogens, for instance in patient samples. The two new alignment algorithms cover comparisons against both nucleotide and protein databases, while retaining the underlying 454 data representation. Furthermore, a simulator for 454 data was developed in order to evaluate these methods. This simulator is currently the fastest and most complete simulator of 454 data, which enables further development of algorithms and methods. Finally, we have successfully used these methods to fully characterize a multitude of samples, including samples collected from children suffering from severe lower respiratory tract infections as well as from patients diagnosed with chronic fatigue syndrome, both of which are presented in this thesis. In these studies, a complete viral characterization has revealed the presence of both expected and unexpected viral pathogens as well as many potential novel viruses.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2013. 65 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1489
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:liu:diva-86194
ISBN: 978-91-7519-745-6 (print)
OAI: oai:DiVA.org:liu-86194
DiVA: diva2:575554
Public defence
2013-01-25, Planck, Fysikhuset, Campus Valla, Linköpings universitet, Linköping, 10:15 (English)
Available from: 2012-12-10 Created: 2012-12-10 Last updated: 2012-12-10. Bibliographically approved
List of papers
1. An efficient simulator of 454 data using configurable statistical models
2011 (English). In: BMC Research Notes, ISSN 1756-0500, E-ISSN 1756-0500, Vol. 4, no 449. Article in journal (Refereed). Published
Abstract [en]

Background

Roche 454 is one of the major second-generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulation of these data is therefore useful, in order to further assess how bioinformatic applications and algorithms handle 454 data.

Findings

We developed a new application named 454sim for simulation of 454 data at high speed and accuracy. The program is multi-thread capable and is available as C++ source code or pre-compiled binaries. Sequence reads are simulated by 454sim using a set of statistical models for each chemistry. 454sim simulates recorded peak intensities and peak quality deterioration, and it calculates quality values. All three generations of the Roche 454 chemistry ('GS20', 'GS FLX' and 'Titanium') are supported and defined in external text files for easy access and tweaking.

Conclusions

We present a new platform-independent application named 454sim. 454sim is generally 200 times faster than previous programs and it allows for simple adjustments of the statistical models. These improvements make it possible to carry out more complex and rigorous algorithm evaluations on a reasonable time scale.
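
To make the idea of simulating 454 flow data concrete, the following is a minimal Python sketch. It is not 454sim's code and does not reproduce its statistical models. It assumes a cyclic flow order of TACG and a Gaussian noise model whose spread grows with homopolymer length; the function names and the `sigma` and `growth` parameters are hypothetical and chosen only for illustration.

```python
import random

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Return the noise-free homopolymer length observed in each flow."""
    unknown = set(seq) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows, pos, i = [], 0, 0
    while pos < len(seq):
        base = flow_order[i % len(flow_order)]
        run = 0
        # Consume every consecutive copy of the flowed base.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
        i += 1
    return flows


def noisy_flowgram(seq, sigma=0.1, growth=0.03, seed=None):
    """Add Gaussian noise to each flow (illustrative noise model only)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(n, sigma + growth * n))
            for n in ideal_flowgram(seq)]


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Round each flow value to a homopolymer length and emit the bases."""
    return "".join(flow_order[i % len(flow_order)] * round(v)
                   for i, v in enumerate(flowgram))
```

Rounding a noisy flow value to the wrong integer inserts or deletes a base in a homopolymer run, which is why errors in this kind of data appear as gaps rather than substitutions.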

National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:liu:diva-79435 (URN), 10.1186/1756-0500-4-449 (DOI)
Available from: 2012-07-24 Created: 2012-07-24 Last updated: 2017-12-07
2. FAAST: Flow-space Assisted Alignment Search Tool
2011 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no 293. Article in journal (Refereed). Published
Abstract [en]

Background: High-throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long-read, high-throughput data. While most other sequencing techniques produce reading errors mainly comparable with substitutions, pyrosequencing produces errors mainly comparable with gaps. These errors are less efficiently detected by most conventional alignment programs and may produce inaccurate alignments.

Results: We suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool).

Conclusions: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high-quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/
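
For reference, the Smith-Waterman-Gotoh baseline named in this abstract can be sketched in a few lines of Python. This is a plain base-space local alignment with affine gap penalties, not FAAST's flow-space algorithm; the function name and the scoring values are arbitrary assumptions made for illustration.

```python
def swg_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    The scoring values are illustrative assumptions only.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one, in either sequence.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these assumed scores, `swg_score("ACGTTTACG", "ACGTTACG")` returns 13: eight matches minus one gap opening for the single-base homopolymer difference. The penalty is the same wherever the gap falls, which is the limitation a flow-aware scoring scheme is meant to address.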

Place, publisher, year, edition, pages
BioMed Central, 2011
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-70511 (URN), 10.1186/1471-2105-12-293 (DOI), 000294177700001 ()
Note
Funding agencies: Swedish Research Council, the Research School of Medical Bioinformatics
Available from: 2011-09-12 Created: 2011-09-12 Last updated: 2017-12-08
3. An unbiased metagenomic search for infectious agents using monozygotic twins discordant for chronic fatigue
2011 (English). In: BMC Microbiology, ISSN 1471-2180, E-ISSN 1471-2180, Vol. 11, no 2. Article in journal (Refereed). Published
Abstract [en]

Background: Chronic fatigue syndrome is an idiopathic syndrome widely suspected of having an infectious or immune etiology. We applied an unbiased metagenomic approach to try to identify known or novel infectious agents in the serum of 45 cases with chronic fatigue syndrome or idiopathic chronic fatigue. Controls were the unaffected monozygotic co-twins of cases, and serum samples were obtained at the same place and time.

Results: No novel DNA or RNA viral signatures were confidently identified. Four affected twins and no unaffected twins evidenced viremia with GB virus C (8.9% vs. 0%, p = 0.019), and one affected twin had previously undetected hepatitis C viremia. An excess of GB virus C viremia in cases with chronic fatigue requires confirmation.

Conclusions: Current, impairing chronic fatigue was not robustly associated with viremia detectable in serum.

Place, publisher, year, edition, pages
BMC, 2011
National Category
Natural Sciences
Identifiers
urn:nbn:se:liu:diva-65704 (URN), 10.1186/1471-2180-11-2 (DOI), 000286331700001 ()
Note
Original Publication: Patrick F Sullivan, Tobias Allander, Fredrik Lysholm, Shan Goh, Bengt Persson, Andreas Jacks, Birgitta Evengård, Nancy L Pedersen and Björn Andersson, An unbiased metagenomic search for infectious agents using monozygotic twins discordant for chronic fatigue, 2011, BMC Microbiology, (11), 2. http://dx.doi.org/10.1186/1471-2180-11-2 Licensee: BioMed Central http://www.biomedcentral.com/
Available from: 2011-02-17 Created: 2011-02-17 Last updated: 2017-12-11
4. Characterization of the Viral Microbiome in Patients with Severe Lower Respiratory Tract Infections, Using Metagenomic Sequencing
2012 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 7, no 2. Article in journal (Refereed). Published
Abstract [en]

The human respiratory tract is heavily exposed to microorganisms. Viral respiratory tract pathogens, like RSV, influenza and rhinoviruses, cause major morbidity and mortality from respiratory tract disease. Furthermore, as viruses have limited means of transmission, viruses that cause pathogenicity in other tissues may be transmitted through the respiratory tract. It is therefore important to chart the human virome in this compartment. We have studied nasopharyngeal aspirate samples submitted to the Karolinska University Laboratory, Stockholm, Sweden from March 2004 to May 2005 for diagnosis of respiratory tract infections. We have used a metagenomic sequencing strategy to characterize viruses, as this provides the most unbiased view of the samples. Virus enrichment followed by 454 sequencing resulted in a total of 703,790 reads, and 110,931 of these were found to be of viral origin by using an automated classification pipeline. The snapshot of the respiratory tract virome of these 210 patients revealed 39 species and many more strains of viruses. Most of the viral sequences were classified into one of three major families: Paramyxoviridae, Picornaviridae or Orthomyxoviridae. The study also identified one novel type of Rhinovirus C, and identified a number of previously undescribed viral genetic fragments of unknown origin.

Place, publisher, year, edition, pages
Public Library of Science, 2012
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:liu:diva-77340 (URN), 10.1371/journal.pone.0030875 (DOI), 000302741300031 ()
Note
Funding agencies: Swedish Research Council (2010-3754)
Available from: 2012-05-11 Created: 2012-05-11 Last updated: 2017-12-07
5. Highly improved homopolymer aware nucleotide-protein alignments with 454 data
2012 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, no 230. Article in journal (Refereed). Published
Abstract [en]

Background

Roche 454 sequencing is the leading sequencing technology for producing long-read, high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis.

Results

To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that easily fits into other tools. Experiments using HAXAT demonstrate, through the introduction of 454-specific frame-shift penalties, significantly increased accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms, for 454-derived data.

Conclusions

The increased accuracy provided by HAXAT not only results in improved homologue estimations, but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.
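
The Matthews Correlation Coefficient used as the accuracy measure in this abstract is computed from a confusion matrix. A minimal Python sketch follows; the function name `mcc` and the convention of returning 0.0 for an undefined denominator are assumptions made here, and the abstract does not state how the paper defines positives and negatives for alignments.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero (a common convention,
    assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Because the coefficient combines all four counts, it remains informative when positives and negatives are unevenly represented.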

National Category
Natural Sciences
Identifiers
urn:nbn:se:liu:diva-86192 (URN), 10.1186/1471-2105-13-230 (DOI), 000314682700001 ()
Available from: 2012-12-10 Created: 2012-12-10 Last updated: 2017-12-07. Bibliographically approved

Open Access in DiVA

Bioinformatic methods for characterization of viral pathogens in metagenomic samples (1154 kB), 1661 downloads
File information
File name: FULLTEXT01.pdf
File size: 1154 kB
Checksum (SHA-512):
ffe72980523f4ed803e8d1c76d244ffdd00458c3613f94339a14df33a9415bfcc37a59f53488ac97c105932f9dcef4cb613e0a92c07f3804f213cd4b1095107f
Type: fulltext
Mimetype: application/pdf
Cover (956 kB), 87 downloads
File information
File name: COVER01.pdf
File size: 956 kB
Checksum (SHA-512):
e5afaba3ddb44d91f9240a93e3c6cf070cf345ad4b4491137105176fabe7d28fefc2c0b6324308739aa284a28456404b5f9212309fb2891971a576957c6921c0
Type: cover
Mimetype: application/pdf

Authority records

Lysholm, Fredrik
