E.A. van der Toorn

Batch correction of taxonomic data of the human gut microbiome for generalization of case-control classification

Master thesis (2022) - E.A. van der Toorn (author), T.E.P.M.F. Abeel (mentor), C. Peng (graduation committee member)

Next-Generation Sequencing (NGS) has made it possible to perform metagenomic sequencing of environmental microbiome samples. Colorectal cancer (CRC) benefits from early detection, and many studies report correlations between disease presence and the abundance of species in microbiome samples. However, these studies are hard to reproduce, and building diagnostic tools from them is harder still; a major factor is the bias inherent in each collected dataset, the so-called batch effect.
To investigate the extent to which batch effect impacts the generalization of binary classifiers, we benchmarked eleven batch correctors (four existing tools, three transformations and three encoders), assessing the subsequent performance of seven supervised binary classifiers with leave-one-dataset-out (LODO) validation. In addition, batch effect was measured visually (t-SNE) and numerically (linear models) before and after applying each corrector, and classification performance was measured for different numbers of datasets.
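The LODO protocol can be illustrated with a minimal, dependency-free sketch, assuming samples are tagged with a dataset id and a case/control label. The corrector (`center_per_dataset`, per-dataset mean-centering) and the classifier (`fit_centroid`, a nearest-centroid scorer) are hypothetical toy stand-ins, not any of the tools, transformations, encoders or classifiers benchmarked in the thesis; they only show where a batch corrector sits relative to training and evaluation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# One sample: (dataset_id, feature vector, label), label 1 = CRC case, 0 = control.
Sample = Tuple[str, List[float], int]
Scorer = Callable[[List[float]], float]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def center_per_dataset(samples: List[Sample]) -> List[Sample]:
    """Hypothetical toy batch corrector: subtract each dataset's per-feature means.

    It uses no labels, so it may also be applied to the held-out dataset.
    """
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        groups[s[0]].append(s)
    means = {
        d: [sum(col) / len(rows) for col in zip(*(x for _, x, _ in rows))]
        for d, rows in groups.items()
    }
    return [(d, [v - m for v, m in zip(x, means[d])], y) for d, x, y in samples]


def fit_centroid(train: List[Sample]) -> Scorer:
    """Hypothetical toy classifier: score = dist^2 to control centroid - dist^2 to case centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for _, x, y in train if y == label]
        if not rows:
            raise ValueError(f"training data has no samples with label {label}")
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: List[float], b: List[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: dist2(x, centroids[0]) - dist2(x, centroids[1])


def lodo(
    samples: List[Sample],
    fit: Callable[[List[Sample]], Scorer],
    correct: Callable[[List[Sample]], List[Sample]] = lambda s: s,
) -> Dict[str, float]:
    """Leave-one-dataset-out: train on all other datasets, report AUROC on the held-out one."""
    corrected = correct(samples)
    results = {}
    for held_out in sorted({d for d, _, _ in corrected}):
        train = [s for s in corrected if s[0] != held_out]
        test = [s for s in corrected if s[0] == held_out]
        model = fit(train)
        results[held_out] = auroc([model(x) for _, x, _ in test], [y for _, _, y in test])
    return results


if __name__ == "__main__":
    # Toy data: three datasets with the same case/control signal plus a per-dataset offset.
    base = [([0.0, 0.0], 0), ([0.5, 0.2], 0), ([2.0, 0.1], 1), ([2.5, 0.3], 1)]
    data = [(d, [v + off for v in x], y)
            for d, off in (("A", 0.0), ("B", 5.0), ("C", -3.0)) for x, y in base]
    print("uncorrected:", lodo(data, fit_centroid))
    print("centered:   ", lodo(data, fit_centroid, correct=center_per_dataset))
```

Because every dataset is held out exactly once, a corrector only helps under LODO if it improves AUROC on datasets the classifier never saw during training.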
Batch effect was present in the shotgun metagenomic data; some correction tools reduced it, while others strengthened it. AUROC evaluations showed that combining datasets without correction improved generalization, even at an equivalent number of samples. Combining batch correctors with the different classifiers did not significantly improve performance over the baseline. Despite ComBat's popularity as a batch corrector, applying it before training significantly worsened the performance of each of the binary classifiers.
Thus, even though batch correctors can reduce batch effect within our taxonomic count data, they do not significantly improve classification performance when generalizing to separate datasets. We therefore advise against focusing on the choice of batch corrector when building tools for predicting CRC diagnosis, and recommend instead improving the pool of datasets to learn from.
The code for reproducing the results and figures in this work has been made available at https://github.com/AbeelLab/ngs-batch-evaluation

Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma

Bachelor thesis (2020) - E.A. van der Toorn (author), N. Yorke-Smith (mentor), J.G.H. Cockx (graduation committee member), C.T. Ponnambalam (coach)

A recent advancement in Reinforcement Learning is the capability of modelling opponents. In this work, we go back to basics and test this capability within the Iterated Prisoner's Dilemma, a simple framework for modelling multi-agent systems. Using t ...
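For readers unfamiliar with the game, the Iterated Prisoner's Dilemma can be sketched as repeated play of a two-player payoff matrix. This is a minimal sketch of the game only, not of the thesis's Self Modeling Advantage Actor Critic agent. It assumes the conventional payoffs T=5, R=3, P=1, S=0 (the thesis may use other values) and two classic fixed strategies, tit-for-tat and always-defect, as illustrative opponents.

```python
from typing import Callable, List, Tuple

# Actions: "C" = cooperate, "D" = defect.
# Assumed conventional payoffs (T=5, R=3, P=1, S=0); not taken from the thesis.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R for both
    ("C", "D"): (0, 5),  # sucker's payoff S vs. temptation T
    ("D", "C"): (5, 0),  # temptation T vs. sucker's payoff S
    ("D", "D"): (1, 1),  # mutual defection: punishment P for both
}

# A strategy sees the opponent's past actions and returns its next action.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opp_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(opp_history: List[str]) -> str:
    """Defect regardless of history."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` iterations and return the cumulative payoffs of a and b."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect, 10))  # (9, 14)
    print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
```

The repetition is what makes opponent modelling relevant: a strategy conditioned on the opponent's history can punish defection, which a single-shot game does not allow.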