Training deep learning models for time-series prediction of a target population often requires a substantial amount of training data, which may not be readily available. This work addresses the challenge of leveraging multiple related sources of time-series data in the same feature space to improve the prediction performance of a deep learning model for a target population. Specifically, we focus on a scenario where the target dataset, representing the desired target population, is underrepresented, while the source datasets consist of mismatched populations that are sufficiently representative for training a deep learning model. In this study, we explore state-of-the-art techniques, including transfer learning, ensemble learning, and domain adaptation, to leverage source datasets towards a target population using real-world medical data. Additionally, we investigate the use of model performance-derived baselines as a heuristic to quantify the magnitude of the distribution mismatch between one or more sources and a target. Our results demonstrate that a set of well-defined baselines can effectively quantify the distribution mismatch and provide insights into the choice of leveraging technique for a given mismatch scenario. Furthermore, our results show that all of the techniques examined can be employed to leverage related source datasets towards the target, though their performance varies depending on the characteristics of the distribution mismatch. Finally, we discuss the applicability of this research to new scenarios, along with avenues for future research.