Deep Learning with Big Data for Genetic Epidemiology

Poster session

Monday

Authors

Affiliations

Max Kovalenko

Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden

Filip Thor

Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden

Carl Nettelblad

Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden

Science for Life Laboratory (SciLifeLab)

Åsa Johansson

Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden

Science for Life Laboratory (SciLifeLab)

Published

November 4, 2024

ABSTRACT:

The project builds on an earlier deep neural network model developed to derive a low-dimensional representation of SNP data. Such a representation can be used to perform feature selection for downstream models as well as to identify and visualize population structure. The most common approach to this task is principal component analysis (PCA), but it has shortcomings such as linearity and sensitivity to outliers. In contrast, our model, being nonlinear and data-driven, has been shown to produce more informative embeddings.
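For readers unfamiliar with the PCA baseline mentioned above, the following is a minimal sketch of how a genotype matrix can be projected onto its leading principal components. It is not the project's code: the function name `pca_embed`, the synthetic two-population data, and all parameter values are assumptions chosen purely for illustration, and real SNP pipelines typically add steps such as variant filtering and per-SNP scaling.

```python
import numpy as np

def pca_embed(genotypes, n_components=2):
    """Project a (samples x SNPs) genotype matrix onto its top principal components.

    Hypothetical helper for illustration; genotypes are coded as 0/1/2 allele counts.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    # Thin SVD: singular values come back sorted in decreasing order
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)            # variance fraction per component
    return scores, explained[:n_components]

# Synthetic data (assumption): two populations with slightly different allele frequencies
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])

scores, explained = pca_embed(genotypes)
print(scores.shape)  # one 2-D coordinate per sample
```

Because each component is a linear combination of SNP columns, this projection can only capture linear structure, and centering and the SVD are both influenced by extreme samples. These are the shortcomings the abstract contrasts with a nonlinear, learned embedding.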

So far, the model has been adapted for training on large datasets and trained on SNP data from UK Biobank, one of the most comprehensive databases of human genotypes and phenotypes to date. Following these first training attempts, the model is now being improved and tuned. Further plans include applying other deep learning methods, such as contrastive learning, to UK Biobank data and extending the model with phenotype prediction capabilities to aid genome-wide association studies.
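The abstract does not say which contrastive objective is planned. As a rough illustration of the general idea, the sketch below implements one widely used contrastive loss, InfoNCE, in NumPy. The function name `info_nce_loss`, the temperature value, and the random stand-in embeddings are assumptions for illustration only; in practice the two inputs would be encoder outputs for two views of the same samples.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE loss for two batches of embeddings where row i of z1 pairs with row i of z2.

    Hypothetical helper for illustration, not the project's objective.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal; average their negative log-probability
    return -np.mean(np.diag(log_prob))

# Stand-in embeddings (assumption): random vectors instead of real encoder outputs
rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))  # views agree
shuffled = info_nce_loss(z, np.roll(z, 1, axis=0))               # pairs mismatched
print(aligned < shuffled)
```

The loss is lower when each embedding is closest to its own paired view than to the other samples in the batch, which is the property a contrastively trained encoder is optimized for.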