Research challenges

Our overall goal is to understand complex genetic variants that underlie human disease. We are particularly interested in repetitive DNA variants known as short tandem repeats (STRs) as a model for complex variation. We currently focus on the specific areas described below.

Analyzing and visualizing repetitive genetic variation

Analyzing repeats from next-generation sequencing data is challenging because of the limitations of short reads and the higher error rates at repeats. We previously published a tool, lobSTR, to overcome these challenges at STRs. We are now leveraging new sequencing technologies and novel bioinformatic methods to access longer and more complex repetitive regions that are traditionally filtered out of sequencing studies. We also develop and maintain PyBamView, an alignment visualization tool that is especially helpful for analyzing indels and repeats.

Dissecting the contribution of repetitive regions to complex traits

Although we are learning more about specific genetic variants that regulate gene expression or are associated with certain diseases, repetitive variation such as STRs is often not well captured by such studies. We recently showed that STRs play an important role in gene expression, and thus are likely to be important in complex traits. We are now building on this observation to develop statistical methods that incorporate repeats into genome-wide association studies, with a particular focus on psychiatric disease.

Predicting the impact of non-coding variation

The vast majority of genetic variants identified by genome-wide association studies lie in regions of the genome that do not code for proteins, and thus are difficult to interpret. We are combining machine learning techniques that predict the regulatory effects of non-coding variants with patterns of genetic variation in the population to estimate the impact of individual mutations.