Introduction to Statistical Genetics: a lecture at the PNU Winter Workshop 2019



Overview: These hands-on sessions introduce statistical methods and computational tools for analyzing high-throughput sequence data at population scale.

Description:

The dramatic advance of high-throughput sequencing over the last decade has produced a tremendous amount of genomic, epigenomic, and transcriptomic sequence data at an unprecedented scale. Several factors, such as the acquisition of PacBio by Illumina announced last year, the PromethION technology of ONT, and the MGISEQ-T7 platform of BGI, are expected to make whole-genome sequencing much cheaper.
In this lecture, we introduce two methods for analyzing genomic sequencing data and show how to apply them. The first is GAPIT, one of the most widely used tools for GWAS, which performs univariate (single-marker) analysis with a Mixed Linear Model (MLM) to account for population structure. The second is regularization procedures, which fit all markers jointly under a penalty function and can outperform univariate analysis. Using both approaches, univariate analysis and regularization, we will analyze an imputed sequencing dataset and learn how to interpret the results.

We assume that you have basic R programming skills, which can be obtained by taking a free online course on DataCamp or Coursera.

When: January 28-30, 2019, 16:00 - 17:30

Where: 313-103, Pusan National University, Pusan, Korea.




Part 1. Genome Association and Prediction Integrated Tool (GAPIT).

Part  Title  Topics
I     GAPIT  1. Introduction to Genome-Wide Association Studies (GWAS)
             2. Statistical Model of GAPIT
             3. Quality Control (QC) before Analysis
             4. Analysis using GAPIT & results
                 - code 1: treating heterozygosity
                 - code 2: imputation and controlling MAF
                 - code 3: optimal PC number
                 - code 4: Compressed MLM
                 - code 5: Enriched CMLM
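
The QC steps listed under "imputation and controlling MAF" can be illustrated with a minimal sketch. It is written in plain Python rather than R, does not use or mirror GAPIT's own interface, and all function names, the 0/1/2 genotype coding, and the 0.05 threshold are assumptions chosen only for illustration.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency of one marker, using non-missing calls only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))  # two alleles per individual
    return min(alt_freq, 1.0 - alt_freq)


def impute_with_mean(calls: List[Genotype]) -> List[float]:
    """Replace missing calls by the marker's mean genotype (a simple imputation)."""
    observed = [g for g in calls if g is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if g is None else float(g) for g in calls]


def filter_and_impute(markers: Dict[str, List[Genotype]],
                      maf_threshold: float = 0.05) -> Dict[str, List[float]]:
    """Drop markers whose MAF is below the threshold, then impute the rest."""
    return {
        name: impute_with_mean(calls)
        for name, calls in markers.items()
        if minor_allele_frequency(calls) >= maf_threshold
    }


# Hypothetical toy data: three markers genotyped on six individuals.
toy_markers = {
    "snp1": [0, 1, 2, None, 1, 0],
    "snp2": [0, 0, 0, 0, None, 0],   # monomorphic, so it is filtered out
    "snp3": [2, 2, 1, 2, 2, 2],
}
kept = filter_and_impute(toy_markers)
print(sorted(kept))
```

Mean imputation is used here only because it is short; the imputed dataset analyzed in the sessions may have been produced by a different method.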




Part 2. Regularization procedures for variable selection.

Part  Title           Topics
II    Regularization  Variable selection in high-dimensional data
                      Regularization procedures
                      “glmnet”
                      “The Lasso”
                          - Solution path
                          - Cross-validation
                          - Variable selection
                          - Comparison with univariate analysis
                          - Prediction
                      Elastic-net
                          - Cross-validation for two tuning parameters
                      Covariate-adjusted model
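
To make the "solution path" and "variable selection" topics concrete, here is a minimal sketch of the Lasso fitted by cyclic coordinate descent with soft-thresholding. It is plain Python, not the glmnet package used in the session. It assumes no intercept (centered data) and minimizes (1/2n)·||y − Xb||² + λ·||b||₁. The function names and the small orthogonal toy design are assumptions for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Lasso coefficients for penalty lam (no intercept, data assumed centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy design with mutually orthogonal +/-1 columns (hypothetical data).
columns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
X = [list(row) for row in zip(*columns)]
y = [2.0 * row[0] - 1.5 * row[1] for row in X]  # only columns 0 and 1 matter

# Solution path: coefficients enter the model as the penalty decreases.
for lam in [2.5, 1.75, 0.5]:
    print(lam, lasso_coordinate_descent(X, y, lam))

beta_half = lasso_coordinate_descent(X, y, 0.5)  # [1.5, -1.0, 0.0, 0.0]
```

With orthogonal columns each coefficient is simply the soft-thresholded marginal covariance, which is why the fit at λ = 0.5 shrinks the true effects 2 and −1.5 by exactly 0.5 and leaves the two null columns at zero. In practice λ would be chosen by cross-validation rather than fixed by hand.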




Part 3. Selection probabilities.

Part  Title                    Topics
III   Selection probabilities  An algorithm of selection probabilities
                                   - Setting a grid of tuning parameters
                                   - Applying the regularization for subsamples
                               Why split data with 0.5 proportion
                                   - Advantages of “subagging”
                               Algorithm summary
                               The stability path
                               Threshold to control false positives
                               Manhattan plot with selection probabilities
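
The algorithm outlined above can be sketched roughly as follows. For each half-size subsample, drawn without replacement, the Lasso is fitted over a grid of tuning parameters. A variable's selection probability is the fraction of subsamples in which it is selected, maximized over the grid. This is a plain-Python illustration under stated assumptions, not the code used in the session: the compact Lasso solver, the function names, the simulated data, the grid, and the 0.9 threshold are all hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_iter=50):
    """Compact coordinate-descent Lasso (no intercept, data assumed centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def selection_probabilities(X, y, lambdas, n_subsamples=50, seed=0):
    """Per-variable selection frequency over half-size subsamples, max over the grid."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    half = n // 2  # the 0.5 split proportion from the outline
    counts = [[0] * p for _ in lambdas]
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), half)  # subsample without replacement
        Xs = [X[i] for i in rows]
        ys = [y[i] for i in rows]
        for k, lam in enumerate(lambdas):
            beta = lasso(Xs, ys, lam)
            for j in range(p):
                if beta[j] != 0.0:
                    counts[k][j] += 1
    return [max(counts[k][j] for k in range(len(lambdas))) / n_subsamples
            for j in range(p)]


# Simulated toy data (hypothetical): only variable 0 has a true effect.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]

probs = selection_probabilities(X, y, lambdas=[0.5, 0.3])
stable = [j for j, pr in enumerate(probs) if pr >= 0.9]  # assumed threshold
print(probs, stable)
```

Plotting `probs` against marker position in place of −log10 p-values gives the Manhattan-style display named in the last topic, and the threshold plays the role of the false-positive control.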