10-Oct-2022

Introduction to DNA Methylation Analysis: From Wet Lab Experiments to Bioinformatics Analysis

Summary

DNA methylation is one of the best-studied epigenetic modifications in eukaryotes. It regulates gene expression and has important implications for both developmental and disease-related research. During development, DNA methylation shapes the maturation of germ cells and embryonic cells by controlling which genes are expressed. It has also been widely studied and applied as a marker for tracking aging and various forms of human disease. Many diseases, especially cancer, show DNA methylation patterns that are absent from normal cells, and these patterns lead to differences in gene expression levels.

Author Name: Dianna Gellar

Editor: Dianna Gellar
Last Updated: 11-Oct-2022

Mainstream DNA methylation analysis chemically converts cytosines according to their methylation status. This conversion is usually performed with sodium bisulfite, which deaminates unmethylated cytosine to uracil (read as thymine after PCR) while leaving 5-methylcytosine unchanged. Cytosine conversion occurs on single-stranded DNA and disrupts strand complementarity. Traditional library preparation procedures therefore first ligate Y-shaped adapters to double-stranded DNA and only then perform bisulfite conversion.
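To make the conversion rule concrete, the sketch below simulates bisulfite treatment of one strand in silico: every cytosine not listed as methylated is reported as T, and methylated cytosines stay C. The function names and the toy methylation state are hypothetical illustrations, and the sketch ignores incomplete conversion and the separate conversion of the opposite strand.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate bisulfite conversion of one strand, written 5' to 3'.

    Unmethylated C is deaminated to U, which PCR copies as T;
    5-methylcytosine is protected and still reads as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    top = "ACGTTCGACCA"
    # Hypothetical state: only the C at index 1 (part of a CpG) is methylated.
    converted = bisulfite_convert(top, {1})
    print(converted)  # ACGTTTGATTA
    # The converted strand no longer pairs with the original bottom strand.
    print(reverse_complement(converted) == reverse_complement(top))  # False
```

Reading a converted position as C therefore implies methylation, and reading it as T implies the cytosine was unmethylated.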

Reduced representation bisulfite sequencing (RRBS) and whole-genome bisulfite sequencing (WGBS) are popular approaches for genome-wide methylation analysis. Both combine bisulfite conversion with next-generation sequencing (NGS). The main difference is that RRBS uses a suitable restriction endonuclease (typically MspI) and size selection to enrich for CpG-rich regions, whereas WGBS (especially MethylC-seq) can cover most of the CpGs in the genome. Because WGBS needs no digestion or size-selection step, its purification and screening process is relatively simple compared with RRBS. However, bisulfite treatment degrades DNA, so preventing degradation losses during conversion is an important consideration in WGBS.
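The reduced-representation idea can be sketched as an in silico digest: MspI cuts its CCGG site between the first C and CGG, and only fragments inside a size window are kept. The 40 to 220 bp window and all function names below are hypothetical choices for illustration, not a prescribed protocol.

```python
from __future__ import annotations

import re


def mspi_digest(seq: str) -> list[str]:
    """In silico MspI digest: cut the top strand as C^CGG at every CCGG site."""
    seq = seq.upper()
    # CCGG cannot overlap itself, so a plain non-overlapping search finds every site.
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_select(seq: str, min_len: int = 40, max_len: int = 220) -> list[str]:
    """Keep only fragments inside the (hypothetical) size-selection window."""
    return [f for f in mspi_digest(seq) if min_len <= len(f) <= max_len]


def cpg_count(fragment: str) -> int:
    """Count CpG dinucleotides in a fragment."""
    return fragment.count("CG")


if __name__ == "__main__":
    toy = "AAAA" + "CCGG" + "T" * 50 + "CCGG" + "A" * 10
    print([len(f) for f in mspi_digest(toy)])  # [5, 54, 13]
    kept = rrbs_select(toy)
    print([(len(f), cpg_count(f)) for f in kept])  # [(54, 1)]
```

Every internal fragment starts with CGG, so each one carries at least one CpG; this is why restriction-based selection concentrates reads on CpG-rich regions.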

Methylation can also be profiled with bisulfite-free methods, which either use the affinity binding of proteins or antibodies to methylated DNA or exploit the sensitivity of certain restriction endonucleases to methylcytosine. MBD-seq and MeDIP-seq are representative affinity-based methods. Affinity-based methods are not suitable for single-cell applications: they generate average DNA methylation profiles from pooled DNA fragments and therefore cannot distinguish the methylation patterns of individual cells.

After sequencing, data from both RRBS and WGBS must be pre-processed. Pre-processing can be divided into data quality control (QC), read trimming, and alignment. For example, FastQC measures overall sequencing data quality, and tools such as Trim Galore!, fastp, and Trimmomatic trim adapters and low-quality bases.
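Quality trimming is easiest to understand with a small example. The sketch below implements a BWA-style 3' trimming rule (the approach cutadapt documents, and Trim Galore! wraps cutadapt): walking from the 3' end, it accumulates `cutoff - q` and cuts where that running sum peaks. It assumes Phred+33 quality encoding; the cutoff of 20 and the function names are illustrative, and adapter removal is not shown.

```python
from __future__ import annotations


def quality_trim_index(qualities: str, cutoff: int = 20, phred_offset: int = 33) -> int:
    """Return the index at which to cut the low-quality 3' tail of one read."""
    s = 0
    max_s = 0
    max_i = len(qualities)  # default: keep the whole read
    for i in reversed(range(len(qualities))):
        q = ord(qualities[i]) - phred_offset
        s += cutoff - q  # bases below the cutoff push the sum up
        if s < 0:  # good bases have outweighed the bad tail; stop scanning
            break
        if s > max_s:
            max_s = s
            max_i = i
    return max_i


def trim_read(seq: str, qualities: str, cutoff: int = 20) -> tuple[str, str]:
    """Trim a read and its quality string to the same length."""
    end = quality_trim_index(qualities, cutoff)
    return seq[:end], qualities[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_read("ACGTACGTACGT", "IIIIIIIIII##"))  # ('ACGTACGTAC', 'IIIIIIIIII')
```

Using a running sum instead of stopping at the first bad base tolerates an isolated low-quality call inside an otherwise good read.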

Data mining of DNA methylation generally follows three steps. First, overall genome-wide methylation changes are analyzed, including changes in mean methylation level and in the distribution of methylation levels, dimensionality reduction, clustering, and correlation analysis. Second, differential methylation analysis is performed to screen for specific differential genes. This includes identifying differentially methylated cytosines (DMCs), detecting differentially methylated regions (DMRs), and identifying differentially methylated genes (DMGs); analyzing the distribution of DMCs and DMRs across genomic elements; transcription factor (TF) binding analysis of DMCs and DMRs; time-series analysis of methylation levels; and functional analysis of DMGs. Finally, the methylome is integrated with transcriptome data, for example through DMG-DEG (differentially expressed gene) correspondence and network-based association analysis.
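As a sketch of the per-CpG quantities behind these steps, the code below computes a methylation level as M / (M + U) from methylated (M) and unmethylated (U) read counts, then flags DMCs between two groups with a two-sided Fisher's exact test, one common choice when each group is a single sample. The thresholds (a level difference of at least 0.2 and p < 0.05), the function names, and the toy counts are hypothetical, and the sketch omits the multiple-testing correction a real analysis needs.

```python
from __future__ import annotations

from math import comb


def methylation_level(meth: int, unmeth: int) -> float:
    """Per-CpG methylation level M / (M + U); NaN when the site is uncovered."""
    total = meth + unmeth
    return meth / total if total else float("nan")


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {x: comb(col1, x) * comb(n - col1, row1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more probable than the observed one.
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def call_dmcs(counts, min_diff: float = 0.2, alpha: float = 0.05):
    """counts: iterable of (cpg_id, (M1, U1), (M2, U2)) read counts per group.

    Returns (cpg_id, level difference, p-value) for CpGs passing both
    hypothetical thresholds; no multiple-testing correction is applied.
    """
    hits = []
    for cpg_id, (m1, u1), (m2, u2) in counts:
        diff = methylation_level(m1, u1) - methylation_level(m2, u2)
        p = fisher_exact_two_sided(m1, u1, m2, u2)
        if abs(diff) >= min_diff and p < alpha:
            hits.append((cpg_id, diff, p))
    return hits


if __name__ == "__main__":
    example = [
        ("chr1:1000", (8, 2), (1, 5)),  # toy counts: 80% vs ~17% methylated
        ("chr1:1050", (5, 5), (5, 5)),  # toy counts: identical levels
    ]
    print(call_dmcs(example))  # only chr1:1000 is reported
```

DMR detection typically builds on such per-CpG statistics by merging nearby significant CpGs into regions, and DMGs are then obtained by assigning DMCs or DMRs to genes.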

The primary goal of methylation analysis is to uncover epigenetic differences between samples, organs, and disease states, including cancer.

Exploring cell-to-cell epigenetic differences is key to understanding tissue heterogeneity. The most effective strategy for capturing this diversity is to profile methylation in individual cells, because single-cell DNA methylation profiles reveal heterogeneity that bulk measurements average out. Genome-wide methylation analysis at the single-cell level therefore provides insight into both transcriptional regulation and cellular heterogeneity.

The key components of single-cell DNA methylation data analysis include data processing, quality control, dimensionality reduction, cell clustering and annotation, cell lineage reconstruction, gene activity scoring, and integration with transcriptome data. Pre-processing of single-cell DNA methylation data includes library QC, contamination control, bisulfite conversion QC, and screening for diploid cells.
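Conversion QC and basic coverage filtering can be sketched per cell as follows. The conversion rate is estimated from cytosines assumed to be unmethylated (for example a spike-in control, or non-CpG contexts in cell types where those are expected to be unmethylated), so any C still read as C counts as a conversion failure. The `CellStats` record, the thresholds of 0.98 and 50,000 covered CpGs, and the per-cell counts are all hypothetical toy values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CellStats:
    cell_id: str
    control_c_unconverted: int  # assumed-unmethylated cytosines still read as C
    control_c_converted: int    # assumed-unmethylated cytosines read as T
    covered_cpgs: int           # CpG sites with at least one read in this cell

    @property
    def conversion_rate(self) -> float:
        total = self.control_c_unconverted + self.control_c_converted
        return self.control_c_converted / total if total else 0.0


def passes_qc(cell: CellStats, min_conversion: float = 0.98, min_cpgs: int = 50_000) -> bool:
    """Keep a cell only if conversion was efficient and enough CpGs are covered."""
    return cell.conversion_rate >= min_conversion and cell.covered_cpgs >= min_cpgs


if __name__ == "__main__":
    cells = [
        CellStats("cell_A", 50, 9_950, 120_000),   # good conversion, good coverage
        CellStats("cell_B", 900, 9_100, 150_000),  # poor conversion (91%)
        CellStats("cell_C", 30, 4_970, 8_000),     # good conversion, too sparse
    ]
    print([c.cell_id for c in cells if passes_qc(c)])  # ['cell_A']
```

Incomplete conversion leaves unmethylated cytosines reading as C, which would inflate apparent methylation, so such cells are removed before downstream analysis.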

For all single-cell data, data sparsity is a common challenge. Filtering for cells with moderate genomic coverage or a sufficient number of aligned reads can effectively reduce sparsity before cell and feature clustering. To further enhance the signal, measurements from multiple CpGs can be aggregated (for example, over genomic windows or genes), which alleviates sparsity and removes the need to recover missing data at each CpG individually.
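Aggregating CpGs into genomic bins is one simple way to do this. The sketch below assumes each cell provides binary per-CpG calls as `(chrom, pos, methylated)` tuples (single-cell calls are commonly treated as 0 or 1), averages them within fixed-size bins, and builds a cell-by-bin matrix in which poorly covered bins stay missing (`None`). The 100 kb bin size, the minimum of 3 CpGs per bin, and the function names are hypothetical; the same idea applies to gene bodies for gene activity scoring.

```python
from __future__ import annotations

from collections import defaultdict


def bin_methylation(calls, bin_size: int = 100_000):
    """Aggregate one cell's binary CpG calls into fixed-size genomic bins.

    Returns {(chrom, bin_index): (mean_level, n_cpgs)}.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for chrom, pos, methylated in calls:
        key = (chrom, pos // bin_size)
        sums[key] += methylated
        counts[key] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in counts}


def cell_by_bin_matrix(cells, bin_size: int = 100_000, min_cpgs: int = 3):
    """Build a cell x bin matrix; bins with fewer than min_cpgs calls stay None."""
    per_cell = {cid: bin_methylation(calls, bin_size) for cid, calls in cells.items()}
    all_bins = sorted({b for binned in per_cell.values() for b in binned})
    matrix = {}
    for cid, binned in per_cell.items():
        row = []
        for b in all_bins:
            level, n = binned.get(b, (None, 0))
            row.append(level if n >= min_cpgs else None)
        matrix[cid] = row
    return all_bins, matrix


if __name__ == "__main__":
    toy_cells = {
        "c1": [("chr1", 10, 1), ("chr1", 20, 1), ("chr1", 30, 0), ("chr1", 150_000, 1)],
        "c2": [("chr1", 15, 0), ("chr1", 25, 0), ("chr1", 35, 0)],
    }
    bins, matrix = cell_by_bin_matrix(toy_cells)
    print(bins)    # [('chr1', 0), ('chr1', 1)]
    print(matrix)  # c1: [0.666..., None], c2: [0.0, None]
```

Larger bins give denser but coarser features; the resulting matrix can then feed dimensionality reduction and clustering.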