Data QC
Data Quality Control (Missingness, MAF, HWE)
Catch bad markers and bad samples before they corrupt downstream analysis.
How it works
Every genomics pipeline lives or dies by QC. We compute per-marker missingness, minor allele frequency (MAF), and Hardy–Weinberg equilibrium p-values, plus per-sample missingness and heterozygosity. We flag outliers and apply user-configurable filters before any GWAS or genomic-selection run.
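The per-marker and per-sample statistics described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the module's actual implementation. It assumes biallelic markers coded as 0, 1, or 2 copies of the alternate allele, with `None` for a missing call, in a samples × markers matrix. The names `marker_stats`, `sample_stats`, and `genotypes` are hypothetical.

```python
from typing import List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def marker_stats(column: List[Genotype]) -> dict:
    """Missingness and MAF for one marker (one column of the matrix)."""
    called = [g for g in column if g is not None]
    missing = 1 - len(called) / len(column)
    if not called:
        # No calls at all: the allele frequency is undefined.
        return {"missing": missing, "maf": None}
    p = sum(called) / (2 * len(called))  # alternate-allele frequency
    return {"missing": missing, "maf": min(p, 1 - p)}


def sample_stats(row: List[Genotype]) -> dict:
    """Missingness and observed heterozygosity for one sample (one row)."""
    called = [g for g in row if g is not None]
    missing = 1 - len(called) / len(row)
    het = sum(1 for g in called if g == 1) / len(called) if called else None
    return {"missing": missing, "het": het}


# Toy matrix: 4 samples (rows) x 4 markers (columns).
genotypes = [
    [0, 1, 2, None],
    [0, 1, 1, 0],
    [1, 2, None, 0],
    [0, 0, 2, 0],
]

n_markers = len(genotypes[0])
per_marker = [marker_stats([row[j] for row in genotypes]) for j in range(n_markers)]
per_sample = [sample_stats(row) for row in genotypes]
```

Missing calls are excluded from the allele-frequency and heterozygosity denominators, so a marker with many missing calls still gets a MAF, but one based on fewer samples.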
Formula
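MAF = min(p, 1−p), where p is the frequency of one of the two alleles at a biallelic marker. HWE χ² = Σ (observed − expected)² / expected, summed over the three genotype classes, with expected counts n·p², 2n·p(1−p), and n·(1−p)² from Hardy–Weinberg proportions for n genotyped samples. The statistic has 1 degree of freedom.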
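As a worked example of the χ² formula, the sketch below computes the HWE statistic and its p-value from the three genotype counts of one biallelic marker. It is a minimal sketch, not the module's code. The function name `hwe_chi_square` and the count arguments are hypothetical. The p-value uses the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)), so only the standard library is needed.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one biallelic marker.

    Arguments are observed genotype counts (two homozygotes and the heterozygote).
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes for this marker")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Classes with zero expected count (monomorphic markers) are skipped
    # to avoid division by zero; the statistic is then 0.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 100 samples with a deficit of heterozygotes relative to HWE (expected 25/50/25).
chi2, p_value = hwe_chi_square(30, 40, 30)
```

For these counts p = 0.5, the expected counts are 25, 50, and 25, and χ² = 1 + 2 + 1 = 4. The χ² approximation becomes unreliable when expected counts are small, for example at low-MAF markers or in small samples.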
What you get
- Per-marker MAF, missingness, and HWE p-value distributions
- Per-sample missingness and heterozygosity outliers
- Filtered marker and sample lists
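To show how a filtered marker list might be derived from the per-marker statistics, here is a hedged sketch of a threshold filter. The `MarkerQC` and `Thresholds` classes, the `filter_markers` function, and the default cutoffs (10% missingness, MAF 0.05, HWE p-value 1e-6) are illustrative placeholders, not the module's actual settings. Appropriate thresholds depend on the dataset and species.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MarkerQC:
    marker_id: str
    missing: float  # fraction of samples with no call
    maf: float      # minor allele frequency
    hwe_p: float    # Hardy-Weinberg test p-value


@dataclass
class Thresholds:
    # Placeholder defaults for illustration only.
    max_missing: float = 0.10
    min_maf: float = 0.05
    min_hwe_p: float = 1e-6


def filter_markers(
    stats: List[MarkerQC], t: Thresholds
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (kept marker IDs, {dropped marker ID: reasons it failed})."""
    kept: List[str] = []
    dropped: Dict[str, List[str]] = {}
    for m in stats:
        reasons = []
        if m.missing > t.max_missing:
            reasons.append("missingness")
        if m.maf < t.min_maf:
            reasons.append("maf")
        if m.hwe_p < t.min_hwe_p:
            reasons.append("hwe")
        if reasons:
            dropped[m.marker_id] = reasons
        else:
            kept.append(m.marker_id)
    return kept, dropped


stats = [
    MarkerQC("m1", missing=0.01, maf=0.30, hwe_p=0.5),
    MarkerQC("m2", missing=0.25, maf=0.20, hwe_p=0.4),
    MarkerQC("m3", missing=0.02, maf=0.01, hwe_p=1e-9),
]
kept, dropped = filter_markers(stats, Thresholds())
```

Recording why each marker was dropped, rather than only the surviving list, makes it easier to see which filter removes the most markers when tuning thresholds.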
When to use it
- On every new genotype dataset, immediately after upload
- Before running GWAS, genomic selection (GS), or population-structure analyses
- When troubleshooting unexpected results from downstream modules
Run Data QC on your data
Open the module and upload a CSV.