Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares
Honggang Yi, Hongmei Wo, Yang Zhao, Ruyang Zhang, Junchen Dai, Guangfu Jin, Hongxia Ma, Tangchun Wu, Zhibin Hu, Dongxin Lin, Hongbing Shen, Feng Chen
Abstract
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify
genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is
traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the
well-known multiple-testing (multiplicity) problem, with a risk of inflating type I error and reducing power.
Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least
squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic
data. However, the performance of these methods is still unclear, especially in GWAS. We conducted simulations and a
real-data application to compare the type I error and power of PC-LR, PLS-LR, and LR applied to GWAS within a defined
single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under
the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings.
In particular, PC-LR and PLS-LR had comparable power, and both outperformed LR, especially when the causal SNP was in
high linkage disequilibrium with the genotyped SNPs and had a small effect size in simulation. Based on SNP set analysis,
we applied all three methods to non-small cell lung cancer GWAS data.
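A minimal sketch of the set-based comparison summarized above, in Python with NumPy, contrasting PC-LR with single-locus LR plus Bonferroni correction on one SNP set. Several choices are assumptions, not details from the study: genotypes are coded 0/1/2 and standardized before PCA; PC-LR keeps the leading PCs explaining 80% of genotype variance (`var_explained`) and tests them jointly with a k-df likelihood ratio test; the set-level LR p-value is the Bonferroni-adjusted minimum of per-SNP 1-df likelihood ratio tests; and the toy genotypes come from a hypothetical thresholded-Gaussian LD model. All function names (`chi2_sf`, `fit_logistic`, `pc_lr_test`, `bonferroni_lr_test`) are hypothetical. PLS-LR is omitted because its components are built from the case-control labels, so its p-values would need a different calibration (for example, permutation) than the chi-square reference used here.

```python
import math
import numpy as np


def chi2_sf(stat, df):
    """Upper-tail chi-square p-value for integer df via the recurrence
    Q(a+1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a+1), with x = stat/2."""
    x = max(stat, 0.0) / 2.0
    if x == 0.0:
        return 1.0
    # Start at Q(1, x) = exp(-x) for even df, or Q(1/2, x) = erfc(sqrt(x)) for odd df.
    q, a = (math.exp(-x), 1.0) if df % 2 == 0 else (math.erfc(math.sqrt(x)), 0.5)
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a logistic model; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, X.T @ (y - p))   # score / information
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def pc_lr_test(G, y, var_explained=0.8):
    """PC-LR: k-df LRT of the leading genotype PCs against an intercept-only model."""
    Z = G - G.mean(axis=0)
    sd = Z.std(axis=0)
    Z = Z[:, sd > 0] / sd[sd > 0]  # drop monomorphic SNPs, then standardize
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)  # cumulative share of genotype variance
    k = min(int(np.searchsorted(frac, var_explained)) + 1, len(s))
    ones = np.ones((len(y), 1))
    _, ll0 = fit_logistic(ones, y)
    _, ll1 = fit_logistic(np.hstack([ones, U[:, :k] * s[:k]]), y)  # PC scores
    return k, chi2_sf(2.0 * (ll1 - ll0), k)


def bonferroni_lr_test(G, y):
    """Single-locus LR (1-df LRT per SNP); set-level p = Bonferroni-adjusted minimum."""
    ones = np.ones((len(y), 1))
    _, ll0 = fit_logistic(ones, y)
    pvals = []
    for j in range(G.shape[1]):
        _, ll1 = fit_logistic(np.hstack([ones, G[:, [j]]]), y)
        pvals.append(chi2_sf(2.0 * (ll1 - ll0), 1))
    return min(1.0, G.shape[1] * min(pvals))


# Hypothetical toy SNP set: AR(1)-correlated latent normals, thresholded per haplotype.
rng = np.random.default_rng(0)
n, m = 1000, 10
cov = 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
latent = rng.multivariate_normal(np.zeros(m), cov, size=(n, 2))
G = (latent > 0.5).sum(axis=1).astype(float)  # genotypes coded 0/1/2

# Hypothetical causal SNP (column 4) with log odds ratio 1.0.
risk = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * G[:, 4])))
y = (rng.random(n) < risk).astype(float)
k, p_pc = pc_lr_test(G, y)
p_lr = bonferroni_lr_test(G, y)
print(f"PC-LR: {k} PCs, p = {p_pc:.3g}; Bonferroni LR: p = {p_lr:.3g}")
```

For a strong toy effect like this, both tests would typically return small p-values. Estimating type I error and power as described in the abstract would mean repeating the simulation many times (with labels drawn independently of `G` for the null) and counting rejections at a chosen level.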