Human Genetics (2019) 138:307–326
https://doi.org/10.1007/s00439-019-01989-8
ORIGINAL INVESTIGATION
Genetic variant predictors of gene expression provide new insight
into risk of colorectal cancer
Stephanie A. Bien1,62 · Yu‑Ru Su1,62 · David V. Conti2,3,62 · Tabitha A. Harrison1,62 · Conghui Qu1,62 · Xingyi Guo4,62 ·
Yingchang Lu4,62 · Demetrius Albanes5,62 · Paul L. Auer6,62 · Barbara L. Banbury1,62 · Sonja I. Berndt5,62 ·
Stéphane Bézieau7,8,62 · Hermann Brenner9,10,11,62 · Daniel D. Buchanan12,13,14,62 · Bette J. Caan15,62 ·
Peter T. Campbell16,62 · Christopher S. Carlson1,62 · Andrew T. Chan17,18,62 · Jenny Chang‑Claude19,20,62 ·
Sai Chen21,62 · Charles M. Connolly1,62 · Douglas F. Easton22,62 · Edith J. M. Feskens23,62 · Steven Gallinger24,62 ·
Graham G. Giles12,25,62 · Marc J. Gunter26,62 · Jochen Hampe27,62 · Jeroen R. Huyghe1,62 · Michael Hoffmeister9,62 ·
Thomas J. Hudson28,29,62 · Eric J. Jacobs16,62 · Mark A. Jenkins12,62 · Ellen Kampman23,62 · Hyun Min Kang21,62 ·
Tilman Kühn30,62 · Sébastien Küry7,8,62 · Flavio Lejbkowicz31,32,62 · Loic Le Marchand33,62 · Roger L. Milne12,25,62 ·
Li Li34,62 · Christopher I. Li1,62 · Annika Lindblom35,36,62 · Noralane M. Lindor37,62 · Vicente Martín38,39,62 ·
Caroline E. McNeil2,62 · Marilena Melas2,62 · Victor Moreno39,40,41,62 · Polly A. Newcomb1,62 · Kenneth Offit42,62 ·
Paul D. P. Pharaoh43,62 · John D. Potter1,62 · Chenxu Qu2,62 · Elio Riboli44,62 · Gad Rennert31,32,62 · Núria Sala45,46,62 ·
Clemens Schafmayer47,62 · Peter C. Scacheri48,62 · Stephanie L. Schmit49,50,62 · Gianluca Severi51,62 ·
Martha L. Slattery52,62 · Joshua D. Smith53,62 · Antonia Trichopoulou54,55,62 · Rosario Tumino56,62 ·
Cornelia M. Ulrich57,62 · Fränzel J. B. van Duijnhoven23,62 · Bethany Van Guelpen58,62 · Stephanie J. Weinstein5,62 ·
Emily White1,62 · Alicja Wolk59,60,62 · Michael O. Woods61,62 · Anna H. Wu2,3,62 · Goncalo R. Abecasis21,62 ·
Graham Casey51,62 · Deborah A. Nickerson53,62 · Stephen B. Gruber2,62 · Li Hsu1,62 · Wei Zheng4,62,63 · Ulrike Peters1,62
Received: 28 September 2018 / Accepted: 20 February 2019 / Published online: 28 February 2019
© The Author(s) 2019
Abstract
Genome-wide association studies have reported 56 independently associated colorectal cancer (CRC) risk variants, most of
which are non-coding and believed to exert their effects by modulating gene expression. The computational method PrediXcan uses cis-regulatory variant predictors to impute expression and perform gene-level association tests in GWAS without
directly measured transcriptomes. In this study, we used reference datasets from colon (n = 169) and whole blood (n = 922)
transcriptomes to test CRC association with genetically determined expression levels in a genome-wide analysis of 12,186
cases and 14,718 controls. Three novel associations were discovered from colon transverse models at FDR ≤ 0.2 and further
evaluated in an independent replication including 32,825 cases and 39,933 controls. After adjusting for multiple comparisons,
we found statistically significant associations using colon transcriptome models with TRIM4 (discovery P = 2.2 × 10⁻⁴, replication P = 0.01), and PYGL (discovery P = 2.3 × 10⁻⁴, replication P = 6.7 × 10⁻⁴). Interestingly, both genes encode proteins
that influence redox homeostasis and are related to cellular metabolic reprogramming in tumors, implicating a novel CRC
pathway linked to cell growth and proliferation. Defining CRC risk regions as one megabase up- and downstream of one of
the 56 independent risk variants, we defined 44 non-overlapping CRC-risk regions. Among these risk regions, we identified
genes associated with CRC (P < 0.05) in 34/44 CRC-risk regions. Importantly, CRC association was found for two genes
in the previously reported 2q35 locus, CXCR1 and CXCR2, which are potential cancer therapeutic targets. These findings
provide strong candidate genes to prioritize for subsequent laboratory follow-up of GWAS loci. This study is the first to
implement PrediXcan in a large colorectal cancer study and findings highlight the utility of integrating transcriptome data
in GWAS for discovery of, and biological insight into, risk loci.
Stephanie A. Bien and Yu-Ru Su contributed equally to this work.
Electronic supplementary material: The online version of this
article (https://doi.org/10.1007/s00439-019-01989-8) contains
supplementary material, which is available to authorized users.
Extended author information available on the last page of the article
Introduction
It is estimated that genetic variants explain 12–35% of the
heritability in colorectal cancer (CRC) risk (Lichtenstein
et al. 2000; Czene et al. 2002; Jiao et al. 2014). To date,
Genome-Wide Association Studies (GWAS) have identified
56 independent common risk variants that are robustly associated with CRC (Peters et al. 2015; Schumacher et al. 2015;
Orlando et al. 2016). However, the functional relevance of
most discovered CRC-risk variants (89%) remains unclear.
The biological mechanisms linking CRC-associated risk variants with target genes have only been validated in the laboratory for six regions [8q24 MYC (Pomerantz et al. 2009),
8q23.3 EIF3H (Pittman et al. 2010), 11q23.1 COLCA1 and
COLCA2 (Biancolella et al. 2014), 15q13.3 GREM1 (Lewis
et al. 2014), 16q22.1 CDH1 (Shin et al. 2004), and 18q21.1
SMAD7 (Fortini et al. 2014)]. Given that most of the associated loci do not include coding variants, a large portion
of CRC genetic risk is thought to be explained by regulatory variation that modulates the expression of target genes.
This hypothesis is supported by the observation that CRC
risk variants are enriched in colon expression quantitative
trait loci (eQTLs) (Hulur et al. 2015) and active regulatory
regions of colorectal enhancers (Bien et al. 2017). Together,
this evidence highlights the value of studying transcriptional
regulation in relation to CRC risk.
Large-scale efforts are underway to map regulatory elements across tissues and cell types. Many transcriptome
studies have been conducted where genotype and expression levels are jointly assayed for many individuals, enabling the discovery of tissue-specific eQTLs. For instance,
the Genotype-Tissue Expression (GTEx) Project (GTEx
Consortium 2013) is building a biospecimen repository to
comprehensively map tissue-specific eQTLs across human
tissues, which currently includes transcriptomes from 169
colon transverse samples. These data provide a remarkable
new resource for understanding function in non-coding
regions that can be used to inform GWAS.
We employed the computational method PrediXcan
(Gamazon et al. 2015) to perform a CRC transcriptome-wide association study using reference datasets to ‘impute’
unobserved expression levels into GWAS datasets. Variant prediction models were developed using colon transverse transcriptomes (n = 169) from GTEx (GTEx Consortium 2013) and a larger whole blood transcriptome panel
(n = 922) from the Depression Genes and Networks (DGN) study
(Battle et al. 2014). We included whole blood because a previous
analysis demonstrated that gene regulatory elements of
immune cell types from peripheral blood are enriched for
variants with more significant CRC association P values (Bien et al.
2017). Further, laboratory follow-up of the CRC GWAS
locus 11q23 implicates two genes, COLCA1 and COLCA2,
which are co-expressed in immune cell types and correlate
with inflammatory processes (Peltekova et al. 2014). In addition to novel discovery, the PrediXcan approach can aid in
prioritization of candidate target genes in non-coding GWAS
loci and thereby inform testable hypotheses for laboratory
follow-up. Therefore, as a secondary analysis we investigated the association of imputed gene expression with CRC
in the 44 genetic regions harboring one or more of the 56
independent variants (r2 < 0.2) that are associated with CRC
in previous GWAS (P ≤ 5 × 10⁻⁸) and were replicated in an
independent dataset.
We aimed to discover novel loci associated with CRC,
and refine established regulatory risk loci by reducing the
list of putative gene targets. Employing PrediXcan, we tested
genetically regulated gene expression for association with
CRC in a two-stage approach. In the discovery stage, up
to 8277 gene sets were tested in 12,186 cases and 14,718
controls from the Genetics and Epidemiology of Colorectal
Cancer Consortium (GECCO) and the Colon Cancer Family
Registry (CCFR). This discovery set was also used to identify potential target genes in the 44 genetic regions harboring 56 known CRC risk variants. We attempted replication
of three novel genes that were not positioned within 1 Mb
of the 56 previously reported risk variants and with false
discovery rate (FDR) ≤ 0.2 for CRC risk in a large and independent study of 32,825 cases and 39,933 controls from the
Colorectal Transdisciplinary (CORECT) consortium, UK
Biobank, and additional CRC GWAS (Fig. 1).
Results
Imputation of genetically regulated gene
expression
Gene expression levels were imputed using previously published multi-variant models built using elastic net regularization (variant weight gene models V6 available online from
PredictDB.org). For each tissue and gene, a quality metric
referred to as predictive R2 was provided as the squared correlation
between the observed and predicted expression from the
multi-variant model based on a tenfold cross validation.
Fig. 1 Schematic illustration of the study design. Training data
comprised joint observations of imputed variant genotypes and
tissue-specific gene expression from reference datasets (DGN and
GTEx). Genetic variant predictors of gene expression, trained using
elastic net regularization, were downloaded from PredictDB.org.
Models for colon transverse tissue and whole blood were used for
imputation of expression into independent GWAS datasets for colorectal
cancer (CRC). Imputed gene expression was then tested for
association with case (ca.)–control (co.) status in the discovery stage.
Novel gene associations with a false discovery rate (FDR) ≤ 0.2 were
assessed in an independent CRC GWAS dataset. As a secondary analysis, the association of genetically determined expression of genes in
44 GWAS-associated risk regions was examined
After restricting to protein coding genes with a predictive
R2 > 0.01 (≥ 10% correlation between predicted and observed
expression), the discovery analysis tested the association of
imputed expression for 4850 genes using colon transverse
models and 8277 genes using whole blood models. On average, colon transverse models used 22 variants (SD = 19) per
gene with a range of 1–173 variants. Whole blood models used
slightly more variants on average, with
a mean of 34 variants (SD = 24) per gene, ranging from 1 to
213 variants. We report CRC association results and predictive R2 for imputed expression of each gene with P ≤ 0.05 in
either colon transcriptome or whole blood analysis (Online
Resource 2 Table S2).
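The imputation step described above can be sketched in a few lines. This is a minimal illustration, assuming that a gene's predicted expression is the weighted sum of a person's allele dosages over the model's variant predictors, and that models are kept only when predictive R2 exceeds 0.01. The function names, variant identifiers, and weights are hypothetical and are not taken from PredictDB.

```python
from typing import Dict


def impute_expression(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Predicted (genetically regulated) expression of one gene for one person.

    weights: variant ID -> weight from a multi-variant prediction model.
    dosages: variant ID -> allele dosage (0 to 2) for the person.
    Variants absent from the genotype data contribute nothing (an assumption).
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def passes_r2_filter(predictive_r2: float, threshold: float = 0.01) -> bool:
    """Keep a gene model only if its cross-validated R2 exceeds the threshold."""
    return predictive_r2 > threshold


# Hypothetical weights and dosages, for illustration only.
example_weights = {"rs_a": 0.30, "rs_b": -0.10, "rs_c": 0.05}
example_dosages = {"rs_a": 2.0, "rs_b": 1.0}  # rs_c is not genotyped here

predicted = impute_expression(example_weights, example_dosages)
print(f"predicted expression: {predicted:.3f}")
```

The imputed values would then serve as the exposure in a per-gene logistic regression of case–control status.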
Discovery of new CRC susceptibility genes
In total, multivariate logistic regression was used to test the
association of CRC with genetically imputed gene expression for 4850 genes from colon transverse models and 8277
genes from whole blood models. We employed PrediXcan
in 12,186 cases and 14,718 controls from 16 GWAS studies. Replication was attempted for associations meeting an
FDR ≤ 0.2 threshold in the discovery phase if they were in
a novel CRC region, using an independent GWAS dataset
comprising 32,825 cases and 39,933 controls from the
CORECT consortium, UK Biobank, and additional GWAS
as described in Online Resource 1. In the discovery phase,
colon transcriptome models identified CRC association with
imputed genetically regulated gene expression in three putative novel regions. Two out of three genes tested in the replication dataset were significant after adjusting for multiple
comparisons (α = 0.05/3 = 0.017) (Online Resource Fig S1,
Table 1). In addition to being more than 1 Mb away from
previously identified risk variants, we confirmed that none
of the variant predictors used to impute gene expression
for these three genes were in LD with previously
published CRC-risk variants (all r2 ≤ 0.1). In the 7q22.1 locus, increased
expression of TRIM4 was associated with reduced CRC risk
with an odds ratio (OR) of 0.94 [95% confidence interval
(CI) 0.91–0.97, discovery P = 2.2 × 10⁻⁴]. Reduced CRC
risk was also associated with increased genetically regulated gene expression of TRIM4 in the independent replication dataset (P = 0.01). The second novel locus,
14q22.1, was also inversely associated: increased genetically regulated gene expression of PYGL
was associated with decreased CRC risk, showing an OR
of 0.90 (95% CI 0.85–0.96) in the discovery dataset (discovery P = 2.3 × 10⁻⁴) as well as in the replication dataset
(P = 7.9 × 10⁻⁴). Imputed genetically regulated gene expression for SLC22A31 was associated with increased CRC risk
in the discovery phase (P = 1.3 × 10⁻⁴), but did not replicate in the independent dataset. We found no associations in
novel regions using whole blood variant models that reached
FDR ≤ 0.2 in the discovery phase.

Table 1 Genes passing discovery threshold in novel loci from colon transverse PrediXcan

| Locus | Gene | Direction of gene expression for increased CRC risk | Discovery P (n ca./co. = 12,186/14,718) | Replication P (n ca./co. = 32,825/39,939) | PrediXcan gene model R2 | Number of predictive variants |
|---|---|---|---|---|---|---|
| 7q22.1 | TRIM4 | Decrease | 1.7 × 10⁻⁴ | 1.1 × 10⁻² | 0.51 | 62 |
| 14q22.1 | PYGL | Decrease | 2.3 × 10⁻⁴ | 8.7 × 10⁻⁴ | 0.26 | 23 |
| 16q24.3 | SLC22A31 | Increase | 1.3 × 10⁻⁴ | 0.62 | 0.14 | 29 |

P: for the association between CRC and the genetically determined gene expression in discovery and replication GWAS studies
R2: the cross-validated R2 value found when training the model (predictive R2 from PredictDB.org). Replicated at α = 0.05/3 genes = 1.7 × 10⁻²
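The two multiple-testing rules used in this section can be illustrated as follows. This is a sketch only: it assumes a Benjamini-Hochberg procedure for the FDR threshold, which the text does not specify, and a Bonferroni correction for the three replication tests. The replication P values are those listed in Table 1; the function names are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Replication P values as listed in Table 1.
replication_p: Dict[str, float] = {"TRIM4": 1.1e-2, "PYGL": 8.7e-4, "SLC22A31": 0.62}

threshold = bonferroni_threshold(0.05, len(replication_p))
replicated = sorted(gene for gene, p in replication_p.items() if p < threshold)
print(f"threshold = {threshold:.4f}; replicated genes: {replicated}")
```

In the discovery stage, genes whose q value from `benjamini_hochberg` is at most 0.2 would be the ones carried forward to replication.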
Colon transverse PrediXcan analyses were repeated for
TRIM4 and PYGL in the discovery dataset, stratifying cases
by proximal (n = 4454 cases), distal (n = 3580 cases), and
rectal (n = 2936 cases) cancer sites. We excluded 1216 cases
from the stratified analysis because the cancer site was
unspecified. For both genes, the effects and
P values were similar across the three sites. For TRIM4,
the CRC association with genetically imputed gene expression had an OR of 0.94 (95% CI 0.90–0.98, P = 3 × 10⁻³)
in proximal colon cases compared to an OR of 0.95 (95%
CI 0.90–1.0, P = 5 × 10⁻²) in distal colon cases and an OR
of 0.93 (95% CI 0.88–0.98, P = 2 × 10⁻²) in rectal cases.
There was no significant difference in the effect estimates
between these cancer sites for TRIM4 (Q test for heterogeneity P = 1.0). Similarly, for PYGL, the CRC association with
genetically regulated gene expression had an OR of 0.89
(95% CI 0.82–0.97, P = 3 × 10⁻³) in proximal colon cases
compared to an OR of 0.91 (95% CI 0.83–1.0, P = 2 × 10⁻²)
in distal colon cases and an OR of 0.86 (95% CI 0.77–0.95,
P = 5 × 10⁻⁴) in rectal cases, with no significant difference in
effects (Q test for heterogeneity P = 0.98).
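A heterogeneity test of this kind can be sketched from the reported odds ratios and confidence intervals. The example below is an assumption-laden illustration, not the authors' procedure: it uses the standard Cochran's Q with inverse-variance weights, recovers standard errors from the rounded 95% CIs, and treats the three site-specific estimates as independent, even though stratified analyses often share controls. It should therefore not be expected to reproduce the reported P values exactly.

```python
import math
from typing import List, Tuple

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_or_and_se(odds_ratio: float, lower: float, upper: float) -> Tuple[float, float]:
    """Log odds ratio and its standard error recovered from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(odds_ratio), se


def cochran_q(estimates: List[Tuple[float, float]]) -> float:
    """Cochran's Q statistic for a list of (estimate, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))


def q_p_value_three_groups(q: float) -> float:
    """P value for Q with three groups (2 df): chi-square survival is exp(-q/2)."""
    return math.exp(-q / 2.0)


# TRIM4 estimates for proximal, distal, and rectal cases, as reported in the text.
trim4 = [
    log_or_and_se(0.94, 0.90, 0.98),
    log_or_and_se(0.95, 0.90, 1.00),
    log_or_and_se(0.93, 0.88, 0.98),
]
q_stat = cochran_q(trim4)
p_het = q_p_value_three_groups(q_stat)
print(f"Q = {q_stat:.3f}, heterogeneity P = {p_het:.2f}")
```

The closed-form P value applies only to three groups; a general implementation would need a chi-square survival function with k − 1 degrees of freedom.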
We further investigated the replicated CRC-associated
PrediXcan genes by summarizing the single-variant CRC
association results for variants that were included in the
prediction models, referred to hereafter as ‘variant predictors’ (Online Resources 3–6 Fig S2). In TRIM4, the association was mostly driven by one LD block with 62 correlated
genetic variant predictors used to impute genetically regulated gene expression in colon tissue models. Among the
variant predictors of TRIM4, rs2527886 was most significantly associated with CRC (P = 1.8 × 10⁻⁴). Bioinformatic
follow-up of the TRIM4 locus showed that in the genomic
region containing variants correlated with rs2527886, there
were six enhancers with strong Chromatin Immunoprecipitation Sequencing (ChIP-seq) H3K27ac signal in either
normal colorectal crypt cells or a CRC cell line (Online
Resource 1 Fig S3). Using peak signal from H3K27ac activity to define enhancer regions, two enhancers were gained in
ten or more CRC cell lines compared to normal colorectal
crypt cells, referred to as recurrent variant enhancer loci
(VEL) (Akhtar-Zaidi et al. 2012). Rs2527886 is positioned
within one of these VEL. Peak ChIP-seq binding region for
CTCF suggests that the VEL harboring rs2527886 may be in
physical contact with the TRIM4 promoter. In the same VEL,
one of the LD variants, rs2525548 (LD r2 = 0.99), is positioned within transcription factor binding sites for RUNX3,
FOX, NR3C1, and BATF (Online Resource 1 Fig S3). In the
PYGL locus, rs12589665 is the variant predictor with the
strongest marginal association with CRC (P = 3.2 × 10⁻⁴).
We identified 7 enhancers in the region spanning the variants
in LD with rs12589665, and three variants in LD with the
lead predictor variant were positioned in VEL. Two of these
variants, rs72685325 (r2 = 0.62) and rs72685323 (r2 = 0.53),
were positioned within binding sites for 7 transcription factors (Online Resource 1 Fig S3).
A series of exploratory analyses were conducted to
assess whether the observed inflation in association signals
(λ = 1.1) was the result of bias in our data or modeling error.
Results suggest that inflation was not driven by genes with
low predictive R2 values (Online Resource 1 Fig S4), by other
potential confounding factors common to GWAS such as genotyping batch effects (Online Resource 1 Fig S5) or cryptic population structure (Online Resource 1 Fig S6–S7),
or by inflated Z statistics from modeling genes with little
variability in expression (Online Resource 1 Fig S8–S11).
Observed inflation was slightly reduced, but still elevated,
when looking at the marginal association results for the variant predictors (λ = 1.07; Online Resource 1 Fig S12) and
when excluding genes with high predicted co-expression
(λ = 1.07; Online Resource 1 Fig S13). Collectively, this
exploration suggests that the observed inflation is less likely
to be the result of modeling or analytical error and more
likely reflects the polygenicity of CRC.
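For readers unfamiliar with the inflation factor, the sketch below shows one common way to compute λ, assuming it is defined as the median of the observed chi-square statistics (squared Z scores) divided by the median of a chi-square distribution with one degree of freedom. The text does not state which definition was used, and the Z scores here are made up for illustration.

```python
import statistics
from typing import Iterable

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(z_scores: Iterable[float]) -> float:
    """Genomic inflation factor: median observed chi-square over its null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical gene-level Z scores, for illustration only.
example_z = [-1.2, 0.3, 0.9, -0.5, 2.1]
lam = genomic_inflation(example_z)
print(f"lambda = {lam:.2f}")
```

A value near 1 indicates test statistics that follow the null distribution in their bulk; values above 1 can reflect confounding or, as argued above, polygenicity.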
Refinement of known CRC GWAS‑risk regions
We first assembled a list of 56 previously reported independent (r2 ≤ 0.2) CRC GWAS risk variants and defined
a distance-based region surrounding each variant as the
chromosomal position of the first reported (index) variant ± 1 Mb (Online Resource 1 Table S3). We then combined
overlapping risk regions by taking the minimum and maximum chromosomal positions of all regions that overlapped,
resulting in a total of 44 CRC risk regions harboring 1–4
independent CRC-risk variants. In these 44 regions, there
was an average of 20 (SD ± 17) protein-coding genes per
region annotated by the Consensus Coding Sequence Database (CCDS). The average number of protein-coding genes
per region with imputed genetically regulated gene expression in the tissue-specific models was reduced to an average
of 10 (SD ± 8) genes in colon transverse, and 14 (SD ± 11)
genes in whole blood. Further, in these regions we found
that of the total number of genes with genetically regulated
gene expression across the two models, an average of 45% of
the genes overlapped. We found that 34/44 (77%) of CRC-risk regions overlapped the transcription start site of a gene
associated with CRC at P < 0.05. Comparing the number
of genes with P < 0.05 to the total number of CCDS genes
within 1 Mb of an index variant resulted in an average reduction of 82% per region (Table 2).
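The region definition and the per-region reduction described in this paragraph can be sketched as follows. This is an illustrative reading of the procedure: each index variant gets a ± 1 Mb window, overlapping windows on the same chromosome are merged by taking their minimum and maximum positions, and the reduction is taken to be one minus the fraction of CCDS genes retained at P < 0.05. The chromosome positions below are hypothetical and the function names are not from the paper.

```python
from typing import Dict, List, Tuple

FLANK_BP = 1_000_000  # 1 Mb on each side of an index variant


def risk_regions(
    index_variants: List[Tuple[str, int]], flank: int = FLANK_BP
) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping +/- flank windows around index variants, per chromosome."""
    windows_by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, pos in index_variants:
        window = (max(0, pos - flank), pos + flank)
        windows_by_chrom.setdefault(chrom, []).append(window)

    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, windows in windows_by_chrom.items():
        windows.sort()
        regions = [windows[0]]
        for start, end in windows[1:]:
            last_start, last_end = regions[-1]
            if start <= last_end:
                # Overlap: extend the current region to the larger end position.
                regions[-1] = (last_start, max(last_end, end))
            else:
                regions.append((start, end))
        merged[chrom] = regions
    return merged


def percent_reduction(n_associated: int, n_ccds: int) -> float:
    """Percent of CCDS genes in a region that are dropped at P < 0.05."""
    return 100.0 * (1.0 - n_associated / n_ccds)


# Hypothetical index-variant positions, for illustration only.
example = risk_regions(
    [("8", 117_000_000), ("8", 118_500_000), ("8", 128_000_000), ("18", 46_000_000)]
)
print(example)
print(f"{percent_reduction(8, 41):.0f}% reduction")
```

Sorting the windows first lets a single pass merge any chain of overlapping regions, which is how a region can end up harboring several independent risk variants.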
We further investigated the regions that did not show evidence of gene association and found that the GWAS-reported
risk variants in 3/10 of these regions were coding variants or were in LD with a coding variant (3q26-MYNN and
LRRC34, 10q24.32-WBP1L, 14q22.2-BMP4). Additionally,
2/10 of the risk variants were originally discovered in East
Asian populations, and the risk SNPs had weaker association
in our study (10q22.3-rs704017 P = 1 × 10⁻⁴ and 10q24.32-rs4919687 P = 1 × 10⁻²). Another 2/10 GWAS risk variants
did not replicate in our study (4q31.1-rs60745952 P = 0.8
and 16p13.2-rs79900961 P = 0.26). In the remaining 3/10
regions, we found that the index variants did not reach
genome-wide significance, reflecting power limitations in
our discovery dataset (4q32.2-rs35509282 P = 6 × 10⁻³,
16q24.1-rs16941835 P = 4 × 10⁻³, and 20p12.3-rs961253
P = 4 × 10⁻⁵).
Among the 34 regions containing associated genes, we
found that the most significant gene association in the PrediXcan analysis was often the strongest candidate based on
either known CRC etiology and gene function or results
from previous laboratory follow-up (e.g. COLCA2, LAMC1,
POLD3, SMAD7, TGFB1). In addition to confirming suspected genes, new candidates were also identified. For
example, CXCR1 (P = 8 × 10⁻⁵) and CXCR2 (P = 9 × 10⁻⁵)
were among the strongest associations. Notably, these genes
are biologically relevant targets given that they encode
cytokine receptors known to be implicated in a variety of
cancers.
Discussion
In this study, we employed PrediXcan in 12,186 cases
and 14,718 controls. Genetic variant predictors of gene
expression from both colon transverse and whole blood transcriptomes were used to test the association of CRC risk
with imputed gene expression. We replicated novel associations of TRIM4 and PYGL in a large independent study of
over 70,000 participants. In addition, we identified strong
gene targets in several known GWAS loci, including genes
that were previously not reported as putative candidates.
The two novel gene associations discovered in colon
transverse models implicate genes involved with hypoxia-induced metabolic reprogramming, which is a hallmark
of tumorigenesis in solid tumors. TRIM4 is a member of
a superfamily of ubiquitin E3 ligases comprised of over
70 genes notably defined by a highly conserved N-terminal RING finger domain. This family of proteins has been
implicated in a number of oncogenic or tumor suppressor
activities that involve pathways related to CRC (Myc, Ras,
etc.) (Sato et al. 2012; Chen et al. 2012; Zaman et al. 2013;
Tocchini et al. 2014; Zhou et al. 2014; Zhan et al. 2015), and
recently have been implicated in inflammatory and immune
related activities (Eames et al. 2012; Versteeg et al. 2014).
Somatic alterations in other TRIM genes have been associated with a large number of cancers including colon (Glebov
et al. 2006; Noguchi et al. 2011; Hatakeyama 2011). While
TRIM4 has not previously been implicated in cancer risk,
the strong homology across gene members of this family and
their implications in cancer and immunity make this gene an
interesting candidate. Moreover, a recent study suggests that
expression of TRIM4 plays a role in sensitizing cells to oxidative stress-induced death and regulation of reactive oxygen
species (ROS) levels (H2O2) through ubiquitination of the
redox regulator peroxide reductase (Tomar et al. 2015). Regulation of ROS levels and the cellular antioxidant system has
previously been implicated in the pathophysiology of many
diseases including inflammation and tumorigenesis (López-Lázaro 2007; Holmdahl et al. 2013). ROS are associated
with cell cycle, proliferation, differentiation and migration
and are elevated in colon as well as other cancers (Vaquero
et al. 2004; Kumar et al. 2008; Afanas’ev 2011; Lin et al.
2017). Notably, many of the established environmental risk
factors for colon cancer implicate oxidative stress pathways,
including high alcohol consumption, smoking, increased
consumption of red and processed meats (Stevens et al.
13
20
19
8
41
9
4
20
8
24
31
30
16
16
1q25.3
1q41
2q35
3p22.1
3p14.1
5p15.33
5q22.2
5q31.1
6p21.2
6p21.1
6q22.1
6q25.3
11
9
14
21
10
6
17
2
2
12
6
8
11
11
6
22
24
14
6
16
4
5
34
8
14
16
8 (57)
3 (25)
10 (38)
15 (50)
8 (50)
6 (100)
10 (43)
2 (50)
2 (40)
10 (27)
5 (56)
7 (47)
9 (50)
DCBLD1,
ROS1,
VGLL2
MAP3K4
–
–
–
SRP19
PDCD6
SLC25A26
–
GPBAR1,
WNT10A,
ARPC2
MIA3
ARPC5
–
CT
CT WB CT∩WB
(%
overlap)d
1 (87)
2 (90)
1 (75)
1 (88)
1 (94)
3 (81)
DCBLD1
–
1 (97)
4 (87)
ETV7, KCTD20,
C6orf89, PXT1
UBR2
0.01
0.01
–
9 × 10− 3
7 × 10− 3
0.01
0.04
–
–
–
SLC22A3
DCDBL2
TFEB
PITX1,
CATSPER3,
PCBD2,
MIR4461,
H2AFY
CDKN1A
APC
–
9 × 10− 3
0.02
1 × 10− 3 SLC25A26,
LRIG1
0.02
TERT
2 × 10− 3
CTNNB1
0.04
–
8 × 10− 5 PNKD,
TMBIM1
3 × 10− 3
8 (80)
DUSP10
WNT4,
CDC42
LAMC1
0.02
3 × 10− 6
0.02
WB
Reported
gene(s)
0.05
0.02
–
CT
P for most significant gene
rs7758229
rs1321311,
intergenic
rs4711689,
intronic
rs4946260,
intronic
rs35360328,
intergenic
rs812481,
intronic
rs2736100,
intronic
rs1801155,
missenseAPC
rs647161,
intergenic
rs72647484,
intergenic
rs10911251,
intronic
rs6691170,
intergenic
rs992157,
intronic (tags
missense)
Cui et al.
(2011)
Dunlop et al.
(2012)
Zeng et al.
(2016)
Schumacher
et al. (2015)
Jia et al.
(2013)
Schumacher
et al. (2015)
Schumacher
et al. (2015)
Kinnersley
et al. (2012)
Niell et al.
(2003)
Al-Tassan
et al. (2015)
Peters et al.
(2013)
Houlston et al.
(2010)
Orlando et al.
(2016)
rsID dbSNP
References
function (note)
GWAS publication for independent index
variant(s)c
3 (63)
4 (79)
1 (95)
CT + WB
CAMLG, DDX46 2 (92)
–
SLC25A26,
SUCLG2
AHRR
CXCR1, CXCR2,
ARPC2,
AAMP, PNKD,
GPBAR1,
TMBIM1
ZNF621
LAMC1, RGL1,
TEDDM1
FAM177B, AIDA
CDC42
WB
Gene set (decreasing order of
significance)
Genes with genetically imputed gene
expressiona
CCDS
gene
build
Number of
genes (%
reduced from
CCDS)b
PrediXcan results for genes with P ≤ 0.05
Gene count in region
1p36.12
Region
Table 2 Known GWAS-risk regions overlapping genes that show association of genetically regulated gene expression with CRC
–
–
–
–
–
–
–
–
–
–
–
–
–
Variant(s) with
differential allelic
effects and gene
regulated
6
5
11
5
74
12
74
29
8q24.21
9q24
10p14
10q24.2
10q25.2
11q12.2
11q13.4
14
26
6
26
2
8
2
5
22
42
7
42
4
9
4
5
9 (33)
15 (28)
5 (63)
9 (53)
2 (50)
6 (55)
2 (67)
3 (43)
OR2AT4,
RNF169,
NEU3,
DNAJB13
FADS2,
GANAB
–
CUTC,
HIF1AN,
SEC31B
ITIH2
POU5F1B,
FAM84B
KIAA1432
AARD,
SLC30A8
CT
CT WB CT∩WB
(%
overlap)d
8 (89)
C11orf10,
FADS1,
FADS2,TAF6L,
C11orf9,
DAGLA,
FADS3
POLD3, RAB6A, 4 (67)
MRPL48
1 (92)
5 × 10− 4 MYRF
8 × 10− 3 POLD3
4 × 10− 3
7 × 10− 5
VTI1A,
TCF7L2
0.05
–
9 × 10− 4 ABCC2,
MRP2
5 × 10− 3
6 (70)
SLC25A28,
COX15,
SEC31B,
HIF1AN,
ENTPD7
GPAM
GATA3
–
–
0.01
0.03
POU5F1B,
MYC
Not reported
6 × 10− 3 EIF3H
WB
6 × 10− 10 0.01
0.02
CT
Reported
gene(s)
1 (80)
1 (91)
2 (60)
3 (50)
CT + WB
P for most significant gene
rs3824999,
intronic
rs174537,
intronic
rs60892987,
intergenic
rs12241008,
intronic;
rs11196172
rs1035209,
intergenic
rs2450115,
intergenic;
rs16892766,
intergenic;
rs6469656,
intergenic
rs6983267,
intergenic
rs719725,
intergenic
rs10795668
Dunlop et al.
(2012)
Zhang et al.
(2014) and
Wang et al.
(2017)
Zhang et al.
(2014) and
Schmit et al.
(2016)
Tomlinson
et al. (2008)
Zanke et al.
(2007)
Tomlinson
et al. (2008)
Whiffin et al.
(2014)
Tomlinson
et al. (2008)
and Zeng
et al. (2016)
rsID dbSNP
References
function (note)
GWAS publication for independent index
variant(s)c
–
–
POU5F1B
UTP23
WB
Gene set (decreasing order of
significance)
Genes with genetically imputed gene
expressiona
CCDS
gene
build
Number of
genes (%
reduced from
CCDS)b
PrediXcan results for genes with P ≤ 0.05
Gene count in region
8q23.3
Region
Table 2 (continued)
–
–
–
–
–
–
rs6983267; MYC
rs16888589; EIF3H
Variant(s) with
differential allelic
effects and gene
regulated
13
16
12
6
5
12q13.12 32
12q24.12 24
12q24.22 14
9
41
15q13.3
16q22.1
23
40
12p13.32 71
35
2
11
18
20
53
13
19 (49)
2 (40)
4 (31)
9 (43)
9 (33)
33 (55)
8 (42)
–
GOLGA8N
HECTD4,
RAD9B,
BRAP,
TMEM116,
FAM109A
NOS1
LIMA1,
COX14,
CERS5,
NCKAP5L,
LETMD1,
ATF1
COLCA2,
COLCA1,
C11orf53,
DLAT
NOP2
CT
CT WB CT∩WB
(%
overlap)d
14
Gene set (decreasing order of
significance)
Genes with genetically imputed gene
expressiona
CCDS
gene
build
ESRP2, NFATC3
–
2 (98)
1 (88)
–
GREM1
8 × 10− 3 CDH1
–
9 × 10− 3 NOS1
1 × 10− 2
0.04
1 × 10− 6 SH2B3
2 × 10− 3
9 (63)
2 (86)
3 × 10− 4 DIP2B, ATF1
8 × 10− 6
13 (59)
DIP2B, LIMA1,
SMARCD1,
GALNT6,
TFCP2,
SCN8A,
METTL7A,
RACGAP1
TRAFD1, CUX2,
BRAP, ATXN2,
SH2B3
FBXO21
6 × 10− 3 CCND2,
C12orf5,
FGF6,
RAD51AP1,
FGF23,
PARP11
COLCA1,
COLCA2
0.04
–
1 × 10− 6
Reported
gene(s)
3 (96)
4 (85)
–
WB
CT
P for most significant gene
Tenesa et al.
(2008)
rs7130173;
COLCA1, COLCA2
Variant(s) with
differential allelic
effects and gene
regulated
rs16969681
intergenic
rs11632715,
intergenic
rs9929218,
intronic
rs7320812
rs3184504,
missense
COGENT
Study et al.
(2008)
Schumacher
et al. (2015)
Tomlinson
et al. (2007)
Schumacher
et al. (2015)
rs5030625; CDH1
rs16969681;GREM1
–
–
–
Jia et al.
rs10774214,
(2013),
intergenic;
Zhang et al.
rs3217810,
(2014),
intergenic
Whiffin et al.
rs10849432,
(2014) and
intergenic
Zeng et al.
rs11064437,
(2016)
splice donorSPSB2
rs11169552,
Houlston et al. –
intronic
(2010)
rs3802842,
intronic
rsID dbSNP
References
function (note)
GWAS publication for independent index
variant(s)c
CCND2,
SCNN1A
CT + WB
WB
Number of
genes (%
reduced from
CCDS)b
PrediXcan results for genes with P ≤ 0.05
Gene count in region
27
11q23.1
Region
Table 2 (continued)
13
24
4
20
19q13.11 20
59
9
19q13.2
20q13.13
20q13.33 27
23
7
37
17
7
24
15 (54)
3 (38)
15 (33)
11 (58)
3 (33)
17 (65)
MTG2
PREX1
DEDD2,
TGFB1
PDCD5
FAM57A,
GEMIN4,
BHLHA9
MYO5B, LIPG
SS18L1, HRH3
SNRPA,
B3GNT8,
CCDC97
B4GALT5
PDCD5
3 (89)
0.05
rs7229639
intronic
rs4939827
intronic
rs10411210
intronic
rs12603526,
intronic
Zhang et al.
(2014)
rsID dbSNP
References
function (note)
Broderick
et al. (2007)
and Zhang
et al. (2014)
RHPN2,
COGENT
GPATCH1
Study et al.
(2008)
Zhang et al.
TGFB1, B9D2 rs1800469
(2014)
intronic (tags
missense)
PREX1
rs6066825
Schumacher
intronic
et al. (2015)
LAMA5,
rs4925386
Houlston et al.
RPS21
intronic
(2010)
SMAD7
NXN
Reported
gene(s)
GWAS publication for independent index
variant(s)c
CT colon transverse, WB whole blood, No. number, – no genes meeting criteria. In known loci, genes with gene expression predictive R2 < 0.01 were included
a Genes with predicted expression in the corresponding tissue
b Number of genes with a P value ≤ 0.05. % Red. = (1 − # of genes with P value ≤ 0.05/# CCDS genes) × 100
c Conditionally independent in statistical models containing both variants or LD r2 < 0.2
d The intersect of genes in CT and WB models
5 × 10− 3
7 × 10− 3
7 × 10− 3
2 (78)
6 × 10− 3
0.03
5 (92)
0.04
8 × 10− 3
0.02
0.01
1 × 10− 3
0.04
WB
CT
1 (95)
3 (70)
3 (89)
FAM57A,
GEMIN4
SMAD7
CT + WB
WB
P for most significant gene
CCDS genes were counted, regardless of tissue relevance, 500 kb upstream or downstream of an index variant
5
10
18q21.1
19
CT
CT WB CT∩WB
(%
overlap)d
27
Gene set (decreasing order of
significance)
Genes with genetically imputed gene
expressiona
CCDS
gene
build
Number of
genes (%
reduced from
CCDS)b
PrediXcan results for genes with P ≤ 0.05
Gene count in region
17p13.3
Region
Table 2 (continued)
–
–
–
rs6507874,
rs6507875,
rs8085824, and
rs5892087, SMAD7
–
–
Variant(s) with
differential allelic
effects and gene
regulated
1988; Bird et al. 1996), or decreased consumption of fruits
and vegetables (La Vecchia et al. 2013). In future laboratory
analysis, it would be interesting to investigate whether the
association of increased TRIM4 expression with decreased
CRC risk is mechanistically acting through the regulation of
ROS and cell growth.
Under the hypoxic conditions of the tumor microenvironment, constant reprogramming of glycogen metabolism
is essential for meeting the energy requirements of cell growth and proliferation. PYGL (the second
novel finding) encodes the key enzyme of glycogen degradation, which releases glucose-1-phosphate so that it can
enter the pentose phosphate pathway, a pathway important
for generating NADPH, nucleotides, amino acids, and lipids
required for continued cell proliferation (Favaro et al. 2012).
It has previously been shown that depletion of PYGL leads to
oxidative stress (increased ROS levels) and subsequent P53-induced growth arrest in cancer cells (Favaro et al. 2012).
Of note, small molecule inhibitors of PYGL are currently
under investigation for the treatment of diabetes (Praly and
Vidal 2010). However, while decreased expression of PYGL
in the tumor may result in tumor senescence, our results
suggest that decreased PYGL expression is associated with
increased risk of CRC. Like the dynamic role of expression
for genes involved in the TGF-beta pathway, these conflicting observations between cancer risk and the effects of early
versus late induction of PYGL on cancer survival likely
reflect the importance of context and of fluctuating nutrient
and oxygen availability within the tumor microenvironment.
Importantly, we found that the PrediXcan analysis identified new candidate genes in known GWAS loci that had
previously gone undetected. For instance, in the recently
identified 2q35 locus (Orlando et al. 2016), the authors originally reported the two closest genes, PNKD and TMBIM1,
as potential targets for the putative regulatory locus marked
by the index variant, rs992157. The authors reported eQTL
evidence showing that rs992157 was associated with expression of nearby genes PNKD and TMBIM1 in lymphoblastoid cells, but not colorectal adenocarcinoma cells. In our
PrediXcan analysis, two other genes in this
region, CXCR1 and CXCR2, were among the most strongly
associated genes in the entire analysis, while the associations
for PNKD (P = 6 × 10−3) and TMBIM1 (P = 0.01) were
weaker. Our study added independent evidence
for an association of the locus with CRC, given that the index
variant was only borderline significant in the previous analysis, and identified two promising targets, CXCR1 and
CXCR2. These genes are of note because of their potential as chemotherapeutic targets. Specifically, the CXCR inhibitor Reparixin is
currently under investigation for progression-free survival in
metastatic triple-negative breast cancer in a phase 2 clinical
trial (NCT02370238). Interestingly, expression of CXCR1
and CXCR2 has been shown to be elevated in colon tumor
epithelium relative to normal adjacent tissue (P < 0.001).
While there is still much to be learned, it is possible that this
drug could also be useful for the treatment of CRC (Dabkeviciene et al. 2015).
This study had many strengths, most notably the use of
reference transcriptome data to perform gene-level association testing in several large GWAS studies to both uncover
novel associations and identify likely functional gene targets
in known loci. By integrating reference transcriptome data,
this study focused on genes that are expressed in CRC-relevant tissues. Furthermore, this method provided biologically
relevant sets to aggregate variants, thereby improving statistical power by reducing the burden of multiple comparisons.
In addition, our study was large, comprising
nearly 100,000 participants across the discovery and replication datasets.
Our study had several limitations. For many genes, the
predictive R2 for genetic variant models was relatively low,
indicating that a small proportion of the variance in gene
expression was explained by these models. In a recent publication, Su et al. (2018) demonstrated through extensive
simulations that while the true signal is attenuated
as a result, the loss of power was smaller than
anticipated and, more importantly, type I error is not
inflated. Predictive performance values were relatively strong
in the models used for PYGL (R2 = 0.26) and TRIM4 (R2 = 0.51),
corresponding to 51% and 71% correlation between predicted and observed expression, respectively. In general,
larger sample sizes for the reference panel will be needed
to achieve better prediction models, particularly for rarer
variants. While PYGL and TRIM4 were discovered using the
colon tissue model, the whole blood model also showed evidence of association. This finding was not surprising in light
of the recent GTEx paper demonstrating that many GWAS
loci implicate shared eQTLs (GTEx Consortium et al. 2017).
It should also be noted that variant predictors could implicate enhancers influencing the expression of multiple genes
and, because this study only evaluates genetically influenced
expression levels, it remains uncertain whether the associated
gene is the causal gene. As such, laboratory follow-up remains a critical extension of these findings; however,
this laborious work can now be more targeted based on
results from this analysis.
The loci identified using GWAS are most often located in
non-coding regions and provide little biological insight. In
contrast, the PrediXcan method directly tests putative target
genes providing strong hypotheses for subsequent laboratory
follow-up. The CXCR1 and CXCR2 findings are of interest given their therapeutic potential. As such, these findings
provide preliminary support for new molecular targets that
could potentially repurpose a putative cancer therapeutic
agent and highlight the utility of integrating functional data
for discovery of, and biological insight into, risk loci.
Future analyses would be improved by increasing the
number of transcriptomes. Similarly, larger GWAS sample
sizes, or imputation of other molecular phenotypes (ChIP-seq, DNase-seq, etc.) as data become available, could be
fruitful in the identification of important enhancer(s) or
other regulatory elements that could influence the expression of one or more genes.
In conclusion, we identified two novel loci through the
association of genetically predicted gene expression for
TRIM4 and PYGL with CRC risk and identified strong target
genes in known loci. The CXCR1 and CXCR2 findings highlight the advantage of using gene-based methods to identify
stronger candidate genes and potentially expedite clinically
relevant discovery. Further functional studies are required to
confirm our findings and understand their biologic implications. This, in turn, could provide further insight into CRC
etiology and potentially new therapeutic targets.
Materials and methods
Description of study cohorts
The discovery phase comprised 26,904 participants
(12,186 CRC cases and 14,718 controls) of European ancestral heritage across 16 studies (described in methods and
materials of Online Resource 1). Details of genotyping, QC
and single-variant GWAS have been previously reported
(Peters et al. 2013; Schumacher et al. 2015). The replication
phase included a total of 32,825 cases and 39,933 controls.
In addition to previously published CRC GWAS
from CORECT (Schumacher et al. 2015), we included UK
Biobank (application number 8614) and new CRC GWAS
data from additional studies. A nested case–control dataset from
the UK Biobank resource was constructed defining cases
as subjects with primary invasive CRC diagnosed, or who
died from CRC according to ICD9 (1530–1534, 1536–1541)
or ICD10 (C180, C182–C189, C19, C20) codes. Control
selection was done in a time-forward manner, selecting one
control for each case, first from the risk set at the time of the
case’s event, and then multiple passes were made to match
second, third and fourth controls. For prevalent cases, each
case was matched with four controls that exactly matched
the following matching criteria: year at enrollment, race/
ethnicity, and sex. In total, 5356 cases and 21,407 matched
controls were included from UK Biobank in the replication
analysis. For the site-stratified analysis, “proximal” colon
cancer was defined as hepatic flexure, transverse colon,
cecum and ascending colon (ICD9 1530, 1531, 1534, 1536),
“distal” colon cancer was defined as descending colon, sigmoid colon, and splenic flexure (ICD9 1532, 1533, 1537), and
“rectal” cancer was defined as rectosigmoid junction and rectum
(ICD9 1540, 1541).
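The site definitions above amount to a lookup from ICD9 code to anatomical group. The following is a minimal sketch of that lookup, not the authors' code; the function and variable names are hypothetical, and codes are assumed to be stored without the decimal point, as they are written in the text.

```python
# Site groups exactly as listed in the text (ICD9 codes without the decimal point).
# The dictionary and function names are illustrative assumptions.
SITE_BY_ICD9 = {
    "proximal": {"1530", "1531", "1534", "1536"},
    "distal": {"1532", "1533", "1537"},
    "rectal": {"1540", "1541"},
}


def classify_site(icd9_code):
    """Return 'proximal', 'distal', 'rectal', or None for codes outside the three groups."""
    # Accept "153.1", "1531", or the integer 1531 by normalising to a dot-free string.
    code = str(icd9_code).replace(".", "").strip()
    for site, codes in SITE_BY_ICD9.items():
        if code in codes:
            return site
    return None


if __name__ == "__main__":
    for example in ("1531", "153.7", 1541, "1535"):
        print(example, "->", classify_site(example))
```

Codes that appear in the case definition but in none of the three site groups (for example 1535) are returned as None in this sketch, so they would be left out of a site-stratified analysis rather than assigned to a site.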
Studies, sample selection and matching are described in
Online Resource 1, which provides details on sample numbers, and demographic characteristics of study participants.
All participants provided written informed consent, and each
study was approved by the relevant research ethics committee or institutional review board.
Whole‑genome sequencing reference genotype
imputation panel
We performed low-pass whole-genome sequencing of 2192
samples (details in Online Resource 1) at the University
of Washington Sequencing Center (Seattle, WA, USA). A
detailed description is provided in Online Resource 1.
In brief, after sample QC and removal of samples with estimated DNA contamination > 3% (16), duplicated samples
(5) or related individuals (1), sex discrepancies (0), and
samples with low concordance with genome-wide variant
array data (11), there were a total of 1439 CRC cases and
720 controls of European ancestry available for subsequent
imputation. These data were used as a reference imputation
panel for the discovery and replication GWAS datasets.
GWAS genotype data and quality control
In brief, genotyped variants were excluded based on call
rate (< 98%), departure from Hardy–Weinberg equilibrium in controls (HWE P < 1 × 10−4), and low minor allele frequency
(MAF < 0.05). We imputed the autosomal variants of all
studies to an internal imputation reference panel derived
from whole genome sequencing (described above). We
employed a two-stage imputation strategy (Howie et al.
2012) where entire chromosomes were first pre-phased using
SHAPEIT2 (Delaneau et al. 2013), followed by imputation
using minimac3 (Das et al. 2016). Only variants with an
imputation quality R2 > 0.3 were included for subsequent
analyses.
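The exclusion criteria above can be read as two simple filters, one for genotyped variants and one for imputed variants. The sketch below only illustrates those thresholds; the record layout and function names are assumptions, and it does not reproduce the actual QC pipeline, which is described in Online Resource 1.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    """Per-variant summary statistics (hypothetical record layout)."""
    variant_id: str
    call_rate: float       # fraction of samples with a genotype call
    hwe_p_controls: float  # Hardy-Weinberg test P value computed in controls
    maf: float             # minor allele frequency


def passes_genotype_qc(v, min_call_rate=0.98, min_hwe_p=1e-4, min_maf=0.05):
    """Keep a genotyped variant unless it meets any exclusion criterion in the text."""
    return (
        v.call_rate >= min_call_rate
        and v.hwe_p_controls >= min_hwe_p
        and v.maf >= min_maf
    )


def passes_imputation_qc(imputation_r2, min_r2=0.3):
    """Keep an imputed variant only if its imputation quality R2 exceeds 0.3."""
    return imputation_r2 > min_r2


if __name__ == "__main__":
    # Made-up variants, used only to exercise the filters.
    examples = [
        VariantQC("var_ok", 0.995, 0.20, 0.12),
        VariantQC("var_low_call_rate", 0.97, 0.20, 0.12),
        VariantQC("var_hwe_fail", 0.995, 1e-6, 0.12),
        VariantQC("var_rare", 0.995, 0.20, 0.01),
    ]
    for v in examples:
        print(v.variant_id, passes_genotype_qc(v))
```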
Imputation of genetically regulated gene
expression in study cohort
Jointly measured genome variant data and transcriptome
data sets were used by Gamazon et al. to develop additive models of gene expression levels. The weights for the
estimation were downloaded from the publicly available
database (http://hakyimlab.org/predictdb/). We used these
models to estimate genetically regulated expression of genes
in colon transverse and whole blood. These estimates represent multi-variant predictions of tissue-specific gene expression levels.
In-depth details of the reference cohort, datasets, and
model building have previously been described (Gamazon
et al. 2015). To summarize, jointly measured genome-wide
genotype data and RNA-seq data were obtained from two
different projects: (1) the DGN cohort (Battle et al. 2014)
(whole blood, n = 922) and (2) GTEx (GTEx Consortium
2015) (transverse colon, n = 169), predominantly of European
ancestry. Gamazon et al. used approximately 650,000 variants
with MAF > 0.05 to impute non-genotyped dosages using the
1000G Phase 1 v3 reference panel. Variants with MAF > 0.05
and imputation R2 > 0.8 were retained for subsequent model
building. In each tissue, Gamazon et al. normalized gene
expression by adjusting for sex, the top 3 principal components
(derived from genotype data) and the top 15 PEER factors (to
quantify hidden experimental confounders). These genomic
and transcriptomic data sets were used to train additive models of gene expression levels with elastic net regularization
(Gamazon et al. 2015). The model can be written as
$$Y_g = \sum_k w_{k,g} X_k + \varepsilon, \qquad (1)$$
where $Y_g$ is the expression trait of gene $g$, $w_{k,g}$ is the effect
size of genetic marker $k$ for $g$, $X_k$ is the number of reference variant alleles of marker $k$ and $\varepsilon$ is the contribution of
other factors influencing gene expression. The effect sizes
($w_{k,g}$) in Eq. (1) were estimated using the elastic net penalized approach. The summation in Eq. (1) is referred to as the
genetically determined component of gene expression. The
variant models (weights, $w_{k,g}$) were downloaded from
the publicly available database (http://hakyimlab.org/predictdb/).
The heritability of gene expression was used to estimate
how well the variant models predict gene expression levels.
The narrow-sense heritability for each gene was calculated
by Gamazon et al. (2015), using a variance-component model
with a genetic relationship matrix (GRM) estimated from
genotype data, as implemented in GCTA (Yang et al. 2011).
The proportion of the variance in gene expression explained by
these local variants was calculated using a mixed-effects model
(Torres et al. 2014; Gamazon et al. 2015). This heritability was
highly correlated with the predictive R2 (the cross-validated
R2 value found when training the model). Only genes with
R2 ≥ 0.01 (≥ 10% correlation between predicted and observed
expression) were tested for association with CRC. Furthermore, this analysis focused on the component of heritability
driven by variants in the vicinity (1 Mb) of each gene (cis-variants) because the component based on distal variants could
not be estimated with enough accuracy to make meaningful
inferences.
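The correspondence between predictive R2 and the quoted percent correlation can be checked directly, assuming (as the quoted figures imply) that R2 is the squared correlation between predicted and observed expression. The short sketch below applies this to the inclusion threshold and to the R2 values reported for PYGL and TRIM4 in the Discussion; the function name is illustrative.

```python
import math


def r2_to_correlation(r2):
    """Correlation implied by a predictive R2, assuming R2 is a squared correlation."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R2 must lie in [0, 1]")
    return math.sqrt(r2)


# R2 values taken from the text: the inclusion threshold and the two novel genes.
for label, r2 in [("inclusion threshold", 0.01), ("PYGL", 0.26), ("TRIM4", 0.51)]:
    print(f"{label}: R2 = {r2:.2f} -> r = {r2_to_correlation(r2):.2f}")
```

Under this assumption the three values come out at r = 0.10, 0.51 and 0.71, matching the 10%, 51% and 71% correlations stated in the text.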
Genotypes were treated as continuous variables (dosages).
Using the variant weights provided by Gamazon et al. we estimated the genetically regulated gene expression (GReX) of
each gene g
$$\mathrm{GReX} = \sum_k w_{k,g} x_k, \qquad (2)$$
where $w_{k,g}$ is the single-variant coefficient derived by regressing the gene expression trait $Y_g$ on variant $X_k$ using the reference transcriptome data. To address linkage disequilibrium
among variant predictors, Gamazon et al. (2015) used a
variable selection method to select a sparser set of less
correlated predictors. Specifically, variant weights ($w_{k,g}$)
were derived using elastic net with the R package glmnet
with α = 0.5. These weights are available from http://hakyimlab.org/predictdb/. Using Eq. (2) and the reference variant
predictor weights ($w_{k,g}$), the (unobserved) genetically determined expression of each gene $g$ (GReX) was estimated in
our GWAS sample. Separate analyses of genetically predicted expression were performed for each of the two transcriptome models (up to 2 tests per gene). Genes with predictive
R2 > 0.01 were tested for association with CRC in our cohort
(colon transverse n = 4850 genes, and whole blood n = 8277
genes).
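Equation (2) is a weighted sum of allele dosages, which can be sketched in a few lines. The example below is illustrative only: the variant identifiers, weights and dosages are made up, the function name is hypothetical, and skipping variants that are absent from the dosage data is an assumption of this sketch rather than a documented behaviour of the PrediXcan software.

```python
def impute_grex(weights, dosages):
    """Genetically regulated expression of one gene for one individual (Eq. 2).

    weights: {variant_id: w_kg} for the gene, taken from the prediction model
    dosages: {variant_id: dosage}, allele dosage in [0, 2]
    Variants without a dosage are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


if __name__ == "__main__":
    # Hypothetical weights for a single gene and dosages for a single person.
    gene_weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 0.1}
    person_dosages = {"var_a": 2.0, "var_b": 1.0}  # var_c was not imputed
    print(impute_grex(gene_weights, person_dosages))  # 0.5*2.0 - 0.25*1.0 = 0.75
```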
Gene level tests of CRC association with imputed
genetically regulated gene expression
Discovery phase
Statistical analyses of all data were conducted centrally at
the GECCO coordinating center on individual-level data.
Multivariate logistic regression models were adjusted for
age, sex (when appropriate), center (when appropriate),
genotyping batch (ASTERISK), and the first four principal
components to account for potential population substructure.
Imputed genetically regulated gene expression (GReX) was
treated as a continuous variable. All studies were analyzed
together in a pooled dataset using logistic regression models
to obtain odds ratios (ORs) and 95% confidence intervals
(CIs). Quantile–quantile (Q–Q) plots were assessed to determine whether the distribution of P values was consistent with
the null distribution (except for the extreme tail). All analyses were conducted using R (Version 3.0.1).
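For reference, an odds ratio and its 95% confidence interval follow from a fitted log-odds coefficient and its standard error by exponentiation. The sketch below assumes a Wald interval, which is the usual default for logistic regression output but is not stated in the text; the input values are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error.

    Uses a Wald interval, exp(beta +/- z * se); z = 1.959964 gives a 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient per unit of imputed expression.
    odds_ratio, lower, upper = odds_ratio_with_ci(beta=-0.10, se=0.03)
    print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```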
Novelty of a gene finding was determined by taking all
variant predictors of the gene and determining if they were
in linkage disequilibrium (LD r2 ≥ 0.2 in Phase 3 Thousand
Genomes Europeans) with a previously reported GWAS
index variant.
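$$\operatorname{logit}(p_{\mathrm{CRC}}) = \beta_0 + \beta_1\,\mathrm{GReX} + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{sex} + \beta_4\,\mathrm{center} + \beta_5\,\mathrm{batch} + \beta_6\,\mathrm{PC}_1 + \beta_7\,\mathrm{PC}_2 + \beta_8\,\mathrm{PC}_3 + \beta_9\,\mathrm{PC}_4. \qquad (3)$$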
We identified suggestive findings in the discovery stage to
be replicated in a second independent dataset. In the discovery stage we employed a false-discovery rate (FDR) threshold of 0.2 separately for colon transverse and whole blood
models. The FDR for each gene was calculated using the R function p.adjust, which uses the method of Benjamini
and Hochberg to control the expected proportion of false
discoveries among the rejected hypotheses (Hochberg and
Benjamini 1990). Genes meeting this threshold were carried
forward for replication.
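The Benjamini and Hochberg adjustment applied within each tissue model can be sketched as follows. This is a plain Python re-implementation of the step-up adjustment that R's p.adjust performs with method "BH", written for illustration rather than taken from the analysis code; the P values in the example are made up.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Visit P values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this P value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    gene_p = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical gene-level P values
    fdr = bh_adjust(gene_p)
    # Genes with adjusted values at or below 0.2 would be carried forward.
    carried_forward = [i for i, q in enumerate(fdr) if q <= 0.2]
    print(fdr, carried_forward)
```

In the analysis described above the adjustment is applied separately to the colon transverse and whole blood results, so each tissue would be passed through the function on its own.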
Replication phase
To replicate novel PrediXcan findings (n = 3 genes from
colon transverse models) that had an FDR ≤ 0.2, we used the
same GTEx colon transverse elastic net prediction models
(as we had done in the discovery GECCO-CCFR data) to
impute genetically regulated gene expression in replication samples from (1) CORECT (pooled across consortium
studies), (2) UK Biobank and (3) a pooled dataset of 5
independent GWAS datasets. Multivariate logistic regression was used to test the association of imputed genetically
regulated gene expression with colorectal cancer risk in
these three datasets, and effects were then meta-analyzed using
inverse variance weighting of Z scores (details provided in
Online Resource 1). A two-sided P value less than 0.05/
(number of genes to be replicated) was considered statistically significant.
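As a rough illustration of the replication step, the sketch below combines per-dataset estimates with a standard fixed-effect inverse-variance meta-analysis and compares the result with the Bonferroni-style threshold described above. The exact weighting scheme used in the study is given in Online Resource 1 and may differ; the function names and the effect estimates are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-dataset log-odds estimates.

    Returns (pooled beta, pooled SE, Z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


def replication_threshold(n_genes, alpha=0.05):
    """Significance threshold of alpha divided by the number of genes replicated."""
    return alpha / n_genes


if __name__ == "__main__":
    # Hypothetical estimates for one gene in the three replication datasets.
    beta, se, z, p = inverse_variance_meta([-0.08, -0.12, -0.10], [0.04, 0.05, 0.06])
    print(beta, se, z, p, p < replication_threshold(3))
```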
Definition of CRC risk regions and refinement
of GWAS loci
The 56 previously reported CRC risk variants used in this
analysis had an LD r2 ≤ 0.2 with other risk variants in our
known list, or were otherwise previously reported to maintain statistical significance in regression models conditioning on other nearby risk variants (referred to hereafter as
‘independent’ risk variants). For each of the 56 independent
risk variants defined in Table S3, we further defined ‘risk
regions’ as 1 megabase (Mb) upstream and 1 Mb downstream of each risk variant (2 Mb regions surrounding each
risk variant). Overlapping 2 Mb risk regions were then
combined into a single new risk region defined as the minimum and maximum chromosomal coordinates from one or
more overlapping risk regions (the union of the overlapping
regions). This resulted in a total of 44 regions harboring one
or more risk variants (maximum of four independent risk
variants). A list of transcription start sites (TSS) for genes
that showed nominal association (P ≤ 0.05) between genetically regulated gene expression and CRC risk in colon transverse and whole blood models was then intersected with the
list of 44 risk regions to identify a list of putative target genes
regulated by non-coding GWAS risk variants.
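The region definition above is an interval-union procedure followed by an intersection with gene transcription start sites. The following is a minimal sketch under stated assumptions: function names, chromosome labels, coordinates and gene names are all hypothetical, and windows that touch are treated as overlapping.

```python
def build_risk_regions(variants, flank=1_000_000):
    """Merge +/- 1 Mb windows around risk variants into non-overlapping regions.

    variants: iterable of (chromosome, position)
    Returns a list of [chromosome, start, end], one entry per merged region.
    """
    windows = sorted((chrom, max(0, pos - flank), pos + flank) for chrom, pos in variants)
    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it (union).
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def genes_in_regions(tss_by_gene, regions):
    """Return genes whose transcription start site falls inside any risk region."""
    return [
        gene
        for gene, (chrom, tss) in tss_by_gene.items()
        if any(c == chrom and s <= tss <= e for c, s, e in regions)
    ]


if __name__ == "__main__":
    # Hypothetical coordinates: two nearby variants on one chromosome, one elsewhere.
    regions = build_risk_regions([("18", 46_450_000), ("18", 47_000_000), ("20", 60_900_000)])
    tss = {"GENE_A": ("18", 47_900_000), "GENE_B": ("18", 50_000_000), "GENE_C": ("20", 60_000_000)}
    print(regions)
    print(genes_in_regions(tss, regions))
```

In the study, the gene list passed to the intersection step would be restricted to genes with nominal association (P ≤ 0.05) in the colon transverse or whole blood models.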
Bioinformatic follow‑up
Bioinformatic follow-up was performed for the TRIM4 and
PYGL loci using the UCSC Genome Browser and publicly available functional data for CRC relevant tissues
and cell-types from Roadmap, ENCODE, as well as previously published epigenomes (Akhtar-Zaidi et al. 2012). The
TRIM4 and PYGL loci were defined as the genomic region
containing all variants in LD (r2 ≥ 0.2 from Phase 3 Thousand Genomes Project) with the variant predictor having the
strongest marginal CRC association (TRIM4-rs2527886 and
PYGL-rs12589665). We then aligned the locus with RefSeq
protein coding genes, epigenetic signals in normal crypts
and CRC cell lines to identify recurrently gained and lost
variant enhancer loci (VEL), and ChIP-seq transcription factor binding sites.
URLs
PrediXcan software, https://github.com/hakyimlab/PrediXcan; University of Michigan Imputation-Server, https://imputationserver.sph.umich.edu/start.html; GTEx Portal,
http://www.gtexportal.org/; PredictDB, http://predictdb.org/.
Acknowledgements ASTERISK: We are very grateful to Dr. Bruno
Buecher without whom this project would not have existed. We also
thank all those who agreed to participate in this study, including the
patients and the healthy control persons, as well as all the physicians,
technicians and students. CORECT: The content of this manuscript
does not necessarily reflect the views or policies of the National Cancer
Institute or any of the collaborating centers in the CORECT Consortium, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the CORECT
Consortium. We thank Alina Hoehn for her valuable contributions to
table/figure generation and organization of this manuscript. We are
incredibly grateful for the contributions of Dr. Brian Henderson and
Dr. Roger Green over the course of this study and acknowledge them in
memoriam. We are also grateful for support from Daniel and Maryann
Fong. ColoCare: We thank the many investigators and staff who made
this research possible in ColoCare Seattle and ColoCare Heidelberg.
ColoCare was initiated and developed at the Fred Hutchinson Cancer
Research Center by Drs. Ulrich and Grady. COLON and NQplus: the
authors would like to thank the COLON and NQplus investigators at
Wageningen University & Research and the involved clinicians in the
participating hospitals. CCFR: The Colon CFR graciously thanks the
generous contributions of their study participants, dedication of study
staff, and financial support from the U.S. National Cancer Institute,
without which this important registry would not exist. The content
of this manuscript does not necessarily reflect the views or policies
of the National Cancer Institute or any of the collaborating centers
in the Colon Cancer Family Registry (CCFR), nor does mention of
trade names, commercial products, or organizations imply endorsement
by the US Government or the CCFR. CPS-II: The authors thank the
CPS-II participants and Study Management Group for their invaluable contributions to this research. The authors would also like to
acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention
National Program of Cancer Registries, and cancer registries supported
by the National Cancer Institute Surveillance Epidemiology and End
Results program. DACHS: We thank all participants and cooperating clinicians, and Ute Handte-Daub, Utz Benscheid, Muhabbet Celik
and Ursula Eilber for excellent technical assistance. Galeon: GALEON
wishes to thank the Department of Surgery of University Hospital of
Santiago (CHUS), Sara Miranda Ponte, Carmen M Redondo, and the
staff of the Department of Pathology and Biobank of CHUS, Instituto de Investigación Sanitaria de Santiago (IDIS), Instituto de Investigación Sanitaria Galicia Sur (IISGS), SERGAS, Vigo, Spain, and
Programa Grupos Emergentes, Cancer Genetics Unit, CHUVI Vigo
Hospital, Instituto de Salud Carlos III, Spain. EPIC: We thank all participants and health care personnel in the Västerbotten Intervention
Programme, as well as the Department of biobank research, Umeå University, and Biobanken norr, Västerbotten County Council. GECCO:
The authors would like to thank all those at the GECCO Coordinating Center for helping bring together the data and people that made
this project possible. The authors also acknowledge Deanna Stelling,
Mark Thornquist, Greg Warnick, Carolyn Hutter, and team members at
COMPASS (Comprehensive Center for the Advancement of Scientific
Strategies) at the Fred Hutchinson Cancer Research Center for their
work harmonizing the GECCO epidemiological data set. The authors
acknowledge Dave Duggan and team members at TGEN (Translational
Genomics Research Institute), the Broad Institute, and the Génome
Québec Innovation Center for genotyping DNA samples of cases and
controls, and for scientific input for GECCO. HPFS, NHS and PHS:
We would like to acknowledge Patrice Soule and Hardeep Ranu of the
Dana Farber Harvard Cancer Center High-Throughput Polymorphism
Core who assisted in the genotyping for NHS, HPFS, and PHS under
the supervision of Dr. Immaculata Devivo and Dr. David Hunter, Qin
(Carolyn) Guo and Lixue Zhu who assisted in programming for NHS
and HPFS, and Haiyan Zhang who assisted in programming for the
PHS. We would like to thank the participants and staff of the Nurses’
Health Study and the Health Professionals Follow-Up Study, for their
valuable contributions as well as the following state cancer registries
for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN,
IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK,
OR, PA, RI, SC, TN, TX, VA, WA, WY. The authors assume full
responsibility for analyses and interpretation of these data. MCCS:
This study was made possible by the contribution of many people,
including the original investigators and the diligent team who recruited
participants and continue to work on follow-up. We would also like
to express our gratitude to the many thousands of Melbourne residents who took part in the study and provided blood samples. PLCO:
The authors thank Drs. Christine Berg and Philip Prorok, Division of
Cancer Prevention, National Cancer Institute, the Screening Center
investigators and staff of the Prostate, Lung, Colorectal, and Ovarian
(PLCO) Cancer Screening Trial, Mr. Tom Riley and staff, Information
Management Services, Inc., Ms. Barbara O’Brien and staff, Westat,
Inc., and Drs. Bill Kopp and staff, SAIC-Frederick. Most importantly,
we acknowledge the study participants for their contributions to making
this study possible. The statements contained herein are solely those of
the authors and do not represent or imply concurrence or endorsement
by NCI. PMH: The authors would like to thank the study participants
and staff of the Hormones and Colon Cancer study. SEARCH: We
acknowledge the contributions of Mitul Shah, Val Rhenius, Sue Irvine,
Craig Luccarini, Patricia Harrington, Don Conroy, Rebecca Mayes,
and Caroline Baynes. The Swedish low-risk colorectal cancer study:
we thank Berith Wejderot and the Swedish low-risk colorectal cancer
study group. UK Biobank: This research has been conducted using the
UK Biobank Resource under Application Number 8614. WHI: The
authors thank the WHI investigators and staff for their dedication, and
the study participants for making the program possible. A full listing
of WHI investigators can be found at: http://www.whi.org/researchers/Documents%20%20Write%20a%20Paper/WHI%20Investigator%20Short%20List.pdf.
Funding GECCO: This work was supported by the National Cancer
Institute; National Institutes of Health; and the United States Department of Health and Human Services (U01 CA137088, R01 CA059045,
U01 CA164930, R01 CA201407, R01 CA206279). Genotyping/
Sequencing services were provided by the Center for Inherited Disease
Research and is supported by a federal contract from the National Institutes of Health to The Johns Hopkins University
(HHSN268201200008I). ASTERISK: a Hospital Clinical Research
Program (PHRC-BRD09/C) from the University Hospital Center of
Nantes and supported by the Regional Council of Pays de la Loire, the
Groupement des Entreprises Françaises dans la Lutte contre le Cancer;
the Association Anne de Bretagne Génétique; and the Ligue Régionale
Contre le Cancer. ATBC: The ATBC Study is supported by the Intramural Research Program of the United States National Cancer Institute;
National Institutes of Health; and by United States Public Health Service (HHSN261201500005C) from the National Cancer Institute,
Department of Health and Human Services. COLO2&3: National Institutes of Health (R01 CA60987). CCFR: Illumina GWAS was supported
by funding from the National Cancer Institute; and the National Institutes of Health (U01 CA122839, R01 CA143247). The Colon CFR/
CORECT Affymetrix Axiom GWAS and OncoArray GWAS were supported by funding from the National Cancer Institute; and National
Institutes of Health (U19 CA148107 to S.B.G.). The Colon CFR participant recruitment and collection of data and biospecimens used in
this study were supported by the National Cancer Institute; and
National Institutes of Health (UM1 CA167551) and through cooperative agreements between multiple Colon CFR centers: (U01 CA074778,
U01/U24 CA097735 to Australasian Colorectal Cancer Family Registry, U01/U24 CA074799 to USC Consortium Colorectal Cancer Family
Registry, U01/U24 CA074800 to Mayo Clinic Cooperative Family
Registry for Colon Cancer Studies, U01/U24 CA074783 to Ontario
Familial Colorectal Cancer Registry, U01/U24 CA074794 to Seattle
Colorectal Cancer Family Registry, U01/U24 CA074806 to University
of Hawaii Colorectal Cancer Family Registry). Additional support for
case ascertainment was provided from the Surveillance, Epidemiology
and End Results Program of the National Cancer Institute
(N01-CN-67009, N01-PC-35142, HHSN2612013000121 to Fred
Hutchinson Cancer Research Center), the Hawaii Department of Health
(N01-PC-67001 and N01-PC-35137, HHSN26120100037C), and the
California Department of Public Health (HHSN261201000035C to the
University of Southern California), and the following state cancer registries: AZ, CO, MN, NC, NH, and by the Victoria Cancer Registry and
Ontario Cancer Registry. CORECT: The CORECT Study was supported by the National Cancer Institute; National Institutes of Health;
and the United States Department of Health and Human Services (U19
CA148107, R01 CA81488 to S.B.G., P30 CA014089, R01 CA197350
to S.B.G., P01 CA196569, R01 CA201407); and National Institutes of
Environmental Health Sciences, National Institutes of Health (T32
ES013678). CPSII: The Cancer Prevention Study-II Nutrition Cohort
is supported by the American Cancer Society. COLON: The COLON
study is sponsored by Wereld Kanker Onderzoek Fonds, including
funds from grant 2014/1179 as part of the World Cancer Research Fund
International Regular Grant Programme, by Alpe d’Huzes and the
Dutch Cancer Society (UM 2012-5653, UW 2013-5927, UW 2015-7946), and by TRANSCAN (JTC2012-MetaboCCC, JTC2013-FOCUS). ColoCare: This work was supported by the National Institutes
of Health (R01 CA189184, U01 CA206110, 2P30CA015704-40 to
Gilliland); the Matthias Lackas-Foundation; the German Consortium
for Translational Cancer Research; and the EU TRANSCAN initiative.
DACHS: This work is supported by the German Research Council
(Deutsche Forschungsgemeinschaft, BR 1704/6-1, BR 1704/6-3, BR
1704/6-4 and CH 117/1-1); and the German Federal Ministry of Education and Research (01KH0404 and 01ER0814). DALS: This work is
supported by the National Institutes of Health (R01 CA48998 to M. L.
S.). EPIC: The coordination of EPIC is financially supported by the
European Commission (DG-SANCO) and the International Agency for
Research on Cancer. The national cohorts are supported by Danish
Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave
Roussy, Mutuelle Générale de l’Education Nationale, Institut National
de la Santé et de la Recherche Médicale (INSERM) (France); German
Cancer Aid, German Cancer Research Center (DKFZ), Federal
Ministry of Education and Research (BMBF), Deutsche Krebshilfe,
Deutsches Krebsforschungszentrum and Federal Ministry of Education
and Research (Germany); the Hellenic Health Foundation (Greece);
Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and
National Research Council (Italy); Dutch Ministry of Public Health,
Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK
Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics
Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence programme on Food, Nutrition and
Health (Norway); Health Research Fund (FIS), PI13/00061 to Granada,
PI13/01162 to EPIC-Murcia, Regional Governments of Andalucía,
Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC
(RD06/0020) (Spain); Swedish Cancer Society, Swedish Research
Council and County Councils of Skåne and Västerbotten (Sweden);
Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and
C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143
to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (United Kingdom).
ESTHER/VERDI: This work was supported by grants from the Baden-Württemberg Ministry of Science, Research and Arts, and the German
Cancer Aid. HPFS: This work is supported by the National Institutes
of Health (P01 CA055075, UM1 CA167552, R01 CA137178, R01
CA151993, R35 CA197735, K07 CA190673, P50 CA127003); NHS
is supported by the National Institutes of Health (R01 CA137178, P01
CA087969, UM1 CA186107, R01 CA151993, R35 CA197735, K07
CA190673, and P50 CA127003); and PHS by the National Institutes
of Health (R01 CA042182). MEC: This work is supported by the
National Institutes of Health (R37 CA54281, P01 CA033619, R01
CA063464). MCCS: Cohort recruitment was supported by VicHealth
and Cancer Council Victoria. GALEON: FIS Intrasalud (PI13/01136).
The MCCS was further supported by Australian NHMRC grants
(509348, 209057, 251553, 504711), and by infrastructure provided by
the Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry and the Australian Institute of Health and Welfare, including the National Death Index and the
Australian Cancer Database. MSKCC: The work at Sloan Kettering in
New York was supported by the Robert and Kate Niehaus Center for
Inherited Cancer Genomics and the Romeo Milio Foundation. Moffitt:
This work was supported by the National Institutes of Health (R01
CA189184, P30 CA076292); Florida Department of Health Bankhead-Coley Grant (09BN-13); and the University of South Florida Oehler
Foundation. Moffitt contributions were supported in part by the Total
Cancer Care Initiative; Collaborative Data Services Core; and Tissue
Core at the H. Lee Moffitt Cancer Center & Research Institute, a
National Cancer Institute-designated Comprehensive Cancer Center
(P30 CA076292). NQplus: The NQplus study is sponsored by a
ZonMW investment grant (98-10030); by the PREVention of diabetes through lifestyle intervention and population studies in Europe and around the World (PREVIEW) project, which
received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant no. 312057; by funds from TI
Food and Nutrition (cardiovascular health theme), a public–private
partnership on pre-competitive research in food and nutrition; and by
FOODBALL, the Food Biomarker Alliance, a project from JPI Healthy
Diet for a Healthy Life. OFCCR: As a subset of ARCTIC, OFCCR is
supported by a GL2 grant from the Ontario Research Fund; the Canadian Institutes of Health Research; and the Cancer Risk Evaluation
Program grant from the Canadian Cancer Society Research Institute.
This work is supported by the Ontario Institute for Cancer Research,
through generous support from the Ontario Ministry of Research and
Innovation (Senior Investigator Awards to T.J.H. and B.W.Z.). PLCO:
This work is supported by the Intramural Research Program of the
Division of Cancer Epidemiology and Genetics and the Division of
Cancer Prevention, National Cancer Institute DHHS. Additionally, a
subset of control samples were genotyped as part of the Cancer Genetic
Markers of Susceptibility (CGEMS) Prostate Cancer GWAS (Yeager,
M et al. Genome-wide association study of prostate cancer identifies a
second risk locus at 8q24. Nat Genet 2007 May;39[5]:645–9), CGEMS
pancreatic cancer scan [PanScan] (Amundadottir, L et al. Genome-wide
association study identifies variants in the ABO locus associated with
susceptibility to pancreatic cancer. Nat Genet. 2009 Sep;41[9]:986–90,
and Petersen, GM et al. A genome-wide association study identifies
pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1
and 5p15.33. Nat Genet. 2010 Mar;42[3]:224–8), and the Lung Cancer
and Smoking study (Landi MT, et al. A genome-wide association study
of lung cancer identifies a region of chromosome 5p15 associated with
risk for adenocarcinoma. Am J Hum Genet. 2009 Nov;85[5]:679–91).
The prostate and PanScan study datasets were accessed with appropriate approval through the dbGaP online resource [http://cgems.cancer.gov/data/]
accession numbers phs000207.v1.p1 and phs000206.v3.p2,
respectively, and the lung datasets were accessed from the dbGaP website (http://www.ncbi.nlm.nih.gov/gap) through accession number
phs000093.v2.p2. Funding for the Lung Cancer and Smoking study was
provided by National Institutes of Health, Genes, Environment and
Health Initiative (Z01 CP 010200, NIH U01 HG004446, and NIH GEI
U01 HG 004438). For the lung study, the GENEVA Coordinating
Center provided assistance with genotype cleaning and general study
coordination, and the Johns Hopkins University Center for Inherited
Disease Research conducted genotyping. SEARCH: Cancer Research
UK (C490/A16561). The Spanish study was supported by Instituto de
Salud Carlos III, co-funded by FEDER funds—a way to build Europe—
(PI14-613, PI09-1286); Catalan Government DURSI (2014SGR647);
and Junta de Castilla y León (LE22A10-2). Spain: Catalan Government
DURSI (2014SGR647); and Instituto de Salud Carlos III, co-funded
by FEDER funds―a way to build Europe (PI14-00613). The Swedish Low-risk Colorectal Cancer Study: The study was supported by the
Swedish Research Council (K2015-55X-22674-01-4, K2008-55X-20157-03-3, K2006-72X-20157-01-2); and the Stockholm County
Council (ALF project). VITAL: This work is supported by the National
Institutes of Health (K05 CA154337). WHI: The WHI program is supported by the National Heart, Lung, and Blood Institute; National Institutes of Health; United States Department of Health and Human Services through contracts (HHSN268201100046C,
HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, HHSN271201100004C). The Genotype-Tissue Expression (GTEx)
Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health. Additional funds were provided
by the National Cancer Institute, National Human Genome Research
Institute, National Heart, Lung, and Blood Institute, National Institute
on Drug Abuse, National Institute of Mental Health, and National Institute of Neurological Disorders and Stroke. Donors were enrolled at
Biospecimen Source Sites supported by the National Cancer Institute
and SAIC-Frederick, Inc. (SAIC-F) subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute
(10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data
Analysis, and Coordinating Center was supported by a contract
(HHSN268201000029C to The Broad Institute, Inc). Biorepository
operations were funded through an SAIC-F subcontract to Van Andel
Institute (10ST1035). Additional data repository and project management were provided by SAIC-F (HHSN261200800001E). The Brain
Bank was supported by a supplement to University of Miami
(DA006227, DA033684, N01MH000028). Statistical Methods development grants were made to the University of Geneva (MH090941 &
MH101814), the University of Chicago (MH090951, MH090937,
MH101820, MH101825), the University of North Carolina - Chapel
Hill (MH090936, MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University St Louis
(MH101810), and the University of Pennsylvania (MH101822). The
data used for the analyses described in this manuscript were obtained
from the GTEx Portal on 10/19/2016.
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of
interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate
credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
References
Afanas’ev I (2011) Reactive oxygen species signaling in cancer: comparison with aging. Aging Dis 2:219–230
Akhtar-Zaidi B, Cowper-Sal-lari R, Corradin O et al (2012) Epigenomic enhancer profiling defines a signature of colon cancer.
Science 336:736–739. https://doi.org/10.1126/science.1217277
Al-Tassan NA, Whiffin N, Hosking FJ et al (2015) A new GWAS and
meta-analysis with 1000 Genomes imputation identifies novel
risk variants for colorectal cancer. Sci Rep 5:10442. https://doi.org/10.1038/srep10442
Battle A, Mostafavi S, Zhu X et al (2014) Characterizing the genetic
basis of transcriptome diversity through RNA-sequencing of
922 individuals. Genome Res 24:14–24. https://doi.org/10.1101/gr.155192.113
Biancolella M, Fortini BK, Tring S et al (2014) Identification and characterization of functional risk variants for colorectal cancer mapping to chromosome 11q23.1. Hum Mol Genet 23:2198–2209.
https://doi.org/10.1093/hmg/ddt584
Bien SA, Auer PL, Harrison TA et al (2017) Enrichment of colorectal
cancer associations in functional regions: insight for using epigenomics data in the analysis of whole genome sequence-imputed
GWAS data. PLoS One 12:e0186518. https://doi.org/10.1371/journal.pone.0186518
Bird CL, Witte JS, Swendseid ME et al (1996) Plasma ferritin,
iron intake, and the risk of colorectal polyps. Am J Epidemiol
144:34–41
Broderick P, Carvajal-Carmona L, Pittman AM et al (2007) A genome-wide association study shows that common alleles of SMAD7
influence colorectal cancer risk. Nat Genet 39:1315–1317. https://doi.org/10.1038/ng.2007.18
Chen L, Chen D-T, Kurtyka C et al (2012) Tripartite motif containing
28 (Trim28) can regulate cell proliferation by bridging HDAC1/E2F
interactions. J Biol Chem 287:40106–40118. https://doi.org/10.1074/jbc.M112.380865
Cui R, Okada Y, Jang SG, Ku JL, Park JG, Kamatani Y, Hosono N,
Tsunoda T, Kumar V, Tanikawa C, Kamatani N, Yamada R, Kubo
M, Nakamura Y, Matsuda K (2011) Common variant in 6q26-q27
is associated with distal colon cancer in an Asian population. Gut
60(6):799–805
Czene K, Lichtenstein P, Hemminki K (2002) Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. Int J Cancer 99:260–266. https://doi.org/10.1002/ijc.10332
Dabkeviciene D, Jonusiene V, Zitkute V et al (2015) The role of interleukin-8 (CXCL8) and CXCR2 in acquired chemoresistance of
human colorectal carcinoma cells HCT116. Med Oncol 32:258.
https://doi.org/10.1007/s12032-015-0703-y
Das S, Forer L, Schönherr S et al (2016) Next-generation genotype
imputation service and methods. Nat Genet 48:1284–1287. https://doi.org/10.1038/ng.3656
Delaneau O, Howie B, Cox AJ et al (2013) Haplotype estimation
using sequencing reads. Am J Hum Genet 93:687–696. https://doi.org/10.1016/j.ajhg.2013.09.002
Dunlop MG, Dobbins SE, Farrington SM et al (2012) Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet 44:770–776. https://doi.org/10.1038/ng.2293
Eames HL, Saliba DG, Krausgruber T et al (2012) KAP1/TRIM28:
an inhibitor of IRF5 function in inflammatory macrophages.
Immunobiology 217:1315–1324. https://doi.org/10.1016/j.imbio.2012.07.026
Favaro E, Bensaad K, Chong MG et al (2012) Glucose utilization
via glycogen phosphorylase sustains proliferation and prevents
premature senescence in cancer cells. Cell Metab 16:751–764.
https://doi.org/10.1016/j.cmet.2012.10.017
Fortini BK, Tring S, Plummer SJ, Edlund CK, Moreno V, Bresalier RS,
Barry EL, Church TR, Figueiredo JC, Casey G (2014) Multiple
functional risk variants in a SMAD7 enhancer implicate a colorectal cancer risk haplotype. PLoS One. https://doi.org/10.1371/journal.pone.0111914
Gamazon ER, Wheeler HE, Shah KP et al (2015) A gene-based association method for mapping traits using reference transcriptome
data. Nat Genet 47:1091–1098. https://doi.org/10.1038/ng.3367
Glebov OK, Rodriguez LM, Soballe P et al (2006) Gene expression
patterns distinguish colonoscopically isolated human aberrant
crypt foci from normal colonic mucosa. Cancer Epidemiol Biomark Prev 15:2253–2262. https://doi.org/10.1158/1055-9965.EPI-05-0694
GTEx Consortium (2013) The genotype-tissue expression (GTEx)
project. Nat Genet 45:580–585. https://doi.org/10.1038/ng.2653
GTEx Consortium (2015) Human genomics. The genotype-tissue
expression (GTEx) pilot analysis: multitissue gene regulation
in humans. Science 348:648–660. https://doi.org/10.1126/science.1262110
GTEx Consortium, Laboratory, Data Analysis & Coordinating
Center (LDACC), Analysis Working Group, Statistical Methods
Groups, Analysis Working Group et al (2017) Genetic effects
on gene expression across human tissues. Nature 550:204–213.
https://doi.org/10.1038/nature24277
Hatakeyama S (2011) TRIM proteins and cancer. Nat Rev Cancer
11:792–804. https://doi.org/10.1038/nrc3139
Hochberg Y, Benjamini Y (1990) More powerful procedures for
multiple significance testing. Stat Med 9:811–818. https://doi.org/10.1002/sim.4780090710
Holmdahl R, Sareila O, Pizzolla A et al (2013) Hydrogen peroxide
as an immunological transmitter regulating autoreactive T cells.
Antioxid Redox Signal 18:1463–1474. https://doi.org/10.1089/ars.2012.4734
Houlston RS, Cheadle J, Dobbins SE et al (2010) Meta-analysis of
three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and
20q13.33. Nat Genet 42:973–977. https://doi.org/10.1038/ng.670
Howie B, Fuchsberger C, Stephens M et al (2012) Fast and accurate
genotype imputation in genome-wide association studies through
pre-phasing. Nat Genet 44:955–959. https://doi.org/10.1038/ng.2354
Hulur I, Gamazon ER, Skol AD et al (2015) Enrichment of inflammatory bowel disease and colorectal cancer risk variants in colon
expression quantitative trait loci. BMC Genom 16:138. https://doi.org/10.1186/s12864-015-1292-z
Jia W-H, Zhang B, Matsuo K et al (2013) Genome-wide association
analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat Genet 45:191–196. https://doi.org/10.1038/ng.2505
Jiao S, Peters U, Berndt S et al (2014) Estimating the heritability of
colorectal cancer. Hum Mol Genet 23:3898–3905. https://doi.org/10.1093/hmg/ddu087
Kinnersley B, Migliorini G, Broderick P, Whiffin N, Dobbins SE,
Casey G, Hopper J, Sieber O, Lipton L, Kerr DJ, Dunlop MG,
Tomlinson IP, Houlston RS, Colon Cancer Family Registry (2012)
The TERT variant rs2736100 is associated with colorectal cancer
risk. Br J Cancer 107:1001–1008
Kumar B, Koul S, Khandrika L et al (2008) Oxidative stress is inherent
in prostate cancer cells and is required for aggressive phenotype.
Cancer Res 68:1777–1785. https://doi.org/10.1158/0008-5472.CAN-07-5259
La Vecchia C, Decarli A, Serafini M et al (2013) Dietary total antioxidant capacity and colorectal cancer: a large case-control study
in Italy. Int J Cancer 133:1447–1451. https://doi.org/10.1002/ijc.28133
Lewis A, Freeman-Mills L, de la Calle-Mustienes E et al (2014) A
polymorphic enhancer near GREM1 influences bowel cancer
risk through differential CDX2 and TCF7L2 binding. Cell Rep
8:983–990. https://doi.org/10.1016/j.celrep.2014.07.020
Lichtenstein P, Holm NV, Verkasalo PK et al (2000) Environmental and
heritable factors in the causation of cancer-analyses of cohorts of
twins from Sweden, Denmark, and Finland. N Engl J Med 343:78–85. https://doi.org/10.1056/NEJM200007133430201
Lin S, Li Y, Zamyatnin AA et al (2017) Reactive oxygen species and colorectal cancer. J Cell Physiol 233:5119–5132. https://doi.org/10.1002/jcp.26356
López-Lázaro M (2007) Dual role of hydrogen peroxide in cancer: possible relevance to cancer chemoprevention and therapy. Cancer Lett
252:1–8. https://doi.org/10.1016/j.canlet.2006.10.029
Niell BL, Long JC, Rennert G, Gruber SB (2003) Genetic anthropology
of the colorectal cancer-susceptibility allele APC I1307K: evidence
of genetic drift within the Ashkenazim. Am J Hum Genet 73:1250–1260. https://doi.org/10.1086/379926
Noguchi K, Okumura F, Takahashi N et al (2011) TRIM40 promotes neddylation of IKKγ and is downregulated in gastrointestinal cancers.
Carcinogenesis 32:995–1004. https://doi.org/10.1093/carcin/bgr068
Orlando G, Law PJ, Palin K et al (2016) Variation at 2q35 (PNKD and
TMBIM1) influences colorectal cancer risk and identifies a pleiotropic effect with inflammatory bowel disease. Hum Mol Genet
25:2349–2359. https://doi.org/10.1093/hmg/ddw087
Peltekova VD, Lemire M, Qazi AM et al (2014) Identification of genes
expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants. Int J Cancer 134:2330–2341. https://doi.org/10.1002/ijc.28557
Peters U, Jiao S, Schumacher FR et al (2013) Identification of genetic
susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 144:799–807.e24. https://doi.org/10.1053/j.gastro.2012.12.020
Peters U, Bien S, Zubair N (2015) Genetic architecture of colorectal cancer. Gut 64:1623–1636. https://doi.org/10.1136/gutjnl-2013-306705
Pittman AM, Naranjo S, Jalava SE et al (2010) Allelic variation at the
8q23.3 colorectal cancer risk locus functions as a cis-acting regulator of EIF3H. PLoS Genet 6:e1001126. https://doi.org/10.1371/journal.pgen.1001126
Pomerantz MM, Ahmadiyeh N, Jia L et al (2009) The 8q24 cancer risk
variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet 41:882–884. https://doi.org/10.1038/ng.403
Praly J-P, Vidal S (2010) Inhibition of glycogen phosphorylase in the
context of type 2 diabetes, with focus on recent inhibitors bound at
the active site. Mini Rev Med Chem 10:1102–1126
Sato T, Okumura F, Ariga T, Hatakeyama S (2012) TRIM6 interacts
with Myc and maintains the pluripotency of mouse embryonic stem
cells. J Cell Sci 125:1544–1555. https://doi.org/10.1242/jcs.095273
Schmit SL, Schumacher FR, Edlund CK et al (2016) Genome-wide
association study of colorectal cancer in Hispanics. Carcinogenesis
37:547–556. https://doi.org/10.1093/carcin/bgw046
Schumacher FR, Schmit SL, Jiao S et al (2015) Corrigendum: genome-wide association study of colorectal cancer identifies six new
susceptibility loci. Nat Commun 6:8739. https://doi.org/10.1038/ncomms9739
Shin Y, Kim I-J, Kang HC et al (2004) A functional polymorphism (− 347
G → GA) in the E-cadherin gene is associated with colorectal cancer. Carcinogenesis 25:2173–2176. https://doi.org/10.1093/carcin/bgh223
Stevens RG, Jones DY, Micozzi MS, Taylor PR (1988) Body iron stores
and the risk of cancer. N Engl J Med 319:1047–1052. https://doi.org/10.1056/NEJM198810203191603
Study COGENT, Houlston RS, Webb E et al (2008) Meta-analysis of
genome-wide association data identifies four new susceptibility
loci for colorectal cancer. Nat Genet 40:1426–1435. https://doi.org/10.1038/ng.262
Su Y-R, Di C, Bien SA et al (2018) A mixed-effects model for powerful
association tests in integrative functional genomics: an application
to a large-scale genome-wide association study of colorectal cancer. Am J Hum Genet 102(5):904–919. https://doi.org/10.1016/j.ajhg.2018.03.019
Tenesa A, Farrington SM, Prendergast JGD et al (2008) Genome-wide
association scan identifies a colorectal cancer susceptibility locus
on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet
40:631–637. https://doi.org/10.1038/ng.133
Tocchini C, Keusch JJ, Miller SB et al (2014) The TRIM-NHL protein
LIN-41 controls the onset of developmental plasticity in Caenorhabditis elegans. PLoS Genet 10:e1004533. https://doi.org/10.1371/journal.pgen.1004533
Tomar D, Prajapati P, Lavie J et al (2015) TRIM4; a novel mitochondrial
interacting RING E3 ligase, sensitizes the cells to hydrogen peroxide
(H2O2) induced cell death. Free Radic Biol Med 89:1036–1048.
https://doi.org/10.1016/j.freeradbiomed.2015.10.425
Tomlinson I, Webb E, Carvajal-Carmona L et al (2007) A genome-wide
association scan of tag SNPs identifies a susceptibility variant for
colorectal cancer at 8q24.21. Nat Genet 39:984–988. https://doi.org/10.1038/ng2085
Tomlinson IPM, Webb E, Carvajal-Carmona L et al (2008) A genome-wide association study identifies colorectal cancer susceptibility loci
on chromosomes 10p14 and 8q23.3. Nat Genet 40:623–630. https://doi.org/10.1038/ng.111
Torres JM, Gamazon ER, Parra EJ et al (2014) Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. Am J
Hum Genet 95:521–534. https://doi.org/10.1016/j.ajhg.2014.10.001
Vaquero EC, Edderkaoui M, Pandol SJ et al (2004) Reactive oxygen
species produced by NAD(P)H oxidase inhibit apoptosis in pancreatic cancer cells. J Biol Chem 279:34643–34654. https://doi.org/10.1074/jbc.M400078200
Versteeg GA, Benke S, García-Sastre A, Rajsbaum R (2014) InTRIMsic
immunity: positive and negative regulation of immune signaling by
tripartite motif proteins. Cytokine Growth Factor Rev 25:563–576.
https://doi.org/10.1016/j.cytogfr.2014.08.001
Wang H, Schmit SL, Haiman CA et al (2017) Novel colon cancer susceptibility variants identified from a genome-wide association study
in African Americans. Int J Cancer 140:2728–2733. https://doi.org/10.1002/ijc.30687
Whiffin N, Hosking FJ, Farrington SM et al (2014) Identification of susceptibility loci for colorectal cancer in a genome-wide meta-analysis.
Hum Mol Genet 23:4729–4737. https://doi.org/10.1093/hmg/ddu177
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for
genome-wide complex trait analysis. Am J Hum Genet 88:76–82.
https://doi.org/10.1016/j.ajhg.2010.11.011
Zaman MM-U, Nomura T, Takagi T et al (2013) Ubiquitination-deubiquitination by the TRIM27-USP7 complex regulates tumor necrosis
factor alpha-induced apoptosis. Mol Cell Biol 33:4971–4984. https://doi.org/10.1128/MCB.00465-13
Zeng C, Matsuda K, Jia W-H et al (2016) Identification of susceptibility loci and genes for colorectal cancer risk. Gastroenterology
150:1633–1645. https://doi.org/10.1053/j.gastro.2016.02.076
Zhan W, Han T, Zhang C et al (2015) TRIM59 promotes the proliferation and migration of non-small cell lung cancer cells by upregulating cell cycle related proteins. PLoS One 10:e0142596. https://doi.org/10.1371/journal.pone.0142596
Zhang B, Jia W-H, Matsuda K et al (2014) Large-scale genetic study in
East Asians identifies six new loci associated with colorectal cancer
risk. Nat Genet 46:533–542. https://doi.org/10.1038/ng.2985
Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington
SM, Prendergast J, Olschwang S, Chiang T, Crowdy E, Ferretti
V, Laflamme P, Sundararajan S, Roumy S, Olivier JF, Robidoux
F, Sladek R, Montpetit A, Campbell P, Bezieau S, O’Shea AM,
Zogopoulos G, Cotterchio M, Newcomb P, McLaughlin J, Younghusband B, Green R, Green J, Porteous ME, Campbell H, Blanche
H, Sahbatou M, Tubacher E, Bonaiti-Pellié C, Buecher B, Riboli
E, Kury S, Chanock SJ, Potter J, Thomas G, Gallinger S, Hudson
TJ, Dunlop MG (2007) Genome-wide association scan identifies
a colorectal cancer susceptibility locus on chromosome 8q24. Nat
Genet 39:989–994
Zhou Z, Ji Z, Wang Y et al (2014) TRIM59 is up-regulated in gastric
tumors, promoting ubiquitination and degradation of p53. Gastroenterology 147:1043–1054. https://doi.org/10.1053/j.gastro.2014.07.021
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Affiliations
Stephanie A. Bien1,62 · Yu‑Ru Su1,62 · David V. Conti2,3,62 · Tabitha A. Harrison1,62 · Conghui Qu1,62 · Xingyi Guo4,62 ·
Yingchang Lu4,62 · Demetrius Albanes5,62 · Paul L. Auer6,62 · Barbara L. Banbury1,62 · Sonja I. Berndt5,62 ·
Stéphane Bézieau7,8,62 · Hermann Brenner9,10,11,62 · Daniel D. Buchanan12,13,14,62 · Bette J. Caan15,62 ·
Peter T. Campbell16,62 · Christopher S. Carlson1,62 · Andrew T. Chan17,18,62 · Jenny Chang‑Claude19,20,62 ·
Sai Chen21,62 · Charles M. Connolly1,62 · Douglas F. Easton22,62 · Edith J. M. Feskens23,62 · Steven Gallinger24,62 ·
Graham G. Giles12,25,62 · Marc J. Gunter26,62 · Jochen Hampe27,62 · Jeroen R. Huyghe1,62 · Michael Hoffmeister9,62 ·
Thomas J. Hudson28,29,62 · Eric J. Jacobs16,62 · Mark A. Jenkins12,62 · Ellen Kampman23,62 · Hyun Min Kang21,62 ·
Tilman Kühn30,62 · Sébastien Küry7,8,62 · Flavio Lejbkowicz31,32,62 · Loic Le Marchand33,62 · Roger L. Milne12,25,62 ·
Li Li34,62 · Christopher I. Li1,62 · Annika Lindblom35,36,62 · Noralane M. Lindor37,62 · Vicente Martín38,39,62 ·
Caroline E. McNeil2,62 · Marilena Melas2,62 · Victor Moreno39,40,41,62 · Polly A. Newcomb1,62 · Kenneth Offit42,62 ·
Paul D. P. Pharaoh43,62 · John D. Potter1,62 · Chenxu Qu2,62 · Elio Riboli44,62 · Gad Rennert31,32,62 · Núria Sala45,46,62 ·
Clemens Schafmayer47,62 · Peter C. Scacheri48,62 · Stephanie L. Schmit49,50,62 · Gianluca Severi51,62 ·
Martha L. Slattery52,62 · Joshua D. Smith53,62 · Antonia Trichopoulou54,55,62 · Rosario Tumino56,62 ·
Cornelia M. Ulrich57,62 · Fränzel J. B. van Duijnhoven23,62 · Bethany Van Guelpen58,62 · Stephanie J. Weinstein5,62 ·
Emily White1,62 · Alicja Wolk59,60,62 · Michael O. Woods61,62 · Anna H. Wu2,3,62 · Goncalo R. Abecasis21,62 ·
Graham Casey51,62 · Deborah A. Nickerson53,62 · Stephen B. Gruber2,62 · Li Hsu1,62 · Wei Zheng4,62,63 · Ulrike Peters1,62
* Stephanie A. Bien
sbien@fredhutch.org
1 Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
2 USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089, USA
3 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
4 Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
5 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
6 Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53205, USA
7 Centre Hospitalier Universitaire Hotel-Dieu, 44093 Nantes, France
8 Service de Génétique Médicale, Centre Hospitalier Universitaire (CHU), 44093 Nantes, France
9 Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
10 Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
11 German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
12 Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC 3010, Australia
13 Colorectal Oncogenomics Group, Department of Pathology, University of Melbourne, Melbourne, VIC 3010, Australia
14 Genetic Medicine and Familial Cancer Centre, The Royal Melbourne Hospital, Parkville, VIC 3010, Australia
15 Division of Research, Kaiser Permanente Medical Care Program of Northern California, Oakland, CA 94612, USA
16 Epidemiology Research Program, American Cancer Society, Atlanta, GA 30329‑4251, USA
17 Division of Gastroenterology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
18 Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
19 Unit of Genetic Epidemiology, Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
20 Genetic Tumour Epidemiology Group, University Medical Center Hamburg-Eppendorf, University Cancer Center Hamburg, 20246 Hamburg, Germany
21 Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
22 Department of Public Health and Primary Care School of Clinical Medicine, University of Cambridge, Cambridge, England 01223, UK
23 Division of Human Nutrition, Wageningen University & Research, Wageningen, The Netherlands
24 Lunenfeld Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, ON 1X5, Canada
25 Cancer Epidemiology & Intelligence Division, Cancer Council Victoria, Melbourne 3004, Australia
26 Section for Epidemiology, Department of Public Health, Aarhus University, Aarhus, Denmark
27 Medical Department 1, University Hospital Dresden, TU Dresden, 01307 Dresden, Germany
28 Ontario Institute for Cancer Research, Toronto, ON, Canada
29 AbbVie Inc, 1500 Seaport Blvd, Redwood City, CA 94063, USA
30 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
31 Clalit Health Services National Israeli Cancer Control Center, 34361 Haifa, Israel
32 Department of Community Medicine and Epidemiology, Carmel Medical Center, 34361 Haifa, Israel
33 University of Hawai’i Cancer Center, Honolulu, Hawaii 96813, USA
34 Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
35 Department of Clinical Genetics, Karolinska University Hospital Solna, 171 77 Stockholm, Sweden
36 Department of Molecular Medicine and Surgery, Karolinska Institutet Solna, 171 77 Stockholm, Sweden
37 Department of Health Science Research, Mayo Clinic Arizona, Scottsdale, AZ 85259, USA
38 Biomedicine Institute (IBIOMED), University of León, León, Spain
39 CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
40 Catalan Institute of Oncology, Bellvitge Biomedical Research Institute (IDIBELL), 08028 Barcelona, Spain
41 University of Barcelona, 08007 Barcelona, Spain
42 Department of Medicine, Clinical Genetics Service, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
43 Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge CB2 1TN, UK
44 School of Public Health, Imperial College London, London, UK
45 Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology-IDIBELL, L’Hospitalet de Llobregat, 08908 Barcelona, Spain
46 Molecular Epidemiology Group, Translational Research Laboratory, Catalan Institute of Oncology-IDIBELL, L’Hospitalet de Llobregat, 08908 Barcelona, Spain
47 Department of General and Thoracic Surgery, University Hospital Schleswig-Holstein, Campus Kiel, 24118 Kiel, Germany
48 Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
49 Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Inc, Tampa, FL 33612, USA
50 Department of Gastrointestinal Oncology, H. Lee Moffitt Cancer Center and Research Institute, Inc, Tampa, FL 33612, USA
51 Centre for Research in Epidemiology and Population Health, Institut de Cancérologie Gustave Roussy, Villejuif, France
52 Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
53 Department of Genome Sciences, University of Washington, 98195 Seattle, WA, USA
54 Hellenic Health Foundation, 13 Kaisareias & Alexandroupoleos, 115 27 Athens, Greece
55 WHO Collaborating Center for Nutrition and Health, Unit of Nutritional Epidemiology and Nutrition in Public Health, Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75, 115 27 Athens, Greece
56 Affiliation Cancer Registry, Department of Prevention, Azienda Sanitaria Provinciale di Ragusa, Ragusa, Italy
57 Population Sciences, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA
58 Department of Medical Biosciences, Pathology, Umeå University, Umeå, Sweden
59 Institute of Environmental Medicine, Karolinska Institutet Solna, 17177 Stockholm, Sweden
60 Department of Surgical Sciences, Uppsala University, 75121 Uppsala, Sweden
61 Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, Saint John’s, NL A1B 3V6, Canada
62 Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA 22908, USA
63 Vanderbilt‑Ingram Cancer Center, Vanderbilt University, Nashville, TN 37232, USA