Short communications


In silico population analysis of the CLOCK rs3749474T-rs4864548A haplotype and its relationship with obesity

Análisis poblacional in silico del haplotipo CLOCK rs3749474T-rs4864548A y su relación con obesidad


It has been suggested that the rs3749474T/rs4864548A haplotype of the CLOCK gene increases the risk of obesity, but the population variability of these alleles and of the haplotype is unknown. This study aims to determine the linkage between the rs3749474T and rs4864548A alleles using the 1000Genomes database, to confirm the existence of the TA haplotype, and to estimate its frequency in five macro populations. Linkage disequilibrium and haplotype frequencies for 2504 individuals from 26 populations were analyzed using the r2 statistic and Fisher's exact test. We found a high frequency of the TA haplotype in Latin America (44.8%), high linkage disequilibrium between these alleles worldwide (r2 = 0.92), high differentiation between macro populations, and high homogeneity within them. The evidence warrants further studies on the association between this haplotype and the risk of obesity and overweight in Latin American populations.

Main messages

  • The rs3749474T/rs4864548A haplotype of the CLOCK gene has been suggested to increase the risk of obesity, but its population variability is unknown.
  • This study provides background on the frequency of the rs3749474T/rs4864548A haplotype of the CLOCK gene in 26 populations from five macro populations.
  • There is a high frequency of the TA haplotype in Latin America, high linkage disequilibrium, high differentiation between macro populations, and high homogeneity within macro populations.
  • This study is limited to in silico analysis of databases and warrants further primary studies to confirm the association of the TA haplotype with obesity.


The CLOCK gene regulates circadian cycles through its encoded protein. Its polymorphisms have been associated with obesity, hypercholesterolemia, and hyperglycemia, among other conditions [1]. The single-nucleotide polymorphisms (SNPs) rs3749474T and rs4864548A of the CLOCK gene have been independently linked to a higher body mass index, increasing the risk of overweight and obesity by up to 1.5-fold [2,3,4,5,6].

In the study by Pino-Astorga et al., a subject carrying both polymorphisms (rs3749474T and rs4864548A) was overweight and had a high waist circumference despite having normal eating habits [7]. This suggests an additive effect between the two risk alleles and the possible existence of a TA haplotype.

This preliminary study aims to determine the linkage between the rs3749474T and rs4864548A risk alleles using the 1000Genomes database, in order to confirm the existence of the TA haplotype of the rs3749474-rs4864548 polymorphisms of the CLOCK gene.



The genotypes of SNPs rs3749474 and rs4864548 of the CLOCK gene were obtained for 2504 individuals from phase 3 of the 1000Genomes project database [8]. The complete chromosome 4 data were downloaded, and the rs3749474 and rs4864548 genotypes were then extracted for each individual using VCFtools software. In that project, all genomes were sequenced at low coverage (4x); for data validation, 24 were sequenced at high coverage (50x). This study covers the five macro populations in 1000Genomes (Africa, East Asia, South Asia, Europe, and Latin America) together with the 26 populations within them (Table 1).

Information on the macro populations and populations analyzed in this study and the frequency of the TA haplotype in each of them.

The sample size required for a 5% margin of error at a 95% confidence level was n = 385. Latin America was the only macro population with a sample size below this value (347 participants); for this sample size, the margin of error is 5.15%.
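The source does not state which formula produced the required sample size. As a minimal sketch, assuming the common normal-approximation formula for a proportion (z = 1.96, worst-case p = 0.5, no finite population correction), the threshold of 385 can be reproduced as follows. The function name and defaults are illustrative assumptions, not the authors' procedure.

```python
import math


def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n whose margin of error for a proportion is at most `margin`.

    Assumes n = z^2 * p * (1 - p) / margin^2, rounded up (an assumption;
    the source does not name its formula).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# 5% margin of error at a 95% confidence level (z = 1.96)
print(sample_size(0.05))  # 385
```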

Statistical analysis

The current release of the 1000Genomes database contains phased ("in-phase") genotype information, so the allele distribution on each parental chromosome can be reconstructed. Using an algorithm written in R, we identified the alleles of both SNPs on the 5008 chromosomes analyzed and then estimated the frequency of the four haplotypes by population and macro population.
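The authors' R algorithm is not shown in the source. The following Python sketch illustrates one way phased genotypes can be turned into haplotype counts, assuming VCF-style phased strings such as "0|1" for each SNP. The allele coding (which base is index 0 or 1) and the toy individuals are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Assumed allele coding (hypothetical): "0" = first allele, "1" = second allele.
ALLELES_RS3749474 = {"0": "C", "1": "T"}
ALLELES_RS4864548 = {"0": "G", "1": "A"}


def count_haplotypes(genotypes):
    """Count two-SNP haplotypes from phased genotype pairs.

    Each item is (gt_rs3749474, gt_rs4864548), e.g. ("0|1", "0|1").
    The allele left of "|" at both SNPs lies on one chromosome and the
    allele right of "|" on the other, so each individual adds two haplotypes.
    """
    counts = Counter()
    for gt1, gt2 in genotypes:
        left1, right1 = gt1.split("|")
        left2, right2 = gt2.split("|")
        counts[ALLELES_RS3749474[left1] + ALLELES_RS4864548[left2]] += 1
        counts[ALLELES_RS3749474[right1] + ALLELES_RS4864548[right2]] += 1
    return counts


def haplotype_frequencies(counts):
    """Convert haplotype counts to relative frequencies."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Toy data for four individuals (not study data).
toy = [("0|1", "0|1"), ("1|1", "1|1"), ("0|0", "0|0"), ("1|0", "1|0")]
counts = count_haplotypes(toy)
freqs = haplotype_frequencies(counts)
print(counts["TA"], counts["CG"])  # 4 4
```

Grouping the input by population before calling `count_haplotypes` would give per-population frequencies in the same way.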

Considering the ancestral and derived alleles for both SNPs and their respective allele frequencies, the recombinant and non-recombinant haplotypes and their respective population frequencies were determined. Then, recombination frequency and linkage disequilibrium were estimated through the r2 statistic. Differences in haplotype frequencies were analyzed using Fisher’s exact test.
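As a sketch of the r2 statistic mentioned above, the usual definition for two biallelic loci is r2 = D^2 / (pT(1 - pT) pA(1 - pA)), where D = pTA - pT * pA. The function below assumes this standard definition; the source does not give the exact computation used, and the inputs shown are toy values rather than the study's frequencies.

```python
def r_squared(n_ta: float, n_tg: float, n_ca: float, n_cg: float) -> float:
    """r^2 linkage disequilibrium from four haplotype counts or frequencies.

    Inputs are normalized internally, so counts and frequencies both work.
    """
    total = n_ta + n_tg + n_ca + n_cg
    p_ta, p_tg, p_ca = n_ta / total, n_tg / total, n_ca / total
    p_t = p_ta + p_tg  # frequency of T at rs3749474
    p_a = p_ta + p_ca  # frequency of A at rs4864548
    d = p_ta - p_t * p_a  # disequilibrium coefficient D
    denom = p_t * (1 - p_t) * p_a * (1 - p_a)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Toy inputs (not study data):
print(r_squared(50, 0, 0, 50))    # complete linkage: 1.0
print(r_squared(25, 25, 25, 25))  # no association: 0.0
```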


The following worldwide frequencies were obtained for the four haplotypes of SNPs rs3749474 and rs4864548: TA = 37.6%, CG = 62.2%, CA = 0.08%, and TG = 0.08%. Considering these frequencies and that the CG haplotype is ancestral according to the information in 1000Genomes, it is evident that TA and CG are the parental haplotypes in the present populations. Thus, the recombination frequency for these two loci, taken as the combined frequency of the CA and TG haplotypes, is 0.16%. Linkage disequilibrium was high (r2 = 0.92).

The frequency of TA in the populations is given in Table 1 and Figure 1. The lowest frequency of the TA haplotype is found in Africa, followed by Europe, Latin America, South Asia, and East Asia. Comparing frequencies between all pairs of macro populations confirms the high divergence of Africa. There are significant differences (p < 0.05) in all comparisons except between Latin America and South Asia (p < 0.62). However, there is high homogeneity within the five macro populations. Peru has a significantly higher frequency of TA than the other Latin American populations (p < 0.05).
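The comparisons above rely on Fisher's exact test. As a minimal sketch of how the test works for a 2x2 table (for example, TA versus non-TA chromosome counts in two groups), the function below sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The table layout and the toy counts are assumptions made for illustration; the source does not specify how its tables were built or which software ran the test.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Toy table (not study data): rows are groups, columns are TA / non-TA counts.
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```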

Frequency of the TA haplotype in each population and macro population analyzed in this study.

Source: Prepared by the authors of this study.


The high level of linkage between the rs3749474T and rs4864548A risk alleles of the CLOCK gene reported here reinforces the observation of Pino-Astorga et al. [7], as both risk alleles are jointly inherited in approximately 44.8% of the chromosomes in the Latin American population. In the remaining 55.2%, the non-risk CG haplotype is inherited.

The high divergence of the African macro population could be due to the interaction of dietary patterns and biological adaptation [9] and dietary changes experienced during the food transition [10]. The differences among the remaining four macro populations could be due to specific selective pressures during the settlement of the different continents.

The high homogeneity between populations within each macro population is striking, which reinforces our hypothesis of heterogeneity between macro populations. The high divergence observed in the Lima population, which exhibited a high frequency of the TA haplotype, requires further analysis. One possible explanation for this is the greater accumulation of alleles of Asian origin, considering the history of settlement and miscegenation in Peru.

One limitation of the present work is that it does not cover population particularities, such as the different ancestries within macro populations, as is the case of indigenous populations in Latin America. The need to address them becomes evident when observing the high frequency of the TA haplotype in Lima, Peru, a country with higher Native American ancestry than the other Latin American populations studied here. Despite this limitation, the evidence collected establishes the existence of the TA haplotype rs3749474-rs4864548 with a relatively high frequency and linkage disequilibrium, making it a candidate marker for studies of overweight and obesity risk in the Latin American population.


The high level of linkage between the rs3749474T and rs4864548A risk alleles, together with their relatively high frequency in the Latin American population, allows us to confirm the existence of the TA haplotype for these polymorphisms. This finding warrants further research at the population level to establish its relationship with the development of obesity and overweight.