Background

The mechanisms governing the genetic control of many quantitative traits remain poorly understood and have yet to be fully exploited.

Over the last two decades, more than 150 studies have been published that together identify almost 6000 quantitative trait loci (QTL) in sorghum for more than 220 traits, producing an enormous amount of information on the genetic basis of quantitative traits, including their genomic location, allelic effects and epistatic interactions. To date, this QTL information has not been fully exploited by sorghum improvement programs and sorghum genetic researchers world-wide. This can be attributed not only to the heterogeneous nature of QTL studies, which leads to variable reliability of the QTL identified from one study to the next, but also to the time investment required to integrate the information and generate a comprehensive QTL database.

The precision of initial QTL identification in QTL mapping studies is influenced by many factors, including population size and type, precision of the phenotyping, QTL analysis methodology, and marker order, coverage and density. The resulting complexity of the QTL landscape means that delimiting the confidence intervals of QTL identified across multiple, highly diverse studies is a major challenge. Any attempt to project QTL locations from multiple experiments onto a single genetic or physical map must take into account the factors on which the reliability of the initial QTL study depends. In addition to the precision of the initial QTL identification, any comparative QTL analysis also depends on the quality of QTL projection onto a common framework map. The publication of the reference whole genome sequence of sorghum (Paterson et al 2009) and the increasing, almost exclusive, use of sequence-based markers (including RFLPs, SSRs, DArTs and SNPs) in genetic linkage and association mapping studies have facilitated accurate determination of the peak location of a QTL, or significant SNP, on the physical genome, permitting the development of more comprehensive QTL databases.

Over the last five years, quantitative trait dissection has increasingly been undertaken using association mapping approaches rather than standard QTL genetic linkage mapping in biparental populations. The information generated through genome-wide association studies (GWAS) presents new opportunities and challenges for genetic researchers. It provides researchers with opportunities to (1) increase the mapping resolution, owing to the increased amount of recombination available, and (2) identify more allelic diversity than in a traditional biparental population. However, factors such as sample size, population structure, unexpected linkage disequilibrium (LD), small effect sizes and low allele frequency remain a challenge for association mapping approaches. The Nested Association Mapping (NAM) approach was recently described (Buckler et al 2009) in order to combine the advantages and eliminate the disadvantages of both genetic linkage QTL mapping and association mapping. It is unlikely that any single QTL, GWAS or NAM study will be able to detect all the QTL influencing a trait, including those with minor effects, due to the statistical nature of the analysis and the noise inherent in the complex biological and environmental systems involved.

The atlas developed here provides researchers with a new meta-analysis tool that integrates data across multiple studies, study types (QTL, GWAS and NAM) and species, making it possible to compare positions across studies and to determine allelic relationships among QTL, and so allowing a more comprehensive dissection of the genetic architecture of complex traits. In addition, the inclusion of 35 major effect genes from Mace and Jordan (2010), which are frequently used as selection targets in breeding, provides sorghum breeders with the opportunity to explore unintended consequences of marker-assisted selection for traits with co-locating QTL. The atlas includes information on approximately 6000 individual QTL and significant marker-trait associations from 150 studies.

On a study-by-study basis, information has been collated on the population type and size, the total number of markers mapped and the analysis methodology used, in addition to the details of each QTL, or significant SNP, including significance level, R² value, flanking or most significant marker, and allele effect. The locations of the QTL and significant marker-trait associations reported were aligned to six different maps: the sorghum consensus map (Mace et al 2009), sorghum genome assemblies v1, v2 and v3 (Paterson et al 2009), the maize genome assembly (B73 agpv2; Schnable et al 2009) and the rice genome assembly, O. sativa subsp. japonica (Release 7; Goff et al 2002; Yu et al 2002). The QTL atlas provides information on the predicted gene models underlying the QTL confidence interval (CI) across all sorghum genome assembly gene sets (Sbi1.4 gene set, v2.1 gene sets and v3.0 gene set) as well as maize and rice. It also links to the recently published sorghum resequencing resource (Mace et al 2013) to provide information on the diversity of the underlying genes and on signatures of selection.

A reported QTL was not projected if the order or chromosome assignment of its flanking markers in a particular study was inconsistent with the consensus map and the whole genome sequence assemblies. The confidence intervals (CI) for the projected QTL were estimated using the following formulae, when the relevant information was provided in the original publication:

CI = 530/(N × R²) for F2 populations (described by Darvasi & Soller 1997)

CI = 163/(N × R²) for RI populations (described by Guo et al. 2006)

where N is the number of lines in the mapping population and R² is the proportion of phenotypic variation explained by the identified QTL.
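
The two formulae above can be illustrated with a short Python sketch. It is not the code used to build the atlas; the function name `qtl_confidence_interval` and the population labels "F2" and "RI" are hypothetical and chosen here only for illustration. The sketch assumes that R² is supplied as a proportion (for example 0.10 rather than 10) and that the resulting interval is expressed in centimorgans, which the text above does not state explicitly.

```python
# Illustrative sketch of the CI formulae above; not the atlas pipeline itself.
# Assumptions: r2 is a proportion in (0, 1]; the result is taken to be in cM.

# Numerator constants from the two formulae (F2: Darvasi & Soller 1997; RI: Guo et al. 2006).
CI_CONSTANTS = {"F2": 530.0, "RI": 163.0}


def qtl_confidence_interval(n_lines: int, r2: float, population: str) -> float:
    """Return the estimated QTL confidence interval, CI = constant / (N * R2)."""
    if population not in CI_CONSTANTS:
        raise ValueError(f"no CI formula for population type {population!r}")
    if n_lines <= 0:
        raise ValueError("n_lines must be a positive number of lines")
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be a proportion in the range (0, 1]")
    return CI_CONSTANTS[population] / (n_lines * r2)


if __name__ == "__main__":
    # Hypothetical QTL explaining 10% of phenotypic variation in 200 lines.
    for pop in ("F2", "RI"):
        ci = qtl_confidence_interval(n_lines=200, r2=0.10, population=pop)
        print(f"{pop}: CI = {ci:.2f}")
```

For the same hypothetical QTL, the RI formula gives a narrower interval than the F2 formula, and in both cases the interval shrinks as the population size or the proportion of variation explained increases. Where a publication does not report N or R², no interval can be computed this way, which is why the check on inputs raises an error rather than returning a default.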