<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Inferring human neutral genetic variation from craniodental phenotypes</title></titleStmt>
			<publicationStmt>
				<publisher>PNAS</publisher>
				<date>07/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10525529</idno>
					<idno type="doi">10.1093/pnasnexus/pgad217</idno>
					<title level='j'>PNAS Nexus</title>
<idno>2752-6542</idno>
<biblScope unit="volume">2</biblScope>
<biblScope unit="issue">7</biblScope>					

					<author>Hannes Rathmann</author><author>Silvia Perretti</author><author>Valentina Porcu</author><author>Tsunehiko Hanihara</author><author>G Richard Scott</author><author>Joel D Irish</author><author>Hugo Reyes-Centeno</author><author>Silvia Ghirotto</author><author>Katerina Harvati</author><author>George Milner</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>There is a growing consensus that global patterns of modern human cranial and dental variation are shaped largely by neutral evolutionary processes, suggesting that craniodental features can be used as reliable proxies for inferring population structure and history in bioarchaeological, forensic, and paleoanthropological contexts. However, there is disagreement on whether certain types of data preserve a neutral signature to a greater degree than others. Here, we address this unresolved question and systematically test the relative neutrality of four standard metric and nonmetric craniodental data types employing an extensive computational genotype–phenotype comparison across modern populations from around the world. Our computation draws on the largest existing data sets currently available, while accounting for geographically structured environmental variation, population sampling uncertainty, disparate numbers of phenotypic variables, and stochastic variation inherent to a neutral model of evolution. Our results reveal that the four data types differentially capture neutral genomic variation, with highest signals preserved in dental nonmetric and cranial metric data, followed by cranial nonmetric and dental metric data. Importantly, we demonstrate that combining all four data types together maximizes the neutral genetic signal compared with using them separately, even with a limited number of phenotypic variables. We hypothesize that this reflects a lower level of genetic integration through pleiotropy between, compared to within, the four data types, effectively forming four different modules associated with relatively independent sets of loci. Therefore, we recommend that future craniodental investigations adopt holistic combined data approaches, allowing for more robust inferences about underlying neutral genetic variation.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Human skeletal morphology is highly diverse and varies among individuals and populations across the globe. This pattern was shaped by the complex interplay of neutral evolutionary processes (i.e. selectively neutral mutations, random genetic drift, and gene flow) and nonneutral forces related to local adaptation and developmental plasticity in response to environmental and cultural stimuli <ref type="bibr">(1)</ref><ref type="bibr">(2)</ref><ref type="bibr">(3)</ref>. Different parts of the skeleton (such as the cranium, mandible, teeth, pelvis, long bones, hands, and feet) have been shown to preserve neutral and nonneutral signatures to different degrees <ref type="bibr">(4)</ref><ref type="bibr">(5)</ref><ref type="bibr">(6)</ref><ref type="bibr">(7)</ref><ref type="bibr">(8)</ref><ref type="bibr">(9)</ref><ref type="bibr">(10)</ref><ref type="bibr">(11)</ref><ref type="bibr">(12)</ref><ref type="bibr">(13)</ref>. Overall, however, there is wide consensus that cranial and dental morphology as a whole evolved for a large part under neutrality and, thus, can be used as a proxy for reconstructing population structure and history <ref type="bibr">(14)</ref><ref type="bibr">(15)</ref><ref type="bibr">(16)</ref><ref type="bibr">(17)</ref><ref type="bibr">(18)</ref><ref type="bibr">(19)</ref><ref type="bibr">(20)</ref>. This is relevant for the study of human skeletal remains from archaeological and forensic contexts, where DNA analyses are often constrained due to poor molecular preservation, particularly in the deep fossil record and in warmer climates, or when destructive DNA sampling of fragile and rare specimens is not possible.</p><p>Morphological investigations based on craniodental features typically focus on either quantitative (hereafter, metric) or qualitative (hereafter, nonmetric) data to characterize the overall geometry of study specimens. Cranial metric data collection is performed by defining a set of homologous anatomical landmarks located on the skull and by measuring either linear dimensions between them <ref type="bibr">(21,</ref><ref type="bibr">22)</ref> or the relative position of landmark and semilandmark coordinates in two or three dimensions <ref type="bibr">(23)</ref><ref type="bibr">(24)</ref><ref type="bibr">(25)</ref><ref type="bibr">(26)</ref>. Dental metric data collection is performed in a similar fashion, usually by measuring linear lengths, widths, or diagonal dimensions at the tooth crown or at the cement-enamel junction <ref type="bibr">(27,</ref><ref type="bibr">28)</ref> or semilandmark-based crown outlines <ref type="bibr">(29,</ref><ref type="bibr">30)</ref>. Cranial nonmetric trait data collection is performed by visually scoring minor discontinuous variants, such as extra-sutural ossicles, proliferative ossifications including bridges or spurs, or variation in foramina number and location, for example <ref type="bibr">(31)</ref><ref type="bibr">(32)</ref><ref type="bibr">(33)</ref>. Similarly, dental nonmetric trait data collection is performed by observing the number of cusps and roots, or the pattern of fissures, ridges, and grooves on tooth crowns <ref type="bibr">(34)</ref><ref type="bibr">(35)</ref><ref type="bibr">(36)</ref><ref type="bibr">(37)</ref>.</p><p>Despite the popularity of all four craniodental data types in population structure and history studies, it remains poorly understood whether some preserve neutral genomic signatures to a greater degree than others. This is problematic because investigations based on different craniodental data types may arrive at markedly disparate conclusions. For example, some researchers have suggested that teeth are a "safe box" of the genetic code, much more than any other skeletal element, because they form relatively early during ontogeny and their morphology remains unchanged after full formation, making teeth less affected by external stimuli <ref type="bibr">(38)</ref>. Some have also hypothesized that metric data are more useful than nonmetric traits, because measurements can be collected in a more objective and consistent manner, whereas visual scoring of nonmetric traits can be subjective and prone to observer error <ref type="bibr">(39,</ref><ref type="bibr">40)</ref>. A vast body of literature also suggest varying levels of heritability among the different craniodental data types, with disparate amounts of genetic integration through pleiotropy, indicating that some types of data contain more independent genomic information than others <ref type="bibr">(41)</ref><ref type="bibr">(42)</ref><ref type="bibr">(43)</ref><ref type="bibr">(44)</ref><ref type="bibr">(45)</ref><ref type="bibr">(46)</ref><ref type="bibr">(47)</ref><ref type="bibr">(48)</ref>. Several studies also point out that more holistic approaches combining different craniodental data types in a single analysis capture more phenotypic and thus genomic variation, compared with using them separately <ref type="bibr">(49)</ref><ref type="bibr">(50)</ref><ref type="bibr">(51)</ref>. Lastly, it has been proposed that there are not only differences in neutrality between the craniodental data types, but also differences within a given data type. That is, some bones, single trait expressions, or functional and developmental modules conserve a stronger evolutionary neutral signal than other, more labile, regions <ref type="bibr">(10,</ref><ref type="bibr">14,</ref><ref type="bibr">17,</ref><ref type="bibr">20)</ref>.</p><p>A standard approach for quantifying the utility of a given craniodental data type in capturing a neutral genomic signature is to estimate phenotypic distances among worldwide modern human populations, on the one hand, and to compare them to neutral genomic distances estimated among the same or closely matched set of populations on the other <ref type="bibr">(1,</ref><ref type="bibr">2,</ref><ref type="bibr">52)</ref>. These analyses, hereafter termed D P -D G comparisons, have been extensively performed for cranial metric data <ref type="bibr">(14,</ref><ref type="bibr">16,</ref><ref type="bibr">17,</ref><ref type="bibr">19,</ref><ref type="bibr">20,</ref><ref type="bibr">51,</ref><ref type="bibr">53,</ref><ref type="bibr">54)</ref>, dental metric data <ref type="bibr">(18,</ref><ref type="bibr">55)</ref>, cranial nonmetric trait data <ref type="bibr">(51,</ref><ref type="bibr">56,</ref><ref type="bibr">57)</ref>, and dental nonmetric trait data <ref type="bibr">(10,</ref><ref type="bibr">15,</ref><ref type="bibr">18,</ref><ref type="bibr">58)</ref>. However, the estimated levels of neutrality of the different craniodental data types reported in previous D P -D G studies are not directly comparable, since different populations have been sampled and diverse methodological approaches for calculating between-population distances have been employed at different geospatial scales <ref type="bibr">(54)</ref>.</p><p>To date, only few D P -D G studies have attempted to systematically co-analyze the relative neutrality of different craniodental data types in a single analytical framework, thus, allowing for comparability <ref type="bibr">(18,</ref><ref type="bibr">51,</ref><ref type="bibr">56)</ref>. Those investigations found contradicting results, reporting either similar degrees of neutrality for different data types <ref type="bibr">(18)</ref> or that they were differentially associated with genomic markers <ref type="bibr">(51,</ref><ref type="bibr">56)</ref>. However, those previous studies were constrained by several factors. First, they were limited to either cranial <ref type="bibr">(51,</ref><ref type="bibr">56)</ref> or dental <ref type="bibr">(18)</ref> data and none compared all four craniodental data types together. Second, none of the previous studies assessed the utility of a mixed-type data set combining metric and nonmetric traits in a single analysis. Third, none of these studies accounted for geographically structured environmental variation that can affect phenotypic and genomic variation <ref type="bibr">(12)</ref>. Fourth, all used rather limited sets of matched populations with varying and sometimes small sample sizes without accounting for variation introduced by sampling uncertainty. Fifth, all studies compared craniodental data types with unequal numbers of variables, which leads to biased results since phenotypic distances based on many variables are more robust than those based on only a few <ref type="bibr">(10,</ref><ref type="bibr">59)</ref>. Sixth and finally, all previous studies compared phenotypic distances to a single point estimate of genetic distance, which takes all sampled genomic loci into consideration; instead, phenotypic distances should be compared with multiple equally plausible neutral genetic distances by randomly sampling genomic loci in order to account for stochastic variation inherent to a neutral model of evolution <ref type="bibr">(10,</ref><ref type="bibr">52,</ref><ref type="bibr">60)</ref>.</p><p>In this study, we address these research gaps by using a global D P -D G framework in which we jointly investigate the relative neutrality of the four different craniodental data types, plus a mixed-type data set combining all four types of data together. Our extensive computations draw on the largest existing genomic and phenotypic databases currently available, while accounting for geographically structured environmental variation, population sampling uncertainty, disparate numbers of phenotypic variables, and stochastic variation inherent to a neutral model of evolution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>Mining large existing databases, we matched five different genomic and phenotypic data types for 26 modern human population samples from around the world, namely: (i) 8,821 single nucleotide polymorphisms (SNPs), (ii) 37 cranial metrics (in the form of linear dimensions, arcs, chords, and subtenses), (iii) 28 dental metrics (in the form of mesiodistal and buccolingual crown diameters), (iv) 24 cranial nonmetric traits, and (v) 25 dental nonmetric traits (Fig. <ref type="figure">1A</ref> and SI Appendix, Table <ref type="table">S1</ref>). We then estimated pairwise between-population genetic distances using Weir-Cockerham's F ST derived from the SNP data, which served as a benchmark to evaluate neutral expectations (Data Set S1). Next, we estimated pairwise between-population phenotypic distances using Mahalanobis' D 2 , generated separately from the cranial metrics, dental metrics, cranial nonmetric traits, dental nonmetric traits, and the combined craniodental data (Data Sets S2-S6). We then subjected the F ST and D 2 distances to Kruskal's nonmetric multidimensional scaling (NMDS) to visualize the matrices in a decomposed three-dimensional (3D) coordinate space, where a spatial grouping of populations indicates close affinity, and vice versa (Fig. <ref type="figure">1B-G</ref>). The MDS stress level for the F ST matrix was 0.0407, and the stress levels for the cranial metric, dental metric, cranial nonmetric traits, dental nonmetric traits, and combined craniodental data D 2 matrices were 0.0539, 0.1185, 0.1392, 0.0683, and 0.0615, respectively. All stress levels are below the acceptable threshold of 0.15, indicating that 3D captures the overall among-population variation well. To spatially orient the D 2 distance configurations similar to the F ST distance configuration, we subjected the decomposed D 2 coordinates to Procrustes superimposition to scale and rotate them to maximum similarity with the decomposed F ST coordinates by minimizing the overall sum of squared differences among populations. All 3D NMDS plots show major continental clusters of populations. The clusters appear most markedly geographically structured in the SNP data (Fig. <ref type="figure">1B</ref>) and to a similar degree in the combined craniodental data (Fig. <ref type="figure">1G</ref>), whereas clusters in the dental metric data appear least structured geographically (Fig. <ref type="figure">1D</ref>).</p><p>To formally quantify the neutral signals preserved in a given phenotypic data type, we conducted partial correlation tests to measure the degree of congruence between D 2 and F ST , while controlling for the effects of geographically structured environmental variation on phenotypic and genomic variation <ref type="bibr">(12)</ref>. Computationally, the partial correlation test design calculates the correlation of the residuals from the independent regressions D 2 &#8764; C and F ST &#8764; C, whereby C describes climatic differences among sampled population environments (Data Set S7). The resulting partial correlation value r was treated as a neutrality estimate, with an r value close to 1 indicating a higher degree of neutrality, whereas an r value near to 0 indicates a lower degree. We obtained the highest r value for the combined craniodental data (r = 0.684), followed by cranial metrics (r = 0.618), dental nonmetric traits (r = 0.592), cranial nonmetric traits (r = 0.390), and dental metrics (r = 0.223). Similar patterns were observed when comparing D 2 to F ST , while controlling for geographic distances (G) (Data set S8), albeit with slightly lower overall r values (SI Appendix, Table <ref type="table">S2</ref>). However, due to variations in sample sizes between the matched phenotypic and genomic data sets (SI Appendix, Table <ref type="table">S1</ref>), the D 2 and F ST distances are statistically biased, and in consequence, the neutrality estimate r. Therefore, to explore the effect of population sampling uncertainty, we employed a resampling procedure whereby we calculated the neutrality estimator 1,000 times, each time leaving out a randomly selected population in the phenotypic and genomic data sets and a randomly selected individual in each remaining population. We then reported the median of the resulting distribution of r values and constructed an interpercentile range accounting for 95% of the spread. The results are summarized in Table <ref type="table">1</ref> and visualized in Fig. <ref type="figure">2A</ref> using violin plots. Overall, the highest distribution of r values was again attained for the combined craniodental data, followed by cranial metrics, dental nonmetric traits, cranial nonmetric traits, and dental metrics. To statistically corroborate this finding, we conducted repeated-measures t-tests among pairs of distributions (SI Appendix, Table <ref type="table">S3</ref>) and found significant differences in the levels of neutral signals preserved in each craniodental data type (P &lt; 0.001).</p><p>The five phenotypic data sets in our analysis comprise unequal numbers of variables (SI Appendix, Table <ref type="table">S1</ref>), namely: 37 cranial metric variables; 24 cranial nonmetric trait variables; 28 dental metric variables; 25 dental nonmetric trait variables; and the combined craniodental data set comprises a summed up total of 114 variables. This imbalance hampers a direct comparison across data types, given that phenotypic analyses based on many variables are more robust than those based on only a few <ref type="bibr">(10,</ref><ref type="bibr">59)</ref>. Therefore, to create equally sized numbers of variables across all five phenotypic data sets, we calculated the neutrality estimator r for a given phenotypic data type 1,000 times, each time randomly undersampling the number of variables down to 24. This corresponds to the number of variables in the cranial nonmetric trait data set, comprising the fewest variables among all data sets compared. Phenotypic variable sampling bias correction was performed together with population sampling bias correction, to explore the combined effect of these two analytical refinements. On average, the resulting distributions of r values for the five phenotypic data types exhibit a similar relative ordering to the population sampling bias corrected r values alone, with the only difference that dental nonmetric traits show a higher preservation of neutral genomic signatures compared with cranial metrics (Table <ref type="table">1</ref>, Fig. <ref type="figure">2B</ref>). Pairwise repeated-measures t-tests (SI Appendix, Table <ref type="table">S4</ref>) confirmed that the neutral signals preserved in each craniodental data type significantly differ from one another (P &lt; 0.001).</p><p>Under a neutral model of evolution, the F ST distance matrix, used as a benchmark for our comparisons, is just one of multiple equally plausible neutral genetic outcomes produced by stochastic variation <ref type="bibr">(10,</ref><ref type="bibr">52,</ref><ref type="bibr">60)</ref>. To account for this stochasticity, we calculated F ST and thus the neutrality estimator r for a given phenotypic data type 1,000 times, each time randomly undersampling the number of SNP loci down to the same number of phenotypic variables, namely 24. This sampling strategy is consistent with population and quantitative genetics theory, where a heritable, additive, and selectively neutral phenotype is approximately as informative about population differentiation as a single neutral genomic locus, regardless of how many loci influence the phenotype <ref type="bibr">(61,</ref><ref type="bibr">62)</ref>. Loci undersampling was performed in conjunction with population sampling bias correction and phenotypic variable sampling bias correction, to investigate the combined effect of these three analytical refinements. On average, the resulting distributions of r values exhibit a similar relative ordering to those correcting for population and phenotypic variable sampling bias combined (Table <ref type="table">1</ref>, Fig. <ref type="figure">2C</ref>). Pairwise repeated-measures t-tests (SI Appendix, Table <ref type="table">S5</ref>) showed that all neutral signals differ significantly (P &lt; 0.001).</p><p>We note that overall our phenotypic data sets exhibit imbalanced distributions of sexes, with more males represented than females. This bias can be problematic as sexual dimorphism in shape, size, and trait expression may introduce variation unrelated to neutral genomic variation. Although we implemented data preprocessing steps to correct for sexual dimorphism (see Materials and Methods), we conducted two additional analyses focusing on males only. One analysis utilized size-corrected metric data (SI Appendix, Table <ref type="table">S6</ref>), while the other did not apply size correction to the metric data (SI Appendix, Table <ref type="table">S7</ref>). The results from the male subsets generally follow the same pattern as those of the complete data set, albeit with slightly wider r value intervals when controlling for population sampling uncertainty, which is expected given the overall smaller sample size.</p><p>Lastly, we acknowledge that our results may be affected by small sample sizes for certain data types, particularly dental metrics and SNPs, which were represented by only a few individuals in some populations (SI Appendix, Table <ref type="table">S1</ref>). Although our analysis accounts for sampling bias (as described above), we conducted a more cautious analysis focusing on a subset of 16 out of the 26 populations with phenotypic and genomic sample sizes of n &#8805; 10 (SI Appendix, Table <ref type="table">S8</ref>). The obtained results show patterns that generally align with those observed in the full data set, although with slightly higher overall r values. The only distinction lies in higher r values observed for cranial metrics compared with dental nonmetric traits. Therefore, in order to reconcile the findings of the full 26-population data set and the 16-population subset, we consider dental nonmetric traits and cranial metrics equally suitable for tracking neutral signatures until further samples become available for study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>To our knowledge, this study is the first to systematically co-analyze the relative utility of four widely used standard craniodental phenotypic data types in capturing neutral genomic variation, namely (i) cranial metrics, (ii) dental metrics, (iii) cranial nonmetric traits, and (iv) dental nonmetric traits, plus (v) a mixedtype data set combining all four data types together. We performed a comprehensive D P -D G comparison across 26 worldwide populations, drawing on the largest existing phenotypic and genomic data sets currently available, and incorporating a range of analytical refinements commonly neglected in previous D P -D G studies. In doing so, we demonstrated the importance of accounting for sampling uncertainty and showed that r neutrality estimates can vary substantially based on the composition of population samples and numbers of specimens included, even with large data sets as employed here. This is, for example, markedly expressed by the dental metric data in the full 26-population data set, with a point estimate of r = 0.223, which widens to a 95% range of r = 0.118-0.281 when accounting for sampling bias. We further demonstrated the importance of accounting for unevenly sized numbers of phenotypic variables when comparing relative levels of neutrality across phenotypic data sets. Specifically, in the full 26-population data set, cranial metrics exhibited higher levels of neutrality compared with dental nonmetric traits when no correction was applied, but this pattern reversed when the number of phenotypic variables was equalized across data sets through random undersampling. This result is in agreement with previous research finding that the validity of cranial metric and dental nonmetric trait distances in reflecting neutral expectations is contingent upon the number of variables employed <ref type="bibr">(10,</ref><ref type="bibr">59)</ref>. On a related note, our undersampling procedure also takes into account the practical limitations of working with skeletal remains, particularly in bioarchaeological or fossil contexts, where craniodental data are often highly fragmented, and where researchers must work with random subsets of variables. Lastly, we demonstrated the importance of accounting for stochastic variation inherent to a neutral model of evolution by randomly undersampling the SNP loci to match the number of phenotypic variables. This resulted in r neutrality estimate distributions with much wider ranges, and for the dental metric data, the 95% range was found to be r = -0.043-0.338 (in the full 26-population data set) and r = 0.067-0.612 (in the 16-population subset), with the lower bounds near zero implying nonneutral evolutionary forces. This finding therefore calls into question the validity of dental metrics as a proxy for neutral genomic markers.</p><p>Inspecting the four craniodental data types separately, our results clearly show that they are differentially associated with neutral genomic variation after accounting for population sampling uncertainty, disparate numbers of phenotypic variables, and stochastic variation inherent to a neutral model of evolution. In testing for neutrality, our estimates reveal that, overall, dental nonmetric traits and cranial metrics performed best, followed at some distance by cranial nonmetric traits, whereas dental metrics performed relatively poorly. Interestingly, these estimates do not relate to the suggested general divide in utility between cranial versus dental features, with the latter proposed to be less affected by external environmental stimuli <ref type="bibr">(38)</ref>, and nonmetric versus metric data, with the latter suggested to be less prone to observer error <ref type="bibr">(39)</ref>. Instead, our estimates are in agreement with previous quantitative genetic studies of pleiotropy in humans (or in nonhuman primates when studies on humans are not yet available), finding that the amount of independent genetic information in dental metrics <ref type="bibr">(41)</ref> and cranial nonmetric traits ( <ref type="formula">45</ref>) is low, compared with the amount of independent genetic information in cranial metrics <ref type="bibr">(44)</ref> and dental nonmetric traits <ref type="bibr">(42,</ref><ref type="bibr">43)</ref>. The relatively poor performance of dental metrics contrasts with what was proposed in a previous study using a methodological D P -D G framework similar to ours <ref type="bibr">(18)</ref>, which found that dental metrics and nonmetric traits are both comparably well-suited in tracking neutral genomic variation. The present study expands and improves upon the D P -D G investigation by Rathmann et al. <ref type="bibr">(18)</ref> in several respects. Among the most important are a more comprehensive dental nonmetric trait data set for comparison (25 versus 15 traits) and a larger set of globally distributed matched population samples (26 versus 19 populations).</p><p>Perhaps one of the most interesting findings of our study is that phenotypic inferences of neutral genomic variation are most accurate when based on a combined mixed-type data set, compared with using the four different data types separately. This result is in agreement with previous studies reporting that phenotypic inferences about genomic affinities are best served when multiple lines of evidence are jointly investigated <ref type="bibr">(49)</ref><ref type="bibr">(50)</ref><ref type="bibr">(51)</ref>. This is also expected, given that the number of variables in the mixed-type data set is many times higher than in the four different data sets separately, leading to a richer knowledge of phenotypic and thus genomic variation <ref type="bibr">(10,</ref><ref type="bibr">59)</ref>. Interestingly though, when equalizing the numbers of phenotypic variables across all data sets via undersampling, the mixed-type data still performed best. One possible explanation for this result could be that genetic integration through pleiotropy between the four data types is lower than genetic integration within the four data types, effectively forming four different modules regulated by different sets of loci that are relatively independent from each other <ref type="bibr">(63)</ref><ref type="bibr">(64)</ref><ref type="bibr">(65)</ref>. In this situation, even when just a few phenotypic variables per data type would Median (and 95% range) of 1,000 iteratively generated r values, each iteration leaving out a randomly selected population in the phenotypic and genomic data sets and a randomly selected individual in each remaining population. b Median (and 95% range) of 1,000 iteratively generated r values, each iteration randomly undersampling the number of phenotypic variables, combined with population sampling bias correction. c Median (and 95% range) of 1,000 iteratively generated r values, each iteration randomly undersampling the number of loci, combined with population and phenotype sampling bias correction.</p><p>contribute to the mixed-type data, more underlying genomic variation from different loci would still be captured than using the full phenotypic variable battery of one of the four data types individually. This hypothesis could be tested with a quantitative genetic analysis of pleiotropy in a modern human population with known pedigree structure sampled for all four cranial and dental metric and nonmetric trait data types, which to our knowledge has not been performed so far and could thus lead to exciting new research directions. We note that the reported r neutrality estimates for the different craniodental data types must be considered minimum values as they are biased toward zero. This is because we compared matched but unpaired population samples, with phenotypic and genomic data coming from different individuals; however, phenotypic and genomic distances among unpaired samples have a reduced degree of congruence, given that within-population variation is high compared with between-population variation <ref type="bibr">(66)</ref>. Nevertheless, comparing unpaired samples is a standard procedure in global scale D P -D G analyses <ref type="bibr">(7, 10, 14-20, 51, 55)</ref>, and our applied analytical correction for sampling bias (i.e. both population and specimen resampling of the phenotypic and genomic data) may account for at least some of this uncertainty. Moreover, although our large craniodental data sets comprise the most widely used metric and nonmetric trait variables in bioanthropological research, they could be complemented with additional standard and nonstandard variables proposed to be informative <ref type="bibr">(67)</ref><ref type="bibr">(68)</ref><ref type="bibr">(69)</ref><ref type="bibr">(70)</ref>. Similarly, the metric portion of our data sets, consisting of linear dimensions, arcs, cords, and subtenses, could be replaced with 3D coordinate data that better retain the geometry of the studied specimens than caliper-based measurements. Interestingly, though not fully comparable, previous D P -D G analyses based on craniodental 3D data reported neutrality levels similar to those reported here <ref type="bibr">(14,</ref><ref type="bibr">17,</ref><ref type="bibr">19,</ref><ref type="bibr">20,</ref><ref type="bibr">55)</ref>, suggesting that caliper-based measurements and 3D coordinates are equally well-suited for reconstructing genetic relationships, though our caliper-based data sets have the advantage to be many times larger.</p><p>Previous studies proposed that there are not only differences in neutrality between the four craniodental data types, but also differences among the variables within a given data type <ref type="bibr">(10,</ref><ref type="bibr">14,</ref><ref type="bibr">17,</ref><ref type="bibr">20)</ref>. Our phenotypic variable undersampling procedure takes at least some of these considerations into account and we show that neutrality estimates for a given data type differ substantially when different subsets of variables are employed, further reinforcing previous claims. Future investigations should therefore explore additional arrangements of variables beyond the five tested here. For instance, considering only cranial data, combining all nonmetric variables, utilizing variables with the highest discriminatory power, or focusing on variables associated with previously identified functional and developmental modules <ref type="bibr">(9,</ref><ref type="bibr">17,</ref><ref type="bibr">48)</ref>. We propose that testing for neutrality in all possible combinations of cranial and dental metric and nonmetric variables, as recently employed for dental nonmetric trait data <ref type="bibr">(10)</ref>, is the most promising approach, rather than restricting analysis to predefined or hypothesized arrangements only.</p><p>In conclusion, our results serve as a reference for future craniodental research, confirming that most of the traditionally used data types can be used as proxies for neutral genomic data, although some are more useful than others. We do advise, however, to carefully review the use of dental metrics in the form of standard mesiodistal and buccolingual crown dimensions only, as they may not cover sufficient independent genomic variation, at least in comparison with other craniodental data types. Importantly, instead of using the different data types separately, we advise relying on a more holistic approach by combining them together, as this maximizes genotypic coverage over different loci resulting from primarily neutral evolution. Future work in combinatorics should focus on identifying specific subsets of mixed cranial and dental metric and nonmetric traits that are the most useful for tracking human neutral genetic variation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Materials and methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Matching population samples</head><p>Materials for this study comprise five different types of data: (i) SNPs, (ii) cranial metrics, (iii) dental metrics, (iv) cranial nonmetric traits, and (v) dental nonmetric traits. All data were taken from existing databases. We matched the different data types for 26 globally distributed modern human populations for which all five types of data were available (Fig. <ref type="figure">1</ref> and SI Appendix, Table <ref type="table">S1</ref>).</p><p>Populations were chosen for inclusion in this study based on three criteria: (i) availability of n &#8805; 3 unrelated individuals per genetic sample; (ii) availability of n &#8805; 4 individuals per phenotypic sample; and (iii) sample antiquity &lt;2,000 years, to control for temporal bias. In instances where exact population matches between genotypic and phenotypic populations could not be achieved, a geographically proximate population with ethno-linguistic affinities was selected. In a few cases, closely related populations were pooled to maximize sample size. We note that the matched population samples in this study are unpaired; that is, all five types of data derive from different individuals. When possible, approximately equal numbers of adult males and females (determined osteologically) were sampled for the phenotypic data sets, to control for sexual dimorphism; however, we note that overall the phenotypic data sets are biased toward representing more males.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SNP data</head><p>SNP data were obtained from published databases, genotyped with the Affymetrix Human Origins Array (71-80). To correctly merge genotypes coming from different data sets, we ensured they were all related to the same Reference Sequence, the Genome Reference Consortium Human Build 37 (81) using, when needed, the LiftOver tool <ref type="bibr">(82)</ref>. To merge data from selected data sets, we used the plink-1.90 software <ref type="bibr">(83)</ref>. We filtered the data removing all transversions to avoid ambiguity in strand alignment (C/G or A/T), principal component analysis outliers, and first-and second-degree relative pairs. We selected only those SNPs that map to nonfunctional genomic regions and are therefore unlikely to be affected by natural selection. We applied two different filter levels for the amount of allowed missing data: first, to populations collected by</p><p>Lazaridis et al. (74), Mallick et al. (76), Pickrell and Pritchard (78), and Skoglund et al. (80), we retained only individuals with 0% missing data; second, from the other published resources, we removed individuals with &gt;10% of missing data. All filtering was performed using the plink-1.90 software (83). Finally, we converted the data set from PLINK file format into a genepop file using PGDSpider (84). The final preprocessed SNP data set comprised 857 individuals sharing 8,821 markers, with population sample representation varying from 3 to 176 individuals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cranial metric data</head><p>The cranial metric data were selected from a larger database collected by one of us (T.H.) <ref type="bibr">(85)</ref>. The data set consists of 37 measurements of the cranium recorded for each individual, in the form of linear dimensions, arcs, cords, and subtenses. All measurements were recorded following the procedures in Br&#228;uer (70) using sliding and spreading calipers. Raw measurements were converted into scale-free shape variables by dividing each measurement by the geometric mean for all the measurements in each individual <ref type="bibr">(86)</ref>. This procedure removes gross size from the data in order to assess differences in the proportionate contribution of single variables to overall cranial size and adjusts for size differences between individuals that may result from sexual dimorphism. Because size-correction requires complete cases, missing values were imputed with the k-nearest neighbor (kNN) method <ref type="bibr">(87)</ref>. kNN searches the entire data set for k = 5 individuals most similar to the one with missing data and generates a mean to replace the missing value(s). Prior to imputation, individuals with more than half of the measurements missing were removed from the analysis. In this way, we ensured that &lt;2.5% of the final data set requires imputation (down from 3.1%). Summary statistics of the kNN-imputed and size-corrected cranial metric data set are provided in Data Set S9. The final preprocessed cranial metric data set comprised 2,994 individuals, with population sample representation varying from 24 to 366 individuals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dental metric data</head><p>The dental metric data were selected from a larger database collected by one of us (T.H.) <ref type="bibr">(88)</ref>. The data set consists of mesiodistal and buccolingual crown diameters of all teeth recorded for each individual (up to a total of 28 metric variables, excluding third molars). Only right teeth were measured, but when a right tooth was missing, damaged, or affected by wear or pathology, the corresponding left antimere was measured. All measurements were recorded according to the procedures in Moorrees <ref type="bibr">(89)</ref> and Hillson (90) using a digital sliding caliper accurate to 0.01 mm. We applied the same data preprocessing steps as for the cranial metric data. First, individuals missing more than half of the measurements were removed to ensure that &lt;24.3% of the data set requires imputation (down from 57.7%). Second, missing values were imputed using the kNN algorithm <ref type="bibr">(87)</ref>. Third, raw measurements were converted into scale-free shape variables <ref type="bibr">(86)</ref> to assess differences in the proportionate contribution of individual variables to overall tooth size and to adjust for size differences that may result from sexual dimorphism <ref type="bibr">(40)</ref>. Summary statistics of the kNN-imputed and size-corrected dental metric data set are reported in Data Set S10. The final preprocessed dental metric data set comprised 909 individuals, with population sample representation varying from 4 to 185 individuals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cranial nonmetric trait data</head><p>The cranial nonmetric trait data were selected from a larger database collected for the most part by one of us (T.H.) <ref type="bibr">(91)</ref>. The data set consists of 24 discrete observations of the cranium recorded for each individual and comprises data on sutural variation, supernumerary ossicles, hypostotic and hyperostotic traits, and vessel/nerve-related traits. The scoring procedures for each trait are described elsewhere <ref type="bibr">[Hanihara et al. (91)</ref> and references therein]. Scoring followed the individual count method <ref type="bibr">(92)</ref>, where bilateral traits were counted only once per cranium, regardless of whether or not the trait appeared bilaterally. In cases where a trait was expressed asymmetrically, the side with the highest expression level was scored.</p><p>Graded trait expression scores were collapsed into simplified binary dichotomies of absence or presence based on established breakpoints [Hanihara et al. (91) and references therein]. Sex differences were found in a few traits but none of the traits differed consistently between males and females in all sampled populations and we thus analyzed both sexes together, as it has been done in previous analyses of the same data set (91). Summary statistics of the cranial nonmetric trait data set are provided in Data Set S11. The final preprocessed cranial nonmetric trait data set comprised 4,623 individuals, with population sample representation varying from 26 to 533 individuals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dental nonmetric trait data</head><p>The dental nonmetric trait data were obtained from published resources <ref type="bibr">(15,</ref><ref type="bibr">68)</ref>, whereby the majority of the samples were collected by C. G. Turner II, later augmented with samples collected by two of us (G.R.S. and J.D.I.; SI Appendix, supplementary text 1). The data set consists of 25 discrete observations of the dentition, including data on the number of cusps and roots, and the pattern of fissures, ridges, and grooves on tooth crowns. All data collectors used the Arizona State University Dental Anthropology System (ASUDAS) to record trait observations <ref type="bibr">(68,</ref><ref type="bibr">93)</ref>. The ASUDAS comprises a reference set of dental casts illustrating expression levels for various traits alongside specific instructions that ensure a standardized scoring procedure, which minimizes observer error. ASUDAS traits are routinely collected on key teeth (usually the most mesial member of a tooth district) because these are considered the most stable members in terms of development and evolution <ref type="bibr">(94)</ref>. As in the cranial nonmetric trait data set, scoring followed the individual count method <ref type="bibr">(92)</ref>. Dental trait expression scores were collapsed into simplified binary dichotomies of absence or presence based on established breakpoints <ref type="bibr">(15,</ref><ref type="bibr">68)</ref>. Dental traits of the ASUDAS have little or no sexual dimorphism, thus, it is a standard procedure to pool sexes <ref type="bibr">(42,</ref><ref type="bibr">46,</ref><ref type="bibr">68,</ref><ref type="bibr">94)</ref>. Summary statistics of the dental nonmetric trait data set are provided in Data Set S12. The dental nonmetric trait data set comprised 2,986 individuals, with population sample representation varying from 28 to 450 individuals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Estimating distances among populations</head><p>Pairwise neutral genetic distances among populations were computed from the SNP data using F ST , defined as the fixation (F ) index comparing the subset ( S ) genetic diversity within populations to the total ( T ) genetic diversity of all sampled populations. We followed Weir and Cockerham's method of moments for diploid loci and calculated F ST for each SNP individually, averaging F ST over all loci <ref type="bibr">(95)</ref>. Under this model, populations of the same size are considered to have descended from a common ancestral population, which is assumed to be in Hardy-Weinberg equilibrium (Data Set S1). Pairwise phenotypic distances were calculated from the craniodental data using Mahalanobis' D 2 distance, a model-free measure accounting for correlation among variables to avoid over-representing variation from variables that co-occur. The D 2 distance between two populations i and j is estimated as the difference between two vectors of variable averages (X i and X j ), adjusted by a pooled within-population variance-covariance matrix (S) estimated over all populations in the analysis. For the cranial and dental metrics, we estimated D 2 following Mahalanobis <ref type="bibr">(96)</ref>, where X i and X j are calculated as geometric means, and S is calculated as a pooled Pearson variance-covariance matrix weighted by population sample sizes (Data Sets S2 and S3). For the cranial and dental nonmetric traits, we estimated D 2 following Nikita <ref type="bibr">(97)</ref>, where X i and X j are calculated as probit threshold values of trait frequencies, and S is calculated as a pooled Pearson correlation matrix weighted by the sample sizes for each pair of traits (Data Sets S4 and S5). When estimating D 2 for the combined craniodental data, we first computed D 2 independently for each of the four data types, and then combined the four D 2 matrices as a weighted average based on the numbers of variables (Data Set S6). Although this approach is valuable for handling unpaired samples and accounts for correlations within the four data sets, it does not account for correlations between them. However, in our case, it may still be appropriate since previous research demonstrated that the different data types are largely independent from each other, at least when comparing cranial metrics, dental metrics, and dental nonmetric traits <ref type="bibr">(27)</ref>, or cranial nonmetric and dental nonmetric traits <ref type="bibr">(98)</ref>. In addition to model-free D 2 distances, we also calculated model-bound P ST distances, which incorporate relative estimates of effective population size (N e ; SI Appendix, Table <ref type="table">S9</ref>) and average estimates of heritability (h 2 ; SI Appendix, supplementary text 2). Results obtained with P ST show similarities to those using D 2 (SI Appendix, Table <ref type="table">S10</ref>). However, due to the challenge of validating the parameter estimates N e and h 2 , we opted to rely on D 2 in order to limit potential model bias.</p><p>Pairwise climatic distances among sampled population environments (C) were calculated as Euclidean distances based on five temperature-related variables obtained from a global climate database published in Hubbe et al. (9), using latitudes and longitudes approximated for each population sample (Data Set S7). As climate indicators for each population region, we used estimates of annual minimum temperature, annual maximum temperature, annual average temperature, maximum temperature of the warmest month, and minimum temperature of the coldest month, all measured in &#176;C. These indicators are listed for each population sample in Data Set S13.</p><p>Pairwise geographic distances (G) were calculated as geodesic distances between population latitudes and longitudes (Data Set S8).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Correlation tests</head><p>We conducted Pearson correlation tests between the off-diagonal values in any two distance matrices to measure the linear association between phenotypic (D 2 ), genetic (F ST ), climate (C), and geographic (G) distances. We used partial Pearson correlation tests based on the residuals of a previous correlation and the offdiagonal values in a third matrix to evaluate the linear association between D 2 and F ST , while controlling for either C or G. The resulting r coefficients are reported in SI Appendix, Table <ref type="table">S2</ref>. To account for population sampling uncertainty in our partial correlation tests of D 2 , F ST , and C, we calculated the r coefficients 1,000 times, each time leaving out a randomly selected population in the phenotypic and genomic data sets and a randomly selected individual in each remaining population. Additionally, to create equally sized numbers of variables across all phenotypic data sets, in each of the 1,000 iterations we randomly undersampled the number of variables down to 24, which corresponds to the number of variables in the cranial nonmetric trait data set, comprising the fewest variables among all phenotypic data sets being compared. Further, to account for stochastic variation inherent to a neutral model of evolution, in each of the 1,000 iterations we randomly undersampled the number of SNP loci down to the same number as there are phenotypic variables, namely, 24. To gauge the relative neutrality of the different phenotypic data types in a visual manner, we plotted the distributions of estimated r coefficients using violin plots. Statistical significance between</p></div></body>
		</text>
</TEI>
