<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Comparative bioacoustics: a roadmap for quantifying and comparing animal sounds across diverse taxa</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>08/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10298788</idno>
					<idno type="doi">10.1111/brv.12695</idno>
					<title level='j'>Biological Reviews</title>
<idno>1464-7931</idno>
<biblScope unit="volume">96</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Karan J. Odom</author><author>Marcelo Araya‐Salas</author><author>Janelle L. Morano</author><author>Russell A. Ligon</author><author>Gavin M. Leighton</author><author>Conor C. Taff</author><author>Anastasia H. Dalziell</author><author>Alexis C. Billings</author><author>Ryan R. Germain</author><author>Michael Pardo</author><author>Luciana Guimarães Andrade</author><author>Daniela Hedwig</author><author>Sara C. Keen</author><author>Yu Shiu</author><author>Russell A. Charif</author><author>Michael S. Webster</author><author>Aaron N. Rice</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Animals produce a wide array of sounds with highly variable acoustic structures. It is possible to understand the causes and consequences of this variation across taxa with phylogenetic comparative analyses. Acoustic and evolutionary analyses are rapidly increasing in sophistication such that choosing appropriate acoustic and evolutionary approaches is increasingly difficult. However, the correct choice of analysis can have profound effects on output and evolutionary inferences. Here, we identify and address some of the challenges for this growing field by providing a roadmap for quantifying and comparing sound in a phylogenetic context for researchers with a broad range of scientific backgrounds. Sound, as a continuous, multidimensional trait can be particularly challenging to measure because it can be hard to identify variables that can be compared across taxa and it is also no small feat to process and analyse the resulting high-dimensional acoustic data using approaches that are appropriate for subsequent evolutionary analysis. Additionally, terminological inconsistencies and the role of learning in the development of acoustic traits need to be considered. Phylogenetic comparative analyses also have their own sets of caveats to consider. We provide a set of recommendations for delimiting acoustic signals into discrete, comparable acoustic units. We also present a three-stage workflow for extracting relevant acoustic data, including options for multivariate analyses and dimensionality reduction that is compatible with phylogenetic comparative analysis. We then summarize available phylogenetic comparative approaches and how they have been used in comparative bioacoustics, and address the limitations of comparative analyses with behavioural data. Lastly, we recommend how to apply these methods to acoustic data across a range of study systems. In this way, we provide an integrated]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION: THE NEED FOR COMPARATIVE BIOACOUSTICS</head><p>Animals exhibit a bewildering diversity of complex and highly variable sounds used for diverse communicative functions. These sounds range from the long, low-frequency modulated rumbles of forest elephants to the short, rapid burst pulses of dolphins, or the highly variable mimicry and species-specific songs of superb lyrebirds (Menura novaehollandiae) (Fig. <ref type="figure">1</ref>; <ref type="bibr">Dalziell, 2012;</ref><ref type="bibr">Dalziell et al., 2013;</ref><ref type="bibr">de Andrade et al., 2017;</ref><ref type="bibr">Keen et al., 2017;</ref><ref type="bibr">Hedwig, Verahrami &amp; Wrege, 2019;</ref><ref type="bibr">Dalziell et al., in press)</ref>. Such variation exists among distantly related taxa as well as among close relatives. For example, within both the New World blackbirds (family: Icteridae) and birds-of-paradise (family: Paradisaeidae), songs vary from pure-tone whistles to intricate combinations of broadband notes, clicks, and buzzes <ref type="bibr">(Price &amp; Lanyon, 2002;</ref><ref type="bibr">Ligon et al., 2018)</ref>. Such variation can reflect the underlying morphology and physiology of animal sound production mechanisms, adaptations to different transmission properties of the environment, as well as variation in sexual and other social selective pressures <ref type="bibr">(Wiley, 1982;</ref><ref type="bibr">Devoogd et al., 1993;</ref><ref type="bibr">Andersson, 1994;</ref><ref type="bibr">Podos, 1996;</ref><ref type="bibr">Blumstein &amp; Armitage, 1997;</ref><ref type="bibr">Bradbury &amp; Vehrencamp, 2011;</ref><ref type="bibr">Suthers et al., 2016;</ref><ref type="bibr">Taylor, Charlton &amp; Reby, 2016)</ref>. 
Evaluating how these constraints and selective pressures relate to extreme variation of acoustic signals is fundamental to addressing how diversity in this widespread communication modality arose <ref type="bibr">(Rendell et al., 1999;</ref><ref type="bibr">Cardoso &amp; Hu, 2011;</ref><ref type="bibr">Riede &amp; Goller, 2014)</ref>. Biological Reviews 96 (2021) 1135-1159 &#169; 2021 Cambridge Philosophical Society.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Comparative bioacoustics of diverse animal sounds</head><p>In the digital age, extensive access to acoustic media and complementary morphological, environmental, and life-history data sets gives us the data necessary to address questions about signal evolution at previously impossible scales (e.g. <ref type="bibr">Wilman et al., 2014;</ref><ref type="bibr">Dale et al., 2015;</ref><ref type="bibr">Mason et al., 2017b;</ref><ref type="bibr">Miller et al., 2019)</ref>. Combined with rapid advances in comparative phylogenetic analyses, statistical and bioacoustics software packages, and well-resolved phylogenies constructed from genomic data, we now have an unprecedented toolkit to address such questions with continuously improving precision <ref type="bibr">(Charif, Waack &amp; Strickman, 2010;</ref><ref type="bibr">Jetz et al., 2012;</ref><ref type="bibr">Revell, 2012;</ref><ref type="bibr">Rabosky et al., 2014)</ref>. Recent large-scale comparative studies of signal evolution have made major advances in our thinking about the underlying evolutionary pressures shaping elaborate traits <ref type="bibr">(Am&#233;zquita et al., 2009;</ref><ref type="bibr">Farris &amp; Ryan, 2011;</ref><ref type="bibr">Odom et al., 2014;</ref><ref type="bibr">Riede &amp; Goller, 2014;</ref><ref type="bibr">Tobias et al., 2014;</ref><ref type="bibr">Dale et al., 2015;</ref><ref type="bibr">Rabosky et al., 2018)</ref>. However, studies that evaluate sound as a continuously varying character in a phylogenetic framework have only recently started to become common (e.g. <ref type="bibr">Goutte et al., 2016;</ref><ref type="bibr">Mason et al., 2017a;</ref><ref type="bibr">Billings, 2018;</ref><ref type="bibr">Garc&#237;a &amp; Tubaro, 2018;</ref><ref type="bibr">Ligon et al., 2018;</ref><ref type="bibr">Charlton, Owen &amp; Swaisgood, 2019</ref>). Yet such studies are important for elucidating the nuanced evolutionary patterns that shape animal sounds. 
However, because bioacoustics and comparative biology are both specialized fields, researchers with the skills to quantify sound do not necessarily have the background for phylogenetic analyses and vice versa. With the current bioacoustics data and phylogenetic methods available, we are at a crossroads where assembling and evaluating the tools available for comparative analysis of animal sounds could enable a broader range of researchers to quantify and compare acoustic signals in a phylogenetic framework.</p><p>There are a number of challenges specific to phylogenetic analyses of animal acoustic signals, including identifying appropriate metrics for comparing diverse acoustic structures, inconsistent terminology across studies and taxa, and analytical challenges associated with the resulting high-dimensional data <ref type="bibr">(Busnel, 1963;</ref><ref type="bibr">Hopp, Owren &amp; Evans, 1998;</ref><ref type="bibr">Bradbury &amp; Vehrencamp, 2011;</ref><ref type="bibr">Suthers et al., 2016;</ref><ref type="bibr">Adams &amp; Collyer, 2018)</ref>. Sound, as a highly variable multidimensional trait, can be difficult to quantify, especially in concise ways that capture overall acoustic structure and facilitate comparison across taxa. Nevertheless, for comparative acoustic analyses it is essential to identify and measure homologous or comparable units of sound so that various acoustic structures can be detected and processed consistently across taxa <ref type="bibr">(Lauder, 1986;</ref><ref type="bibr">ten Cate, Lachlan &amp; Zuidema, 2013;</ref><ref type="bibr">Russo, Ancillotto &amp; Jones, 2018)</ref>. However, animal sounds can range from subtle structural variation to a seemingly complete lack of similar acoustic features between species, not to mention variation in syntax or element spacing, which makes direct comparison of signals difficult (e.g. 
<ref type="bibr">Goicoechea, De La Riva &amp; Padial, 2010;</ref><ref type="bibr">Dunn et al., 2011;</ref><ref type="bibr">Matthews et al., 2012;</ref><ref type="bibr">Katahira et al., 2013;</ref><ref type="bibr">Ligon et al., 2018)</ref>. In addition, the terminology used to investigate animal sounds varies, including the use of multiple terms for the same metric or acoustic unit, single terms with multiple definitions, or non-mutually exclusive terms <ref type="bibr">(Marler, 1961,</ref><ref type="bibr">1967;</ref><ref type="bibr">Broughton, 1963;</ref><ref type="bibr">Thompson, LeDoux &amp; Moody, 1994;</ref><ref type="bibr">Deecke &amp; Janik, 2005;</ref><ref type="bibr">Cholewiak, Sousa-Lima &amp; Cerchio, 2013;</ref><ref type="bibr">K&#246;hler et al., 2017)</ref>. Thus, researchers risk terminological entanglements when trying to compare sounds across species, and especially across more diverse taxa. Lastly, the properties of sound itself present challenges for comparative bioacoustic analyses. Sound is a multidimensional signal with continuous variation, making it difficult to sample all aspects of an acoustic signal accurately <ref type="bibr">(Hopp et al., 1998;</ref><ref type="bibr">Deecke &amp; Janik, 2005;</ref><ref type="bibr">K&#246;hler et al., 2017)</ref>. Handling the resulting high-dimensional data in ways that are compatible with existing phylogenetic comparative analyses is a further challenge. The set of metrics collected and how they are prepared for analysis can also vary depending on the kind of phylogenetic comparative analysis conducted <ref type="bibr">(Uyeda, Caetano &amp; Pennell, 2015;</ref><ref type="bibr">Adams &amp; Collyer, 2018)</ref>. 
Therefore, careful consideration is needed at each of these stages of acoustic analysis to ensure that the resulting acoustic measurements accurately reflect the original animal signals and are comparable in a phylogenetic context.</p><p>Our goal is to collate the main bioacoustics approaches and comparative phylogenetic analyses useful for quantifying and comparing animal sounds in a phylogenetic context. This review is divided into two main sections: (i) approaches for quantifying animal sounds and (ii) approaches for phylogenetic comparative analysis of animal sounds. The section on quantifying animal sounds includes a synopsis of common terms and metrics used to compare animal sounds, a discussion of techniques for handling and consolidating multiple acoustic variables, including data-reduction techniques, and the pitfalls and best practices of acoustic analyses with diverse animal sounds. The section on phylogenetic comparative analysis summarizes types of phylogenetic comparative analyses, including examples of comparative acoustic studies that have used each type. The current limitations of phylogenetic comparative analyses are also discussed. At the end of both sections, we recommend and outline best practices for acoustic and comparative phylogenetic analyses. We draw most heavily from literature pertaining to vertebrate vocalizations, especially birds, for which the bulk of the literature exists; however, we expect the acoustic and comparative analyses described here to be broadly applicable to all animals. Our intention is to provide a roadmap for phylogenetic comparative analysis of animal sounds aimed at improving the accuracy and ease with which highly variable animal sounds can be quantified and compared across species.</p><p>We note that the term 'signal' has variable meanings within evolutionary biology and animal communication [e.g. 
signal evolution <ref type="bibr">(Endler, 1992)</ref>, phylogenetic signal <ref type="bibr">(Blomberg, Garland &amp; Ives, 2003)</ref>, behavioural signals <ref type="bibr">(Bradbury &amp; Vehrencamp, 2011)</ref>], and a separate meaning in acoustics, with origins in signal processing <ref type="bibr">(Morfey, 2001)</ref>. In the remainder of this review, we primarily restrict the use of the term 'signal' to meanings consistent with signal processing, as one major focus is on extraction and analysis of acoustic parameters. Unless specified, we remain agnostic about the evolutionary or animal communication interpretations of the sounds we discuss, although we acknowledge that species-specific experimental studies are often needed to verify whether animal sounds are signals in the animal communication sense of the term (e.g. <ref type="bibr">Bradbury &amp; Vehrencamp, 2011)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. APPROACHES FOR QUANTIFYING ANIMAL SOUNDS</head><p>Because animal sounds vary in multiple dimensions, it is important to choose acoustic traits a priori that exist in the majority of species being examined, but that are variable and biologically relevant. We think of this as a two-step process. First, decide how to delimit (separate) the acoustic units that will be compared in a consistent way. Second, choose metrics that accurately capture the variation that exists in the signals of interest. Both steps require becoming familiar with the acoustic variation in the model system and keeping the research question in mind when choosing appropriate acoustic units and metrics.</p><p>(1) Choosing which acoustic units to compare</p><p>At the most basic level, comparing acoustic structure across taxa requires identifying and defining the type of acoustic signals to be compared. This requires some familiarity with the organisms being studied, because most animals produce multiple sounds that are structurally and functionally distinct <ref type="bibr">(Gerhardt &amp; Huber, 2002;</ref><ref type="bibr">Marler &amp; Slabbekoorn, 2004;</ref><ref type="bibr">Catchpole &amp; Slater, 2008;</ref><ref type="bibr">Cardoso, 2012;</ref><ref type="bibr">Cholewiak et al., 2013;</ref><ref type="bibr">Russo &amp; Voigt, 2016;</ref><ref type="bibr">Smotherman et al., 2016)</ref>. The broadest acoustic unit that will be compared is usually dictated by the research question (e.g. Do bat echolocation calls vary with habitat type? Which features of frog advertisement calls can be used to distinguish among species? Does the syntax of wolf howls vary with context?). The first step in an acoustic analysis is deciding whether the research question, and thus the comparative analysis, is concerned with the entire acoustic repertoire of a given taxon, or a specific subset of those signals. 
For example, while megadermatid bats use echolocation calls to hunt prey, they also produce a large complex repertoire of mostly lower-frequency calls used during intraspecific interactions <ref type="bibr">(Leippert, 1994;</ref><ref type="bibr">Hanrahan, 2020;</ref><ref type="bibr">Hanrahan et al., in press)</ref>. Therefore, researchers interested in how echolocation calls vary with habitat type in this family will need to be able to distinguish between echolocation calls and non-echolocation acoustic social signals. Decisions about which acoustic units to compare should also take into account the function of the vocalizations, when known, because different selective pressures likely act on functionally different vocalizations <ref type="bibr">(Greig, Price &amp; Pruett-Jones, 2013;</ref><ref type="bibr">Greig &amp; Webster, 2014)</ref>. At the same time, if the ultimate comparative question is one of function or selective pressures, it is important to avoid circularity between the criteria used to select samples and the hypotheses being tested (e.g. testing for an evolutionary response to ambient noise using parameters whose measurement is itself affected by background noise). The sample should also include a broad enough range of vocalizations to test the hypotheses. For example, if investigating whether a vocalization is sexually or naturally selected, analysing vocalizations exclusively used in mate choice might not provide much insight.</p><p>Once the research question and signals of interest have been identified, the next step is to partition the acoustic signals into component structures that can be analysed and compared across the species of interest <ref type="bibr">(ten Cate et al., 2013)</ref>. Animal sounds are frequently hierarchically organized (e.g. elements structured into syllables, structured into multi-syllabic vocalizations, which in turn can be structured into vocal bouts). 
Within certain taxa, careful attention has been paid to develop terminology to describe acoustic signals and their component structures (e.g. <ref type="bibr">Thompson et al., 1994;</ref><ref type="bibr">Cholewiak et al., 2013;</ref><ref type="bibr">K&#246;hler et al., 2017)</ref>. However, even within taxa, acoustic terminology can be inconsistent (e.g. <ref type="bibr">Thompson et al., 1994;</ref><ref type="bibr">Catchpole &amp; Slater, 2008;</ref><ref type="bibr">Bonnevie &amp; Craig, 2018)</ref> and across taxa there is little consensus on terminology (but see <ref type="bibr">Busnel, 1963)</ref>. It is beyond the scope of this review to provide a comprehensive list of terminology or reconcile past inconsistencies. Luckily, more recent treatments have provided guidelines for delimiting acoustic units regardless of taxa or terminology <ref type="bibr">(Kershenbaum et al., 2016)</ref>. Table <ref type="table">1</ref> provides a glossary of broad terms that we have identified as useful for categorizing diverse animal sounds into acoustic units and hierarchical levels, some of which are illustrated in Fig. <ref type="figure">1A</ref>.</p><p>Which acoustical units are compared and how to delimit them must be based on the research question. In some cases, measuring a broad acoustic unit is designated by the research question. For example, for research questions about total duration or frequency modulation over an entire multisyllabic bird song, the entire song is the appropriate acoustic unit to measure <ref type="bibr">(Podos, 1997;</ref><ref type="bibr">Dalziell &amp; Cockburn, 2008)</ref>. In other instances, researchers may want to measure multiple hierarchical levels of acoustic structure. 
For example, for a project investigating whether song syntax differs across whale populations, researchers may want to gather data at the element level (or the smallest acoustic unit) while keeping track of higher levels of organization, as that will offer flexibility to match when the same elements are or are not grouped together in a higher level.</p><p>Our general recommendation for delimiting animal sounds is to separate multi-component or hierarchically structured sounds into obvious, discrete acoustic units that are shared across the taxa of interest. There are multiple approaches for delimiting acoustic signals, including differentiating among acoustic units based on (i) silent intervals (breaks) in the acoustic signal, (ii) changes in the acoustic properties of the signal (e.g. a transition from a pure-tone to a broadband signal), or (iii) series of similar sounds that appear to be grouped together, such as grouping trilled notes, pulse trains, or syllables (see Fig. <ref type="figure">2</ref> in <ref type="bibr">Kershenbaum et al., 2016)</ref>. Whichever approach is used, we recommend delimiting the component signals into the smallest possible unit (i.e. elements; Table <ref type="table">1</ref>), particularly for broad comparative studies of fine acoustic structure when hierarchical patterning or perceptual capabilities of the included taxa are not known a priori, as this is often the least subjective approach. Also, in most cases, additional higher-level hierarchical units can be re-created from element-level measurements simply through concatenation (e.g. measurements for a call or song can be calculated from the component elements or syllables). 
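</p><p>The re-creation of higher-level units from element-level measurements described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a procedure taken from the studies cited: the function name summarize_unit, the field names, and the example values are all hypothetical, and the summary measurements chosen here (duration, element count, frequency extremes, bandwidth, and element rate) are only one reasonable set.</p>
<ab><![CDATA[
```python
def summarize_unit(elements):
    """Combine element-level measurements into values for the enclosing unit.

    Each element is a dict with start and end times (s) and minimum and
    maximum frequency (Hz). All field names here are hypothetical.
    """
    if not elements:
        raise ValueError("a unit needs at least one element")
    start = min(e["start"] for e in elements)
    end = max(e["end"] for e in elements)
    min_freq = min(e["min_freq"] for e in elements)
    max_freq = max(e["max_freq"] for e in elements)
    duration = end - start
    return {
        "duration": duration,                      # first onset to last offset
        "n_elements": len(elements),
        "min_freq": min_freq,
        "max_freq": max_freq,
        "bandwidth": max_freq - min_freq,
        "element_rate": len(elements) / duration,  # elements per second
    }


# Invented element-level measurements for one song (illustration only).
song_elements = [
    {"start": 0.0, "end": 0.25, "min_freq": 2000.0, "max_freq": 4000.0},
    {"start": 0.5, "end": 0.75, "min_freq": 2500.0, "max_freq": 6000.0},
    {"start": 1.0, "end": 1.5, "min_freq": 1500.0, "max_freq": 3000.0},
]
song = summarize_unit(song_elements)
```
]]></ab>
<p>Because the song-level values are computed from stored element-level rows, the same rows could be regrouped into syllables or bouts later without re-measuring the recordings, which is the flexibility that motivates measuring at the element level.</p><p>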
In addition, packages and algorithms can often be used to quantify syntax or sequential patterning of elements or vocalizations within animal signals <ref type="bibr">(Kershenbaum et al., 2016)</ref>, and automated methods for classifying the overall structure of acoustic signals remove observer bias or subjectivity associated with manual classification of acoustic signals <ref type="bibr">(Deecke, Ford &amp; Spong, 1999;</ref><ref type="bibr">Keen et al., 2014;</ref><ref type="bibr">Wadewitz et al., 2015;</ref><ref type="bibr">see Section II.2c)</ref>. For an extensive review of approaches for syntax analysis with animal sounds, see <ref type="bibr">Kershenbaum et al. (2016)</ref>.</p>
<figure type="table"><head>Table <ref type="table">1</ref>. Some common terms used to describe animal acoustic signals. Note that there is little standardized agreement of terms used to describe animal sounds within and across taxa, so our terms and definitions may not be representative for all taxa. In addition, researchers specializing in certain taxa may use these terms differently. The definitions provided here are based on structural variation, but context is also likely important in differentiating among some of these vocalization types</head>
<table>
<row role="label"><cell>Acoustic unit</cell><cell>Definition</cell><cell>Synonyms</cell><cell>References</cell></row>
<row><cell>Element</cell><cell>Smallest unit of sound; a discrete, continuous sound with defined start and end; discrete sound separated from other elements by silence or abrupt changes in the spectral energy distribution; related to the sound mechanism: muscle activation and firing</cell><cell>Note, call, syllable, pulse, acoustic unit, unit</cell><cell>Gerhardt (1998); Catchpole &amp; Slater (2008); Cholewiak et al. (2013); Hedwig et al. (2014); Smotherman et al. (2016); K&#246;hler et al. (2017)</cell></row>
<row><cell>Syllable</cell><cell>Sequence of one or more elements repeatedly rendered together in the same pattern or sequence</cell><cell>Element group, note group</cell><cell>Catchpole &amp; Slater (2008); Weir &amp; Wheatcroft (2011); Lachlan et al. (2013)</cell></row>
<row><cell>Call</cell><cell>Single element or repeated sequence of one or more elements or syllables</cell><cell>Note, syllable</cell><cell>Cardoso (2012); Cholewiak et al. (2013); Marler &amp; Slabbekoorn (2004)</cell></row>
<row><cell>Song</cell><cell>Stereotyped or otherwise distinctive pattern of elements or phrases. Songs are typically considered to have elaborate structure (multiple elements and/or element types)</cell><cell>Call, strophe, theme, motif, verse</cell><cell>Marler &amp; Slabbekoorn (2004); Catchpole &amp; Slater (2008); Cholewiak et al. (2013); Smotherman et al. (2016)</cell></row>
<row><cell>Vocal bout</cell><cell>A performance of songs/calls rendered discrete either by time (bouts separated by pauses) or distinctive acoustic features (i.e. 'types of bouts' or 'singing modes')</cell><cell>Song session, dawn recital, song sequence, singing mode</cell><cell></cell></row>
</table></figure>
<p>Nevertheless, we recommend that researchers also pay attention to and possibly delimit higher levels of acoustic structure (e.g. syllables, phrases, calls, songs) when commonly observed in their study system or designated by the research question. Higher acoustic organization is often biologically meaningful and may be useful to quantify. For example, many songbirds perform trills (rapidly repeated series of elements) or combine elements into stereotyped syllables. Researchers who measured all silent intervals within serin (Serinus serinus) songs found a bimodal distribution representing the short intervals within repeated, stereotyped syllables and the longer intervals between non-syllable elements <ref type="bibr">(Mota &amp; Cardoso, 2001)</ref>. They used this distribution to establish a threshold to distinguish syllables from less-stereotyped, independent elements within songs, and they successfully applied this approach in a comparative framework <ref type="bibr">(Mota &amp; Cardoso, 2001;</ref><ref type="bibr">Cardoso &amp; Mota, 2007)</ref>. 
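</p><p>The silent-interval approach described above can be sketched as follows. This is an illustrative assumption about how such a threshold might be derived, not the procedure used in the cited studies: the function names are hypothetical, the element boundaries are invented, and the threshold rule (splitting the sorted intervals at their widest break on a logarithmic scale) is a deliberately simple stand-in for inspecting the bimodal distribution.</p>
<ab><![CDATA[
```python
import math


def silent_gaps(elements):
    """Return silent intervals (s) between consecutive (start, end) elements."""
    ordered = sorted(elements)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]


def gap_threshold(gaps):
    """Split a bimodal set of gaps at its widest break on a log scale.

    Assumption: the two modes are separated by the largest jump between
    consecutive sorted log-gaps; the threshold is the geometric mean of
    the two gaps on either side of that jump.
    """
    logs = sorted(math.log(g) for g in gaps if g > 0)
    if len(logs) < 2:
        raise ValueError("need at least two positive gaps")
    jumps = [b - a for a, b in zip(logs, logs[1:])]
    i = jumps.index(max(jumps))
    return math.exp((logs[i] + logs[i + 1]) / 2)


def group_elements(elements, threshold):
    """Group consecutive elements separated by gaps shorter than threshold."""
    ordered = sorted(elements)
    groups = [[ordered[0]]]
    for cur, nxt in zip(ordered, ordered[1:]):
        if nxt[0] - cur[1] < threshold:
            groups[-1].append(nxt)   # short gap: same syllable-like group
        else:
            groups.append([nxt])     # long gap: start a new group
    return groups


# Invented element boundaries in seconds (start, end), for illustration only.
elements = [(0.00, 0.05), (0.07, 0.12), (0.14, 0.19),
            (0.45, 0.60), (0.95, 1.00), (1.02, 1.07)]
gaps = silent_gaps(elements)
threshold = gap_threshold(gaps)
groups = group_elements(elements, threshold)
```
]]></ab>
<p>With real recordings the interval distribution should be plotted and checked for bimodality before any threshold is trusted, because a largest-break rule will return a value even when the intervals form a single mode.</p><p>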
Also, we note that other disciplines recommend neurological, production, perception, or function-based approaches for delimiting acoustic structure <ref type="bibr">(Spector, 1994;</ref><ref type="bibr">Gerhardt &amp; Huber, 2002;</ref><ref type="bibr">Blumstein, 2010;</ref><ref type="bibr">Owren, Rendall &amp; Ryan, 2010;</ref><ref type="bibr">Peshek &amp; Blumstein, 2011;</ref><ref type="bibr">Bonnevie &amp; Craig, 2018)</ref>.</p><p>(2) Choosing which acoustic metrics to collect</p><p>Animal sounds exhibit a range of frequencies that vary through time. Even the smallest unit of animal sound (i.e. an element) can vary in multiple dimensions, ranging from a long pure tone that can be easily represented by frequency and time measurements to a short broadband harsh or 'buzzy' sound that may be best distinguished by its harmonic structure, and much more <ref type="bibr">(Price, Earnshaw &amp; Webster, 2006;</ref><ref type="bibr">Tyson, Nowacek &amp; Miller, 2007;</ref><ref type="bibr">Charlton, 2015)</ref>. Therefore, even within a set of clearly defined acoustic units, choosing a set of acoustic measurements that capture the range of ways the animal sounds vary while being well suited for the research question at hand is not always straightforward (e.g. <ref type="bibr">Mason &amp; Burns, 2015;</ref><ref type="bibr">Billings, 2018)</ref>.</p><p>Here we provide a workflow for extracting and quantifying animal sounds across species in preparation for phylogenetic comparative analysis (Fig. <ref type="figure">2</ref>). Quantitative assessments of animal sounds are typically based on measurements from sound spectrograms, which are visual representations of the relative power at different frequencies in a sound over time (Appendix S1). 
This framework involves three stages or classes of metrics ordered by the amount of processing involved: (i) signal analysis: the extraction of measurements of frequency, amplitude, time, or energy distributions from spectrograms of the acoustic signal; (ii) derived metric analysis: the calculation and extraction of measurements that capture overall structure, structural variation, or syntax of a vocalization or vocal bout, usually derived from measurements taken or acoustic software procedures performed during signal analysis; and (iii) multivariate analysis. This third class includes multivariate procedures used to create comparisons or composites of signal and derived metrics between two or more sounds (e.g. similarity or distance scores created by spectrogram cross-correlation, cluster analysis, or mapping features in multi-dimensional space). Multivariate analysis also includes data reduction and classification of acoustic units into distinct categories as a precursor for phylogenetic or additional acoustic analysis. Derived metrics can usually be differentiated from multivariate metrics in that derived metrics are calculated for specific acoustic units (individual elements, syllables, songs) in a study whereas multivariate metrics are computed as comparisons across pairs of acoustic units or by combining data from all acoustic units.</p>
<figure><head>Fig. <ref type="figure">2</ref>. Workflow for extracting and quantifying animal sounds across species in preparation for phylogenetic comparative analysis. Signal analysis (yellow) is the extraction of measurements directly from the acoustic signal, which can be compiled to compute derived metrics (blue), calculations from signal measurements that capture overall structure, structural variation, or syntax of a vocalization or vocal bout. Signal and derived metrics can subsequently be integrated into multivariate analysis (green) that combines multiple features into multiple dimensions. Both signal and derived metrics can be combined into multivariate analysis in preparation for phylogenetic comparative analyses (orange). Separation among the three stages of acoustic analysis is not always clear-cut (e.g. some derived metrics may be extracted from the signal or multivariate analyses may be used to calculate derived metrics), and is represented by the graded colours between these categories.</head></figure>
<p>Depending on the research question, all three types of analyses can be prepared to evaluate a broad range of acoustic variables. Both signal and derived measurements can contribute to multivariate analysis, and then one, two, or all three of these metrics can be used in subsequent comparative analysis (Fig. <ref type="figure">2</ref>). However, not all comparative analyses of sound require all three steps, nor is inclusion of all possible acoustic variables necessarily appropriate. For example, in some studies it is most appropriate to input the raw signal variables directly into phylogenetic comparative analyses, either because the raw variables are closely tied to the research question (e.g. effects of habitat on frequency: <ref type="bibr">Billings, 2018)</ref> or because multivariate analyses may not be appropriate <ref type="bibr">(Mason et al., 2017a)</ref>. In addition, the separation of acoustic metrics into one class of metric or the other is not always clear-cut. For example, element number is a quantity that can be counted directly on the spectrogram (signal) but may be considered a derived metric because it can also be calculated automatically from the number of selections made during signal analysis. Similarly, principal components created from multivariate analysis may replace signal analysis variables in phylogenetic comparative analyses. Therefore, we picture all three sets of acoustic metrics as somewhat overlapping, but we think of them as distinct in the process by which they are computed (Fig. 
<ref type="figure">2</ref>; Table <ref type="table">2</ref>; see online Supporting Information, Table <ref type="table">S1</ref>).</p><p>We provide figures and tables to help navigate this framework for acoustic analysis, including additional explanation and examples of acoustic metrics (<ref type="table">Figs 1-3; Tables 2</ref> and <ref type="table">S1</ref>). Fig. <ref type="figure">1</ref> provides example animal sounds that illustrate key acoustic metrics. Fig. <ref type="figure">2</ref> illustrates a schematic of the analytical workflow, while Fig. <ref type="figure">3</ref> is a decision tree with recommendations for data processing and reduction depending on the data structure and the comparative analyses to be performed. Table <ref type="table">2</ref> provides a list of signal, derived, and multivariate metrics that can be collected and Table <ref type="table">S1</ref> provides more extensive explanations for each variable, including additional references and recommendations for best practices. Below we describe each of the three classes of acoustic analysis in more detail.</p>
<figure type="table"><head>Table <ref type="table">2</ref>. Commonly used quantitative acoustic variables that can be extracted or calculated from animal acoustic signals for phylogenetic comparative analysis. These quantitative variables may come from direct signal analysis (yellow) or calculations of derived (blue) or multivariate (green) metrics, and may be overlapping. See Table <ref type="table">S1</ref> for more extensive definitions, uses, and best practices for these metrics</head></figure>
<p>If the analysis includes many acoustic variables which may contribute to differentiation of different taxa to varying extents, without a priori knowledge of which variables are most relevant, it may be more helpful to combine all metrics into multivariate analyses. 
An obvious caveat to such broad multivariate procedures is that determining how specific variables contribute to the analysis, and thus interpreting them biologically, can be challenging (although parameters such as variable loading or importance can help with this; <ref type="bibr">Ramasubramanian &amp; Singh, 2016)</ref>.</p><p>(a) Signal analysis</p><p>For both single-species studies and comparative analyses evaluating signal structure, the most common approach is to measure multiple acoustic features (signal metrics) per individual or species <ref type="bibr">(Darling &amp; Sousa-Lima, 2005;</ref><ref type="bibr">Rice &amp; Bass, 2009;</ref><ref type="bibr">Cardoso &amp; Atwell, 2011;</ref><ref type="bibr">Greig et al., 2013;</ref><ref type="bibr">Tobias et al., 2014;</ref><ref type="bibr">Mason et al., 2017a)</ref>. Typical features range from frequency or amplitude measured within each element or vocalization (i.e. measurements of frequency, time, or amplitude; e.g. minimum, maximum, or peak frequency/amplitude) to differences between single (point) measurements or calculations of energy distributions per element or vocalization (e.g. frequency bandwidth, duration, entropy, or harmonicity; Table <ref type="table">S1</ref>; <ref type="bibr">Calder, 1990;</ref><ref type="bibr">Podos, 2001;</ref><ref type="bibr">Cardoso, 2010;</ref><ref type="bibr">Blumstein &amp; Chi, 2012;</ref><ref type="bibr">Kershenbaum, 2014;</ref><ref type="bibr">Lachlan, Ratmann &amp; Nowicki, 2018)</ref>. Most acoustic software programs can extract these metrics from designated acoustic units, although the acoustic units from which these features are extracted often need to be specified by hand or in a semi-supervised manner. Researchers may also count discrete features, such as numbers of elements, syllables, or vocalizations (to calculate vocalization rates), number of frequency inflections, or instances of a particular feature, such as biphonation or trills (Fig. 
<ref type="figure">1</ref>; Tables <ref type="table">2</ref> and <ref type="table">S1</ref>). For frequency measurements, Cardoso (2013) recommends analysing frequency on a logarithmic scale, as it provides a more accurate representation of animal sound perception and better reflects the relationship between animal body size and resonating frequency. Measuring frequency on a logarithmic scale is particularly important for comparative studies across terrestrial vertebrate species, as frequency analysis on a linear scale could bias results and overestimate differences in frequency <ref type="bibr">(Cardoso, 2013)</ref>. As with most analytical decisions, implementation will depend on the research question.</p><p>Fig. <ref type="figure">3</ref>. Decision tree for treatment of acoustic data for phylogenetic analysis, depending on (A) the kinds of acoustic variables in your analysis (signal: yellow, derived: blue, or multivariate metrics: green), (B) the number of acoustic variables in your data set, and (C) the kinds of phylogenetic analysis (orange) to be carried out. We make recommendations on data handling, as well as indicate potential caveats of certain analyses in red.</p><p>To capture more precisely how signals vary over the duration of an element or other acoustic unit, researchers also compare time series of signal measurements (i.e. vectors of acoustic features sampled throughout the signal, such as frequency contours; Fig. <ref type="figure">1B</ref>; Table <ref type="table">S1</ref>; Kogan &amp; Margoliash, 1998; <ref type="bibr">Tchernichovski et al., 2000;</ref><ref type="bibr">McCowan, Hanser &amp; Doyle, 2002;</ref><ref type="bibr">Lachlan et al., 2013;</ref><ref type="bibr">Meliza, Keen &amp; Rubenstein, 2013;</ref><ref type="bibr">Wang et al., 2013)</ref>. While metrics like frequency contours provide a promising method to measure and reconstruct signal structure more precisely than point measurements, they have limitations. 
First, frequency contours are a vector of values, rather than a single value. Therefore, they need to be transformed in order to compare the overall similarity of each contour to all other contours [e.g. time series analysis, such as dynamic time warping (DTW); <ref type="bibr">Wang et al., 2013]</ref>. DTW is an algorithm that allows flexible comparison of disparate structures by stretching and aligning the signal over time <ref type="bibr">(Kogan &amp; Margoliash, 1998;</ref><ref type="bibr">Lachlan, 2007;</ref><ref type="bibr">Meliza et al., 2013)</ref>, the product of which is a proximity matrix (a pairwise matrix comparing each signal to all other signals). The resulting proximity matrix then usually needs to be further transformed to extract a set of values that meaningfully and independently represents each contour's shape. This is usually done with multidimensional scaling or other matrix vectorization procedures (see Section II.2c). In addition, the accuracy of frequency contours is vulnerable to signal quality. Noise or reverberation can lead to inaccurate contour tracing, especially at the ends of notes when reverberation masks the actual signal structure. Similarly, tight harmonic stacking coupled with frequency and/or amplitude modulation can lead to inaccurate frequency contour estimation, such that the trace of the contour 'jumps' among harmonics. Solutions include manually editing troublesome note structures (warbleR function 'seltailor'; Araya-Salas &amp; Smith-Vidaurre, 2017), pitch-tracking algorithms, or measuring only the fundamental or dominant frequency (soundgen function 'analyze'; <ref type="bibr">Anikin, 2019)</ref>. While recent studies have successfully employed time series data <ref type="bibr">(Lachlan et al., 2013,</ref> <ref type="bibr">2018)</ref>, classifying sounds using machine-learning algorithms on a wide variety of acoustic features can in some cases outperform DTW <ref type="bibr">(Keen et al., 2014)</ref>. 
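As a minimal sketch of the pairwise comparison described above, the following Python code computes a DTW distance between frequency contours of unequal length and assembles the resulting proximity matrix. The contour values and the function names (dtw_distance, proximity_matrix) are hypothetical, and the absolute-difference local cost is an assumption made for illustration; it is not the implementation used by any of the studies or packages cited here. <eg><![CDATA[
```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two frequency contours."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch contour b
                cost[i][j - 1],      # stretch contour a
                cost[i - 1][j - 1],  # advance both contours
            )
    return cost[n][m]


def proximity_matrix(contours):
    """Pairwise DTW distances between all contours (square, symmetric)."""
    k = len(contours)
    return [[dtw_distance(contours[i], contours[j]) for j in range(k)]
            for i in range(k)]


# Hypothetical frequency contours in kHz (illustrative values only)
contours = [
    [2.0, 2.5, 3.0, 3.5],       # upsweep
    [2.0, 2.0, 2.5, 3.0, 3.5],  # same upsweep with a prolonged onset
    [4.0, 3.5, 3.0, 2.5],       # downsweep
]
matrix = proximity_matrix(contours)
```
]]></eg> In this toy example the two upsweeps align perfectly despite differing in length, whereas the downsweep is distant from both, which is the flexibility that motivates DTW for contours. 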
Nevertheless, combining frequency contours with spectrogram cross-correlation (pairwise comparison of amplitude matrices in the bidimensional time-frequency space, 'sliding' one spectrogram over the other and calculating a correlation at each step) and feature analysis (analysis of individual metrics) may improve overall classification beyond any one of these approaches on its own (M. Araya-Salas, unpublished data). For these and many other reasons, we generally recommend evaluating the performance of time series data, and consistency and accuracy of all automated acoustic metrics prior to statistical analysis.</p><p>Finally, certain irregular animal signals may require special attention or analysis, such as subharmonics or biphonation (Fig. <ref type="figure">1E</ref>; <ref type="bibr">Wilden et al., 1998;</ref><ref type="bibr">Tokuda et al., 2002;</ref><ref type="bibr">Tyson et al., 2007;</ref><ref type="bibr">Charlton, Watchorn &amp; Whisson, 2017)</ref>. While rare across species, these naturally produced complex acoustic structures appear to be common in certain taxa [e.g. oscine birds <ref type="bibr">(Zollinger, Riede &amp; Suthers, 2003)</ref>, cetaceans <ref type="bibr">(Filatova et al., 2007;</ref><ref type="bibr">Tyson et al., 2007)</ref>]. One approach to measure such non-linear phenomena is to divide the signal into separate acoustic units where transitions to different acoustic structures occur. More sophisticated approaches are also available (see <ref type="bibr">Tokuda et al., 2002)</ref>.</p><p>In addition, there may be benefits to measuring sound while taking into account perceptual capabilities of the animals being studied. For certain taxa, algorithms and tools have been developed to measure the amplitude of signals within predefined frequency bands that reflect species' auditory ranges <ref type="bibr">(Lyon &amp; Ordubadi, 1982;</ref><ref type="bibr">Fraile &amp; Godino-Llorente, 2014)</ref>. 
For example, researchers can apply a series of band-pass filters to a power spectrum to extract a series of spectral bands and the relative bandwidths of the filters can be adjusted to reflect the fact that many animals are better at discerning differences between certain frequencies than others. Typically, the modified spectrum is then mathematically transformed into a cepstrum, and a series of numbers that describe the cepstrum, known as cepstral coefficients, are used as acoustic features with which to compare sounds. The most commonly used type of cepstral coefficient, Mel-frequency cepstral coefficients, are derived from a filter bank that approximates the frequency response of the human auditory system, but have been used successfully to classify the vocalizations of other mammals as well as birds <ref type="bibr">(Picone, 1993;</ref><ref type="bibr">Cowling &amp; Sitte, 2003;</ref><ref type="bibr">Darch, Milner &amp; Shao, 2004;</ref><ref type="bibr">Sandsten, Gro&#223;e Ruse &amp; J&#246;nsson, 2016)</ref>. Coupling comparative analyses with taxon-specific acoustic perceptual models, especially for taxa that perceive sound in ways that are very different from how we perceive sound, could aid our understanding of signal evolution and ensure that we are comparing salient acoustic features <ref type="bibr">(Clemins &amp; Johnson, 2006;</ref><ref type="bibr">Ren et al., 2009)</ref>.</p><p>(b) Derived metric analysis</p><p>Derived metrics are acoustic measurements computed primarily from summary statistics or other calculations from signal measurements. Examples include estimates of repertoire size, element diversity, coefficients of variation, or syntactical patterns. 
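To make this concrete, the sketch below computes two simple derived metrics, element rate and the coefficient of variation of peak frequency, from a hypothetical table of element selections. The selection values, the tuple layout, and the function names are illustrative assumptions rather than the output format of any particular software package. <eg><![CDATA[
```python
from statistics import mean, stdev

# Hypothetical element selections from one vocalization, sorted by time:
# (start time in s, end time in s, peak frequency in kHz)
elements = [
    (0.00, 0.10, 3.0),
    (0.15, 0.25, 3.5),
    (0.30, 0.40, 4.0),
    (0.45, 0.50, 3.5),
]


def element_rate(selections):
    """Elements per second, from first element onset to last element offset."""
    duration = selections[-1][1] - selections[0][0]
    return len(selections) / duration


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


rate = element_rate(elements)
cv_peak = coefficient_of_variation([peak for _, _, peak in elements])
```
]]></eg> Here both metrics are calculated from signal measurements rather than read from the signal itself, which is the distinction drawn above. 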
For some derived metrics, the researcher may need to conduct calculations on a species-by-species basis or design a specific analysis of interest <ref type="bibr">(Cardoso &amp; Hu, 2011;</ref><ref type="bibr">Geberzahn &amp; Aubin, 2014;</ref><ref type="bibr">Podos et al., 2016;</ref><ref type="bibr">Garc&#237;a &amp; Tubaro, 2018;</ref><ref type="bibr">Cardoso, 2019)</ref>. In other instances, general sound-analysis packages are available in R that are useful both for feature extraction and computing basic derived metrics of sounds (e.g. warbleR calculates vocalization-level measurements of vocalization duration, element rate, and averaged frequency parameters from the component element selections; Araya-Salas, Smith-Vidaurre &amp; Webster, 2017). Automated methods and R packages also exist for calculating more complicated parameters such as repertoire size estimation and syntax <ref type="bibr">(Kershenbaum, Freeberg &amp; Gammon, 2015;</ref><ref type="bibr">Wadewitz et al., 2015;</ref><ref type="bibr">Harris et al., 2016;</ref><ref type="bibr">Kershenbaum et al., 2016;</ref><ref type="bibr">Luttrell, Gallagher &amp; Lohr, 2016)</ref>. Recent application of more advanced techniques borrowed from ecology and economics, including rarefaction, offers particularly promising methods for estimating repertoire size <ref type="bibr">(Peshek &amp; Blumstein, 2011;</ref><ref type="bibr">Kershenbaum et al., 2015)</ref>. Multivariate methods can also be applied to automate the process of classifying acoustic units into categories for syntax analysis or determining element diversity. For example, in some instances cluster analysis, random forest, and other classification algorithms can efficiently and reliably classify vocalizations into discrete types <ref type="bibr">(Greig et al., 2013;</ref><ref type="bibr">Keen et al., 2014;</ref><ref type="bibr">Sandsten et al., 2016)</ref>. The steps involved in these approaches are discussed in more detail in Section II.2c. 
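As a rough illustration of the rarefaction idea mentioned above, the sketch below builds a repertoire accumulation curve: the mean number of distinct element types encountered as more elements are sampled, averaged over random orderings of the sample. The element labels, the random seed, and the function name are hypothetical, and this simple permutation scheme is an assumption made for illustration, not the procedure of any cited study. <eg><![CDATA[
```python
import random


def accumulation_curve(labels, n_permutations=200, seed=1):
    """Mean number of distinct types seen after sampling 1..N elements."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(labels)
    for _ in range(n_permutations):
        order = labels[:]
        rng.shuffle(order)
        seen = set()
        for i, label in enumerate(order):
            seen.add(label)
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical sequence of classified element types from one individual
labels = list("AABACABDAABCABAE")
curve = accumulation_curve(labels)
```
]]></eg> A curve that flattens before the full sample is reached suggests that most element types have been recorded, whereas a curve that is still rising suggests that the observed repertoire underestimates the true one. 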
We recommend automated procedures for categorizing animal sounds whenever possible because this streamlines acoustic analysis, reduces observer bias, and improves reproducibility <ref type="bibr">(Salisbury &amp; Kim, 2001;</ref><ref type="bibr">Botero et al., 2008;</ref><ref type="bibr">Wadewitz et al., 2015;</ref><ref type="bibr">Pearse et al., 2018</ref>; but see <ref type="bibr">Mikula, Petruskov&#225; &amp; Albrecht, 2018)</ref>. Nevertheless, we recognize this depends on first adequately delimiting sounds into the appropriate acoustic units. While methods like band-limited energy detection exist to detect and delimit breaks in signals, they often require a high signal-to-noise ratio and work better on some signals than others. Thus, separating sounds into acoustic units is a required first step that currently often has to be done by hand for field recordings. Furthermore, in certain circumstances human classification may be the best available method (e.g. if sample sizes are small or vocal units contain higher hierarchical levels of acoustic organization). In cases in which human classification is used, we recommend calculating a measure of inter-observer reliability <ref type="bibr">(Kaufman &amp; Rosenthal, 2009)</ref>. We provide a catalogue of some of the most commonly investigated derived metrics in Tables <ref type="table">2</ref> and <ref type="table">S1</ref>.</p><p>(c) Multivariate analysis</p><p>Multivariate metrics can be computed from signal and/or derived metrics to produce reduced sets of combined variables or to classify or map acoustic features in multidimensional space. The multivariate metrics we will discuss include creating reduced sets of vectors produced from data-reduction techniques such as principal component analysis (PCA) or multidimensional scaling (MDS), which can then be input directly into subsequent analyses. 
Useful multivariate approaches for comparative bioacoustics also include creation and analysis of matrices of similarity scores produced by spectrogram cross-correlation, random forest classification, or DTW of frequency contours (e.g. <ref type="bibr">Keen et al., 2014</ref>; Table <ref type="table">2</ref>). We will also discuss the use of multivariate analyses to create derived metrics from high-dimensional measurements. This can include classification of acoustic units into discrete vocalization types to calculate repertoire size or categorize vocalizations prior to syntax analysis, and also includes data reduction to create fewer signal-measurement vectors <ref type="bibr">(Hedwig et al., 2014;</ref><ref type="bibr">Wadewitz et al., 2015;</ref><ref type="bibr">Mason et al., 2017a)</ref>. Lastly, multivariate analyses can be used to compute derived metrics in multidimensional space, such as calculations of the area surrounding sets of sounds plotted in feature space, a useful proxy for signal diversity <ref type="bibr">(Nelson &amp; Marler, 1990;</ref><ref type="bibr">Tobias et al., 2014;</ref><ref type="bibr">Ligon et al., 2018)</ref>. Accordingly, the multivariate methods explained here include both data reduction and classification procedures.</p><p>Here we briefly summarize some popular multivariate approaches for data reduction and classification. Common data-reduction procedures include PCA, factor analysis (FA), t-distributed stochastic neighbor embedding (t-SNE), MDS (also called principal coordinates analysis, PCoA), and random forest (RF), whereas common classification procedures also include RF, as well as linear discriminant analysis (LDA), neural networks, and cluster analysis <ref type="bibr">(Johnson &amp; Wichern, 1982;</ref><ref type="bibr">Izenman, 2008;</ref><ref type="bibr">Legendre &amp; Legendre, 2012;</ref><ref type="bibr">Ramasubramanian &amp; Singh, 2016)</ref>. 
In data reduction, the goal is to produce vectors representing a reduced set of variables from a larger, potentially correlated set of variables. Classification analyses, on the other hand, are used to classify a set of signals into categories based on sets of acoustic variables. Both approaches can also be used to map acoustic signals in feature space (PCA, FA, t-SNE, MDS, RF, cluster analysis). An additional important distinction among these analyses is whether they are supervised or unsupervised <ref type="bibr">(Ramasubramanian &amp; Singh, 2016)</ref>. Supervised approaches are those in which the classes of the response variable are known a priori and, with the explanatory variables (in this case, features of the signal), are used to predict the class of the signal [e.g. LDA, supervised RF <ref type="bibr">(Armitage &amp; Ober, 2010;</ref><ref type="bibr">Ramasubramanian &amp; Singh, 2016;</ref><ref type="bibr">Tharwat et al., 2017)</ref>]. Conversely, unsupervised methods are used when the classes of the input signals are not known a priori, but may be an aim of the analysis (e.g. PCA, FA, t-SNE, MDS, unsupervised RF, neural networks, cluster analysis). Unsupervised approaches, such as t-SNE and MDS, can also be used to visualize and explore relationships among variables in the data in a space with fewer dimensions than present in the input data <ref type="bibr">(Ramasubramanian &amp; Singh, 2016)</ref>. Moreover, data visualization procedures, such as Uniform Manifold Approximation and Projection (UMAP) may also prove useful for assessing separation of variables in acoustic space <ref type="bibr">(Parra-Hern&#225;ndez et al., 2020)</ref>. When using such approaches for dimensionality reduction, however, careful attention should be paid that the method preserves between-object distance, as data visualization methods such as t-SNE and UMAP may sacrifice global structure in order to preserve local variance. 
The specific applications of each of these analyses differ and certain analyses may be better designed for certain data structures. For example, for behavioural questions, PCA is usually more appropriate than FA <ref type="bibr">(Budaev, 2010;</ref><ref type="bibr">Wadewitz et al., 2015)</ref>. Additionally, each of these methods has its own underlying assumptions and applications, which should be understood before implementing the above analyses <ref type="bibr">(Johnson &amp; Wichern, 1982;</ref><ref type="bibr">Izenman, 2008)</ref>.</p><p>Some analyses, such as combining acoustic metrics that have multiple data structures (e.g. combining raw point measures with vectors of frequency contours or proximity matrix output from spectrogram cross-correlation), require multiple multivariate procedures. Two multi-step multivariate procedures we see as particularly valuable to comparative bioacoustics are (i) conversion of proximity matrix data into tabular vectors (or new data columns representing the "coordinates" of each signal in multidimensional space) and (ii) using acoustic feature space to estimate repertoire size or element diversity. Common procedures that require the conversion of proximity matrices to vectors include DTW, RF analysis, and spectrogram cross-correlation analysis, as their output is pairwise comparisons (proximity matrices) of all acoustic units included in the analysis. Once these matrix data are converted to vectors, they can then be used as independent variables for further analysis or to plot sounds in acoustic feature space. For DTW, this process first involves interpolating the frequency contours to the same number of points per signal before conducting the DTW, RF, or spectrogram cross-correlation analyses. MDS or t-SNE can be used to convert the resulting proximity matrix data to a set of tabular vectors. 
By plotting any tabular vector data, acoustic feature spaces can be created, from which acoustic areas, overlap, or distances can be calculated.</p><p>If the analysis goal is to classify signals into categories or calculate element diversity, unsupervised RF or clustering methods are effective. To calculate element diversity or acoustic variability from acoustic area, first perform one of the procedures described in the above paragraph. Then take the area surrounding the points for the acoustic signals or taxon of interest in feature space (i.e. acoustic area). Similarly, repertoire size or element diversity could be estimated from cluster analysis. Here, the optimal number of categories defined in the clustering procedure is interpreted to be the repertoire size or number of signal types for the taxon. However, we advise researchers to use caution, as discrete clustering may not be possible for species with large repertoires or continuously varying acoustic signals. In such instances, silhouette coefficients can help to quantify how similar signals are to others in their own cluster compared with signals in other clusters, and so to evaluate the discreteness of vocal repertoires across taxa (e.g. <ref type="bibr">Hedwig et al., 2014)</ref>. Among the most commonly used methods for data clustering are k-means clustering, fuzzy k-means, and hierarchical clustering. Clustering analyses to determine repertoire size will likely need to be repeated on a species-by-species basis, whereas calculating element diversity from acoustic area requires including all species of interest in the feature space so that the magnitudes are comparable.</p><p>Above we outline a range of multivariate techniques to produce reduced or combined variables for phylogenetic comparative analysis. 
A very important caveat is that multivariate procedures often rotate and recombine data such that the original data structure, and therefore underlying evolutionary signal, can be lost <ref type="bibr">(Uyeda et al., 2015)</ref>. When the planned phylogenetic analyses involve evaluating underlying evolutionary structure (i.e. phylogenetic signal or diversification rate analyses) or make implicit assumptions about it, multivariate approaches that take the underlying evolutionary structure into account should be used <ref type="bibr">(Adams &amp; Collyer, 2018)</ref>. Phylogenetic PCA (pPCA) can replace regular PCA while incorporating underlying evolutionary structure. A variety of other multivariate approaches that incorporate phylogeny have also been created and tested; however, the performance of these approaches, including pPCA, varies <ref type="bibr">(Harmon &amp; Glor, 2010;</ref><ref type="bibr">Uyeda et al., 2015;</ref><ref type="bibr">reviewed in Adams &amp; Collyer, 2018)</ref>. We strongly recommend understanding the weaknesses of these analyses before employing them, and evaluating comparative analyses using both multivariate and the raw component variables until these methods are improved <ref type="bibr">(Mason et al., 2017a;</ref><ref type="bibr">Adams &amp; Collyer, 2018)</ref>. We think that phylogenetically controlled morphometric analyses show great potential to allow sound to be analysed in a multidimensional space, as described above <ref type="bibr">(Catalano, Goloboff &amp; Giannini, 2010)</ref>. 
Lastly, additional useful phylogenetically controlled statistical procedures that exist include calculations of phylogenetic signal for high-dimensional data <ref type="bibr">(Adams, 2014a)</ref>, phylogenetic Mantel tests <ref type="bibr">(Harmon &amp; Glor, 2010)</ref>, and phylogenetic MANOVA, and ANCOVA <ref type="bibr">(Revell, 2012;</ref><ref type="bibr">Goolsby, 2015;</ref><ref type="bibr">Fuentes-G et al., 2016)</ref>.</p><p>We also ask researchers to be cautious when choosing multivariate approaches and input variables in general <ref type="bibr">(Bj&#246;rklund, 2019)</ref>. As with any analysis, multivariate approaches can be sensitive to the quality and coverage of the input data. Specifically, researchers should be conscientious about the inclusion of variables with high collinearity into multivariate approaches; while certain approaches, especially PCA and RF, are robust to this <ref type="bibr">(Afanador et al., 2016)</ref>, PCA output can be weighted towards similar variables that are more heavily represented in the analysis, especially when correlated. For example, if many moderately correlated frequency measurements are included in a PCA with a few time or rate variables, the first principal component will often be a composite of the frequency measurements because of their prevalence in the analysis. For researchers interested in determining the relative importance or weighting of component variables in acoustic classification, RF and implementing variable importance rankings could be a useful alternative to PCA. These caveats are important to keep in mind when considering research questions and what subsequent information can be extracted from the analysis, and we encourage researchers to conduct exploratory analyses to evaluate the quality and contribution of input variables to such multivariate approaches. 
Nevertheless, as classification and data-reduction methods diversify and improve, we envision that combining wide varieties of acoustic metrics using multiple multivariate techniques or mapping acoustic features in multi-dimensional spaces will become powerful ways to quantify variation among acoustic signals prior to comparative analysis.</p><p>(3) Tools and software</p><p>Dedicated software for measuring fine-scale features of acoustic signals includes Raven Pro, Avisoft-SASLab Pro, Praat, Luscinia, SoundRuler, Syrinx, and KOE. MATLAB also has a signal-processing toolbox and a variety of acoustics packages exist in R (e.g. seewave, tuneR, warbleR, Rraven, Rpraat, soundgen; reviewed in <ref type="bibr">Sueur, 2018)</ref>. Sound-editing software can also be used to measure certain aspects of sound manually, as well as to annotate recordings. Popular software for sound editing includes Audacity and Adobe Audition.</p><p>For data reduction and downstream analysis, again a large number of packages are available in R (e.g. stats, vegan, MASS, Rtsne, fpc, pvclust, mclust, randomForest, ranger, caret). Some of these methods can also be run on graphical user interface (GUI)-based software (e.g. JMP, SPSS), although creating similarity matrices from tabular matrices usually requires command line software packages (i.e. programming in Python or R).</p><p>(4) Caveats for measuring sound</p><p>Appropriate acoustic analysis begins with understanding sound transmission and the recording process. We provide a detailed explanation of important methodological considerations about recording parameters and standardizing recordings before acoustic analysis in Appendix S1. One important point to recognize is that field recordings collected across a wide range of habitats and by different recordists will likely vary greatly in recording quality and format. Such variation can cause artifacts during acoustic analysis, but given proper attention before sound analysis, they can usually be overcome. 
We recommend standardizing sample rate and bit depth across all recordings prior to analysis, especially if recordings come from multiple sources. Recordings with low signal-to-noise ratio (SNR; e.g. recordings with a high degree of background noise or faint signals) can significantly affect the precision of certain acoustic parameters, and so such recordings should typically be avoided (Araya-Salas et al., 2017; Table <ref type="table">S1</ref>, Appendix S1). Recordings that appear to contain distortion (e.g. over-amplification, aliasing; see Appendix S1) should also be avoided. In sound collections, choosing recordings ranked at or above a certain quality can help avoid such issues (e.g. <ref type="bibr">Billings, 2018)</ref>. Recordings collected or stored with 'lossy' compression (compression with irreversible information loss; e.g. mp3) can distort acoustic measures of single extreme values (e.g. peak frequency) and affect the precision of DTW analysis (Araya-Salas et al., 2017), although the resulting measurement error could be less problematic for across-species comparisons. Proper parameter selection can avoid some of these issues, but using uncompressed recordings (e.g. WAV file format) or recordings with lossless compression, such as FLAC file format, is preferable.</p><p>Researchers should also adopt practices for standardizing collection of acoustic metrics. First and foremost, spectrogram parameters (window settings) should be standardized so that frequency and time are measured at consistent resolutions across all recordings. Traditional 'by-eye' measurements made directly on the spectrogram should be replaced with automated, energy-based or threshold measurements whenever possible, as signal strength and quality can greatly affect measurements taken directly on the spectrogram <ref type="bibr">(Charif et al., 2010;</ref><ref type="bibr">Zollinger et al., 2012;</ref><ref type="bibr">R&#237;os-Chel&#233;n et al., 2017)</ref>. 
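One way such an energy-based measurement can work is sketched below: the lower and upper frequency limits of a signal are taken as the frequencies at which 5% and 95% of the total energy in a power spectrum has accumulated, so that the estimate depends on the energy distribution rather than on what is visible in a spectrogram. The spectrum values, bin frequencies, and function name are hypothetical, and the 5% and 95% thresholds are an assumed convention chosen for illustration. <eg><![CDATA[
```python
def energy_quantile_frequency(freqs, power, fraction):
    """First bin frequency at which cumulative energy reaches `fraction` of the total."""
    total = sum(power)
    cumulative = 0.0
    for freq, p in zip(freqs, power):
        cumulative += p
        if cumulative >= fraction * total:
            return freq
    return freqs[-1]


# Hypothetical power spectrum of one element:
# bin centre frequencies (Hz) and linear power per bin
freqs = [1000, 2000, 3000, 4000, 5000, 6000]
power = [1.0, 4.0, 20.0, 20.0, 4.0, 1.0]

low = energy_quantile_frequency(freqs, power, 0.05)   # assumed 5% threshold
high = energy_quantile_frequency(freqs, power, 0.95)  # assumed 95% threshold
bandwidth = high - low
```
]]></eg> Because the faint outer bins contribute little energy, they do not set the limits, which is why measurements of this kind tend to be less sensitive to recording level than limits judged by eye. 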
Additionally, measurements of absolute sound source level (i.e. amplitude, power, energy, pressure) require calibrated equipment and in some cases calculations of sound transmission. Therefore, unless recordings were collected in standardized or controlled conditions, measures of absolute amplitude should be avoided. Lastly, researchers should be aware that spectrograms reflect how humans perceive sound <ref type="bibr">(Lyon &amp; Ordubadi, 1982;</ref><ref type="bibr">Dooling &amp; Prior, 2017)</ref>. Therefore, not all acoustic features that can be measured may be detected by or relevant to the species being studied (e.g. centre frequency, a computational signal measurement which the vertebrate auditory system does not encode, is not biologically relevant, unlike peak frequency). Similarly, acoustic measures of similarity in signal structures represent a proxy for perceptually important differences among distinct taxa, but may not reflect what is perceived by those taxa. Unfortunately, we seldom know a priori which variables are biologically relevant. One approach to address this is to test if acoustic variables identified as statistically significant or distinct elicit a response in the study species (e.g. <ref type="bibr">Rand &amp; Ryan, 1987</ref>). An interesting future direction for bioacoustics research would be to analyse sound within taxon-specific acoustic perceptual models <ref type="bibr">(Clemins &amp; Johnson, 2006;</ref><ref type="bibr">Stoddard &amp; Prum, 2008)</ref>. This is especially relevant for taxa with auditory ranges narrower than the signals they produce or that perceive sound in specialized ways that are not easily measured or encoded in spectrograms (e.g. <ref type="bibr">Narins &amp; Capranica, 1976)</ref>. Such models could also take 'just noticeable differences' in animal sound perception into account to reflect accurately the scale or components of a signal that taxa are capable of perceiving <ref type="bibr">(Kuhl, 1981)</ref>. 
However, this is not known for many species.</p><p>(5) Best practices for measuring sounds</p><p>When the goal of a study is to classify or quantify a wide variety of acoustic signals, and there is little a priori knowledge of which acoustic variables may be biologically relevant, the best option is usually to measure a wide variety of acoustic features that capture the range of structural variation in the signals being studied. Incorporating measurements that best capture the spectral and temporal properties of a study system will improve the likelihood that meaningful variation is detected and categorized in subsequent phylogenetic comparative analyses. Therefore, choosing appropriate measurements should be done on a study-by-study basis, after some preliminary examination of signal variation and keeping the goals of the study in mind. Overall, of greatest importance is to ensure sufficient measurement precision and inclusion of appropriate variables to capture the natural variation within and among acoustic signals for the species of interest. We recommend automated procedures for feature extraction (e.g. robust measurements based on energy distributions and threshold metrics) and classification (e.g. cluster analysis, RF) whenever possible to limit extraneous error and human bias. In addition, using algorithms for classification and analyses that remove or downplay non-informative parameters (e.g. RF) may be especially beneficial for detecting meaningful variation.</p><p>Regardless of which metrics are collected, all measurements, metrics, and other associated acoustic terminology used should be defined in the publication, preferably with figures (e.g. Fig. <ref type="figure">1</ref>). Clear definitions are, of course, necessary for the analyses to be repeated but may easily be overlooked in large, multistage projects. More broadly, clearly defining metrics can help researchers to standardize analyses across studies and improve reproducibility. 
With the increasing movement towards data archiving, standardizing the basic metrics collected and procedures for sound analysis will allow hard-earned acoustic measurements to be combined into larger comparative analyses in the future. Along these lines, we encourage archiving not only of raw acoustic measurement data, but also of the associated recordings and annotations. Such practices will best enable the output and results from current comparative studies to be built upon as knowledge, media archives, and analytical tools improve <ref type="bibr">(Caetano &amp; Aisenberg, 2014)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. COMPARATIVE PHYLOGENETIC ANALYSES WITH SOUND</head><p>Phylogenetic comparative analyses have recently been used to make considerable advances in our understanding of the evolutionary processes responsible for diversity on Earth <ref type="bibr">(May-Collado, Agnarsson &amp; Wartzok, 2007;</ref><ref type="bibr">Am&#233;zquita et al., 2009;</ref><ref type="bibr">Jetz et al., 2012;</ref><ref type="bibr">Rabosky et al., 2013;</ref><ref type="bibr">Dale et al., 2015;</ref><ref type="bibr">Sauquet et al., 2017)</ref>. Such studies are becoming increasingly popular to address questions about signal evolution, including comparative studies of detailed acoustic structure <ref type="bibr">(Derryberry et al., 2012;</ref><ref type="bibr">Mason et al., 2017a;</ref><ref type="bibr">Ligon et al., 2018)</ref>. For these kinds of studies, phylogenetically controlled analyses are necessary because they transform evolutionary questions into statistical models that enable researchers to control for the statistical non-independence caused by shared evolutionary history <ref type="bibr">(Felsenstein, 1985;</ref><ref type="bibr">Martins &amp; Hansen, 1997)</ref>. Also, incorporating phylogeny into comparative studies allows researchers to evaluate hypotheses for evolutionary mechanisms beyond phylogeny, examine evolutionary patterns leading to trait divergence, and study the role of traits in diversification <ref type="bibr">(Hern&#225;ndez et al., 2013;</ref><ref type="bibr">Garamszegi, 2014;</ref><ref type="bibr">Rabosky et al., 2014)</ref>. Even poorly resolved phylogenies provide improved accuracy compared to no phylogeny at all <ref type="bibr">(Boettiger, Coop &amp; Ralph, 2012</ref>; but see <ref type="bibr">Davies et al., 2012;</ref><ref type="bibr">Paradis, 2014)</ref>. 
For these reasons, it is important to incorporate phylogenetic information in all statistical analyses with comparative data <ref type="bibr">(Felsenstein, 1985;</ref><ref type="bibr">Martins &amp; Hansen, 1997)</ref>. Altogether, phylogenetically informed comparative analyses provide an important perspective on patterns and processes of trait macroevolution that cannot be gained by other means <ref type="bibr">(Freckleton, Cooper &amp; Jetz, 2011)</ref>.</p><p>Several recent good reviews and books have been written on phylogenetic comparative analyses, which we suggest researchers consult for more in-depth coverage of these methods <ref type="bibr">(Harvey &amp; Pagel, 1991;</ref><ref type="bibr">Gingerich, 2009;</ref><ref type="bibr">Nunn, 2011;</ref><ref type="bibr">Garamszegi, 2014;</ref><ref type="bibr">Joy et al., 2016)</ref>. Here, we briefly summarize the range of available phylogenetic comparative approaches, discuss some relevant bioacoustics case studies, and outline general limitations of comparative approaches to behavioural data. We finish by providing guidance for applying these methods to acoustic data given the particular data structure and phylogenetic analysis of interest (Fig. <ref type="figure">3</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(1) Types of phylogenetic analyses</head><p>There are several broad classes of phylogenetic comparative analysis and questions that can be addressed with phylogenetic comparative approaches. Below, we summarize current methods employed within each broad class. A major distinction among the methods within each class is whether they are appropriate for discrete (e.g. presence/absence or small/medium/large) versus continuous trait data <ref type="bibr">(Harvey &amp; Pagel, 1991;</ref><ref type="bibr">Garamszegi, 2014;</ref><ref type="bibr">e.g. Odom, Omland &amp; Price, 2015)</ref>. In some instances, certain approaches have been created for multi-state versus binary traits or response variables. Therefore, as with any statistical analysis, thinking in advance about data structure and the types of variables present in your data set will enable you to determine the appropriate analysis. In the case studies, we emphasize methods for continuous data, as the acoustic analyses described above are primarily aimed at extraction of continuous variables, but we list a range of analyses for a variety of data types, as major bioacoustics questions can be addressed with discrete data types <ref type="bibr">(Odom et al., 2014;</ref><ref type="bibr">Riede et al., 2016;</ref><ref type="bibr">Tobias et al., 2016;</ref><ref type="bibr">Snyder &amp; Creanza, 2019)</ref>.</p><p>(a) Quantifying phylogenetic signal Phylogenetic signal refers to the extent to which closely related species resemble one another based on a trait of interest <ref type="bibr">(Pagel, 1999a;</ref><ref type="bibr">Blomberg et al., 2003;</ref><ref type="bibr">Davies et al., 2012;</ref><ref type="bibr">M&#252;nkem&#252;ller et al., 2012)</ref>. For most phylogenetic comparative studies, it is important to quantify and understand the extent to which the trait of interest remains similar across closely related taxa. Phylogenetic signal can be measured using several parameters. 
Popular methods include Pagel's lambda (&#955;), Blomberg's K, Grafen's &#961;, Ornstein-Uhlenbeck (OU) model parameter &#945;, and Fritz &amp; Purvis's D <ref type="bibr">(Revell, Harmon &amp; Collar, 2008;</ref><ref type="bibr">Kamilar &amp; Cooper, 2013;</ref><ref type="bibr">Symonds &amp; Blomberg, 2014)</ref>. Recent methods also allow for tests of phylogenetic signal in multi-dimensional traits <ref type="bibr">(Adams, 2014a)</ref>. Note that Pagel's lambda is often provided with the output of phylogenetic comparative analyses <ref type="bibr">(Hadfield, 2010a,b)</ref>. However, when given as output with regression analyses, lambda applies to the residual errors of the regression model, not the response variable. Therefore, these estimates do not represent the phylogenetic signal of the response variable, but rather the phylogenetic signal remaining in the residuals once the response variable has been regressed on the predictor variables <ref type="bibr">(Symonds &amp; Blomberg, 2014)</ref>. In most instances, it is important to directly calculate phylogenetic signal for the input variables and, depending on the question, it may be important to investigate both. See <ref type="bibr">Revell et al. (2008)</ref> for discussion of appropriate and inappropriate interpretation of phylogenetic signal.</p><p>Few comparative bioacoustics studies have directly focused on estimating phylogenetic signal; however, phylogenetic signal is usually reported for the acoustic traits within the study, which we encourage. Among bioacoustics studies that have focused on phylogenetic signal, <ref type="bibr">Gingras et al. (2013)</ref> measured the strength of phylogenetic signal in five acoustic parameters from advertisement calls among 90 species of anurans. Despite strong selection on calls via mating preferences, the authors found that there was a strong phylogenetic signal in all five acoustic traits. 
They argued that this might indicate constraints on signal diversification among these anurans. It is important to note, however, that simulations demonstrate that phylogenetic signal alone is not a clear indicator of the rate of evolution or the evolutionary processes leading to diversification <ref type="bibr">(Revell et al., 2008)</ref>. In a study of phylogenetic signal in the territorial songs of crests and kinglets (Aves: Regulus), <ref type="bibr">P&#228;ckert et al. (2003)</ref> measured several different features of song including syntax, subunit measures, and abundance of certain components. Interestingly, the measured traits also differed in the extent to which they were learned or innate. The authors found that some measures, such as the syntax of whole-song structure, showed a strong phylogenetic signal while other learned song components did not. This study illustrates an interesting use of phylogenetic signal to test hypotheses about which aspects of a signal may be more closely tied to phylogeny, an important and valuable use of this metric <ref type="bibr">(P&#228;ckert et al., 2003)</ref>.</p></div>
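<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of what Pagel's lambda does, the sketch below builds the covariance matrix expected under Brownian motion for a small, entirely hypothetical three-species tree and then rescales it by lambda. The tree encoding, species names, branch lengths and function names are assumptions made for this example only; in practice lambda is estimated by maximum likelihood in dedicated software rather than set by hand.</p>

```python
# Hypothetical three-species tree encoded as nested tuples (an assumption of
# this sketch): a tip is (name, branch_length) and an internal node is
# ([children], branch_length).
TREE = ([([("sp_A", 1.0), ("sp_B", 1.0)], 2.0), ("sp_C", 3.0)], 0.0)


def bm_covariance(node, depth=0.0, cov=None):
    """Return (tip names, covariance dict) expected under Brownian motion.

    cov[(i, j)] is the branch length shared by tips i and j measured from
    the root; cov[(i, i)] is the root-to-tip distance of tip i.
    """
    if cov is None:
        cov = {}
    label, length = node
    depth += length
    if isinstance(label, str):  # tip
        cov[(label, label)] = depth
        return [label], cov
    groups = [bm_covariance(child, depth, cov)[0] for child in label]
    # Tips in different child subtrees share history only down to this node.
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for i in groups[a]:
                for j in groups[b]:
                    cov[(i, j)] = cov[(j, i)] = depth
    return [tip for group in groups for tip in group], cov


def pagel_lambda_transform(cov, lam):
    """Multiply off-diagonal covariances by lambda; variances are unchanged."""
    return {(i, j): (v if i == j else lam * v) for (i, j), v in cov.items()}


tips, cov = bm_covariance(TREE)
for lam in (1.0, 0.5, 0.0):
    scaled = pagel_lambda_transform(cov, lam)
    print("lambda =", lam)
    for i in tips:
        print("  ", i, [scaled[(i, j)] for j in tips])
```

<p>With lambda equal to 1 the matrix is the one implied by the tree under Brownian motion, whereas with lambda equal to 0 all covariances among species vanish, which corresponds to treating species as statistically independent.</p></div>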
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(b) Evolutionary rate and diversification</head><p>Evolutionary rate analysis allows researchers to examine the extent to which traits change over evolutionary time <ref type="bibr">(Gingerich, 2009)</ref>, to compare evolutionary rates among traits <ref type="bibr">(Adams, 2014b)</ref> and to evaluate variation in evolutionary rates between phenotypic sequence subunits, which can be applied to multi-syllable vocalizations <ref type="bibr">(Caetano &amp; Beaulieu, 2020)</ref>. Additional methods have been developed to evaluate whether the evolutionary rate of a continuous trait is affected by another trait, either discrete <ref type="bibr">(O'Meara et al., 2006)</ref> or continuous <ref type="bibr">(Weir &amp; Lawson, 2015)</ref>, and the following analyses have been extended to allow for or to test correlation between speciation and discrete traits (binary traits biSSE; <ref type="bibr">Maddison, Midford &amp; Otto, 2007)</ref>, multi-state traits (muSSE; FitzJohn, 2012), continuous traits (quaSSE; FitzJohn, 2010) and trait evolutionary rates <ref type="bibr">(Adams et al., 2009;</ref><ref type="bibr">Rabosky et al., 2014)</ref>. A more recently developed method allows for simultaneous testing of the effect of a discrete trait on trait evolutionary rate and the association between traits <ref type="bibr">(Fuentes-G et al., 2016)</ref>. 
These methods have been used to compare the rate of evolution of morphological and acoustic traits (Medina-Garc&#237;a, <ref type="bibr">Araya-Salas &amp; Wright, 2015)</ref> and to test the association between evolutionary rate and signal function <ref type="bibr">(Weir, Wheatcroft &amp; Price, 2012)</ref>, as well as developmental mechanisms of vocal signals <ref type="bibr">(Mason et al., 2017a)</ref>.</p><p>Generally, analyses of evolutionary rate and diversification require well-resolved sets of trees with dated nodes <ref type="bibr">(Tarver &amp; Donoghue, 2011;</ref><ref type="bibr">Paradis, 2013)</ref>. An important consideration when using these models with acoustic data is that they often require assessing the appropriate underlying model of evolution (e.g. Brownian Motion, OU). Therefore, any data-reduction steps prior to evolutionary rate and diversification analyses should take phylogeny into account. Nevertheless, these analyses are highly subject to model misspecification, and improved models are still being developed (e.g. <ref type="bibr">Harmon &amp; Glor, 2010;</ref><ref type="bibr">Uyeda et al., 2015;</ref><ref type="bibr">Adams &amp; Collyer, 2018)</ref>. Until improved methods are developed, we encourage researchers to run comparative tests of diversification and rate on raw acoustic variables in addition to any planned multivariate data, and to compare the results <ref type="bibr">(Mason et al., 2017a)</ref>. Another consideration is whether the phylogenetic hypothesis contains sufficient information to accurately reconstruct diversification dynamics. 
The ability to infer diversification parameters from extant species phylogenies has been questioned and fossil calibration is advised <ref type="bibr">[Quental &amp; Marshall, 2010;</ref><ref type="bibr">Louca &amp; Pennell, 2020</ref>; but see Dos <ref type="bibr">Reis &amp; Yang (2013)</ref> for possible caveats of fossil calibration].</p><p>Despite these present limitations, several comparative studies have incorporated evolutionary rate and diversification into analyses of acoustic signals. For instance, <ref type="bibr">Mason et al. (2017a)</ref> used raw acoustic data (i.e. signal metrics sensu Table <ref type="table">S1</ref>), as well as pPCA, to quantify shifts in the rate of vocal evolution and speciation across two major radiations of passerine birds. They found evidence for coincident evolutionary bursts in rates of speciation and song evolution among both groups. Further, several studies have investigated the potential role of acoustic signals in speciation across closely related taxa. For instance, <ref type="bibr">Delmore et al. (2015)</ref> quantified divergence in song, plumage, and morphology among sister pairs of North American migratory birds with different migratory strategies to evaluate diversification of each set of traits as a measure of reproductive isolation. Such studies highlight the potential for acoustic signals to play a large role in diversification and speciation among diverse taxa, and thus when evolutionary models can be properly incorporated, represent a promising avenue for future research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(c) Ancestral state reconstruction</head><p>Ancestral state reconstruction is used to infer the most likely ancestral states for a trait of interest and evaluate how that trait changed over evolutionary time. Three main methods exist: maximum parsimony, maximum likelihood, and Bayesian estimation <ref type="bibr">(Pagel, 1999b;</ref><ref type="bibr">Pagel, Meade &amp; Barker, 2004;</ref><ref type="bibr">Maddison &amp; Colu, 2015)</ref>. Parsimony minimizes the number of changes required to explain the distribution of characters in extant taxa <ref type="bibr">(Harvey &amp; Pagel, 1991;</ref><ref type="bibr">Joy et al., 2016)</ref>. While parsimony may give accurate reconstructions, especially when evolutionary rates are expected to be slow <ref type="bibr">(Cunningham, Omland &amp; Oakley, 1998)</ref>, maximum likelihood (ML) and Bayesian approaches are more sophisticated and have more realistic assumptions <ref type="bibr">(Royer-Carenzi, Pontarotti &amp; Didier, 2013;</ref><ref type="bibr">Joy et al., 2016)</ref>. ML attempts to find the parameter values that maximize the probability of the data given the underlying phylogeny, thus taking branch lengths into account as estimates of evolutionary time <ref type="bibr">(Joy et al., 2016)</ref>. In addition, different forward and reverse rates of evolution can be specified in one- versus two-parameter models <ref type="bibr">(Mooers &amp; Schluter, 1999)</ref>. Nevertheless, ML requires a priori specification of such parameters, which are often unknown. Furthermore, rates of change may not be stable over evolutionary time and are sensitive to variation in tree topology <ref type="bibr">(Schultz &amp; Churchill, 1999;</ref><ref type="bibr">Joy et al., 2016)</ref>. Bayesian approaches can integrate uncertainty in tree topology, branch lengths, and parameter estimates into the ancestral state reconstruction using Markov Chain Monte Carlo techniques. 
Such techniques account for sources of uncertainty as distributions or 'liability' terms calculated from the data and incorporated into the model <ref type="bibr">(Huelsenbeck &amp; Bollback, 2001;</ref><ref type="bibr">Pagel et al., 2004;</ref><ref type="bibr">Revell, 2014)</ref>. Both empirical and hierarchical Bayesian approaches can be employed, but hierarchical Bayesian methods are especially useful for averaging probabilities over a set of possible trees <ref type="bibr">(Joy et al., 2016)</ref>. Still, users of ancestral state reconstruction approaches should be aware of underlying assumptions and limitations due to taxon sampling and tree topology <ref type="bibr">(Losos, 1999;</ref><ref type="bibr">Omland, 1999;</ref><ref type="bibr">Salisbury &amp; Kim, 2001;</ref><ref type="bibr">Li, Steel &amp; Zhang, 2008;</ref><ref type="bibr">Revell et al., 2008;</ref><ref type="bibr">Marshall, 2017)</ref>. Specifically, ancestral state reconstruction is always an inference of how and when traits evolved in the past, as demonstrated by the fact that confidence intervals surrounding ancestral states are usually quite large <ref type="bibr">(Cunningham et al., 1998;</ref><ref type="bibr">Garland &amp; Ives, 2000;</ref><ref type="bibr">Oakley &amp; Cunningham, 2000)</ref>. To partially deal with this uncertainty it is advisable to compare results using various underlying evolutionary models and reconstruction methods (e.g. 
compare ML and Bayesian approaches), to conduct sensitivity analyses <ref type="bibr">(Cunningham et al., 1998)</ref>, and to report confidence intervals <ref type="bibr">(Garland &amp; Ives, 2000;</ref><ref type="bibr">Revell, 2013)</ref>.</p><p>Ancestral state reconstruction of acoustic signals has provided valuable insights into when certain major signalling strategies and behaviours evolved <ref type="bibr">(Shelley &amp; Blumstein, 2005;</ref><ref type="bibr">Odom et al., 2014;</ref><ref type="bibr">Riede et al., 2016;</ref><ref type="bibr">Tobias et al., 2016;</ref><ref type="bibr">Forti et al., 2018)</ref>. Most of these studies, however, have primarily used discrete traits to conclude when broad categories of vocalizations came to exist, whereas surprisingly few studies have applied ancestral state reconstruction methods to continuous or multiple acoustic features (exceptions include <ref type="bibr">Rand &amp; Ryan, 1987;</ref><ref type="bibr">Price &amp; Lanyon, 2002;</ref><ref type="bibr">Price, Friedman &amp; Omland, 2007;</ref><ref type="bibr">Goutte et al., 2016)</ref>. In one notable exception, <ref type="bibr">Price et al. (2007)</ref> scored song as presence/absence of 26 vocal characters to investigate how songs have changed over time in the New World blackbirds, with the ultimate goal of comparing evolutionary patterns of song to plumage evolution. They found that New World blackbird song is fairly evolutionarily labile, which paralleled patterns of convergent evolution in plumage. This study demonstrates an interesting approach for evaluating vocal evolution with discrete traits and analyses, while still extracting a considerable amount of vocal variation. With the current availability of continuous comparative approaches, these kinds of analyses can also be conducted with continuous acoustic variables. 
For example, a recent study with torrent-dwelling frogs compared several measured vocal variables to reconstructed features of calling-site habitat <ref type="bibr">(Goutte et al., 2016)</ref>. They were able to show that vocalizations of torrent-dwelling frogs have likely been constrained by the noisy environments in which they evolved. In another creative use of ancestral state reconstruction, <ref type="bibr">Rand &amp; Ryan (1987)</ref> reconstructed ancestral male T&#250;ngara frog (Engystomops pustulosus) calls to assess the response of females to ancestral vocal features. Similarly, we think an exciting future use of ancestral state reconstruction will be to reconstruct composites of acoustic features to investigate the origins of complex acoustic signals (e.g. <ref type="bibr">Adams, 2014b;</ref><ref type="bibr">Sauquet et al., 2017)</ref>.</p></div>
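<div xmlns="http://www.tei-c.org/ns/1.0"><p>The parsimony criterion described above can be sketched with the first (bottom-up) pass of Fitch's algorithm, which returns the set of equally parsimonious states at the root and the minimum number of state changes on the tree. The five-species topology and the presence/absence scores of a vocal character below are invented for illustration; branch lengths are ignored, which is one of the limitations of parsimony noted in the text, and assigning states to every internal node would require an additional top-down pass that is not shown.</p>

```python
# Hypothetical topology as nested pairs of tip names (an assumption of this
# sketch); parsimony does not use branch lengths.
TREE = (("sp_A", "sp_B"), ("sp_C", ("sp_D", "sp_E")))

# Made-up scores for a single discrete vocal character.
STATES = {
    "sp_A": "present",
    "sp_B": "present",
    "sp_C": "absent",
    "sp_D": "absent",
    "sp_E": "present",
}


def fitch(node, states):
    """Return (candidate state set, minimum number of changes) for a subtree."""
    if isinstance(node, str):  # tip: its observed state, no changes needed
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    shared = left_set.intersection(right_set)
    if shared:
        # Descendants agree on at least one state: no extra change here.
        return shared, left_cost + right_cost
    # Descendants disagree: keep all candidates and count one change.
    return left_set.union(right_set), left_cost + right_cost + 1


root_states, n_changes = fitch(TREE, STATES)
print("candidate root states:", sorted(root_states))
print("minimum number of changes:", n_changes)
```

<p>In this toy case the root state is ambiguous, which illustrates why reporting the uncertainty of a reconstruction matters even for simple discrete characters.</p></div>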
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(d) Phylogenetic correlative analyses</head><p>Phylogenetic correlative analyses are used to assess correlated evolution of two or more traits, including both discrete and continuous variables <ref type="bibr">(Martins &amp; Hansen, 1997;</ref><ref type="bibr">Paradis, 2014</ref>). Pagel's discrete test of correlated change is a popular approach for analysing correlated evolution and transition rates among discrete character states <ref type="bibr">(Pagel, 1994)</ref>. For continuous data, <ref type="bibr">Felsenstein's (1985)</ref> independent contrasts (PIC; <ref type="bibr">Garland, Bennett &amp; Rezende, 2005)</ref>, phylogenetic generalized least squares (PGLS; <ref type="bibr">Grafen, 1989;</ref><ref type="bibr">Paradis, 2011)</ref>, and Markov Chain Monte Carlo simulations in a mixed-model framework [phylogenetic generalized linear mixed models (PGLMMs); <ref type="bibr">Martins &amp; Garland, 1991;</ref><ref type="bibr">Housworth, Martins &amp; Lynch, 2004;</ref><ref type="bibr">Hadfield &amp; Nakagawa, 2010</ref>] all allow for phylogenetically controlled correlations that take topology and branch lengths into account. A key strength of the PGLS approach is that it is a straightforward regression model akin to ordinary least squares (OLS; in fact, PGLS approaches OLS output when the phylogenetic signal is weak; <ref type="bibr">Symonds &amp; Blomberg, 2014)</ref>. In addition, this approach allows for alternative underlying evolutionary models to be incorporated and evaluated (e.g. Brownian motion, early burst, OU; see Section III.2d). If the data set contains repeated measures per species, however, the data are likely better suited to PGLMMs, as they can account for within-species variation <ref type="bibr">(Martins &amp; Hansen, 1997;</ref><ref type="bibr">Garamszegi, 2014)</ref>. In addition, the Bayesian framework of some PGLMMs (e.g. 
the R package MCMCglmm) enables uncertainty about the phylogeny to be incorporated as a set of alternative tree topologies <ref type="bibr">(Hadfield, 2010b;</ref><ref type="bibr">Hadfield &amp; Nakagawa, 2010;</ref><ref type="bibr">Garamszegi &amp; Gonzalez-Voyer, 2014)</ref>. Phylogenetically controlled ANOVA, MANOVA, and ANCOVA procedures also exist and can be useful <ref type="bibr">(Revell, 2012;</ref><ref type="bibr">Goolsby, 2015;</ref><ref type="bibr">Fuentes-G et al., 2016)</ref>. Similar to the transition rate analyses of Pagel's discrete test, path analysis is a promising and appropriate procedure for examining the evolutionary order of events using continuous data sets <ref type="bibr">(Gonzalez-Voyer &amp; von Hardenberg, 2014)</ref>. Recently developed phylogenetic variable-rate regression models can be used to measure branch-wise rates of trait evolution in order to infer positive phenotypic selection and its link to other traits <ref type="bibr">(Baker et al., 2016)</ref>.</p><p>Evolutionary biologists interested in acoustic evolution have answered a range of interesting and diverse questions using phylogenetic correlative analyses on signal metrics. For example, several phylogenetic comparative approaches were incorporated into the analyses of <ref type="bibr">Gonzalez-Voyer et al. (2013)</ref> who investigated evolutionary relationships among a number of traits (including vocalizations) in barbet bird species. These authors lay out the logic behind their analytical decisions and use a number of current phylogenetic approaches required to facilitate confidence in their findings that larger barbet species produce longer, lower-frequency notes, that species living at higher altitudes produce longer songs, and that different elements of barbet vocalizations show distinct evolutionary rates. Likewise, an investigation into the influence of vocal learning on acoustic diversification of parrot vocalizations by <ref type="bibr">Medina-Garcia et al. 
(2015)</ref> analysed several signal metrics separately (e.g. peak frequency, entropy, duration) to determine if these parameters exhibit different evolutionary trajectories than morphological traits. This powerful study combined 'traditional' correlative analyses (e.g. phylogenetic regression) with those focused on analyses of evolutionary rates <ref type="bibr">(Adams, 2013)</ref> and found that the acoustic elements studied showed similar patterns and rates of evolutionary change to morphological traits, despite the potential for learning to increase the evolutionary rate of vocalizations. By contrast, <ref type="bibr">Ligon et al. (2018)</ref> used multivariate metrics (see Section II.2c) to input signal measurements into a multidimensional feature space, which facilitated the classification of notes into distinct types. After identifying all the notes in an individual sequence, the authors analysed 'acoustic richness' and 'acoustic diversity' using sliding-window analyses that identified the most complex vocal sequence for each individual from all species. It was these computational measures (acoustic richness, diversity) that were then analysed using phylogenetic correlative analyses. Using these approaches, <ref type="bibr">Ligon et al. (2018)</ref> found that vocal complexity is positively correlated with both behavioural and chromatic complexity at an evolutionary scale in the birds-of-paradise.</p><p>(2) Caveats for phylogenetic comparative analyses Although powerful, phylogenetic comparative analyses are still developing and require important caveats. These analyses reflect our current best estimates of the evolutionary process and are dependent on the current resolution of the input trees. 
In addition, each approach has its own limitations and assumptions and combining these approaches with highly dimensional acoustic data can be complicated <ref type="bibr">(Cunningham, 1999;</ref><ref type="bibr">Losos, 2011;</ref><ref type="bibr">Uyeda et al., 2015;</ref><ref type="bibr">Title &amp; Rabosky, 2019)</ref>. Therefore, analyses should be regarded with appropriate skepticism, and inferences about deeper evolutionary processes should be made with caution <ref type="bibr">(Losos, 2011;</ref><ref type="bibr">Marshall, 2017)</ref>. Below we identify important limitations to phylogenetic comparative analyses of animal sounds.</p><p>(a) The choice of phylogeny Any phylogenetic comparative study requires an underlying phylogenetic hypothesis, usually a phylogenetic tree or network, ideally with proposed branch lengths. All phylogenies are estimates <ref type="bibr">(Rosenberg &amp; Kumar, 2001)</ref> and while some phylogenetic comparative analyses may be somewhat robust to the underlying phylogeny, a more resolved phylogeny leads to higher confidence in the results <ref type="bibr">(Harvey &amp; Pagel, 1991;</ref><ref type="bibr">Garamszegi &amp; Gonzalez-Voyer, 2014)</ref>. The influence of accuracy of the underlying phylogeny is reviewed in <ref type="bibr">Heath, Hedtke &amp; Hillis (2008)</ref>. For discussion of the methods used to construct the phylogeny see <ref type="bibr">Rosenberg &amp; Kumar (2001)</ref> and <ref type="bibr">Berlin, Tomaras &amp; Charlesworth (2007)</ref>.</p><p>One can compensate for phylogenetic uncertainty by including a posterior sample of phylogenetic trees within the analysis. This accounts for potential error in phylogenetic estimation and sampling. It is becoming standard practice for large sets of trees to be made available (e.g. birdtree.org) and Bayesian approaches make it possible to incorporate randomly selected sets from a posterior distribution of trees representing the range of phylogenetic uncertainty. 
In birds, one can sample and analyse over 500 trees <ref type="bibr">(Leighton, 2017)</ref>, although performing the analysis on several dozen trees (50) may be sufficient <ref type="bibr">(Griesser et al., 2017)</ref>. <ref type="bibr">Rubolini et al. (2015)</ref> offer good advice when using widespread phylogenetic data, such as birdtree.org.</p><p>(b) Assessing the underlying models of trait evolution Rate and phylogenetic comparative analyses require specification of an underlying model of evolution <ref type="bibr">(Freckleton et al., 2011)</ref>. These analyses assume that the chosen model is a realistic estimation of the evolutionary process for that data set. For behavioural or life-history data, however, the appropriate underlying model is usually unknown or may not be intuitive. Some of the more common models of evolution used are Brownian motion, OU, lambda and ACDC. A Brownian motion model represents trait evolution as responding to many small evolutionary forces and is commonly used as a null model expected in the absence of strong stabilizing selection. The OU process models the strength of the evolution towards one or more theoretical optima, which can be used to represent adaptive evolution <ref type="bibr">(Butler &amp; King, 2004)</ref>. Lambda models the contribution of phylogeny to trait values in which the lambda parameter ranges between no phylogenetic effect and pure Brownian motion <ref type="bibr">(Pagel, 1999a)</ref>, while ACDC <ref type="bibr">(Blomberg et al., 2003</ref>; also called the Early-Burst model sensu <ref type="bibr">Harmon et al., 2010)</ref> represents a trend in evolutionary rate across the phylogeny, exponentially increasing or decreasing. PGLS procedures recommend using model selection criteria (e.g. 
Akaike information criterion, AIC) to assess the fit of multiple models of evolution to the data set before continuing with analyses <ref type="bibr">(Paradis, 2011;</ref><ref type="bibr">Symonds &amp; Blomberg, 2014)</ref> and similar procedures apply to other analyses <ref type="bibr">(Sullivan &amp; Joyce, 2005)</ref>.</p><p>(c) Sample size and taxon sampling Taxon sampling is important in resolving the underlying tree, and is also important for making broad inferences across lineages. While <ref type="bibr">Blomberg et al. (2003)</ref> concluded that trees with greater than 20 species have more robust results, researchers need to take care to also consider whether species are clustered on a tree, and the topology (symmetry) of the tree. Consequently, researchers should balance increasing the total number of sampled taxa while also identifying taxa and clades that will provide the most clarity given the hypotheses the researcher is testing. <ref type="bibr">Arnold &amp; Nunn (2010)</ref> provide methods for determining the proper taxonomic sampling and data collection (i.e. 'phylogenetic targeting').</p><p>Sample size is also relevant for estimates of trait evolution. In most cases, measures within a species are variable. Previously, many comparative analyses simply used single or mean trait values to represent a species. Recent methodological developments for analyses of trait evolution recommend taking within-species variation into account <ref type="bibr">(Garamszegi, 2014)</ref>. Repeated sampling within and among species is likely especially important for vocal behaviour, which can vary dramatically within and among individuals and populations.</p></div>
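<div xmlns="http://www.tei-c.org/ns/1.0"><p>The model-selection step recommended in Section III.2b can be sketched as follows: given the maximized log-likelihood and number of free parameters of each candidate model of trait evolution, compute AIC and the corresponding Akaike weights. The log-likelihoods and parameter counts below are invented placeholders, not results from any data set; in a real analysis they would come from fitting each model to the trait data and phylogeny with dedicated software.</p>

```python
import math

# Placeholder fits (assumptions of this sketch): model name mapped to
# (maximized log-likelihood, number of free parameters).
FITS = {
    "Brownian motion": (-42.0, 2),
    "OU": (-39.5, 3),
    "Early burst": (-41.8, 3),
}


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_by_model):
    """Relative support for each model, normalized to sum to one."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


aic_scores = {m: aic(lnl, k) for m, (lnl, k) in FITS.items()}
weights = akaike_weights(aic_scores)
best_model = min(aic_scores, key=aic_scores.get)

for model in sorted(aic_scores, key=aic_scores.get):
    print(model, "AIC =", round(aic_scores[model], 2),
          "weight =", round(weights[model], 3))
print("lowest AIC:", best_model)
```

<p>With small numbers of species, a small-sample correction such as AICc may be preferable; that choice is left to the analyst and is not implemented here.</p></div>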
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(d) Trait evolution and behaviour</head><p>Traits differ in their phylogenetic signal. <ref type="bibr">Blomberg et al. (2003)</ref> showed that behavioural traits exhibit a lower phylogenetic signal than morphological, life-history, or physiological traits, and concluded that "behavior is relatively labile evolutionarily" (p. 730). Consequently, researchers have raised concerns about the validity of applying phylogeny to behavioural data when phylogenetic signal is low, especially ancestral state reconstruction, because of the potential loss of evolutionary signal in highly labile traits (e.g. <ref type="bibr">Losos, 1999)</ref>. Specifically, recurring trait shifts within reconstructions with large confidence intervals suggest that two traits could occur together by chance. However, <ref type="bibr">Revell et al. (2008)</ref> conclude that phylogenetic signal does not predict evolutionary process or rate, restoring confidence that behavioural traits are evolving in similar ways as other traits. Thus, phylogenetic methods should be useful for examining the evolution of behaviour. A potential concern for high-dimensional acoustic data is that transition rates are sometimes unable to identify unambiguously correlated trait shifts among multiple traits. Statistical models that allow different traits to evolve according to different models of evolution <ref type="bibr">(Losos, 2011;</ref><ref type="bibr">Garamszegi, 2014, pp. 11-12)</ref> and path analysis both offer promising ways to look at evolutionary transitions within continuous data <ref type="bibr">(Gonzalez-Voyer &amp; von Hardenberg, 2014)</ref>.</p><p>(e) Correlation is not causality An association between traits at the tips of a phylogeny can often be explained via multiple hypotheses. Although experiments provide true tests of causality, testing evolutionary causation is usually limited to species with extremely short generation times. 
For species that do not have short generation times, certain analyses, such as transition rates in Pagel's discrete test <ref type="bibr">(Pagel, 1994)</ref> and path analysis <ref type="bibr">(Gonzalez-Voyer &amp; von Hardenberg, 2014)</ref> allow some conjecture as to the probability that one event follows another, or whether or not a particular relationship among traits is meaningful <ref type="bibr">(Garamszegi, 2014)</ref>.  <ref type="bibr">(Janik &amp; Slater, 2000)</ref>. To date, vocal learning is known to occur in some avian (parrots, hummingbirds, oscine passerines and some suboscines) and mammalian (cetaceans, elephants, phocid pinnipeds, humans, and bats) taxa. Unlike genetically determined signals that in vertebrates are inherited vertically (from parent to offspring), socially transmitted traits can also be transmitted horizontally (between individuals within the same generation) and obliquely (across generations of unrelated individuals; <ref type="bibr">Danchin &amp; Wagner, 2010)</ref>. This flexibility in transmission modes poses a potential difficulty for phylogenetic approaches because such methods are built on the premise that traits are inherited vertically <ref type="bibr">(Mace &amp; Holden, 2005;</ref><ref type="bibr">Gray &amp; Watts, 2017;</ref><ref type="bibr">Mesoudi, 2017)</ref>. A second challenge is that learned vocalizations could evolve at faster tempos than genetically determined signals, because cultural innovations or learning errors ('cultural mutations') may occur at a faster rate than genetic mutations <ref type="bibr">(Danchin et al., 2004)</ref>. 
The potentially high speeds of cultural 'mutation' could increase the chance of convergence <ref type="bibr">(Price &amp; Lanyon, 2002;</ref><ref type="bibr">Delsuc, Brinkmann &amp; Philippe, 2005)</ref> or the loss of phylogenetic signal due to rapid divergence (analogous to 'saturation' sensu <ref type="bibr">Delsuc et al., 2005;</ref><ref type="bibr">Filatova et al., 2015)</ref>. Nonetheless, studies have shown that, for some taxa at least, vocal characters can be highly conserved across species <ref type="bibr">(Price &amp; Lanyon, 2002;</ref><ref type="bibr">Podos, Huber &amp; Taft, 2004;</ref><ref type="bibr">Sakata &amp; Vehrencamp, 2012;</ref><ref type="bibr">Medina-Garc&#237;a et al., 2015)</ref>. In addition, studies of human language found some evidence that culturally evolved traits can diversify in a tree-like manner with limited transfer across 'lineages', so that comparative approaches can be informative within as well as between species (reviewed in <ref type="bibr">Pagel, 2009;</ref><ref type="bibr">Currie, Greenhill &amp; Mace, 2010;</ref><ref type="bibr">Hoppitt &amp; Laland, 2013;</ref><ref type="bibr">Mesoudi, 2017)</ref>. While there are modelling approaches that explicitly test for and incorporate 'horizontal' transfer of inherited information into analyses (reviewed in <ref type="bibr">Mesoudi, 2016)</ref>, in cases of frequent horizontal or oblique inheritance, or very rapid diversification, comparative approaches are expected to be unhelpful <ref type="bibr">(Filatova et al., 2015;</ref><ref type="bibr">Mesoudi, 2016)</ref>. In general, however, phylogenetic correlative analyses provide useful methods to investigate the evolution across clades of both learned and non-learned vocalizations and, when used with care, offer an exciting frontier for understanding the tempo and modes of evolution in cultural traits <ref type="bibr">(Mason et al., 2017a)</ref>.
In addition, the cultural evolution process itself can be modelled using phylogenetic methods and the resulting phylogenetic hypotheses can be used to infer the dynamics of the cultural micro-evolutionary process from a comparative framework <ref type="bibr">(Mace &amp; Holden, 2005;</ref><ref type="bibr">Mesoudi, 2017)</ref>. This approach will add a historical perspective to studies of animal cultures and a more detailed view of mechanisms involved in the microevolution of socially acquired (vocal) traits.</p><p>(3) Best practices for phylogenetic comparative analyses with sounds</p><p>In conclusion, phylogenetic comparative analyses are powerful tools that can be leveraged to answer important questions about signal evolution, especially involving acoustics. With the proper understanding of their limitations and underlying assumptions, these methods can begin to shed light on how acoustic signals have evolved and diversified across numerous taxa. When planning and conducting phylogenetic comparative analyses, it is important to understand the broad classes of analyses available and how they can be used to address your research question. It is also important to think about the acoustic data structure (are the variables discrete or continuous? How many variables are there? Are any data reduction steps planned?). The data structure determines which specific comparative analyses within these broad classes are appropriate, and, in turn, there may be important caveats for how to treat the data, particularly during data reduction. For example, multivariate procedures can remove underlying phylogenetic structure; therefore, taking phylogeny into account within these procedures is necessary if subsequent analyses rely on determining an appropriate underlying evolutionary model <ref type="bibr">(Catalano et al., 2010;</ref><ref type="bibr">Uyeda et al., 2015;</ref><ref type="bibr">Adams &amp; Collyer, 2018)</ref>. We provide guidelines for these considerations in Fig.
<ref type="figure">3</ref>.</p><p>Within phylogenetic comparative analyses it is also best practice to consider and evaluate multiple underlying models of evolution and/or algorithms underlying the comparative analyses. For example, diversification rate and certain correlative analyses (e.g. PGLS) require evaluating whether Brownian motion, OU, or other underlying evolutionary models best explain the overall patterns of the data. This is often done by comparing the fit of models constructed with different underlying evolutionary models via an information criterion, such as AIC. Similarly, for ancestral state reconstruction, it is important to evaluate the underlying assumptions of competing models and whether the outcome of the analysis is influenced by the particular reconstruction method chosen. For analyses that attempt to infer ancestral trait values using phylogenetic reconstruction, it is customary to compare multiple reconstruction methods (e.g. Bayesian and ML) to check that they generally provide similar results <ref type="bibr">(Omland, 1999)</ref>.</p><p>Lastly, as with all statistical models, be aware of the inferential power of the analysis and its associated uncertainty. Remember that these are powerful tools but are still correlative and not causative. It is important to draw conclusions within the scope of the analysis and project.</p></div>
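<div xmlns="http://www.tei-c.org/ns/1.0"><p>The information-criterion comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-likelihoods and parameter counts for the Brownian motion (BM) and OU fits are invented, and the function names aic and akaike_weights are hypothetical helpers rather than part of any package mentioned in this review. The formulas are the standard ones, AIC = 2k - 2 ln(L), with Akaike weights proportional to exp(-0.5 x delta AIC).</p>
<p><code lang="python"><![CDATA[
```python
from math import exp


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_by_model):
    """Convert AIC scores into relative model weights that sum to 1."""
    best = min(aic_by_model.values())
    # Relative likelihood of each model given its AIC difference from the best model.
    relative = {name: exp(-0.5 * (score - best)) for name, score in aic_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical fits: (maximised log-likelihood, number of free parameters).
fits = {"BM": (-52.3, 2), "OU": (-48.9, 3)}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
weights = akaike_weights(scores)

for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 2), round(weights[name], 3))
```
]]></code></p>
<p>With these invented numbers the OU fit has the lower AIC and therefore the larger weight; with real data, the log-likelihoods would come from whichever model-fitting software is used, and a small-sample correction may be preferable when few species are sampled.</p></div>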
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. CONCLUSIONS</head><p>(1) Combining continuous acoustic metrics and advanced phylogenetic comparative analyses has great potential to advance our understanding of signal evolution; however, implementing both approaches can be challenging.</p><p>(2) For acoustic analysis, it is most important to identify, define, and partition the acoustic signals into component structures that can be analysed and compared across the species of interest. In addition, we encourage researchers to give careful consideration to the metrics chosen for acoustic analysis. When beginning acoustic analysis, the metrics used should be aligned with the research question, the kinds of variation observed in the signals of interest, and the nature of the downstream phylogenetic comparative analyses to be employed. We also remind researchers to standardize recording specifications, quality, spectrogram parameters, and metrics as best they can prior to acoustic analysis. Standardizing recording specifications and choosing metrics can now be done in R (e.g. warbleR, seewave), so this largely comes down to good data-management practices.</p><p>(3) For data reduction of acoustic data prior to comparative analysis, we ask researchers to be aware of important caveats about data-reduction procedures that are appropriate for specific phylogenetic comparative analyses. For example, for phylogenetic analyses whose output depends on assessing underlying models of evolution, any data-reduction procedures used should take those models into account. For more details, we provide guidelines to help researchers determine the best steps and procedures for incorporating highly dimensional acoustic data with these analyses (Fig. <ref type="figure">3</ref>).</p><p>(4) For phylogenetic comparative analyses, we encourage researchers to explore the vast amount of resources and powerful computational tools available.
Recent advances in continuous approaches for phylogenetic comparative analyses, including Bayesian ancestral state reconstruction and phylogenetically controlled mixed models, enable deep dives into exploring evolutionary patterns, while taking uncertainty of tree topology, model structure, and even complex data structure into account. However, it is important to remember that these are only models of our best estimates of the evolutionary process. Therefore, we encourage researchers to run phylogenetic analyses with multiple different approaches to look for overall similarity and robustness of results given alternative underlying models of evolution, input parameters, and phylogenetic hypotheses (different phylogenetic trees).</p><p>(5) With recent advances in bioacoustic practices, data availability, and phylogenetic comparative approaches, we see a bright future for analyses of acoustic signal evolution. We also see opportunities for advancing specific areas at the interface of these fields. For example, there is a need to standardize terminology. This requires increasing our knowledge of the functional diversity of these signals, as well as agreeing upon appropriate terminology for investigating this diversity. In addition, we advocate the adoption of automated methods for acoustic data collection and feature extraction wherever possible. This includes extracting robust feature measurements, but also using automated procedures for classifying signals and evaluating their syntax. We also envision more streamlined, automated processes for the detection and separation of signals into comparable acoustic units, such that the entire procedure of acoustic signal analysis is more efficient, standardized, and reproducible.
Lastly, we strongly encourage improved data-archival practices for both recordings and associated annotation data to facilitate data sharing.</p><p>With such improvements, we expect the field of comparative bioacoustics to make great strides in our understanding of signal evolution and the drivers of animal diversity at large.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. ACKNOWLEDGEMENTS AND AUTHOR CONTRIBUTIONS</head></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Biological Reviews 96 (2021) 1135-1159 &#169; 2021 Cambridge Philosophical Society. Comparative bioacoustics of diverse animal sounds</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p><ref type="bibr">Catchpole &amp; Slater (2008)</ref>; <ref type="bibr">Dalziell &amp; Cockburn (2008)</ref>; Cholewiak et al. (2013).</p><p>Repertoire: set of acoustically distinct elements, syllables, calls, vocalizations, or songs produced by an individual, group or species. Related terms: song type, song repertoire, element repertoire, vocal repertoire (note that these are not direct synonyms, but rather refer to the class of vocalizations that may be grouped into a repertoire). References: Catchpole &amp; Slater (2008); Kershenbaum et al. (2016); Harris et al. (2016); Luttrell et al. (2016).</p></note>
		</body>
		</text>
</TEI>
