<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Long‐term analysis of persistence and size of swallow and martin roosts in the US Great Lakes</title></titleStmt>
			<publicationStmt>
				<publisher>Zoological Society of London</publisher>
				<date>08/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10477817</idno>
					<idno type="doi">10.1002/rse2.323</idno>
					<title level='j'>Remote Sensing in Ecology and Conservation</title>
<idno>2056-3485</idno>
<biblScope unit="volume">9</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Maria Carolina Belotti</author><author>Yuting Deng</author><author>Wenlong Zhao</author><author>Victoria F. Simons</author><author>Zezhou Cheng</author><author>Gustavo Perez</author><author>Elske Tielens</author><author>Subhransu Maji</author><author>Daniel Sheldon</author><author>Jeffrey F. Kelly</author><author>Kyle G. Horton</author><author>Vincent Lecours</author><author>Matthew Van Den Broeke</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>In this study, we combined a machine learning pipeline and human supervision to identify and label swallow and martin roost locations on data captured from 2000 to 2020 by 12 Weather Surveillance Radars in the Great Lakes region of the US. We employed radar theory to extract the number of birds in each roost detected by our technique. With these data, we set out to investigate whether roosts formed consistently in the same geographic area over two decades and whether consistency was also predictive of roost size. We used a clustering algorithm to group individual roost locations into 104 high‐density regions and extracted the number of years when each of these regions was used by birds to roost. In addition, we calculated the overall population size and analyzed the daily roost size distributions. Our results support the hypothesis that more persistent roosts are also gathering more birds, but we found that on average, most individuals congregate in roosts of smaller size. Given the concentrations and consistency of roosting of swallows and martins in specific areas throughout the Great Lakes, future changes in these patterns should be monitored because they may have important ecosystem and conservation implications.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Swallows and martins (family: Hirundinidae) are highly specialized aerial insectivores that, like other species that rely on insects as their main energy source, have been facing regional and continental-wide declines in their breeding populations in the US and Canada <ref type="bibr">(Rosenberg et al., 2019;</ref><ref type="bibr">Sauer et al., 2017;</ref><ref type="bibr">Smith et al., 2015)</ref>. Some of the potential drivers of these declines include habitat loss, decreases in prey abundance, impacts from environmental contaminants and phenological mismatches due to climate change <ref type="bibr">(Cox et al., 2019;</ref><ref type="bibr">Hallmann et al., 2014;</ref><ref type="bibr">Kardynal et al., 2020;</ref><ref type="bibr">Spiller &amp; Dettmers, 2019)</ref>. Particularly, in the case of the migratory species in this family, it is still a challenge to understand how these drivers of decline affect different stages of their annual cycles and how these effects interact between seasons to produce the overall pattern observed in their breeding populations <ref type="bibr">(Holmes, 2007;</ref><ref type="bibr">Marra et al., 2015;</ref><ref type="bibr">Webster et al., 2002)</ref>. 
Gathering metrics of population size and the variation therein across broad spatial and temporal scales would have important conservation implications, while also providing insights into the fundamental ecology of these species.</p><p>Recent advances in tracking technologies have allowed us to address some of these questions, but they are limited to individual-level data over relatively short periods <ref type="bibr">(Supp et al., 2021;</ref><ref type="bibr">Wikelski et al., 2007)</ref>. Due to these limitations, tools such as community science data <ref type="bibr">(iNaturalist, 2022;</ref><ref type="bibr">Sullivan et al., 2009;</ref><ref type="bibr">Wikiaves, 2022)</ref>, stable isotope analysis <ref type="bibr">(Imlay et al., 2018)</ref> and weather radars <ref type="bibr">(Dokter et al., 2010,</ref><ref type="bibr">2018;</ref><ref type="bibr">Horton et al., 2019)</ref> have recently emerged as alternatives to observe population-wide patterns at large spatial and temporal scales. Particularly, in the case of swallows and martins, weather radars are an invaluable tool to collect data about their communal roosting behavior, though until recently we lacked the tools to leverage the potential of this data source.</p><p>After their breeding season, from mid-July until the end of September, species of the Hirundinidae family congregate in large communal premigratory roosts, which can tally more than 100 000 birds at the end of each day <ref type="bibr">(Allen &amp; Nice, 1952;</ref><ref type="bibr">Bent, 1942)</ref>. The early departure of thousands of birds in synchrony from these roosting places is so conspicuous that these departures can easily be identified in weather radars, where they appear as donut-shaped rings that disperse outwards in the hours near sunrise <ref type="bibr">(Russell et al., 1998)</ref>. 
While the primary ecological importance of these roosts is largely unknown, especially prior to a long distance migration, these behaviors offer a unique and important snapshot to monitor population changes. The main challenge of using weather surveillance radars to obtain data about these aggregations is manually finding the roosts in the collection of millions of radar scans generated across decades. Additionally, establishing systematic criteria for classifying the multiple sources of noise and weather contamination inside the roosting signature is complicated by the vast diversity of patterns captured by the radars, especially at large spatial scales.</p><p>However, machine learning approaches have the potential to alleviate this barrier by disentangling the roost signatures and automatically discerning their locations and the boundaries of their dispersal <ref type="bibr">(Cheng et al., 2020)</ref>. With such tools, we can now analyze a dataset that spans two decades and can provide high spatial and temporal sampling frequency. We decided to test this approach in the Great Lakes region, where multiple species of Hirundinidae have been shown to roost. Large congregations of Purple Martins Progne subis, Tree Swallows Tachycineta bicolor, Barn Swallows Hirundo rustica, Bank Swallows Riparia riparia and Cliff Swallows Petrochelidon pyrrhonota are regularly seen in this region between July and September <ref type="bibr">(Bridge et al., 2015;</ref><ref type="bibr">Burney, 2002;</ref><ref type="bibr">Kelly et al., 2012)</ref>. Additionally, we have found historical accounts tracing back to 1910 of roosts gathering Tree Swallows and Bank Swallows at the Black River Bay, in Lake Ontario; near Toledo, Ohio; and at Cedar Point, also in Ohio <ref type="bibr">(Bent, 1942, pp. 396 and 420)</ref>. 
With more than two decades of radar remote sensing data available throughout this region, we set out to use these newly available tools to identify the locations of Hirundinidae roosts and provide some of the first estimates of macroscale abundance.</p><p>With these data, we set out to quantify how consistently the population in our study region returned to the same regions to roost over the years (which we refer to as 'population-level site fidelity'), and whether site usage consistency is related to roost size. Investigations of the spatial and temporal patterns of site fidelity can help us understand the role played by competition and social dynamics as drivers of nonbreeding habitat selection in migratory bird species (e.g., see <ref type="bibr">Marra &amp; Holmes, 2001;</ref><ref type="bibr">Sherry &amp; Holmes, 1996)</ref>. Additionally, if site fidelity is driven by adults who return to good habitats based on experiences from previous seasons, then examining site fidelity patterns can provide indirect information about adult survival and overall population dynamics <ref type="bibr">(Cresswell, 2014)</ref>. Lastly, populations that strongly rely on a specific region during their migratory journey might be particularly vulnerable to anthropogenic impacts on that site, making them a species of conservation concern <ref type="bibr">(Warkentin &amp; Hernández, 1996)</ref>.</p><p>Finally, an important step in understanding the ecological implications of such postbreeding aggregations is characterizing within-season variations in their size distribution. Until now, the techniques available to quantify the sizes of these large roosts suffered from observer variability, and thus were hard to reproduce from year to year, or even on a daily basis <ref type="bibr">(Erwin, 1982;</ref><ref type="bibr">Frederick et al., 2003)</ref>. 
With the use of weather radars, we can not only estimate the number of birds within each roost, but also examine within- and between-season changes. We describe the wide range of roost sizes found in the Great Lakes region and how this distribution changes as the overall population increases throughout the season.</p><p>[…] radars that conduct volume scans of the atmosphere at regular intervals (5-10 min). Each scan is composed of multiple 360&#176; horizontal sweeps of the radar antenna taken at different elevation angles. Scans are organized in a 3D array following the CfRadial convention, in which cells are indexed by the horizontal angle of the antenna (each angle is called an azimuth), by distance from the radar (called range) and by elevation angle, and each cell corresponds to a measurement of the returned signal from a voxel of the volume covered by the radar beam (see <ref type="bibr">Dixon &amp; Lee, 2021</ref> for a detailed specification of the CfRadial standard).</p><p>For the purposes of this paper, we downloaded all radar scans from the Great Lakes region (12 radar stations) available between June 1st and October 31st across 21 years (2000-2020, see Fig. <ref type="figure">1A</ref> and Appendix S2, Fig. <ref type="figure">S4</ref>, for data completeness information). Each day, we processed the data in the period starting 30 min before local sunrise and ending 90 min afterwards (an average of 16.2 scans per day). Reflectivity factor and radial velocity from the first sweep of each scan were rendered in Cartesian grids with 600 &#215; 600 pixels (1 pixel = 0.5 km).</p><p>These matrices were then used as input to the pretrained model described below. Each station's local sunrise time was calculated using the PyEphem and pytz Python packages.</p></div>
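<div xmlns="http://www.tei-c.org/ns/1.0"><p>The daily time window and the rendering grid described above can be illustrated with a minimal sketch. It assumes that the local sunrise time has already been computed (the text uses PyEphem and pytz for that step), and the function names, the north-up grid orientation and the example timestamps are hypothetical choices made only for illustration.</p>

```python
import math
from datetime import datetime, timedelta, timezone

GRID_SIZE = 600      # pixels per side, as stated in the text
PIXEL_KM = 0.5       # kilometres per pixel, as stated in the text
HALF_EXTENT_KM = GRID_SIZE * PIXEL_KM / 2.0   # 150 km on each side of the radar


def scans_in_sunrise_window(scan_times, sunrise, before_min=30, after_min=90):
    """Keep scans taken from 30 min before sunrise to 90 min after it."""
    start = sunrise - timedelta(minutes=before_min)
    end = sunrise + timedelta(minutes=after_min)
    return [t for t in scan_times if end >= t >= start]


def polar_to_pixel(range_km, azimuth_deg):
    """Map (range, azimuth) to (row, col).

    Assumption: north is up and the radar sits at the grid centre.
    Returns None when the point falls outside the 600 x 600 grid.
    """
    az = math.radians(azimuth_deg)
    x_km = range_km * math.sin(az)   # distance east of the radar
    y_km = range_km * math.cos(az)   # distance north of the radar
    col = math.floor((x_km + HALF_EXTENT_KM) / PIXEL_KM)
    row = math.floor((HALF_EXTENT_KM - y_km) / PIXEL_KM)
    inside = GRID_SIZE > col >= 0 and GRID_SIZE > row >= 0
    return (row, col) if inside else None


# Hypothetical sunrise time and scan times (minutes relative to sunrise).
sunrise = datetime(2007, 8, 17, 10, 30, tzinfo=timezone.utc)
scans = [sunrise + timedelta(minutes=m) for m in (-45, -30, 0, 60, 90, 95)]
kept = scans_in_sunrise_window(scans, sunrise)   # the four scans inside the window
```

<p>With a 0.5 km pixel, the 600-pixel grid spans 300 km, so the rendering covers ranges up to 150 km from the radar along the grid axes.</p></div>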
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Automated detection and screening</head><p>Figure <ref type="figure">1</ref>. Diagram representing the data processing steps, beginning with the radar data downloaded from Amazon Web Services and ending at the extraction of biological summaries from the bird counts on each roost. At each step, we report the number of detections (single frames found in one scan), tracks (sequences of detections in subsequent scans) and groups of tracks (groups of overlapping tracks that correspond to a single roost). Each group of tracks corresponds to one roost's morning dispersal event.</p><p>The machine learning system developed by <ref type="bibr">Cheng et al. (2020)</ref> operates in two steps, detection and tracking, following the paradigm proposed by <ref type="bibr">Ren (2008)</ref> (see Fig. <ref type="figure">1B</ref>). In the detection step, given some candidate proposal regions defined by positions in a radar scan and reference roost sizes, we trained a Faster R-CNN (acronym for Region-based Convolutional Neural Network) detector to predict the probability that a roost of each candidate size exists near each reference position <ref type="bibr">(Ren et al., 2015)</ref>. A high predicted probability indicates a roost detection according to the model. The detector also predicts bounding boxes for the detected roosts. To ease implementation, we leveraged the 'R101-FPN 3x' Faster R-CNN model from Detectron2, an open-source library in PyTorch <ref type="bibr">(Wu et al., 2019)</ref>. We picked 45 anchors at each position and otherwise used the default model configurations from the library. We loaded the model checkpoint trained for the COCO detection task from Detectron2 and fine-tuned it using the annotated training dataset developed for the paper by <ref type="bibr">Laughlin et al. (2016)</ref>. 
Since different annotators of the training data differ in their annotation styles, we adopted the factors from <ref type="bibr">Cheng et al. (2020)</ref> to scale bounding box annotations, thus alleviating annotator biases. We used an Adam optimizer with a learning rate of 0.001 to train the detector <ref type="bibr">(Kingma &amp; Ba, 2014)</ref>. For each scan, we kept the 100 top-scoring roost detections with predicted probability scores of at least 0.05.</p><p>In the tracking stage, since the same roost is expected to show up in temporally consecutive scans, we reproduced the strategy described by <ref type="bibr">Cheng et al. (2020)</ref> to predict the association among roost detections from individual scans. Finally, when possible, we removed rain false positives using dual-polarization data (specifically correlation coefficient) and wind farms according to the US Wind Turbine Database <ref type="bibr">(Hoen et al., 2018)</ref>. While polarimetric products provide a robust filter for the detection of weather contamination, they are not available throughout the entire temporal extent of our study (only from 2013 onward), and thus additional manual screening was necessary for removing potential nonbiological contaminants.</p><p>We manually picked a low score threshold (0.05) to ensure high recall, at the cost of low precision in the predictions, since we wanted to make sure that the pipeline would capture all roosts. To remove the false positives and detections with weather or noise contamination from our dataset, we developed a screening protocol (Fig. <ref type="figure">1C</ref>) used by three different screeners (MB, YD and VS) to evaluate all the predicted positive detections (screening 1 year of data for a given station takes on average c. 45 min; see Appendix S1 for details of the screening protocol). 
We classified each true positive track, composed of a series of consecutive detections, as either clear, weather contaminated, having more than 20% anomalous propagation, containing other sources of unknown noise or tracking failures, in which the tracking algorithm failed to follow the roosts (e.g., see Fig. <ref type="figure">2</ref>). Days that had more than 50% of the reflectivity renderings filled with weather or anomalous propagation were discarded from the analysis.</p></div>
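<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-scan filtering rule stated above (keep at most 100 detections, each with a score of at least 0.05) can be sketched as follows. This is only an illustration of the rule, not the pipeline's code: the dictionary layout with a "score" key is a hypothetical stand-in for the detector output.</p>

```python
def keep_top_detections(detections, min_score=0.05, max_keep=100):
    """Return up to max_keep detections scoring at least min_score, best first.

    Each detection is assumed to be a dict with a "score" entry
    (a hypothetical layout chosen for this sketch).
    """
    kept = [d for d in detections if d["score"] >= min_score]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:max_keep]


# Hypothetical scores for one scan: the 0.04 detection is dropped.
example = [{"score": s} for s in (0.9, 0.04, 0.05, 0.5)]
filtered = keep_top_detections(example)
```

<p>A threshold this low deliberately admits many false positives, which is why the manual screening step that follows is needed.</p></div>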
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overlap correction and mean shift clustering</head><p>Some of the weather radars selected for this study have clear overlapping ranges (see Fig. <ref type="figure">3A</ref>). If a roost occurred inside this overlapping region, it could be detected by more than one station, which would result in duplicated tracks and inflated estimates of the number of birds. To remove this bias, for each track, we computed the union of all its bounding boxes. We then grouped our dataset by day and calculated, for each pair of tracks, the overlap ratio, which we defined as the ratio between the area of the intersection of each pair of tracks and the area of their union (Fig. <ref type="figure">1D</ref>). If the overlap ratio was greater than 0.2, the tracks were considered to be in the same group (see Appendix S2 for details about the selection of ratio threshold).</p><p>To establish regions of repeated use throughout the years, we captured the first detection of each group described above and used it as input to the mean shift clustering algorithm (implementation by <ref type="bibr">Pedregosa et al., 2011</ref>, Fig. <ref type="figure">1E</ref>), which finds density-based clusters while making minimal assumptions about cluster shape (see Appendix S2 for detailed methods of parameter choice). With this approach, we assume that regions consistently used by birds to roost would have a higher density of points and thus be included in the same cluster. We chose to use the first detection of each group because we observed that these were the most precise representatives of the actual roost's location. 
The cluster labels were then extrapolated to other points within each group, which is why the final clusters overlap.</p><p>Up until this postprocessing stage, we included the first detection of all the tracks classified by the screeners as true positives (bounding boxes that contained a verified roost dispersal), even the ones that were noisecontaminated or that the algorithm failed to track. These so-called 'contaminated' tracks provide useful information about roost location, but they cannot be used to calculate estimates of number of birds, since it cannot be assumed that all the scatterers in them are exclusively birds. For this reason, from this step onwards they were removed from the analysis.</p></div>
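<div xmlns="http://www.tei-c.org/ns/1.0"><p>The overlap correction can be sketched as an intersection-over-union test followed by grouping. Several simplifications here are assumptions rather than details from the text: boxes are axis-aligned tuples (x_min, y_min, x_max, y_max) in a shared kilometre grid, the union of a track's boxes is approximated by their enclosing rectangle, and tracks are grouped transitively with a small union-find structure.</p>

```python
def enclosing_box(boxes):
    """Approximate the union of a track's boxes by their enclosing rectangle."""
    x0, y0, x1, y1 = zip(*boxes)
    return (min(x0), min(y0), max(x1), max(y1))


def area(box):
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def overlap_ratio(a, b):
    """Area of the intersection divided by the area of the union."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def group_tracks(footprints, threshold=0.2):
    """Group same-day track footprints whose overlap ratio exceeds threshold."""
    parent = list(range(len(footprints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(footprints)):
        for j in range(i + 1, len(footprints)):
            if overlap_ratio(footprints[i], footprints[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(footprints)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

<p>For example, two 10 km squares offset by 5 km share one third of their combined area, so they would be merged under the 0.2 threshold, whereas a distant third track would stay in its own group.</p></div>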
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reflectivity extraction and postprocessing</head><p>Following manual screening to remove contaminants, we used coordinates of the center of each roost and its radius, both estimated by the detector, to obtain the bounding boxes of each departure event. We used these data to extract the raw reflectivity factor values from Level II radar data in polar coordinates (each voxel's location is defined by its range and azimuth) across all elevation angles of each full scan. To estimate the number of birds per roost, we followed the approach established by <ref type="bibr">Chilson et al. (2012)</ref>: we converted reflectivity factor (Ze) measured at each range interval sample, originally in dBZ, to linear scale (mm⁶/m³), and then transformed it into reflectivity, which can be interpreted as the density of scatterers in the atmosphere (units are cm²/km³). We then multiplied this result by the theoretical volume sampled by the radar beam at each range interval, and divided it by the radar cross section (RCS) of the scatterers assumed to be found in the roost (Fig. <ref type="figure">1F</ref>). By summing the results of these operations in the region inside each bounding box, we obtain a count estimate for that sweep.</p><p>We opted to obtain conservative estimates of the number of birds by choosing Purple Martins as our benchmark radar cross section, since they are the largest species of Hirundinidae found in North America. Though previous studies suggest that the majority of roosts in the region are Purple Martins <ref type="bibr">(Bridge et al., 2015;</ref><ref type="bibr">Kelly et al., 2012)</ref>, our ability to ascertain the species identification is still somewhat limited using radar data alone. The radar cross section of Purple Martins can be obtained from their mass (51 g, see <ref type="bibr">Dunning (2008)</ref>), adopting the relationship between mass and RCS proposed by <ref type="bibr">Horton et al. (2019)</ref>: log(RCS) = 0.699 &#215; log(mass). Bank Swallows, the smallest species to participate in such aggregations, would yield counts c. 2.6 times higher, since their mass is 13 g according to <ref type="bibr">Dunning (2008)</ref>.</p><p>The volume sampled by the radar is assumed to be shaped like a truncated cone with an axis aligned with the antenna's peak power axis, cut by two parallel planes at each range gate. For data before the Super Resolution upgrade, the cone's apex angle was assumed to be 1&#176; in width. After the 2007-2008 upgrade, we assumed an elliptical cone of 1&#176; vertical beam width and 0.5&#176; horizontal beam width <ref type="bibr">(Torres, 2007)</ref>. In order to further clean the data from spurious contamination, we removed voxels inside the detection's bounding box if the reflectivity factor in them exceeded 30 dBZ <ref type="bibr">(Horton et al., 2019)</ref>.</p><p>To prevent nonbiological scatterers from the upper levels of the atmosphere from artificially increasing our estimate of the number of birds, we only analyzed sweeps that had heights below 5000 m at the 150 km range. We counted birds across the entire vertical structure of the roost departure by grouping the multiple sweeps of each detection in bins of 1&#176; intervals according to their elevation angle. Afterward, we took the mean bird count of each bin and summed the counts from all available bins. With this procedure, we hoped to avoid summing regions in the airspace sampled twice by consecutive sweeps that had elevation angles with an interval of &lt;1&#176; between them.</p></div>
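<div xmlns="http://www.tei-c.org/ns/1.0"><p>The chain from reflectivity factor to a bird count can be sketched as below. Several values are assumptions that the text does not state: a radar wavelength of 10.7 cm and a dielectric factor of 0.93 in the standard conversion from reflectivity factor to reflectivity, RCS expressed in cm², 1&#176; elevation bins with edges at whole degrees, and a hypothetical tuple layout for range gates. The cone-slice volume follows directly from the geometry described above.</p>

```python
import math

RADAR_WAVELENGTH_CM = 10.7   # assumed S-band wavelength (not given in the text)
DIELECTRIC_K2 = 0.93         # assumed dielectric factor for liquid water


def dbz_to_eta(dbz):
    """Reflectivity factor in dBZ to reflectivity eta in cm^2/km^3."""
    z_linear = 10.0 ** (dbz / 10.0)   # mm^6/m^3
    return 1e3 * math.pi ** 5 * DIELECTRIC_K2 * z_linear / RADAR_WAVELENGTH_CM ** 4


def rcs_from_mass(mass_g):
    """log(RCS) = 0.699 * log(mass), i.e. RCS = mass ** 0.699 (cm^2 assumed)."""
    return mass_g ** 0.699


def gate_volume_km3(r_near_km, r_far_km, vert_bw_deg=1.0, horiz_bw_deg=1.0):
    """Volume of an elliptical cone (apex at the radar) between two ranges."""
    a = math.tan(math.radians(vert_bw_deg) / 2.0)
    b = math.tan(math.radians(horiz_bw_deg) / 2.0)
    return math.pi / 3.0 * a * b * (r_far_km ** 3 - r_near_km ** 3)


def birds_in_sweep(gates, rcs_cm2, max_dbz=30.0):
    """Sum eta * volume / RCS over gates; gates above max_dbz are skipped.

    gates: iterable of (dbz, r_near_km, r_far_km, horiz_bw_deg) tuples,
    a hypothetical layout for the voxels inside one bounding box.
    """
    total = 0.0
    for dbz, r_near, r_far, horiz_bw in gates:
        if dbz > max_dbz:
            continue
        total += dbz_to_eta(dbz) * gate_volume_km3(r_near, r_far, 1.0, horiz_bw) / rcs_cm2
    return total


def birds_in_scan(sweep_counts):
    """Average counts within each 1-degree elevation bin, then sum the bins.

    sweep_counts: list of (elevation_deg, count) pairs.
    """
    bins = {}
    for elevation_deg, count in sweep_counts:
        bins.setdefault(math.floor(elevation_deg), []).append(count)
    return sum(sum(v) / len(v) for v in bins.values())
```

<p>Because the mass relationship reduces to a power law, the ratio of the Purple Martin RCS (51 g) to the Bank Swallow RCS (13 g) is (51/13) raised to 0.699, which is close to 2.6 and consistent with the factor quoted in the text.</p></div>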
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Analysis</head><p>To describe the peak activity in each group of tracks obtained as described above, we extracted the maximum number of birds in each series of detections. We then aggregated groups of tracks by day and summed the maximum bird counts obtained from roosts occurring on the same day. We did this assuming that we can sum the maximum bird count for each group without double counting since we have corrected most overlapping tracks. In doing so, we have a measurement of the maximum number of birds per day in the study region. We further aggregate this by cluster and year and compute the maximum number of birds per cluster per year, which we interpret as the peak number of birds throughout the region each year. To obtain a cluster's persistence, we sum the number of years when each cluster was detected (Fig. <ref type="figure">1G</ref>).</p><p>Because radar stations had incomplete archives for some years, we have restricted the dataset to the 225 pairs of station-years, out of a total of 246 (91.4%), in which we had at least 130 days of sampling (see Appendix S2, Fig. <ref type="figure">S4</ref> for data completeness information). All the analyses conducted in the rest of the paper will describe results from this filtered dataset. In Figure <ref type="figure">6</ref>, however, we show clusters detected every year when their nearest stations were operational, regardless of the number of days, to highlight the full temporal scope of the dataset.</p><p>We used the peak activity in each group of tracks to measure changes in the distribution of roost sizes throughout the season. We tested the null hypothesis of no change in mean roost size between the months of July, August, September and October with the Kruskal-Wallis test (implemented in base R version 4.1.3). 
Since the Kruskal-Wallis test suggested that the means were in fact different (p-value &lt;0.001), we performed pairwise comparisons with the Wilcoxon rank sum test using the Holm method to adjust the p-values for multiple comparisons (also implemented in base R version 4.1.3).</p></div>
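<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregation steps described in this section can be sketched as follows. The record layout (a dict with "day", "cluster" and "counts" entries) is hypothetical, and the sketch adopts one reading of the text: peak counts are summed per cluster and day before the yearly maximum is taken. The statistical tests themselves were run in R and are not reproduced here.</p>

```python
from collections import defaultdict
from datetime import date


def summarize(groups):
    """Return (daily totals, peak per cluster-year, persistence per cluster).

    groups: list of dicts with a "day" (datetime.date), a "cluster" label
    and "counts", the bird counts of the detections in that group.
    """
    daily_total = defaultdict(float)
    daily_cluster = defaultdict(float)
    cluster_years = defaultdict(set)
    for g in groups:
        peak = max(g["counts"])                  # peak activity of the group
        daily_total[g["day"]] += peak            # birds per day, whole region
        daily_cluster[(g["cluster"], g["day"])] += peak
        cluster_years[g["cluster"]].add(g["day"].year)

    cluster_year_peak = {}
    for (cluster, day), total in daily_cluster.items():
        key = (cluster, day.year)
        cluster_year_peak[key] = max(cluster_year_peak.get(key, 0.0), total)

    persistence = {c: len(years) for c, years in cluster_years.items()}
    return dict(daily_total), cluster_year_peak, persistence


# Hypothetical records, invented purely to exercise the function.
records = [
    {"day": date(2007, 8, 17), "cluster": 1, "counts": [10, 50, 30]},
    {"day": date(2007, 8, 17), "cluster": 1, "counts": [20]},
    {"day": date(2007, 8, 18), "cluster": 1, "counts": [60]},
    {"day": date(2008, 8, 1), "cluster": 1, "counts": [5]},
    {"day": date(2007, 8, 17), "cluster": 2, "counts": [7]},
]
daily, peaks, persistence = summarize(records)
```

<p>In this toy input, cluster 1 is used in two years and cluster 2 in one, so their persistence values are 2 and 1.</p></div>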
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Detections, tracks and groups of tracks</head><p>Across a total of 940 641 radar scans from 2000 to 2020, we obtained 67 273 detections, or snapshots of departure events, which the machine learning pipeline grouped into 16 222 tracks representing sequences of snapshots of the same dispersal. Of those tracks, 85.32% (13 841) were considered clear, 0.62% (100) had at least 20% of anomalous propagation inside the bounding boxes, 10.27% (1667) were contaminated by weather or other unknown sources of noise and 3.78% (614) were tracking failures. After correcting for radar range overlaps, we had a total of 14 118 groups of tracks, each one corresponding to a unique dispersal event. Each track group had an average of 4.76 detections (SD = 3.45). The first detection of each group of tracks was then used as input to the mean shift clustering algorithm, which yielded 104 clusters with a silhouette coefficient of 0.703 (represented in Fig. <ref type="figure">3</ref>). After removal of tracks contaminated by noise or poorly captured by the machine learning pipeline, we had 12 248 groups of clear tracks, with an average of 4.40 detections (SD = 3.12).</p><p>For every year in which we had more than 130 sample days per station, we extracted the day when the maximum number of martins and swallows was detected in the area covered by the NEXRAD network in the Great Lakes region. This number was highest in 2007 (on August 17th), when 536 836 birds across 19 clusters were detected by the radars. In contrast, the peak number of birds was lowest in 2013, with a total of 143 625 birds spread across 13 clusters on August 13th. In Figure <ref type="figure">4</ref>, we show, in each month, the frequency in days of each bird count bin (across all 21 years). The largest single roost in our series occurred on September 14th, 2005, in Fond du Lac, Wisconsin, when we estimated a departure of 187 736 birds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Roost size distribution</head><p>The data suggest that small roost sizes are more common than larger ones since the daily size distribution throughout the years is heavy tailed (see Fig. <ref type="figure">5A</ref>). Our results indicate that, in August, when the overall population is higher, there is an increase in the frequency of smaller sized roosts paired with the formation of roosts of size larger than 100 000 birds (Fig. <ref type="figure">5C</ref>). Pairwise comparisons showed statistically significant (p-value &lt;0.001) differences in the mean group size of all pairs of months except for September and October (p-value = 0.11). These tests reveal an interesting asymmetry between the beginning and the end of the season, with the average group size in July (mean 5388, SD = 7998) being smaller than the average observed in September (mean 9011, SD = 14 123).</p><p>During July, September and October, more than 60% of the population can be found in smaller roosts, ranging from a few individuals up to 25 000 birds (Fig. <ref type="figure">5B</ref>). This pattern changes in August, when roosts of larger size start to gather larger percentages of the population. Even though roosts of more than 100 000 individuals occur in every year of our 21-year series, they do not gather the majority of the Hirundinidae population in the Great Lakes region, since the proportion of the population aggregated in smaller roosts is larger.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Persistence and roost size analysis</head><p>Of 104 clusters, which we interpret as regions of frequent use by swallows and martins, 98 were detected in more than 1 year (94.2%), and 11 of these occurred every year from 2000 to 2020 (10.6%, see Fig. <ref type="figure">6</ref>). We found three clusters that were active in all years when the radar stations closest to them were active: two of them occurred in east Lake Ontario for 17 years (near KTYX), and one in northern Minnesota (close to KDLH) for 20 years (see Fig. <ref type="figure">6</ref>). We used a simple linear regression to test if persistence was a significant predictor of cluster size, measured as the peak number of birds per cluster per year. To account for years when the entire season was unavailable on the archive, we used the number of persistent years over the total number of fully sampled (more than 130 days) years as a predictor variable. The fitted model was log(y) = 1.00 x + 3.33, and it was statistically significant with p-value &lt;0.001, and R-squared = 0.441 (see Fig. <ref type="figure">7</ref>), indicating that areas that were consistently used throughout the years hosted larger roosts.</p></div>
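<div xmlns="http://www.tei-c.org/ns/1.0"><p>The form of the regression can be illustrated with a minimal least-squares sketch. The base of the logarithm is not stated in the text, so base 10 is an assumption here, and the input values are synthetic points generated from the reported line purely to check the arithmetic; they are not the study's data.</p>

```python
import math


def fit_log_linear(persistence_fraction, peak_counts):
    """Least-squares fit of log10(peak) = slope * fraction + intercept.

    Returns (slope, intercept, r_squared). Base-10 logs are an assumption.
    """
    xs = list(persistence_fraction)
    ys = [math.log10(c) for c in peak_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic points lying exactly on log10(y) = 1.00 x + 3.33 (not real data).
fractions = [0.1, 0.25, 0.5, 0.75, 1.0]
synthetic_peaks = [10.0 ** (1.00 * x + 3.33) for x in fractions]
slope, intercept, r_squared = fit_log_linear(fractions, synthetic_peaks)
```

<p>Under the base-10 assumption, a slope of 1.00 would mean that the expected peak size grows roughly tenfold as the persistence fraction goes from 0 to 1.</p></div>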
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>Weather surveillance radars have proven to be an invaluable source of data about bird migration ecology. Particularly in the case of some species of the Hirundinidae family, due to their specific postbreeding roost formation behavior and timing, weather radars offer a rare opportunity to monitor and inventory their populations with macroscale coverage <ref type="bibr">(Kelly &amp; Horton, 2016)</ref>. The main challenge currently faced in the effort to reach such broad spatial and temporal scales is the detection of roosts, which is further complicated by the diverse sources of biological and nonbiological scattering that are typically captured by the radars. By integrating machine learning and human supervision, we have addressed both challenges, extending the temporal range from previous studies like <ref type="bibr">Bridge et al. (2015)</ref> and <ref type="bibr">Kelly et al. (2012)</ref>, but more importantly, extracting critical indices of the magnitude of each communal roost throughout the years.</p><p>An important caveat of working with weather radar data is their inability to sample lower heights above ground level, especially at greater distances from the radar station. When we can assume that the observed biomass density does not change with radar range, the systematic underestimation of density at greater distances from the radar station can be corrected using vertical profiles of reflectivity <ref type="bibr">(Buler &amp; Diehl, 2009)</ref>. In the case of communal roosts, however, we cannot assume that biomass density is constant at varying ranges, and thus, techniques to adjust density estimates in this scenario based on radar data alone have not yet been developed. We remark, however, that in this study, we found little correlation between distance to the radar station and roost size or number of birds (see Appendix S2, Figs. 
<ref type="figure">S5</ref> and <ref type="figure">S6</ref>), indicating that this issue likely has not influenced our conclusions.</p><p>We decided to use Purple Martins as our benchmark Hirundinidae because they are the largest species in this family, and thus produce conservative estimates of the number of birds per roost. Conservative abundance estimates can often be enough to diagnose population declines, monitor year-to-year abundance variation and inform conservation decision-making (e.g., see <ref type="bibr">Hunter et al., 2010</ref>), yet they limit our ability to understand the full complexity of the Hirundinidae communal roosting behavior. Further studies pairing radar roost detections with on-the-ground species identification would allow us to characterize species-specific radar signatures in terms of their dispersal time, flight speed and polarimetric measurements <ref type="bibr">(Stepanian et al., 2016)</ref>. Moreover, verified data could eventually be used as training data for a new model layer that assigns a species to each dispersal event.</p><p>Our results show 11 subregions (clusters) in the Great Lakes which were used year-after-year for the past 21 years, demonstrating how consistent the selection of premigratory roosting habitat is for these species <ref type="bibr">(Allen &amp; Nice, 1952;</ref><ref type="bibr">Bent, 1942;</ref><ref type="bibr">Bridge et al., 2015;</ref><ref type="bibr">Russell et al., 1998)</ref>. This finding, together with the discovery that more persistent clusters were also larger in size, suggests that these regions might successfully support many birds,  and prompts further questions about the role swallows and martins have in those ecosystems. A study conducted by <ref type="bibr">Powell et al. 
(1991)</ref> with much smaller avian aggregations demonstrated, for example, how seabirds in Florida Bay play an important role in translocating and concentrating nutrients in the regions close to their roost/colony sites, ultimately leading to changes in the adjacent seagrass communities.</p><p>We further illustrate the potential impact of a Purple Martin roost on the Great Lakes trophic network using the approach proposed by <ref type="bibr">Kelly et al. (2013)</ref> to calculate the number of insects consumed on the day of peak activity in 2007 (August 17). Assuming an adult Purple Martin needs 137 kJ per day to maintain regular metabolic functions and that each gram of insect dry mass provides 23 kJ, we can derive the total dry mass needed to fulfill the energetic requirements of a single adult (137/23, or about 6 g per day). We can then apply the relationship between insect length and dry mass proposed by <ref type="bibr">Sage (1982)</ref>, assuming an average prey length of 20 mm, to determine how many insects would be needed to satisfy that daily dry mass requirement. In doing so, we estimate that on August 17, 2007, Purple Martins may have consumed more than 33 million insects. Summing daily martin activity, we estimate that an average season may see as many as 338 million insects consumed throughout the Great Lakes region during the roosting period. While these simple calculations make many assumptions, they might serve as a basis for deeper investigations into the trophic role played by aerial insectivores across broad regions and into what the impacts of this guild's recent declines could be.</p><p>A secondary finding of our approach concerns how birds of the Hirundinidae family occupy the airspace. Our data suggest that, during their dawn departure, they might reach up to 4.5 km above ground level, which is higher than the maximum heights found by <ref type="bibr">Helms et al. (2016)</ref> but similar to results from <ref type="bibr">Dreelin et al. 
(2018)</ref> for swallows and martins during their breeding season. Combining altitude loggers and radar studies to explore demographic and seasonal species-specific patterns of flight height might provide valuable insight into how these aerial insectivores use different levels of the aerosphere during their annual cycle. Additionally, because the machine learning pipeline detects and tracks each roost's dispersal, it could be possible to reconstruct a three-dimensional model of the departure that improves our understanding of how the birds coordinate their collective take-off and that might eventually be used to correct the range bias mentioned above.</p><p>Empirical studies of animal aggregations in fish, ungulates and house sparrows (Passer domesticus) have found the group size distributions of all these species to be typically heavy-tailed, with smaller groups being significantly more common than larger ones (e.g., see <ref type="bibr">Bonabeau et al., 1999;</ref><ref type="bibr">Gerard et al., 2002;</ref><ref type="bibr">Griesser et al., 2011)</ref>. Theoretical studies have demonstrated that these distributions typically arise from the combined effects of animal movement and group splitting and amalgamation (also called fission and fusion) processes <ref type="bibr">(Niwa, 2004;</ref><ref type="bibr">Okubo, 1986)</ref>. Though these models accurately describe animals in movement, their translation to the communal roosting scenario is not straightforward, since roosts are not a direct result of movement dynamics but rather of central place foragers gathering each day at a central location from which they depart the next day to forage <ref type="bibr">(Lalla, 2022)</ref>. Nonetheless, we have shown that the same pattern of group size distribution occurs in our study system. 
This result prompts further investigation into models that can describe the mechanisms responsible for generating the specific dynamics of roost formation behavior.</p><p>As expected, we found that the daily distribution of roost sizes changes throughout the season, and we have, for the first time, quantified the progression of these within-season changes. Further studies that relate this pattern to environmental variables and investigate how it changes across years may reveal important factors influencing roost formation behavior and driving the roost size distribution. Since most of the Hirundinidae population in the Great Lakes seems to gather in smaller groups, it would be interesting to investigate why larger roosts are more persistent throughout the years and what the relationship is between group size, roost persistence and fitness.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>&#169; 2022 The Authors. Remote Sensing in Ecology and Conservation published by John Wiley &amp; Sons Ltd on behalf of Zoological Society of London.</p></note>
		</body>
		</text>
</TEI>
