<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Using spatiotemporal information in weather radar data to detect and track communal roosts</title></titleStmt>
			<publicationStmt>
				<publisher>Remote Sensing in Ecology and Conservation</publisher>
				<date>04/17/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10557818</idno>
					<idno type="doi">10.1002/rse2.388</idno>
					<title level='j'>Remote Sensing in Ecology and Conservation</title>
<idno>2056-3485</idno>
<biblScope unit="volume">10</biblScope>
<biblScope unit="issue">5</biblScope>					

					<author>Gustavo Perez</author><author>Wenlong Zhao</author><author>Zezhou Cheng</author><author>Maria Carolina T D Belotti</author><author>Yuting Deng</author><author>Victoria F Simons</author><author>Elske Tielens</author><author>Jeffrey F Kelly</author><author>Kyle G Horton</author><author>Subhransu Maji</author><author>Daniel Sheldon</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The exodus of flying animals from their roosting locations is often visible as expanding ring‐shaped patterns in weather radar data. The NEXRAD network, for example, archives more than 25 years of data across 143 contiguous US radar stations, providing opportunities to study roosting locations and times and the ecosystems of birds and bats. However, access to this information is limited by the cost of manually annotating millions of radar scans. We develop and deploy an AI‐assisted system to annotate roosts in radar data. We build datasets with roost annotations to support the training and evaluation of automated detection models. Roosts are detected, tracked, and incorporated into our web‐based interface for human screening to produce research‐grade annotations. We deploy the system to collect swallow and martin roost information from 12 radar stations around the Great Lakes spanning 21 years. After verifying the practical value of the system, we propose to improve the detector by incorporating both spatial and temporal channels from volumetric radar scans. The deployment on Great Lakes radar scans allows accelerated annotation of 15628 roost signatures in 612786 radar scans with 183.6 human screening hours, or 1.08 s per radar scan. We estimate that the deployed system reduces human annotation time by ~7×. The temporal detector model improves the average precision at intersection‐over‐union threshold 0.5 (AP IoU = .50) by 8 percentage points over the previous model (48%→56%), further reducing human screening time by 2.3× in its pilot deployment. These data contain critical information about phenology and population trends of swallows and martins, aerial insectivore species experiencing acute declines, and have enabled novel research. 
We present error analyses, lay the groundwork for continent‐scale historical investigation about these species, and provide a starting point for automating the detection of other family‐specific phenomena in radar data, such as bat roosts and mayfly hatches.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Weather radar is one of the most promising technologies for studying flying animals, with networks of weather surveillance radars around the globe continuously monitoring the airspace and detecting birds, bats, and insects in addition to precipitation <ref type="bibr">(Kunz et al., 2008)</ref>. The US Next Generation Weather Radar (NEXRAD) weather radar network <ref type="bibr">(Crum &amp; Alberty, 1993)</ref>, in particular, has archived 25 years of data from 143 radar stations covering nearly the entire contiguous US <ref type="bibr">(Ansari et al., 2018)</ref> and offers the possibility of monitoring flying animals at an unprecedented scale and resolution <ref type="bibr">(Bruderer, 1997;</ref><ref type="bibr">Dokter et al., 2011;</ref><ref type="bibr">Gauthreaux et al., 2003;</ref><ref type="bibr">Gauthreaux &amp; Belser, 1998;</ref><ref type="bibr">Gauthreaux, 1970)</ref>. These data have fueled a growing number of studies at increasing scales about bird populations, including studies about nocturnal migration and stopover behaviors <ref type="bibr">(Buler &amp; Dawson, 2014;</ref><ref type="bibr">Buler &amp; Diehl, 2009;</ref><ref type="bibr">Cohen et al., 2021;</ref><ref type="bibr">Farnsworth et al., 2016;</ref><ref type="bibr">Gauthreaux et al., 2003;</ref><ref type="bibr">Horton et al., 2018)</ref>, demography <ref type="bibr">(Dokter, Farnsworth, et al., 2018)</ref>, the effects of artificial light on migration <ref type="bibr">(McLaren et al., 2018;</ref><ref type="bibr">Van Doren et al., 2017</ref><ref type="bibr">, 2021)</ref>, systems to forecast migration <ref type="bibr">(Van Doren &amp; Horton, 2018)</ref>, and landmark findings about the declines <ref type="bibr">(Rosenberg et al., 2019)</ref> and shifting phenologies of North American birds <ref type="bibr">(Horton et al., 2020)</ref>. 
NEXRAD data have also produced insights into changing bat <ref type="bibr">(Stepanian &amp; Wainwright, 2018)</ref> and insect <ref type="bibr">(Stepanian et al., 2020)</ref> populations in North America, while similar research programs are being carried out on other continents <ref type="bibr">(Nilsson et al., 2019;</ref><ref type="bibr">Nussbaumer et al., 2019;</ref><ref type="bibr">Shamoun-Baranes et al., 2014)</ref>.</p><p>Communal roosts occur throughout the US, especially in late summer and fall <ref type="bibr">(Russell &amp; Gauthreaux, 1998)</ref>, and are often dominated by a single species, especially Purple Martins (Progne subis) and Tree Swallows (Tachycineta bicolor) <ref type="bibr">(Bridge et al., 2015;</ref><ref type="bibr">Laughlin et al., 2016)</ref>, depending on habitat and time of year. The roosts may also gather Barn Swallows (Hirundo rustica), Bank Swallows (Riparia riparia), and Cliff Swallows (Petrochelidon pyrrhonota), with occasional sightings of individual Northern Rough-winged Swallows (Stelgidopteryx ruficollis) and Violet-green Swallows (Tachycineta thalassina) joining roosts of the aforementioned species <ref type="bibr">(Winkler, 2006)</ref>. These species of martins and swallows are aerial insectivores, which are rapidly declining in North America <ref type="bibr">(Fraser et al., 2012;</ref><ref type="bibr">Nebel et al., 2010;</ref><ref type="bibr">Rosenberg et al., 2019)</ref>, so taxon-specific results about their ecology are of great interest. In general, swallow roosts are one of a relatively small number of radar phenomena that can be traced to family level, owing to their distinctive expanding ring pattern <ref type="bibr">(Horn &amp; Kunz, 2008;</ref><ref type="bibr">Russell &amp; Gauthreaux, 1998)</ref> (see Fig. 
<ref type="figure">1A</ref>).</p><p>The NEXRAD archive includes more than 0.5 petabytes of data and 240 million scans, each of which may contain a variety of patterns corresponding to different types of precipitation, clutter, or biological scatterers. Therefore, methods are needed to automatically recognize, discriminate, and track different types of biological scatterers to collect measurements at large scales. AI algorithms based on convolutional neural networks have shown tremendous success at related visual recognition tasks, and are excellent candidates for recognizing biological patterns in weather radar. Past work has focused on discriminating precipitation from broad-scale bird migration in radar data through the use of AI classification <ref type="bibr">(Horton et al., 2019;</ref><ref type="bibr">RoyChowdhury et al., 2016;</ref><ref type="bibr">Van Doren &amp; Horton, 2018)</ref> or segmentation <ref type="bibr">(Lin et al., 2019)</ref> methods. An AI-assisted system to monitor martin and swallow roosting activity in past, present, and future radar scans could provide information urgently needed for basic science and conservation of these species. Past studies have relied on human annotation to identify swallow roosts in radar data <ref type="bibr">(Bridge et al., 2015;</ref><ref type="bibr">Kelly &amp; Pletschet, 2017;</ref><ref type="bibr">Laughlin et al., 2013</ref><ref type="bibr">, 2016;</ref><ref type="bibr">Winkler, 2006)</ref>, but they cannot be easily repeated or expanded due to the cost of human effort.</p><p>In this paper, we investigate the design, deployment, and analysis of an AI-assisted system capable of extracting research-grade roost annotations from NEXRAD data, as shown in Figure <ref type="figure">1</ref>. Building on a prior system <ref type="bibr">(Cheng et al., 2020)</ref>, we make a number of novel contributions. 
We construct a standardized dataset of roost annotations to remove labeling style differences and to support standard training and evaluation methods. We enhance the detector neural network architecture so that it can process extra channels of radar moments at different elevations and from multiple scans. This allows the model to recognize the distinctive expanding movement dynamics of roost rings and improves the average precision at intersection-over-union threshold 0.5 (AP IoU = .50) from 48% to 56%. <ref type="foot">1</ref> We develop open-source software for automated roost detection and a user interface where humans can visualize and screen the machine predictions. We deploy the AI-assisted system on 21 years of radar scans from 12 radar stations near the Great Lakes region of the US, successfully extracting research-grade historical roost data for 8% of the radar stations in the contiguous US, which cover a region with large swallow populations. The types of biological research possible with these data are demonstrated in our case study as well as in recent research <ref type="bibr">(Belotti et al., 2023;</ref><ref type="bibr">Deng et al., 2023)</ref> that quantifies long-term phenological patterns of aerial insectivores and performs long-term analyses of the persistence of roosts. We report an estimated human annotation cost saving of ~7&#215;, model error analyses, and the applicability of the system to bats, another group of flying animals whose roosts are observable in radar data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Materials and Methods</head><p>In this section, we give preliminaries of the radar data and the rendering process (&#167; 2.1). Then, we describe our standardized dataset used for system training and evaluation, the development and deployment of our proposed AI-assisted system for roost annotation (&#167; 2.2), the incorporation of temporal information to improve the detection model (&#167; 2.3), and a biology case study to demonstrate the usefulness of our collected annotations (&#167; 2.4).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Radar data</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preliminaries</head><p>The Weather Surveillance Radar-1988 Doppler (WSR-88D, also called NEXRAD) network operated by the U.S. National Weather Service contains 143 stations in the contiguous U.S. and 16 stations in Alaska, Hawaii, and other U.S. territories. It has archived radar data products since the 1990s. The products are collected through radar volume scans ("scans"), each consisting of sweeps that collect various radar moments at several elevation angles (Fig. <ref type="figure">4A top</ref>). In each sweep, the radar antenna rotates 360&#176; around the vertical axis to sample cone-shaped "slices" of the surrounding airspace. Scans occur every 4-10 min.</p><p>Conventionally, WSR-88D radars collect 3 radar moments at various elevation angles. Reflectivity factor measures the density of objects in the atmosphere. Radial velocity measures the speed at which objects are moving relative to the radar station using the Doppler shift of the reflected radio waves. The reflectivity-weighted mean and standard deviation of this velocity are recorded as the radial velocity and spectrum width radar moments, respectively. In this paper, our detectors are trained using subsets of these products. Between 2011 and 2013, the radar stations were upgraded to also collect 3 dual polarization ("dual-pol") radar moments <ref type="bibr">(Stepanian, 2015;</ref><ref type="bibr">Stepanian et al., 2016)</ref>: differential reflectivity, differential phase, and correlation coefficient. These products are derived from both horizontally and vertically polarized radar waves and can be used effectively to identify rain <ref type="bibr">(Cheng et al., 2020;</ref><ref type="bibr">Dokter, Desmet, et al., 2018;</ref><ref type="bibr">Stepanian et al., 2016;</ref><ref type="bibr">Zrni&#263; &amp; Ryzhkov, 1998)</ref>. 
We use dual-pol products, whenever available, to reduce false roost detections caused by rain during deployment.</p><p>Rendering. WSR-88D radar sweeps produce two-dimensional arrays in polar coordinates indexed by range and azimuth (antenna pointing direction in the horizontal plane), with a fixed antenna elevation angle. We use nearest neighbor interpolation <ref type="bibr">(Parker et al., 1983)</ref> to resample the 300 &#215; 300 km region centered at the radar station in each sweep onto a fixed 600 &#215; 600 Cartesian grid, so that each pixel corresponds to 500 m. This rendering is known as a plan position indicator and corresponds to a top-down view of the cone shown at the top of Figure <ref type="figure">4A</ref>. Each radar volume scan is rendered as an array of the shape "number of radar moments &#215; number of elevations &#215; 600 &#215; 600." </p></div>
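<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rendering step can be illustrated with a minimal sketch. It is not the released rendering code: the function name render_sweep, the input layout polar[azimuth][range], and the assumptions that rays are evenly spaced in azimuth starting at north and that gates are evenly spaced in range are all ours, chosen only to show how a nearest-bin lookup maps a polar sweep onto a Cartesian grid centered at the radar.</p>
<p><code lang="python"><![CDATA[
```python
import math


def render_sweep(polar, range_step_m, grid_size=600, pixel_m=500.0,
                 fill=float("nan")):
    """Resample one sweep onto a square Cartesian grid centred on the radar.

    polar[a][r] holds the value of ray `a` (azimuth bin, clockwise from
    north) at gate `r`. Rays and gates are assumed to be evenly spaced,
    which is an illustrative simplification.
    """
    n_az = len(polar)
    n_rng = len(polar[0])
    az_step = 360.0 / n_az
    half = grid_size / 2.0
    image = [[fill] * grid_size for _ in range(grid_size)]
    for row in range(grid_size):
        # Row 0 is the northern edge of the image; y grows northward.
        y = (half - row - 0.5) * pixel_m
        for col in range(grid_size):
            x = (col + 0.5 - half) * pixel_m
            rng = math.hypot(x, y)
            # Azimuth measured clockwise from north, in [0, 360].
            az = math.degrees(math.atan2(x, y)) % 360.0
            r_idx = int(rng // range_step_m)
            a_idx = int(az // az_step) % n_az
            if r_idx < n_rng:
                # Copy the value of the polar bin containing the pixel centre.
                image[row][col] = polar[a_idx][r_idx]
    return image
```
]]></code></p>
<p>With the default arguments, the grid has 600 &#215; 600 pixels of 500 m each, which matches the 300 &#215; 300 km region described above. Pixels beyond the last gate keep the fill value.</p></div>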
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AI-assisted system: development and deployment</head><p>We release our standardized dataset v0.1.0 with roost annotations<ref type="foot">foot_3</ref> together with the scripts and library<ref type="foot">foot_4</ref> needed to develop and deploy machine learning models for automatically recognizing roosts. These reusable resources can continue to be enriched with newly annotated data, since hundreds of radar stations perform new scans every few minutes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Roost dataset preparation and release</head><p>We obtain the same training, validation, and testing splits of 88 972 radar scans used by <ref type="bibr">Cheng et al. (2020)</ref> and convert them into the COCO <ref type="bibr">(Lin et al., 2014)</ref> format that is commonly adopted for computer vision datasets. After removing the scans with rendering errors, the three splits have 53 266, 11 599, and 23 587 scans, respectively, and 88 452 scans in total. These radar scans were manually annotated by different annotators for prior ecological research <ref type="bibr">(Laughlin et al., 2016)</ref>. The training, validation, and testing splits have 37 619, 5139, and 10 942 roost labels, respectively. Each label records the position and radius of a circle that approximates the roost. We convert the circle labels into their bounding boxes. Since the annotators have different annotation styles, <ref type="bibr">Cheng et al. (2020)</ref> propose a latent-variable model and an expectation-maximization algorithm <ref type="bibr">(Dempster et al., 1977)</ref> to jointly learn a detection model and annotator-specific scaling factors. We adopt their learned factors to scale and standardize the annotations in our dataset.</p></div>
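<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conversion from circle labels to bounding boxes can be sketched as follows. The function names, the single roost category id, and in particular the way the annotator factor is applied (multiplying the radius) are illustrative assumptions; the text above does not specify how the learned factors enter the conversion. The output uses the COCO convention of [x_min, y_min, width, height] boxes.</p>
<p><code lang="python"><![CDATA[
```python
def circle_to_coco_bbox(cx, cy, radius, scale=1.0):
    """Return the COCO-style box [x_min, y_min, width, height] of a circle.

    `scale` stands in for an annotator-specific factor; multiplying the
    radius by it is an assumption made for this sketch.
    """
    r = radius * scale
    return [cx - r, cy - r, 2 * r, 2 * r]


def make_coco_annotation(ann_id, image_id, cx, cy, radius, scale=1.0):
    """Build one COCO annotation record for a roost circle label."""
    bbox = circle_to_coco_bbox(cx, cy, radius, scale)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,  # hypothetical id for the single "roost" class
        "bbox": bbox,
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
    }
```
]]></code></p>
<p>For example, a circle of radius 20 centered at (300, 300) with a factor of 0.5 yields the box [290, 290, 20, 20].</p></div>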
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Roost detector training and configuration</head><p>We train a Faster R-CNN detector with a ResNet101-FPN backbone and 45 anchors ranging from 16&#215;16 to 512&#215;512. The backbone is pretrained on ImageNet classification and MS-COCO detection. The detector takes three channels as input, for which we selected reflectivity at 0.5&#176; and 1.5&#176; and radial velocity at 0.5&#176;. We scale the three-channel input to 1200 &#215; 1200 since the Faster R-CNN detector that we use performs better on images of this size. The deployed detector achieves an average precision of 48.74 at intersection-over-union threshold 0.5 (AP IoU = .50) on the test set described in AI-Assisted System: Development and Deployment. During deployment, we keep the 100 top-scoring roost detections with predicted probability scores of at least 0.05 in each scan for further processing. We select this low score threshold to ensure high recall, so that the predictions capture almost all roosts and can later be screened to remove false positives. See Appendix B for preliminaries about object detection with neural network models and Appendix E for ablation studies that support our design choices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tracking</head><p>We follow <ref type="bibr">(Cheng et al., 2020)</ref> and employ a simple greedy heuristic to associate single-frame detections into tracks <ref type="bibr">(Ren, 2008)</ref>. We start with high-scoring detections and add unmatched detections in neighboring frames with high overlap. We apply the Kalman filter <ref type="bibr">(Kalman, 1960)</ref> to smooth the roost tracks using a linear dynamical system for the bounding box center and radius. The linear system captures the dynamics of roost formation and expansion with parameters estimated from the ground truth annotations (e.g., the rate of expansion of roost bounding boxes).</p></div>
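<div xmlns="http://www.tei-c.org/ns/1.0"><p>A greedy association of this kind can be sketched as below. This is an illustration under our own assumptions, not the deployed tracker: the function names, the (x_min, y_min, x_max, y_max) box layout, and the overlap threshold of 0.3 are hypothetical, only directly adjacent frames are searched, and the Kalman smoothing step is omitted.</p>
<p><code lang="python"><![CDATA[
```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union


def greedy_tracks(frames, min_iou=0.3):
    """Link detections into tracks, seeding from the highest scores.

    frames: one list per scan, in time order, of (box, score) tuples.
    Returns tracks as sorted lists of (frame_index, detection_index).
    """
    used = set()
    seeds = sorted(
        ((score, f, d)
         for f, dets in enumerate(frames)
         for d, (_, score) in enumerate(dets)),
        reverse=True,
    )
    tracks = []
    for _, f, d in seeds:
        if (f, d) in used:
            continue
        used.add((f, d))
        track = [(f, d)]
        for step in (1, -1):  # extend forward in time, then backward
            cur_f, cur_box = f, frames[f][d][0]
            while 0 <= cur_f + step < len(frames):
                nxt = cur_f + step
                best, best_iou = None, min_iou
                for j, (box, _) in enumerate(frames[nxt]):
                    if (nxt, j) in used:
                        continue
                    overlap = iou(cur_box, box)
                    if overlap >= best_iou:
                        best, best_iou = j, overlap
                if best is None:
                    break  # no sufficiently overlapping detection
                used.add((nxt, best))
                track.append((nxt, best))
                cur_f, cur_box = nxt, frames[nxt][best][0]
        tracks.append(sorted(track))
    return tracks
```
]]></code></p>
<p>Seeding from the highest-scoring detections means that confident detections anchor the tracks, and weaker detections are attached only when they overlap an existing track in a neighboring frame.</p></div>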
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Post-processing</head><p>As an optional step, when additional information about rain and wind farms is available, we apply a post-processing step to identify tracks with false-positive detections due to these sources. We consider a detection to be rain and remove it if the majority of pixels in its bounding box have a co-polar cross-correlation coefficient &#961;HV &gt; 0.95, following the common rule for identifying rain <ref type="bibr">(Dokter, Desmet, et al., 2018)</ref>. We check whether the detections correspond to wind farms using known turbine locations from the U.S. Wind Turbine Database <ref type="bibr">(Hoen et al., 2019)</ref>, and if so, mark them as false positives.</p></div>
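<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rain rule can be written as a short check. The sketch below assumes a rendered correlation-coefficient image indexed as rho_hv[row][col] and a box in pixel indices with exclusive upper bounds; the function name is hypothetical. Treating missing (NaN) pixels as non-rain is also our assumption, since the handling of missing values is not described above.</p>
<p><code lang="python"><![CDATA[
```python
def is_rain(rho_hv, box, threshold=0.95):
    """Return True if most pixels in the box exceed the rain threshold.

    rho_hv: 2D list of correlation coefficients, indexed [row][col].
    box: (x_min, y_min, x_max, y_max) in pixel indices, max exclusive.
    NaN pixels never satisfy `v > threshold`, so they count as non-rain.
    """
    x0, y0, x1, y1 = box
    values = [rho_hv[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    if not values:
        return False
    rainy = sum(1 for v in values if v > threshold)
    # "Majority" is read as strictly more than half of the pixels.
    return rainy > len(values) / 2
```
]]></code></p></div>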
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Deployment</head><p>We deployed the above-described models to collect information about swallow and martin roosts using data from 12 radar stations in the Great Lakes region (see Fig. <ref type="figure">2</ref>). The radar stations include KAPX, KBUF, KCLE, KDLH, KDTX, KGRB, KGRR, KIWX, KLOT, KMKX, KMQT, and KTYX. We use this deployment experiment to verify the effectiveness of the AI system for studying roosts and as a baseline for further improving the AI system.</p><p>Swallow and martin roosts are usually detected by radars during their early morning dispersal, from mid-June to mid-September, after the birds have finished breeding and before they start their southward migration. Previous studies suggest that radar signatures found during this time period in North America are most likely the result of a Hirundinidae roost dispersal (<ref type="bibr">Bridge et al., 2015</ref>; <ref type="bibr">Kelly &amp; Pletschet, 2017</ref>). We thus downloaded scans from 30 min before to 90 min after local sunrise between June 1 and October 31 of the 21 years ranging from 2000 to 2020. Each station's local sunrise times are obtained using the PyEphem <ref type="bibr">(Rhodes, 2011)</ref> and pytz Python packages. In each 120-min window, we set 41 reference times spaced 3 min apart; for each reference time, we selected the closest scan to render (Radar Data) and process, or none if there were no scans within 3 min of that time.</p></div>
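<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scan selection can be sketched with the standard library alone. The sunrise time is taken as an input here (the sketch does not call PyEphem), the function name select_scans is hypothetical, and reading "within 3 min" as an inclusive bound is an assumption.</p>
<p><code lang="python"><![CDATA[
```python
from datetime import datetime, timedelta


def select_scans(sunrise, scan_times, before_min=30, after_min=90,
                 step_min=3, tol_min=3):
    """Pick the scan closest to each reference time around sunrise.

    Returns one entry per reference time: the chosen scan time, or None
    when no scan lies within `tol_min` minutes (inclusive, by assumption).
    """
    start = sunrise - timedelta(minutes=before_min)
    n_ref = (before_min + after_min) // step_min + 1  # 41 with the defaults
    tol = timedelta(minutes=tol_min)
    selected = []
    for i in range(n_ref):
        ref = start + timedelta(minutes=i * step_min)
        closest = min(scan_times, key=lambda t: abs(t - ref), default=None)
        if closest is not None and abs(closest - ref) <= tol:
            selected.append(closest)
        else:
            selected.append(None)
    return selected
```
]]></code></p>
<p>Because scans arrive every 4-10 min while reference times are 3 min apart, the same scan can be the closest one for two neighboring reference times. The sketch leaves such repeats in place; whether they are removed in the deployed pipeline is not stated above.</p></div>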
<div xmlns="http://www.tei-c.org/ns/1.0"><head>User interface and screening</head><p>We built a web-based interface to display the remaining tracks together with the underlying radar imagery for further screening by humans (Fig. <ref type="figure">3</ref>). Any track with at least 2 detections, an average detection score of at least 0.15, and at least 1 detection with a score of at least 0.5 was given the initial label of roost, considered a "high-confidence track," and displayed with full opacity; other tracks were given the initial label of non-roost, considered "low-confidence tracks," and displayed faintly with low opacity. Our biologist teammates screened the system predictions, classifying predicted tracks into 7 categories. We removed from the analysis scans that had more than half of their pixels filled with weather or with the effects of anomalous propagation (which occurs when the radar beam bends toward the ground and captures ground clutter). In these scans, we would not be able to identify roosts if they were present. In other scans, clear roosts were labeled as roost; roosts contaminated by weather, anomalous propagation, and unknown noise were labeled as weather-roost, ap-roost, and unknown-noise-roost, respectively; duplicated or incomplete tracks were labeled as duplicate and bad-track.</p></div>
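<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rule that assigns the initial label to a track combines three conditions and can be stated compactly. The function name and the representation of a track as a plain list of detection scores are illustrative assumptions; the thresholds are those given above.</p>
<p><code lang="python"><![CDATA[
```python
def initial_label(scores):
    """Return the default screening label for a track.

    scores: detection scores of all detections in the track.
    A track is "high-confidence" (label "roost") when it has at least
    2 detections, a mean score of at least 0.15, and at least one
    detection scoring 0.5 or more.
    """
    high_confidence = (
        len(scores) >= 2
        and sum(scores) / len(scores) >= 0.15
        and max(scores) >= 0.5
    )
    return "roost" if high_confidence else "non-roost"
```
]]></code></p>
<p>For instance, a track with scores 0.6 and 0.1 is shown as a roost by default, whereas a single detection with score 0.9 is not, because it fails the minimum track length.</p></div>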
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Roost detection with temporal information</head><p>Many studies <ref type="bibr">(Ribani &amp; Marengoni, 2019;</ref><ref type="bibr">Yosinski et al., 2014;</ref><ref type="bibr">Zhuang et al., 2019)</ref> have shown that pretrained networks lead to faster convergence and greater robustness to hyperparameter settings when training object detectors. However, to use networks pretrained on three-channel (RGB) images, we must account for the domain shift and the different number of channels available due to various radar products, elevation angles, and sweeps from temporally adjacent scans. We propose a learnable adaptor, which maps an arbitrary k-channel array x &#8712; R^{k&#215;n&#215;m} to a 3-channel image (see Fig. <ref type="figure">4</ref>). This is implemented as a single convolutional layer with three filters of size k&#215;1&#215;1, each of which implements a learned linear mapping from R^k to R. While other choices are possible, including a nonlinear adaptor and replicating filters in the first layer of the network to match the shape of an input with more channels, our preliminary study suggested that linear adaptors are the most effective; see <ref type="bibr">Perez and Maji (2022)</ref> for a study on how the adaptor architecture affects transfer learning. See also Section E.2 and Table E.5b for an evaluation of the benefits of pretraining.</p></div>
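<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the adaptor concrete, the sketch below applies three filters of size k&#215;1&#215;1 to a k-channel array using plain Python lists. It shows only the forward computation with fixed weights; in the actual system the weights are learned jointly with the detector. The function name and the optional bias term are illustrative assumptions.</p>
<p><code lang="python"><![CDATA[
```python
def linear_adaptor(x, weights, bias=(0.0, 0.0, 0.0)):
    """Map a k-channel array to 3 channels with a per-pixel linear map.

    x: list of k channels, each a list of n rows with m values.
    weights: 3 rows of k coefficients (one k x 1 x 1 filter per output).
    bias: one offset per output channel (zero by default; whether the
          real adaptor uses a bias is not stated in the text).
    """
    k, n, m = len(x), len(x[0]), len(x[0][0])
    out = []
    for w, b in zip(weights, bias):
        if len(w) != k:
            raise ValueError("each filter needs one weight per input channel")
        # Every output pixel is a weighted sum over the k input channels
        # at the same spatial location.
        channel = [
            [b + sum(w[c] * x[c][i][j] for c in range(k)) for j in range(m)]
            for i in range(n)
        ]
        out.append(channel)
    return out
```
]]></code></p>
<p>In a deep learning framework, the same operation would typically be a 2D convolution with k input channels, 3 output channels, and a kernel size of 1, placed in front of the pretrained backbone.</p></div>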
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Biology case studies</head><p>To demonstrate the types of ecological questions that could be answered with our machine learning pipeline, we selected a dense subset of tracks found in the Walpole Island First Nation Reserve. This region is within the ranges of both the KDTX and KCLE stations, which could result in the same roost being detected twice. For the purposes of this paper, we selected tracks captured only by KCLE. After manually screening the detections to remove contaminants, as described above, we used the bounding boxes estimated by the model to extract the raw reflectivity factor values from Level II radar data in polar coordinates (the location of each voxel or sampling volume is defined by its range and azimuth) across all elevation angles of each full scan.</p><p>The number of birds within each bounding box can be estimated using the approach established by <ref type="bibr">Chilson et al. (2012)</ref>. This method relies on two assumptions: (1) that each bounding box only contains biological scatterers and (2) that the birds are uniformly distributed within each radar sampling volume. For our analysis, we retained only roosts labeled as clear roosts (593 tracks from our case study region), thus minimizing the chance that the two assumptions were violated. We convert the equivalent reflectivity factor Z_e, originally in dBZ, to linear scale (mm^6 m^&#8722;3), and then transform it into reflectivity (&#951;), which we interpret as the density of scatterers in the atmosphere (in cm^2 km^&#8722;3). We can then multiply the reflectivity measurement at each azimuth and range interval by the theoretical volume sampled by the radar at that range. 
Finally, we divide the result by the specific radar cross section (RCS) of the bird species assumed to be found in the roost, thus obtaining an estimate of the number of birds.</p><p>To obtain conservative estimates, we used Purple Martins as our benchmark average radar cross section, since they are the largest species of Hirundinidae found in North America. The average RCS of Purple Martins can be obtained from their mass (51 g; see <ref type="bibr">Dunning, 2008</ref>) by adopting the relationship between mass and RCS proposed by <ref type="bibr">Horton et al. (2019)</ref>: log(RCS) = 0.699 &#215; log(mass). Bank Swallows (Riparia riparia), possibly the smallest species to participate in such aggregations, would yield count estimates approximately 2.6 times higher, since their mass is 13 g according to <ref type="bibr">Dunning (2008)</ref>. The volume sampled by the radar is assumed to be shaped like a truncated cone with its axis aligned with the antenna's peak power axis, cut by two parallel planes at each range gate. For data before the Super Resolution upgrade, the cone's apex angle was assumed to be 1&#176;. After the 2007-2008 upgrade, we assumed an elliptical cone of 1&#176; vertical beam width and 0.5&#176; horizontal beam width <ref type="bibr">(Torres, 2007)</ref>.</p><p>We extracted the number of birds from all sweeps available in each radar scan. In a post-processing stage, we filtered the sweeps that had height lower than 5000 m within the 150 km radius of each station. In order to capture the entire 3D structure of the roost departure, we grouped the sweeps into bins of 1&#176; according to their elevation angle and calculated the mean count within each bin. Finally, we summed the estimates from each elevation bin to get the bird count of each roost. 
This procedure aims to avoid double counting birds due to sweeps where the radar beam of 1&#176; beamwidth would sample the same area of the airspace twice (that is, if the variation in elevation between two consecutive sweeps is lower than 1&#176;).</p><p>To obtain a single estimate per roost dispersal, we further summarized the bird counts across detections by taking the mean estimate per track. We then explored phenological trends at the spatial scale of this roost, which occurred consistently from 2000 to 2020. Data were considered missing on days when scans had more than 50% weather contamination or intense anomalous propagation, or when the sampling window was shorter than 100 min. Days without detections were considered true zeros. To calculate the daily bird count within the roost for each day, we derived the maximum number of birds for each roost track. We then fit a generalized additive model (GAM) to each roost-year to model the roosting activity throughout a roosting season. We constructed GAMs with daily estimates of the number of birds as the response variable and ordinal date as the independent variable, with the smoothing parameter k set to 5 and using a quasi-Poisson distribution. We used this model construction to predict estimates throughout the season and selected the 50% passage date, i.e., the first date on which the cumulative predicted number of birds exceeded half of the yearly total, as our phenology estimate for that year.</p></div>
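<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two of the computations above lend themselves to a short worked example: the mass-to-RCS relationship and the 50% passage date. The sketch below uses hypothetical function names and takes the daily predictions as given; it does not fit the GAM. Because the relationship log(RCS) = 0.699 &#215; log(mass) has no intercept, it is equivalent to RCS = mass^0.699 for any logarithm base, and the ratio between two species does not depend on the units.</p>
<p><code lang="python"><![CDATA[
```python
def rcs_from_mass(mass_g):
    """Radar cross section implied by log(RCS) = 0.699 * log(mass)."""
    return mass_g ** 0.699


def passage_date(daily_counts, fraction=0.5):
    """First date on which the cumulative count exceeds a share of the total.

    daily_counts: list of (ordinal_date, predicted_count), sorted by date.
    Returns None when the total is zero.
    """
    total = sum(count for _, count in daily_counts)
    running = 0.0
    for day, count in daily_counts:
        running += count
        if running > fraction * total:
            return day
    return None


# Counts scale inversely with RCS, so assuming Bank Swallows (13 g)
# instead of Purple Martins (51 g) multiplies the estimate by this ratio.
count_ratio = rcs_from_mass(51) / rcs_from_mass(13)
```
]]></code></p>
<p>The resulting ratio is approximately 2.6, in agreement with the factor quoted above for Bank Swallows.</p></div>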
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>We report the results of our deployed AI system (&#167; 3.1), the improvements of our detection model with temporal information (&#167; 3.2), and biological case studies (&#167; 3.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>System deployment</head><p>Here we report the results of the deployment experiment described in AI-Assisted System: Development and Deployment. Table <ref type="table">1</ref> shows statistics for the results of the automated steps. There were six station-years for which data were not available or could not be rendered: the year 2000 for KDLH and KGRB and the years 2001-2004 for KTYX. For the remaining station-years, 612 786 scans were successfully rendered in total. The automated annotation steps predicted 31 313 "high-confidence tracks" assembled from 140 036 single-scan detections and 230 088 "low-confidence tracks" with 372 594 single-scan detections.</p><p>Table <ref type="table">2</ref> shows the statistics for human-screened results. After human screening of machine predictions, we identified 13 860 clean roost tracks, 477 roost tracks with minor weather contamination, 100 roost tracks with minor anomalous propagation, and 1191 with minor unknown noise. These four categories together produced 15 628 roost tracks (64 620 detections) that can be used for ornithology research. Other machine-predicted tracks are considered false-positive predictions.</p><p>Among the "high-confidence" machine-predicted tracks, 12 025 tracks (56 354 detections) were marked as one of the four roost categories. Among the "low-confidence" tracks that are by default displayed as "non-roost", only 3603 tracks (8266 detections) out of 230 088 tracks (372 594 detections) needed to be marked as roosts; the remainder did not require screeners to change their labels.</p><p>Table <ref type="table">1</ref>. Statistics for rendered scans, system predictions, and time needed to screen the predictions. System-predicted detections and tracks are either of high or of low confidence and displayed as roost or non-roost by default in the screening interface; see AI-Assisted System: Development and Deployment for details. 
The bounding boxes are shown with high and low opacity, respectively, to ease screening. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Savings in human labeling efforts</head><p>Our AI system reduces human labor to screening system predictions and makes it tractable to obtain high-quality roost annotations. The screening amounts to 183.6 annotator hours and an average of 17.6 seconds per radar-station day (see Table <ref type="table">1</ref>). If the annotators were to find roosts, draw bounding boxes for all roosts, and assemble them into tracks completely manually, the annotation process would be significantly more time-consuming and error-prone. For instance, we estimate a reduction from 31.4 weeks to 4.6 weeks of full-time work at 40 hours/week relative to fully manual annotation (1254.6 &#8594; 183.6 h, a 6.8&#215; saving), considering the same number of screened days (from June to October) and the 246 station-years from the Great Lakes (see Table <ref type="table">1</ref>). We calculate the total time of the annotation process from scratch (i.e., 1254.6 h) using a measured annotation time of 1 hour per station-month (120 seconds per day) and extrapolating to the complete screening period of our deployed system (153 days for each of 246 station-years).</p></div>
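<div xmlns="http://www.tei-c.org/ns/1.0"><p>The figures in this paragraph follow from a few lines of arithmetic, reproduced below as a check. All inputs are numbers quoted in the text; the variable names are ours.</p>
<p><code lang="python"><![CDATA[
```python
STATION_YEARS = 246            # 12 stations x 21 years, minus 6 missing
DAYS_PER_SEASON = 153          # June 1 to October 31
MANUAL_SECONDS_PER_DAY = 120   # 1 hour per station-month
SCREENING_HOURS = 183.6        # measured screening time
RENDERED_SCANS = 612786
HOURS_PER_WEEK = 40

station_days = STATION_YEARS * DAYS_PER_SEASON
manual_hours = station_days * MANUAL_SECONDS_PER_DAY / 3600
saving_factor = manual_hours / SCREENING_HOURS
manual_weeks = manual_hours / HOURS_PER_WEEK
screening_weeks = SCREENING_HOURS / HOURS_PER_WEEK
seconds_per_station_day = SCREENING_HOURS * 3600 / station_days
seconds_per_scan = SCREENING_HOURS * 3600 / RENDERED_SCANS

print(f"manual annotation: {manual_hours:.1f} h ({manual_weeks:.1f} weeks)")
print(f"screening: {SCREENING_HOURS:.1f} h ({screening_weeks:.1f} weeks)")
print(f"saving: {saving_factor:.1f}x")
print(f"{seconds_per_station_day:.1f} s per station-day, "
      f"{seconds_per_scan:.2f} s per scan")
```
]]></code></p></div>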
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Roost detection using temporal information</head><p>A significant technical advance was to incorporate temporal information from past frames to capture roost dynamics. When viewed in a size-constrained window, rain can take many different shapes, including the appearance of a ring-shaped roost. However, the movement of weather differs from that of roosts: rain often moves along a straight trajectory, while roosts diverge from a point. Temporal information reduces false positives due to small patches of rain with a roost-like appearance (see Fig.). The best-performing temporal model overall included the adaptor layer and used as input 3 consecutive time frames, x_{t&#8722;2}, x_{t&#8722;1}, and x_t, with three input channels per frame: reflectivity at 0.5&#176;, reflectivity at 1.5&#176;, and radial velocity at 0.5&#176; (see Fig. <ref type="figure">4</ref>). These 9 total input channels were rendered at size 1100 &#215; 1100 as the inputs to a Faster R-CNN detection model with a linear adaptor and a pretrained ResNet101-FPN backbone, which was then trained for more than 40k iterations with a batch size of 4 samples. See Appendix E for ablations of the roost detector experiments and Appendix F for qualitative results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Further savings in human efforts</head><p>We manually screen the station-year KTYX 2020 as a pilot study to measure the time savings obtained when using the predictions of the temporal detector. The time needed to extract phenology information fell from 1.17 h to 30 min, suggesting a further reduction in labeling effort of 2.3&#215; when temporal information is included in the pipeline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Biology case studies results</head><p>Figure <ref type="figure">5A</ref> shows the locations of the first detection in each of the 702 tracks, colored by year, in the study area at Walpole Island around Lake Saint Clair. Martins and swallows gathered in this region to roost every year from 2000 to 2021, with an average of approximately 33 days of detections per year. The yearly maximum number of days with detections occurred in 2019, when birds congregated around the lake on 42 days between July 23 and September 8.</p><p>The peak timing (50% passage date) for roosting activity in the study area was 20 August in 2000 and 21 August in 2020 (Fig. <ref type="figure">5B</ref> and <ref type="figure">C</ref>). The average peak number of birds detected per year in the region throughout our study period was 73 925 birds (SD = 37 743). The highest peak occurred in 2010, when our estimate of the number of birds reached 156 864 on August 8. In contrast, the year with the lowest peak estimate was 2013, when we detected at most 23 949 birds on August 9.</p></div>
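<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the 50% passage date, the sketch below returns the first date on which the cumulative bird count reaches half of the seasonal total. This is one common way to define a passage date and is an assumption here, since this section does not spell out the exact procedure. The function name is hypothetical and the daily counts are made up for illustration; they are not data from the study.</p>
```python
from datetime import date


def passage_date(daily_counts, fraction=0.5):
    """Return the first date on which the cumulative count reaches
    `fraction` of the seasonal total.

    `daily_counts` is a list of (date, count) pairs in any order.
    """
    ordered = sorted(daily_counts)
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("no birds counted in this season")
    running = 0
    for day, count in ordered:
        running += count
        if running >= fraction * total:
            return day
    return ordered[-1][0]


# Made-up counts for illustration only (not data from the study).
season = [
    (date(2020, 8, 18), 1000),
    (date(2020, 8, 19), 3000),
    (date(2020, 8, 20), 4000),
    (date(2020, 8, 21), 5000),
    (date(2020, 8, 22), 2000),
]
print(passage_date(season))
```
<p>With these toy counts the seasonal total is 15 000 birds, and the cumulative count first reaches 7 500 on 20 August.</p></div>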
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>False positives due to noisy evaluation data. Future research can focus on improving an AI system through careful error analysis. Figure <ref type="figure">6B</ref> shows the distribution of top-scoring false positives (detections with a score of 0.5 or higher) for the temporal detector. Localization errors (0 &lt; IoU &lt; 0.5) account for nearly 44% of the false positives, while detections on background regions (IoU = 0), frequently due to weather, static structures, or anomalous propagation, account for the remaining 56%. Figure <ref type="figure">6A</ref> shows the precision-recall curves obtained by varying the overlap threshold at which a detection is considered a true positive. As the IoU threshold is reduced, the measured performance of the detector improves from 56.3% AP at IoU = 0.50 to 66.4% at IoU = 0.40 and 73.5% at IoU = 0.20, while the detector still produces useful roost detections. See examples of localization errors in Figure <ref type="figure">7A-C</ref>, missed roosts in Figure <ref type="figure">7D</ref>, and detections on background regions in Figure <ref type="figure">7E</ref>.</p><p>Another source of false positives is missing roost annotations. We perform a manual inspection of all nonoverlapping false positives (IoU = 0 with any annotation) at a recall of 20% and find 94% of these to be actual roosts, which increases the measured precision of our detector from 88% to 92%.</p></div>
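<div xmlns="http://www.tei-c.org/ns/1.0"><p>The error categories used above follow directly from the intersection-over-union (IoU) between a detection and the annotations. The sketch below computes IoU for axis-aligned bounding boxes and assigns a detection to one of the three categories. It is a simplified illustration: the function names and box coordinates are hypothetical, and a full average-precision evaluation would also rank detections by score and match each annotation at most once.</p>
```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def categorize(detection, annotations, threshold=0.5):
    """Label a detection by its best overlap with any annotated roost."""
    best = max((box_iou(detection, ann) for ann in annotations), default=0.0)
    if best >= threshold:
        return "true positive"
    if best > 0:
        return "localization error"
    return "background"


# Hypothetical boxes in pixel coordinates, for illustration only.
annotations = [(0, 0, 10, 10)]
shifted = (5, 0, 15, 10)      # overlaps half of the annotation, IoU = 1/3
far_away = (20, 20, 30, 30)   # no overlap with any annotation

print(categorize(shifted, annotations, threshold=0.5))
print(categorize(shifted, annotations, threshold=0.2))
print(categorize(far_away, annotations))
```
<p>The shifted box is a localization error at a threshold of 0.5 but counts as a true positive at 0.2, which is why the measured AP rises as the overlap threshold is relaxed.</p></div>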
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sources of false negatives</head><p>We found several sources of false negatives, which are roosts that were annotated by the researchers who originally created our training dataset <ref type="bibr">(Laughlin et al., 2016)</ref> but were missed by our system. The first source is radar scans that contained roosts together with large amounts of other noise such as anomalous propagation, ground clutter, or other biological scatterers. Figure <ref type="figure">8A</ref> shows an example of three consecutive frames with a single annotated roost (green) amidst anomalous propagation and ground clutter with reflectivity values so high that the model is unable to discriminate the roost from the background. In this case, the human labeler likely relied on contextual information such as the presence of roosts in the same location across many frames and days (see more examples of noisy roosts in Figure <ref type="figure">7D</ref>). Also note that the model successfully detects roosts in radar scans with anomalous propagation when the reflectivity values are not so high.</p><p>[Figure caption: High-scoring detections on rain show a similar morphology as roosts in a single frame, but the temporal model is able to reduce many of these false positives. We show human annotations in green and predictions by the model in red.]</p><p>Another source of false negatives is roosts that lacked the prototypical ring shape. Figure <ref type="figure">8B</ref> shows annotations for a non-ring-shaped roost in three consecutive frames. These are encountered rarely in the dataset, and their morphology can easily be confused with weather and other biology (see more examples of roosts without ring shape in Fig. <ref type="figure">7D</ref>). 
Figure F.2c shows an example where the model detects a roost without the usual ring shape; in that example, however, there is a higher correspondence between consecutive frames and a more consistent diverging pattern than in the example in Figure <ref type="figure">8B</ref>. Detection of non-ring-shaped roosts could be complicated by the fact that annotators were required to use a circle to annotate roosts and often adopted different labeling styles, especially for roosts that were not ring shaped <ref type="bibr">(Cheng et al., 2020)</ref>.</p><p>One more source of false negatives is bird roosts that appear too close to each other in the radar scan. As shown in Figure <ref type="figure">8C</ref>, the model detects only one of the two annotated roosts. Again, the human annotators may have used complex contextual information to interpret these overlapping patterns as two different roosts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Future research for AI-assisted roost detection</head><p>There are a number of future directions to improve an AI-assisted system for roost measurement. Iterative refinements to an existing model can often improve performance significantly over time, especially when driven by a careful error analysis. Adding new sources of information, including new input features, training signals, or training labels, together with data curation efforts to improve the quality of training data, are among the most promising ways to improve performance. In the specific context of our work, adding the results of human-screened predictions back into the model as training data may provide a significant boost in model performance to scale our model to larger geographical regions with less human effort.</p><p>Future research can incorporate additional sources of information related to the contextual cues that humans use to detect roosts and discriminate them from other patterns in radar images. One such cue is the persistence of roosts in the same or similar locations across days and years, which may allow humans to confidently detect roosts in noisy radar scans including high-reflectivity anomalous propagation, weather, or other biological scatterers. A model could be given input features from radar scans in previous days or years, similar to the way we added within-day temporal information in this paper; or, it could be given historical detections of a simpler roost detection model <ref type="bibr">(Zhou et al., 2020)</ref>. We found that rain is a persistent source of false positives, even though humans have a relatively easy time distinguishing rain from roosts using the full context of a radar image sequence, and AI models can successfully discriminate rain from broad-scale bird migration <ref type="bibr">(Lin et al., 2019)</ref>. 
There may be a mismatch between the "local" scale of an object detection model and the broader scope of contextual information needed to discriminate rain. A possible remedy is to train a multihead model to jointly perform roost detection together with another task such as rain segmentation <ref type="bibr">(He et al., 2017;</ref><ref type="bibr">Kirillov et al., 2018;</ref><ref type="bibr">Li et al., 2022;</ref><ref type="bibr">Shen et al., 2021)</ref>, or provide the predictions of a rain segmentation model <ref type="bibr">(Lin et al., 2019)</ref> as input to the roost detection model.</p><p>Ultimately, the goal is to collect roost measurements from a very large but finite set of images. Future research should focus on effective ways to combine human effort, computational effort, AI training, and statistical estimation to achieve the desired scientific outcomes. For example, what is the tolerance of measurements of swallow phenology or population declines to an AI system with a certain performance level? Is human screening of outputs required? What are the most effective strategies for interleaving human annotation and model training to analyze a very large image data set? An interesting research direction is to pose the scientific question (e.g., how many roosts) as a statistical estimation problem and to consider statistical estimators that can give high confidence bounds after examining only a subset of the images <ref type="bibr">(Meng et al., 2021;</ref><ref type="bibr">Perez et al., 2023)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Broader directions for biological recognition</head><p>There are several broader directions in the recognition of biological patterns in radar data that can be informed by our work. Our roost-detection model can serve as a starting point for AI models to detect and track related taxa-specific biological phenomena in radar data, including bat roosts <ref type="bibr">(Stepanian &amp; Wainwright, 2018)</ref>, roosts of non-swallow bird species <ref type="bibr">(Van Den Broeke, 2019)</ref>, mayfly hatches <ref type="bibr">(Stepanian et al., 2020)</ref>, and waterfowl <ref type="bibr">(O'Neal et al., 2010)</ref>. Of these, bat roosts are most similar to swallow roosts and may require the least adaptation of the model. As a proof of concept, we deploy the best model found in Section E.2 on bat roosts around the KEWX radar station in Texas. We process scans from 90 min before local sunset to 150 min after sunset in June 2012. We observe that our system, developed using bird roost data, performs reasonably well at detecting and tracking bats without any customization (Fig. <ref type="figure">9</ref>). Roosts of other bird species including robins, blackbirds, starlings, and waterfowl are also visible on radar <ref type="bibr">(Russell et al., 1998)</ref>, but are often less obvious for humans to discern and usually lack the distinct "expanding ring" pattern of swallow roosts, probably due to differences in roost emergence behavior. Mayfly hatches have a distinct appearance that differs considerably from that of bird roosts. The automatic detection of these phenomena is an interesting frontier for AI methods in radar aeroecology. Whether it is possible to distinguish among roosts of different swallow species, e.g., Purple Martins and Tree Swallows, from fine-grained radar characteristics is an interesting open question that could have important biological implications. 
This task could potentially be accomplished, for example, by pairing radar detections from our system with records from large-scale citizen science datasets. Finally, extending these models to radar networks outside the US could provide information to track bird species beyond national borders.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Average precision is an evaluation metric commonly used in the visual object detection literature in computer vision and is described in the Evaluation paragraph of Appendix B.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>&#169; 2024 The Authors. Remote Sensing in Ecology and Conservation published by John Wiley &amp; Sons Ltd on behalf of Zoological Society of London.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>20563485, 2024, 5, Downloaded from https://zslpublications.onlinelibrary.wiley.com/doi/10.1002/rse2.388, Wiley Online Library on [01/12/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_3"><p>https://github.com/darkecology/roost-dataset.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_4"><p>https://github.com/darkecology/roost-system/tree/fd2530fa8dba59a976da815c43dff9da8ad8e09d.</p></note>
		</body>
		</text>
</TEI>
