skip to main content


Title: A Historical and Policy Perspective on Broadening Participation in STEM: Insights from National Reports (1974-2016)
Over the last 40 years, more than 25 national reports have been published focused on broadening participation in science, technology, engineering and mathematics (STEM). Although scholarly literature oftentimes serves as one source of information on how to move forward, national reports—produced by organizations, such as the National Academy of Engineering (NAE) and the National Society of Black Engineers (NSBE), and committees, such as the Committee on Women in Science and Engineering (CWSE)—are an underutilized source of insights. This paper presents the results of a quasi-umbrella review of 29 national reports published during 1974–2016. The reports in this analysis included 134 unique recommendations, which were synthesized into four themes, broadly labeled: (1) Practices & Policies, (2) Culture & Climate, (3) Information & Knowledge, and (4) Investments & Commitments. These recommendations have implications for a wide range of stakeholders interested in addressing this longstanding problem, and the findings of this study provide a historical and policy perspective that is useful for informing next steps that will ideally lead to the forms of progress that have been long awaited.  more » « less
Award ID(s):
1647327
NSF-PAR ID:
10080892
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2018 CoNECD - The Collaborative Network for Engineering and Computing Diversity Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. A CSV version of the annotation file is also available which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 total annotated regions with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact,) 6,222 are carcinogenic signs (inflammation, nonneoplastic and suspicious,) and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma in situ.) The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted for both the development and evaluation sets, while the remain 34 were allotted for train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository -facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnoses model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67 104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www. isip.piconepress.com/projects/nsf_dpath/. [3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036-5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1 7. https://www.isip. piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
    more » « less
  2. Background: Recent reports have raised concern about the risk of vessel wall injury (VWI) when pulling out current laser-cut stent retrievers with active strut apposition to the vessel walls. Development of braided stroke thrombectomy-assist devices for use in conjunction with aspiration systems may be gentler in the internal carotid (ICA) and basilar vessels (with regards to radial force) compared to existing laser-cut stent retrievers. Methods: Radial force (RF) bench testing was performed using a radial compression station (Blockwise Engineering, Phoenix, AZ). The average total radial force (RF) in Newtons (N) generated (average of 3 readings) in vessel diameters (d) (Range 3.25 to 4.00mm) seen in proximal LVOs of the anterior circulation (such as in the internal carotid artery - ICA), and vessel diameters (d) (Range 2.50 to 3.25mm) seen in the posterior circulation (such as in the basilar artery - BA) was measured. The Solitaire Platinum Revascularization Device (Medtronic, Irvine, CA) was used as the predicate device. All thrombectomy and thrombectomy-assist devices were compared in terms of the RF being higher or lower (%) to the predicate device. Results: The results of the radial force testing are shown in the table below. The total radial force (RF) of the SHELTER® Retriever (part of Insera System, Insera Therapeutics, Inc., Dallas, TX), a braided thrombectomy-assist device is significantly lower (@ d=2.5mm: 58%) than the predicate device (@ d=2.5mm: 100%) and other laser-cut stent retrievers (@ d=2.5mm: 103% to 152%). Thrombectomy devices with lower OD had higher radial forces than larger devices. Conclusion: Novel braided stroke thrombectomy-assist devices for use in conjunction with aspiration systems have lower radial force compared to existing laser-cut stent retrievers in the ICA and BA vessel diameters. Further studies in-vivo need to assess the impact of lower radial force on minimizing VWI. Funding Source: This study was funded in part by a research grant (NSF Award: 1819491; PI: Vallabh Janardhan, MD) from the National Science Foundation (NSF). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Conference Proceeding: This paper was presented in part at the 2018 Annual Meeting of the Society of Vascular & Interventional Neurology (SVIN), November 14-17, 2018 in San Diego, CA 
    more » « less
  3. Nicewonger, Todd E. ; McNair, Lisa D. ; Fritz, Stacey (Ed.)
    https://pressbooks.lib.vt.edu/alaskanative/ At the start of the pandemic, the editors of this annotated bibliography initiated a remote (i.e., largely virtual) ethnographic research project that investigated how COVID-19 was impacting off-site modular construction practices in Alaska Native communities. Many of these communities are located off the road system and thus face not only dramatically higher costs but multiple logistical challenges in securing licensed tradesmen and construction crews and in shipping building supplies and equipment to their communities. These barriers, as well as the region’s long winters and short building seasons, complicate the construction of homes and related infrastructure projects. Historically, these communities have also grappled with inadequate housing, including severe overcrowding and poor-quality building stock that is rarely designed for northern Alaska’s climate (Marino 2015). Moreover, state and federal bureaucracies and their associated funding opportunities often further complicate home building by failing to accommodate the digital divide in rural Alaska and the cultural values and practices of Native communities.[1] It is not surprising, then, that as we were conducting fieldwork for this project, we began hearing stories about these issues and about how the restrictions caused by the pandemic were further exacerbating them. Amidst these stories, we learned about how modular home construction was being imagined as a possible means for addressing both the complications caused by the pandemic and the need for housing in the region (McKinstry 2021). As a result, we began to investigate how modular construction practices were figuring into emergent responses to housing needs in Alaska communities. We soon realized that we needed to broaden our focus to capture a variety of prefabricated building methods that are often colloquially or idiomatically referred to as “modular.” This included a range of prefabricated building systems (e.g., manufactured, volumetric modular, system-built, and Quonset huts and other reused military buildings[2]). Our further questions about prefabricated housing in the region became the basis for this annotated bibliography. Thus, while this bibliography is one of multiple methods used to investigate these issues, it played a significant role in guiding our research and helped us bring together the diverse perspectives we were hearing from our interviews with building experts in the region and the wider debates that were circulating in the media and, to a lesser degree, in academia. The actual research for each of three sections was carried out by graduate students Lauren Criss-Carboy and Laura Supple.[3] They worked with us to identify source materials and their hard work led to the team identifying three themes that cover intersecting topics related to housing security in Alaska during the pandemic. The source materials collected in these sections can be used in a variety of ways depending on what readers are interested in exploring, including insights into debates on housing security in the region as the pandemic was unfolding (2021-2022). The bibliography can also be used as a tool for thinking about the relational aspects of these themes or the diversity of ways in which information on housing was circulating during the pandemic (and the implications that may have had on community well-being and preparedness). That said, this bibliography is not a comprehensive analysis. Instead, by bringing these three sections together with one another to provide a snapshot of what was happening at that time, it provides a critical jumping off point for scholars working on these issues. The first section focuses on how modular housing figured into pandemic responses to housing needs. In exploring this issue, author Laura Supple attends to both state and national perspectives as part of a broader effort to situate Alaska issues with modular housing in relation to wider national trends. This led to the identification of multiple kinds of literature, ranging from published articles to publicly circulated memos, blog posts, and presentations. These materials are important source materials that will likely fade in the vastness of the Internet and thus may help provide researchers with specific insights into how off-site modular construction was used – and perhaps hyped – to address pandemic concerns over housing, which in turn may raise wider questions about how networks, institutions, and historical experiences with modular construction are organized and positioned to respond to major societal disruptions like the pandemic. As Supple pointed out, most of the material identified in this review speaks to national issues and only a scattering of examples was identified that reflect on the Alaskan context. The second section gathers a diverse set of communications exploring housing security and homelessness in the region. The lack of adequate, healthy housing in remote Alaska communities, often referred to as Alaska’s housing crisis, is well-documented and preceded the pandemic (Guy 2020). As the pandemic unfolded, journalists and other writers reported on the immense stress that was placed on already taxed housing resources in these communities (Smith 2020; Lerner 2021). The resulting picture led the editors to describe in their work how housing security in the region exists along a spectrum that includes poor quality housing as well as various forms of houselessness including, particularly relevant for the context, “hidden homelessness” (Hope 2020; Rogers 2020). The term houseless is a revised notion of homelessness because it captures a richer array of both permanent and temporary forms of housing precarity that people may experience in a region (Christensen et al. 2107). By identifying sources that reflect on the multiple forms of housing insecurity that people were facing, this section highlights the forms of disparity that complicated pandemic responses. Moreover, this section underscores ingenuity (Graham 2019; Smith 2020; Jason and Fashant 2021) that people on the ground used to address the needs of their communities. The third section provides a snapshot from the first year of the pandemic into how CARES Act funds were allocated to Native Alaska communities and used to address housing security. This subject was extremely complicated in Alaska due to the existence of for-profit Alaska Native Corporations and disputes over eligibility for the funds impacted disbursements nationwide. The resources in this section cover that dispute, impacts of the pandemic on housing security, and efforts to use the funds for housing as well as barriers Alaska communities faced trying to secure and use the funds. In summary, this annotated bibliography provides an overview of what was happening, in real time, during the pandemic around a specific topic: housing security in largely remote Alaska Native communities. The media used by housing specialists to communicate the issues discussed here are diverse, ranging from news reports to podcasts and from blogs to journal articles. This diversity speaks to the multiple ways in which information was circulating on housing at a time when the nightly news and radio broadcasts focused heavily on national and state health updates and policy developments. Finding these materials took time, and we share them here because they illustrate why attention to housing security issues is critical for addressing crises like the pandemic. For instance, one theme that emerged out of a recent National Science Foundation workshop on COVID research in the North NSF Conference[4] was that Indigenous communities are not only recovering from the pandemic but also evaluating lessons learned to better prepare for the next one, and resilience will depend significantly on more—and more adaptable—infrastructure and greater housing security. 
    more » « less
  4. Obeid, Iyad ; Picone, Joseph ; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. The data is organized by tissue type as shown below: Filenames: tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx Explanation: tudp: root directory of the corpus v1.0.0: version number of the release svs: the image data type gastro: the type of tissue 000001: six-digit sequence number used to control directory complexity 00123456: 8-digit patient MRN 2015_03_05: the date the specimen was captured 0s15_12345: the clinical case name 0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000) 0s15_12345_00123456.docx: the filename for the corresponding case report We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity. Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019 ~20,000 slides we scanned. In the past 6 months from May to November we have tripled that number and how hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process, it takes a couple of hours to sort, clean and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks. To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus. Based on the number of features annotated, we can train on a two class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath. 
    more » « less
  5. Abstract This paper describes a dataset mined from the public archive (1999–2020) of the US National Incident Management System Incident Status Summary (ICS-209) forms (a total of 187,160 reports for 35,170 incidents, including 34,478 wildland fires). This system captures detailed daily/regular information on incident development and response, including social and economic impacts. Most (98.4%) reports are wildland fire-related, with other incident types including hurricane, hazardous materials, flood, tornado, search and rescue, civil unrest, and winter storms. The archive, although publicly available, has been difficult to use for research due to multiple record formats, inconsistent data entry, and no clean pathway from individual reports to high-level incident analysis. Here, we describe the open-source, reproducible methods used to produce a science-grade version of the data, including formal connections made to other published wildland fire data products. Among other applications, this integrated and spatially augmented dataset enables exploration of the daily progression of the most costly, damaging, and deadly environmental-hazard events in recent US history. 
    more » « less