Title: NSEC 2015 Conference Proceedings
This is the program guide for the 2015 NSEC National Conference, held on June 3-4, 2015, with abstracts and presentations. The keynote speaker was Susan R. Singer, Division Director for Undergraduate Education at the National Science Foundation and the Laurence McKinley Gould Professor in the Biology and Cognitive Science Departments at Carleton College.
Award ID(s):
1524832
NSF-PAR ID:
10302912
Editor(s):
Redd, Kacy; Finkelstein, Noah
Journal Name:
Network of STEM Education Centers
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This dataset contains the essential Mexico City-related data files associated with Beth Tellman's dissertation, Mapping and Modeling Illicit and Clandestine Drivers of Land Use Change: Urban Expansion in Mexico City and Deforestation in Central America. It contains spatio-temporal datasets covering three domains: (i) urban expansion from 1992-2015, (ii) district and section electoral records for 6 elections from 2000-2015, and (iii) land titling (regularization) data for informal settlements from 1997-2012 on private and ejido land. The urban expansion data include 30 m resolution urban land cover for 1992 and 2013 (methods published in Goldblatt et al. 2018) and a shapefile of digitized urban informal expansion in conservation land from 2000-2015 using the Worldview-2 satellite. The electoral records include shapefiles with the geospatial boundaries of electoral districts and sections for each election, and .csv files of the number of votes per party for mayoral, delegate, and legislature candidates. The private land titling data include the approximate (in coordinates) location and date of titles given by the city government (DGRT), extracted from public records (Diario Oficial) from 1997-2012. The titling data on ejido land include a shapefile of georeferenced polygons taken from photos in the CORETT office of ejido land that has been expropriated by the government, and an accompanying .csv from the National Agrarian Registry detailing the date and reason for expropriation from 1987-2007. Further details are provided in the dissertation and subsequent article publication (Tellman et al. 2021). The Mexico City portion of these data was generated via a National Science Foundation sponsored project (No. 1657773, DDRI: Mapping and Modeling Clandestine Drivers of Urban Expansion in Mexico City). The project P.I. is Beth Tellman, with collaborators at ASU (B.L. Turner II and Hallie Eakin). 
Other collaborators include the Institute of Geography at the National Autonomous University of Mexico (UNAM), via Dr. Armando Peralta Higuera, who provided support for two students, Juan Alberto Guerra Moreno and Kimberly Mendez Gomez, to validate the Landsat urbanization algorithm. Fidel Serrano-Candela, at the UNAM Laboratory of the National Laboratory for Sustainability Sciences (LANCIS), also provided support for urbanization algorithm development and validation, and Rodrigo Garcia Herrera provided support for hosting data at LANCIS (at: http://patung.lancis.ecologia.unam.mx/tellman/). Additional collaborators include Enrique Castelán, who provided support for the informal urbanization data from SEDEMA (Ministry of the Environment for Mexico City). Electoral, land titling, and land zoning data were digitized with support from Juana Martinez, Natalia Hernandez, Alexia Macario Sanchez, and Enrique Ruiz Durazo, in collaboration with Felipe de Alba at CESOP (Center of Social Studies and Public Opinion, at the Mexican Legislative Assembly). The data include geospatial time series data regarding changes in urban land cover, digitized electoral results, land titling, land zoning, and public housing. Additional funding for this work was provided by NSF under Grant No. 1414052, CNH: The Dynamics of Multiscalar Adaptation in Megacities (PI H. Eakin), and the NSF-CONACYT GROW fellowship NSF No. 026257-001 and CONACYT number 291303 (PI Bojórquez). References: Tellman, B., Eakin, H., Janssen, M.A., De Alba, F., Turner II, B.L., 2021. The Role of Institutional Entrepreneurs and Informal Land Transactions in Mexico City’s Urban Expansion. World Dev. 140, 1–44. https://doi.org/10.1016/j.worlddev.2020.105374 Goldblatt, R., Stuhlmacher, M.F., Tellman, B., Clinton, N., Hanson, G., Georgescu, M., Wang, C., Serrano-Candela, F., Khandelwal, A.K., Cheng, W.-H., Balling, R.C., 2018. 
Using Landsat and nighttime lights for supervised pixel-based image classification of urban land cover. Remote Sens. Environ. 205, 253–275. https://doi.org/10.1016/j.rse.2017.11.026 
  2. Abstract: We investigate the link between individual differences in science reasoning skills and mock jurors’ deliberation behavior; specifically, how much they talk about the scientific evidence presented in a complicated, ecologically valid case during deliberation. Consistent with our preregistered hypothesis, mock jurors strong in scientific reasoning discussed the scientific evidence more during deliberation than those with weaker science reasoning skills.

    Summary: With increasing frequency, legal disputes involve complex scientific information (Faigman et al., 2014; Federal Judicial Center, 2011; National Research Council, 2009). Yet people often have trouble consuming scientific information effectively (McAuliff et al., 2009; National Science Board, 2014; Resnick et al., 2016). Individual differences in reasoning styles and skills can affect how people comprehend complex evidence (e.g., Hans, Kaye, Dann, Farley, & Albertson, 2011; McAuliff & Kovera, 2008). Recently, scholars have highlighted the importance of studying group deliberation contexts as well as individual decision contexts (Salerno & Diamond, 2010; Kovera, 2017). If individual differences influence how jurors understand scientific evidence, it invites questions about how these individual differences may affect the way jurors discuss science during group deliberations. The purpose of the current study was to examine how individual differences in the way people process scientific information affect the extent to which jurors discuss scientific evidence during deliberations.

    Methods: We preregistered the data collection plan, sample size, and hypotheses on the Open Science Framework. Jury-eligible community participants (303 jurors across 50 juries) from Phoenix, AZ (mean age = 37.4, SD = 16.9; 58.8% female; 51.5% White, 23.7% Latinx, 9.9% African-American, 4.3% Asian) were paid $55 for a 3-hour mock jury study. Participants completed a set of individual questionnaires related to science reasoning skills and attitudes toward science prior to watching a 45-minute mock armed-robbery trial. The trial included various pieces of evidence and testimony, including forensic experts testifying about mitochondrial DNA evidence (mtDNA; based on Hans et al. 2011 materials). Participants were then given 45 minutes to deliberate. The deliberations were video recorded and transcribed to text for analysis. We analyzed the deliberation content for discussions related to the scientific evidence presented during trial. We hypothesized that those with stronger scientific and numeric reasoning skills, higher need for cognition, and more positive views towards science would discuss scientific evidence more than their counterparts during deliberation.

    Measures: We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by the National Science Board, 2004; 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition.

    Coding: We identified verbal utterances related to the scientific evidence presented in court. These included references to DNA evidence in general (e.g., nuclear DNA being more conclusive than mtDNA), the database that was used to compare the DNA sample (e.g., the database size, how representative it was), exclusion rates (e.g., how many other people could not be excluded as a possible match), and the forensic DNA experts (e.g., how credible they were perceived to be). We used word count to operationalize the extent to which each juror discussed scientific information. First we calculated the total word count for each complete jury deliberation transcript. Based on the above coding scheme we determined the number of words each juror spent discussing scientific information. To compare across juries, we wanted to account for the differing length of deliberation; thus, we calculated each juror’s scientific deliberation word count as a proportion of their jury’s total word count.

    Results: On average, jurors discussed the science for about 4% of their total deliberation (SD=4%, range 0-22%). We regressed the proportion of the deliberation that jurors spent discussing scientific information on the four individual difference measures (i.e., SRS, NFC, WNS, ATS). Using the adjusted R-squared, the measures significantly accounted for 5.5% of the variability in scientific information deliberation discussion, SE=0.04, F(4, 199)=3.93, p=0.004. When controlling for all other variables in the model, the Scientific Reasoning Scale was the only measure that remained significant, b=0.003, SE=0.001, t(203)=2.02, p=0.045. To analyze how much variability each measure accounted for, we performed a stepwise regression, with NFC entered at step 1, ATS entered at step 2, WNS entered at step 3, and SRS entered at step 4. At step 1, NFC accounted for 2.4% of the variability, F(1, 202)=5.95, p=0.02. At step 2, ATS did not significantly account for any additional variability. At step 3, WNS accounted for an additional 2.4% of variability, ΔF(1, 200)=5.02, p=0.03. Finally, at step 4, SRS significantly accounted for an additional 1.9% of variability in scientific information discussion, ΔF(1, 199)=4.06, p=0.045, total adjusted R-squared of 0.055.

    Discussion: This study provides additional support for previous findings that scientific reasoning skills affect the way jurors comprehend and use scientific evidence. It expands on previous findings by suggesting that these individual differences also impact the way scientific evidence is discussed during juror deliberations. In addition, this study advances the literature by identifying Scientific Reasoning Skills as a potentially more robust explanatory individual differences variable than more well-studied constructs like Need for Cognition in jury research. Our next steps for this research, which we plan to present at AP-LS as part of this presentation, include further analysis of the deliberation content (e.g., not just the mention of, but the accuracy of, the references to scientific evidence in discussion). We are currently coding these data with a software program called Noldus Observer XT, which will allow us to present more sophisticated results from these data during the presentation.

    Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills affect how much jurors discuss scientific evidence during deliberation. 
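
The word-count measure described in the coding step above can be illustrated with a minimal sketch. It assumes each transcribed utterance has already been coded as science-related or not; the tuple layout, the whitespace-based word count, and the function name `science_proportions` are assumptions made for illustration and are not taken from the study materials.

```python
from collections import defaultdict


def science_proportions(utterances):
    """Return each juror's science word count as a share of their jury's total words.

    `utterances` is an iterable of (jury_id, juror_id, text, is_science) tuples.
    This layout is hypothetical; the study's actual coding files are not described.
    """
    jury_totals = defaultdict(int)    # total words spoken in each jury's deliberation
    juror_science = defaultdict(int)  # science-related words per (jury, juror)

    for jury_id, juror_id, text, is_science in utterances:
        n_words = len(text.split())   # simple whitespace word count (an assumption)
        jury_totals[jury_id] += n_words
        # Adding 0 keeps jurors who never discuss the science in the result.
        juror_science[(jury_id, juror_id)] += n_words if is_science else 0

    return {
        key: count / jury_totals[key[0]]
        for key, count in juror_science.items()
        if jury_totals[key[0]] > 0
    }


example = [
    ("J1", "A", "the mtDNA database was small", True),
    ("J1", "A", "I think he did it", False),
    ("J1", "B", "what about the alibi though", False),
    ("J1", "B", "experts seemed credible", True),
]
proportions = science_proportions(example)
```

Dividing by the jury's total rather than by the juror's own total follows the text's stated goal of adjusting for differing deliberation lengths across juries.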
  3. PLEASE CONTACT AUTHORS IF YOU CONTRIBUTE AND WOULD LIKE TO BE LISTED AS A CO-AUTHOR. (this message will be removed some time weeks/months after the first publication)

    Terrestrial Parasite Tracker indexed biotic interactions and review summary.

    The Terrestrial Parasite Tracker (TPT) project began in 2019 and is funded by the National Science Foundation to mobilize data from vector and ectoparasite collections to data aggregators (e.g., iDigBio, GBIF). The aim is to help build a comprehensive picture of arthropod host-association evolution, distributions, and the ecological interactions of disease vectors, which will assist scientists, educators, land managers, and policy makers. Arthropod parasites are often important to human and wildlife health and safety as vectors of pathogens, and it is critical to digitize these specimens so that they, and their biotic interaction data, will be available to help understand and predict the spread of human and wildlife disease.

    This data publication contains versioned TPT associated datasets and related data products that were tracked, reviewed and indexed by Global Biotic Interactions (GloBI) and associated tools. GloBI provides open access to finding species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open source software.

    If you have questions or comments about this publication, please open an issue at https://github.com/ParasiteTracker/tpt-reporting or contact the authors by email.

    Funding:
    The creation of this archive was made possible by the National Science Foundation award "Collaborative Research: Digitization TCN: Digitizing collections to trace parasite-host associations and predict the spread of vector-borne disease," Award numbers DBI:1901932 and DBI:1901926

    References:
    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

    GloBI Data Review Report

    Datasets under review:
     - University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner. accessed via https://github.com/EMTuckerLabUMMZ/ummzi/archive/6731357a377e9c2748fc931faa2ff3dc0ce3ea7a.zip on 2022-06-24T14:02:48.801Z
     - Academy of Natural Sciences Entomology Collection for the Parasite Tracker Project accessed via https://github.com/globalbioticinteractions/ansp-para/archive/5e6592ad09ec89ba7958266ad71ec9d5d21d1a44.zip on 2022-06-24T14:04:22.091Z
     - Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology accessed via https://github.com/globalbioticinteractions/bpbm-ent/archive/c085398dddd36f8a1169b9cf57de2a572229341b.zip on 2022-06-24T14:04:37.692Z
     - Texas A&M University, Biodiversity Teaching and Research Collections accessed via https://github.com/globalbioticinteractions/brtc-para/archive/f0a718145b05ed484c4d88947ff712d5f6395446.zip on 2022-06-24T14:06:40.154Z
     - Brigham Young University Arthropod Museum accessed via https://github.com/globalbioticinteractions/byu-byuc/archive/4a609ac6a9a03425e2720b6cdebca6438488f029.zip on 2022-06-24T14:06:51.420Z
     - California Academy of Sciences Entomology accessed via https://github.com/globalbioticinteractions/cas-ent/archive/562aea232ec74ab615f771239451e57b057dc7c0.zip on 2022-06-24T14:07:16.371Z
     - Clemson University Arthropod Collection accessed via https://github.com/globalbioticinteractions/cu-cuac/archive/6cdcbbaa4f7cec8e1eac705be3a999bc5259e00f.zip on 2022-06-24T14:07:40.925Z
     - Denver Museum of Nature and Science (DMNS) Parasite specimens (DMNS:Para) accessed via https://github.com/globalbioticinteractions/dmns-para/archive/a037beb816226eb8196533489ee5f98a6dfda452.zip on 2022-06-24T14:08:00.730Z
     - Field Museum of Natural History IPT accessed via https://github.com/globalbioticinteractions/fmnh/archive/6bfc1b7e46140e93f5561c4e837826204adb3c2f.zip on 2022-06-24T14:18:51.995Z
     - Illinois Natural History Survey Insect Collection accessed via https://github.com/globalbioticinteractions/inhs-insects/archive/38692496f590577074c7cecf8ea37f85d0594ae1.zip on 2022-06-24T14:19:37.563Z
     - UMSP / University of Minnesota / University of Minnesota Insect Collection accessed via https://github.com/globalbioticinteractions/min-umsp/archive/3f1b9d32f947dcb80b9aaab50523e097f0e8776e.zip on 2022-06-24T14:20:27.232Z
     - Milwaukee Public Museum Biological Collections Data Portal accessed via https://github.com/globalbioticinteractions/mpm/archive/9f44e99c49ec5aba3f8592cfced07c38d3223dcd.zip on 2022-06-24T14:20:46.185Z
     - Museum for Southern Biology (MSB) Parasite Collection accessed via https://github.com/globalbioticinteractions/msb-para/archive/178a0b7aa0a8e14b3fe953e770703fe331eadacc.zip on 2022-06-24T15:16:07.223Z
     - The Albert J. Cook Arthropod Research Collection accessed via https://github.com/globalbioticinteractions/msu-msuc/archive/38960906380443bd8108c9e44aeff4590d8d0b50.zip on 2022-06-24T16:09:40.702Z
     - Ohio State University Acarology Laboratory accessed via https://github.com/globalbioticinteractions/osal-ar/archive/876269d66a6a94175dbb6b9a604897f8032b93dd.zip on 2022-06-24T16:10:00.281Z
     - Frost Entomological Museum, Pennsylvania State University accessed via https://github.com/globalbioticinteractions/psuc-ento/archive/30b1f96619a6e9f10da18b42fb93ff22cc4f72e2.zip on 2022-06-24T16:10:07.741Z
     - Purdue Entomological Research Collection accessed via https://github.com/globalbioticinteractions/pu-perc/archive/e0909a7ca0a8df5effccb288ba64b28141e388ba.zip on 2022-06-24T16:10:26.654Z
     - Texas A&M University Insect Collection accessed via https://github.com/globalbioticinteractions/tamuic-ent/archive/f261a8c192021408da67c39626a4aac56e3bac41.zip on 2022-06-24T16:10:58.496Z
     - University of California Santa Barbara Invertebrate Zoology Collection accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/825678ad02df93f6d4469f9d8b7cc30151b9aa45.zip on 2022-06-24T16:12:29.854Z
     - University of Hawaii Insect Museum accessed via https://github.com/globalbioticinteractions/uhim/archive/53fa790309e48f25685e41ded78ce6a51bafde76.zip on 2022-06-24T16:12:41.408Z
     - University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC accessed via https://github.com/globalbioticinteractions/unhc/archive/f72575a72edda8a4e6126de79b4681b25593d434.zip on 2022-06-24T16:12:59.500Z
     - Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum. accessed via https://github.com/globalbioticinteractions/unl-nsm/archive/6bcd8aec22e4309b7f4e8be1afe8191d391e73c6.zip on 2022-06-24T16:13:06.914Z
     - Data were obtained from specimens belonging to the United States National Museum of Natural History (USNM), Smithsonian Institution, Washington DC and digitized by the Walter Reed Biosystematics Unit (WRBU). accessed via https://github.com/globalbioticinteractions/usnmentflea/archive/ce5cb1ed2bbc13ee10062b6f75a158fd465ce9bb.zip on 2022-06-24T16:13:38.013Z
     - US National Museum of Natural History Ixodes Records accessed via https://github.com/globalbioticinteractions/usnm-ixodes/archive/c5fcd5f34ce412002783544afb628a33db7f47a6.zip on 2022-06-24T16:13:45.666Z
     - Price Institute of Parasite Research, School of Biological Sciences, University of Utah accessed via https://github.com/globalbioticinteractions/utah-piper/archive/43da8db550b5776c1e3d17803831c696fe9b8285.zip on 2022-06-24T16:13:54.724Z
     - University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection accessed via https://github.com/globalbioticinteractions/uwsp-para/archive/f9d0d52cd671731c7f002325e84187979bca4a5b.zip on 2022-06-24T16:14:04.745Z
     - Giraldo-Calderón, G. I., Emrich, S. J., MacCallum, R. M., Maslen, G., Dialynas, E., Topalis, P., … Lawson, D. (2015). VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic acids research, 43(Database issue), D707–D713. doi:10.1093/nar/gku1117. accessed via https://github.com/globalbioticinteractions/vectorbase/archive/00d6285cd4e9f4edd18cb2778624ab31b34b23b8.zip on 2022-06-24T16:14:11.965Z
     - WIRC / University of Wisconsin Madison WIS-IH / Wisconsin Insect Research Collection accessed via https://github.com/globalbioticinteractions/wis-ih-wirc/archive/34162b86c0ade4b493471543231ae017cc84816e.zip on 2022-06-24T16:14:29.743Z
     - Yale University Peabody Museum Collections Data Portal accessed via https://github.com/globalbioticinteractions/yale-peabody/archive/43be869f17749d71d26fc820c8bd931d6149fe8e.zip on 2022-06-24T16:23:29.289Z

    Generated on:
    2022-06-24

    by:
    GloBI's Elton 0.12.4 
    (see https://github.com/globalbioticinteractions/elton).

    Note that all files ending with .tsv are files formatted 
    as UTF8 encoded tab-separated values files.

    https://www.iana.org/assignments/media-types/text/tab-separated-values


    Included in this review archive are:

    README:
      This file.

    review_summary.tsv:
      Summary across all reviewed collections of total number of distinct review comments.

    review_summary_by_collection.tsv:
      Summary by reviewed collection of total number of distinct review comments.

    indexed_interactions_by_collection.tsv: 
      Summary of number of indexed interaction records by institutionCode and collectionCode.

    review_comments.tsv.gz:
      All review comments by collection.

    indexed_interactions_full.tsv.gz:
      All indexed interactions for all reviewed collections.

    indexed_interactions_simple.tsv.gz:
      All indexed interactions for all reviewed collections selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName and targetTaxonName.

    datasets_under_review.tsv:
      Details on the datasets under review.

    elton.jar: 
      Program used to update datasets and generate the review reports and associated indexed interactions.

    datasets.zip:
      Source datasets used by elton.jar in process of executing the generate_report.sh script.

    generate_report.sh:
      Program used to generate the report

    generate_report.log:
      Log file generated as part of running the generate_report.sh script
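
As an illustration of working with the files listed above, the following minimal sketch tallies records by interaction type in one of the gzip-compressed TSV files. It assumes the file starts with a header row that uses the column names given in the README (for example `interactionTypeName`); that header assumption and the function name `count_interactions_by_type` are hypothetical and should be checked against the actual archive.

```python
import csv
import gzip
from collections import Counter


def count_interactions_by_type(path):
    """Count rows per interactionTypeName in a gzip-compressed, UTF-8 TSV file.

    Assumes the first row is a header containing an `interactionTypeName` column.
    """
    counts = Counter()
    # Text mode with newline="" lets the csv module handle line endings itself.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Tab-separated values do not use quoting, so quote handling is disabled.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["interactionTypeName"]] += 1
    return counts
```

A call such as `count_interactions_by_type("indexed_interactions_simple.tsv.gz")` would then return a `Counter` keyed by interaction type. Reading the file row by row avoids loading the full archive into memory.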
     

     
  4. Purpose: This paper addresses the significance of training students in entrepreneurship to enable sustained national and international competitiveness in the knowledge-based global marketplace. Entrepreneurial education is varied, ranging from basic to in-depth courses, including customer-focused programs such as the National Science Foundation (NSF) sponsored Innovation Corps (I-Corps) program. This program is nationally renowned, with strong academic roots. A full site was launched at the University of Central Florida (UCF) in January 2015 and was the first I-Corps program in the state of Florida.

    Design/methodology/approach: This paper addresses the importance of entrepreneurship education, reviews the available national training programs in entrepreneurship, presents the design methodology of the NSF I-Corps program, and analyzes the results of the teams that have participated in the NSF I-Corps program.

    Findings: The results are categorized into innovative areas and show the percentage of teams that participated in the I-Corps program in each area. They also identify the percentage of teams that engaged in actual startup activities following I-Corps participation.

    Practical implications: Educators, students, and trainers can use the findings to benchmark the outcomes of training programs in entrepreneurship. Students and innovators interested in participating in I-Corps can use this paper to obtain insights and a broader understanding of what was done in terms of results and implications.

    Originality/value: This paper contributes a unique analysis of the I-Corps program approach and its outcomes since its launch in 2015 and can be used as a reference for any training program in entrepreneurship. 
  5. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available, which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue, including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnostic model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. 
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
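
The abstract above states that 1,000 patches of each label were used to keep the training set balanced. One way such a selection could be done is sketched below. This is not the authors' pipeline: the `(patch_id, label)` input layout, the seeded random draw, the fallback of keeping every patch when a label has fewer than the cap, and the function name `balanced_sample` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(patches, per_label, seed=0):
    """Pick at most `per_label` patch ids for each label, chosen at random.

    `patches` is an iterable of (patch_id, label) pairs (a hypothetical layout).
    Labels with fewer than `per_label` patches keep all of their patches.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_label = defaultdict(list)
    for patch_id, label in patches:
        by_label[label].append(patch_id)

    return {
        label: rng.sample(ids, min(per_label, len(ids)))
        for label, ids in by_label.items()
    }


# Toy input: ten "norm" patches and three "indc" patches, capped at five per label.
toy_patches = [(i, "norm") for i in range(10)] + [(100 + i, "indc") for i in range(3)]
selected = balanced_sample(toy_patches, per_label=5)
```

With the corpus described above, `per_label` would be 1,000; the toy call uses 5 only to keep the example small.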