skip to main content

Title: NSEC 2015 Conference Proceedings
This is the 2015 NSEC National Conference Program Guide with abstracts and presentations held on June 3-4, 2015. Keynote speaker was Susan R. Singer, Division Director for Undergraduate Education at the National Science Foundation and the Laurence McKinley Gould Professor, in the Biology and Cognitive Science Departments at Carleton College.
Redd, Kacy; Finkelstein, Noah
Award ID(s):
Publication Date:
Journal Name:
Network of STEM Education Centers
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract
    This dataset incorporates Mexico City related essential data files associated with Beth Tellman's dissertation: Mapping and Modeling Illicit and Clandestine Drivers of Land Use Change: Urban Expansion in Mexico City and Deforestation in Central America. It contains spatio-temporal datasets covering three domains; i) urban expansion from 1992-2015, ii) district and section electoral records for 6 elections from 2000-2015, iii) land titling (regularization) data for informal settlements from 1997-2012 on private and ejido land. The urban expansion data includes 30m resolution urban land cover for 1992 and 2013 (methods published in Goldblatt et al 2018), and a shapefile of digitizedMore>>
  2. Abstract We investigate the link between individual differences in science reasoning skills and mock jurors’ deliberation behavior; specifically, how much they talk about the scientific evidence presented in a complicated, ecologically valid case during deliberation. Consistent with our preregistered hypothesis, mock jurors strong in scientific reasoning discussed the scientific evidence more during deliberation than those with weaker science reasoning skills. Summary With increasing frequency, legal disputes involve complex scientific information (Faigman et al., 2014; Federal Judicial Center, 2011; National Research Council, 2009). Yet people often have trouble consuming scientific information effectively (McAuliff et al., 2009; National Science Board, 2014; Resnickmore »et al., 2016). Individual differences in reasoning styles and skills can affect how people comprehend complex evidence (e.g., Hans, Kaye, Dann, Farley, Alberston, 2011; McAuliff & Kovera, 2008). Recently, scholars have highlighted the importance of studying group deliberation contexts as well as individual decision contexts (Salerno & Diamond, 2010; Kovera, 2017). If individual differences influence how jurors understand scientific evidence, it invites questions about how these individual differences may affect the way jurors discuss science during group deliberations. The purpose of the current study was to examine how individual differences in the way people process scientific information affects the extent to which jurors discuss scientific evidence during deliberations. Methods We preregistered the data collection plan, sample size, and hypotheses on the Open Science Framework. Jury-eligible community participants (303 jurors across 50 juries) from Phoenix, AZ (Mage=37.4, SD=16.9; 58.8% female; 51.5% White, 23.7% Latinx, 9.9% African-American, 4.3% Asian) were paid $55 for a 3-hour mock jury study. Participants completed a set of individual questionnaires related to science reasoning skills and attitudes toward science prior to watching a 45-minute mock armed-robbery trial. The trial included various pieces of evidence and testimony, including forensic experts testifying about mitochondrial DNA evidence (mtDNA; based on Hans et al. 2011 materials). Participants were then given 45 minutes to deliberate. The deliberations were video recorded and transcribed to text for analysis. We analyzed the deliberation content for discussions related to the scientific evidence presented during trial. We hypothesized that those with stronger scientific and numeric reasoning skills, higher need for cognition, and more positive views towards science would discuss scientific evidence more than their counterparts during deliberation. Measures We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by the National Science Board, 2004; 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition. Coding We identified verbal utterances related to the scientific evidence presented in court. For instance, references to DNA evidence in general (e.g. nuclear DNA being more conclusive than mtDNA), the database that was used to compare the DNA sample (e.g. the database size, how representative it was), exclusion rates (e.g. how many other people could not be excluded as a possible match), and the forensic DNA experts (e.g. how credible they were perceived). We used word count to operationalize the extent to which each juror discussed scientific information. First we calculated the total word count for each complete jury deliberation transcript. Based on the above coding scheme we determined the number of words each juror spent discussing scientific information. To compare across juries, we wanted to account for the differing length of deliberation; thus, we calculated each juror’s scientific deliberation word count as a proportion of their jury’s total word count. Results On average, jurors discussed the science for about 4% of their total deliberation (SD=4%, range 0-22%). We regressed proportion of the deliberation jurors spend discussing scientific information on the four individual difference measures (i.e., SRS, NFC, WNS, ATS). Using the adjusted R-squared, the measures significantly accounted for 5.5% of the variability in scientific information deliberation discussion, SE=0.04, F(4, 199)=3.93, p=0.004. When controlling for all other variables in the model, the Scientific Reasoning Scale was the only measure that remained significant, b=0.003, SE=0.001, t(203)=2.02, p=0.045. To analyze how much variability each measure accounted for, we performed a stepwise regression, with NFC entered at step 1, ATS entered at step 2, WNS entered at step 3, and SRS entered at step 4. At step 1, NFC accounted for 2.4% of the variability, F(1, 202)=5.95, p=0.02. At step 2, ATS did not significantly account for any additional variability. At step 3, WNS accounted for an additional 2.4% of variability, ΔF(1, 200)=5.02, p=0.03. Finally, at step 4, SRS significantly accounted for an additional 1.9% of variability in scientific information discussion, ΔF(1, 199)=4.06, p=0.045, total adjusted R-squared of 0.055. Discussion This study provides additional support for previous findings that scientific reasoning skills affect the way jurors comprehend and use scientific evidence. It expands on previous findings by suggesting that these individual differences also impact the way scientific evidence is discussed during juror deliberations. In addition, this study advances the literature by identifying Scientific Reasoning Skills as a potentially more robust explanatory individual differences variable than more well-studied constructs like Need for Cognition in jury research. Our next steps for this research, which we plan to present at AP-LS as part of this presentation, incudes further analysis of the deliberation content (e.g., not just the mention of, but the accuracy of the references to scientific evidence in discussion). We are currently coding this data with a software program called Noldus Observer XT, which will allow us to present more sophisticated results from this data during the presentation. Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills affect how much jurors discuss scientific evidence during deliberation.« less
  3. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients withmore »cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. A CSV version of the annotation file is also available which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 total annotated regions with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact,) 6,222 are carcinogenic signs (inflammation, nonneoplastic and suspicious,) and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma in situ.) The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted for both the development and evaluation sets, while the remain 34 were allotted for train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository ( -facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnoses model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67 104. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www. [3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036-5040. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1 7. https://www.isip. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016.« less
  4. The University of Akron has had two National Science Foundation (NSF) funded science, technology, engineering and mathematics scholarship (S-STEM) programs. The cohort of the first S-STEM program (2010-2015) were students that were directly admitted to their selected discipline’s department. The current NSF S-STEM cohort (2015-2020) is a mix of students who were either directly admitted to their major or college-ready students. The university classifies college-ready students as those who are ready for college but lack either a requisite high school GPA, ACT score or completion of a high school science or math course. Each program spanned five years with sciencemore »disciplines typically graduating in four years and engineering students that participated in co-operative education graduating in five years. The final year of each S- STEM was used to provide peer mentoring in a pseudo-formal environment. In each, seniors who had already participated in the S-STEM program for four years mentored new freshmen for one year. This paper will describe demographics of each S-STEM cohort, the activities used during the peer mentoring, observable differences between direct admit and college-ready freshmen with respect to peer mentoring, and possible peer mentoring activities that can be implemented at other institutions.« less
  5. Abstract Expert testimony varies in scientific quality and jurors have a difficult time evaluating evidence quality (McAuliff et al., 2009). In the current study, we apply Fuzzy Trace Theory principles, examining whether visual and gist aids help jurors calibrate to the strength of scientific evidence. Additionally we were interested in the role of jurors’ individual differences in scientific reasoning skills in their understanding of case evidence. Contrary to our preregistered hypotheses, there was no effect of evidence condition or gist aid on evidence understanding. However, individual differences between jurors’ numeracy skills predicted evidence understanding. Summary Poor-quality expert evidence is sometimesmore »admitted into court (Smithburn, 2004). Jurors’ calibration to evidence strength varies widely and is not robustly understood. For instance, previous research has established jurors lack understanding of the role of control groups, confounds, and sample sizes in scientific research (McAuliff, Kovera, & Nunez, 2009; Mill, Gray, & Mandel, 1994). Still others have found that jurors can distinguish weak from strong evidence when the evidence is presented alone, yet not when simultaneously presented with case details (Smith, Bull, & Holliday, 2011). This research highlights the need to present evidence to jurors in a way they can understand. Fuzzy Trace Theory purports that people encode information in exact, verbatim representations and through “gist” representations, which represent summary of meaning (Reyna & Brainerd, 1995). It is possible that the presenting complex scientific evidence to people with verbatim content or appealing to the gist, or bottom-line meaning of the information may influence juror understanding of that evidence. Application of Fuzzy Trace Theory in the medical field has shown that gist representations are beneficial for helping laypeople better understand risk and benefits of medical treatment (Brust-Renck, Reyna, Wilhelms, & Lazar, 2016). Yet, little research has applied Fuzzy Trace Theory to information comprehension and application within the context of a jury (c.f. Reyna et. al., 2015). Additionally, it is likely that jurors’ individual characteristics, such as scientific reasoning abilities and cognitive tendencies, influence their ability to understand and apply complex scientific information (Coutinho, 2006). Methods The purpose of this study was to examine how jurors calibrate to the strength of scientific information, and whether individual difference variables and gist aids inspired by Fuzzy Trace Theory help jurors better understand complicated science of differing quality. We used a 2 (quality of scientific evidence: high vs. low) x 2 (decision aid to improve calibration - gist information vs. no gist information), between-subjects design. All hypotheses were preregistered on the Open Science Framework. Jury-eligible community participants (430 jurors across 90 juries; Mage = 37.58, SD = 16.17, 58% female, 56.93% White). Each jury was randomly assigned to one of the four possible conditions. Participants were asked to individually fill out measures related to their scientific reasoning skills prior to watching a mock jury trial. The trial was about an armed bank robbery and consisted of various pieces of testimony and evidence (e.g. an eyewitness testimony, police lineup identification, and a sweatshirt found with the stolen bank money). The key piece of evidence was mitochondrial DNA (mtDNA) evidence collected from hair on a sweatshirt (materials from Hans et al., 2011). Two experts presented opposing opinions about the scientific evidence related to the mtDNA match estimate for the defendant’s identification. The quality and content of this mtDNA evidence differed based on the two conditions. The high quality evidence condition used a larger database than the low quality evidence to compare to the mtDNA sample and could exclude a larger percentage of people. In the decision aid condition, experts in the gist information group presented gist aid inspired visuals and examples to help explain the proportion of people that could not be excluded as a match. Those in the no gist information group were not given any aid to help them understand the mtDNA evidence presented. After viewing the trial, participants filled out a questionnaire on how well they understood the mtDNA evidence and their overall judgments of the case (e.g. verdict, witness credibility, scientific evidence strength). They filled this questionnaire out again after a 45-minute deliberation. Measures We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by National Science Board, 2004; 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition. We developed a 20-item multiple-choice comprehension test for the mtDNA scientific information in the cases (modeled on Hans et al., 2011, and McAuliff et al., 2009). Participants were shown 20 statements related to DNA evidence and asked whether these statements were True or False. The test was then scored out of 20 points. Results For this project, we measured calibration to the scientific evidence in a few different ways. We are building a full model with these various operationalizations to be presented at APLS, but focus only on one of the calibration DVs (i.e., objective understanding of the mtDNA evidence) in the current proposal. We conducted a general linear model with total score on the mtDNA understanding measure as the DV and quality of scientific evidence condition, decision aid condition, and the four individual difference measures (i.e., NFC, ATS, WNS, and SRS) as predictors. Contrary to our main hypotheses, neither evidence quality nor decision aid condition affected juror understanding. However, the individual difference variables did: we found significant main effects for Scientific Reasoning Skills, F(1, 427) = 16.03, p <.001, np2 = .04, Weller Numeracy Scale, F(1, 427) = 15.19, p <.001, np2 = .03, and Need for Cognition, F(1, 427) = 16.80, p <.001, np2 = .04, such that those who scored higher on these measures displayed better understanding of the scientific evidence. In addition there was a significant interaction of evidence quality condition and scores on the Weller’s Numeracy Scale, F(1, 427) = 4.10, p = .04, np2 = .01. Further results will be discussed. Discussion These data suggest jurors are not sensitive to differences in the quality of scientific mtDNA evidence, and also that our attempt at helping sensitize them with Fuzzy Trace Theory-inspired aids did not improve calibration. Individual scientific reasoning abilities and general cognition styles were better predictors of understanding this scientific information. These results suggest a need for further exploration of approaches to help jurors differentiate between high and low quality evidence. Note: The 3rd author was supported by an AP-LS AP Award for her role in this research. Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills help jurors understand complex scientific evidence.« less