Title: Clustering learners’ feedback processing patterns based on their response latency
In intelligent tutoring systems (ITS), abundant supportive messages are provided to learners. One implicit assumption behind this design is that learners actively process and benefit from feedback messages when interacting with an ITS individually. However, this is not true for all learners; some gain little even after numerous practice opportunities. In the current research, we assume that if a learner invests enough cognitive effort in reviewing the feedback messages provided by the system, the learner’s performance should improve as practice opportunities accumulate. We expect that the learner’s investment of cognitive effort is reflected to some extent in response latency, so the learner’s improvement should also be correlated with response latency. Based on this core hypothesis, we conduct a cluster analysis that explores features relevant to learners’ response latency. We expect to find several features that can serve as indicators of learners’ feedback usage; these features may in turn be used to predict learners’ learning gains in future research. Our results suggest that learners’ prior knowledge level plays a role in how they interact with the ITS and is associated with different patterns of response latency. Learners with higher prior knowledge tend to interact flexibly with the system and use feedback messages more effectively: the quality of their previous attempts influences their response latency. Learners with lower prior knowledge, however, exhibit two opposite patterns: some tend to respond more quickly, and some tend to respond more slowly. One common characteristic of these learners is that their latency on incorrect responses is not influenced by the quality of their previous performance. One interesting result is that the quick responders forget faster. Thus, we conclude that learners with lower prior knowledge are better off not responding hastily if they are to form more durable memories.
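As a minimal sketch of the kind of cluster analysis described above, the following Python example groups learners by two latency features with a plain k-means loop. The feature names, the toy values, the choice of k-means, and the use of two clusters are all assumptions made for illustration; the abstract does not state which features or clustering algorithm the authors used.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-learner features (seconds), invented for this example:
# (mean latency after a correct attempt, mean latency after an incorrect attempt).
learners = {
    "A": (4.0, 9.0),
    "B": (4.5, 8.5),
    "C": (3.0, 3.2),
    "D": (2.8, 3.0),
    "E": (5.0, 9.5),
    "F": (3.1, 2.9),
}
points = list(learners.values())

# Deterministic initialisation from two of the toy points (an assumption).
labels, centroids = kmeans(points, [points[0], points[2]])

for name, label in zip(learners, labels):
    print(name, "-> cluster", label)
```

In this toy data, one cluster slows down after an incorrect attempt while the other responds at roughly the same speed regardless of the previous outcome, which mirrors the contrast the abstract draws between learners whose latency does and does not depend on the quality of earlier attempts.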
Award ID(s):
1934745
NSF-PAR ID:
10353236
Journal Name:
Proceedings of The Third Workshop of the Learner Data Institute, The 15th International Conference on Educational Data Mining (EDM 2022)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available, which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue, including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnostic model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. 
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
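The patient and annotation counts quoted in the corpus description above can be cross-checked with a few lines of Python. This is only an arithmetic sanity check on the stated figures; the variable names are illustrative, and the per-set non-cancerous patient counts are derived here by subtraction rather than reported in the text.

```python
# Figures quoted in the corpus description.
total_patients = 296
cancerous = {"train": 34, "development": 20, "evaluation": 20}
set_sizes = {"train": 136, "development": 80, "evaluation": 80}

# Derived by subtraction (an inference, not a number stated in the text).
non_cancerous = {name: set_sizes[name] - cancerous[name] for name in set_sizes}

# Annotation counts by group, and the number of annotated slides.
annotations = {"non_cancerous": 8035, "carcinogenic_signs": 6222, "cancerous": 2714}
total_annotations = sum(annotations.values())
slides = 3505
per_slide = round(total_annotations / slides, 2)

print("cancerous patients:", sum(cancerous.values()))
print("non-cancerous patients per set:", non_cancerous)
print("annotated regions:", total_annotations, "average per slide:", per_slide)
```

The totals agree with the text: 74 cancerous and 222 remaining patients make up the 296 patients, and the three annotation groups sum to the 16,971 annotated regions.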
  2. null (Ed.)
    The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer† * These authors contributed equally to this work † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in USA, Israel and Australia. Introduction This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications reaching from cancer diagnosis [1] to the development of brain-machine-interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data. 
Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating. The challenge: enabling wide participation With the goal to run a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this “Deep Learning Epilepsy Detection Challenge”, participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: 1. No participant should need to have in-depth knowledge of the specific domain. (i.e. no participant should need to be a neuroscientist or epileptologist.) 2. No participant should need to be an expert data scientist. 3. No participant should need more than basic programming knowledge. (i.e. no participant should need to learn how to process fringe data formats and stream data efficiently.) 4. No participant should need to provide their own computing resources. In addition to the above, our platform should further • guide participants through the entire process from sign-up to model submission, • facilitate collaboration, and • provide instant feedback to the participants through data visualisation and intermediate online leaderboards. 
The platform The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant that allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which will be described in more detail in further sections. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. 
The starter kit also enabled submission of the final code to a data storage to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility Functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team. Figure 1: High-level architecture of the challenge platform Measuring success The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, with a total of 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's infrastructure WML. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performances that allow a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication. 
Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists. Figure 2: Primary occupation as reported by challenge participants Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in data science, software engineering and neuroscience. Conclusion Given the growing complexity of data science problems and increasing dataset sizes, in order to solve these problems, it is imperative to enable collaboration between people with differences in expertise with a focus on inclusiveness and having a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units including but not limited to IBM Research with a variety of skills, including sales and design, participated in this highly technical challenge. 
  3. Objective Over the past decade, we developed and studied a face-to-face video-based analysis-of-practice professional development (PD) model. In a cluster randomized trial, we found that the face-to-face model enhanced elementary science teacher knowledge and practice and resulted in important improvements to student science achievement (student treatment effect, d = 0.52; Taylor et al., 2017; Roth et al., 2018). The face-to-face PD model is expensive and difficult to scale. In this paper, we present the results of a two-year design-based research study to translate the face-to-face PD into a facilitated online PD experience. The purpose is to create an effective, flexible, and cost-efficient PD model that will reach a broader audience of teachers. Perspective/Theoretical Framework The face-to-face PD model is grounded in situated cognition and cognitive apprenticeship frameworks. Teachers engage in learning science content and effective science teaching practices in the context in which they will be teaching. There are scaffolded opportunities for teachers to learn from analysis of model videos by experienced teachers, to try teaching model units, to analyze video of their own teaching efforts, and ultimately to develop their own unit, with guidance. The PD model attends to the key features of effective PD as described by Desimone (2009) and others. We adhered closely to the design principles of the face-to-face model as described by Authors, 2019. Methods We followed a design-based research approach (DBR; Cobb et al., 2003; Shavelson et al., 2003) to examine the online program components and how they promoted or interfered with the development of teachers’ knowledge and reflective practice. Of central interest was the examination of mechanisms for facilitating teacher learning (Confrey, 2006). 
To accomplish this goal, design researchers engaged in iterative cycles of problem analysis, design, implementation, examination, and redesign (Wang & Hannafin, 2005) in phase one of the project before studying its effect. Data Three small pilot groups of teachers engaged in both synchronous and asynchronous components of the larger online course, which began implementation with a 10-week summer course that leads into study groups of participants meeting through one academic year. We iteratively designed, tested, and revised 17 modules across three pilot versions. On average, pilot groups completed one module every two weeks. Pilot 1 began the work in May 2019, Pilot 2 began in August 2019, and Pilot 3 began in October 2019. Pilot teachers responded to surveys and took part in interviews related to the PD. The PD facilitators took extensive notes after each iteration. The development team met weekly to discuss revisions. We revised all modules between each pilot group and used what we learned to inform our development of later modules within each pilot. For example, we applied what we learned from testing Module 3 with Pilot 1 to the development of Module 3 for Pilot 2, and also applied what we learned from Module 3 with Pilot 1 to the development of Module 7 for Pilot 1. Results We found that community building required the same incremental trust-building activities that occur in face-to-face PD. Teachers began with low-risk activities and gradually engaged in activities that required greater vulnerability (sharing a video of themselves teaching a model unit for analysis and critique by the group). We also identified how to contextualize technical tools with instructional prompts to allow teachers to productively interact with one another about science ideas asynchronously. As part of that effort, we crafted crux questions to surface teachers’ confusions or challenges related to content or pedagogy. 
We called them crux questions because they revealed teachers’ uncertainty and deepened learning during the discussion. Facilitators leveraged asynchronous responses to crux questions in the synchronous sessions to push teacher thinking further than would have otherwise been possible in a 2-hour synchronous video-conference. Significance Supporting teachers with effective, flexible, and cost-efficient PD is difficult under the best of circumstances. In the era of COVID-19, online PD has taken on new urgency. NARST members will gain insight into the translation of an effective face-to-face PD model to an online environment. 
  4. Background: The field of mathematics education has made progress toward generating a set of instructional practices that could support improvements in the learning opportunities made available to groups of students who historically have been underserved and marginalized. Studies that contribute to this growing body of work are often conducted in learning environments that are framed as “successful.” As a researcher who is concerned with issues of equity and who acknowledges the importance of closely attending to the quality of the mathematical activity in which students are being asked to engage, my stance on “successful learning environments” pulls from both Gutiérrez’s descriptions of what characterizes classrooms as aiming for equity and the emphasis on the importance of conceptually oriented goals for student learning that is outlined in documents like the Standards. Though as a field we are growing in our knowledge of practices that support these successful learning environments, this knowledge has not yet been reflected in many of the observational tools, rubrics, and protocols used to study these environments. In addition, there is a growing need to develop empirically grounded ways of attending to the extent to which the practices that are being outlined in research literature actually contribute to the “success” of these learning environments. 
Purpose: The purpose of this article is to explore one way of meeting this growing need by describing the complex work of developing a set of classroom observation rubrics (the Equity and Access Rubrics for Mathematics Instruction, EAR-MI) designed to support efforts in identifying and observing critical features of classrooms characterized as having potential for “success.” In developing the rubrics, I took as my starting place findings from an analysis that compared a set of classrooms that were characterized as demonstrating aspects of successful learning environments and a set of classrooms that were not characterized as successful. This paper not only describes the process of developing the rubrics, but also outlines some of the qualitative differences that distinguished more and less effective examples of the practices the rubrics are designed to capture. Research Design: In designing the rubrics, I engaged in multiple cycles of qualitative analyses of video data collected from a large-scale study. Specifically, I iteratively designed, tested, and revised the developing rubrics while consistently collaborating with, consulting with, and receiving feedback from different experts in the field of education. Conclusions: Although I fully acknowledge and recognize that there are several tensions and limitations of this work, I argue that developing rubrics like the EAR-MI is still worthwhile. One reason that I give for continuing these types of efforts is that it contributes to the work of breaking down forms of practice into components and identifying key aspects of specific practices that are critical for supporting student learning in ways that make potentially productive routines of action visible to and learnable by others, which may ultimately contribute to the development of more successful learning environments. 
I also argue that rubrics like the EAR-MI have the potential to support researchers in developing stronger evidence of the effectiveness of practices that prior research has identified as critical for marginalized students and in more accurately and concretely identifying and describing learning environments as having potential for “success.” 
  5. Objective Over the past decade, we developed and studied a face-to-face video-based analysis-of-practice PD model. In a cluster randomized trial, we found that the face-to-face model enhanced elementary science teacher knowledge and practice, and resulted in important improvements to student science achievement (student treatment effect, d = 0.52; Taylor et al., 2017; Roth et al., 2018). The face-to-face PD model is expensive and difficult to scale. In this poster, we present the results of a two-year design-based research study to translate the face-to-face PD into a facilitated online PD experience. The purpose is to create an effective, flexible, and cost-efficient PD model that will reach a broader audience of teachers. Perspective/Theoretical Framework The face-to-face PD model is grounded in situated cognition and cognitive apprenticeship frameworks. Teachers engage in learning science content and practices in the context in which they will be teaching. In addition, there are scaffolded opportunities for teachers to learn from model videos by experienced teachers, try model units, and ultimately develop their own unit, with guidance. The PD model also attends to the key features of effective PD as described by Desimone (2009) and others. We adhered closely to the design principles of the face-to-face model as described by Roth et al., 2018. Methods We followed a design-based research approach (DBR; Cobb et al., 2003; Shavelson et al., 2003) to examine the online program components and how they promoted or interfered with the development of teachers’ knowledge and reflective practice. Of central interest was the examination of mechanisms for facilitating teacher learning (Confrey, 2006). To accomplish this goal, design researchers engaged in iterative cycles of problem analysis, design, implementation, examination, and redesign (Wang & Hannafin, 2005). Data We iteratively designed, tested, and revised 17 modules across three pilot versions. 
Three small groups of teachers engaged in both synchronous and asynchronous components of the larger online course. They responded to surveys and took part in interviews related to the PD. The PD facilitators took extensive notes after each iteration. The development team met weekly to discuss revisions. Results We found that community building required the same incremental trust-building activities that occur in face-to-face PD. Teachers began with low-risk activities and gradually engaged in activities that required greater vulnerability (sharing a video of themselves teaching a model unit for analysis and critique by the group). We also identified how to contextualize technical tools with instructional prompts to allow teachers to productively interact with one another about science ideas asynchronously. As part of that effort, we crafted crux questions to surface teachers’ confusions or challenges related to content or pedagogy. Facilitators leveraged asynchronous responses to crux questions in the synchronous sessions to push teacher thinking further than would have otherwise been possible in a 2-hour synchronous video-conference. Significance Supporting teachers with effective, flexible, and cost-efficient PD is difficult under the best of circumstances. In the era of COVID-19, online PD has taken on new urgency. AERA members will gain insight into the construction of an online PD for elementary science teachers. Full digital poster available at: https://aera21-aera.ipostersessions.com/default.aspx?s=64-5F-86-2E-15-F8-C3-C0-45-C6-A0-B7-1D-90-BE-46 