Title: Validation of Temporal Scoring Metrics for Automatic Seizure Detection
The evaluation of sequential decoding systems in the bioengineering community lacks standardization. Assessing the accuracy of a candidate system’s segmentations and measuring its false alarm rate are two performance metrics critical to the operational acceptance of a technology. Measuring such quantities consistently, however, requires that many scoring software implementation details be resolved, and results can be highly sensitive to these details. In this paper, we revisit and evaluate a set of metrics introduced in our open source scoring software for sequential decoding of multichannel signals. This software was used to rank sixteen automatic seizure detection systems recently developed for the 2020 Neureka® Epilepsy Challenge. The systems produced by the participants provided a broad range of design variations that allowed us to assess the consistency of the proposed metrics. We present a comprehensive assessment of four of these new metrics and validate our findings against our previous studies. We also validate a proposed new metric, time-aligned event scoring, that focuses on the segmentation behavior of an algorithm, and we demonstrate how these metrics yield insight into the performance of a system.
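The time-aligned event scoring idea can be illustrated with a short sketch: score each reference event by the fraction of its duration that hypothesis events cover, so that partial overlap earns partial credit. This is a minimal sketch under assumed conventions; the event representation, function names, and the omission of false-alarm penalties are simplifications for illustration, not the scoring software's actual implementation.

```python
from typing import List, Tuple

# An event is a (start, stop) interval in seconds.
Event = Tuple[float, float]

def overlap(a: Event, b: Event) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def taes_like_score(refs: List[Event], hyps: List[Event]) -> float:
    """Fractional credit per reference event: the fraction of the
    reference interval covered by hypothesis events. Time-aligned
    scoring gives partial credit for partial overlap instead of
    counting a whole event as simply hit or missed.
    Assumes hypothesis events do not overlap one another."""
    if not refs:
        return 0.0
    total = 0.0
    for ref in refs:
        dur = ref[1] - ref[0]
        covered = sum(overlap(ref, hyp) for hyp in hyps)
        total += min(covered, dur) / dur
    return total / len(refs)

# Example: one 60 s reference seizure, hypothesis covers 45 s of it.
print(taes_like_score([(100.0, 160.0)], [(110.0, 155.0)]))  # 0.75
```

The full metric also accounts for hypothesis time that matches no reference event; the sketch captures only the partial-credit behavior that distinguishes time-aligned scoring from whole-event counting.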
Award ID(s):
1827565
PAR ID:
10199681
Author(s) / Creator(s):
Editor(s):
Obeid, Iyad; Selesnick, Ivan; Picone, Joseph
Date Published:
Journal Name:
Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB)
Volume:
1
Issue:
1
ISSN:
2473-716X
Page Range / eLocation ID:
1-5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Creativity is increasingly recognized as a core competency for the 21st century, making its development a priority in education, research, and industry. To effectively cultivate creativity, researchers and educators need reliable and accessible assessment tools. Recent software developments have significantly enhanced the administration and scoring of creativity measures; however, existing software often requires expertise in experiment design and computer programming, limiting its accessibility to many educators and researchers. In the current work, we introduce CAP—the Creativity Assessment Platform—a free web application for building creativity assessments, collecting data, and automatically scoring responses (cap.ist.psu.edu). CAP allows users to create custom creativity assessments in ten languages using a simple, point-and-click interface, selecting from tasks such as the Short Story Task, Drawing Task, and Scientific Creative Thinking Test. Users can automatically score task responses using machine learning models trained to match human creativity ratings—with multilingual capabilities, including the new Cross-Lingual Alternate Uses Scoring (CLAUS), a large language model achieving strong prediction of human creativity ratings in ten languages. CAP also provides a centralized dashboard to monitor data collection, score assessments, and automatically generate text for a Methods section based on the study’s tasks, metrics, and instructions—with a single click—promoting transparency and reproducibility in creativity assessment. Designed for ease of use, CAP aims to democratize creativity measurement for researchers, educators, and everyone in between.
  2. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Ed.)
    The evaluation of machine learning algorithms in biomedical fields for applications involving sequential data lacks both rigor and standardization. Common quantitative scalar evaluation metrics such as sensitivity and specificity can often be misleading and fail to accurately reflect application requirements. Evaluation metrics must ultimately reflect the needs of users yet be sufficiently sensitive to guide algorithm development. For example, feedback from critical care clinicians who use automated event detection software in clinical applications has been overwhelmingly emphatic that a low false alarm rate, typically measured in units of the number of errors per 24 hours, is the single most important criterion for user acceptance (a sketch of this computation appears after this list). Though a single metric is often not as insightful as examining performance over a range of operating conditions, there is, nevertheless, a need for a single scalar figure of merit. In this chapter, we discuss the deficiencies of existing metrics for a seizure detection task and propose several new metrics that offer a more balanced view of performance. We demonstrate these metrics on a seizure detection task based on the TUH EEG Seizure Corpus. We introduce two promising metrics: (1) a measure based on a concept borrowed from the spoken term detection literature, Actual Term-Weighted Value, and (2) a new metric, Time-Aligned Event Scoring (TAES), that accounts for the temporal alignment of the hypothesis to the reference annotation. We demonstrate that state-of-the-art technology based on deep learning, though impressive in its performance, still needs significant improvement before it will meet very strict user acceptance guidelines.
  3. This paper details the implementation and usage of software-based performance counters to understand the performance of a particular implementation of the MPI standard, Open MPI. Such counters can expose intrinsic features of the software stack that are not available otherwise in a generic and portable way. The PMPI interface is useful for instrumenting MPI applications at the user level; however, it is insufficient for providing meaningful internal MPI performance details. While the Peruse interface provides more detailed information on state changes within Open MPI, it has not seen widespread adoption. We introduce a simple low-level approach that instruments the Open MPI code at key locations to provide fine-grained MPI performance metrics (the counter idea is sketched after this list). We evaluate the overhead associated with adding these counters to Open MPI as well as their use in determining bottlenecks and areas for improvement both in user code and in the MPI implementation itself.
  4. Abstract: This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.
  5. As the number and severity of security incidents continue to increase, remediating vulnerabilities and weaknesses has become a daunting task due to the sheer number of known vulnerabilities. Different scoring systems have been developed to provide qualitative and quantitative assessments of the severity of common vulnerabilities and weaknesses and to guide the prioritization of vulnerability remediation. However, these scoring systems provide only generic rankings of common weaknesses, which do not consider the specific vulnerabilities that exist in each system. To address this limitation, and building on recent principled approaches to vulnerability scoring, we propose new common weakness scoring metrics that consider the findings of vulnerability scanners, including the number of instances of each vulnerability across a system, and enable system-specific rankings that can provide actionable intelligence to security administrators (an instance-weighted scoring sketch appears after this list). We built a small testbed to evaluate the proposed metrics against an existing metric, and show that the results are consistent with our intuition.
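For item 2, the clinicians' acceptance criterion, errors per 24 hours, reduces to a simple computation once events and a total recording duration are fixed: count hypothesis events that overlap no reference event and normalize to a 24-hour rate. The interval representation and any-overlap matching rule below are assumptions for illustration; the chapter defines its metrics more carefully.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_sec, stop_sec)

def overlaps(a: Event, b: Event) -> bool:
    """True if the two intervals share any time."""
    return min(a[1], b[1]) > max(a[0], b[0])

def false_alarms_per_24h(hyps: List[Event], refs: List[Event],
                         total_dur_sec: float) -> float:
    """Count hypothesis events that match no reference event,
    normalized to a rate per 24 hours of recording."""
    fa = sum(1 for h in hyps if not any(overlaps(h, r) for r in refs))
    return fa * 86400.0 / total_dur_sec

# Example: 2 unmatched detections in a 12-hour record -> 4 FA/24h.
hyps = [(30.0, 40.0), (500.0, 520.0), (900.0, 910.0)]
refs = [(25.0, 45.0)]
print(false_alarms_per_24h(hyps, refs, 12 * 3600.0))  # 4.0
```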
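For item 3, the core idea, lightweight counters incremented at key code locations and read out after a run, can be shown language-agnostically. Open MPI's counters live in C inside the library; the Python sketch below only mirrors the concept, and every name in it is invented for illustration.

```python
from collections import defaultdict

# Invented, illustrative counter store; Open MPI's real counters are
# C variables updated inside the library, not a Python dict.
counters = defaultdict(int)

def count(name: str, amount: int = 1) -> None:
    """Increment a named counter; a single addition per event keeps
    instrumentation overhead low."""
    counters[name] += amount

def send(buf: bytes) -> None:
    """Stand-in for an instrumented code path such as a send routine."""
    count("sends")
    count("bytes_sent", len(buf))
    # ... the actual message transfer would happen here ...

for _ in range(1000):
    send(b"x" * 64)

print(dict(counters))  # {'sends': 1000, 'bytes_sent': 64000}
```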
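For item 5, one plausible reading of an instance-weighted weakness score is sketched below: aggregate per-vulnerability severity (e.g., CVSS base scores) by the CWE each finding maps to, weighted by the instance counts a scanner reports. The aggregation rule and field names are assumptions for illustration; the paper's metrics build on principled vulnerability-scoring approaches and may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One scanner finding: (cwe_id, cvss_base_score, instance_count).
Finding = Tuple[str, float, int]

def rank_weaknesses(findings: List[Finding]) -> List[Tuple[str, float]]:
    """Score each CWE by the instance-weighted sum of the severity of
    its vulnerabilities, then rank weaknesses for this system."""
    scores: Dict[str, float] = defaultdict(float)
    for cwe, cvss, count in findings:
        scores[cwe] += cvss * count
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

findings = [
    ("CWE-79", 6.1, 12),   # many medium-severity XSS instances
    ("CWE-89", 9.8, 1),    # one critical SQL injection
]
print(rank_weaknesses(findings))
# [('CWE-79', 73.2), ('CWE-89', 9.8)]
```

In this toy example the many-instance, medium-severity weakness outranks the single critical one, which is exactly the kind of system-specific reordering that generic weakness rankings cannot express.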