Recognizing the importance of whole genome alignment (WGA), the National Institutes of Health maintains LASTZ, a sequential WGA application. As genomic data grows, there is a compelling need for scalable, high-performance WGA. Unfortunately, high-sensitivity 'gapped' alignment, which uses dynamic programming (DP), is slow, whereas faster alignment with ungapped filtering is often less sensitive. We develop FastZ, a GPU-accelerated, gapped WGA tool that matches gapped LASTZ in sensitivity. FastZ employs a novel inspector-executor scheme in which (a) the lightweight inspector elides DP traceback except in common, extremely short alignments, where it performs limited, eager traceback so that the executor is not needed, and (b) executor trimming avoids unnecessary work. Further, FastZ uses register-based cyclic buffering to drastically reduce memory traffic and groups DP problems by size for load balance. FastZ running on an RTX 3080 GPU and our multicore implementation of LASTZ achieve speedups of 111x and 20x, respectively, over sequential LASTZ.
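As a rough CPU illustration of two ideas in the FastZ abstract, the sketch below computes a local-alignment (Smith-Waterman) score without any traceback, keeping only two rows of the DP matrix at a time, and buckets alignment problems by matrix area. It is a sketch under stated assumptions, not FastZ's GPU kernel: the scoring constants, the linear gap penalty, and the helper names `local_score_two_rows` and `group_by_size` are hypothetical, the rolling two-row buffer is only a loose analogue of register-based cyclic buffering, and LASTZ's actual scoring (for example, affine gap penalties) is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative scores (assumption): not LASTZ's defaults, and gaps are linear, not affine.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_score_two_rows(a: str, b: str) -> int:
    """Best Smith-Waterman score using only two DP rows (score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        # Discard the old row: memory is O(len(b)) rather than O(len(a) * len(b)).
        prev = curr
    return best


def group_by_size(problems: List[Tuple[str, str]],
                  bucket: int = 64) -> Dict[int, List[Tuple[str, str]]]:
    """Hypothetical load-balancing helper: bucket DP problems by matrix area."""
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for a, b in problems:
        groups[(len(a) * len(b)) // bucket].append((a, b))
    return dict(groups)


if __name__ == "__main__":
    print(local_score_two_rows("TTACGTTT", "ACGT"))
    print(sorted(group_by_size([("AC", "GT"), ("ACGT" * 10, "ACGT" * 10)])))
```

Because only the score is kept, the full DP matrix never has to be stored; recovering the alignment path itself would require traceback, which is the step the abstract describes the inspector eliding for most problems.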
Supporting Multidimensional Data Analysis for High-School Students in the Era of Machine Learning
Machine learning (ML) opens exciting scientific opportunities in K-12 STEM classrooms. However, students struggle to interpret ML patterns because of limited data literacy. Face glyphs offer a unique benefit by leveraging the brain's processing of facial features. Yet they have limitations, such as a lack of contextual information and data biases. To address these limitations, we created three enhanced face glyph visualizations: feature-independent and feature-aligned range views, and a sequential feature inspector. In a study with 25 high school students, the feature-aligned range view supported contextual analysis, and the sequential feature inspector reduced the risk of missing data. Face glyphs also supported global interpretation of the data.
- Award ID(s):
- 2225227
- PAR ID:
- 10519137
- Publisher / Repository:
- International Society of the Learning Sciences
- Date Published:
- Page Range / eLocation ID:
- 1255 to 1258
- Format(s):
- Medium: X
- Location:
- Buffalo, New York
- Sponsoring Org:
- National Science Foundation
More Like this
This research paper systematically identifies students' perceptions of learning machine learning (ML) topics. To keep up with the ever-increasing need for professionals with ML expertise, for-profit and non-profit organizations offer a wide range of ML-related courses at the undergraduate and graduate levels. Despite the availability of ML-related education materials, there is a lack of understanding of how students perceive ML-related topics and how those topics are disseminated. A systematic categorization of students' perceptions of these courses can help educators understand the challenges that students face and use that understanding to disseminate ML-related topics more effectively in courses. The goal of this paper is to help educators teach ML topics by providing an experience report of students' perceptions related to learning ML. We accomplish our research goal by conducting an empirical study in which we deploy a survey to 83 students across five academic institutions. These students were recruited from a mixture of undergraduate and graduate courses. We apply a qualitative analysis technique called open coding to identify the challenges that students encounter while studying ML-related topics. Using the same technique, we identify the quality aspects that students prioritize in ML-related topics. From our survey, we identify 11 challenges that students face when learning about ML topics, among which data quality is the most frequent, followed by hardware-related challenges. We observe that the majority of students prefer hands-on projects over theoretical lectures. Furthermore, we find that the surveyed students consider ethics, security, privacy, correctness, and performance to be essential considerations when developing ML-based systems.
Based on our findings, we recommend that educators who teach ML-related courses (i) incorporate hands-on projects to teach ML-related topics, (ii) dedicate course materials to data quality, (iii) use lightweight virtualization tools to showcase computationally intensive topics, such as deep neural networks, and (iv) empirically evaluate how large language models can be used in ML-related education.
People with disabilities are underrepresented in STEM as well as in information, communication, and technology (ICT) careers. The underrepresentation of individuals with disabilities in STEM may reflect systemic issues of access. Curricular materials that allow students to demonstrate their current fraction knowledge through multiple means, and that provide opportunities to share and explain their thinking with others, may address issues of access that students face in elementary school. In this study, we employed a sequential mixed-methods design to investigate how a game-enhanced fraction intervention affects students' fraction knowledge, engagement, and STEM interest. Quantitative results revealed statistically significant effects of the program on students' fraction understanding and engagement, but not on their STEM interest. Qualitative analyses revealed three themes: (1) Accessible, Enjoyable Learning; (2) Can't Relate; and (3) Dreaming Bigger. These themes provided contextual backing for the quantitative results. Implications for future research and development are shared.
The rapid increase in both the quantity and complexity of data generated daily in environmental science and engineering (ESE) demands accompanying advances in data analytics. Advanced data analysis approaches, such as machine learning (ML), have become indispensable tools for revealing hidden patterns or deducing correlations where conventional analytical methods face limitations or challenges. However, ML concepts and practices have not been widely adopted by ESE researchers. This feature explores the potential of ML to revolutionize data analysis and modeling in ESE and covers the essential knowledge needed for such applications. First, we use five examples to illustrate how ML addresses complex ESE problems. We then summarize four major types of ML applications in ESE: making predictions, extracting feature importance, detecting anomalies, and discovering new materials or chemicals. Next, we introduce the essential knowledge required and the current shortcomings of ML applications in ESE, focusing on three important but often overlooked components of applying ML: correct model development, proper model interpretation, and sound applicability analysis. Finally, we discuss challenges and future opportunities in applying ML tools in ESE to highlight the potential of ML in this field.
Brain responses in visual cortex are typically modeled as a positively and negatively weighted sum of all features within a deep neural network (DNN) layer. However, this linear fit can dramatically alter a given feature space, making it unclear whether brain prediction levels stem more from the DNN itself or from the flexibility of the encoding model. As such, studies of alignment may benefit from a paradigm shift toward more constrained and theoretically driven mapping methods. As a proof of concept, we present a case study of face and scene selectivity, showing that typical encoding analyses do not differentiate between aligned and misaligned tuning bases in model-to-brain predictivity. We introduce a new alignment complexity measure, tuning reorientation, which favors DNNs that achieve high brain alignment with minimal distortion of the original feature space. We show that this measure helps arbitrate between models that are superficially equal in predictivity but differ in alignment complexity. Our experiments broadly signal the benefit of sparse, positive-weighted encoding procedures, which directly enforce an analogy between the tuning directions of model and brain feature spaces.
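The contrast between an unconstrained positively and negatively weighted sum and a positive-weighted encoding can be sketched with a nonnegative least-squares fit. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses plain projected gradient descent on a mean-squared-error objective, the function name `fit_nonnegative_encoding` and its step size are hypothetical, and neither a sparsity penalty nor the tuning reorientation measure is modeled.

```python
from typing import List


def fit_nonnegative_encoding(features: List[List[float]], response: List[float],
                             lr: float = 0.1, steps: int = 2000) -> List[float]:
    """Fit response ~ features @ w with every weight constrained to w >= 0.

    Hypothetical helper using projected gradient descent: take a gradient step
    on the mean squared error, then clip negative weights to zero (the
    projection onto the set w >= 0).
    """
    n_samples, n_features = len(features), len(features[0])
    w = [0.0] * n_features
    for _ in range(steps):
        # Residuals r_i = (X w)_i - y_i, one per sample (e.g., per stimulus).
        residual = [sum(x * wj for x, wj in zip(row, w)) - y
                    for row, y in zip(features, response)]
        # Gradient of the mean squared error: (2 / n) * X^T r.
        grad = [2.0 / n_samples * sum(features[i][j] * residual[i] for i in range(n_samples))
                for j in range(n_features)]
        # Gradient step followed by projection: negative weights are clipped to zero.
        w = [max(0.0, wj - lr * g) for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
    y = [1, -1, 2, 0, 1, 3]  # built as X @ [1, -1, 2]
    print(fit_nonnegative_encoding(X, y))
```

Unconstrained least squares would fit this target exactly with the weights `[1, -1, 2]`; the constrained fit instead drives the middle weight to zero, so each response is modeled only as a positive combination of the original feature directions.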

