

Title: Data Literacies and Social Justice: Exploring Critical Data Literacies Through Sociocultural Perspectives
I've already deposited this and this record is a duplicate. I apparently can't move on with the project report unless I submit a duplicate for some reason.
Award ID(s):
1900606
NSF-PAR ID:
10209456
Author(s) / Creator(s):
; ;
Editor(s):
Gresalfi, M. &
Date Published:
Journal Name:
Proceedings of the International Conference for the Learning Sciences
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Abstract

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction-site-associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual-digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (v) costs significantly less than current approaches.

     
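As a rough illustration of the computational duplicate-removal step described in the RADcap abstract above, the following Python sketch collapses reads that share a start position and random index tag. The input layout, tag placement, and function names are assumptions made for this example; this is not the authors' pipeline.

```python
# Minimal sketch: collapse PCR duplicates by (locus start position, random index tag).
# Assumes reads arrive as (read_id, start_pos, index_tag, sequence) tuples;
# this is an illustrative stand-in, not the RADcap authors' actual pipeline.
from collections import defaultdict

def remove_pcr_duplicates(reads):
    """Keep one read per (start position, index tag) combination."""
    seen = defaultdict(list)
    for read_id, start_pos, index_tag, seq in reads:
        seen[(start_pos, index_tag)].append((read_id, seq))
    # Retain the first read observed for each key; the rest are PCR duplicates.
    return [(key, group[0]) for key, group in seen.items()]

if __name__ == "__main__":
    example = [
        ("r1", 1042, "ACGT", "TTAGGC"),
        ("r2", 1042, "ACGT", "TTAGGC"),  # duplicate of r1 (same start, same tag)
        ("r3", 1042, "GGTA", "TTAGGC"),  # same start, different tag -> kept
    ]
    unique = remove_pcr_duplicates(example)
    print(f"{len(example)} reads -> {len(unique)} unique molecules")
```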
  2. Rogers, Rebekah (Ed.)
    Abstract: Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
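As a loose illustration of the kind of classifier described in the CLOUD abstract above, here is a small multi-layer network sketch in Python. The feature layout (expression distances among parent, child, and ancestral copies), the class labels, and the use of scikit-learn are assumptions for illustration only and do not reproduce the published model or its training data.

```python
# Illustrative sketch only: classify duplicate-gene retention mechanisms from
# expression-distance features with a small multi-layer network.
# The features and labels below are randomly generated toy data, not the
# published CLOUD architecture or simulations.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy features: expression distances (parent-ancestor, child-ancestor, parent-child),
# one row per duplicate gene pair.
X = rng.random((300, 3))
# Toy labels: 0 = conservation, 1 = neofunctionalization, 2 = subfunctionalization.
y = rng.integers(0, 3, size=300)

clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
clf.fit(X, y)

# Predict the retention mechanism for a new duplicate pair.
new_pair = np.array([[0.1, 0.8, 0.7]])
print("predicted class:", clf.predict(new_pair)[0])
```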
  3. With the rapid growth and dominance of Internet video services, content-based video deduplication has become an essential piece of infrastructure for Internet video service. However, the explosively growing volume of video data on the Internet challenges the system's design and implementation for scalability in several ways. (1) Although quantization-based indexing techniques are effective for searching visual features at a large scale, costly re-training over the complete dataset must be done periodically. (2) The high-dimensional vectors for visual features demand increasingly large SSD space, degrading I/O performance. (3) Videos crawled from the Internet are diverse, and visually similar videos are not necessarily duplicates, increasing deduplication complexity. (4) Most videos are edited, so duplicate content is more likely to appear as clips inside videos, demanding processing techniques with close attention to detail. To address the above issues, we propose Maze, a full-fledged video deduplication system. Maze has an approximate nearest-neighbor search (ANNS) layer that indexes and searches the high-dimensional feature vectors. The architecture of the ANNS layer supports efficient reads and writes and eliminates the data migration caused by re-training. Maze adopts a CNN-based feature and the ORB feature as its visual features, which are optimized for the specific video deduplication task. The features are compact and reside entirely in memory. Acoustic features are also incorporated in Maze so that videos that look similar but have different audio tracks can be distinguished. A clip-based matching algorithm is developed to discover duplicate content at a fine granularity. Maze has been deployed as a production system for two years. It has indexed 1.3 billion videos and is indexing ~800 thousand videos per day. For the ANNS layer, the average read latency is 4 seconds and the average write latency is at most 4.84 seconds. Re-training over the complete dataset is no longer required no matter how much new data is added, eliminating costly data migration between nodes. Maze recognizes duplicate live-streaming videos with both similar appearance and similar audio at a recall of 98%. Most importantly, Maze is also cost-effective. For example, the compact feature design helps save 5800 SSDs, and the computation resources devoted to running the whole system decrease to 250K standard cores per billion videos.
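To make the nearest-neighbor lookup idea in the Maze abstract above concrete, here is a minimal Python sketch of duplicate candidate retrieval by cosine similarity over feature vectors. The brute-force scan, the 128-dimensional vectors, and the 0.9 threshold are assumptions for illustration; Maze's actual ANNS index, features, and thresholds are not reproduced here.

```python
# Minimal sketch of duplicate lookup over video feature vectors by cosine similarity.
# A production ANNS layer would use an approximate index; the brute-force scan,
# the 128-dimensional random vectors, and the 0.9 threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Indexed corpus: one feature vector per video (or per clip).
corpus = normalize(rng.standard_normal((10_000, 128)))

def find_duplicates(query: np.ndarray, threshold: float = 0.9):
    """Return indices of corpus vectors whose cosine similarity exceeds the threshold."""
    sims = corpus @ normalize(query)
    return np.nonzero(sims >= threshold)[0]

# A slightly perturbed copy of video 42 should be flagged as a duplicate candidate.
query_vec = corpus[42] + 0.01 * rng.standard_normal(128)
print("candidate duplicates:", find_duplicates(query_vec))
```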
  4. Abstract

    210Bi (t1/2: 5.01 d), the daughter of 210Pb and parent of 210Po, has rarely been measured in aquatic systems, and its behavior in the water column is poorly understood. In this article, I present a method for quickly measuring 210Pb, 210Bi, and 210Po in aquatic samples, where (1) 210Bi and 210Po are scavenged onto an anion solid-phase extraction disk within 15 min of pretreating the sample; (2) beta decay of 210Bi is counted on the disk immediately thereafter; (3) 210Po is subsequently removed from the disk and redeposited on a copper plate for α-spectroscopy; and (4) 210Pb is determined via the ingrowth of 210Bi. I present decay-corrected calculations for total, dissolved, and particle-bound fractions of each nuclide and conclude with an analysis of 210Pb, 210Bi, and 210Po activities in rain, dreissenid (quagga) mussels, and water samples from the Milwaukee Inner Harbor in Lake Michigan. Results show that the loss of lead on the anion solid-phase extraction disks was negligible (0.2% ± 2.1%; ±1 SD, n = 4), and the sorption of bismuth was complete (99% ± 2%; ±1 SD, n = 16). Relative mean absolute deviations of duplicate sample analyses of lake water were 2.4% ± 1.9% for 210Pb (geometric mean of total sample activity: 3.0 disintegrations per minute [dpm], n = 6), 7.7% ± 5.8% for 210Bi (geometric mean of total sample activity: 2.6 dpm, n = 8), and 2.7% ± 1.7% for 210Po (geometric mean of total sample activity: 1.4 dpm, n = 8).

     
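As a small worked illustration of the decay-corrected calculations mentioned in the abstract above, the Python sketch below applies the standard decay relation A0 = A(t)·exp(λt) and the 210Bi ingrowth relation A_Bi(t) = A_Pb·(1 − exp(−λ_Bi t)), which holds when 210Pb decay over the ingrowth period is negligible. The counting times and activities are made-up inputs; the published method's full correction terms for total, dissolved, and particle-bound fractions are not reproduced.

```python
# Illustrative sketch of the standard decay and ingrowth relations used in
# decay-corrected calculations; the counting times and activities below are
# made-up example inputs, not values from the published method.
import math

T_HALF_BI210_DAYS = 5.01  # 210Bi half-life (days), as given in the abstract

def decay_constant(t_half: float) -> float:
    return math.log(2) / t_half

def decay_correct(activity_at_count: float, elapsed_days: float, t_half: float) -> float:
    """Back-correct a measured activity to collection time: A0 = A * exp(lambda * t)."""
    return activity_at_count * math.exp(decay_constant(t_half) * elapsed_days)

def bi210_ingrowth(pb210_activity: float, elapsed_days: float) -> float:
    """210Bi activity grown in from 210Pb after elapsed_days, assuming negligible
    210Pb decay over the ingrowth period (210Pb half-life of ~22 y >> 5.01 d)."""
    lam = decay_constant(T_HALF_BI210_DAYS)
    return pb210_activity * (1.0 - math.exp(-lam * elapsed_days))

if __name__ == "__main__":
    # Example: a disk counted 2.5 days after collection shows 1.8 dpm of 210Bi.
    print("210Bi at collection:", round(decay_correct(1.8, 2.5, T_HALF_BI210_DAYS), 2), "dpm")
    # Example: fraction of the 210Pb activity present as 210Bi after 15 days of ingrowth.
    print("ingrowth fraction after 15 d:", round(bi210_ingrowth(1.0, 15.0), 3))
```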
  5. Background and Context: In this theory paper, we explore the concept of translanguaging from bilingual education and its implications for teaching and learning programming and computing, especially in computer science (CS) for all initiatives. Objective: We use translanguaging to examine how programming is and isn't like using human languages. We frame CS as computational literacies. We describe a pedagogical approach for teaching computational literacies. Method: We review theory from applied linguistics, literacy, and computational literacy. We provide a design narrative of our pedagogical approach by describing activities from bilingual middle school classrooms integrating Scratch into academic subjects. Findings: Translanguaging pedagogy can leverage learners' (bilingual and otherwise) full linguistic repertoires as they engage with computational literacies. Implications: Our data help demonstrate how translanguaging can be mobilized to do CS, which has implications for increasing equitable participation in computer science.