NSF PAR Search | NSF Public Access Repository

Submarine: A subscription‐based data streaming framework for integrating large facilities and advanced cyberinfrastructure

https://doi.org/10.1002/cpe.5256

Zamani, Ali Reza; AbdelBaky, Moustafa; Balouek‐Thomert, Daniel; Villalobos, J. J.; Rodero, Ivan; Parashar, Manish (April 2019, Concurrency and Computation: Practice and Experience)

Summary: Large scientific facilities provide researchers with instrumentation, data, and data products that can accelerate scientific discovery. However, increasing data volumes coupled with limited local computational power prevent researchers from taking full advantage of what these facilities can offer. Many researchers have looked into using commercial and academic cyberinfrastructure (CI) to process these data. Nevertheless, there remains a disconnect between large facilities and CI that requires researchers to be actively part of the data processing cycle. The increasing complexity of CI and data scale necessitates new data delivery models that can autonomously integrate large‐scale scientific facilities and CI to deliver real‐time data and insights. In this paper, we present our initial efforts using the Ocean Observatories Initiative project as a use case. In particular, we present a subscription‐based data streaming service for data delivery that leverages the Apache Kafka data streaming platform. We also show how our solution can automatically integrate large‐scale facilities with CI services for automated data processing.
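The subscription-based delivery model described in this abstract can be illustrated with a minimal sketch. It uses a toy in-memory broker as a stand-in for a streaming platform; it is not the Apache Kafka API and not the authors' implementation. The class name `InMemoryBroker`, the topic names, and the record fields are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a streaming platform (assumption: not the Kafka API)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every record on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> int:
        """Push a record to all subscribers; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


def demo() -> List[dict]:
    """Subscribe to one hypothetical topic and publish to two."""
    broker = InMemoryBroker()
    received: List[dict] = []
    broker.subscribe("ooi.ctd.temperature", received.append)
    broker.publish("ooi.ctd.temperature", {"sensor": "ctd-01", "celsius": 4.2})
    # No one subscribed to this topic, so the record is not delivered.
    broker.publish("ooi.ctd.salinity", {"sensor": "ctd-01", "psu": 35.0})
    return received
```

In this sketch the subscriber never polls the facility: data arrives only for topics it registered interest in, which is the property the abstract attributes to subscription-based delivery.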
Partner‐specific prediction of RNA‐binding residues in proteins: A critical assessment

https://doi.org/10.1002/prot.25639

Jung, Yong; EL‐Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant G. (December 2018, Proteins: Structure, Function, and Bioinformatics)

Abstract: RNA‐protein interactions play essential roles in regulating gene expression. While some RNA‐protein interactions are “specific”, that is, the RNA‐binding proteins preferentially bind to particular RNA sequence or structural motifs, others are “non‐RNA specific.” Deciphering the protein‐RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein‐RNA interfaces, there is a need for computational methods to identify RNA‐binding residues in proteins. While most of the existing computational methods for predicting RNA‐binding residues in RNA‐binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner‐specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner‐specific protein‐RNA interface prediction tools, PS‐PRIP and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, the RNA‐specificity metric (RSM), for quantifying the RNA‐specificity of the RNA binding residues predicted by such tools. Our results show that the RNA‐binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner‐agnostic metrics, RNA partner‐specific methods are outperformed by the state‐of‐the‐art partner‐agnostic methods. We conjecture that either (a) the protein‐RNA complexes in PDB are not representative of the protein‐RNA interactions in nature, or (b) the current methods for partner‐specific prediction of RNA‐binding residues in proteins fail to account for the differences in RNA partner‐specific versus partner‐agnostic protein‐RNA interactions, or both.
Toward Democratizing Access to Facilities Data: A Framework for Intelligent Data Discovery and Delivery

https://doi.org/10.1109/MCSE.2022.3179408

Qin, Yubo; Rodero, Ivan; Parashar, Manish (May 2022, Computing in Science & Engineering)

Full Text Available
Leveraging user access patterns and advanced cyberinfrastructure to accelerate data delivery from shared-use scientific observatories

https://doi.org/10.1016/j.future.2021.03.004

Qin, Yubo; Rodero, Ivan; Simonet, Anthony; Meertens, Charles; Reiner, Daniel; Riley, James; Parashar, Manish (September 2021, Future Generation Computer Systems)
Full Text Available
Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks

https://doi.org/10.1109/IPDPS49936.2021.00073

Qin, Yubo; Rodero, Ivan; Parashar, Manish (May 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS))
Large-scale multiuser scientific facilities, such as geographically distributed observatories, remote instruments, and experimental platforms, represent some of the largest national investments and can enable dramatic advances across many areas of science. Recent examples of such advances include the detection of gravitational waves and the imaging of a black hole’s event horizon. However, as the number of such facilities and their users grows, along with the complexity, diversity, and volumes of their data products, finding and accessing relevant data is becoming increasingly challenging, limiting the potential impact of facilities. These challenges are further amplified as scientists and application workflows increasingly try to integrate facilities’ data from diverse domains. In this paper, we leverage concepts underlying recommender systems, which are extremely effective in e-commerce, to address these data-discovery and data-access challenges for large-scale distributed scientific facilities. We first analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. We then use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode the collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism to enable the CKAT to pay more attention to key information while reducing irrelevant noise, thereby increasing the accuracy of the recommendations. We apply the proposed model to two real-world facility datasets and empirically demonstrate that the CKAT can effectively facilitate data discovery, significantly outperforming several compelling state-of-the-art baseline models.
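The idea of an attention mechanism that weights informative neighbors more heavily during graph propagation can be sketched as follows. This is a generic dot-product attention over neighbor embeddings, offered only as an illustration under stated assumptions; it is not the CKAT model, whose knowledge-aware attention is not specified in this abstract. The function names and the example vectors are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(
    node: Sequence[float], neighbors: Sequence[Sequence[float]]
) -> List[float]:
    """Aggregate neighbor embeddings, weighted by similarity to the node.

    Assumption: similarity is a plain dot product; real graph attention
    networks typically use learned projections instead.
    """
    scores = [sum(a * b for a, b in zip(node, n)) for n in neighbors]
    weights = softmax(scores)
    dim = len(node)
    return [
        sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(dim)
    ]
```

A neighbor whose embedding aligns with the node's embedding receives a larger weight, so less relevant neighbors contribute less to the aggregated representation.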
Full Text Available
Harnessing the Computing Continuum for Urgent Science

https://doi.org/10.1145/3439602.3439618

Balouek-Thomert, Daniel; Rodero, Ivan; Parashar, Manish (November 2020, ACM SIGMETRICS Performance Evaluation Review)

Full Text Available
An edge-aware autonomic runtime for data streaming and in-transit processing

https://doi.org/10.1016/j.future.2020.03.037

Zamani, Ali Reza; Balouek-Thomert, Daniel; Villalobos, J.J.; Rodero, Ivan; Parashar, Manish (September 2020, Future Generation Computer Systems)
Full Text Available
A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning

https://doi.org/10.1609/aaai.v34i01.5376

Fauvel, Kevin; Balouek-Thomert, Daniel; Melgar, Diego; Silva, Pedro; Simonet, Anthony; Antoniu, Gabriel; Costan, Alexandru; Masson, Véronique; Parashar, Manish; Rodero, Ivan; et al (June 2020, Proceedings of the AAAI Conference on Artificial Intelligence)

Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective at identifying medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given as input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
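The general shape of a stacking ensemble over two sensor types can be sketched as below. This is a deliberately small illustration, not the DMSEEW method: the base models are fixed threshold rules rather than trained classifiers, the meta-learner is a simple lookup table learned from base-model outputs, and all thresholds, field names, and data values are hypothetical. A fuller stacking setup would also train the meta-learner on held-out base predictions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Sample = Dict[str, float]
BaseModel = Callable[[Sample], str]


def seismometer_model(sample: Sample) -> str:
    """Hypothetical base model: flags shaking but cannot grade its size."""
    return "event" if sample["velocity"] >= 1.0 else "quiet"


def gps_model(sample: Sample) -> str:
    """Hypothetical base model: only large ground displacement is visible."""
    return "big-shift" if sample["displacement"] >= 10.0 else "small-shift"


class StackedClassifier:
    """Meta-learner that maps tuples of base predictions to a final label."""

    def __init__(self, base_models: Sequence[BaseModel]) -> None:
        self.base_models = list(base_models)
        self.table: Dict[Tuple[str, ...], str] = {}
        self.default: Optional[str] = None

    def _features(self, sample: Sample) -> Tuple[str, ...]:
        return tuple(model(sample) for model in self.base_models)

    def fit(self, samples: List[Sample], labels: List[str]) -> "StackedClassifier":
        # For each combination of base outputs, remember the most common label.
        votes: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
        for sample, label in zip(samples, labels):
            votes[self._features(sample)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        # Fallback for unseen combinations: most common training label.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, sample: Sample) -> Optional[str]:
        return self.table.get(self._features(sample), self.default)


TRAIN_SAMPLES: List[Sample] = [
    {"velocity": 0.1, "displacement": 1.0},
    {"velocity": 0.2, "displacement": 2.0},
    {"velocity": 1.5, "displacement": 3.0},
    {"velocity": 2.0, "displacement": 4.0},
    {"velocity": 2.5, "displacement": 25.0},
    {"velocity": 3.0, "displacement": 40.0},
]
TRAIN_LABELS = ["normal", "normal", "medium", "medium", "large", "large"]
```

Neither base model alone separates all three classes in this toy data; the meta-level combination does, which is the motivation the abstract gives for combining both sensor types.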
Full Text Available
The Virtual Data Collaboratory: A Regional Cyberinfrastructure for Collaborative Data-Driven Research

https://doi.org/10.1109/MCSE.2019.2908850

Parashar, Manish; Simonet, Anthony; Rodero, Ivan; Ghahramani, Forough; Agnew, Grace; Jantz, Ron; Honavar, Vasant (May 2020, Computing in Science & Engineering)

Full Text Available
Adversarial Attacks on Graph Neural Networks via Node Injections: A Hierarchical Reinforcement Learning Approach

https://doi.org/10.1145/3366423.3380149

Sun, Yiwei; Wang, Suhang; Tang, Xianfeng; Hsieh, Tsung-Yu; Honavar, Vasant G. (January 2020, Proceedings of The Web Conference 2020 (WWW '20))

Graph Neural Networks (GNNs) offer a powerful approach to node classification in complex networks across many domains including social media, e-commerce, and FinTech. However, recent studies show that GNNs are vulnerable to attacks aimed at adversely impacting their node classification performance. Existing studies of adversarial attacks on GNNs focus primarily on manipulating the connectivity between existing nodes, a task that requires greater effort on the part of the attacker in real-world applications. In contrast, it is much more expedient on the part of the attacker to inject adversarial nodes, e.g., fake profiles with forged links, into existing graphs so as to reduce the performance of the GNN in classifying existing nodes. Hence, we consider a novel form of node injection poisoning attacks on graph data. We model the key steps of a node injection attack, e.g., establishing links between the injected adversarial nodes and other nodes, choosing the label of an injected node, etc., by a Markov Decision Process. We propose a novel reinforcement learning method for Node Injection Poisoning Attacks (NIPA), to sequentially modify the labels and links of the injected nodes, without changing the connectivity between existing nodes. Specifically, we introduce a hierarchical Q-learning network to manipulate the labels of the adversarial nodes and their links with other nodes in the graph, and design an appropriate reward function to guide the reinforcement learning agent to reduce the node classification performance of the GNN. The results of the experiments show that NIPA is consistently more effective than the baseline node injection attack methods for poisoning graph data on three benchmark datasets.
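The Q-learning idea underlying sequential decision making in a Markov Decision Process can be sketched with the standard tabular update rule. This is a generic textbook form given for orientation only; it is not the hierarchical Q-learning network of NIPA, and the state names, action names, and reward values are hypothetical placeholders.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Sequence[Hashable],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Unseen (state, action) pairs are treated as having value 0.0.
    """
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return new
```

In an attack setting like the one described, the reward would be chosen so that actions lowering the victim model's accuracy score higher; how the paper defines that reward is not stated in this abstract.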
Full Text Available
