Title: Demonstration of collaborative and interactive workflow-based data analytics in Texera
Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators with various backgrounds. Based on our active joint projects on the system, we use a scenario of social media analysis to show how a data science task can be conducted on a user-friendly yet powerful platform by a multi-disciplinary team including domain scientists with limited coding skills and experienced machine learning experts. We will present how to do collaborative editing of a workflow and collaborative execution of the workflow in Texera. We will focus on data-centric features such as synchronization of operator schemas among the users during the construction phase, and monitoring and controlling the shared runtime during the execution phase.
Award ID(s):
2107150
PAR ID:
10442816
Journal Name:
Proceedings of the VLDB Endowment
Volume:
15
Issue:
12
ISSN:
2150-8097
Page Range / eLocation ID:
3738 to 3741
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Domain experts play an important role in data science, as their knowledge can unlock valuable insights from data. As they often lack technical skills required to analyze data, they need collaborations with technical experts. In these joint efforts, productive collaborations are critical not only in the phase of constructing a data science task, but more importantly, during the execution of a task. This need stems from the inherent complexity of data science, which often involves user-defined functions or machine-learning operations. Consequently, collaborators want various interactions during runtime, such as pausing/resuming the execution, inspecting an operator's state, and modifying an operator's logic. To achieve the goal, in the past few years we have been developing an open-source system called Texera to support collaborative data analytics using GUI-based workflows as cloud services. In this paper, we present a holistic view of several important design principles we followed in the design and implementation of the system. We focus on different methods of sending messages to running workers, how these methods are adopted to support various runtime interactions from users, and their trade-offs on both performance and consistency. These principles enable Texera to provide powerful user interactions during a workflow execution to facilitate efficient collaborations in data analytics. 
  2. This manuscript shares the lessons learned from providing scientific computing support to over 600 researchers and discipline experts, helping them develop reproducible and scalable analytical workflows to process large amounts of heterogeneous data. When providing scientific computing support, focus is first placed on how to foster the collaborative aspects of multidisciplinary projects on the technological side by providing virtual spaces to communicate and share documents. Then insights on data management planning and how to implement a centralized data management workflow for data‐driven projects are provided. Developing reproducible workflows requires the development of code. We describe tools and practices that have been successful in fostering collaborative coding and scaling on remote servers, enabling teams to iterate more efficiently. We have found short training sessions combined with on‐demand specialized support to be the most impactful combination in helping scientists develop their technical skills. Here we share our experiences in enabling researchers to do science more collaboratively and more reproducibly beyond any specific project, with long‐lasting effects on the way researchers conduct science. We hope that other groups supporting team‐ and data‐driven science (in environmental science and beyond) will benefit from the lessons we have learned over the years through trial and error.
  3. This poster displays results from a project supported by an NSF grant to enhance interdisciplinary collaboration in civil and environmental engineering education. In its second year, part of the project focused on improving team science competencies within the core research group. Key activities included workshops on collaborative writing and grant writing best practices. The team attended a Science of Team Science (SciTS) workshop to refine collaboration skills and responded to the Teaming Readiness Survey, which revealed strengths in valuing expertise but identified areas for improvement, such as role clarity and effective communication. In addition, the team responded to a Social Network Analysis Survey that showcased a growing network of research ties, indicating a robust collaborative environment, particularly among Principal Investigators. The preliminary results highlight a development in the team’s effectiveness and psychological safety ratings, fostering trust and collaboration. The social network evolved from professional to social connections, with new members gradually integrating into the team. The research team concludes that focusing on collaborative skills and effective communication strengthens interdisciplinary collaboration in the changing scientific landscape. 
  5. In this paper, we describe how we extended the Pegasus Workflow Management System to support edge-to-cloud workflows in an automated fashion. We discuss how Pegasus and HTCondor (its job scheduler) work together to enable this automation. We use HTCondor to form heterogeneous pools of compute resources and Pegasus to plan the workflow onto these resources and manage containers and data movement for executing workflows in hybrid edge-cloud environments. We then show how Pegasus can be used to evaluate the execution of workflows running on edge-only, cloud-only, and edge-cloud hybrid environments. Using the Chameleon Cloud testbed to set up and configure an edge-cloud environment, we use Pegasus to benchmark the executions of one synthetic workflow and two production workflows: CASA-Wind and the Ocean Observatories Initiative Orcasound workflow, all of which derive their data from edge devices. We present the performance impact on workflow runs of job and data placement strategies employed by Pegasus when configured to run in the above three execution environments. Results show that the synthetic workflow performs best in an edge-only environment, while the CASA-Wind and Orcasound workflows see significant improvements in overall makespan when run in a cloud-only environment. The results demonstrate that Pegasus can be used to automate edge-to-cloud science workflows and that the workflow provenance data collection capabilities of the Pegasus monitoring daemon enable computer scientists to conduct edge-to-cloud research.