Abstract We describe a recommendation system for HydroShare, a platform for scientific water data sharing. We discuss similarities, differences and challenges for implementing recommendation systems for scientific water data sharing. We discuss and analyze the behaviors that scientists exhibit in using HydroShare as documented by users’ activity logs. Unlike entertainment system users, users on HydroShare tend to be task-oriented, where the set of tasks of interest can change over time, and older interests are sometimes no longer relevant. By validating recommendation approaches against user behavior as expressed in activity logs, we conclude that a combination of content-based filtering and a latent Dirichlet allocation (LDA) topic modeling of user behavior—rather than and instead of LDA classification of dataset topics—provides a workable solution for HydroShare and compares this approach to existing recommendation methods.
more »
« less
Tessera: Discretizing Data Analysis Workflows on a Task Level
Researchers have investigated a number of strategies for capturing and analyzing data analyst event logs in order to design better tools, identify failure points, and guide users. However, this remains challenging because individual- and session-level behavioral differences lead to an explosion of complexity and there are few guarantees that log observations map to user cognition. In this paper we introduce a technique for segmenting sequential analyst event logs which combines data, interaction, and user features in order to create discrete blocks of goal-directed activity. Using measures of inter-dependency and comparisons between analysis states, these blocks identify patterns in interaction logs coupled with the current view that users are examining. Through an analysis of publicly available data and data from a lab study across a variety of analysis tasks, we validate that our segmentation approach aligns with users’ changing goals and tasks. Finally, we identify several downstream applications for our approach.
more »
« less
- Award ID(s):
- 1850195
- PAR ID:
- 10336366
- Date Published:
- Journal Name:
- CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
- Page Range / eLocation ID:
- 1 to 15
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Interactive web-based applications play an important role for both service providers and consumers. However, web applications tend to be complex, produce high-volume data, and are often ripe for attack. Attack analysis and remediation are complicated by adversary obfuscation and the difficulty in assembling and analyzing logs. In this work, we explore the web application analysis task through log file fusion, distillation, and visualization. Our approach consists of visualizing the logs of web and database traffic with detailed function execution traces. We establish causal links between events and their associated behaviors. We evaluate the effectiveness of this process using data volume reduction statistics, user interaction models, and usage scenarios. Across a set of scenarios, we find that our techniques can filter at least 97.5% of log data and reduce analysis time by 93-96%.more » « less
-
null (Ed.)Troubleshooting a distributed system can be incredibly difficult. It is rarely feasible to expect a user to know the fine-grained interactions between their system and the environment configuration of each machine used in the system. Because of this, work can grind to a halt when a seemingly trivial detail changes. To address this, there is a plethora of state-of-the-art log analysis tools, debuggers, and visualization suites. However, a user may be executing in an open distributed system where the placement of their components are not known before runtime. This makes the process of tracking debug logs almost as difficult as troubleshooting the failures these logs have recorded because the location of those logs is usually not transparent to the user (and by association the troubleshooting tools they are using). We present TLQ, a framework designed from first principles for log discovery to enable troubleshooting of open distributed systems. TLQ consists of a querying client and a set of servers which track relevant debug logs spread across an open distributed system. Through a series of examples, we demonstrate how TLQ enables users to discover the locations of their system’s debug logs and in turn use well-defined troubleshooting tools upon those logs in a distributed fashion. Both of these tasks were previously impractical to ask of an open distributed system without significant a priori knowledge. We also concretely verify TLQ’s effectiveness by way of a production system: a biodiversity scientific workflow. We note the potential storage and performance overheads of TLQ compared to a centralized, closed system approach.more » « less
-
Two-Factor Authentication (2FA) hardens an organization against user account compromise, but adds an extra step to organizations’ mission-critical tasks. We investigate to what extent quantitative analysis of operational logs of 2FA systems both supports and challenges recent results from user studies and surveys identifying usability challenges in 2FA systems. Using tens of millions of logs and records kept at two public universities, we quantify the at-scale impact on organizations and their employees during a mandatory 2FA implementation. We show the multiplicative effects of device remembrance, fragmented login services, and authentication timeouts on user burden. We find that user burden does not deviate far from other compliance and risk management time requirements already common to large organizations. We investigate the cause of more than one in twenty 2FA ceremonies being aborted or failing, and the variance in user experience across users. We hope our analysis will empower more organizations to protect themselves with 2FA.more » « less
-
We examine the association between user interactions with a checklist and task performance in a time-critical medical setting. By comparing 98 logs from a digital checklist for trauma resuscitation with activity logs generated by video review, we identified three non-compliant checklist use behaviors: failure to check items for completed tasks, falsely checking items when tasks were not performed, and inaccurately checking items for incomplete tasks. Using video review, we found that user perceptions of task completion were often misaligned with clinical practices that guided activity coding, thereby contributing to non-compliant check-offs. Our analysis of associations between different contexts and the timing of check-offs showed longer delays when (1) checklist users were absent during patient arrival, (2) patients had penetrating injuries, and (3) resuscitations were assigned to the highest acuity. We discuss opportunities for reconsidering checklist designs to reduce non-compliant checklist use.more » « less
An official website of the United States government

