Inconsistent and incomplete applications of metadata standards and unsatisfactory approaches to connecting repository holdings across the global research infrastructure inhibit data discovery and reusability. The Realities of Academic Data Sharing (RADS) Initiative has found that institutions and researchers create and have access to the most complete metadata, but that valuable metadata found in these local institutional repositories (IRs) are not making their way into global data infrastructure such as DataCite or Crossref. This panel examines the local-to-global spectrum of metadata completeness, including the challenges of obtaining quality metadata at a local level, specifically at Cornell University, and the loss of metadata during the transfer processes from IRs into global data infrastructure. Metadata completeness increases over time as users reuse data and contribute to the metadata. As metadata improves and grows, users find and develop connections within data not previously visible to them. By feeding local IR metadata into the global data infrastructure, the global infrastructure starts giving back in the form of these connections. We believe that this information will be helpful in coordinating metadata better and more effectively across data repositories and creating more robust interoperability and reusability between and among IRs.
DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization
Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users' ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives' and institutional repositories' ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.
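The retrieval step that the abstract describes — pulling connected metadata out of a graph database and handing it to a large language model as context — can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the dataset identifiers, schema fields, and prompt format are all invented, and the graph database is mocked with in-memory dictionaries.

```python
# Hypothetical one-hop metadata graph; in a real system this would be
# a query against a property-graph store, not a dict lookup.
DATASETS = {
    "dataset-001": {
        "title": "Example Survey on Health Behaviors",   # placeholder title
        "variables": ["age", "state", "substance_use"],
        "related": ["dataset-002"],
    },
    "dataset-002": {
        "title": "Example Longitudinal Youth Study",     # placeholder title
        "variables": ["age", "grade", "substance_use"],
        "related": [],
    },
}

def gather_context(dataset_id: str) -> str:
    """Walk one hop of the metadata graph and flatten it into prompt context."""
    node = DATASETS[dataset_id]
    lines = [f"Dataset: {node['title']}",
             f"Variables: {', '.join(node['variables'])}"]
    for rel in node["related"]:
        lines.append(f"Related dataset: {DATASETS[rel]['title']}")
    return "\n".join(lines)

def build_prompt(dataset_id: str, question: str) -> str:
    """Combine graph-derived context with the user's question for the LLM."""
    return f"{gather_context(dataset_id)}\n\nQuestion: {question}"

print(build_prompt("dataset-001", "Which age groups are covered?"))
```

The point of the graph hop is that the prompt carries not just the target dataset's metadata but also its neighbors, so the model can surface related data the user did not ask about by name.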
- Award ID(s): 2121789
- PAR ID: 10478071
- Publisher / Repository: Association for Information Science and Technology
- Date Published:
- Journal Name: Proceedings of the Association for Information Science and Technology
- Volume: 60
- Issue: 1
- ISSN: 2373-9231
- Page Range / eLocation ID: 586 to 591
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Modern science generates large, complicated, heterogeneous collections of data. In order to effectively exploit these data, researchers must find relevant data, and enough of its associated metadata to understand it and put it in context. This problem exists across a wide range of research domains and is ripe for a general solution. Existing ventures address these issues using ad hoc purpose-built tools. These tools explicitly represent the data relationships by embedding them in their data storage mechanisms and in their applications. While producing useful tools, these approaches tend to be difficult to extend, and data relationships are not necessarily traversable symmetrically. We are building a general system for navigational metadata. The relationships between data, and between annotations and data, are stored as first-class objects in the system. They can be viewed as instances drawn from a small set of graph types. General-purpose programs can be written which allow users to explore these graphs and gain insights into their data. This process of data navigation, successive inclusion and filtering of objects, provides a powerful paradigm for data exploration.
- Social media provides unique opportunities for researchers to learn about a variety of phenomena—it is often publicly available, highly accessible, and affords more naturalistic observation. However, as research using social media data has increased, so too has public scrutiny, highlighting the need to develop ethical approaches to social media data use. Prior work in this area has explored users’ perceptions of researchers’ use of social media data in the context of a single platform. In this paper, we expand on that work, exploring how platforms and their affordances impact how users feel about social media data reuse. We present results from three factorial vignette surveys, each focusing on a different platform—dating apps, Instagram, and Reddit—to assess users’ comfort with research data use scenarios across a variety of contexts. Although our results highlight different expectations between platforms depending on the research domain, purpose of research, and content collected, we find that the factor with the greatest impact across all platforms is consent—a finding which presents challenges for big data researchers. We conclude by offering a sociotechnical approach to ethical decision-making. This approach provides recommendations on how researchers can interpret and respond to platform norms and affordances to predict potential data use sensitivities. The approach also recommends that researchers respond to the predominant expectation of notification and consent for research participation by bolstering awareness of data collection on digital platforms.
- Agapito, G. (Ed.) The portable document format (PDF) is currently one of the most popular formats for offline sharing of biomedical information. Recently, HTML-based formats for web-first biomedical information sharing have gained popularity. However, machine-interpretable information is required by literature search engines, such as Google Scholar, to index articles in a context-aware manner for accurate biomedical literature searches. On the other hand, the lack of technological infrastructure to add machine-interpretable metadata to expanding biomedical information renders it unreachable to search engines. Therefore, we developed a portable technical infrastructure (goSemantically) and packaged it as a Google Docs add-on. The “goSemantically” assists authors in adding machine-interpretable metadata at the terminology and document-structure levels while authoring biomedical content. The “goSemantically” leverages the NCBO Bioportal resources and introduces a mechanism to annotate biomedical information with relevant machine-interpretable metadata (semantic vocabularies). The “goSemantically” also acquires schema.org meta tags designed for search engine optimization and tailored to accommodate biomedical information. Thus, individual authors can conveniently author and publish biomedical content in a truly decentralized fashion. Users can also export and host content with relevant machine-interpretable metadata (semantic vocabularies) in interoperable formats such as HTML and JSON-LD. To experience the described features, run this code with Google Docs.
- In the aftermath of earthquake events, reconnaissance teams are deployed to gather vast numbers of images, moving quickly to capture perishable data documenting the performance of infrastructure before it is destroyed. Learning from such data enables engineers to gain new knowledge about the real-world performance of structures. This new knowledge, extracted from such visual data, is critical to mitigate the risks (e.g., damage and loss of life) associated with our built environment in future events. Currently, this learning process is entirely manual, requiring considerable time and expense. Thus, unfortunately, only a tiny portion of these images are shared, curated, and actually utilized. The power of computers and artificial intelligence enables a new approach to organize and catalog such visual data with minimal manual effort. Here we discuss the development and deployment of an organizational system to automate the analysis of large volumes of post-disaster visual data (images). Our application, named the Automated Reconnaissance Image Organizer (ARIO), allows a field engineer to rapidly and automatically categorize their reconnaissance images. ARIO exploits deep convolutional neural networks and trained classifiers, and yields a structured report combined with useful metadata. Classifiers are trained using our ground-truth visual database, which includes over 140,000 images from past earthquake reconnaissance missions to study post-disaster buildings in the field. Here we discuss the novel deployment of the ARIO application within a cloud-based system that we named VISER (Visual Structural Expertise Replicator), a comprehensive cloud-based visual data analytics system with a novel Netflix-inspired technical search capability. Field engineers can exploit this research and our application to search an image repository for visual content. We anticipate that these tools will empower engineers to more rapidly learn new lessons from earthquakes using reconnaissance data.
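The navigational-metadata abstract above describes storing relationships between data, and between annotations and data, as first-class objects so that graphs can be traversed symmetrically. A toy illustration of that design (not the authors' system; class and relation names are invented) indexes each relation by both endpoints, so incoming and outgoing edges are equally cheap to follow:

```python
# Relations as first-class objects, indexed from both ends so traversal
# is symmetric. Relation kinds here are invented examples.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    kind: str      # e.g. "annotates", "derived_from"
    source: str
    target: str

class MetadataGraph:
    def __init__(self):
        self._by_source = defaultdict(list)
        self._by_target = defaultdict(list)

    def add(self, rel: Relation) -> None:
        self._by_source[rel.source].append(rel)
        self._by_target[rel.target].append(rel)

    def outgoing(self, node: str):
        """Relations whose source is `node`."""
        return self._by_source[node]

    def incoming(self, node: str):
        """Relations whose target is `node` — the symmetric direction."""
        return self._by_target[node]

g = MetadataGraph()
g.add(Relation("annotates", "note-1", "dataset-A"))
g.add(Relation("derived_from", "dataset-B", "dataset-A"))
# Everything pointing at dataset-A...
print([r.source for r in g.incoming("dataset-A")])   # ['note-1', 'dataset-B']
# ...and the same relation followed forward from the annotation.
print([r.target for r in g.outgoing("note-1")])      # ['dataset-A']
```

Because the relation is a standalone object rather than a field embedded in one record, neither direction of traversal is privileged — the asymmetry the abstract criticizes in purpose-built tools.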
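The goSemantically abstract above mentions exporting content with schema.org metadata in formats such as JSON-LD. A minimal sketch of what such an export might look like follows; the field values are placeholders and the helper function is hypothetical, not part of the goSemantically tool:

```python
# Emitting schema.org metadata as JSON-LD for a scholarly article.
# Titles, authors, and keywords below are invented placeholders.
import json

def article_jsonld(title: str, authors: list[str], keywords: list[str]) -> dict:
    """Build a schema.org ScholarlyArticle description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": title,
        "author": [{"@type": "Person", "name": a} for a in authors],
        "keywords": keywords,
    }

doc = article_jsonld(
    "Example biomedical article",
    ["A. Author"],
    ["genomics", "annotation"],
)
# In HTML this would be embedded as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(doc, indent=2))
```

Search engines read such blocks to index the article in a context-aware manner, which is the gap the abstract says unannotated biomedical content falls into.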
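The ARIO abstract above describes categorizing reconnaissance images with trained classifiers and producing a structured report. The triage loop can be sketched as below; the classifier is mocked with filename keywords (the real system runs deep convolutional networks), and the category names are assumptions, not ARIO's actual label set:

```python
# Illustrative ARIO-style image triage. The CNN is replaced by a
# keyword stub; categories and keywords are invented for the sketch.
from collections import Counter

KEYWORDS = {
    "building": "building_overview",
    "damage": "structural_damage",
    "gauge": "measurement",
}
CATEGORIES = list(KEYWORDS.values()) + ["irrelevant"]

def classify(image_name: str) -> str:
    """Stand-in for a CNN forward pass; keys off the filename here."""
    for kw, cat in KEYWORDS.items():
        if kw in image_name:
            return cat
    return "irrelevant"

def build_report(image_names):
    """Group images by predicted category into a structured report."""
    report = {cat: [] for cat in CATEGORIES}
    for name in image_names:
        report[classify(name)].append(name)
    counts = Counter({cat: len(v) for cat, v in report.items()})
    return report, counts

report, counts = build_report(
    ["building_001.jpg", "damage_wall.jpg", "selfie.jpg"]
)
print(counts)
```

The value of the structured report is that a field engineer gets a per-category inventory immediately after upload, instead of sorting thousands of images by hand.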
 An official website of the United States government