

Title: Beyond Artificial Reality: Finding and Monitoring Live Events from Social Sensors
With billions of active social media accounts and millions of live video cameras, live new big data offer many opportunities for smart applications. However, the main consumers of the new big data have been humans. We envision research on live knowledge to automatically acquire real-time, validated, and actionable information. Live knowledge presents two significant and diverging technical challenges: big noise and concept drift. We describe the EBKA (evidence-based knowledge acquisition) approach, illustrated by the LITMUS landslide information system. LITMUS achieves both high accuracy and wide coverage, demonstrating the feasibility and promise of the EBKA approach to achieve live knowledge.
Award ID(s):
2026945
PAR ID:
10295723
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
ACM transactions on Internet technology
Volume:
20
Issue:
1
ISSN:
1533-5399
Page Range / eLocation ID:
1-21
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A rapidly evolving situation such as the COVID-19 pandemic is a significant challenge for AI/ML models because of its unpredictability. The most reliable indicator of the pandemic's spread has been the number of positive test cases. However, the tests are both incomplete (due to untested asymptomatic cases) and late (due to the lag from the initial contact event, worsening symptoms, and test results). Social media can complement physical test data due to faster and higher coverage, but they present a different challenge: significant amounts of noise, misinformation, and disinformation. We believe that social media can become good indicators of the pandemic, provided two conditions are met. The first (True Novelty) is the capture of new, previously unknown information from unpredictably evolving situations. The second (Fact vs. Fiction) is the distinction of verifiable facts from misinformation and disinformation. Social media information that satisfies those two conditions is called live knowledge. We apply the evidence-based knowledge acquisition (EBKA) approach to collect, filter, and update live knowledge through the integration of social media sources with authoritative sources. Although limited in quantity, the reliable training data from authoritative sources enable the filtering of misinformation as well as the capture of truly new information. We describe the EDNA/LITMUS tools that implement EBKA, integrating social media such as Twitter and Facebook with authoritative sources such as WHO and CDC, creating and updating live knowledge on the COVID-19 pandemic.
  2. In this position paper, we describe research on knowledge graph-empowered materials science prediction and discovery. The research consists of several key components, including ontology mapping, materials data annotation, and information extraction from unstructured scholarly articles. We argue that although big data generated by simulations and experiments have motivated and accelerated data-driven science, the distribution and heterogeneity of materials science-related big data hinder major advancements in the field. Knowledge graphs, as semantic hubs, integrate disparate data and provide a feasible solution to this challenge. We design a knowledge graph-based approach for data discovery, extraction, and integration in materials science.
  3. "Knowledge is power" is an old adage that has been found to be true in today's information age. Knowledge is derived from having access to information. The ability to gather information from large volumes of data has become an issue of relative importance. Big Data Analytics (BDA) is the term coined by researchers to describe the art of processing, storing, and gathering large amounts of data for future examination. Data is being produced at an alarming rate. The rapid growth of the Internet, the Internet of Things (IoT), and other technological advances are the main culprits behind this sustained growth. The data generated is a reflection of the environment that produces it, so we can use the data we get out of a system to figure out the inner workings of that system. This has become an important feature in cybersecurity, where the goal is to protect assets. Furthermore, the growing value of data has made big data a high-value target. In this paper, we explore recent research works in cybersecurity in relation to big data. We highlight how big data is protected and how big data can also be used as a tool for cybersecurity. We summarize recent works in the form of tables and present trends, open research challenges, and problems. With this paper, readers can gain a more thorough understanding of cybersecurity in the big data era, as well as research trends and open challenges in this active research area.
  4. The physical world evolves. The cyber world evolves and grows with big data, with social media as a major component of information growth. Classic ML models are limited by their static training data, with implicit Complete and Timeless Knowledge assumptions. In an evolving world, static training data suffer from knowledge obsolescence due to truly novel, timely information. Knowledge obsolescence introduces a widening distance between static ML models and the evolving world, called the cyber-physical gap. Periodic retraining of new models may restore their accuracy temporarily, but their performance will subsequently deteriorate as the cyber-physical gap widens. Knowledge obsolescence affects statically trained models of any size, including LLMs. Two major research challenges arise from the cyber-physical gap: (1) collection and incorporation of space-time aware ground truth training data, and (2) understanding and capturing the varying speed of information and knowledge evolution as the physical and cyber worlds evolve.
  5. We live in a data-rich world. As a result, science today is fundamentally different, thanks in no small part to advances in computing. The ability to collect and analyze vast amounts of data has gotten easier. Computational Thinking (CT) practices provide the cognitive tools and skills necessary to use large data sets to investigate and solve complex scientific problems. In this big data world, virtually every scientific field is inextricably linked to computational thinking, and doing science requires scientists to merge their domain knowledge with a CT mindset. As teachers, can we identify problems where CT skills should be used? In addition, do we encourage students to leverage CT practices to solve problems? 