skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Evolution of Knowledge in Social Media and Their Relationship to an Evolving Real World
The physical world evolves. The cyber world evolves and grows with big data, with social media as a major component of information growth. Classic ML models are limited by their static training data with implicit Complete and Timeless Knowledge assumptions. In an evolving world, static training data suffer from knowledge obsolescence due to truly novel timely information. Knowledge obsolescence introduces a widening distance between static ML models and the evolving world, called cyber-physical gap. Periodic retraining of new models may restore their accuracy temporarily, but subsequently their performance will deteriorate with widening cyber-physical gap. Knowledge obsolescence affects statically trained models of any size, including LLMs. Two major research challenges arise from cyber-physical gap: (1) collection and incorporation of space-time aware ground truth training data, and (2) understanding and capturing of the varying speed of information and knowledge evolution when the physical and cyber worlds evolve.  more » « less
Award ID(s):
2039653
PAR ID:
10489207
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
Proceedings of 2023 IEEE International Conference on Cognitive Machine Intelligence (CogMI)
Subject(s) / Keyword(s):
machine learning, dynamic model performance, evolution of knowledge, evolving social media
Format(s):
Medium: X
Location:
Atlanta, GA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Cyber Physical Systems (CPS) are characterized by their ability to integrate the physical and information or cyber worlds. Their deployment in critical infrastructure have demonstrated a potential to transform the world. However, harnessing this potential is limited by their critical nature and the far reaching effects of cyber attacks on human, infrastructure and the environment. An attraction for cyber concerns in CPS rises from the process of sending information from sensors to actuators over the wireless communication medium, thereby widening the attack surface. Traditionally, CPS security has been investigated from the perspective of preventing intruders from gaining access to the system using cryptography and other access control techniques. Most research work have therefore focused on the detection of attacks in CPS. However, in a world of increasing adversaries, it is becoming more difficult to totally prevent CPS from adversarial attacks, hence the need to focus on making CPS resilient. Resilient CPS are designed to withstand disruptions and remain functional despite the operation of adversaries. One of the dominant methodologies explored for building resilient CPS is dependent on machine learning (ML) algorithms. However, rising from recent research in adversarial ML, we posit that ML algorithms for securing CPS must themselves be resilient. This article is therefore aimed at comprehensively surveying the interactions between resilient CPS using ML and resilient ML when applied in CPS. The paper concludes with a number of research trends and promising future research directions. Furthermore, with this article, readers can have a thorough understanding of recent advances on ML-based security and securing ML for CPS and countermeasures, as well as research trends in this active research area. 
    more » « less
  2. null (Ed.)
    A rapidly evolving situation such as the COVID-19 pandemic is a significant challenge for AI/ML models because of its unpredictability. The most reliable indicator of the pandemic spreading has been the number of test positive cases. However, the tests are both incomplete (due to untested asymptomatic cases) and late (due the lag from the initial contact event, worsening symptoms, and test results). Social media can complement physical test data due to faster and higher coverage, but they present a different challenge: significant amounts of noise, misinformation and disinformation. We believe that social media can become good indicators of pandemic, provided two conditions are met. The first (True Novelty) is the capture of new, previously unknown, information from unpredictably evolving situations. The second (Fact vs. Fiction) is the distinction of verifiable facts from misinformation and disinformation. Social media information that satisfy those two conditions are called live knowledge. We apply evidence-based knowledge acquisition (EBKA) approach to collect, filter, and update live knowledge through the integration of social media sources with authoritative sources. Although limited in quantity, the reliable training data from authoritative sources enable the filtering of misinformation as well as capturing truly new information. We describe the EDNA/LITMUS tools that implement EBKA, integrating social media such as Twitter and Facebook with authoritative sources such as WHO and CDC, creating and updating live knowledge on the COVID-19 pandemic. 
    more » « less
  3. Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text. 
    more » « less
  4. Martin, A; Hinkelmann, K; Fill, H.-G.; Gerber, A.; Lenat, D.; Stolle, R.; van Harmelen, F. (Ed.)
    AI models for cybersecurity have to detect and defend against constantly evolving cyber threats. Much effort is spent building defenses for zero days and unseen variants of known cyber-attacks. Current AI models for cybersecurity struggle with these yet unseen threats due to the constantly evolving nature of threat vectors, vulnerabilities, and exploits. This paper shows that cybersecurity AI models will be improved and more general if we include semi-structured representations of background knowledge. This could include information about the software and systems, as well as information obtained from observing the behavior of malware samples captured and detonated in honeypots. We describe how we can transfer this knowledge into forms that the RL models can directly use for decision-making purposes. 
    more » « less
  5. Manufacturing has adopted technologies such as automation, robotics, industrial Internet of Things (IoT), and big data analytics to improve productivity, efficiency, and capabilities in the production environment. Modern manufacturing workers not only need to be adept at the traditional manufacturing technologies but also ought to be trained in the advanced data-rich computer-automated technologies. This study analyzes the data science and analytics (DSA) skills gap in today’s manufacturing workforce to identify the critical technical skills and domain knowledge required for data science and intelligent manufacturing-related jobs that are highly in-demand in today’s manufacturing industry. The gap analysis conducted in this paper on Emsi job posting and profile data provides insights into the trends in manufacturing jobs that leverage data science, automation, cyber, and sensor technologies. These insights will be helpful for educators and industry to train the next generation manufacturing workforce. The main contribution of this paper includes (1) presenting the overall trend in manufacturing job postings in the U.S., (2) summarizing the critical skills and domain knowledge in demand in the manufacturing sector, (3) summarizing skills and domain knowledge reported by manufacturing job seekers, (4) identifying the gaps between demand and supply of skills and domain knowledge, and (5) recognize opportunities for training and upskilling workforce to address the widening skills and knowledge gap. 
    more » « less