skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Failure Prediction by Utilizing Log Analysis: A Systematic Mapping Study
In modern computing, log files provide a wealth of information regarding the past of a system, including the system failures and security breaches that cost companies and developers a fortune in both time and money. While this information can be used to attempt to recover from a problem, such an approach merely mitigates the damage that has already been done. Detecting problems, however, is not the only information that can be gathered from log files. It is common knowledge that segments of log files, if analyzed correctly, can yield a good idea of what the system is likely going to do next in real-time, allowing a system to take corrective action before any negative actions occur. In this paper, the authors put forth a systematic map of this field of log prediction, screening several hundred papers and finally narrowing down the field to approximately 30 relevant papers. These papers, when broken down, give a good idea of the state of the art, methodologies employed, and future challenges that still must be overcome. Findings and conclusions of this study can be applied to a variety of software systems and components, including classical software systems, as well as software parts of control, or the Internet of Things (IoT) systems.  more » « less
Award ID(s):
1854049
PAR ID:
10203754
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
RACS '20: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Page Range / eLocation ID:
188 to 195
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Log analysis is a technique of deriving knowledge from log files containing records of events in a computer system. A common application of log analysis is to derive critical information about a system's security issues and intrusions, which subsequently leads to being able to identify and potentially stop intruders attacking the system. However, many systems produce a high volume of log data with high frequency, posing serious challenges in analysis. This paper contributes with a systematic literature review and discusses current trends, advancements, and future directions in log security analysis within the past decade. We summarized current research strategies with respect to technology approaches from 34 current publications. We identified limitations that poses challenges to future research and opened discussion on issues towards logging mechanism in the software systems. Findings of this study are relevant for software systems as well as software parts of the Internet of Things (IoT) systems. 
    more » « less
  2. Adversarial attacks against supervised learning a algorithms, which necessitates the application of logging while using supervised learning algorithms in software projects. Logging enables practitioners to conduct postmortem analysis, which can be helpful to diagnose any conducted attacks. We conduct an empirical study to identify and characterize log-related coding patterns, i.e., recurring coding patterns that can be leveraged to conduct adversarial attacks and needs to be logged. A list of log-related coding patterns can guide practitioners on what to log while using supervised learning algorithms in software projects. We apply qualitative analysis on 3,004 Python files used to implement 103 supervised learning-based software projects. We identify a list of 54 log-related coding patterns that map to six attacks related to supervised learning algorithms. Using Lo g Assistant to conduct P ostmortems for Su pervised L earning ( LOPSUL ) , we quantify the frequency of the identified log-related coding patterns with 278 open-source software projects that use supervised learning. We observe log-related coding patterns to appear for 22% of the analyzed files, where training data forensics is the most frequently occurring category. 
    more » « less
  3. Large language models (LLMs) have demonstrated abilities to perform complex tasks in multiple domains, including mathematical and scientific reasoning. We demonstrate that with carefully designed prompts, LLMs can accurately carry out key calculations in research papers in theoretical physics. We focus on a broadly-used approximation method in quantum physics: the Hartree-Fock method, requiring an analytic multi-step calculation deriving approximate Hamiltonian and corresponding self-consistency equations. To carry out the calculations using LLMs, we design multi-step prompt templates that break down the analytic calculation into standardized steps with placeholders for problem-specific information. We evaluate GPT-4’s performance in executing the calculation for 15 papers from the past decade, demonstrating that, with the correction of intermediate steps, it can correctly derive the final Hartree-Fock Hamiltonian in 13 cases. Aggregating across all research papers, we find an average score of 87.5 (out of 100) on the execution of individual calculation steps. We further use LLMs to mitigate the two primary bottlenecks in this evaluation process: (i) extracting information from papers to fill in templates and (ii) automatic scoring of the calculation steps, demonstrating good results in both cases. 
    more » « less
  4. {"Abstract":["This data set provides files needed to run the simulations described in the manuscript entitled "Organic contaminants and atmospheric nitrogen at the graphene\u2013water interface: A simulation study" using the molecular dynamics software NAMD and LAMMPS. The output of the simulations, as well as scripts used to analyze this output, are also included. The files are organized into directories corresponding to the figures of the main text and supplementary information. They include molecular model structure files (NAMD psf), force field parameter files (in CHARMM format), initial atomic coordinates (pdb format), NAMD or LAMMPS configuration files, Colvars configuration files, NAMD log files, and NAMD output including restart files (in binary NAMD format) and some trajectories in dcd format (downsampled). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts. A modified LAMMPS C++ source file is also included.<\/p>"],"Other":["This material is based upon work supported by the National Science Foundation under Grant No. DMR-1945589. A majority of the computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CHE-1726332, CNS-1006860, EPS-1006860, and EPS-0919443. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) and Stampede2 at the Texas Advanced Computing Center through allocation BIO200030, which is supported by National Science Foundation grant number ACI-1548562."]} 
    more » « less
  5. null; null; null (Ed.)
    Distributed systems are seeing wider use as software becomes more complex and cloud systems increase in popularity. Preforming anomaly detection and other log analysis procedures on distributed systems have not seen much research. To this end, we propose a simple and generic method of clustering log statements from separate log files to perform future log analysis. We identify variable components of log statements and find matches of these variables between the sources. After scoring the variables, we select the one with the highest score to be the clustering basis. We performed a case study of our method on the two open-source projects, to which we found success in the results of our method and created an open-source project log-matcher. 
    more » « less