Core Concepts
An LLM-based two-stage strategy is proposed to accurately localize the root-cause configuration properties for end-users based on log analysis, overcoming the challenges posed by the vast and complex configuration space.
Abstract
The paper presents an LLM-based two-stage strategy to localize configuration errors in configurable software systems via log analysis. The key highlights are:
- Preliminary Study:
  - Conducted a study on 100 configuration error reports for Hadoop, identifying two types of log symptoms: direct (explicit information about the root-cause configuration property) and indirect (lack of explicit information but pointing to other system states).
  - Recognized the challenges and opportunities in utilizing logs to localize configuration errors.
- Two-Stage Strategy:
  - Anomaly Identification Stage:
    - Parses logs into templates, extracts specific templates, calculates anomaly degree, and identifies key log messages related to configuration errors.
  - Anomaly Inference Stage:
    - Direct Inference Phase: Attempts to directly localize configuration error triggers based on rules.
    - Verification Phase: Verifies the results from the Direct Inference Phase using an LLM.
    - Indirect Inference Phase: Leverages an LLM to infer the suspected configuration error triggers from the key log messages.
- Evaluation:
  - Implemented a tool, LogConfigLocalizer, based on the proposed strategy.
  - Established a log benchmark with configuration errors by running Hadoop workloads with mutated configurations.
  - Achieved an average accuracy of 99.91% on the benchmark, outperforming a baseline tool.
  - Demonstrated the effectiveness of the Verification Phase and the two parts of LLM interactions through comparative experiments.
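The Anomaly Identification Stage described in the list above can be illustrated with a minimal Python sketch. This summary does not give the paper's template parser or its anomaly-degree formula, so everything here is an assumption for illustration: the regex masking stands in for a real log parser, the score (the share of a template's occurrences that come from the faulty run) stands in for the anomaly degree, and the function names, the threshold, and the sample log lines are hypothetical rather than real Hadoop output.

```python
import re
from collections import Counter

# Assumed masking rules: replace variable parts of a message with placeholders.
# Paths are masked before numbers so digits inside paths are not split up.
_MASKS = [
    (re.compile(r"(?:/[\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\b\d+(?:\.\d+)*\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Reduce a raw log message to a template by masking variable fields."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def anomaly_degree(template: str, faulty: Counter, normal: Counter) -> float:
    """Assumed score: fraction of this template's occurrences seen in the faulty run.

    1.0 means the template never appears in the normal-run log.
    """
    f, n = faulty[template], normal[template]
    return f / (f + n)


def key_messages(faulty_log, normal_log, threshold=0.9):
    """Return one faulty-run message per template whose score reaches the threshold."""
    normal_counts = Counter(to_template(m) for m in normal_log)
    faulty_templates = [to_template(m) for m in faulty_log]
    faulty_counts = Counter(faulty_templates)

    seen, result = set(), []
    for message, template in zip(faulty_log, faulty_templates):
        if template in seen:
            continue  # keep only the first message for each template
        seen.add(template)
        if anomaly_degree(template, faulty_counts, normal_counts) >= threshold:
            result.append(message)
    return result


if __name__ == "__main__":
    # Toy log lines, made up for this sketch.
    normal_run = ["Started worker 12 on port 8042", "Report took 15 ms"]
    faulty_run = [
        "Started worker 7 on port 8042",
        "Invalid value 0 for example.buffer.size",
        "Report took 20 ms",
    ]
    print(key_messages(faulty_run, normal_run))
```

Comparing against a log from a normal run is one simple way to make templates that are routine during healthy operation score low; the tool itself may rank templates differently.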
The proposed methodology gives end-users an efficient and accurate way to localize configuration errors in configurable software systems by combining LLMs with log analysis.
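The control flow of the three phases in the Anomaly Inference Stage can be sketched as follows. This is a hedged illustration, not the tool's implementation: the matching rule (a known property name appearing verbatim in a key message), the prompt wording, the fallback to indirect inference when verification fails, and the `ask_llm` callable are all assumptions, and the property names are hypothetical. The LLM is passed in as a plain function so the sketch runs without any particular model or API.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical LLM interface: takes a prompt, returns the model's text answer.
LLM = Callable[[str], str]


def direct_inference(key_messages: Sequence[str],
                     known_properties: Sequence[str]) -> Optional[str]:
    """Assumed rule: a property is a candidate if its name appears verbatim."""
    for message in key_messages:
        for prop in known_properties:
            if prop in message:
                return prop
    return None


def verify(candidate: str, key_messages: Sequence[str], ask_llm: LLM) -> bool:
    """Ask the LLM to confirm or reject the rule-based candidate."""
    prompt = (
        "Answer yes or no: do these log messages indicate that the "
        f"configuration property '{candidate}' is misconfigured?\n"
        + "\n".join(key_messages)
    )
    return ask_llm(prompt).strip().lower().startswith("yes")


def indirect_inference(key_messages: Sequence[str],
                       known_properties: Sequence[str],
                       ask_llm: LLM) -> List[str]:
    """Ask the LLM for suspects and keep only names that are known properties."""
    prompt = (
        "Which of the following configuration properties most likely caused "
        "these log messages?\nProperties: " + ", ".join(known_properties)
        + "\nLogs:\n" + "\n".join(key_messages)
    )
    answer = ask_llm(prompt)
    return [prop for prop in known_properties if prop in answer]


def localize(key_messages: Sequence[str],
             known_properties: Sequence[str],
             ask_llm: LLM) -> List[str]:
    """Direct inference, then verification, then indirect inference as fallback."""
    candidate = direct_inference(key_messages, known_properties)
    if candidate is not None and verify(candidate, key_messages, ask_llm):
        return [candidate]
    return indirect_inference(key_messages, known_properties, ask_llm)
```

Filtering the LLM's answer against the list of known properties is a design choice in this sketch: it keeps free-form model output from introducing property names that do not exist in the system.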
Stats
Numeric configuration properties account for 37% of the misconfigured properties in Hadoop when grouped by data type.
Direct symptoms appear in the logs in 20% of the cases, while indirect symptoms account for the remaining 80%.
Quotes
"Configuration errors can significantly disrupt user experiences; for example, Sweden faced domain paralysis (.se) due to DNS configuration errors [36], causing widespread inconvenience."
"Configuration errors are common software system anomalies, which are troublesome and particularly difficult to diagnose, even for experienced maintenance engineers, leading to significant side effects for companies, maintainers, and end-users [44, 46, 54]."