Core Concept
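Mint is a distributed tracing framework that addresses the limitations of traditional '1 or 0' trace sampling, which either keeps a trace in full or drops it entirely. Instead, Mint analyzes commonality and variability: it parses trace data into common patterns and variable parameters and processes the two separately. This lets it capture all requests while reducing trace overhead, giving comprehensive observability at low resource cost.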
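To make the pattern/parameter split concrete, here is a minimal sketch of the general idea, not Mint's actual algorithm. It assumes spans are plain strings and uses a hypothetical rule that any whitespace-delimited token containing a digit is a variable parameter. The names `split_span`, `compress`, and `restore` are illustrative only.

```python
import re

# Hypothetical rule (an assumption, not from the paper):
# any whitespace-delimited token containing a digit is a variable parameter.
_VARIABLE = re.compile(r"\S*\d\S*")
_PLACEHOLDER = "<*>"


def split_span(text):
    """Split one span's text into (common pattern, variable parameters)."""
    params = _VARIABLE.findall(text)
    pattern = _VARIABLE.sub(_PLACEHOLDER, text)
    return pattern, params


def compress(spans):
    """Store each distinct pattern once; keep only a pattern id and params per span."""
    patterns = {}   # pattern text -> pattern id
    records = []    # one (pattern id, params) tuple per input span
    for text in spans:
        pattern, params = split_span(text)
        pid = patterns.setdefault(pattern, len(patterns))
        records.append((pid, params))
    return patterns, records


def restore(patterns, records):
    """Rebuild the original span texts from patterns and parameters."""
    by_id = {pid: pattern for pattern, pid in patterns.items()}
    restored = []
    for pid, params in records:
        values = iter(params)
        # Fill placeholders left to right with the saved parameters.
        restored.append(re.sub(r"<\*>", lambda _: next(values), by_id[pid]))
    return restored


if __name__ == "__main__":
    spans = [
        "GET /cart/1042 took 13ms",
        "GET /cart/77 took 9ms",
        "SELECT * FROM orders WHERE id=5",
    ]
    patterns, records = compress(spans)
    print(patterns)
    print(records)
    print(restore(patterns, records) == spans)
```

In this sketch the two `GET` spans share one stored pattern, so only their parameters are kept per request. No request is dropped, which is the contrast with '1 or 0' sampling. The round trip assumes the original text never contains the literal placeholder `<*>`.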
Statistics
A large-scale e-commerce system in Alibaba generates approximately 18.6-20.5 pebibytes (PBs) of traces per day.
Storing these traces would cost an average of $114.59k per month.
Adopting tracing introduces up to 102 MB/min of additional bandwidth between nodes.
The average miss rate for trace queries at Alibaba over 30 days was 27.17% using a combination of OpenTelemetry's head sampling and tail sampling.
More than 11% of traces exceed 1.2 MB in size.
Inter-trace pairs with commonality account for about 34% to 56% of all inter-trace pairs.
Inter-span pairs with commonality make up around 25% to 45% of all inter-span pairs.
Mint reduces storage overhead to 2.7% and network overhead to 4.2% on average.
Quotes
"Although distributed traces are helpful, they are often voluminous [60], making their collecting, storing, and processing extremely expensive, especially in production environments [29]."
"However, our research revealed significant shortcomings of the prevailing trace sampling techniques utilising the ‘1 or 0’ strategy, as evidenced by an empirical trace study (§ 2.2) conducted on real-world systems."
"To address the above limitations, we shift the strategy of trace overhead reduction from the ‘1 or 0’ paradigm to the ‘commonality + variability’ paradigm which parses trace data into common patterns and variable parameters, and processes them individually."