When optimizing the write performance of GreptimeDB v0.7, we discovered through flame graphs that parsing Prometheus write requests accounted for about 12% of total CPU time. By comparison, VictoriaMetrics, which is implemented in Go, spends only around 5% of its CPU time on protocol parsing. This prompted us to look into reducing the overhead of the protocol conversion layer.
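As an aside, if you want to reproduce this kind of profile yourself, one way to get a flame graph of a Rust benchmark is the `cargo-flamegraph` tool. The snippet below is a sketch; it assumes the Criterion benchmark target `prom_decode` that is introduced later in this post.

```bash
# Install the flamegraph cargo subcommand (uses perf on Linux, dtrace on macOS).
cargo install flamegraph
# Profile the Criterion benchmark target; the trailing `--bench` is forwarded
# to the benchmark binary so Criterion runs in benchmark mode.
cargo flamegraph --bench prom_decode -- --bench
```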
To simplify the discussion, all the test code is stored in the GitHub repository https://github.com/v0y4g3r/prom-write-request-bench.
```bash
git clone https://github.com/v0y4g3r/prom-write-request-bench
cd prom-write-request-bench
export PROJECT_ROOT=$(pwd)
```
## Optimizing the overhead of the protocol conversion layer

### Step 1: Reproduce the case
First, let's establish the baseline with a minimal reproducible benchmark. The corresponding branch is:
```bash
git checkout step1/reproduce
```
The Rust benchmark code (`benches/prom_decode.rs`):
```rust
use bytes::Bytes;
use criterion::{criterion_group, criterion_main, Criterion};
use prost::Message;

// `WriteRequest` is the crate's generated Prometheus remote-write protobuf type.

fn bench_decode_prom_request(c: &mut Criterion) {
    // Load a captured Prometheus remote-write payload as the benchmark fixture.
    let mut d = std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"));
    d.push("assets");
    d.push("1709380533560664458.data");
    let data = Bytes::from(std::fs::read(d).unwrap());

    // A request intended to be reused across iterations
    // (see the pooled variant sketched below).
    let mut request_pooled = WriteRequest::default();

    c.benchmark_group("decode")
        .bench_function("write_request", |b| {
            b.iter(|| {
                // Decode into a fresh WriteRequest on every iteration.
                let mut request = WriteRequest::default();
                let data = data.clone();
                request.merge(data).unwrap();
            });
        });
}

criterion_group!(benches, bench_decode_prom_request);
criterion_main!(benches);
```
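The otherwise-unused `request_pooled` above belongs to the other side of the comparison: reusing one decoded request across iterations instead of allocating a fresh one each time. Here is a minimal sketch of that pooled variant, assuming `WriteRequest` implements `prost::Message` and therefore has `clear()`; the function name is an assumption for illustration, not code from the repository:

```rust
// Hypothetical pooled variant: one WriteRequest is reused for the whole run,
// so per-iteration allocation cost drops out of the measurement.
fn bench_decode_prom_request_pooled(c: &mut Criterion) {
    let mut d = std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"));
    d.push("assets");
    d.push("1709380533560664458.data");
    let data = Bytes::from(std::fs::read(d).unwrap());

    let mut request_pooled = WriteRequest::default();
    c.benchmark_group("decode")
        .bench_function("pooled_write_request", |b| {
            b.iter(|| {
                let data = data.clone();
                // prost::Message::clear() resets all fields in place, after
                // which the same message can be merged into again.
                request_pooled.clear();
                request_pooled.merge(data).unwrap();
            });
        });
}
```

To run it, add the function to the `criterion_group!` invocation alongside `bench_decode_prom_request`, and both benchmarks will appear under the `decode` group.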
Run the benchmark command multiple times:
```bash
cargo bench -- decode/write_request
```
After a few runs, the baseline stabilizes at:
```text
decode/write_request
                        time:   [7.3174 ms 7.3274 ms 7.3380 ms]
                        change: [+128.55% +129.11% +129.65%] (p = 0.00 < 0.05)
```
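Note that the `change:` line is Criterion comparing this run against the results it saved from the previous one, which is why a fresh run can show a large swing. To compare against a fixed reference point instead, Criterion's named baselines can be used (the baseline name below is arbitrary):

```bash
# Save the current numbers under an explicit baseline name...
cargo bench -- decode/write_request -- --save-baseline step1
# ...and have later runs compare against that saved baseline.
cargo bench -- decode/write_request -- --baseline step1
```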
Then clone the VictoriaMetrics source into the current directory to set up a Go benchmarking environment for comparison:
```bash
git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
cat <<EOF > ./lib/prompb/prom_decode_bench_test.go
package prompb

import (
	"io/ioutil"
	"te