rel:: [[protobuf]] [[golang]] [[Profiling]]

> [!note]
> [[markdown]] deck for presentation w/ speaker notes

# `fastdelta`

#### Cheap Protocol Buffer Parsing

---

# `fastdelta`

- slack: **#profiling-go** and **#profiling**
- who:
    - **Nick Ripley**, _the talented_
    - **Paul Bauer**, _yours truly_
- special thanks:
    - Felix _"Mr. Go Profiler"_ Geisendörfer
    - Richard _"Richie"_ Artoul

note: Hi, I'm Paul Bauer, engineer on the Datadog Continuous Profiler. Nick Ripley and I want to show you a go profiler improvement we call "fastdelta"

Special thanks to

- Felix for advice and prototyping
- Richie for the molecule library, underpinning this technique

---

![[why is fastdelta.jpeg]]

note: What is fastdelta?

---

- What is a `golang` delta profile?
- Why are delta profiles expensive?
- What are we doing to fix it?\*

<p/>

\*we put **fast** in front of **delta**

![[CleanShot 2022-10-26 at 23.43.40@2x.png]]

note: To answer that and why it's AWESOME ... some background

---

#### Datadog Profiling Product

![[CleanShot 2022-10-27 at 00.33.47@2x.png|500]]

note: This is a flamegraph rendering of a CPU profile ...

---

### Protocol Buffers

- binary message serialization format
- harbinger of ~~doom~~ gRPC
- the Datadog Profiler submits [golang](https://go.dev/) profiles as profile protobuf messages
    - **pprof**

note: ...but it starts life as a profile protocol buffer - pprofs - if you aren't Java

---

##### pprof in memory (a dramatization)

![[pprof simple model.excalidraw.png]]

note: This is a simplified model of a pprof fully hydrated in-memory

I've seen serialized pprofs that are 350KiB on the wire blow up to 30+MiB in memory.

And the deep reference chains can make garbage collection EXPENSIVE, especially in the mark phase.
So, if you really wanted to thrash your GC, you'd

- keep multiple pprofs on heap
- for long periods
- then throw them away

---

#### Delta &nbsp; Profiles

- go runtime creates life-of-the-process profiles for
    - heap
    - mutex
    - block
- knowing what allocations happened in the last minute is more useful
- Delta profiles are **diffs** of lifetime profiles

```go
// negate the previous lifetime profile, then merge it with the current one
previous.Scale(-1)
delta, err := profile.Merge(
	[]*profile.Profile{current, previous},
)
```

note: Which ... is exactly what the Datadog profiler does to create Delta Profiles.

The go runtime gives us a profile of all allocations since the process started. But that's less useful than a profile of all the allocations in the last minute. So we take two profiles a minute apart and diff them.

Key difference is we can aggregate delta profiles.

---

#### How bad can it be?

![[CleanShot 2022-10-27 at 02.32.59@2x.png]]

note: How bad can it be? Well ... pretty bad.

This shows a heap object count profile for one of our production services at Datadog, and delta profiles are accounting for 62% of it!

Now they don't behave this badly for all services, but in the worst cases, customers can't use profile aggregation because of delta overhead costs.

---

## Fastdelta

- low allocation, streaming pprof parser
- does not unpack an entire profile in memory
    - no reference chains
- built on [molecule](https://github.com/richardartoul/molecule)
- overhead
    - hashed sample values from the previous profile cycle
    - some memory for index structures reused on each diff

note:

- streaming pprof parser that sips memory
- does not unpack an entire profile all at once
    - only uses simple maps and slices
    - no reference chains
- built on [molecule](https://github.com/richardartoul/molecule)
- **OF NOTE** the go standard library and Google Cloud profiler create deltas the slow way
    - as far as we know, fastdelta is novel in the continuous profiler space

---

### Results

![[abr-3291736298.gif|600]]

- [memory in use and allocation rate](https://app.datadoghq.com/profiling/comparison?query=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29&compare_end_A=1666812600000&compare_end_B=1666812600000&compare_query_A=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20%20%20availability-zone%3A%28us-east-1b%20OR%20us-east-1e%29&compare_query_B=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20availability-zone%3Aus-east-1a&compare_start_A=1666809000000&compare_start_B=1666809000000&compareValuesMode=absolute&my_code=disabled&profile_type=heap-live-size&viz=flame_graph&start=1666808501079&end=1666809401079&paused=false)
- [allocations and GC
overhead](https://app.datadoghq.com/profiling/comparison?query=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29&compare_end_A=1666812600000&compare_end_B=1666812600000&compare_query_A=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20%20%20availability-zone%3A%28us-east-1b%20OR%20us-east-1e%29&compare_query_B=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20availability-zone%3Aus-east-1a&compare_start_A=1666809000000&compare_start_B=1666809000000&compareValuesMode=absolute&my_code=disabled&op_filter=focus_on%28package%3A%22gopkg.in%2FDataDog%2Fdd-trace-go.v1%2Fprofiler%22%29&profile_type=alloc-size&viz=flame_graph&start=1666808501079&end=1666809401079&paused=false)

note:

- show Memory In Use (click)
- explain the comparison view setup
    - logs-event-store-reader
    - production load, shadows
    - A - old delta method
    - B - fastdelta
- Memory In Use
    - 655 MiB vs 18 MiB, 36x improvement
- switch to pre-queued tab
- Memory allocations/minute
    - 4.8 GiB vs 103 MiB, 46x improvement
- **CPU - GC Overhead** call this out!!

---

### Links

- coming soon!
&nbsp; `dd-trace-go 1.44.0` 🤞
- [Reducing Go Delta Profile Overhead](https://docs.google.com/document/d/14iqTExUSgs_p_WI86qrikXgTxa8Sdp7cbXzIuVQVQ3Y/edit#heading=h.t7edd1unuztu)
- <https://github.com/richardartoul/molecule>
- <https://github.com/DataDog/dd-trace-go/pull/1511>

![[link.gif]]

note: If you have any questions about

- this technique
- drawbacks
- the layers of tests we have (which should be a lightning talk on its own)

hit up the links or see us in \#profiling-go slack

---

### Considerations

- more complicated than [github.com/google/pprof/profile](https://pkg.go.dev/github.com/google/pprof/profile)
- multiple passes, message order not defined
    - index pass
    - delta computation pass
    - pruning pass, drop unneeded messages
    - write pruned string table
- Testing
    - fuzz testing
    - fidelity checks on live systems

---
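
To make the streaming and hashing ideas above concrete, here is a minimal sketch. It is not the dd-trace-go implementation. It walks the protobuf wire format field by field instead of unmarshalling, and keeps only a hash-to-value map between cycles. The two-field sample schema and the names `walkFields`, `deltaComputer`, and `appendSample` are assumptions made up for illustration, as is the choice of FNV hashing. Real pprof samples carry location IDs, labels, and several values, and fastdelta is built on molecule rather than a hand-rolled walker like this one.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
)

// walkFields visits each top-level field of a protobuf-encoded message without
// building message structs. Only wire types 0 (varint) and 2 (bytes) are handled.
func walkFields(buf []byte, visit func(num uint64, payload []byte, v uint64) error) error {
	for len(buf) > 0 {
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return errors.New("bad tag")
		}
		val, m := binary.Uvarint(buf[n:]) // varint value, or payload length
		if m <= 0 {
			return errors.New("bad varint")
		}
		buf = buf[n+m:]
		var payload []byte
		switch tag & 7 {
		case 0: // varint: val is the field value
		case 2: // length-delimited: val is the payload length
			if val > uint64(len(buf)) {
				return errors.New("truncated payload")
			}
			payload, buf = buf[:val], buf[val:]
		default:
			return fmt.Errorf("unsupported wire type %d", tag&7)
		}
		if err := visit(tag>>3, payload, val); err != nil {
			return err
		}
	}
	return nil
}

// deltaComputer keeps one hash -> value entry per sample between cycles.
type deltaComputer struct{ prev map[uint64]int64 }

// delta streams over a lifetime profile and returns the per-stack change
// since the previous call. Hash collisions are ignored in this sketch.
func (d *deltaComputer) delta(profile []byte) (map[string]int64, error) {
	out := map[string]int64{}
	next := make(map[uint64]int64, len(d.prev))
	err := walkFields(profile, func(num uint64, sample []byte, _ uint64) error {
		if num != 1 || sample == nil {
			return nil // not a sample in this toy schema
		}
		var stack []byte
		var value int64
		if err := walkFields(sample, func(num uint64, p []byte, v uint64) error {
			if num == 1 {
				stack = p
			} else if num == 2 {
				value = int64(v)
			}
			return nil
		}); err != nil {
			return err
		}
		h := fnv.New64a()
		h.Write(stack)
		key := h.Sum64()
		next[key] = value
		if diff := value - d.prev[key]; diff != 0 {
			out[string(stack)] = diff
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	d.prev = next
	return out, nil
}

// appendSample appends one toy sample to profile as field 1 (length-delimited).
// Inside the sample, field 1 is the stack key and field 2 the lifetime value.
func appendSample(profile []byte, stack string, value uint64) []byte {
	s := binary.AppendUvarint([]byte{1<<3 | 2}, uint64(len(stack)))
	s = append(s, stack...)
	s = binary.AppendUvarint(append(s, 2<<3), value)
	profile = binary.AppendUvarint(append(profile, 1<<3|2), uint64(len(s)))
	return append(profile, s...)
}

func main() {
	var d deltaComputer
	first := appendSample(appendSample(nil, "main;alloc", 10), "main;parse", 5)
	second := appendSample(appendSample(nil, "main;alloc", 25), "main;parse", 5)
	d.delta(first) // the first cycle only seeds the previous values
	out, err := d.delta(second)
	fmt.Println(out, err) // prints: map[main;alloc:15] <nil>
}
```

The output map is there for readability only. A low-allocation version would presumably write adjusted samples straight to an output buffer and drop zero-valued ones in a later pass, along the lines of the pruning pass listed above.

---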