rel:: [[protobuf]] [[golang]] [[Profiling]]
> [!note]
> [[markdown]] deck for presentation w/ speaker notes
# `fastdelta`
#### Cheap Protocol Buffer Parsing
---
# `fastdelta`
- slack: **#profiling-go** and **#profiling**
- who:
  - **Nick Ripley**, _the talented_
  - **Paul Bauer**, _yours truly_
- special thanks:
  - Felix _"Mr. Go Profiler"_ Geisendörfer
  - Richard _"Richie"_ Artoul
note:
Hi, I'm Paul Bauer, an engineer on the Datadog Continuous Profiler.
Nick Ripley and I want to show you a Go profiler improvement we call "fastdelta".
Special thanks to:
- Felix for advice and prototyping
- Richie for the molecule library, which underpins this technique
---
![[why is fastdelta.jpeg]]
note:
What is fastdelta?
---
- What is a `golang` delta profile?
- Why are delta profiles expensive?
- What are we doing to fix it?\*
<p/>
\*we put **fast** in front of **delta**
![[CleanShot 2022-10-26 at 23.43.40@2x.png]]
note:
To answer that and why it's AWESOME ... some background
---
#### Datadog Profiling Product
![[CleanShot 2022-10-27 at 00.33.47@2x.png|500]]
note:
This is a flamegraph rendering of a CPU profile ...
---
### Protocol Buffers
- binary message serialization format
- harbinger of ~~doom~~ gRPC
- the Datadog Profiler submits [Go](https://go.dev/) profiles as **pprof** protobuf messages
note:
...but unless you're profiling Java, it starts life as a profile protocol buffer: a pprof.
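
Why a streaming parser is possible at all: on the wire, a protobuf message is a flat run of fields, each a varint key (`field_number << 3 | wire_type`) followed by its value. The sketch below walks those fields using only the standard library, without building any objects. It is not molecule's API (molecule is what fastdelta actually builds on); `eachField` and the `wire*` constants are hypothetical names for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Protobuf wire types (the low 3 bits of every field key).
const (
	wireVarint = 0 // int32, int64, uint64, bool, enum
	wireI64    = 1 // fixed64, sfixed64, double
	wireLen    = 2 // string, bytes, embedded messages, packed repeated fields
	wireI32    = 5 // fixed32, sfixed32, float
)

var errTruncated = errors.New("truncated or malformed protobuf field")

// eachField walks the top-level fields of one serialized message in wire
// order. Varint and fixed-width values are passed in v; length-delimited
// values are passed as payload, a sub-slice of buf (no copy) that can be
// walked again with eachField if it is itself a message.
func eachField(buf []byte, fn func(num uint64, typ int, v uint64, payload []byte) error) error {
	for len(buf) > 0 {
		key, n := binary.Uvarint(buf)
		if n <= 0 {
			return errTruncated
		}
		buf = buf[n:]
		num, typ := key>>3, int(key&7)

		var v uint64
		var payload []byte
		switch typ {
		case wireVarint:
			if v, n = binary.Uvarint(buf); n <= 0 {
				return errTruncated
			}
			buf = buf[n:]
		case wireI64:
			if len(buf) < 8 {
				return errTruncated
			}
			v, buf = binary.LittleEndian.Uint64(buf), buf[8:]
		case wireI32:
			if len(buf) < 4 {
				return errTruncated
			}
			v, buf = uint64(binary.LittleEndian.Uint32(buf)), buf[4:]
		case wireLen:
			size, n := binary.Uvarint(buf)
			if n <= 0 || size > uint64(len(buf)-n) {
				return errTruncated
			}
			payload, buf = buf[n:n+int(size)], buf[n+int(size):]
		default: // wire types 3 and 4 (groups) are deprecated
			return fmt.Errorf("unsupported wire type %d", typ)
		}
		if err := fn(num, typ, v, payload); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Hand-encoded message: field 1 = varint 150, field 6 = string "go".
	msg := []byte{0x08, 0x96, 0x01, 0x32, 0x02, 'g', 'o'}
	_ = eachField(msg, func(num uint64, typ int, v uint64, payload []byte) error {
		fmt.Printf("field %d, wire type %d, value %d, payload %q\n", num, typ, v, payload)
		return nil
	})
	// field 1, wire type 0, value 150, payload ""
	// field 6, wire type 2, value 0, payload "go"
}
```

Because `payload` aliases the input buffer, nested messages such as a pprof `Sample` can be visited by calling `eachField` on them again, with no per-field allocation.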
---
##### pprof in memory (a dramatization)
![[pprof simple model.excalidraw.png]]
note:
This is a simplified model of a pprof fully hydrated in-memory
I've seen serialized pprofs that are 350KiB on the wire blow up to 30+MiB in memory.
And the deep reference chains can make garbage collection EXPENSIVE, especially in the mark phase.
So, if you really wanted to thrash your GC, you'd
- keep multiple pprofs on heap
- for long periods
- then throw them away
---
#### Delta Profiles
- the Go runtime creates life-of-the-process profiles for
  - heap
  - mutex
  - block
- knowing what allocations happened in the last minute is more useful
- Delta profiles are **diffs** of lifetime profiles
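```go
import "github.com/google/pprof/profile"

// delta returns current - previous for two lifetime profiles.
func delta(current, previous *profile.Profile) (*profile.Profile, error) {
	// Negate the older snapshot (note: this mutates previous)...
	previous.Scale(-1)
	// ...so merging it into the newer one subtracts the old totals.
	return profile.Merge([]*profile.Profile{current, previous})
}
```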
note:
Which ... is exactly what the Datadog profiler does to create Delta Profiles.
The go runtime gives us a profile of all allocations since the process started.
But that's less useful than a profile of all the allocations in the last minute.
So we take two profiles a minute apart and diff them.
Key difference is we can aggregate delta profiles: summing consecutive deltas doesn't double count, while summing lifetime profiles would.
---
#### How bad can it be?
![[CleanShot 2022-10-27 at 02.32.59@2x.png]]
note:
How bad can it be? Well ... pretty bad.
This shows a heap object count profile for one of our production services at Datadog, and delta profile computation accounts for 62% of it!
Not every service behaves this badly, but in the worst cases, customers can't use profile aggregation because of the overhead of computing deltas.
---
## Fastdelta
- low-allocation, streaming pprof parser
- does not unpack an entire profile in memory
- no reference chains
- built on [molecule](https://github.com/richardartoul/molecule)
- overhead
  - hashed sample values from the previous profile cycle
  - some memory for index structures, reused on each diff
note:
- streaming pprof parser that sips memory
- does not unpack an entire profile all at once
- only uses simple maps and slices
- no reference chains
- built on [molecule](https://github.com/richardartoul/molecule)
- **OF NOTE** the Go standard library and the Google Cloud Profiler create deltas the slow way
- as far as we know, fastdelta is novel in the continuous profiler space
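
The "hashed sample values" overhead on this slide can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not dd-trace-go's fastdelta code: samples are assumed to be already decoded into a hypothetical `Sample` (stack PCs plus a pre-encoded label string), and `DeltaComputer`, `Delta`, and `EndCycle` are made-up names. What it shows is the state kept between cycles: a map from a 64-bit hash to the previous lifetime values, with two maps swapped and reused rather than a retained profile.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"io"
)

// Sample is a hypothetical, already-decoded view of one pprof sample.
type Sample struct {
	Stack  []uint64 // program counters, leaf first
	Labels string   // labels pre-encoded into one canonical string (assumption)
	Values []int64  // lifetime totals, e.g. alloc_objects and alloc_space
}

// DeltaComputer keeps only hash(stack, labels) -> lifetime values from the
// previous cycle, never the previous profile itself.
type DeltaComputer struct {
	prev, cur map[uint64][]int64
	h         hash.Hash64
}

func NewDeltaComputer() *DeltaComputer {
	return &DeltaComputer{
		prev: make(map[uint64][]int64),
		cur:  make(map[uint64][]int64),
		h:    fnv.New64a(),
	}
}

// key hashes what identifies a sample across cycles. The stack length is
// hashed first so stack bytes and label bytes can't be confused. A 64-bit
// collision would merge two distinct samples; this sketch ignores that risk.
func (d *DeltaComputer) key(s Sample) uint64 {
	d.h.Reset()
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], uint64(len(s.Stack)))
	d.h.Write(b[:])
	for _, pc := range s.Stack {
		binary.LittleEndian.PutUint64(b[:], pc)
		d.h.Write(b[:])
	}
	io.WriteString(d.h, s.Labels)
	return d.h.Sum64()
}

// Delta returns current minus previous lifetime values for s and reports
// whether any value changed. It assumes each key appears once per profile.
func (d *DeltaComputer) Delta(s Sample) ([]int64, bool) {
	k := d.key(s)
	prev := d.prev[k] // nil if the sample is new this cycle
	d.cur[k] = append([]int64(nil), s.Values...)
	delta := make([]int64, len(s.Values))
	changed := false
	for i, v := range s.Values {
		delta[i] = v
		if i < len(prev) {
			delta[i] -= prev[i]
		}
		changed = changed || delta[i] != 0
	}
	return delta, changed
}

// EndCycle makes this cycle's values the baseline for the next diff and
// reuses the old map instead of allocating a new one.
func (d *DeltaComputer) EndCycle() {
	d.prev, d.cur = d.cur, d.prev
	for k := range d.cur {
		delete(d.cur, k)
	}
}

func main() {
	d := NewDeltaComputer()
	stack := []uint64{0x4011a0, 0x401f00}
	for _, lifetime := range [][]int64{{3, 300}, {5, 500}, {5, 500}} {
		delta, changed := d.Delta(Sample{Stack: stack, Values: lifetime})
		fmt.Println(delta, changed)
		d.EndCycle()
	}
	// [3 300] true
	// [2 200] true
	// [0 0] false
}
```

In this sketch's terms, a sample whose delta is all zeros (`changed == false`) is a candidate for dropping from the output.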
---
### Results
![[abr-3291736298.gif|600]]
- [memory in use and allocation rate](https://app.datadoghq.com/profiling/comparison?query=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29&compare_end_A=1666812600000&compare_end_B=1666812600000&compare_query_A=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20%20%20availability-zone%3A%28us-east-1b%20OR%20us-east-1e%29&compare_query_B=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20availability-zone%3Aus-east-1a&compare_start_A=1666809000000&compare_start_B=1666809000000&compareValuesMode=absolute&my_code=disabled&profile_type=heap-live-size&viz=flame_graph&start=1666808501079&end=1666809401079&paused=false)
- [allocations and GC overhead](https://app.datadoghq.com/profiling/comparison?query=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29&compare_end_A=1666812600000&compare_end_B=1666812600000&compare_query_A=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20%20%20availability-zone%3A%28us-east-1b%20OR%20us-east-1e%29&compare_query_B=service%3Alogs-event-store-reader%20pod_name%3A%28logs-event-store-reader-shadow-%2A%20OR%20logs-event-store-reader-alerting-shadow-%2A%29%20env%3Aprod%20datacenter%3Aus1.prod.dog%20availability-zone%3Aus-east-1a&compare_start_A=1666809000000&compare_start_B=1666809000000&compareValuesMode=absolute&my_code=disabled&op_filter=focus_on%28package%3A%22gopkg.in%2FDataDog%2Fdd-trace-go.v1%2Fprofiler%22%29&profile_type=alloc-size&viz=flame_graph&start=1666808501079&end=1666809401079&paused=false)
note:
- show Memory In Use (click)
- explain the comparison view setup
  - logs-event-store-reader
  - production load, shadows
  - A - old delta method
  - B - fastdelta
- Memory In Use
  - 655 MiB vs 18 MiB, 36x improvement
- switch to pre-queued tab
- Memory allocations/minute
  - 4.8 GiB vs 103 MiB, 46x improvement
  - **CPU - GC Overhead** call this out!!
---
### Links
- coming soon! `dd-trace-go 1.44.0` 🤞
- [Reducing Go Delta Profile Overhead](https://docs.google.com/document/d/14iqTExUSgs_p_WI86qrikXgTxa8Sdp7cbXzIuVQVQ3Y/edit#heading=h.t7edd1unuztu)
- <https://github.com/richardartoul/molecule>
- <https://github.com/DataDog/dd-trace-go/pull/1511>
![[link.gif]]
note:
If you have any questions about
- this technique
- its drawbacks
- the layers of tests we have (which deserve a lightning talk of their own)

hit up the links or find us in \#profiling-go on Slack.
---
### Considerations
- more complicated than [github.com/google/pprof/profile](https://pkg.go.dev/github.com/google/pprof/profile)
- multiple passes, since field order in the serialized message is not defined
  - index pass
  - delta computation pass
  - pruning pass, drop unneeded messages
  - write pruned string table
- Testing
  - fuzz testing
  - fidelity checks on live systems
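note:
Why several passes: protobuf doesn't fix field order, so the string table may be written before or after the samples, and whether a string is still needed is only known once every message has been seen. The last step could look like the sketch below. This is an assumption about one reasonable approach, not necessarily what fastdelta does: `appendPrunedStringTable` is a hypothetical name, the `used` set stands in for the output of the earlier passes, and `binary.AppendUvarint` needs Go 1.19+.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// In pprof's Profile message the string table is field 6
// (repeated string string_table = 6), encoded with wire type 2 (length-delimited).
const stringTableKey = 6<<3 | 2 // 0x32

// appendPrunedStringTable appends table to out as protobuf string_table
// fields, writing "" for every index that no surviving message references.
// Blanking (rather than compacting) keeps every int64 string index in the
// other messages valid, so those messages can be copied through unchanged.
func appendPrunedStringTable(out []byte, table []string, used map[int64]bool) []byte {
	for i, s := range table {
		if !used[int64(i)] {
			s = "" // pprof requires table[0] == "" anyway
		}
		out = append(out, stringTableKey)
		out = binary.AppendUvarint(out, uint64(len(s)))
		out = append(out, s...)
	}
	return out
}

func main() {
	table := []string{"", "alloc_objects", "count", "main.leak", "unused.func"}
	used := map[int64]bool{1: true, 2: true, 3: true} // from the earlier passes
	out := appendPrunedStringTable(nil, table, used)
	fmt.Println(len(out)) // 37 (48 without pruning)
}
```

A compacting variant would also save the 2 bytes each blank entry still costs, but it would have to rewrite every string reference in every other message.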
---