Testing Sync at Dropbox-Isaac Goldberg

# Testing Sync at Dropbox-Isaac Goldberg

![rw-book-cover](https://readwise-assets.s3.amazonaws.com/static/images/article2.74d541386bbf.png)

## Metadata
- Author:: [[Isaac Goldberg]]
- Full_Title:: Testing Sync at Dropbox
- URL:: <https://dropbox.tech/infrastructure/-testing-our-new-sync-engine>

## Highlights
- To pull this off, we knew we would need a serious investment in automated testing. Our testing strategy gave us confidence that we were on the right track throughout the rewrite, and today it allows us to continue building and shipping new features on a quick release cycle. ([View Highlight](https://instapaper.com/read/1308663234/15319380)) ^136856783
- Emphasizing testability early, even before implementing the associated testing frameworks, was critical to ensuring that our architecture was informed appropriately. ([View Highlight](https://instapaper.com/read/1308663234/15319383)) ^136856784
- one of our core architectural principles is “Design away invalid system states.” ([View Highlight](https://instapaper.com/read/1308663234/15319387)) ^136856785
- The concurrency model of Sync Engine Classic made testing extremely challenging ([View Highlight](https://instapaper.com/read/1308663234/15319394)) ^136856786
- In Sync Engine Classic, components were free to fork threads internally. ([View Highlight](https://instapaper.com/read/1308663234/15319397)) ^136856787
- In Nucleus, we sought to make writing tests as ergonomic and predictable as possible. Nearly all of our code runs on a single “control” thread. Operations benefiting from concurrency (e.g. network I/O, filesystem I/O, and CPU-intensive work like hashing) are offloaded to dedicated threads or thread pools. When it comes to testing, we can serialize the entire system. Asynchronous requests can be serialized and run on the main thread instead of in the background. ([View Highlight](https://instapaper.com/read/1308663234/15319404)) ^136856788
- Randomized testing is the most essential part of our testing strategy. ([View Highlight](https://instapaper.com/read/1308663234/15319406)) ^136856789
- The developer experience was paramount. ([View Highlight](https://instapaper.com/read/1308663234/15319412)) ^136856790
- All randomized testing frameworks must be fully deterministic and easily reproducible. ([View Highlight](https://instapaper.com/read/1308663234/15319413)) ^136856791
- At the beginning of a random test run, generate a random seed. Instantiate a pseudorandom number generator (PRNG) with that seed. (Personally, given its name, I like this one.) Run the test using that PRNG for all random decisions, e.g. generating initial filesystem state, task scheduling, or network failure injection. If the test fails, output the seed. ([View Highlight](https://instapaper.com/read/1308663234/15319415)) ^136856792
- When a regression sneaks in, CI automatically creates a tracking task for each failing seed, including also the hash of the latest commit at the time. ([View Highlight](https://instapaper.com/read/1308663234/15319421)) ^136856793
- In order to uphold this guarantee, we take great care to make Nucleus itself fully deterministic ([View Highlight](https://instapaper.com/read/1308663234/15319427)) ^136856794
- CanopyCheck is very effective at testing our planning algorithm, but there is much more to Nucleus that still needs testing coverage. This is where Trinity comes in. ([View Highlight](https://instapaper.com/read/1308663234/15319442)) ^136856795
- Trinity - randomized integration tests, permuting on mocked system state (filesystem) and dependency service states (backend) #readwise/note
- Trinity is privy to all asynchronous activity in the system. Trinity drives Nucleus on the main thread, buffering numerous intercepted requests to each of the mocked components. ([View Highlight](https://instapaper.com/read/1308663234/15319451)) ^136856796
- While Trinity is great at finding unexpected interactions between different components internal to Nucleus, mocking out so many external sources of nondeterminism comes at a cost. ([View Highlight](https://instapaper.com/read/1308663234/15319455)) ^136856797
- When running Trinity against our in-memory filesystem mock, we lose all coverage of this area. To cover this layer of our codebase, we also run Trinity in a “native” mode, targeting the platform’s actual filesystem. However, running against the native filesystem incurs a huge performance penalty (roughly 10x), which in turn means Trinity Native can’t test as many different seeds. ([View Highlight](https://instapaper.com/read/1308663234/15319459)) ^136856798
- In order to deliver on its reproducibility promises, Trinity serializes all calls into the native platform APIs, to avoid any nondeterminism that could arise from the OS interleaving system calls. ([View Highlight](https://instapaper.com/read/1308663234/15319460)) ^136856799
- Trinity cannot actually reboot the machine mid-test, so it can’t validate that we use fsync in all the right places to ensure crash durability on each platform. ([View Highlight](https://instapaper.com/read/1308663234/15319464)) ^136856800
- To test the protocol, we have a separate test suite called Heirloom. Heirloom operates on the same deterministic random seed principle for controlling the client’s execution, but because it necessarily talks to a real Dropbox server over the network, it must trade off some determinism guarantees. And due to the overhead of this approach (communicating across multiple language boundaries and through our full backend stack), Heirloom runs about 100x slower than Trinity. ([View Highlight](https://instapaper.com/read/1308663234/15319468)) ^136856801
- Another tradeoff Trinity makes is that, because it mocks out less of the system than CanopyCheck does, it cannot minimize failing test cases as easily. The more of Nucleus’s complex, emergent behavior we try to validate, the less we can perturb any given test’s input without making its behavior totally diverge. ([View Highlight](https://instapaper.com/read/1308663234/15319490)) ^136856802
- the more real the randomized test, the harder it is to shrink test cases #readwise/note
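
A rough sketch of the seed recipe from the highlights (generate a seed, build a PRNG from it, use that PRNG for every random decision, print the seed on failure), combined with the idea of serializing buffered asynchronous work onto one thread. This is not Dropbox's code: `run_random_test`, `run_serialized`, and `example_test` are hypothetical names, and the toy counter stands in for a real system under test.

```python
import random
import secrets


def run_random_test(seed, test_body):
    """Run test_body with a PRNG built from seed; print the seed on failure."""
    rng = random.Random(seed)  # the only source of randomness for this run
    try:
        test_body(rng)
    except AssertionError as exc:
        # Reporting the seed is what lets someone replay this exact run later.
        print(f"FAILED with seed {seed}: {exc}")
        return False
    return True


def run_serialized(rng, tasks):
    """Run buffered (name, callable) tasks one at a time on the calling thread.

    The PRNG picks which pending task runs next, so the interleaving is
    random across seeds but identical for any given seed.
    """
    pending = list(tasks)
    order = []
    while pending:
        name, fn = pending.pop(rng.randrange(len(pending)))
        fn()
        order.append(name)
    return order


def example_test(rng):
    # Hypothetical system under test: a counter mutated by three "async" tasks.
    state = {"value": 0}
    tasks = [
        ("add", lambda: state.update(value=state["value"] + 1)),
        ("double", lambda: state.update(value=state["value"] * 2)),
        ("reset", lambda: state.update(value=0)),
    ]
    order = run_serialized(rng, tasks)
    # Deliberately order-sensitive check, so that some seeds fail.
    assert state["value"] < 2, f"order {order} left value {state['value']}"


if __name__ == "__main__":
    seed = secrets.randbits(64)  # fresh seed for each run
    run_random_test(seed, example_test)
```

Replaying a failure is then just `run_random_test(<printed seed>, example_test)`. The sketch stays deterministic only because nothing in the test reads the clock, spawns threads, or uses the global `random` functions, which is a small-scale version of the care the highlights describe for keeping Nucleus itself deterministic.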