Camera uploads is a feature in our Android and iOS apps that automatically backs up a user’s photos and videos from their mobile device to Dropbox. The feature was first introduced in 2012, and uploads millions of photos and videos for hundreds of thousands of users every day. People who use camera uploads are some of our most dedicated and engaged users. They care deeply about their photo libraries, and expect their backups to be quick and dependable every time. It’s important that we offer a service they can trust.
Until recently, camera uploads was built on a C++ library shared between the Android and iOS Dropbox apps. This library served us well for a long time, uploading billions of images over many years. However, it had numerous problems. The shared code had grown polluted with complex platform-specific hacks that made it difficult to understand and risky to change. This risk was compounded by a lack of tooling support, and a shortage of in-house C++ expertise. Plus, after more than five years in production, the C++ implementation was beginning to show its age. It was unaware of platform-specific restrictions on background processes, had bugs that could delay uploads for long periods of time, and made outage recovery difficult and time-consuming.
In 2019, we decided that rewriting the feature was the best way to offer a reliable, trustworthy user experience for years to come. This time, Android and iOS implementations would be separate and use platform-native languages (Kotlin and Swift respectively) and libraries (such as WorkManager and Room for Android). The implementations could then be optimized for each platform and evolve independently, without being constrained by design decisions from the other.
This post is about some of the design, validation, and release decisions we made while building the new camera uploads feature for Android, which we released to all users during the summer of 2021. The project shipped successfully, with no outages or major issues; error rates went down, and upload performance greatly improved. If you haven’t already enabled camera uploads, you should try it out for yourself.
Designing for background reliability
The main value proposition of camera uploads is that it works silently in the background. For users who don’t open the app for weeks or even months at a time, new photos should still upload promptly.
How does this work? When someone takes a new photo or modifies an existing photo, the OS notifies the Dropbox mobile app. A background worker we call the scanner carefully identifies all the photos (or videos) that haven’t yet been uploaded to Dropbox and queues them for upload. Then another background worker, the uploader, batch uploads all the photos in the queue.
Uploading is a two step process. First, like many Dropbox systems, we break the file into 4 MB blocks, compute the hash of each block, and upload each block to the server. Once all the file blocks are uploaded, we make a final commit request to the server with a list of all block hashes in the file. This creates a new file consisting of those blocks in the user’s Camera Uploads folder. Photos and videos uploaded to this folder can then be accessed from any linked device.
One of our biggest challenges is that Android places strong constraints on how often apps can run in the background and what capabilities they have. For example, App Standby limits our background network access if the Dropbox app hasn’t recently been foregrounded. This means we might only be allowed to access the network for a 10-minute interval once every 24 hours. These restrictions have grown more strict in recent versions of Android, and the cross-platform C++ version of camera uploads was not well-equipped to handle them. It would sometimes try to perform uploads that were doomed to fail because of a lack of network access, or fail to restart uploads during the system-provided window when network access became available.
Our rewrite does not escape these background restrictions; they still apply unless the user chooses to disable them in Android’s system settings. However, we reduce delays as much as possible by taking maximum advantage of the network access we do receive. We use WorkManager to handle these background constraints for us, guaranteeing that uploads are attempted if, and only if, network access becomes available. Unlike our C++ implementation, we also do as much work as possible while offline—for example, by performing rudimentary checks on new photos for duplicates—before asking WorkManager to schedule us for network access.
Measuring interactions with our status banners helps us identify emerging issues in our apps, and is a helpful signal in our efforts to eliminate errors. After the rewrite was released, we saw users interacting with more “all done” statuses than usual, while the number of “waiting” or error status interactions went down. (This data reflects only paid users, but non-paying users show similar results.)
To further optimize use of our limited network access, we also refined our handling of failed uploads. C++ camera uploads aggressively retried failed uploads an unlimited number of times. In the rewrite we added backoff intervals between retry attempts, and als