
HWBP #1: Offline-first Android app architecture


Mauro Banze

Nov 11, 2020

This is part of the How We Built Primal series, an exploration of how we built a modern Android app. Primal is a mobile app that lets close friends send video messages back and forth.

For this article, we're going to look at how we made Primal faster to use, and easier to write, by using an offline-first app architecture.

Why offline-first

In Primal, we have the following main types of data our users want to access:

the list of chats a user is part of;
the list of members of those chats;
and the message history of each chat.

Our users open Primal several times a day as they send and receive messages. Every time they open the app, they need access to the above-listed data.

With an online-only, no-local-caching approach, our users have to wait for this data to load from our servers every single time they request it. This is a frustrating experience, as mobile networks are often slow, unreliable, and sometimes unavailable. Even when available, loading that data still takes a few seconds. This goes against our desire to offer a fast and reliable messaging experience to Primal users.

To solve this challenge, we decided to employ an offline-first architecture. There are a few characteristics of our app that make offline caching an ideal solution to the above-stated problem:

The lists of chats, members, and messages change infrequently. For instance, users only occasionally join new chats. In contrast, the list of posts on a Facebook feed is constantly changing;
This data is long-lived. For example, our users still care about messages sent 3 weeks ago. In contrast, the list of posts on a Facebook feed is ephemeral: users don't need to refer back to their feed from a few weeks ago;
The same data is accessed frequently, i.e. every time our app is used.

Besides providing a snappy user experience, an offline-first architecture has the following additional benefits:

The app remains semi-usable when the device is offline: users can access their list of chats and message history. However, sending new messages is obviously not possible.
An offline-first architecture makes it a lot easier to write the app. More on this later.

The architecture

Here are the basic principles of our offline-first architecture:

UIs only display data coming from the local storage (database and key-value store); UIs never directly request data from our servers;
UIs typically observe/listen to local data rather than one-shot querying it. This ensures UIs are automatically updated when local data is updated;
The local data is kept up to date with the server database via a synchronization operation. A sync operation queries the server for new or modified data and then caches it locally. Sync operations can be triggered in various situations: e.g. a user opening a screen, or a push notification informing the client that new data is available on the server.

Let's look at a concrete example. On Primal, users are first presented with a list of the chats they are part of. This screen observes the local database for this data and displays it. Simultaneously, the screen triggers a background sync operation that asks the server "are there updates to the list of chats of the current user?". If there are (e.g. a new user joined one of the chats), these updates are saved to the local database, which then triggers the list of chats to be updated on the screen.
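The flow above can be sketched in a few lines of plain Java. This is only an illustration, not Primal's actual code: LocalChatStore is a hypothetical in-memory stand-in for the local database, the server is reduced to a Supplier, and the sync runs inline where a real app would run it on a background thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for the local chats table (chat id -> chat name).
class LocalChatStore {
    private final Map<String, String> chatsById = new LinkedHashMap<>();
    private final List<Consumer<List<String>>> observers = new ArrayList<>();

    // Registers an observer and immediately delivers the current local data.
    void observe(Consumer<List<String>> observer) {
        observers.add(observer);
        observer.accept(snapshot());
    }

    // Persists chats returned by a sync; observers are notified only if something changed.
    void save(Map<String, String> chats) {
        boolean changed = false;
        for (Map.Entry<String, String> entry : chats.entrySet()) {
            String previous = chatsById.put(entry.getKey(), entry.getValue());
            if (!entry.getValue().equals(previous)) {
                changed = true;
            }
        }
        if (changed) {
            for (Consumer<List<String>> observer : observers) {
                observer.accept(snapshot());
            }
        }
    }

    private List<String> snapshot() {
        return new ArrayList<>(chatsById.values());
    }
}

class ChatListScreen {
    // Every list of chat names this screen has rendered, in order.
    final List<List<String>> renders = new ArrayList<>();

    void open(LocalChatStore store, Supplier<Map<String, String>> server) {
        store.observe(renders::add); // 1. show whatever is cached right away
        store.save(server.get());    // 2. sync; a real app would do this off the main thread
    }

    public static void main(String[] args) {
        LocalChatStore store = new LocalChatStore();
        Map<String, String> cached = new LinkedHashMap<>();
        cached.put("c1", "Family");
        store.save(cached); // data left over from a previous session

        Map<String, String> remote = new LinkedHashMap<>(cached);
        remote.put("c2", "Roommates"); // the server knows about one more chat

        ChatListScreen screen = new ChatListScreen();
        screen.open(store, () -> remote);
        System.out.println(screen.renders); // [[Family], [Family, Roommates]]
    }
}
```

The screen renders twice: once with the cached data, and once more when the sync writes a new chat into the store. The screen itself never talks to the server's response directly.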

With this architecture, our app feels fast. When users open the app, content is fetched from the local storage and displayed within a few milliseconds, much faster than querying from the network can ever be.

The synchronization algorithm

Synchronization can vary greatly in complexity depending on your use-case, requirements, and constraints. Synchronizing a Google Docs file may require a very different algorithm than synchronizing a Spotify playlist across devices.

In Primal, we need to synchronize the following data:

The list of chats the logged-in user is a part of. This user can join and leave chats over time;
The list of members of each chat that the logged-in user is part of. Other users can join and leave chats over time;
The list of messages within a chat. Chat members can send new messages or delete existing ones at any time.

Here are some of our requirements and constraints:

Our system cannot cause users to miss messages. Therefore, syncing needs to be reliable.
Simpler and easier to implement is better than perfect or efficient. We're an unproven app trying to find product-market fit, not win engineering awards.
We also don't have many users, so scaling or efficient use of resources is not a concern at the moment.

Scrappy solution: query the whole list every time

The simplest and quickest solution is to always query all of the data from the server. For example, keeping the local list of "chats the logged-in user is part of" up to date is a matter of asking the server for the whole list with every sync. The client then attempts to insert each element into the local database. If the object (e.g. a chat) doesn't yet exist, it is inserted. Otherwise, the insert is ignored.
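As a rough sketch of this insert-or-ignore step, here is a plain Java version in which a map keyed by chat id stands in for the local table. The Chat and FullListSync names are made up for illustration. With SQLite the same effect is available through INSERT OR IGNORE, and with Room through OnConflictStrategy.IGNORE on an @Insert method, though we are not claiming that is exactly how Primal does it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical chat row; the id plays the role of the primary key.
class Chat {
    final String id;
    final String name;

    Chat(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class FullListSync {
    // In-memory stand-in for the local chats table.
    final Map<String, Chat> localChats = new LinkedHashMap<>();

    // Tries to insert every chat the server returned; returns how many were new.
    int sync(List<Chat> fullServerList) {
        int inserted = 0;
        for (Chat chat : fullServerList) {
            // putIfAbsent leaves an existing row untouched, like an insert-or-ignore.
            if (localChats.putIfAbsent(chat.id, chat) == null) {
                inserted++;
            }
        }
        return inserted;
    }

    public static void main(String[] args) {
        FullListSync sync = new FullListSync();
        List<Chat> firstResponse = Arrays.asList(new Chat("c1", "Family"), new Chat("c2", "Work"));
        List<Chat> secondResponse = Arrays.asList(
                new Chat("c1", "Family"), new Chat("c2", "Work"), new Chat("c3", "Gym"));
        System.out.println(sync.sync(firstResponse));  // 2
        System.out.println(sync.sync(secondResponse)); // 1
    }
}
```

Every sync walks the whole list, which is where the linear cost comes from. Rows that already exist are never overwritten, so this variant only picks up additions.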

This approach is far from perfect, as sync time scales linearly with the list size. But since our lists are not very large, and we execute the operation on a background thread, we accept the compromise. This is actually the implementation in use at the time of writing.

A better solution

A better solution may be to include, with each synchronization request, the timestamps of the oldest and most recent messages the client knows of. The server can then return only the list elements that fall outside of this range, which the client persists into the local database. This approach uses far less time, computing, and networking resources than re-sending the whole list.
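Here is a minimal sketch of that idea, with hypothetical TimedMessage, RangeSyncServer, and RangeSyncClient classes and an in-memory "server". It assumes timestamps are assigned by the server, so that a new message never lands inside the range the client already holds; a message that did would be missed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message row: an id plus a server-assigned timestamp.
class TimedMessage {
    final String id;
    final long sentAt;

    TimedMessage(String id, long sentAt) {
        this.id = id;
        this.sentAt = sentAt;
    }
}

// In-memory stand-in for the server side of the sync endpoint.
class RangeSyncServer {
    private final List<TimedMessage> messages = new ArrayList<>();

    void add(TimedMessage message) {
        messages.add(message);
    }

    // Returns only the messages older or newer than what the client already has.
    List<TimedMessage> outsideRange(long oldestKnown, long newestKnown) {
        List<TimedMessage> result = new ArrayList<>();
        for (TimedMessage m : messages) {
            if (m.sentAt < oldestKnown || m.sentAt > newestKnown) {
                result.add(m);
            }
        }
        return result;
    }
}

class RangeSyncClient {
    // In-memory stand-in for the local messages table, keyed by message id.
    final Map<String, TimedMessage> cache = new LinkedHashMap<>();

    // Sends the known timestamp range and persists what comes back; returns the count.
    int sync(RangeSyncServer server) {
        // With an empty cache the range is empty, so every message falls outside it.
        long oldest = Long.MAX_VALUE;
        long newest = Long.MIN_VALUE;
        for (TimedMessage m : cache.values()) {
            oldest = Math.min(oldest, m.sentAt);
            newest = Math.max(newest, m.sentAt);
        }
        List<TimedMessage> missing = server.outsideRange(oldest, newest);
        for (TimedMessage m : missing) {
            cache.put(m.id, m);
        }
        return missing.size();
    }

    public static void main(String[] args) {
        RangeSyncServer server = new RangeSyncServer();
        server.add(new TimedMessage("m1", 100L));
        server.add(new TimedMessage("m2", 200L));

        RangeSyncClient client = new RangeSyncClient();
        System.out.println(client.sync(server)); // 2
        System.out.println(client.sync(server)); // 0

        server.add(new TimedMessage("m3", 300L));
        System.out.println(client.sync(server)); // 1
    }
}
```

After the first sync, a repeat request transfers nothing, and later requests transfer only what is new.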

What about deleted elements, you may ask? We can treat all deletions as modifications (e.g. by setting a deleted field to true). In order to synchronize modifications, we need to keep a list of modified elements on the server, along with the date of each modification. The client can then query the server with the last modification date it knows of as a parameter. The server returns the elements that have been modified since that timestamp, and the client updates them in its local database.
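The modification-based variant could look roughly like this. All names are hypothetical, and a simple counter stands in for the server's clock so that the example stays deterministic. Deleted messages stay in the local cache with their deleted flag set and are simply filtered out when read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message row with a soft-delete flag and a modification stamp.
class SyncedMessage {
    final String id;
    final boolean deleted;
    final long modifiedAt;

    SyncedMessage(String id, boolean deleted, long modifiedAt) {
        this.id = id;
        this.deleted = deleted;
        this.modifiedAt = modifiedAt;
    }
}

// In-memory stand-in for the server, which stamps every change.
class ModificationServer {
    private final Map<String, SyncedMessage> byId = new LinkedHashMap<>();
    private long clock = 0; // logical clock standing in for the server's time

    void send(String id) {
        byId.put(id, new SyncedMessage(id, false, ++clock));
    }

    // A deletion is recorded as a modification that sets the deleted flag.
    void delete(String id) {
        byId.put(id, new SyncedMessage(id, true, ++clock));
    }

    List<SyncedMessage> modifiedSince(long since) {
        List<SyncedMessage> result = new ArrayList<>();
        for (SyncedMessage m : byId.values()) {
            if (m.modifiedAt > since) {
                result.add(m);
            }
        }
        return result;
    }
}

class ModificationClient {
    final Map<String, SyncedMessage> cache = new LinkedHashMap<>();
    long lastModification = 0; // the newest modification stamp seen so far

    void sync(ModificationServer server) {
        for (SyncedMessage m : server.modifiedSince(lastModification)) {
            cache.put(m.id, m); // overwrite the local row, deleted flag included
            lastModification = Math.max(lastModification, m.modifiedAt);
        }
    }

    // Ids of the messages a screen would display, i.e. those not marked deleted.
    List<String> visibleIds() {
        List<String> ids = new ArrayList<>();
        for (SyncedMessage m : cache.values()) {
            if (!m.deleted) {
                ids.add(m.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        ModificationServer server = new ModificationServer();
        server.send("m1");
        server.send("m2");

        ModificationClient client = new ModificationClient();
        client.sync(server);
        System.out.println(client.visibleIds()); // [m1, m2]

        server.delete("m1");
        client.sync(server);
        System.out.println(client.visibleIds()); // [m2]
    }
}
```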

Offline-first + reactive programming: a match made in heaven

Besides faster access to data, a local-storage-first architecture coupled with reactive programming results in a very nice app architecture, making our app features more decoupled, and thus easier to write, understand, and maintain.

To appreciate why, let's consider the following scenario. In Primal, we have 3 screens (pictured at the top) that deal with the Message entity:

The List of Chats screen that displays the date of the last message within each chat.
The Message Timeline screen that displays the actual messages.
The Camera screen that allows creating and sending a message.

What happens when the user successfully sends a new message within screen #3? Both the first and second screens need to be updated. In an online-first world, this update would need to take place in memory. Somehow, screen #3 would need to reference those other screens. This is problematic because now these screens need to know and care about each other.

This becomes a bigger problem when data can be modified from multiple places: e.g. the list of messages within a chat can be modified from the camera screen, but also from the push-notification module, from the delete-message screen, etc. Wiring these components together (perhaps via listeners) is complicated and error-prone. Now consider that you have multiple types of data, all being modified across your app, and all hell breaks loose.

With the database and key-value store as the only sources of data to the screens, the camera screen merely needs to insert the new message into the local database. It does not need to notify any other screen. In fact, it's not even aware of their existence. The database change is automatically propagated to all screens that are listening to a query whose result is affected by the change.
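To make the decoupling concrete, here is a toy version of the three screens in plain Java. MessageTable is a hypothetical in-memory stand-in for an observable database query, and the view classes are invented for illustration. Notice that CameraView has no reference to the other two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical message row.
class SentMessage {
    final String text;
    final long sentAt;

    SentMessage(String text, long sentAt) {
        this.text = text;
        this.sentAt = sentAt;
    }
}

// In-memory stand-in for a messages table with an observable query.
class MessageTable {
    private final List<SentMessage> rows = new ArrayList<>();
    private final List<Consumer<List<SentMessage>>> observers = new ArrayList<>();

    // Registers an observer and immediately delivers the current rows.
    void observe(Consumer<List<SentMessage>> observer) {
        observers.add(observer);
        observer.accept(new ArrayList<>(rows));
    }

    // Inserts a row and pushes a fresh copy of the rows to every observer.
    void insert(SentMessage message) {
        rows.add(message);
        for (Consumer<List<SentMessage>> observer : observers) {
            observer.accept(new ArrayList<>(rows));
        }
    }
}

// List of Chats screen: shows the date of the last message.
class ChatListView {
    long lastSentAt = -1; // -1 means "no messages yet"

    void bind(MessageTable table) {
        table.observe(rows -> lastSentAt = rows.isEmpty() ? -1 : rows.get(rows.size() - 1).sentAt);
    }
}

// Message Timeline screen: shows the messages themselves.
class TimelineView {
    List<String> texts = new ArrayList<>();

    void bind(MessageTable table) {
        table.observe(rows -> {
            List<String> updated = new ArrayList<>();
            for (SentMessage m : rows) {
                updated.add(m.text);
            }
            texts = updated;
        });
    }
}

// Camera screen: only writes to the table and knows nothing about other screens.
class CameraView {
    void send(MessageTable table, SentMessage message) {
        table.insert(message);
    }

    public static void main(String[] args) {
        MessageTable table = new MessageTable();
        ChatListView chatList = new ChatListView();
        TimelineView timeline = new TimelineView();
        chatList.bind(table);
        timeline.bind(table);

        new CameraView().send(table, new SentMessage("hello", 42L));
        System.out.println(chatList.lastSentAt); // 42
        System.out.println(timeline.texts);      // [hello]
    }
}
```

Adding a fourth writer, such as a push-notification handler, would mean one more call to insert and no changes to any screen.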

There are several benefits to this approach:

It enables local reasoning: we can concentrate on the feature at hand, and completely ignore how other screens or app components might be affected by our changes;
Features can evolve independently of each other. With a more coupled approach, changing one feature may require making several changes across the app;
Using battle-tested mechanisms like SQLite, Room, and observable queries is a more reliable change-propagation solution than stitching our own system together (e.g. wiring screens via listeners).

How to implement this on Android?

You're in luck, as this is precisely the topic of the next article in this series.

Follow me on Twitter for more articles.

Thank you for reading.