496code: CRDT Application Server


The CRDT Application Server is an application server based on Pipes technology.

While Pipes was designed for P2P, many of the underlying principles also have application in a centralized client/server model. Clients and servers are, technically, peers -- with the special provision that a server is a peer with extra authority. This allows servers to perform the sort of centralized roles typical in a client/server model, such as client authorization, data validation, and synchronization.

Data Synchronization

The primary benefit of building an application server using CRDTs is that you get data synchronization for clients that may be offline. Each client builds its own state locally and attempts to synchronize with its peers (in this case, the server). When the server is reachable, synchronization happens normally and the data becomes available to other clients. When the client is offline, the data lives locally and the client application operates just as it would when online.

How does this work? We use CRDTs with a blockchain-based (see below) transport. The CRDTs are a standard delta type allowing Add, Update, and Remove operations, and supporting Singletons, Sets, Grow-Only Sets, and Mutable objects.
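To make the delta idea concrete, here is a minimal sketch of a keyed container that accepts Add, Update, and Remove deltas. The names (`Delta`, `LwwMap`, `stamp`) and the last-writer-wins rule are assumptions made for illustration; the server's actual CRDT types and merge rules are not specified here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Delta:
    """One hypothetical CRDT delta: an operation on a single key."""
    op: str                  # "add", "update", or "remove"
    key: str
    value: Any
    stamp: Tuple[int, str]   # (logical clock, client id), assumed tie-breaker


class LwwMap:
    """Last-writer-wins map: the delta with the highest stamp wins per key."""

    def __init__(self) -> None:
        # key -> (stamp, value, removed flag); removals are kept as tombstones
        self._entries: Dict[str, Tuple[Tuple[int, str], Any, bool]] = {}

    def apply(self, delta: Delta) -> None:
        current = self._entries.get(delta.key)
        if current is not None and current[0] >= delta.stamp:
            return  # an equal or newer delta was already applied
        self._entries[delta.key] = (delta.stamp, delta.value, delta.op == "remove")

    def value(self) -> Dict[str, Any]:
        return {k: v for k, (_, v, removed) in self._entries.items() if not removed}


deltas = [
    Delta("add", "title", "Draft", (1, "a")),
    Delta("update", "title", "Final", (2, "b")),
    Delta("add", "note", "hi", (1, "b")),
    Delta("remove", "note", None, (3, "a")),
]

in_order, reversed_order = LwwMap(), LwwMap()
for d in deltas:
    in_order.apply(d)
for d in reversed(deltas):
    reversed_order.apply(d)
```

Because each key keeps only the delta with the highest stamp, both replicas end up with the same value no matter the order in which the deltas arrive, which is the property the synchronization layer relies on.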

On the client, as user actions are performed, each CRDT is encoded into a Block, and stored locally on the client's blockchain. When the client has an active server connection, it synchronizes its chain with the server (see below), including getting any updates from other clients.

Why Blockchain?

There are two main reasons to use the blockchain-like functionality. The first is separation of concerns: it abstracts the push/pull/merge functionality (see below) away from the contents of the data model itself, so the blockchain and synchronization code is completely independent of the data model code.

The second reason is data integrity -- the same reason git commits are hashes of their contents. Since the block id is a (cryptographic) hash including its contents and the previous block id, each block implicitly verifies all of the data in the history of the application container.
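As a rough illustration of that integrity property, the sketch below derives a block id from the block's payload and its parent ids. The field names, the JSON encoding, and the choice of SHA-256 are assumptions for the example, not the server's actual block format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    parents: Tuple[str, ...]   # one parent id normally, two for a merge
    payload: str               # encoded CRDT delta, opaque to the chain code

    @property
    def block_id(self) -> str:
        # Hash the parents together with the payload, so the id commits
        # to the whole history leading up to this block.
        body = json.dumps(
            {"parents": list(self.parents), "payload": self.payload},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def verify_chain(chain: List[Block]) -> bool:
    """Check that each block names the id of the block before it."""
    for previous, current in zip(chain, chain[1:]):
        if previous.block_id not in current.parents:
            return False
    return True


genesis = Block((), "create container")
second = Block((genesis.block_id,), "add item 1")
third = Block((second.block_id,), "update item 1")
chain_ok = verify_chain([genesis, second, third])

# Editing an early block changes its id, so the next block's link breaks.
tampered = Block((), "create container (edited)")
chain_tampered = verify_chain([tampered, second, third])
```

Here `chain_ok` is true and `chain_tampered` is false: changing any historical block invalidates every block that follows it.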

This also provides a solid mechanism for the two primary use cases in data synchronization: the initial content download is just fetching the whole blockchain, and synchronizing updates is just sharing new/recent blocks. For many multi-user applications, it's hard to imagine a chain of more than a few thousand blocks, which should download and process in under a second with any modern connection/device (but see Snapshotting below).

The beauty of this system is that the server doesn't even have to know what the contents of the blocks are to perform client synchronization. This opens up the potential to encrypt blocks, with keys that the clients know but the server doesn't -- potentially a very nice feature for security-conscious clients. If required, the server can of course "look inside" blocks to support application-specific features that require server-side processing.

Blockchain Detail

In the application server context, the extra authority of the server allows for a usage pattern that is very similar to how almost all of us use `git`. Think of the backend like 'origin', and every client has its own checkout. Every time a client makes a change, it makes a new block -- the same thing as making a commit. Then it attempts a "push" -- and, just like in git, if the backend determines that the push would not create a fork (branch), the push is allowed and the block is then available on the backend (origin). Other clients are notified, and then they perform a "pull" to get the latest changes. The CRDT deltas that are in the blocks are like the "diffs" in a commit -- when the client receives them, they apply the diff/delta and the model is updated.

The only question is: what happens when you try to "push", but you don't have the latest? As in git, the server refuses to allow it; and, as in git, the client has to do a "pull" first, then a "merge" -- and only then will the server allow the push to go through. So, just like in git, the backend forces the clients to do merges locally. For many applications, the CRDT model allows for easy conflict-free merges (which is the "C" in CRDT). The important thing is that merges happen in a way that is consistent, providing all clients with the same state once they are all online and synced.
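The push/pull/merge exchange described above can be sketched as follows. This is a simplified model under stated assumptions: the `Server` class, its `push` and `pull` methods, and the hand-written block ids are hypothetical, and a real implementation would use hashed ids and a network transport.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Block:
    block_id: str              # a plain label here; a content hash in practice
    parents: Tuple[str, ...]
    delta: str


def is_ancestor(blocks: Dict[str, Block], ancestor: str, descendant: str) -> bool:
    """Walk parent links from descendant, looking for ancestor."""
    stack, seen = [descendant], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen or current not in blocks:
            continue
        seen.add(current)
        stack.extend(blocks[current].parents)
    return False


class Server:
    def __init__(self, genesis: Block) -> None:
        self.blocks: Dict[str, Block] = {genesis.block_id: genesis}
        self.head = genesis.block_id

    def push(self, new_blocks: List[Block]) -> bool:
        """Accept the blocks only if the new head builds on the current head."""
        if not new_blocks:
            return False
        candidate = dict(self.blocks)
        candidate.update({b.block_id: b for b in new_blocks})
        new_head = new_blocks[-1].block_id
        if not is_ancestor(candidate, self.head, new_head):
            return False  # push would fork: client must pull and merge first
        self.blocks, self.head = candidate, new_head
        return True

    def pull(self, known_ids: Set[str]) -> List[Block]:
        """Return the blocks the client does not have yet."""
        return [b for b in self.blocks.values() if b.block_id not in known_ids]


server = Server(Block("g", (), "init"))
a1 = Block("a1", ("g",), "client A edit")
b1 = Block("b1", ("g",), "client B offline edit")

first_push = server.push([a1])              # A is up to date: accepted
stale_push = server.push([b1])              # B is behind: refused
missing = server.pull({"g", "b1"})          # B fetches what it lacks
merge = Block("m", ("b1", "a1"), "merge")   # B merges locally
retry_push = server.push([b1, merge])       # now builds on the head: accepted
```

The server never merges anything itself. It only checks that the pushed head descends from its current head, which is what pushes the merge work onto the clients.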

The main feature that makes this like a blockchain is that every block (CRDT delta) refers explicitly to the one before it (in git terms, its parent) -- or in the case of a merge, the two blocks before it.

Finally, this is technically not a blockchain at all. For one thing, we allow forks and merges, which makes it a DAG instead of a chain -- so we refer to this as a "BALDAG", which stands for "Block Aggressively Linearized DAG". Remember you saw it here first. ; )

But it's also not a traditional blockchain in that we don't use a proof-of-work for consensus. Instead, we rely on our centralized server (the backend) to provide the consensus by authority (in the manner described above).

So we could call it something else but...for now it's easier to just call it blockchain.

Snapshotting

As containers grow, the number of historical blocks/CRDTs can get very large with a lot of use. This can lead to the "IBD problem" -- initial block download. This is a big problem in cryptocurrencies, with the IBD in Bitcoin reaching into the hundreds of gigabytes.

Of course, application containers will never grow that big, but it is not hard to imagine containers with many thousands of deltas, growing up to megabytes in size (for example, in a text editor). More importantly, the blockchain could easily grow to be many times the size of the "application state", that is, the data required for the client to present the UI to the user. In that case, we will want to support snapshotting -- that is, capturing the state of a container and communicating that to new sync clients instead of a complete chain of deltas.

This is already supported under the hood by the blockchain design, partly because each block self-validates by including a (cryptographic) hash of the model state in its block data. This means that every client that applies the block (including the backend) verifies that it ends up with the exact same container state as the client that made the block. In addition to adding data integrity protection, this also makes it easy to support snapshots in the future: snapshots can be saved on the server, keyed by the snapshot hash. The server can choose to store a snapshot when, say, the size of the full model state is less than half of the size of the blockchain of deltas. Or, it may make more sense to check for snapshotting when a user shares a container, that is, right before another client is likely to do a full chain download.
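A minimal sketch of the state-hash check and the size-based snapshot rule follows. The function names, the canonical-JSON hashing, and the dictionary-merge "delta" are assumptions chosen to keep the example short; only the "less than half the chain size" threshold comes from the text above.

```python
import hashlib
import json


def state_hash(state: dict) -> str:
    """Hash a canonical encoding so equal states always hash equally."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_block(state: dict, delta: dict, expected_hash: str) -> dict:
    """Apply a delta, then verify the result against the hash in the block."""
    new_state = {**state, **delta}   # stand-in for a real CRDT delta
    if state_hash(new_state) != expected_hash:
        raise ValueError("state hash mismatch: replica has diverged")
    return new_state


def should_snapshot(state_bytes: int, chain_bytes: int) -> bool:
    """Snapshot when the full state is under half the size of the chain."""
    return 2 * state_bytes < chain_bytes


# The authoring client records the hash of its resulting state in the block.
author_state = {"title": "Draft", "body": "hello"}
delta = {"title": "Final"}
expected = state_hash({**author_state, **delta})

# Another replica applies the same delta and checks that it matches.
replica_state = apply_block({"body": "hello", "title": "Draft"}, delta, expected)
```

With this in place, a stored snapshot could be looked up by its state hash, and a new client that loads it can confirm it matches the state the chain claims at that block.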

The CRDT Application Server is in private beta and is being used in both internal and commercial applications.


Contact: info@496code.com