29

Assuming a system where there's a Web Application with a resource, and a reference to a remote application with another similar resource, how do you represent a bi-directional sync action which synchronizes the 'local' resource with the 'remote' resource?

Example:

I have an API that represents a todo list.

GET/POST/PUT/DELETE /todos/, etc.

That API can reference remote TODO services.

GET/POST/PUT/DELETE /todo_services/, etc.

I can manipulate todos from the remote service through my API as a proxy via

GET/POST/PUT/DELETE /todo_services/abc123/, etc.

I want the ability to do a bi-directional sync between a local set of todos and the remote set of todos.

In an RPC sort of way, one could do

POST /todo_services/abc123/sync/

But, following the "verbs are bad" idea, is there a better way to represent this action?

Ben
  • 103

5 Answers

20

Where and what are the resources?

REST is all about addressing resources in a stateless, discoverable manner. It does not have to be implemented over HTTP, nor does it have to rely on JSON or XML, although it is strongly recommended that a hypermedia data format is used (see the HATEOAS principle) since links and ids are desirable.

So, the question becomes: How does one think about synchronization in terms of resources?

What is bi-directional sync?

Bi-directional sync is the process of updating the resources present on a graph of nodes so that, at the end of the process, all nodes have updated their resources in accordance with the rules governing those resources. Typically, this means that all nodes end up with the latest version of each resource present within the graph. In the simplest case the graph consists of two nodes: local and remote. Local initiates the sync.

So the key resource that needs to be addressed is a transaction log and, therefore, a sync process might look like this for the "items" collection under HTTP:

Step 1 - Local retrieves the transaction log

Local: GET /remotehost/items/transactions?earliest=2000-01-01T12:34:56.789Z

Remote: 200 OK with a body containing the transaction log, with fields similar to these:

  • itemId - a UUID to provide a shared primary key

  • updatedAt - timestamp to provide a co-ordinated point when the data was last updated (assuming that a revision history is not required)

  • fingerprint - a SHA-1 hash of the contents of the data for rapid comparison if updatedAt is a few seconds out

  • itemURI - a full URI to the item to allow retrieval later
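As a rough illustration of the fields listed above, a log entry could be modelled like this in Python. This is only a sketch: the `LogEntry` class, the `fingerprint` and `make_entry` helpers, the JSON canonicalization and the `/items/<id>` URI layout are all assumptions of mine, not part of any particular API.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    """One row of the transaction log (hypothetical shape)."""
    item_id: str          # shared primary key, e.g. a UUID string
    updated_at: datetime  # when the item was last updated
    fingerprint: str      # SHA-1 hex digest of the item contents
    item_uri: str         # full URI used to retrieve the item later


def fingerprint(content: dict) -> str:
    # Serialize with sorted keys so equal contents always hash the same.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def make_entry(item_id: str, content: dict, updated_at: datetime, base_uri: str) -> LogEntry:
    # The "/items/<id>" layout is an assumption made for this example.
    return LogEntry(item_id, updated_at, fingerprint(content), f"{base_uri}/items/{item_id}")
```

Hashing a canonical serialization means two nodes holding the same data compute the same fingerprint regardless of key order.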

Step 2 - Local compares the remote transaction log with its own

This is the application of the business rules of how to sync. Typically, the itemId is used to identify the local resource, and then the fingerprints are compared. If there is a difference then a comparison of updatedAt is made. If these are too close to call then a decision will need to be made to pull from the other node (perhaps it is more important), or to push to the other node (this node is more important). If the remote resource is not present locally then a push entry is made (this contains the actual data for insert/update). Any local resources not present in the remote transaction log are assumed to be unchanged.

The pull requests are made against the remote node using the itemURI so that the data exists locally. They are not applied locally until later.
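The comparison in Step 2 could be sketched as follows for items that appear in both logs. The `Entry` tuple, the `plan_sync` name, the two-second tolerance and the `prefer_remote` tie-break flag are assumptions for illustration. Items present in only one log are deliberately left out, since how to treat them is a business rule.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Entry(NamedTuple):
    item_id: str
    updated_at: datetime
    fingerprint: str


def plan_sync(local, remote, tolerance=timedelta(seconds=2), prefer_remote=True):
    """Return (pull_ids, push_ids) for items present in both logs.

    `local` and `remote` are dicts mapping item_id to Entry.
    """
    pulls, pushes = [], []
    for item_id in sorted(local.keys() & remote.keys()):
        mine, theirs = local[item_id], remote[item_id]
        if mine.fingerprint == theirs.fingerprint:
            continue  # identical contents, nothing to do
        gap = theirs.updated_at - mine.updated_at
        if abs(gap) <= tolerance:
            remote_wins = prefer_remote  # too close to call: use the tie-break rule
        else:
            remote_wins = gap > timedelta(0)  # the newer side wins
        (pulls if remote_wins else pushes).append(item_id)
    return pulls, pushes
```

Passing `prefer_remote=False` expresses the "this node is more important" choice for the too-close-to-call case.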

Step 3 - Push local sync transaction log to remote

Local: PUT /remotehost/items/transactions with body containing the local sync transaction log.

The remote node might process this synchronously (if it's small and quick) or asynchronously (think 202 ACCEPTED) if it's likely to incur a lot of overhead. Assuming a synchronous operation, then the outcome will be either 200 OK or 409 CONFLICT depending on the success or failure. In the case of a 409 CONFLICT, then the process has to be started again since there has been an optimistic locking failure at the remote node (someone changed the data during the sync). The remote updates are processed under their own application transaction.
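The status handling in Step 3 might look roughly like this. To keep the sketch independent of any HTTP library, `put_log` is a hypothetical callable standing in for the PUT request and returning the status code, and `build_log` stands in for repeating Steps 1 and 2. The `max_attempts` limit is also my own addition.

```python
from typing import Callable


def push_with_retry(
    build_log: Callable[[], list],
    put_log: Callable[[list], int],
    max_attempts: int = 3,
) -> str:
    """Push the local sync log, restarting the process on 409 CONFLICT."""
    for _ in range(max_attempts):
        # Rebuild the log on every attempt: a 409 means the remote data
        # changed during the sync, so the earlier comparison is stale.
        status = put_log(build_log())
        if status == 200:
            return "applied"
        if status == 202:
            return "accepted"  # remote will process it asynchronously
        if status != 409:
            raise RuntimeError(f"unexpected status {status}")
    return "gave_up"
```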

Step 4 - Update locally

The data pulled in Step 2 is applied locally under an application transaction.

While the above is not perfect (there are several situations where local and remote may get into trouble and having remote pull data from local is probably more efficient than stuffing it into a big PUT) it does demonstrate how REST can be used during a bi-directional synchronization process.

Gary
  • 24,440
7

I would consider a synchronization operation as a resource that can be accessed (GET) or created (POST). With that in mind, the API URL could be:

/todo_services/abc123/synchronization

(Calling it "synchronization", not "sync" to make it clear it's not a verb)

Then do:

POST /todo_services/abc123/synchronization

To initiate a synchronization. Since a synchronization operation is a resource, this call could potentially return an ID that can then be used to check the status of the operation:

GET /todo_services/abc123/synchronization?id=12345
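A minimal in-memory sketch of that idea, with no web framework involved. The `SynchronizationStore` class, the "pending"/"done" status values and the choice of 202 as the response to the POST are assumptions for illustration only.

```python
import itertools


class SynchronizationStore:
    """Treats each synchronization as a resource with its own id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def post(self, service_id):
        # Creating the resource starts the synchronization.
        job_id = str(next(self._ids))
        self._jobs[job_id] = {"id": job_id, "service": service_id, "status": "pending"}
        location = f"/todo_services/{service_id}/synchronization?id={job_id}"
        return 202, location

    def get(self, job_id):
        # Reading the resource reports the status of the operation.
        job = self._jobs.get(job_id)
        return (200, dict(job)) if job else (404, None)

    def mark_done(self, job_id):
        self._jobs[job_id]["status"] = "done"
```

The client POSTs once, keeps the returned URL, and polls it with GET until the status changes.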
laurent
  • 715
6

This is a hard problem. I do not believe REST is an appropriate level to implement sync. A robust sync would essentially need to be a distributed transaction. REST is not the tool for that job.

(Assumption: by "sync" you are implying that either resource can change independently of the other at any time, and you want the ability to realign them without losing updates.)

You may want to consider making one the "master" and the other the "slave" so that you can confidently clobber the slave periodically with data from the master.

You may also wish to consider the Microsoft Sync Framework if you absolutely need to support independently changing data stores. This would not work through REST, but behind the scenes.

2

Apache CouchDB is a database which is based on REST, HTTP, and JSON. Developers perform basic CRUD operations over HTTP. It also provides a replication mechanism which is peer-to-peer using only HTTP methods.

To provide this replication, CouchDB needs to have some CouchDB-specific conventions. None of these are opposed to REST. It provides each document (that is, a REST resource within a database) with a revision number. This is part of the JSON representation of that document, but is also in the ETag HTTP header. Each database also has a sequence number which allows for tracking changes to the database as a whole.

For conflict resolution, they simply note that a document is conflicted and retain the conflicted versions, leaving it to the developers using the database to provide a conflict resolution algorithm.

You can either use CouchDB as your REST API, which will give you synchronization out of the box, or take a look at how it provides replication to provide a starting point for making your own algorithm.

David V
  • 563
-1

You can solve the "verbs are bad" problem with a simple renaming - use "updates" instead of "sync".

The sync process is actually sending a list of local updates made since the last sync, and receiving a list of updates made on the server in that same time.
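A very small sketch of that exchange, assuming each update is a dict with a numeric "ts" field and that both sides remember the time of the last sync. The function name and the data shape are hypothetical.

```python
def exchange_updates(local_updates, server_updates, last_sync):
    """Return (to_send, to_receive): updates on each side newer than last_sync."""
    to_send = [u for u in local_updates if u["ts"] > last_sync]
    to_receive = [u for u in server_updates if u["ts"] > last_sync]
    return to_send, to_receive
```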

Tom Clarkson
  • 1,332