
Tiled images

Tiled images are large images that have been split into smaller square tiles. There are several tiled image formats, each organizing the tile files differently. A tiled image on the web can only be downloaded by finding all the individual tile URLs, downloading the tiles, and stitching them back together. This is what a tiled image downloader does.
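As a rough illustration of the stitching step, here is a minimal Python sketch. It assumes each tile is already decoded into a small grid of pixel values and keyed by its (x, y) grid position, and that all tiles have the same size except possibly those on the right and bottom edges. The function name `stitch` and the `tile_size` parameter are illustrative only, not taken from any of the tools discussed here.

```python
from typing import Dict, List, Tuple

# Illustrative stand-in for a decoded tile: rows of pixel values.
Tile = List[List[int]]

def stitch(tiles: Dict[Tuple[int, int], Tile], tile_size: int) -> List[List[int]]:
    """Paste every tile at (x * tile_size, y * tile_size) on one canvas."""
    # Edge tiles may be cropped, so size the canvas from the real tile extents.
    width = max(x * tile_size + len(t[0]) for (x, _), t in tiles.items())
    height = max(y * tile_size + len(t) for (_, y), t in tiles.items())
    canvas = [[0] * width for _ in range(height)]
    for (x, y), tile in tiles.items():
        for dy, row in enumerate(tile):
            for dx, pixel in enumerate(row):
                canvas[y * tile_size + dy][x * tile_size + dx] = pixel
    return canvas
```

Sizing the canvas from the actual tile extents, rather than from `columns * tile_size`, avoids padding the result when the edge tiles are smaller.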

[Image: zoomable image]

Building a tiled image downloader

I maintain several tiled image downloaders (mainly dezoomify and dezoomify-rs). Their implementation is quite simple; the only software architecture challenges are to:

  1. Make as much of the software as possible testable without having to actually make network requests.
  2. Support many different tiled image formats without duplicating code.

I call dezoomers the modules of the software that are specific to each tiled image format. The goal is to keep the dezoomers as small as possible and to have the core of the software do most of the work.
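One way to keep dezoomers small is to give them a single narrow job: turn an input URL into (tile URL, position) pairs. The Python sketch below assumes such an interface. `Dezoomer`, `GridDezoomer`, its made-up `/grid/<cols>x<rows>` URL scheme, and `pick_dezoomer` are all hypothetical; they only illustrate the split between format-specific code and the shared core.

```python
from typing import Iterable, List, Protocol, Tuple

# Hypothetical contract: a dezoomer only maps an input URL to tiles.
TileRef = Tuple[str, Tuple[int, int]]  # (tile URL, (x, y) grid position)

class Dezoomer(Protocol):
    def accepts(self, url: str) -> bool: ...
    def tiles(self, url: str) -> Iterable[TileRef]: ...

class GridDezoomer:
    """Toy format: '<base>/grid/<cols>x<rows>' -> tiles at '<base>/<x>_<y>.jpg'."""

    def accepts(self, url: str) -> bool:
        return "/grid/" in url

    def tiles(self, url: str) -> Iterable[TileRef]:
        base, size = url.split("/grid/")
        cols, rows = (int(n) for n in size.split("x"))
        for y in range(rows):
            for x in range(cols):
                yield f"{base}/{x}_{y}.jpg", (x, y)

def pick_dezoomer(url: str, dezoomers: List[Dezoomer]) -> Dezoomer:
    """Core-side dispatch: the first dezoomer that recognizes the URL wins."""
    for dezoomer in dezoomers:
        if dezoomer.accepts(url):
            return dezoomer
    raise ValueError(f"no dezoomer accepts {url}")
```

Everything else (downloading, retries, stitching) would then live in the core, which only ever sees `TileRef` values.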

Implementing template-only tiled image download

Currently, in dezoomify-rs, the dezoomers provide a list of tile URLs and associated positions, and then the core downloads and stitches the tiles in parallel. That works for most image formats, but not for template-only downloads.
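For comparison, the list-based design described in the previous paragraph can be sketched roughly as follows, in Python and with a hypothetical `fetch(url)` function, injected by the caller, that returns the tile bytes or `None` for a 404. This shape only works because the complete tile list exists before the first request is sent, which is exactly what a template-only download cannot provide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

TileRef = Tuple[str, Tuple[int, int]]
Fetch = Callable[[str], Optional[bytes]]  # assumption: None means "HTTP 404"

def download_all(tiles: List[TileRef], fetch: Fetch,
                 workers: int = 4) -> Dict[Tuple[int, int], bytes]:
    """Download a fully known tile list in parallel; returns position -> bytes."""
    urls = [url for url, _ in tiles]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with `tiles`.
        results = list(pool.map(fetch, urls))
    # Tiles that answered 404 are simply left out.
    return {pos: data for (_, pos), data in zip(tiles, results) if data is not None}
```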

Template-only download is a feature that would allow the software to take as its only input a URL template of the form http://test.com/{x},{y}.jpg, and to build the image by replacing {x} and {y} with tile coordinates. The challenge here is that the bounds for x and y are not known in advance and can be computed only by requesting tiles and looking at the server's response: as soon as the server returns a 404, we know that we have reached the maximum value for x and proceed to the next line, until the server returns a 404 even for x=0, at which point we know we have reached the maximum value for y.

Here is a simple diagram I made of the state machine that could be implemented for template-only downloads:

[Figure: state machine for template-only downloads]
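The state machine can be written out sequentially, one request at a time, as in the following Python sketch. It assumes a hypothetical `fetch(url)` that returns the tile bytes or `None` on a 404, uses 0-based coordinates as in the description above, and treats a 404 in the middle of a later row as a missing tile rather than as the end of the image. Because every request depends on the previous response, this version cannot download in parallel, which is why efficiency is the hard part.

```python
from enum import Enum, auto
from typing import Callable, Dict, Optional, Tuple

Fetch = Callable[[str], Optional[bytes]]  # assumption: None means "HTTP 404"

class State(Enum):
    FIRST_ROW = auto()  # scanning y = 0 to discover the width
    NEXT_ROWS = auto()  # width known, scanning y = 1, 2, ...
    DONE = auto()

def probe_template(template: str, fetch: Fetch) -> Dict[Tuple[int, int], bytes]:
    """Sequentially discover the grid behind e.g. 'http://test.com/{x},{y}.jpg'."""
    tiles: Dict[Tuple[int, int], bytes] = {}
    state, x, y, width = State.FIRST_ROW, 0, 0, 0
    while state is not State.DONE:
        data = fetch(template.format(x=x, y=y))
        if state is State.FIRST_ROW:
            if data is None:  # first 404 on row 0: x is the width
                width, x, y = x, 0, 1
                state = State.NEXT_ROWS if width > 0 else State.DONE
            else:
                tiles[(x, y)] = data
                x += 1
        else:  # State.NEXT_ROWS
            if data is None and x == 0:  # 404 even for x = 0: no more rows
                state = State.DONE
                continue
            if data is not None:  # a 404 mid-row is treated as a missing tile
                tiles[(x, y)] = data
            x += 1
            if x == width:  # row complete, go to the next one
                x, y = 0, y + 1
    return tiles
```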

The question is: how should the software be architected so that it:

  1. Allows both template-only downloads and traditional dezoomers, for which all the tile URLs are known in advance.
  2. Is efficient (always downloads as many tiles in parallel as possible).
  3. Is testable.
  4. Avoids code duplication.
lovasoa

1 Answer


Here is a sketch of an algorithm which should do the trick; functional tools to the rescue:

  • Implement your core downloader not in terms of a list of URLs, but in terms of a stream (or generator) of URLs. (I have not done this in JavaScript or Rust myself, only in other languages, but from what I found on the web, both languages support these concepts.)

  • The downloader keeps the parallel download queue filled with up to N elements, where N is the maximum number of concurrent downloads allowed. Note that such an implementation should work on a finite stream as well as on an "infinite" stream (based on a lazy generator function). A 404 should then act as an additional stopping criterion.

  • The downloaded images need to be returned through some asynchronous event (or callback function) that passes them to the "stitcher". The same mechanism could also be used to inform the caller about failed downloads.
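A minimal Python sketch of such a core downloader, under a few assumptions: `fetch(url)` is a hypothetical, injectable function returning the tile bytes or `None` for a 404; `on_tile` plays the role of the callback that hands tiles to the stitcher; and the downloader runs on a thread pool rather than an async runtime. It pulls (URL, position) pairs lazily from any iterable, keeps at most `max_in_flight` requests running, stops pulling new work after the first 404, lets the requests already in flight finish, and returns the positions that answered 404.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Pos = Tuple[int, int]
Fetch = Callable[[str], Optional[bytes]]  # assumption: None means "HTTP 404"

def download_stream(tiles: Iterable[Tuple[str, Pos]], fetch: Fetch,
                    on_tile: Callable[[Pos, bytes], None],
                    max_in_flight: int = 4) -> List[Pos]:
    """Download lazily produced tiles with bounded parallelism.

    Works for finite lists and endless generators alike: pulling stops when
    the source is exhausted or once any request has answered 404.
    """
    source = iter(tiles)
    missing: List[Pos] = []
    exhausted = False
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        pending: Dict[Future, Pos] = {}
        while True:
            # Refill the queue unless the stream ended or a 404 was seen.
            while not exhausted and not missing and len(pending) < max_in_flight:
                try:
                    url, pos = next(source)
                except StopIteration:
                    exhausted = True
                    break
                pending[pool.submit(fetch, url)] = pos
            if not pending:
                break
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                pos = pending.pop(future)
                data = future.result()
                if data is None:
                    missing.append(pos)  # reported back to the caller
                else:
                    on_tile(pos, data)  # e.g. hand the tile to the stitcher
    return missing
```

A traditional dezoomer passes a finite list, which simply runs until it is exhausted; an endless generator runs until the server answers 404. At most `max_in_flight` requests go past the last existing tile, and the smallest position in the returned list marks where the stream ran out.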

Now you can reuse this for your traditional downloader as well as for your template-only downloads. The first case should be obvious. The second one works exactly in the two-step way you have sketched above, with the difference that the two steps now describe the order in which URLs are generated for the stream.

  • First, you start with an "infinite" stream of tiles (1,1), (2,1), (3,1), ... to find the limit for x and download the first row.

  • Then you provide the URLs for (1,2), (2,2), ..., (MaxX,2), (1,3), (2,3), ..., also as an "infinite" stream, until the download stops.
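The two phases can be expressed as two lazy generators, sketched here in Python. They use 0-based coordinates, as in the question's description, rather than the 1-based ones in the list above, and a Python format string such as `http://test.com/{x},{y}.jpg` as the template; `first_row` and `remaining_rows` are illustrative names. After phase 1, the width is the smallest x that answered 404, and that number parameterizes phase 2.

```python
from itertools import count
from typing import Iterator, Tuple

Pos = Tuple[int, int]

def first_row(template: str) -> Iterator[Tuple[str, Pos]]:
    """Phase 1: (0,0), (1,0), (2,0), ... with no upper bound on x."""
    for x in count():
        yield template.format(x=x, y=0), (x, 0)

def remaining_rows(template: str, width: int) -> Iterator[Tuple[str, Pos]]:
    """Phase 2: rows y = 1, 2, ... of a known width, with no upper bound on y.

    Simplification: the consumer stops at any 404, so a tile missing in the
    middle of a row would also end the download.
    """
    if width == 0:  # empty first row: nothing to generate (avoids an endless loop)
        return
    for y in count(1):
        for x in range(width):
            yield template.format(x=x, y=y), (x, y)
```

Both generators are endless on purpose; it is the downloader's 404 stopping criterion, not the generator, that decides when each phase ends.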

It should be obvious that this fulfills your requirements 1, 2 and 4. Requirement 3 is largely orthogonal to the algorithm: you get the kind of testability you asked for (working without network requests) by making the download function replaceable with a "mock" download function.
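A minimal Python sketch of that substitution, assuming the core only ever talks to the network through a `fetch(url)` callable that returns bytes, or `None` for a 404 (a convention chosen for this example). `http_fetch` uses the standard library's `urllib.request`; `fake_fetch` is a hypothetical test double that serves tiles from a dictionary and records every requested URL, so a test can also check how many requests were made.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional

Fetch = Callable[[str], Optional[bytes]]  # assumption: None means "HTTP 404"

def http_fetch(url: str) -> Optional[bytes]:
    """Production fetch: a real network request, with 404 mapped to None."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are real failures

def fake_fetch(pages: Dict[str, bytes], log: List[str]) -> Fetch:
    """Test double: serves tiles from a dict and logs every requested URL."""
    def fetch(url: str) -> Optional[bytes]:
        log.append(url)  # list.append is atomic in CPython, so worker threads may call this
        return pages.get(url)
    return fetch
```

In production the core receives `http_fetch`; in tests it receives `fake_fetch(...)`, and no network access is needed.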

Hope this was clear enough; if not, don't hesitate to ask.

Doc Brown