How to check robustness in a service that includes multiple points of failure in workflow, including FTP

Question

As part of my workflow, I need to perform all of these steps in one transaction:

- I need to FTP files to 2 different FTP servers.
- A spreadsheet is also generated, and it needs to be FTP'ed as well. Can it be streamed straight to the FTP server, instead of being downloaded first and then pushed?

I am using Ruby Net::SFTP and Net::FTP libraries to send the files.

I would like it to be robust. I am not sure if I need to do anything else or if this is good enough.

Just to be clear, this is already working in production, I am not stuck, just looking to exchange design/architecture ideas on how to improve this.

score 3 · Accepted Answer · answered Mar 13 '14 at 18:25

Ahh, with FTP, the simple answer is - you don't.

What you can do is to retrieve the file you sent to the FTP server, and check it is the same as the original file. If they match, all worked as well as you'd hoped.
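A minimal sketch of that round-trip check in Ruby, assuming you are allowed to read back what you wrote. The method name `upload_and_verify` and its parameters are made up for illustration; the only calls made on the connection are `putbinaryfile` and `getbinaryfile`, which `Net::FTP` provides.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: uploads local_path, downloads it again into a
# temporary file, and compares SHA-256 digests of the two copies.
# `ftp` is any object responding to putbinaryfile/getbinaryfile,
# for example an open Net::FTP connection.
def upload_and_verify(ftp, local_path, remote_name)
  ftp.putbinaryfile(local_path, remote_name)

  Tempfile.create("ftp-verify") do |tmp|
    tmp.close # only the path is needed; the download writes to it
    ftp.getbinaryfile(remote_name, tmp.path)

    original  = Digest::SHA256.file(local_path).hexdigest
    retrieved = Digest::SHA256.file(tmp.path).hexdigest
    original == retrieved
  end
end
```

The method returns `true` when the retrieved copy matches and `false` otherwise, so the caller can decide whether to resend or raise an alert. Note that this doubles the transfer volume, which may matter for large files.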

This gets tricky if you're not allowed to read from the FTP server. Some credit card acquirers, for example, set up security so that you're allowed to write to some directories but not read from them. In these cases, the server tends to have a service that generates a summary report of your upload, which you can retrieve from a read-only directory. With acquirers that do not do this, you just have to cross your fingers and wait for them to complain, usually on the following day.

score 1 · Answer 2 · answered Mar 13 '14 at 18:10

It depends on what you mean by "robust". Consider scenarios like:

PUT fails on remote host
PUT succeeds but file is corrupted somehow
Host is temporarily unavailable
Host is permanently unavailable (e.g. wrong config, or host is decommissioned)
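One way to code around the difference between the temporary and permanent cases above is to retry only errors that look transient, with a growing delay, and let everything else fail immediately. This is a rough sketch, not a delivery guarantee: `with_retries`, `TRANSIENT_ERRORS`, and the `sleeper` parameter are names invented for this example, and which error classes count as transient is an assumption you would need to adjust (for `Net::FTP`, `Net::FTPTempError` is a likely candidate to add).

```ruby
require "timeout"

# Assumption: these stdlib errors usually indicate a temporary problem.
TRANSIENT_ERRORS = [
  Errno::ECONNREFUSED,
  Errno::ECONNRESET,
  Errno::ETIMEDOUT,
  Timeout::Error
].freeze

# Hypothetical helper: runs the block, retrying on transient errors with
# exponential backoff (base_delay, 2 * base_delay, 4 * base_delay, ...).
# Any other error, or a transient one on the final attempt, is re-raised.
# `sleeper` is injectable so the delay can be stubbed out in tests.
def with_retries(attempts: 3, base_delay: 1.0, retry_on: TRANSIENT_ERRORS,
                 sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *retry_on
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

The block would wrap a single upload, e.g. `with_retries { upload_file }`, where `upload_file` stands in for whatever method performs the transfer. A corrupted-but-successful PUT is not caught by this; it needs a separate content check.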

You can code around this (or buy software that provides guarantees around delivery), but if you're in control of the whole workflow, it may be better to have clients "pull" the files rather than having the publishers "push" them. The system I work on creates and sends a huge number of files, and we have no idea if the remote clients even need some of them. A "pull" model at least tells you that a system somewhere is actively trying to fetch a given file.

You could, for example, build a service that accepts file requests and builds the files if they are not available, or else uses an existing / cached version, perhaps from the filesystem. A REST API would be a natural fit for this, e.g.

http://server.example.com/myservice/mydata/file12345.csv

...would build file12345.csv on-demand and cache it for subsequent requests (if you needed that).
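The build-or-reuse step behind such an endpoint could look roughly like the following. It is only a sketch of the caching logic, with no HTTP layer: `fetch_or_build`, the cache directory, and the builder block are all hypothetical, and a filesystem cache is assumed.

```ruby
require "fileutils"

# Hypothetical helper: returns the contents of `name` from cache_dir,
# calling the given block to build the file first if it is not cached.
def fetch_or_build(cache_dir, name)
  # Reject anything that is not a plain file name (no slashes, no leading dot).
  raise ArgumentError, "invalid file name" unless name.match?(/\A[\w-][\w.-]*\z/)

  path = File.join(cache_dir, name)
  unless File.exist?(path)
    FileUtils.mkdir_p(cache_dir)
    # Write to a temporary name and rename, so readers never see a partial file.
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, yield(name))
    File.rename(tmp, path)
  end
  File.binread(path)
end
```

A request handler would call something like `fetch_or_build("/var/cache/myservice", "file12345.csv") { |n| build_csv(n) }`, where the directory and `build_csv` are placeholders. Two simultaneous first requests may both build the file; the rename keeps the result consistent, but the build itself would need locking if it is expensive.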
