
I have an application made up of multiple processes/workers/services which need to send messages to each other that represent units of "enqueued tasks" to be done. I am trying to find the best pattern to use for the type of communication these workers need to perform.

These are the requirements:

  1. Communication is asynchronous. One worker enqueues a task for another worker to do, and doesn't block for its completion or require a synchronous response.
  2. The workers do not share memory.
  3. The messages need to be persistent and durable. They should be stored when the application is shut down and resumed when the application starts back up.
  4. Messages should be "unique" in the queue. That is, if one worker needs another worker to perform a task, but that task is already present in the other worker's queue, then it shouldn't be enqueued again. This is to avoid generating a huge backlog of redundant tasks in the queues of slower workers which were generated by faster workers.

Points 1-3 just scream AMQP. But then point 4 is the problem, because I don't know of an AMQP implementation with a straightforward solution for enforcing uniqueness of messages (I know RabbitMQ doesn't).

I'm wondering if the fact that these message queues actually represent enqueued tasks to perform, and need to be unique, means that there is a more suitable pattern than MQs.

Jugdizh

2 Answers


There is this plugin for RabbitMQ:

https://github.com/noxdafox/rabbitmq-message-deduplication

You can see that it's fairly trivial to implement queue storage with an index. Just use a database to put your messages in, and generate whatever hash you want to signify uniqueness.
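
As a rough sketch of that DIY route (assuming PostgreSQL; the task_queue table, its columns, and the hash value are made up for illustration), a unique index on the hash is enough to reject a task that is already queued:

    -- Hypothetical DIY queue table; the unique index enforces
    -- "only one copy of this task while it is still queued".
    CREATE TABLE task_queue (
        id          bigserial   PRIMARY KEY,
        task_hash   text        NOT NULL,   -- hash of the task payload, defined by you
        payload     jsonb       NOT NULL,
        enqueued_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE UNIQUE INDEX task_queue_unique_task ON task_queue (task_hash);

    -- Producers can enqueue without worrying about races;
    -- a duplicate of a still-queued task is silently dropped.
    INSERT INTO task_queue (task_hash, payload)
    VALUES ('abc123', '{"type": "send_invite", "user_id": 42}')
    ON CONFLICT (task_hash) DO NOTHING;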

However, it's not in the spirit of the pattern. What do you do when a duplicate is sent just after its twin has been processed? They won't both be in any queue at the same time, but you will have processed the same job twice.

I would perhaps go for a cached-result approach. Just bung all your jobs in the queue and don't worry about duplication. When you process a job, calculate the hash and check the cache to see if you already know the result. If so, you can either return that or simply drop the job as already done.

Now you can control your cache invalidation as required, either by time or memory or whatever your limits are.
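
A minimal sketch of that idea, assuming the cache is just a job_results table in a PostgreSQL database (it could equally be Redis or an in-memory store; the table and column names are made up):

    -- Hypothetical results cache, keyed by the job hash.
    CREATE TABLE job_results (
        job_hash     text        PRIMARY KEY,
        result       jsonb,
        completed_at timestamptz NOT NULL DEFAULT now()
    );

    -- Worker, before doing the work: has this job already been done?
    SELECT result FROM job_results WHERE job_hash = 'abc123';
    -- Row found -> return the cached result or drop the job.
    -- No row    -> do the work, then record it:
    INSERT INTO job_results (job_hash, result)
    VALUES ('abc123', '{"status": "sent"}')
    ON CONFLICT (job_hash) DO NOTHING;

    -- Invalidation by time, as suggested above:
    DELETE FROM job_results WHERE completed_at < now() - interval '7 days';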

Ewan

A database is a great place to start, because:

  • You already have that infrastructure (presumably)
  • It is very flexible
  • Performance is very good and can be optimised as you scale

Here is how you could accomplish your needs with a database (let's say PostgreSQL, but it could also be MongoDB and others); a rough SQL sketch follows the list:

  • Service A: Writes to the [InviteEmails] table
  • All tables, including [InviteEmails], have a trigger which calls NOTIFY with the table name as the channel name
  • Service B: Has called LISTEN InviteEmails, and is signalled
  • Service B: Runs a self-defined VIEW called AllEmails, which includes InviteEmails and others via UNION
  • Service B: Uses SMTP to send an email for each record-task, then updates the InviteEmails record to indicate the task is complete
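
A rough PostgreSQL sketch of the steps above. Only InviteEmails and AllEmails come from the description; the column names, the notify_table_change() function, and the second ReminderEmails table are assumptions added to show the UNION view:

    -- Hypothetical queue tables written to by Service A.
    CREATE TABLE InviteEmails (
        id        bigserial PRIMARY KEY,
        recipient text      NOT NULL,
        completed boolean   NOT NULL DEFAULT false
    );
    CREATE TABLE ReminderEmails (
        id        bigserial PRIMARY KEY,
        recipient text      NOT NULL,
        completed boolean   NOT NULL DEFAULT false
    );

    -- Generic trigger function: NOTIFY on a channel named after the table.
    CREATE FUNCTION notify_table_change() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify(TG_TABLE_NAME, NEW.id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER inviteemails_notify
        AFTER INSERT ON InviteEmails
        FOR EACH ROW EXECUTE FUNCTION notify_table_change();

    -- The self-defined view Service B reads from.
    CREATE VIEW AllEmails AS
        SELECT id, recipient, completed, 'invite'   AS kind FROM InviteEmails
        UNION ALL
        SELECT id, recipient, completed, 'reminder' AS kind FROM ReminderEmails;

    -- Service B's connection subscribes and is woken on inserts
    -- (unquoted identifiers fold to lowercase, so the channel is "inviteemails").
    LISTEN inviteemails;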

To handle your uniqueness requirement:

Service A MAY avoid creating duplicate [InviteEmails] records, simply by using EXISTS SQL logic.
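
For example, a sketch with the assumed columns from above (recipient, completed), where "duplicate" is taken to mean an incomplete row for the same recipient:

    -- Enqueue an invite only if an incomplete one isn't already queued.
    -- (A partial unique index plus ON CONFLICT would make this race-free.)
    INSERT INTO InviteEmails (recipient)
    SELECT 'alice@example.com'
    WHERE NOT EXISTS (
        SELECT 1 FROM InviteEmails
        WHERE recipient = 'alice@example.com'
          AND completed = false
    );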

Service B MAY ignore duplicates in InviteEmails using EXISTS SQL logic (for similar records that are already complete), as well as a min(ID) selector.
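
A sketch of what Service B's selection could look like, again assuming the recipient and completed columns and treating rows for the same recipient as "similar records":

    -- Pick only the earliest pending row per recipient, skipping any
    -- recipient that already has a completed record.
    SELECT ie.*
    FROM InviteEmails ie
    WHERE ie.completed = false
      AND ie.id = (SELECT min(id)
                   FROM InviteEmails
                   WHERE recipient = ie.recipient
                     AND completed = false)
      AND NOT EXISTS (SELECT 1
                      FROM InviteEmails done
                      WHERE done.recipient = ie.recipient
                        AND done.completed = true);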


To confirm compliance with your requirements

Communication is asynchronous. One worker enqueues a task for another worker to do, and doesn't block for its completion or require a synchronous response.

One worker writes to the database, another reads. It's asynchronous.

The workers do not share memory.

The workers share only a logical database, not memory. Each table (queue) could conceivably live on its own database server (at an extreme) - still no shared memory.

The messages need to be persistent and durable. They should be stored when the application is shut down and resumed when the application starts back up.

Databases, even NoSQL and NewSQL ones, are durable. SQL and NewSQL databases also offer transactions and ACID compliance.

Messages should be "unique" in the queue. That is, if one worker needs another worker to perform a task, but that task is already present in the other worker's queue, then it shouldn't be enqueued again. This is to avoid generating a huge backlog of redundant tasks in the queues of slower workers which were generated by faster workers.

see "To handle your uniqueness requirement" above

Kind Contributor