
We have these things called "executions" that are run at certain times. Each has a delay property that basically says "execute me after delay microseconds". Currently we are using Cassandra as a queue, but this is definitely not scalable, and a database is not the best tool for a queue.

I could implement this as a priority queue, but since we expect to receive a large number of messages, I cannot keep it all in memory. I was thinking of using Apache ActiveMQ, which supports priority. My other option is to implement a priority queue that is written to disk (using MappedByteBuffer). The second option is more work, but it addresses our use case more directly. I don't mind using ActiveMQ, and it would almost certainly be easier, but I'm wondering whether it is worth bringing in an entire framework when I'm only going to use a small part of it.

Some more context:

We have these scripts that pull in data from different sources. The script returns the data, along with an "execution" object that says "run me again after delay microseconds" (basically polling for new data). Currently this execution is written to Cassandra, and a thread periodically polls and picks these up and then calls the appropriate script.

As I read more about message queues, I'm realizing that this may not even work. ActiveMQ has a fixed set of priority levels (JMS defines 0 through 9), whereas I want a priority queue that supports arbitrary priority values.

I guess it's more or less a scheduler. Each execution is "scheduled" to be run after delay microseconds. Would it be more appropriate to use a scheduling framework here? Is there one that supports priorities?

2 Answers

1

Think about what you really have to keep in memory in order to execute the jobs; it may be as simple as just a job ID, next-execution-time, and periodic-repeat-time. In the simplest (and possibly oversimplified) case, everything can stay in the database and the execution 'engine' only has to know the next job to execute.
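To make the "job ID, next-execution-time, periodic-repeat-time" idea concrete, here is a minimal sketch built on the JDK's `java.util.concurrent.DelayQueue`. The names `PendingExecution`, `ExecutionEngine`, and `pollDue` are hypothetical, and the sketch assumes the full execution record stays in the database and is looked up by job ID when the entry comes due. It is one possible shape for the engine, not a finished design.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical lightweight handle: only what the engine needs to pick the next job. */
final class PendingExecution implements Delayed {
    final String jobId;       // key of the full record kept in the database (assumption)
    final long nextRunNanos;  // absolute due time on the System.nanoTime() clock
    final long repeatMicros;  // periodic-repeat-time in microseconds; 0 means run once

    PendingExecution(String jobId, long delayMicros, long repeatMicros) {
        this.jobId = jobId;
        this.nextRunNanos = System.nanoTime() + TimeUnit.MICROSECONDS.toNanos(delayMicros);
        this.repeatMicros = repeatMicros;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until due; zero or negative means "ready to run".
        return unit.convert(nextRunNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        // Earliest due time first, so the delay itself acts as an arbitrary-valued priority.
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class ExecutionEngine {
    private final DelayQueue<PendingExecution> queue = new DelayQueue<>();

    void schedule(PendingExecution execution) {
        queue.put(execution);
    }

    /** Returns the earliest execution that is already due, or null if none is due yet. */
    PendingExecution pollDue() {
        PendingExecution due = queue.poll();
        if (due != null && due.repeatMicros > 0) {
            // Periodic job: queue the next run before handing this one to a worker.
            queue.put(new PendingExecution(due.jobId, due.repeatMicros, due.repeatMicros));
        }
        return due;
    }
}
```

A worker loop could call `pollDue()` (or use the blocking `DelayQueue.take()` instead) and then load the script details for `jobId` from the database. Each in-memory entry is just a string and two longs, which is worth measuring against the expected number of pending executions before ruling out an in-memory queue.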

Profile it, consider likely and peak loads, then add work agents and dispatchers and queues (oh my!) as needed. Or something completely different ;)

1

As I understand it, you need to:

  1. run the script
  2. get the data and schedule next script execution after delay with some priority

I would implement step 2 using a scheduling framework such as Quartz. Quartz can prioritize executions that are triggered at the same time:

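import static org.quartz.DateBuilder.futureDate;
import static org.quartz.DateBuilder.IntervalUnit.SECOND;
import static org.quartz.TriggerBuilder.newTrigger;
import static org.quartz.TriggerKey.triggerKey;
// delay and priority are ints supplied by the caller; priority only breaks ties between triggers due at the same time
Trigger trigger = newTrigger()
    .withIdentity(triggerKey("myTrigger", "myTriggerGroup"))
    .startAt(futureDate(delay, SECOND))
    .withPriority(priority)
    .build();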

A full example can be found in the Quartz tutorial.

With Quartz's default in-memory RAMJobStore, this approach does not require any additional database tables. If you need persistence, you can configure the JDBC JobStore, which stores the scheduling data in a database using Quartz's own tables.