Eat What You Kill

Worker threads

It's a fairly common pattern:

  • Listen for work coming in
  • Hand each work item to a pool of worker threads to process

Produce-Consume

So you have one part of the system producing work, and this work is then consumed by the worker threads.

Or even more simply, you could just spin around doing produce, consume, produce, consume in one thread.

  • An NIO selector produces incoming data, new connections, outgoing availability
  • An event subscription produces events
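
As a minimal sketch of that single-threaded produce-consume loop, with the NIO selector as the producer (handleSelected here stands in for whatever the application does with the work; it is not a real API):

import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class ProduceConsumeLoop {
  // Single-threaded Produce-Consume: the same thread produces the work
  // (select()) and consumes it (handleSelected()) before producing again.
  void run(Selector selector) throws Exception {
    while (selector.isOpen()) {
      selector.select();                                  // produce: wait for ready channels
      Iterator<SelectionKey> it = selector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        handleSelected(key.channel(), key.readyOps());    // consume: handle it on this thread
      }
    }
  }

  void handleSelected(SelectableChannel channel, int readyOps) {
    // application-specific I/O handling goes here
  }
}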

Produce-Execute-Consume

The classic worker thread pool:

  • NIO-Selector thread keeps calling select() or one of its friends
  • Selectable byte channel and event type put onto queue to be consumed:
    for (SelectionKey key : selector.selectedKeys()) {
      executor.execute(() -> handleSelected(key.channel(), key.readyOps()));
    }

Executor implementations often function as queues
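
For example, the JDK's ThreadPoolExecutor is built directly on a BlockingQueue: execute() enqueues the task and the pool's worker threads consume it. A minimal sketch (the pool sizes are illustrative, not Jetty's configuration):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueBackedExecutor {
  // A ThreadPoolExecutor is explicitly a producer-consumer queue:
  // execute() enqueues the task, the pool's worker threads dequeue and run it.
  static ExecutorService newExecutor() {
    return new ThreadPoolExecutor(
        4, 4,                          // core and max pool size
        60, TimeUnit.SECONDS,          // keep-alive for idle threads
        new LinkedBlockingQueue<>());  // the queue submitted tasks sit in
  }
}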

What's wrong with that?

Every piece of incoming work is guaranteed to cause a context switch, on top of the overhead of queueing the work and waking a consumer. This adds latency to processing the work, and the work's context may also have to be loaded into a different CPU's cache.

Using more CPUs to handle the work can actually make it slower overall - Jetty calls this Parallel Slowdown: Jetty-9 was 15% slower than Jetty-8 after making it fully async-capable.

Understanding and dealing with this effect is what Mechanical Sympathy is about: "how to code sympathetically to and measure the underlying stack/platform so good performance can be extracted".

Better queueing?

  • Libraries such as Disruptor implement a low-latency queue, but the problem doesn't go away.
  • Work-stealing queues could be assigned per-processor - good luck doing that on the JVM

A general problem with queues is applying back-pressure to the producer rather than just letting the queue grow without bound. If the queue size is simply capped, everything eventually ends up blocking again anyway.
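
A sketch of why a bounded queue just moves the problem (the class and sizes here are illustrative): once the queue fills, the producer blocks in put(), so back-pressure turns back into blocking.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedHandOff {
  // Bounded queue between producer and consumers: when it fills up,
  // put() blocks the producer - i.e. back-pressure by blocking.
  private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1024);

  void produce(Runnable work) throws InterruptedException {
    queue.put(work);        // blocks once 1024 items are waiting
  }

  void consumeLoop() throws InterruptedException {
    while (true) {
      queue.take().run();   // consumer thread drains the queue
    }
  }
}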

Execute-Produce-Consume

// selectedKeys.next() is assumed to return null when no key is ready
SelectionKey key = selectedKeys.next();
if (key == null) return;
executor.execute(this); // hand production to another thread - may take some time to dispatch
do {
  handleSelected(key.channel(), key.readyOps()); // consume the work we just produced
  key = selectedKeys.next();
} while (key != null);

If the work is processed quickly compared to thread dispatch, this behaves just like a single-threaded produce-consume loop. The thread dispatch delay may include the time a job submitted to the executor needs to get through its queue.

The additional threads dispatched to take over production may exit immediately if there is nothing left to do by the time they start.
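
Pulling the fragment above together, a rough sketch of the Execute-Produce-Consume idea (my reconstruction for illustration, not Jetty's actual EatWhatYouKill implementation):

import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.util.Iterator;
import java.util.concurrent.Executor;

class ExecuteProduceConsume implements Runnable {
  private final Executor executor;
  private final Iterator<SelectionKey> selectedKeys;

  ExecuteProduceConsume(Executor executor, Iterator<SelectionKey> selectedKeys) {
    this.executor = executor;
    this.selectedKeys = selectedKeys;
  }

  // Serialise production so this thread and the replacement thread never
  // pull from the shared key source at the same time.
  private synchronized SelectionKey produce() {
    return selectedKeys.hasNext() ? selectedKeys.next() : null;
  }

  @Override
  public void run() {
    SelectionKey key = produce();
    if (key == null) {
      return;                   // nothing to do - a late-starting spare thread just exits
    }
    executor.execute(this);     // ask for a replacement producer (may take a while to dispatch)
    do {                        // meanwhile, consume the work this thread produced itself
      handleSelected(key.channel(), key.readyOps());
      key = produce();
    } while (key != null);
  }

  void handleSelected(SelectableChannel channel, int readyOps) {
    // application-specific I/O handling goes here
  }
}

The synchronized produce() is only there to keep the shared key source safe once the replacement thread starts; any real strategy likewise has to ensure only one thread produces at a time.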

Further reading