As someone who has used RabbitMQ in production for many years, I would suggest considering NATS [1] for RPC instead.
RabbitMQ's high availability support is, frankly, terrible [2]. It's a single point of failure no matter how you turn it, because it cannot merge conflicting queues that result from a split-brain situation. Partitions can happen not just during network outages, but also under high load.
NATS is also a lot faster [3], and its client network protocol is so simple that you can implement a client in a couple hundred lines in any language. Compare that to AMQP, which is complex, often implemented wrong, and requires a lot of setup on the client side (at the very least: declare exchanges, declare queues, then bind them). NATS does topic-based pub/sub out of the box, no schema required.
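To illustrate the setup overhead, here's roughly the minimum an AMQP consumer does before it can receive anything, as a sketch using the streadway/amqp Go client; the exchange, queue and routing key names are made up, and error handling is panicky for brevity:

    package main

    import "github.com/streadway/amqp"

    func main() {
        // AMQP setup before a service can receive a single request:
        // connect, open a channel, declare an exchange, declare a queue,
        // bind them, then start consuming.
        conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        ch, err := conn.Channel()
        if err != nil {
            panic(err)
        }

        if err := ch.ExchangeDeclare("services", "topic", true, false, false, false, nil); err != nil {
            panic(err)
        }
        q, err := ch.QueueDeclare("service.user.get", true, false, false, false, nil)
        if err != nil {
            panic(err)
        }
        if err := ch.QueueBind(q.Name, "user.get", "services", false, nil); err != nil {
            panic(err)
        }
        msgs, err := ch.Consume(q.Name, "", true, false, false, false, nil)
        if err != nil {
            panic(err)
        }
        for m := range msgs {
            _ = m // handle the request here
        }
    }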
(Re performance, relying on ACK/NACK with RPC is a bad idea. The better solution is to move retrying into the client side and rely on timeouts, and of course error replies.)
RabbitMQ is one of the better message queue implementations for scenarios where you need the bigger features it provides: durability (on-disk persistence), transactions, cross-data center replication (shovel/federation plugins), hierarchical topologies and so on.
While a simple HTTP interface is easy to code around, I quite like the AMQP protocol. It's fast, efficient, reliable and powerful.
We currently use it to send hundreds of thousands of messages per second, large and small, around the world to different data centers and it always works smoothly.
Note that NATS is not HTTP, it's its own very simple text-based protocol.
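For a flavour of how simple it is, an RPC-style exchange over NATS looks roughly like this on the wire (subject and inbox names are made up, and the \r\n line endings are omitted):

    INFO {"server_id":"...","max_payload":1048576}   <- server greeting
    CONNECT {"verbose":false}                        <- client handshake
    SUB user.get 1                                   <- service subscribes to a subject
    PUB user.get _INBOX.abc123 10                    <- requester publishes with a reply-to inbox
    {"id": 42}
    MSG user.get 1 _INBOX.abc123 10                  <- server delivers to the subscriber
    {"id": 42}
    PUB _INBOX.abc123 15                             <- service replies to the requester's inbox
    {"name": "bob"}
    PING                                             <- periodic keepalive, answered with PONG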
AMQP is nice, but I don't think anyone would categorize it as simple. It's binary, for one. And to read and write it, you have to deal with framing, which is not always easy to get right. For example, for a long time RabbitMQ had an issue with dead-letter exchanges (aka DLX), where each bounce would add a header to the message envelope. DLX is great for retries, but after a bunch of retries a message could get quite large. Some clients (the Node.js client in particular) have a small limit on frame sizes and will throw on such messages rather than grow the buffer. (Fortunately, this header growth was fixed in a recent RabbitMQ version.)
Didn't say AMQP is simple though. It's definitely not, but that's where the features and functionality come from. It seems the issues you described are with the broker and clients, not the protocol itself, which I find to be pretty solid.
AMQP also specifies the data model (exchanges, queues, bindings and so forth), which dictates the implementation of it. I find that data model a bit heavy-handed, but it's not terrible. However, it's not very suitable for RPC.
I spoke to Kyle Kingsbury (Aphyr) about 9 months ago and at that time, the only queue or pub/sub system he thought was relatively safe from partition errors was Kafka. Not sure if his position has changed recently.
Indeed, though this is irrelevant to this particular use case.
NATS doesn't have replication, sharding or total ordering. Consistency is a challenge for clustered messaging brokers that need those guarantees.
With NATS, queues are effectively sharded by node. If a node dies, its messages are lost. Incoming messages to the live nodes will still go to connected subscribers, and subscribers are expected to reconnect to the pool of available nodes. Once a previously dead node rejoins, it will start receiving messages.
NATS in this case replaces something like HAProxy: a simple in-memory router of requests to backends.
This is also my personal experience with message queues, even though I haven't had a chance to work with NATS yet. Kafka is just a really solid piece of engineering when you need 5-50 servers. With that many servers you can handle millions of messages per second, which is usually enough for a mid-size company. I am not sure about higher scale, but I believe LinkedIn has much larger clusters.
Initially when developing Alchemy we had a look at NSQ http://nsq.io and found it a little difficult (that was a year or so ago, so it might be better now). Then we started looking at RabbitMQ and thought it fit our requirements better. I have not heard of NATS but will definitely have a look. At the moment Alchemy is tied to RabbitMQ pretty tightly, but abstracting and supporting many queue solutions would be good.
I don't like pushing the retry into the client side. Since microservices by necessity have lots of communication between them, that can be quite a bit of code across all the services. I would rather the architecture deal with it and just ensure that endpoints are idempotent so calls can be retried without adding client complexity. This is a personal preference, and in some cases clients do need to deal with retries, but I just like it not to be the default.
1. cluster_partition_handling set to autoheal so that it will do its best to recover (a few lost messages are infinitely better than a broken system)
2. queue_master_locator is min-masters, so queues are mastered on nodes where the fewest other queues are mastered. This will balance the queues across the cluster, meaning that if a node goes down there will be a minimal number of queues to recreate
3. A mirror policy to mirror every queue (this will only mirror service queues, because response queues are exclusive). This will make the system a bit slower, but much more robust.
This is enough to handle split brain (although this is difficult to test) and nodes going down and coming back (much easier to test).
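For anyone wanting to replicate this, a rough sketch of that configuration, using the newer ini-style rabbitmq.conf plus a mirroring policy; the policy name and pattern are just examples, so adjust for your RabbitMQ version:

    # rabbitmq.conf
    cluster_partition_handling = autoheal
    queue_master_locator = min-masters

    # mirror every queue (exclusive response queues are skipped automatically)
    rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}' --apply-to queues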
Destructive cluster recovery, patched with mirroring, raises the question of how suitable RabbitMQ is for the job in the first place. Mirroring for RPC requests, think about it for a second! Mirroring. For RPC!
RabbitMQ's autohealing just solves your problem in the wrong way. Yes, it will usually fix itself (if it doesn't die with a Mnesia inconsistent_database error), but it will discard messages, and you won't know which ones.
Meanwhile, NATS will forward messages to subscribers as long as there's a clear path. There are no network partition issues because the queues don't have RabbitMQ's strict, total ordering.
Note that RabbitMQ is notoriously sensitive to partitions; one small blip and it gets its knickers in a twist. This is why I recommend increasing the net_ticktime option to something like 180 so you're less exposed.
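For reference, net_ticktime is an Erlang kernel parameter, so (as far as I remember) it goes into the Erlang-term config file (advanced.config / rabbitmq.config) rather than the regular RabbitMQ settings, roughly:

    [
      {kernel, [{net_ticktime, 180}]}
    ].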
Having done this for a long time, my advice is that making the client more intelligent is always the better option. If you're reluctant, consider a sidecar proxy like Linkerd [1] which can handle the gritty details for you.
HTTP/2 definitely makes it faster for browsers to load assets in a webpage; however, I'm not sure how much it would speed up individual REST requests, since most of the time is bound to the request/response round trip.
Just clarifying the comment about the benchmark: even though there are both Go and Node.js components, the actual bit being benchmarked was HTTP versus NATS for the inter-service communication. All the code is available on GitHub if anybody wants to rerun the benchmarks.
HTTP/2's main feature is that it's multiplexed, which a client written in a suitably async-friendly language can exploit to pipeline parallel requests, or at least better reuse connections. There's presumably not too much performance gain (though header compression helps) if your client can't exploit the multiplexing.
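To make the multiplexing point concrete, here's a minimal Go sketch (the URL is a placeholder): against an HTTPS server that supports HTTP/2, Go's default client runs all of these concurrent requests over one multiplexed connection; against an HTTP/1.1 server the same code falls back to multiple connections.

    package main

    import (
        "fmt"
        "net/http"
        "sync"
    )

    func main() {
        client := &http.Client{} // Go negotiates HTTP/2 automatically for HTTPS servers that support it

        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                // All ten requests can share one multiplexed HTTP/2 connection.
                resp, err := client.Get("https://example.com/api/resource")
                if err != nil {
                    fmt.Println("request", n, "failed:", err)
                    return
                }
                resp.Body.Close()
                fmt.Println("request", n, "proto:", resp.Proto) // "HTTP/2.0" when multiplexed
            }(i)
        }
        wg.Wait()
    }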
Setting RabbitMQ's "cluster_partition_handling" to "pause_minority" in theory ameliorates split-brain issues in clusters of 3 or more (odd numbers), where the majority would ignore the minority nodes.
NSQ and NATS are my go-to tools for messaging, though NSQ seems more flexible to me because it supports message persistence and also provides NATS-like ephemeral channels for when persistence is not a hard requirement. And it comes with a shiny admin dashboard, which NATS lacks. NATS is useful when raw performance is a priority.
One thing I do find lacking in both of these queues is support for per-message TTL, though, for pruning time-sensitive messages. I'm not sure what the performance overhead would be for supporting something like that.
Would this library be better off utilizing NATS instead of RabbitMQ, barring any requirements a person might have for the persistence and other features you mentioned?
No, Kafka is completely unsuitable for RPC, for several reasons.
First, its data model shards queues into partitions, each of which can be consumed by just a single consumer (within a consumer group). Assume we have partitions 1 and 2. P1 is empty, P2 has a ton of messages. You will now have one consumer C1 which is idle, while C2 is doing work. C1 can't take any of C2's work because it can only process its own partition. In other words: a single slow consumer can block a significant portion of the queue. Kafka is designed for fast (or at least evenly performant) consumers.
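A toy Go sketch of why that happens (this is not Kafka's real partitioner, which uses murmur2 on the key; it just shows the shape of the problem): every message with the same key lands on the same partition, and each partition belongs to exactly one consumer in the group, so one hot key or slow consumer backs up its partition while the other consumer idles.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // partitionFor mimics keyed partitioning: same key -> same partition -> same consumer.
    // (Illustrative only; Kafka's default partitioner uses murmur2, not FNV.)
    func partitionFor(key string, numPartitions int) int {
        h := fnv.New32a()
        h.Write([]byte(key))
        return int(h.Sum32() % uint32(numPartitions))
    }

    func main() {
        const partitions = 2 // consumed by C1 (partition 0) and C2 (partition 1)
        counts := make([]int, partitions)

        // A burst of requests for one hot tenant all hash to the same partition,
        // so only one consumer does the work while the other sits idle.
        for i := 0; i < 1000; i++ {
            counts[partitionFor("tenant-42", partitions)]++
        }
        fmt.Println("messages per partition:", counts)
    }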
Kafka's queues are also persisted on disk, which is terrible for RPC.
Think of Kafka as a linear database that you can append to and read from sequentially. Its main use case is for data that can fan out into multiple parallel processing steps. For example, a log processing system that extracts metrics: You feed the Kafka queue into something like Apache Storm, which churns the data and emits counts into an RDBMS (for example).
Kafka stores everything to disk, which may not be what you are looking for for your RPC calls (which you would usually make as direct service-to-service HTTP calls). Moreover, Kafka "topics" are statically declared (i.e. by admin scripts instead of a public API), and creating one is a heavyweight operation. So it's not the best fit for having microservices register themselves and dynamically create "topics" for each method.
IMO using Rabbit to connect "microservices" is an emerging anti-pattern. It often introduces a single point of failure to your "distributed" system and has problems with network partitions. If critical functionality stops working when Rabbit is down, you're probably doing it wrong.
Most real-world microservice projects (I've worked on several) already have many single points of failure. Often there is one service that needs to be up for the system to be up (such as the one that processes your customers' orders); you don't realise some VMs are sharing a physical disk, or that everything depends on a single router somewhere you've never heard of that will one day run out of memory and drop TCP connections. This is not to mention the risks posed to availability by third-party tracking software that pushes changes that break web forms (#1 cause of long outages in my experience).
Message brokers like RabbitMQ give you a lot of benefit and introduce only a small number of failure modes. You can obviate tricky service discovery boot orders, do RPC without caring about whether your server could be restarted, and of course you get a good implementation of pub/sub too. If you stay away from poorly considered high availability schemes, I am absolutely fine with recommending it for inter-service communication.
Great comment. One thing I don't like very much about microservices is simply that my service often will have some such hard dependency. E.g. if the logging service is down, I lose all log-based statistics. If the authentication microservice is down, I'm screwed. The parent comment made the excellent point about SPOFs, but it seems like for microservices to work correctly, there will always be some SPOFs.
Maybe I'm being too pessimistic. I use microservices, but without significant engineering rigor, I think it's a recipe for disaster.
> If the authentication microservice is down, I'm screwed.
Well, make sure your service is not dependent on those services then. Use signed tokens so that you only rely on the authenticator for logins; everything afterwards can work it out by itself.
Logging: keep your stuff in a queue or log file until the log service is back up.
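A minimal sketch of the signed-token idea in Go, using a plain HMAC over a payload with a shared secret (a real setup would use JWTs with expiry and key rotation, and the secret here is obviously made up): any service can verify the token locally, so the authenticator only needs to be up at login time.

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/base64"
        "fmt"
        "strings"
    )

    var secret = []byte("shared-secret") // distributed to services out of band (hypothetical)

    // sign is what the auth service does at login time.
    func sign(payload string) string {
        mac := hmac.New(sha256.New, secret)
        mac.Write([]byte(payload))
        sig := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
        return payload + "." + sig
    }

    // verify is what every other service does locally, without calling the auth service.
    func verify(token string) (string, bool) {
        i := strings.LastIndex(token, ".")
        if i < 0 {
            return "", false
        }
        payload, sig := token[:i], token[i+1:]
        mac := hmac.New(sha256.New, secret)
        mac.Write([]byte(payload))
        expected := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
        return payload, hmac.Equal([]byte(sig), []byte(expected))
    }

    func main() {
        token := sign(`{"user":"alice"}`)
        payload, ok := verify(token)
        fmt.Println(ok, payload)
    }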
It does require some up-front work, but I've generally moved over to Kafka for most of my stuff. Rabbit has some nice aspects, but as noted its HA options start at "dire" and escalate smoothly to "oh dear god no". Throughput with Kafka is very, very good and, in my experience, it's remarkably difficult to kill. If you want fully ephemeral topics, you can write the broker data to a tmpfs. But I really like keeping data around, because I can replay my topics later both for DR and for debugging.
Kafka makes reliable durable messaging not just possible; it also solves all the usual attendant problems you try to design around.
* Durable messages cause a slow-down when publishing, but not with Kafka because it uses the Linux kernel's page cache to write.
* Durable messages are slow to read if they have to come from disk. Kafka is optimized to load sequential blocks into memory and push them out through a socket with very few copies. This makes for very fast reads.
* Slow consumers can bog down the broker. Kafka stores all messages and keeps them on the topic for a time horizon. No back-pressure from slow consumers.
* Disconnected but subscribed consumers cause messages to back up on disk and eventually clog the broker. Kafka stores all messages. There's no clogging or backup, that's just how it works.
* Brokers must track whether a consumer actually received the message, failures can cause missed messages or clogs. Kafka clients may read from a given point on the topic forward. If they fail during a read, they just back up and read again. The messages will be there for hours/days/weeks as configured.
With a rock-steady durable messaging system based on commit logs, all of those problems that arose from attempting to avoid durable messaging go away.
Now you build microservices that emit and respond to events. Microservices that can "rehydrate" their state from private checkpoints and topic replays. And all of this with partition tolerance and simple mirroring.
Although it isn't usually necessary, if you want, you can make all of that elastic with Mesos, too.
Kafka saved my "life" a few times. We had TTL set to 168h and somebody pushed a change to production that silently ignored a type of message. We realized it a few days later. Luckily we could replay all of the messages after fixing the code. I know there are so many things wrong with this, yet Kafka is excellent at storing data for the medium term, and that can be a real bliss.
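For anyone who hasn't done this: replaying is just reading the topic again from an earlier offset, e.g. with the console consumer that ships with Kafka (topic name made up; older releases take --zookeeper instead of --bootstrap-server):

    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic orders --from-beginning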
I didn't read that web page carefully, but it seems to describe a transaction log, which is what Kafka excels at, but it has precious little to do with RPC.
RPC is point-to-point communication based on requests and replies. Kafka's strictly sequential requirement would be terrible for this because a single slow request would hold up its entire partition — no other consumer would be able to process the pending upstream events. Kafka is also persistent (does it have in-memory queues?), which is pointless for RPC.
Message queues, period, aren't particularly good for RPC. HTTP, as an online protocol, has such huge advantages that trying to replace it doesn't make sense to me. However, for comms between services where a consumer isn't waiting on the other end, a message queue is plenty appropriate--and Kafka is much, much better at that than RabbitMQ is in terms of throughput and data sanity.
I also quite like NATS, but Kafka provides similar performance characteristics in the general case (generally higher latency being the exception, though I have never encountered latency-sensitive processes where a message queue made sense in the first place) and means babysitting fewer systems.
To be clear, NATS is not a heavy messaging broker like RabbitMQ. For one, it's in-memory only, and queues only exist when there are consumers: If you publish and there are no subscribers, the message doesn't go anywhere. NATS is closer to ZeroMQ than RabbitMQ or Kafka.
A lot of people use HAProxy to route messages via HTTP to microservices — what's HAProxy if not a glorified message queue?
If you don't use an intermediary — meaning you do point-to-point HTTP between one microservice and another — you have to find a way to discover peers, perform health checks, load-balance between them, and so on. Which you can do — services like etcd and Consul exist for this — but using an intermediary such as NATS or Linkerd [1] is also a great, possibly simpler solution.
See above comment; in the file https://github.com/LoyaltyNZ/alchemy-framework/blob/master/s... is a description of creating a cluster of RabbitMQ nodes that can auto-heal if an individual node goes down. We are running CoreOS, which will occasionally shut down a node and update it. When this happens we see zero downtime and no error messages.
Even if you have a cluster of Rabbit nodes, the logical RabbitMQ cluster is still a single point of failure.
This is a fundamental problem with message bus architectures that advocates seem to ignore. It's even more problematic in a microservices architecture, where you (presumably) do domain-driven design in order to allow overall forward progress in the face of partial service/component/datastore outages. To throw all that out the window by coupling everything to a message bus... I still don't really understand.
In my experience failure occurs more frequently when you use more and more systems in more complex ways. e.g. using HAProxy for load balancing, with Consul for service discovery and Consul template for configuration. Each of these is a single point of failure as they are all required for the system to work.
If you define a single point of failure as any single computer whose failure takes the system down with it, then RabbitMQ is not a single point of failure.
I am not sure how domain driven design helps solve this.
> HAProxy for load balancing, with Consul for service discovery and Consul template for configuration. Each of these is a single point of failure as they are all required for the system to work.
Not necessarily. I don't know anything about Consul, but if you use something like ZooKeeper to discover services and write those into an HAProxy config, include a failsafe in whatever writes the HAProxy config on ZK updates, such that if the delta is "too large" it will refuse to rewrite the config.
Then if ZK becomes unavailable, what you lose is the ability to easily _make changes_ to what's in the service list. If your service instances come and go relatively infrequently, this might be fine while the ZK fire gets put out.
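A rough Go sketch of that failsafe (the threshold and addresses are made up): whatever regenerates the HAProxy backend list on ZK updates refuses to apply a change that would drop a large fraction of the current backends, on the assumption that such a delta is more likely a discovery outage than a real change.

    package main

    import "fmt"

    // safeToApply refuses updates that would remove more than maxDropFraction
    // of the currently known backends in one go.
    func safeToApply(current, proposed []string, maxDropFraction float64) bool {
        if len(current) == 0 {
            return true
        }
        keep := make(map[string]bool, len(proposed))
        for _, b := range proposed {
            keep[b] = true
        }
        dropped := 0
        for _, b := range current {
            if !keep[b] {
                dropped++
            }
        }
        return float64(dropped)/float64(len(current)) <= maxDropFraction
    }

    func main() {
        current := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"}
        proposed := []string{"10.0.0.1:8080"} // ZK just "lost" three backends
        fmt.Println(safeToApply(current, proposed, 0.5)) // false: don't rewrite the HAProxy config
    }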
Service instances in a continuous deployment environment are coming and going all day. If your service discovery and config breaks then everything stops; nothing can be developed or deployed until the broken stuff is fixed.
If SD or config mgmt dies, you can't deploy new stuff, but the existing services continue to work. When your message bus dies, everything dies. It's a fundamentally different failure.
Depends on the application, but as one possible general solution: have microservices talk to each other directly when it makes sense to rather than communicating over RabbitMQ / central message bus out of laziness/convenience.
Doesn't this make the architecture a lot more complex though? I mean, if every service uses a common messaging broker, the number of connections is O(n). Whereas if every microservice needs to talk to every other microservice, it's O(n^2). And analysis is much harder, unless all the services send their logs/metrics to a common logging/metrics system.
If you don't have every service talk to every other service then you have either a deployment problem or a load-balancing problem.
If a service only talks to, say, instances of another service that are local, then every node in a cluster must contain ALL services. If the local instance is overloaded but a remote one isn't, then the service you are talking to will be slow, regardless of any front-end load balancing.
Every service must be able to talk to every other service, because if it cannot then you cannot load balance or deploy without it being an n^2 problem. So the question is how to implement that WITHOUT complicating the services.
profilesvc may only need to talk to (depend on) usersvc and accountsvc to do its job. usersvc and accountsvc may only need to talk to (depend on) their data stores.
Yes, every instance of every service needs to manage its communication paths to other services. That's N connections, rather than just 1 to the message bus. But this is a pretty well-understood problem. We have connection pools and circuit breakers and so on. And the risks are distributed, isolated, heterogeneous.
Service Discovery often removes the need for load balancers. Let the clients discover where all the instances of Service X are and build the clients to handle failures to connect to individual instances.
Service discovery does not remove the need for load balancing.
For example, if you had three nodes, each with every service, and round-robin service discovery, then overloading the system is just a matter of receiving a difficult request every third query. No matter how good your front-end load balancing is in a microservice system, if your service-to-service requests are not load balanced you can have problems with overloading one node while others are idle.
Service discovery can remove the need for load balancing, if you move load balancing logic into the client services. Have them architect their own load balancing over available instances of their dependent services.
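A bare-bones Go sketch of what that looks like in the client (the instance addresses are placeholders): take the instance list from service discovery, round-robin over it, and skip instances that fail.

    package main

    import (
        "errors"
        "fmt"
        "sync/atomic"
    )

    // Balancer does naive client-side round-robin over instances from service discovery.
    type Balancer struct {
        instances []string // refreshed from SD (etcd/Consul/ZK) out of band
        next      uint64
    }

    // Call tries each instance at most once, starting from the round-robin cursor.
    func (b *Balancer) Call(do func(addr string) error) error {
        n := len(b.instances)
        if n == 0 {
            return errors.New("no instances available")
        }
        start := atomic.AddUint64(&b.next, 1)
        for i := 0; i < n; i++ {
            addr := b.instances[(start+uint64(i))%uint64(n)]
            if err := do(addr); err == nil {
                return nil
            }
        }
        return errors.New("all instances failed")
    }

    func main() {
        b := &Balancer{instances: []string{"usersvc-1:8080", "usersvc-2:8080"}}
        _ = b.Call(func(addr string) error {
            fmt.Println("calling", addr) // real code would make an HTTP/RPC call here
            return nil
        })
    }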
Unless you're deploying a SmartStack-esque LB strategy, where each physical node hosts its own load balancer, the details of what's deployed where are mostly irrelevant. You use your SD system to abstract away the physical dimension of the problem, and address logical clusters of service instances. And you rely on your scheduler to distribute service instances evenly among nodes.
Looks pretty cool, but doing RPC over RMQ is quite expensive. I would rather see good abstractions over ZMQ - that would be really cool.
Also, as many people here have pointed out, RMQ is a single point of failure, which might be acceptable in some cases. To me, the problem is that by design this architecture has a bottleneck which is really hard to get rid of.
OpenStack has had this as a common design pattern across a lot of the services for a while.
It works quite well for us now - there was a period 2-3 years ago when RabbitMQ was the bane of my existence and the cause of many a page, but newer versions have been fine.
We do assume that everything in the queue can go away and only in-flight calls will fail - and (at least in the service I work on) we do retries if something falls through the cracks.
Yes, I have worked with the OpenStack messaging queue, and it's really amazing how it just works. Also, being able to grab analytics about service usage by querying the queue directly is great.
I had not heard of these before, but they look really interesting. I think they are pretty similar, as it is a common pattern and a straightforward solution to many problems.
This looks cool. I wrote a less developed (and unreleased) RabbitMQ-based microframework that combines RPC and one-way event streams to integrate an ecommerce site in one datacenter with a back office in another. At another company I used ZeroMQ to make a job processing cluster.
Comments by others suggesting the use of HTTP are kind of missing the point of message queues. The latency added by HTTP(S) made interactive RPC sluggish, and made batch back-office updates take hours instead of the minutes they took with AMQP.
If I had Alchemy two years ago I might have used this instead of rolling my own.
This is helpful. We already use RabbitMQ for our Go services. This would ease the task of setting up publishers and subscribers in the Node APIs we're building.
I've been working on something similar but decided to go with Kafka as the message bus because I want the messages to persist. This allows for more error recovery solutions and auditing. Having all that data around also allows me to come up with new ways to use it days after the fact.
I feel like this is a bit silly. It feels like one could just speak HTTP or Thrift, and use etcd or ZooKeeper for discovery. RabbitMQ just complicates the stack. Also, its HA options are generally... disappointing.
I do not understand the question. In production we use an ELB hitting multiple routers, on multiple nodes. Not sure how hashring applies to load balancing.
As someone who's used RabbitMQ in a SaaS app to run millions of tasks, heed my warning: RabbitMQ is great, but if you are low on memory the overhead is too much to deal with. Celery also contributed to this, but if I could travel back in time, I would stop myself from using RabbitMQ. Instead, Redis is a much better alternative for microservices.
Redis pub/sub (which I assume you are talking about as an alternative) is fine, but you have to be listening when the message is sent, otherwise you miss it.
Another problem is working out whether there are any listeners where you sent it, which RabbitMQ deals with via return queues. This is important for 404s when a client is trying to talk to a service that does not exist. In Redis you would just post to a pub/sub channel, wait till nothing happens and time out, rather than immediately knowing through a return queue.
Actually Redis returns the number of consumers that were subscribed during your publish and received the message, so you can detect that no service is available and return your 404 status. The real limitation for RPC over Redis pub/sub, to me, is that you can only broadcast your messages to all consumers, and not have them load-balanced between the subscribed applications. So you end up following a discover/unicast call pattern where Redis doesn't bring much to the table for the actual RPC call. Maybe that's an opportunity for a cool addition to Redis...
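For anyone who hasn't seen it: PUBLISH's reply is the number of clients that received the message, so a zero tells you up front that nobody is listening (channel name made up):

    127.0.0.1:6379> PUBLISH rpc.user.get "get user 42"
    (integer) 0    <- no subscribers: return a 404 instead of waiting for a timeout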
Cheers, I didn't know that about Redis. I actually really like Redis and its simplicity; it would be cool to get more options for passing messages over Redis.
[1] http://nats.io
[2] https://aphyr.com/posts/315-jepsen-rabbitmq
[3] http://bravenewgeek.com/dissecting-message-queues/