Introduction
It's been a while since my last post, and that's because I've been focusing on work, along with finishing up university. I haven't had much time to keep up with developments in the crypto ecosystem. I'm okay with that, because it was just NFTs :P
I am an advocate for KubeMQ, given how much easier it is to operate on K8s than other MQ servers. However, for reasons I won't get into, we had to migrate our MQ solution. The question that came up was: how do we do it with zero downtime? This post is precisely about that.
Step 1: Analysis
As part of the initial analysis, we had to nail down a few details:
- Pick the ideal MQ server to migrate to
- Prioritize workloads according to their MQ dependency
For the first detail, we evaluated several MQ servers, some of which I already had experience with. We ended up going with RabbitMQ: it offers both ephemeral and persistent queues, is highly configurable, and required few changes to our codebase. Not to mention, it's easy to deploy as well.
For the second detail, we wrote an AST-parsing script that went through all our workloads to check whether they published or subscribed to a channel. With this list of workloads in hand, we prioritized them by mission importance. There are other ways to find out which workloads use the pubsub feature; for example, it is possible to get a list of connected clients from KubeMQ's REST API.
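The script above could look something like this minimal sketch using Python's `ast` module. The method names (`publish`/`subscribe`) and the assumption that the channel is the first string argument are placeholders; adapt them to your wrapper's real API.

```python
import ast

# Hypothetical method names -- swap in whatever your MQ wrapper exposes.
MQ_METHODS = {"publish", "subscribe"}

def find_mq_channels(source: str) -> set[str]:
    """Return channel names passed to publish()/subscribe() calls in a module."""
    channels: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both client.publish(...) and bare publish(...)
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name in MQ_METHODS and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                channels.add(first.value)
    return channels
```

Running this over every workload's source files yields the per-workload channel list to prioritize from.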
Step 2: Work out RabbitMQ's configuration
As part of deploying RabbitMQ, we needed an appropriate configuration for things like user management. Since we like looking at dashboards, we also enabled metrics, which additionally gave us a look at RabbitMQ's performance versus KubeMQ's.
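As a rough illustration, a `rabbitmq.conf` covering those two concerns might look like the sketch below. The user name is a placeholder, and enabling metrics means turning on the `rabbitmq_prometheus` plugin alongside it.

```ini
# Hypothetical rabbitmq.conf sketch -- user names/values are placeholders
default_user = svc_mq_admin
default_pass = change-me          # inject via secrets in practice
loopback_users.guest = false      # disable the default guest account

# Metrics are served by the rabbitmq_prometheus plugin (default port 15692)
prometheus.return_per_object_metrics = true
```

The management and Prometheus plugins would be enabled separately (e.g. in `enabled_plugins`: `[rabbitmq_management,rabbitmq_prometheus].`).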
Step 3: Deploy RabbitMQ
Deploying RabbitMQ was fairly easy, as we use Helm charts for the majority of third-party services. We used Terraform for this, obviously :)
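For the curious, a Helm-via-Terraform deploy of RabbitMQ is typically a single `helm_release` resource. This is a hedged sketch: the chart repository, version, and values-file layout are assumptions, not our actual module.

```hcl
# Hypothetical sketch -- repo, version, and values path are placeholders
resource "helm_release" "rabbitmq" {
  name       = "rabbitmq"
  namespace  = "rabbitmq"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "rabbitmq"
  version    = "12.0.0"  # pin a chart version you have vetted

  values = [file("${path.module}/rabbitmq-values.yaml")]
}
```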
Step 4: Update client libraries to use RabbitMQ
None of our workloads use KubeMQ's client libraries directly; they go through a thin wrapper we wrote over the KubeMQ implementation. This way, we can change the underlying MQ server without any client-side changes :)
We have two packages for the MQ implementation: one for ephemeral and one for persistent queuing. Both were updated to use RabbitMQ, and a new version of each was released.
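The thin-wrapper idea can be sketched as an interface the workloads code against, with swappable backends behind it. All names here are hypothetical; the in-memory backend stands in for the real KubeMQ- or pika-based implementations, which would wrap those clients' connection and publish/subscribe calls.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable

class PubSub(ABC):
    """Hypothetical wrapper interface -- workloads only ever see this."""

    @abstractmethod
    def publish(self, channel: str, body: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, channel: str, handler: Callable[[bytes], None]) -> None: ...

class InMemoryPubSub(PubSub):
    """Test double; a real backend would wrap the KubeMQ or RabbitMQ client."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[bytes], None]]] = defaultdict(list)

    def publish(self, channel: str, body: bytes) -> None:
        # Deliver synchronously to every subscriber on this channel
        for handler in self._handlers[channel]:
            handler(body)

    def subscribe(self, channel: str, handler: Callable[[bytes], None]) -> None:
        self._handlers[channel].append(handler)
```

Swapping MQ servers then means shipping a new backend class and bumping the package version; call sites stay untouched.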
Step 5: Testing the waters
To verify that:
- RabbitMQ was configured correctly
- the client libraries were ready for production use
we bumped the versions of the non-critical workloads to use the new pubsub packages and triggered a deploy of them as well. After ironing out a couple of bugs, we were ready to migrate our mission-critical workloads to those packages too.
Step 6: Farewell, KubeMQ
After our main workloads had started using RabbitMQ, and we were satisfied with its performance, it was time to deprecate KubeMQ in our org. To do this, we simply deleted the kubemq namespace from our Terraform config. Any resources that blocked the namespace deletion were cleaned up forcefully.
All of this took place on a development cluster; moving it to a staging/production cluster is as easy as 1-2-3 (best case, lol)
Conclusion
I hadn't done an MQ migration before, so I thought I'd share my workflow. I would appreciate any feedback or tips you have (HMU @ [email protected])
Thanks for reading!