Feature flags are a tool to strategically enable or disable functionality at runtime. They are often used to drive different user experiences but can also be useful in real-time data systems. In this post, we’ll walk through using feature flags to pause and resume a Kafka Consumer on the fly.
You may be wondering, why not just shut down the consumer(s) to “pause” processing? This approach is 100% effective and accomplishes the end goal of stopping consumption. However, there are downsides when dealing with a large-scale consumer architecture.
- Rebalances – depending on the layout of your consumer group and the number of partitions, the rebalance process that triggers when a consumer leaves the group can be painful and time-consuming.
- Lack of Granularity – if the consumer is reading multiple topics, you shut down consumption of everything. There is no in-between.
- Responsive Processing – to re-enable processing, all of the consumers have to start back up and rebalance partitions. With 100+ consumer instances, this can be time-consuming.
Before getting into how to incorporate feature flags, we should look at the functionality that already exists in the Kafka client libraries to dynamically pause and resume processing. We’re already closer to the end goal than you may have realized.
Pausing Kafka Consumers
The Kafka Consumer client has the functionality to fetch assigned partitions, pause specific partitions, fetch currently paused partitions, and resume specific partitions. These methods are all you need to pause and resume consumption dynamically.
Let’s look closer at these operations.
- KafkaConsumer.assignment() – returns the currently assigned topic partitions in the form of a Set of TopicPartition. The TopicPartition contains two primary properties, the topic name as a String and the partition as an int.
- KafkaConsumer.pause(Collection of TopicPartition) – suspends fetching from the provided topic partitions. Consumer polling will continue, which is key to keeping the consumer alive and avoiding rebalances. However, future polls will not return any records from paused partitions until they have been resumed.
- KafkaConsumer.paused() – returns the topic partitions that are currently assigned and in a paused state as a result of using the pause method.
- KafkaConsumer.resume(Collection of TopicPartition) – resumes fetching from the provided topic partitions.
Option 1: Pause the World
The simplest way to pause processing is to pause everything. The snippet below illustrates this.
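Here is a minimal sketch of the idea. So that it runs without the Kafka client on the classpath, it is written against a hypothetical PartitionControl interface and a stand-in TopicPartition record. Both are assumptions of mine that only mirror the shape of the four KafkaConsumer methods described above; with the real client you would call the same methods on the consumer directly.

```java
import java.util.Collection;
import java.util.Set;

// Stand-in for org.apache.kafka.common.TopicPartition (illustration only).
record TopicPartition(String topic, int partition) {}

// Hypothetical interface mirroring the four KafkaConsumer methods discussed above.
interface PartitionControl {
    Set<TopicPartition> assignment();
    void pause(Collection<TopicPartition> partitions);
    Set<TopicPartition> paused();
    void resume(Collection<TopicPartition> partitions);
}

final class PauseTheWorld {
    // Pause every partition currently assigned to this consumer.
    // With the real client: consumer.pause(consumer.assignment());
    static void pauseAll(PartitionControl consumer) {
        consumer.pause(consumer.assignment());
    }

    // Resume everything that is currently paused.
    // With the real client: consumer.resume(consumer.paused());
    static void resumeAll(PartitionControl consumer) {
        consumer.resume(consumer.paused());
    }
}
```

Both calls belong on the polling thread, and the poll loop keeps running while everything is paused so the consumer stays in the group.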
Option 2: Selective Pause
It’s helpful to have the big red button to shut down all processing, but it would be more helpful if we could control things at a more granular level too.
The next snippet shows what pausing the partitions for a single topic might look like.
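A sketch of a single-topic pause might look like the following. As an assumption to keep it runnable on its own, it uses a hypothetical PartitionControl interface and a stand-in TopicPartition record in place of the Kafka classes; the pausedTopic parameter is simply the name of the topic to stop reading.

```java
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for org.apache.kafka.common.TopicPartition (illustration only).
record TopicPartition(String topic, int partition) {}

// Hypothetical interface mirroring the KafkaConsumer pause/resume methods.
interface PartitionControl {
    Set<TopicPartition> assignment();
    void pause(Collection<TopicPartition> partitions);
    Set<TopicPartition> paused();
    void resume(Collection<TopicPartition> partitions);
}

final class SelectivePause {
    // Pause only the assigned partitions that belong to pausedTopic.
    static void pauseTopic(PartitionControl consumer, String pausedTopic) {
        Set<TopicPartition> toPause = consumer.assignment().stream()
                .filter(tp -> tp.topic().equals(pausedTopic))
                .collect(Collectors.toSet());
        consumer.pause(toPause);
    }
}
```

Partitions of every other topic are left alone, so the consumer keeps processing them as usual.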
Unleash the Flags
With the previous snippet in hand, we’re almost there. We need to wire up a feature flag solution to provide the “pausedTopic” value(s) at runtime. Using a naming convention such as topic_{topic-name-here}, the application can pull all of the flags and keep only those that it cares about.
The pseudo-code below shows what this might look like.
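One way to sketch the naming convention in plain Java is shown here. The map of flag names to enabled states is a hypothetical stand-in for whatever your feature flag client returns, and I am assuming that an enabled topic_ flag means "pause this topic"; flip the check if your convention is the other way around.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class TopicFlags {
    // Prefix from the topic_{topic-name-here} naming convention.
    static final String PREFIX = "topic_";

    // Given flag name -> enabled, return the topic names whose topic_ flag is enabled.
    // Flags without the prefix belong to other features and are ignored.
    static Set<String> pausedTopics(Map<String, Boolean> flagStates) {
        return flagStates.entrySet().stream()
                .filter(e -> e.getKey().startsWith(PREFIX))
                .filter(e -> Boolean.TRUE.equals(e.getValue()))
                .map(e -> e.getKey().substring(PREFIX.length()))
                .collect(Collectors.toSet());
    }
}
```

For example, a flag set containing an enabled topic_orders, a disabled topic_payments, and an unrelated enabled flag yields just the orders topic.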
You can integrate any feature flag solution, but let’s wire up Unleash.
Unleash
Unleash is an open-source feature management platform that has worked well for me. GitLab even offers it as a built-in service available in all tiers as of GitLab 13.5. A full list of capabilities can be found here.
Below is an example of configuring the Unleash client in a Java Spring Boot application. If you’re using GitLab, there are clear instructions on integrating feature flags with your application.
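A configuration along these lines is a reasonable starting point. It assumes a recent Unleash Java SDK, where the classes live in the io.getunleash package (older releases used no.finn.unleash), and the unleash.* property keys are hypothetical names chosen for this sketch. Depending on your server and SDK version you may also need to supply an API key on the builder.

```java
import io.getunleash.DefaultUnleash;
import io.getunleash.Unleash;
import io.getunleash.util.UnleashConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class UnleashConfiguration {

    // Property keys are assumptions; map them to your own application properties.
    @Bean
    public Unleash unleash(
            @Value("${unleash.api-url}") String apiUrl,
            @Value("${unleash.instance-id}") String instanceId,
            @Value("${unleash.app-name}") String appName) {
        UnleashConfig config = UnleashConfig.builder()
                .appName(appName)
                .instanceId(instanceId)
                .unleashAPI(apiUrl)
                .build();
        // DefaultUnleash refreshes toggles in the background and serves lookups from its cache.
        return new DefaultUnleash(config);
    }
}
```

The API URL and instance ID come from your Unleash server or from the feature flag setup page of your GitLab project.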
Fetching Feature Flags
We’ve looked at the pause/resume functionality and how a feature flag naming convention can be used to target the topics that should be paused/resumed. Let’s tie the two together with two strategies for fetching and applying the toggles.
Option 1: Poll Loop
The simplest option is to integrate the feature flag checks right into the poll loop. Unleash refreshes the flag states in a background thread, so each check reads the latest values from the local cache when deciding what, if anything, needs to be paused. The benefit of this approach is that everything happens right in the polling thread, which is important since the consumer is not thread-safe.
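The per-iteration check could be sketched as below. To stay runnable without Kafka it uses a hypothetical PartitionControl interface and a stand-in TopicPartition record, and it takes the set of paused topic names as a plain argument; in a real application that set would come from the flag client.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for org.apache.kafka.common.TopicPartition (illustration only).
record TopicPartition(String topic, int partition) {}

// Hypothetical interface mirroring the KafkaConsumer pause/resume methods.
interface PartitionControl {
    Set<TopicPartition> assignment();
    void pause(Collection<TopicPartition> partitions);
    Set<TopicPartition> paused();
    void resume(Collection<TopicPartition> partitions);
}

final class FlagAwarePolling {
    // Call once per loop iteration, on the polling thread, before poll():
    //   while (running) {
    //       syncPausedState(consumer, currentPausedTopics());
    //       records = consumer.poll(Duration.ofMillis(500));
    //       ...
    //   }
    static void syncPausedState(PartitionControl consumer, Set<String> pausedTopics) {
        // Assigned partitions whose topic is flagged as paused.
        Set<TopicPartition> shouldBePaused = consumer.assignment().stream()
                .filter(tp -> pausedTopics.contains(tp.topic()))
                .collect(Collectors.toSet());

        // Paused now, but no longer flagged: resume these.
        Set<TopicPartition> toResume = new HashSet<>(consumer.paused());
        toResume.removeAll(shouldBePaused);

        // Flagged, but not paused yet: pause these.
        Set<TopicPartition> toPause = new HashSet<>(shouldBePaused);
        toPause.removeAll(consumer.paused());

        if (!toPause.isEmpty()) consumer.pause(toPause);
        if (!toResume.isEmpty()) consumer.resume(toResume);
    }
}
```

Note that this sketch resumes any paused partition that is not flagged, so it would need adjusting if you also pause partitions for other reasons such as backpressure.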
Option 2: UnleashEvent Subscriber
Unleash has an internal eventing design that makes it easy to subscribe to the events that get triggered after refreshing the toggles. This is the most up-to-date representation of the flag states because the event is fired immediately after the cache is updated.
As mentioned in Option 1, Consumers are not thread-safe, so you have to handle consumer operations appropriately and run the consumer pause operations on the polling thread.
The main benefit of this is that it doesn’t clutter the poll loop with responsibilities.
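One possible way to hand updates from the flag-refresh thread to the polling thread is a small single-slot mailbox. This is a sketch using only the JDK; PausedTopicsMailbox and its method names are hypothetical, and the subscriber callback that would call publish is not shown because its exact signature depends on the Unleash SDK version.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

final class PausedTopicsMailbox {
    // Holds the most recent update that the polling thread has not consumed yet.
    private final AtomicReference<Set<String>> latest = new AtomicReference<>();

    // Called from the flag-refresh thread, for example inside a subscriber callback.
    // A newer update overwrites an older one that was never picked up.
    void publish(Set<String> pausedTopics) {
        latest.set(Set.copyOf(pausedTopics));
    }

    // Called from the polling thread; returns the newest pending update, if any,
    // and clears the slot so the same update is not applied twice.
    Optional<Set<String>> takeLatest() {
        return Optional.ofNullable(latest.getAndSet(null));
    }
}
```

The poll loop then only needs to check takeLatest() on each iteration and apply pause or resume calls when an update is present, which keeps every consumer operation on the polling thread.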
Don’t Forget about Rebalances
The final piece that needs to be accounted for is the ConsumerRebalanceListener. With multiple consumers running in a group, each is responsible for pausing its own assigned partitions. If a consumer dies, its partitions are automatically rebalanced to the other consumers in the group. However, a consumer receiving newly assigned partitions has no awareness of their previous state (paused/active), so they will be active after the assignment.
The ConsumerRebalanceListener is your hook into the rebalance lifecycle to pause partitions before record consumption begins and avoid accidentally consuming a topic that should be paused. Using the components that have been built above, create or update the onPartitionsAssigned method to keep partitions paused if necessary.
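The listener logic might be sketched like this. As before, PartitionControl and the TopicPartition record are hypothetical stand-ins so the example runs without the Kafka client, and the Supplier of paused topic names represents the flag lookup. In a real application the class would implement ConsumerRebalanceListener and be passed to subscribe.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Stand-in for org.apache.kafka.common.TopicPartition (illustration only).
record TopicPartition(String topic, int partition) {}

// Hypothetical interface mirroring the KafkaConsumer pause/resume methods.
interface PartitionControl {
    Set<TopicPartition> assignment();
    void pause(Collection<TopicPartition> partitions);
    Set<TopicPartition> paused();
    void resume(Collection<TopicPartition> partitions);
}

final class PauseOnAssign {
    private final PartitionControl consumer;
    private final Supplier<Set<String>> pausedTopics;

    PauseOnAssign(PartitionControl consumer, Supplier<Set<String>> pausedTopics) {
        this.consumer = consumer;
        this.pausedTopics = pausedTopics;
    }

    // Same shape as ConsumerRebalanceListener.onPartitionsAssigned: pause any
    // newly assigned partition whose topic is currently flagged as paused.
    void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        Set<String> flagged = pausedTopics.get();
        List<TopicPartition> toPause = partitions.stream()
                .filter(tp -> flagged.contains(tp.topic()))
                .collect(Collectors.toList());
        if (!toPause.isEmpty()) {
            consumer.pause(toPause);
        }
    }
}
```

Because the callback runs on the polling thread as part of poll, pausing here takes effect before any records from the new partitions are returned.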
The post Unleashing Feature Flags onto Kafka Consumers first appeared on Object Partners.