Getting Started with Golang and Kafka
Introduction #
In this Getting Started guide, we will explore integrating Golang with Apache Kafka. Kafka is found in many distributed systems, as it excels at handling large-scale, real-time data streams. Its architecture, designed for high throughput and scalability, makes it indispensable in modern data processing pipelines.
Golang, with its concise syntax and efficient concurrency management, stands as an ideal ally for Kafka. This blend of Kafka’s data handling prowess and Golang’s execution efficiency creates a formidable combination, perfect for building high-performance applications.
In this guide, we will delve into the mechanics of Kafka, uncovering its core components. We will also explore how Golang’s features complement Kafka’s capabilities. Our journey will be lined with detailed examples and practical insights, serving both as a reference and a learning experience.
Before we proceed any further, it is important to note that this guide focuses primarily on the integration of Kafka with Golang for publishing and consuming messages. Running and operating a Kafka cluster involves intricate details and complexities that are beyond the scope of this article. Our aim is to provide a clear and practical approach to using Kafka within Golang applications, rather than delving into the specifics of Kafka cluster management. For those interested in Kafka cluster operations, numerous comprehensive resources and official documentation are available for a more in-depth exploration.
With this understanding, let’s move forward into the heart of integrating Kafka with Golang.
Kafka and Golang - A Perfect Match? #
Apache Kafka is an open-source streaming platform used to build high-performance data pipelines, streaming analytics, and event-driven applications. It is a distributed system consisting of servers and clients: the servers are known as brokers, and the clients are either producers or consumers. Designed to be highly scalable, fault-tolerant, and fast, Kafka is used by many companies, including Cloudflare, Uber, Netflix, and LinkedIn.
Understanding Kafka #
Kafka, at its core, is a distributed streaming platform. It is designed to handle large volumes of data in real time. Initially conceived at LinkedIn, it has grown into a critical component for managing streaming data in countless industries. Kafka operates on a publish-subscribe model, with key components including producers, consumers, and brokers.
The diagram below shows the high-level architecture of Kafka.
Some of the key components of Kafka’s architecture include:
- Producers - Producers are clients that publish data to Kafka topics.
- Consumers - Consumers are clients that subscribe to Kafka topics and consume data.
- Brokers - Brokers are servers that manage the persistence and replication of data.
- Topics - Topics are categories that data is published to and consumed from.
Why Golang? #
Golang is a statically typed, compiled programming language designed at Google. Golang is known for its simple syntax and powerful concurrency model, making it particularly suited to distributed systems. When it comes to Kafka integration, Golang’s features offer several benefits:
- Efficiency and Performance - Golang’s efficient, compiled execution model helps client applications keep pace with Kafka’s throughput and scalability.
- Simplicity and Readability - The straightforward syntax makes the implementation of Kafka clients more maintainable.
- Robust Concurrency Model - Golang’s goroutines and channels provide a robust environment for handling concurrent Kafka streams.
When it comes to interacting with Kafka in Golang, there are several options available. The most popular are:
- sarama - A Golang library for Apache Kafka.
- confluent-kafka-go - Confluent’s Golang client, which wraps the librdkafka C library.
- kafka-go - A Kafka library by Segment.
- franz-go - A feature-complete Kafka library in pure Go.
Each of these libraries has its own strengths and weaknesses. For this guide, we will focus on the franz-go library: a pure Go Kafka client with no dependencies on C bindings.
Crafting the Kafka Producer in Golang #
Now that we have a better understanding of Kafka and Golang, let’s explore how we can integrate the two. We will start by building a Kafka producer in Golang. A Kafka producer is an application that publishes data to Kafka topics, a process critical for feeding data into the Kafka ecosystem.
Ensure that you have Kafka installed and running. If you don’t have Kafka installed, you can follow the official quickstart guide to get up and running. Another alternative is to use Redpanda, a Kafka-compatible event streaming platform built for modern architectures. Redpanda is a drop-in replacement for Kafka, and it is built for performance and scale. You can find the installation instructions for Redpanda here.
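If you would rather not install Kafka locally, a single-node Redpanda container is a quick way to get a broker listening on localhost:9092. The image name and flags below follow the Redpanda quickstart at the time of writing and may differ for your version, so verify them against the Redpanda documentation before running:

$ docker run -d --name=redpanda -p 9092:9092 \
    docker.redpanda.com/redpandadata/redpanda:latest \
    redpanda start --smp 1 --overprovisioned --node-id 0 \
    --kafka-addr PLAINTEXT://0.0.0.0:9092 \
    --advertise-kafka-addr PLAINTEXT://localhost:9092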
Setting up a Golang Kafka Producer #
Getting set up with the twmb/franz-go library is straightforward. We can install the library using the go get command:
$ go get github.com/twmb/franz-go@latest
The first step in creating a Kafka producer is initialising a Kafka client. We can do this by creating a new client with the kgo.NewClient function. This function takes a variadic list of options that can be used to configure the client.
package main

import (
    "github.com/twmb/franz-go/pkg/kgo"
)

func main() {
    opts := []kgo.Opt{
        kgo.SeedBrokers("localhost:9092"),   // broker(s) used to bootstrap the cluster connection
        kgo.DefaultProduceTopic("my-topic"), // topic used when a record does not specify one
        kgo.ClientID("my-client-id"),        // identifies this client in broker logs and metrics
    }

    client, err := kgo.NewClient(opts...)
    if err != nil {
        // TODO: handle/log this error
        return
    }
    defer client.Close()
}
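The options above are the bare minimum for this guide. franz-go exposes many more producer options; as an illustration (double-check the option names against the franz-go documentation for your version), the sketch below tunes durability and batching, and assumes "time" has been added to the imports:

opts := []kgo.Opt{
    kgo.SeedBrokers("localhost:9092"),
    kgo.DefaultProduceTopic("my-topic"),
    kgo.ClientID("my-client-id"),
    // Wait for all in-sync replicas to acknowledge each write (strongest durability).
    kgo.RequiredAcks(kgo.AllISRAcks()),
    // Linger briefly so multiple records can be batched into one request.
    kgo.ProducerLinger(50 * time.Millisecond),
}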
Now that we have a Kafka client, we can start producing messages to Kafka. This is really simple to do with the franz-go library, but first we need to create our message.
record := &kgo.Record{
    Value: []byte("Hello World"),
    Topic: "my-topic",
}
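Value and Topic are not the only fields available on a record. As a brief aside, a record can also carry a key, which determines the partition it is routed to, and headers for metadata; both are fields of the kgo.Record type:

record := &kgo.Record{
    Key:   []byte("user-42"), // records with the same key are routed to the same partition
    Value: []byte("Hello World"),
    Topic: "my-topic",
    Headers: []kgo.RecordHeader{
        {Key: "source", Value: []byte("getting-started-guide")},
    },
}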
Now that we have our kgo.Record, which contains our message and the topic we want to send it to, we can produce the message to Kafka. (Since we configured kgo.DefaultProduceTopic, the Topic field could also be left empty.)
if err := client.ProduceSync(context.Background(), record).FirstErr(); err != nil {
    // TODO: handle/log this error
    return
}
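ProduceSync blocks until the broker acknowledges the record, which is simple but limits throughput. franz-go also provides an asynchronous Produce method that takes a promise callback, invoked once the record succeeds or permanently fails. A minimal sketch:

client.Produce(context.Background(), record, func(r *kgo.Record, err error) {
    // Called once the record has been acknowledged or has permanently failed.
    if err != nil {
        // TODO: handle/log this error
    }
})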
Let’s now put all of this together into a complete example.
package main

import (
    "context"

    "github.com/twmb/franz-go/pkg/kgo"
)

func main() {
    opts := []kgo.Opt{
        kgo.SeedBrokers("localhost:9092"),   // broker(s) used to bootstrap the cluster connection
        kgo.DefaultProduceTopic("my-topic"), // topic used when a record does not specify one
        kgo.ClientID("my-client-id"),        // identifies this client in broker logs and metrics
    }

    client, err := kgo.NewClient(opts...)
    if err != nil {
        // TODO: handle/log this error
        return
    }
    defer client.Close()

    record := &kgo.Record{
        Value: []byte("Hello World"),
        Topic: "my-topic",
    }

    // ProduceSync blocks until the record has been acknowledged by Kafka.
    if err := client.ProduceSync(context.Background(), record).FirstErr(); err != nil {
        // TODO: handle/log this error
        return
    }
}
It should be noted that the above example does not handle any errors. In a production system, you would want to handle errors and log them accordingly. You would also want to look at other best practices, such as message serialisation and graceful shutdowns.
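To make the graceful shutdown point concrete, here is one possible sketch (assuming os, os/signal, syscall, and time are imported): wait for an interrupt signal, flush any buffered records with client.Flush, and only then close the client.

// Wait for SIGINT/SIGTERM before shutting down.
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
<-sigs

// Flush any records still buffered in the client before closing.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if err := client.Flush(ctx); err != nil {
    // TODO: handle/log this error
}
client.Close()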
By following the steps above, we have successfully created a Kafka producer in Golang using the franz-go library. This producer is now capable of publishing messages to a Kafka topic, forming the basis of a data pipeline into Kafka. In the next section, we will turn our attention to the other side of this equation: consuming messages from Kafka.
Building the Kafka Consumer #
In the previous section, we explored how to create a Kafka producer in Golang. Now we will turn our attention to building a Kafka consumer. Building a robust Kafka consumer is essential for processing data from Kafka topics, especially high-throughput data streams.
The franz-go library is known for being a feature-complete Kafka library in pure Go. It provides a robust environment for handling Kafka streams, making it an ideal choice for building Kafka consumers in Golang.
Setting up a Golang Kafka Consumer #
Similar to the Kafka producer, we start by creating a Kafka client with the kgo.NewClient function. This function takes a variadic list of options that can be used to configure the client. For a consumer, however, we need to specify a few additional options, such as the consumer group ID and the topics we want to consume from.
package main

import (
    "github.com/twmb/franz-go/pkg/kgo"
)

func main() {
    opts := []kgo.Opt{
        kgo.SeedBrokers("localhost:9092"),        // broker(s) used to bootstrap the cluster connection
        kgo.ClientID("consumer-client-id"),       // identifies this client in broker logs and metrics
        kgo.ConsumerGroup("my-group-identifier"), // consumer group this client joins
        kgo.ConsumeTopics("my-topic"),            // topic(s) to consume from
    }

    client, err := kgo.NewClient(opts...)
    if err != nil {
        // TODO: handle/log this error
        return
    }
    defer client.Close()
}
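As with the producer, these options are only the essentials. One more you may want early on is controlling where a brand-new consumer group starts reading. The option below is taken from franz-go, though it is worth confirming its exact form in the documentation for your version:

opts := []kgo.Opt{
    kgo.SeedBrokers("localhost:9092"),
    kgo.ClientID("consumer-client-id"),
    kgo.ConsumerGroup("my-group-identifier"),
    kgo.ConsumeTopics("my-topic"),
    // Start from the earliest offset when the group has no committed offsets yet.
    kgo.ConsumeResetOffset(kgo.NewOffset().AtStart()),
}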
Now that we have a Kafka client, we can start consuming messages from Kafka. This is really simple to do with the franz-go library.
for {
    fetches := client.PollFetches(context.Background())
    if errs := fetches.Errors(); len(errs) > 0 {
        // All errors are retried internally when fetching, but non-retriable errors
        // are returned from polls so that users can notice and take action.
        panic(fmt.Sprint(errs))
    }

    // We can iterate through a record iterator...
    iter := fetches.RecordIter()
    for !iter.Done() {
        record := iter.Next()
        fmt.Println(string(record.Value), "from an iterator!")
    }
}
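The record iterator is only one way to walk the fetched data; franz-go also offers helpers such as Fetches.EachRecord, which invokes a callback for every fetched record:

fetches.EachRecord(func(record *kgo.Record) {
    fmt.Println(string(record.Value), "from EachRecord!")
})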
Let’s now put all of this together into a complete example.
package main

import (
    "context"
    "fmt"

    "github.com/twmb/franz-go/pkg/kgo"
)

func main() {
    opts := []kgo.Opt{
        kgo.SeedBrokers("localhost:9092"),        // broker(s) used to bootstrap the cluster connection
        kgo.ClientID("consumer-client-id"),       // identifies this client in broker logs and metrics
        kgo.ConsumerGroup("my-group-identifier"), // consumer group this client joins
        kgo.ConsumeTopics("my-topic"),            // topic(s) to consume from
    }

    client, err := kgo.NewClient(opts...)
    if err != nil {
        // TODO: handle/log this error
        return
    }
    defer client.Close()

    for {
        // PollFetches blocks until at least one fetch is ready or the context is cancelled.
        fetches := client.PollFetches(context.Background())
        if errs := fetches.Errors(); len(errs) > 0 {
            // All errors are retried internally when fetching, but non-retriable errors
            // are returned from polls so that users can notice and take action.
            panic(fmt.Sprint(errs))
        }

        // We can iterate through a record iterator...
        iter := fetches.RecordIter()
        for !iter.Done() {
            record := iter.Next()
            fmt.Println(string(record.Value), "from an iterator!")
        }
    }
}
This example only shows consuming messages from a single topic, using a very simplistic polling loop. In a production system, you would want to handle errors, log them accordingly, and look at other best practices. The franz-go library provides a lot of flexibility and features for consuming messages from Kafka.
Some best practices that you would want to consider as well in a production system include:
- Offset Management - Ensure you are tracking which messages you have consumed, avoiding data loss or duplication (see the sketch after this list).
- Concurrency - Leverage Golang’s concurrency model to consume messages in parallel, enhancing throughput and efficiency.
- Error Handling - Implement robust error handling and cope with consumer group rebalances so your consumer remains stable under various conditions.
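As promised above, here is a sketch of manual offset management. It assumes kgo.DisableAutoCommit() has been added to the client options, and commits after each record purely for clarity; a production consumer would typically batch commits:

// With kgo.DisableAutoCommit() set, offsets are only committed explicitly.
for {
    fetches := client.PollFetches(context.Background())
    if errs := fetches.Errors(); len(errs) > 0 {
        // TODO: handle/log these errors
        continue
    }

    iter := fetches.RecordIter()
    for !iter.Done() {
        record := iter.Next()
        // Process the record first, then commit its offset so it is not redelivered.
        if err := client.CommitRecords(context.Background(), record); err != nil {
            // TODO: handle/log this error
        }
    }
}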
By following the steps above, you have created a simple Kafka consumer in Golang capable of reading and processing messages from Kafka topics. This consumer forms the foundation for any data processing pipeline that requires data from Kafka.
In this getting started guide, we have explored how to leverage Golang to produce and consume messages from Kafka.
Conclusion #
Through this guide, we have explored the versatile and feature-rich franz-go library for integrating Kafka with Golang. We’ve uncovered the synergy between Kafka’s scalable messaging capabilities and Golang’s efficiency and simplicity. This forms a powerful combination for building high-performance data pipelines and event-driven applications. Although producing messages was fairly straightforward, consuming messages requires more consideration, and the simplified example above is not suitable for production. Kafka consumers require careful management of offsets and error handling to ensure data is processed correctly.
I have collected the above examples together, along with a Docker Compose file that runs Redpanda and a console for managing it, so that you can run the examples yourself. You can find the code and instructions for running the examples here.