Table of Contents
1. Introduction
2. Producer API
3. Consumer API
3.1. Low-level API
3.2. High-level API
1. Introduction
The foundation of Kafka's powerful application layer is two basic APIs for accessing storage: the Producer API for writing events and the Consumer API for reading events. The APIs for integration and stream processing are built on top of these two.
2. Producer API
The Producer API wraps the two low-level producers - kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer.
class Producer {
  /* Sends the data, partitioned by key to the topic using either the */
  /* synchronous or the asynchronous producer */
  public void send(kafka.javaapi.producer.ProducerData<K,V> producerData);

  /* Sends a list of data, partitioned by key to the topic using either */
  /* the synchronous or the asynchronous producer */
  public void send(java.util.List<kafka.javaapi.producer.ProducerData<K,V>> producerData);

  /* Closes the producer and cleans up */
  public void close();
}
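As a rough usage sketch (not part of the original documentation): the constructor, the broker.list/serializer.class properties, and the ProducerData constructors used below are assumptions about this era of the client, so treat it as an outline rather than a definitive example.

import java.util.Collections;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("broker.list", "0:localhost:9092");                    // assumed broker id:host:port
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // assumed string encoder

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));

        // Unkeyed send: the partition is chosen for us.
        producer.send(new ProducerData<String, String>("my-topic", "hello kafka"));

        // Keyed send: the key is what a Partitioner (see below) gets to see.
        producer.send(new ProducerData<String, String>("my-topic", "user-42",
                Collections.singletonList("hello again")));

        producer.close();
    }
}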
The goal is to expose all the producer functionality through a single API to the client. The new producer:
1. can handle queueing/buffering of multiple producer requests and asynchronous dispatch of the batched data -
- kafka.producer.Producer provides the ability to batch multiple produce requests (producer.type=async), before serializing and dispatching them to the appropriate kafka broker partition. The size of the batch can be controlled by a few config parameters. As events enter a queue, they are buffered in a queue, until either queue.time or batch.size is reached. A background thread (kafka.producer.async.ProducerSendThread) dequeues the batch of data and lets the kafka.producer.EventHandler serialize and send the data to the appropriate kafka broker partition. A custom event handler can be plugged in through the event.handler config parameter. At various stages of this producer queue pipeline, it is helpful to be able to inject callbacks, either for plugging in custom logging/tracing code or custom monitoring logic. This is possible by implementing the kafka.producer.async.CallbackHandler interface and setting the callback.handler config parameter to that class. (A configuration sketch follows this list.)
2. handles the serialization of data through a user-specified Encoder -
interface Encoder<T> {
  public Message toMessage(T data);
}
The default is the no-op kafka.serializer.DefaultEncoder. (An example Encoder sketch follows this list.)
3. provides software load balancing through an optionally user-specified Partitioner -
The routing decision is influenced by the kafka.producer.Partitioner.
interface Partitioner<T> {
  int partition(T key, int numPartitions);
}
The partition API uses the key and the number of available broker partitions to return a partition id. This id is used as an index into a sorted list of broker_ids and partitions to pick a broker partition for the producer request. The default partitioning strategy is hash(key)%numPartitions. If the key is null, then a random broker partition is picked. A custom partitioning strategy can also be plugged in using the partitioner.class config parameter. (An example Partitioner sketch follows this list.)
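To make item 1 concrete, here is a small configuration sketch that switches the producer into async mode and tunes the batching queue. The property names (producer.type, batch.size, queue.time, event.handler, callback.handler) come from the text above; the values, the broker list, and the handler class names are illustrative assumptions only.

import java.util.Properties;
import kafka.producer.ProducerConfig;

public class AsyncProducerConfigSketch {
    public static ProducerConfig build() {
        Properties props = new Properties();
        props.put("broker.list", "0:localhost:9092");                    // assumed broker id:host:port
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // assumed string encoder
        props.put("producer.type", "async");   // route sends through the batching AsyncProducer
        props.put("batch.size", "200");        // dispatch once 200 events are buffered ...
        props.put("queue.time", "5000");       // ... or once 5000 ms have passed, whichever comes first
        // Optional pipeline hooks described above (class names are hypothetical placeholders):
        // props.put("event.handler", "com.example.MyEventHandler");
        // props.put("callback.handler", "com.example.MyCallbackHandler");
        return new ProducerConfig(props);
    }
}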
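For item 2, a minimal custom Encoder might just wrap a String payload in a Kafka Message. This is only a sketch; it assumes kafka.message.Message accepts a raw byte[] payload and that the class is wired in via the serializer.class property.

import kafka.message.Message;
import kafka.serializer.Encoder;

// Hypothetical example: serialize a String payload into a Kafka Message.
public class StringToMessageEncoder implements Encoder<String> {
    public Message toMessage(String data) {
        // Message wraps the raw payload bytes (platform default charset, for brevity).
        return new Message(data.getBytes());
    }
}

// Wired in through configuration, e.g.:
// props.put("serializer.class", "com.example.StringToMessageEncoder");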
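And for item 3, a custom Partitioner could route by a hash of the key, for example to keep all events for the same key on the same broker partition. Again a sketch only: the null-key fallback and the class name are choices made here, not defaults, and the class would be registered via partitioner.class.

import kafka.producer.Partitioner;

// Hypothetical example: stable routing by key hash. The built-in default behaves
// like hash(key)%numPartitions and picks a random partition for null keys.
public class KeyHashPartitioner implements Partitioner<String> {
    public int partition(String key, int numPartitions) {
        if (key == null) {
            return 0;                                            // choice made for this sketch
        }
        return (key.hashCode() & 0x7fffffff) % numPartitions;    // mask keeps the hash non-negative
    }
}

// Wired in through configuration, e.g.:
// props.put("partitioner.class", "com.example.KeyHashPartitioner");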
3. Consumer API
We have 2 levels of consumer APIs. The low-level "simple" API maintains a connection to a single broker and has a close correspondence to the network requests sent to the server. This API is completely stateless, with the offset being passed in on every request, allowing the user to maintain this metadata however they choose.
The high-level API hides the details of brokers from the consumer and allows consuming off the cluster of machines without concern for the underlying topology. It also maintains the state of what has been consumed. The high-level API also provides the ability to subscribe to topics that match a filter expression (i.e., either a whitelist or a blacklist regular expression).
3.1. Low-level API
class SimpleConsumer {
  /* Send fetch request to a broker and get back a set of messages. */
  public ByteBufferMessageSet fetch(FetchRequest request);

  /* Send a list of fetch requests to a broker and get back a response set. */
  public MultiFetchResponse multifetch(List<FetchRequest> fetches);

  /**
   * Get a list of valid offsets (up to maxSize) before the given time.
   * The result is a list of offsets, in descending order.
   * @param time: time in millisecs,
   * if set to OffsetRequest$.MODULE$.LATEST_TIME(), get from the latest offset available.
   * if set to OffsetRequest$.MODULE$.EARLIEST_TIME(), get from the earliest offset available.
   */
  public long[] getOffsetsBefore(String topic, int partition, long time, int maxNumOffsets);
}
The low-level API is used to implement the high-level API as well as being used directly for some of our offline consumers (such as the hadoop consumer) which have particular requirements around maintaining state.
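A minimal fetch loop against this low-level API might look like the sketch below. The SimpleConsumer and FetchRequest constructor arguments (host, port, socket timeout, buffer size; topic, partition, offset, max bytes) and the iteration over MessageAndOffset are assumptions based on clients of this era, not something stated in the snippet above.

import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class SimpleConsumerSketch {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), receive buffer size (bytes) - assumed signature
        SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 10000, 64 * 1024);

        long offset = 0L;   // the caller owns the offset; the API itself is stateless
        while (true) {
            // topic, partition, starting offset, max bytes to fetch - assumed signature
            FetchRequest request = new FetchRequest("my-topic", 0, offset, 1024 * 1024);
            ByteBufferMessageSet messages = consumer.fetch(request);

            for (MessageAndOffset messageAndOffset : messages) {
                // ... process messageAndOffset.message() ...
                offset = messageAndOffset.offset();   // remember where to fetch from next
            }
            // A real consumer would back off here when no new data arrives.
        }
    }
}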
3.2. High-level API
/* create a connection to the cluster */
ConsumerConnector connector = Consumer.create(consumerConfig);

interface ConsumerConnector {
  /**
   * This method is used to get a list of KafkaStreams, which are iterators over
   * MessageAndMetadata objects from which you can obtain messages and their
   * associated metadata (currently only topic).
   * Input: a map of <topic, #streams>
   * Output: a map of <topic, list of message streams>
   */
  public Map<String,List<KafkaStream>> createMessageStreams(Map<String,Int> topicCountMap);

  /**
   * You can also obtain a list of KafkaStreams, that iterate over messages
   * from topics that match a TopicFilter. (A TopicFilter encapsulates a
   * whitelist or a blacklist which is a standard Java regex.)
   */
  public List<KafkaStream> createMessageStreamsByFilter(TopicFilter topicFilter, int numStreams);

  /* Commit the offsets of all messages consumed so far. */
  public commitOffsets()

  /* Shut down the connector */
  public shutdown()
}
This API is centered around iterators, implemented by the KafkaStream class. Each KafkaStream represents the stream of messages from one or more partitions on one or more servers. Each stream is used for single threaded processing, so the client can provide the number of desired streams in the create call. Thus a stream may represent the merging of multiple server partitions (to correspond to the number of processing threads), but each partition only goes to one stream.
The createMessageStreams call registers the consumer for the topic, which results in rebalancing the consumer/broker assignment. The API encourages creating many topic streams in a single call in order to minimize this rebalancing. The createMessageStreamsByFilter call (additionally) registers watchers to discover new topics that match its filter. Note that each stream that createMessageStreamsByFilter returns may iterate over messages from multiple topics (i.e., if multiple topics are allowed by the filter).
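Putting the high-level API together, a consumer that asks for a single stream on one topic might look like the sketch below. The import locations, the zk.connect/groupid properties for ConsumerConfig, and the way the stream's blocking iterator is consumed are assumptions about this era of the client and are not spelled out in the interface above.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
// Assumed package locations for this era of the client API:
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerConnector;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

public class HighLevelConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181");    // assumed ZooKeeper connection string
        props.put("groupid", "my-consumer-group");    // assumed consumer group name
        ConsumerConfig consumerConfig = new ConsumerConfig(props);

        /* create a connection to the cluster */
        ConsumerConnector connector = Consumer.create(consumerConfig);

        // Ask for one stream (i.e. one processing thread) for "my-topic".
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("my-topic", 1);
        Map<String, List<KafkaStream>> streams = connector.createMessageStreams(topicCountMap);
        KafkaStream stream = streams.get("my-topic").get(0);

        // The stream's iterator blocks until more messages arrive.
        for (Object obj : stream) {
            MessageAndMetadata messageAndMetadata = (MessageAndMetadata) obj;
            // ... process messageAndMetadata.message(); the topic is available as metadata ...
            connector.commitOffsets();   // checkpoint progress (a real consumer would commit less often)
        }

        connector.shutdown();
    }
}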