Kafka performance testing (repost): KAFKA 0.8 PRODUCER PERFORMANCE

Source: http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/

At LiveRamp, we constantly face scaling challenges as the volume of data that our infrastructure must handle continues to grow. One such challenge involves the logging system. At present we use Scribe as the transport mechanism to get logs from our webapp servers into our HDFS cluster. Scribe has served us well, but we are looking for alternatives because it has the following shortcomings:

  • It provides no support for compression
  • Consumers run in batches (map-reduce jobs) so real-time stats are not possible
  • It is no longer in active development

One of the most promising alternatives to Scribe that addresses all of the above is Kafka. We used Kafka to build a real-time stats system prototype during our last Hackweek, and saw enough promise to do some more in-depth testing. In this post we will focus on producer performance and scaling. Since we intend to put producers in our webapp servers, we are interested in both high overall throughput and low latency when sending individual messages.

WHY KAFKA 0.8

At the time of this writing, Kafka 0.8 has not been released, and documentation for it is scarce. However, since it is a backwards incompatible release that introduces a number of important features, it would make little sense for anyone just getting started with Kafka to invest development effort in the previous version.

All tests in this post were run on this revision of the 0.8 branch.

SETUP

BROKERS

We are starting with a modestly sized cluster of three machines. The specs are as follows:

Each machine has two pairs of disks in a mirroring configuration (RAID-1), which allow us to take advantage of the new multiple data directories feature introduced in Kafka 0.8. This makes it possible for a topic to have separate partitions on different disks, which should significantly increase the throughput per broker. This behavior is configured in the log.dirs setting as shown in the broker configuration below. We used default values for most other settings.
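The broker configuration referenced above appeared as an image in the original post. As a minimal sketch of the relevant server.properties fragment, assuming hypothetical mount points:

```properties
# One data directory per RAID-1 pair, so a topic's partitions can be
# spread across both disk sets (paths are illustrative, not from the post)
log.dirs=/data1/kafka-logs,/data2/kafka-logs
```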

As recommended by the Kafka documentation, we use a separate cluster of three dedicated machines for ZooKeeper. All machines are connected with gigabit links.

PRODUCERS

Our real use case involves a number of webapp servers each producing a relatively modest volume of logs. For this test, however, we used only a few dedicated producer machines running a custom-made tool that simulates the real load. Each producer was configured as follows:

The most important setting here is producer.type, which we set to async. Asynchronous mode is essential to get the most out of Kafka in terms of throughput. In this mode, each producer keeps an in-memory queue of messages that are sent in batches to the broker when a pre-configured batch size or time interval has been reached. This makes compression much more efficient, especially in a use case like ours in which log lines are string representations of JSON objects and the same keys are repeated over and over across lines. Having fewer, larger messages also helps to achieve better network utilization.
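The producer configuration was likewise shown as an image in the original post. A representative 0.8 async producer configuration along these lines (values illustrative, not the exact ones used in the tests) would be:

```properties
# Broker bootstrap list (illustrative hosts)
metadata.broker.list=broker1:9092,broker2:9092,broker3:9092
# Async mode: buffer messages in memory and send them in batches
producer.type=async
# Flush a batch when this many messages accumulate, or after this interval
batch.num.messages=200
queue.buffering.max.ms=5000
# Compress whole batches; repeated JSON keys across log lines compress well
compression.codec=gzip
```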

PERFORMANCE TOOLS

The Kafka distribution provides a producer performance tool that can be invoked with the script bin/kafka-producer-perf-test.sh. While this tool is very useful and flexible, we only used it to corroborate that the results obtained with our own custom tool made sense, for the following reasons:

  • Our tool is written in Java and uses the producer from the Java API.
  • While the message size is adjustable in the Kafka tool, we wanted to use messages with the same content structure as our real production logs.
  • Not all configuration parameters are exposed by the Kafka tool.
  • Our tool makes it possible to set a target throughput, which limits the rate at which threads push messages to the brokers. This is necessary to evaluate latency under realistic load conditions.
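The target-throughput limiting in the last point can be sketched as a simple pacing loop. This is a hedged illustration (the class and method names are ours, not from the actual tool), assuming a single sending thread:

```java
import java.util.concurrent.TimeUnit;

// Paces calls so that the long-run rate matches a target messages/sec.
// If the caller falls behind schedule, acquire() returns immediately
// until it has caught up (bursty catch-up, steady average rate).
public class ThroughputLimiter {
    private final long nanosPerMessage;
    private long nextSendTime;

    public ThroughputLimiter(double messagesPerSecond) {
        this.nanosPerMessage = (long) (TimeUnit.SECONDS.toNanos(1) / messagesPerSecond);
        this.nextSendTime = System.nanoTime();
    }

    /** Blocks until the next message may be sent at the target rate. */
    public void acquire() throws InterruptedException {
        long now = System.nanoTime();
        if (nextSendTime > now) {
            TimeUnit.NANOSECONDS.sleep(nextSendTime - now);
        }
        nextSendTime += nanosPerMessage;
    }

    public long getNanosPerMessage() {
        return nanosPerMessage;
    }

    public static void main(String[] args) throws InterruptedException {
        ThroughputLimiter limiter = new ThroughputLimiter(10_000); // 10k msgs/sec
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            limiter.acquire();
            // producer.send(...) would go here
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        System.out.printf("sent 1000 messages in %.3f s%n", elapsedSec);
    }
}
```

The latency measurements later in the post depend on this kind of pacing: without it, producers push as fast as they can and latency only reflects the saturated case.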

THROUGHPUT RESULTS

BASELINE PERFORMANCE

The Kafka documentation claims that producers can push about 50MB/sec through a system with a single broker as long as the batch size is not too small (the default value of 200 should be large enough). We were able to verify this claim very quickly for Kafka 0.7.2 by running the following command on a fresh installation

and obtaining the following results:

Running an equivalent command on a fresh installation of Kafka 0.8, however, gave us markedly worse results:

This is because in an effort to increase availability and durability, version 0.8 introduced intra-cluster replication support, and by default a producer waits for an acknowledgement response from the broker on every message (or batch of messages if async mode is used). It is possible to mimic the old behavior, but we were not very interested in that given that we intend to use replication in production.
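The acknowledgement behavior described above is controlled by the producer's request.required.acks setting in 0.8, so mimicking the old fire-and-forget behavior amounts to a one-line change (shown here for illustration; the tests in this post keep acks enabled):

```properties
# 0  = do not wait for any acknowledgement (mimics the pre-0.8 behavior)
# 1  = wait for the partition leader to write the message
# -1 = wait for all in-sync replicas to acknowledge
request.required.acks=0
```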

Performance degraded further once we started using a sample of real ~1KB sized log messages rather than the synthetic messages produced by the Kafka tool, resulting in a throughput of about 10 MB/sec.

All throughput numbers refer to uncompressed data.

NUMBER OF PRODUCERS

Our first test consisted of evaluating the impact of adding producer machines.

By adding identically configured producer machines, each pushing as many messages as it can, the overall throughput increases slightly. We also observed that throughput was distributed very evenly across the machines.

NUMBER OF PARTITIONS

Next, using all ten machines at our disposal we tested the effect of using different numbers of partitions.

Throughput increases very markedly at first as more brokers, and more disks on them, start hosting different partitions. Once all brokers and disks are in use, though, adding more partitions does not seem to have any further effect.

NUMBER OF REPLICAS

As we saw in the baseline performance tests, even using a single replica represents a big performance hit when compared to the old system which had no support for replication at all. We were interested in knowing how much of an additional hit we would get when using two and three replicas.

Fortunately, the extra performance hit turned out to be quite small.

NUMBER OF TOPICS

Finally, we tested the effect of increasing the number of topics. Our use case requires only a handful of topics, so we only experimented with small numbers.

Update: Michael G. Noll (see comment below) kindly pointed out that throughput could be improved by disabling ack messages, and provided this post as a reference for what could be expected. I reran some of the tests, and here are some preliminary results:

  • Using the most realistic scenario (10 partitions, 10 producer machines, 3 replicas, and 1-10 topics, same as the last chart above), I only obtained a very modest 12% increase in average throughput.
  • Since this is very different from the ~2x mentioned in that post, I did some more digging and found the following:
    • Using one producer machine and a topic with 10 partitions and 3 replicas, I was able to reproduce the 2x improvement (21 to 44 MB/sec) with both Kafka's tool and our own (setting ours to use synthetic messages).
    • When switching our tool back to real messages (a sample of production logs), that 2x became ~12%.
    • Therefore, it appears that the ack message is no longer a big bottleneck once real messages are used.

LATENCY RESULTS

Having an idea of the maximum throughput that can be achieved, we investigated the average and maximum latency of sending an individual message, which directly impacts the loading time in a browser hitting our webapp servers (this is the time for a thread using the Kafka producer to return from a call to send, NOT the full producer-broker-consumer cycle). To do this, we configured our tool to limit the rate at which it pushes messages according to a target throughput, and monitored latency at different throughput values.
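The measurement itself only needs a timer around each send call. A minimal sketch of such a tracker (our own hypothetical class; the producer call is elided):

```java
// Tracks average and maximum observed latency, in the style used to
// time each producer send() call (hypothetical helper, stdlib only).
public class LatencyStats {
    private long count;
    private long totalNanos;
    private long maxNanos;

    /** Record one observed send() duration in nanoseconds. */
    public void record(long nanos) {
        count++;
        totalNanos += nanos;
        if (nanos > maxNanos) {
            maxNanos = nanos;
        }
    }

    public double averageMs() {
        return count == 0 ? 0.0 : totalNanos / 1e6 / count;
    }

    public double maxMs() {
        return maxNanos / 1e6;
    }

    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();
        // Timing pattern wrapped around each send; the actual
        // producer.send(message) is elided in this sketch.
        long start = System.nanoTime();
        stats.record(System.nanoTime() - start);
        System.out.printf("avg=%.4f ms max=%.4f ms%n",
                stats.averageMs(), stats.maxMs());
    }
}
```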




The average latency is consistently below 0.02 ms for as long as the target throughput does not reach the maximum throughput. Unfortunately, the maximum latency hovers around 120 ms even for very low values of throughput. Once the producers start trying to push more messages than the brokers can handle, both average and maximum latency increase very dramatically.

Finally, we set queue.enqueue.timeout.ms to 0 in an attempt to prevent the Kafka producer from ever blocking on a call to send, hoping that this would decrease the maximum latency. Unfortunately, this had no effect whatsoever. We got identical results to the graphs above. The only difference was that, as expected, producers started throwing exceptions (kafka.common.QueueFullException) when the target throughput reached the maximum throughput. Also, we observed that once exceptions were thrown, the producers would hang indefinitely despite invoking the close method, and a call to System.exit was required to force the application to quit.
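That non-blocking attempt is a one-line change in the producer properties:

```properties
# -1 (the default) blocks send() indefinitely when the async queue is full;
# 0 makes send() throw kafka.common.QueueFullException immediately instead
queue.enqueue.timeout.ms=0
```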

CONCLUSIONS

Based on the numbers obtained above, we can draw the following preliminary conclusions:

  • Kafka 0.8 improves availability and durability at the expense of some performance.
  • Throughput seems to scale very well as the number of brokers and/or disks per broker increases.
  • Moderate numbers of producer machines and topics have no negative effect on throughput compared to a single producer and topic.
  • When configured in async mode, producers have very low average latency for each message sent, but there are outliers that take over 100 ms, even when operating at low overall throughput. This poses a problem for our use case.
  • Trying to push more data than the brokers can handle for any sustained period of time has catastrophic consequences, regardless of what timeout settings are used. In our use case this means that we need to either ensure we have spare capacity for spikes, or use something on top of Kafka to absorb spikes.

NEXT STEPS

We have just scratched the surface and there is still a lot of work to be done. Following is a list of some of the things we will probably look into:

  • Perform a similar analysis on consumers to make sure high throughput can be sustained regardless of how many consumers are active.
  • Experiment with custom partitioners so that each producer needs to communicate with only a subset of the brokers (if/when we add more broker nodes to the cluster).
  • Set up a mirroring configuration in which separate Kafka clusters in multiple cloud regions send their traffic to a master cluster.

FEEDBACK WELCOME

It is our hope that the information we provided will be useful for people considering using Kafka for the first time or switching from 0.7 to 0.8. If you have any questions, comments or suggestions please leave them below.
