Original article: https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
What is Flume NG?

Flume NG aims to be significantly simpler, smaller, and easier to deploy than Flume OG. In doing so, we do not commit to maintaining backward compatibility of Flume NG with Flume OG. At this time, we welcome feedback from anyone interested in testing Flume NG for correctness, ease of use, and possible integration with other systems.
What has changed?

The Flume NG (Next Generation) implementation retains many of the original concepts, but differs substantially from Flume OG (Original Generation). If you are familiar with Flume, here is what you may want to know.
- You still have sources and sinks, and they still do the same thing. They are now connected by channels.
- Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast but non-durable event delivery, and a file-based channel for durable event delivery.
- There are no more logical or physical nodes. All physical nodes are called agents, and agents can run zero or more sources and sinks.
- There is no master, and no ZooKeeper dependency. At this time, Flume runs with a simple file-based configuration system.
- Everything is a plugin, some end-user facing, some for tool and system developers. Pluggable components include channels, sources, sinks, interceptors, sink processors, and event serializers.
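To make the durability trade-off concrete, switching from the in-memory channel to the file-based channel is just a change of channel type in the agent's properties file. A minimal sketch, assuming an agent named agent1; the storage paths are illustrative assumptions:

```
# Fast but non-durable: events are lost if the agent process dies
agent1.channels.ch1.type = memory

# Or durable: events survive an agent restart (use one or the other;
# the checkpoint/data paths below are illustrative assumptions)
# agent1.channels.ch1.type = file
# agent1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
# agent1.channels.ch1.dataDirs = /var/lib/flume/data
```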
Getting Flume NG

Flume is available as source tarballs and binary artifacts on the download page. If you do not plan to create patches for Flume, a binary artifact is probably the easiest way to get started.

Building from source

To build from source you need git, the Sun JDK 1.6, Apache Maven 3.x, roughly 90MB of free local disk space, and an Internet connection.
1. Check out the source

```
$ git clone https://git-wip-us.apache.org/repos/asf/flume.git flume
$ cd flume
$ git checkout trunk
```
2. Compile the project

Building Apache Flume requires more memory than the default Maven configuration allows. We recommend setting the following Maven options:

```
export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
```

```
# Build the code and run the tests (note: use mvn install, not mvn package,
# because we deploy Jenkins SNAPSHOT jars daily, and Flume is a multi-module project)
$ mvn install

# ...or build without running the tests
$ mvn install -DskipTests
```

(Please note that Flume requires the Google Protocol Buffers compiler to be on the path for the build to succeed. You can download and install it by following the steps described here.)
This produces two kinds of packages in flume-ng-dist/target. They are:

- apache-flume-ng-dist-1.4.0-SNAPSHOT-bin.tar.gz - A binary distribution of Flume, ready to run
- apache-flume-ng-dist-1.4.0-SNAPSHOT-src.tar.gz - A source-only distribution of Flume

If you are a user and just want to run Flume, you most likely want the -bin version. Copy one out, decompress it, and you are ready to go.

```
$ cp flume-ng-dist/target/apache-flume-1.4.0-SNAPSHOT-bin.tar.gz .
$ tar -zxvf apache-flume-1.4.0-SNAPSHOT-bin.tar.gz
$ cd apache-flume-1.4.0-SNAPSHOT-bin
```
3. Create your properties file based on the working template (or create one from scratch)

```
$ cp conf/flume-conf.properties.template conf/flume.conf
```
4. (Optional) Create your flume-env.sh file based on the template (or create one from scratch).

The flume-ng executable looks for a file named "flume-env.sh" in the conf directory specified with the --conf/-c command-line option. One example use of flume-env.sh would be to specify debugging or profiling options via JAVA_OPTS when developing your own Flume NG components such as sources and sinks.

```
$ cp conf/flume-env.sh.template conf/flume-env.sh
```
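As a sketch of what flume-env.sh might contain, here is a hypothetical example that sizes the heap and attaches a remote debugger to the agent JVM (the heap sizes and debug port are illustrative assumptions, not recommended values):

```
# Picked up by bin/flume-ng and passed through to the agent JVM
JAVA_OPTS="-Xms256m -Xmx512m -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```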
5. Configure and run Flume NG

After you have configured Flume NG (see below), you can run it with the bin/flume-ng executable. This script has a number of arguments and modes.
Configuration

Flume uses a Java property file based configuration format. When running an agent, you are required to tell Flume which file to use via the -f <file> option (see above). The file can live anywhere, but traditionally - and in the future - the conf directory is the correct place for configuration files.
Let's start with a basic example. Copy and paste this into conf/flume.conf:

```
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define a logger sink that logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
```
This example creates a memory channel (i.e. an unreliable or "best effort" transport), an Avro RPC source, and a logger sink, and connects them together. Any events received by the Avro source are routed to channel ch1 and delivered to the logger sink. It is important to note that defining components is only the first half of configuring Flume; they must be activated by listing them in the <agent>.channels, <agent>.sources, and <agent>.sinks sections. Multiple sources, sinks, and channels may be listed, separated by spaces.

For full details, see the Javadoc for the org.apache.flume.conf.properties.PropertiesFileConfigurationProvider class.
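If an agent defines more than one component of a given kind, the activation lines simply list all of them separated by spaces. A hypothetical sketch (the extra component names here are assumptions for illustration only; each would also need its own type and wiring properties):

```
agent1.sources = avro-source1 netcat-source1
agent1.channels = ch1 ch2
agent1.sinks = log-sink1 file-sink1
```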
Here is a list of the sources, sinks, and channels implemented at this time. Each plugin has its own optional and required configuration properties, so please see the Javadocs (for now).
Component | Type | Description | Implementation Class
---|---|---|---
Channel | memory | In-memory, fast, non-durable event transport | MemoryChannel
Channel | file | A channel for reading, writing, mapping, and manipulating a file | FileChannel
Channel | jdbc | JDBC-based, durable event transport (Derby-based) | JDBCChannel
Channel | recoverablememory | A durable channel implementation that uses the local file system for its storage | RecoverableMemoryChannel
Channel | org.apache.flume.channel.PseudoTxnMemoryChannel | Mainly for testing purposes; not for production use | PseudoTxnMemoryChannel
Channel | (custom type as FQCN) | Your own Channel implementation | (custom FQCN)
Source | avro | Avro Netty RPC event source | AvroSource
Source | exec | Execute a long-lived Unix process and read from stdout | ExecSource
Source | netcat | Netcat style TCP event source | NetcatSource
Source | seq | Monotonically incrementing sequence generator event source | SequenceGeneratorSource
Source | org.apache.flume.source.StressSource | Mainly for testing purposes; not for production use. Serves as a continuous source of events where each event has the same payload. The payload consists of some number of bytes (specified by the size property, defaults to 500) where each byte has the signed value Byte.MAX_VALUE (0x7F, or 127). | org.apache.flume.source.StressSource
Source | syslogtcp | | SyslogTcpSource
Source | syslogudp | | SyslogUDPSource
Source | org.apache.flume.source.avroLegacy.AvroLegacySource | | AvroLegacySource
Source | org.apache.flume.source.thriftLegacy.ThriftLegacySource | | ThriftLegacySource
Source | org.apache.flume.source.scribe.ScribeSource | | ScribeSource
Source | (custom type as FQCN) | Your own Source implementation | (custom FQCN)
Sink | hdfs | Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more) | HDFSEventSink
Sink | org.apache.flume.sink.hbase.HBaseSink | A simple sink that reads events from a channel and writes them to HBase | org.apache.flume.sink.hbase.HBaseSink
Sink | org.apache.flume.sink.hbase.AsyncHBaseSink | | org.apache.flume.sink.hbase.AsyncHBaseSink
Sink | logger | Log events at INFO level via configured logging subsystem (log4j by default) | LoggerSink
Sink | avro | Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection) | AvroSink
Sink | file_roll | | RollingFileSink
Sink | irc | | IRCSink
Sink | null | /dev/null for Flume - blackhole all events received | NullSink
Sink | (custom type as FQCN) | Your own Sink implementation | (custom FQCN)
ChannelSelector | replicating | | ReplicatingChannelSelector
ChannelSelector | multiplexing | | MultiplexingChannelSelector
ChannelSelector | (custom type) | Your own ChannelSelector implementation | (custom FQCN)
SinkProcessor | default | | DefaultSinkProcessor
SinkProcessor | failover | | FailoverSinkProcessor
SinkProcessor | load_balance | Provides the ability to load-balance flow over multiple sinks | LoadBalancingSinkProcessor
SinkProcessor | (custom type as FQCN) | Your own SinkProcessor implementation | (custom FQCN)
Interceptor$Builder | host | | HostInterceptor$Builder
Interceptor$Builder | timestamp | TimestampInterceptor | TimestampInterceptor$Builder
Interceptor$Builder | static | | StaticInterceptor$Builder
Interceptor$Builder | regex_filter | | RegexFilteringInterceptor$Builder
Interceptor$Builder | (custom type as FQCN) | Your own Interceptor$Builder implementation | (custom FQCN)
EventSerializer$Builder | text | | BodyTextEventSerializer$Builder
EventSerializer$Builder | avro_event | | FlumeEventAvroEventSerializer$Builder
EventSerializer | org.apache.flume.sink.hbase.SimpleHbaseEventSerializer | | SimpleHbaseEventSerializer
EventSerializer | org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer | | SimpleAsyncHbaseEventSerializer
EventSerializer | org.apache.flume.sink.hbase.RegexHbaseEventSerializer | | RegexHbaseEventSerializer
HbaseEventSerializer | (custom type as FQCN) | Custom implementation of serializer for HBaseSink; your own HbaseEventSerializer implementation | (custom FQCN)
AsyncHbaseEventSerializer | (custom type as FQCN) | Custom implementation of serializer for the AsyncHBase sink; your own AsyncHbaseEventSerializer implementation | (custom FQCN)
EventSerializer$Builder | (custom type as FQCN) | Custom implementation of serializer for all sinks except HBaseSink and AsyncHBaseSink; your own EventSerializer$Builder implementation | (custom FQCN)
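As one example of wiring up a pluggable component from this table, the following sketch puts two sinks into a sink group managed by the load_balance sink processor. The group and sink names are assumptions for illustration; each sink would still need its own type and channel configuration:

```
agent1.sinkgroups = group1
agent1.sinkgroups.group1.sinks = sink1 sink2
agent1.sinkgroups.group1.processor.type = load_balance
```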
flume-ng lets you run a Flume NG agent or an Avro client, both of which are useful for testing and experiments. In either case, you must specify a command (e.g. agent or avro-client) and a conf directory (--conf <conf dir>). All other options are specified on the command line.

To start the Flume server with the flume.conf above:

```
$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
```
Note that the agent name, specified with -n agent1, must match an agent name given in -f conf/flume.conf.

Your output should look something like this:
```
$ bin/flume-ng agent --conf conf/ -f conf/flume.conf -n agent1
2012-03-16 16:36:11,918 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 1
2012-03-16 16:36:11,921 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:110)] Node manager starting
2012-03-16 16:36:11,928 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 10
2012-03-16 16:36:11,929 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:114)] Node manager started
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-03-16 16:36:11,930 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:87)] Configuration provider started
2012-03-16 16:36:11,930 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:189)] Checking file:conf/flume.conf for changes
2012-03-16 16:36:11,931 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:196)] Reloading configuration file:conf/flume.conf
2012-03-16 16:36:11,936 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.properties.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:225)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1] SOURCES: {avro-source1=ComponentConfiguration[avro-source1] CONFIG: {port=41414, channels=ch1, type=avro, bind=0.0.0.0} RUNNER: ComponentConfiguration[runner] CONFIG: {} } CHANNELS: {ch1=ComponentConfiguration[ch1] CONFIG: {type=memory} } SINKS: {log-sink1=ComponentConfiguration[log-sink1] CONFIG: {type=logger, channel=ch1} RUNNER: ComponentConfiguration[runner] CONFIG: {} }
2012-03-16 16:36:11,936 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:119)] Post-validation flume configuration contains configuation for agents: [agent1]
2012-03-16 16:36:11,937 (conf-file-poller-0) [DEBUG - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:67)] Creating instance of channel ch1 type memory
2012-03-16 16:36:11,944 (conf-file-poller-0) [DEBUG - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:73)] Creating instance of source avro-source1, type avro
2012-03-16 16:36:11,957 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:69)] Creating instance of sink log-sink1 type logger
2012-03-16 16:36:11,963 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:52)] Node configuration change:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:41414 } }} sinkRunners:{log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@79f6f296 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel@43b09468 } }
2012-03-16 16:36:11,974 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:122)] Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:41414 }
2012-03-16 16:36:11,975 (Thread-1) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:123)] Polling sink runner starting
2012-03-16 16:36:12,352 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.source.AvroSource.start(AvroSource.java:132)] Avro source started
```
flume-ng global options

Option | Description
---|---
--conf,-c <conf> | Use configs in the <conf> directory
--classpath,-C <cp> | Append to the classpath
--dryrun,-d | Do not actually start Flume, just print the command
-Dproperty=value | Set a JDK system property value
flume-ng agent options

When given the agent command, a Flume NG agent will be started with a given configuration file (required).

Option | Description
---|---
--conf-file,-f <file> | Indicate which configuration file you want to run with (required)
--name,-n <agentname> | Indicate the name of the agent to run (required)
flume-ng avro-client options

Runs an Avro client that sends either a file or data from stdin to a specified host and port where a Flume NG Avro source is listening.

Option | Description
---|---
--host,-H <hostname> | Specify the hostname of the Flume agent (may be localhost)
--port,-p <port> | Specify the port on which the Avro source is listening
--filename,-F <filename> | Send each line of <filename> to Flume (optional)
--headerFile,-R <file> | Header file, each line containing a key/value pair
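The header file passed via --headerFile is a plain text file with one key/value pair per line, attached as event headers. A hypothetical example (the header names and values here are illustrative assumptions):

```
host=web01.example.com
datacenter=us-east-1
```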
The Avro client treats each line (terminated by \n, \r, or \r\n) as a separate event. Think of the avro-client command as cat for Flume. For instance, the following creates one event per Linux user and sends it to Flume's avro source on localhost port 41414.

In a new window, type:

```
$ bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console
```
You should see something like this:

```
2012-03-16 16:39:17,124 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:175)] Finished
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:178)] Closing reader
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:183)] Closing transceiver
2012-03-16 16:39:17,129 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:73)] Exiting
```
And in your first window, where the server is running:

```
2012-03-16 16:39:16,738 (New I/O server boss #1 ([id: 0x49e808ca, /0:0:0:0:0:0:0:0:41414])) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] OPEN
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] BOUND: /127.0.0.1:41414
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] CONNECTED: /127.0.0.1:39577
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] DISCONNECTED
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] UNBOUND
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] CLOSED
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@5c1ae90c }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6aba4211 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6a47a0d4 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@48ff4cf }
...
```
Congratulations! You are running Apache Flume!