Flink 1.18 KafkaSink Connector in Practice
Table of Contents
- 01 Introduction
- 02 Connector Dependencies
- 2.1 Kafka connector dependency
- 2.2 Base dependency
- 03 Usage
- 04 Serializer
- 05 Fault Tolerance
- 06 Metrics
- 07 Project Source Walkthrough
- 7.1 Package structure
- 7.2 pom.xml dependencies
- 7.3 Configuration files
- 7.4 Creating the sink job
01 Introduction
KafkaSink writes a data stream to one or more Kafka topics.
Full working example source, ready to clone and run: https://gitee.com/shawsongyue/aurora.git
Module: aurora_flink_connector_kafka
Main class: KafkaSinkStreamingJob
02 Connector Dependencies
2.1 Kafka connector dependency
```xml
<!-- Kafka connector: start -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>3.0.2-1.18</version>
</dependency>
<!-- Kafka connector: end -->
```
2.2 Base dependency
Without this dependency, the job fails immediately at startup with: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/connector/base/source/reader/RecordEmitter
```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-base</artifactId>
    <version>1.18.0</version>
</dependency>
```
03 Usage
The Kafka sink provides a builder class to construct an instance of KafkaSink:
```java
DataStream<String> stream = ...;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers(brokers)
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("topic-name")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
        .build();

stream.sinkTo(sink);
```
The following properties are required when building a KafkaSink:
- Bootstrap servers: setBootstrapServers(String)
- Record serializer: setRecordSerializer(KafkaRecordSerializationSchema)
- If you want DeliveryGuarantee.EXACTLY_ONCE semantics, you must also call setTransactionalIdPrefix(String) (see the sketch after this list).
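For orientation, here is a minimal sketch of the same builder with exactly-once enabled; the transactional ID prefix "my-app-tx" is a placeholder, not a value from the original project:

```java
// Sketch: same builder as above, but with exactly-once semantics.
// EXACTLY_ONCE also requires Flink checkpointing to be enabled (see section 05).
KafkaSink<String> exactlyOnceSink = KafkaSink.<String>builder()
        .setBootstrapServers(brokers)
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("topic-name")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // Required for EXACTLY_ONCE; must be unique per application (example value).
        .setTransactionalIdPrefix("my-app-tx")
        .build();
```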
04 Serializer
- When building the sink you must provide a KafkaRecordSerializationSchema that turns the incoming data into Kafka ProducerRecords. Flink offers a schema builder that covers the common pieces, such as key/value serialization, topic selection, and partitioning; you can also implement the interface yourself for finer-grained control.
- The value serialization method and the topic selection method are mandatory. In addition, setKafkaKeySerializer(Serializer) or setKafkaValueSerializer(Serializer) let you use a serializer provided by Kafka rather than by Flink.
```java
KafkaRecordSerializationSchema.builder()
        .setTopicSelector((element) -> {<your-topic-selection-logic>})
        .setValueSerializationSchema(new SimpleStringSchema())
        .setKeySerializationSchema(new SimpleStringSchema())
        .setPartitioner(new FlinkFixedPartitioner())
        .build();
```
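To make the topic-selection placeholder concrete, here is a hedged sketch of one possible rule; the routing condition and the topic names are illustrative assumptions, not part of the original project (TopicSelector lives in org.apache.flink.connector.kafka.sink):

```java
// Hypothetical routing rule: records that start with "ERROR" go to a dedicated topic.
TopicSelector<String> topicSelector =
        element -> element.startsWith("ERROR") ? "topic_error" : "topic_a";

KafkaRecordSerializationSchema<String> schema = KafkaRecordSerializationSchema.<String>builder()
        .setTopicSelector(topicSelector)
        .setValueSerializationSchema(new SimpleStringSchema())
        .build();
```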
05 Fault Tolerance
`KafkaSink` supports three delivery guarantees (`DeliveryGuarantee`). For `DeliveryGuarantee.AT_LEAST_ONCE` and `DeliveryGuarantee.EXACTLY_ONCE`, Flink checkpointing must be enabled. By default `KafkaSink` uses `DeliveryGuarantee.NONE`. The three guarantees are:
- DeliveryGuarantee.NONE: no guarantee at all. Messages may be lost because of Kafka broker issues and may be duplicated after a Flink failure.
- DeliveryGuarantee.AT_LEAST_ONCE: at checkpoint time the sink waits until all records in the Kafka buffer have been acknowledged by the Kafka producer. No messages are lost because of broker-side events, but duplicates may appear when Flink restarts, because old data is reprocessed.
- DeliveryGuarantee.EXACTLY_ONCE: in this mode the Kafka sink writes all data in transactions that are committed on checkpoints. As long as consumers read only committed data (see the Kafka consumer setting isolation.level), no duplicates are seen after a Flink restart. However, data only becomes visible once a checkpoint completes, so tune the checkpoint interval accordingly. Make sure the transaction ID prefix (transactionIdPrefix) is unique per application, so that transactions of different jobs do not interfere with each other. Also, it is strongly recommended to set the Kafka transaction timeout well above the maximum checkpoint interval plus the maximum restart time, otherwise Kafka will expire uncommitted transactions and data will be lost (see the sketch below).
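A hedged sketch of how those recommendations might be wired together; the checkpoint interval, transaction timeout, prefix, broker address, and topic are illustrative assumptions rather than values taken from the original project:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// EXACTLY_ONCE requires checkpointing; checkpoint every 10 s (illustrative value).
env.enableCheckpointing(10_000);

KafkaSink<String> exactlyOnceSink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")                      // placeholder broker address
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("topic_a")                                 // placeholder topic
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // Unique per application writing to the same Kafka cluster (example value).
        .setTransactionalIdPrefix("aurora-kafka-sink")
        // Keep the producer transaction timeout well above
        // max checkpoint interval + max expected restart time (15 min here, as an example;
        // it must not exceed the broker's transaction.max.timeout.ms).
        .setProperty("transaction.timeout.ms", "900000")
        .build();
```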
06 Metrics
The Kafka sink reports the following metrics in the scopes listed below.
| Scope | Metric | User Variables | Description | Type |
|---|---|---|---|---|
| Operator | currentSendTime | n/a | Time it took to send the last record; an instantaneous value for the most recent record. | Gauge |
07 Project Source Walkthrough
7.1 Package structure
7.2 pom.xml dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><groupId>com.xsy</groupId><artifactId>aurora_flink_connector_kafka</artifactId><version>1.0-SNAPSHOT</version><!--屬性設置--><properties><!--java_JDK版本--><java.version>11</java.version><!--maven打包插件--><maven.plugin.version>3.8.1</maven.plugin.version><!--編譯編碼UTF-8--><project.build.sourceEncoding>UTF-8</project.build.sourceEncoding><!--輸出報告編碼UTF-8--><project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding><!--json數(shù)據(jù)格式處理工具--><fastjson.version>1.2.75</fastjson.version><!--log4j版本--><log4j.version>2.17.1</log4j.version><!--flink版本--><flink.version>1.18.0</flink.version><!--scala版本--><scala.binary.version>2.11</scala.binary.version></properties><!--通用依賴--><dependencies><!-- json --><dependency><groupId>com.alibaba</groupId><artifactId>fastjson</artifactId><version>${fastjson.version}</version></dependency><!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java --><dependency><groupId>org.apache.flink</groupId><artifactId>flink-java</artifactId><version>${flink.version}</version></dependency><dependency><groupId>org.apache.flink</groupId><artifactId>flink-streaming-scala_2.12</artifactId><version>${flink.version}</version></dependency><!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --><dependency><groupId>org.apache.flink</groupId><artifactId>flink-clients</artifactId><version>${flink.version}</version></dependency><!--================================集成外部依賴==========================================--><!--集成日志框架 start--><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-slf4j-impl</artifactId><version>${log4j.version}</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-api</artifactId><version>${log4j.version}</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-core</artifactId><version>${log4j.version}</version></dependency><!--集成日志框架 end--><!--kafka依賴 start--><dependency><groupId>org.apache.flink</groupId><artifactId>flink-connector-kafka</artifactId><version>3.0.2-1.18</version></dependency><!--kafka依賴 end--><dependency><groupId>org.apache.flink</groupId><artifactId>flink-connector-base</artifactId><version>1.18.0</version></dependency></dependencies><!--編譯打包--><build><finalName>${project.name}</finalName><!--資源文件打包--><resources><resource><directory>src/main/resources</directory></resource><resource><directory>src/main/java</directory><includes><include>**/*.xml</include></includes></resource></resources><plugins><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-shade-plugin</artifactId><version>3.1.1</version><executions><execution><phase>package</phase><goals><goal>shade</goal></goals><configuration><artifactSet><excludes><exclude>org.apache.flink:force-shading</exclude><exclude>org.google.code.flindbugs:jar305</exclude><exclude>org.slf4j:*</exclude><excluder>org.apache.logging.log4j:*</excluder></excludes></artifactSet><filters><filter><artifact>*:*</artifact><excludes><exclude>META-INF/*.SF</exclude><exclude>META-INF/*.DSA</exclude><exclude>META-INF/*.RSA</exclude></excludes></filter></filters><transformers><transformer 
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"><mainClass>org.aurora.KafkaStreamingJob</mainClass></transformer></transformers></configuration></execution></executions></plugin></plugins><!--插件統(tǒng)一管理--><pluginManagement><plugins><!--maven打包插件--><plugin><groupId>org.springframework.boot</groupId><artifactId>spring-boot-maven-plugin</artifactId><version>${spring.boot.version}</version><configuration><fork>true</fork><finalName>${project.build.finalName}</finalName></configuration><executions><execution><goals><goal>repackage</goal></goals></execution></executions></plugin><!--編譯打包插件--><plugin><artifactId>maven-compiler-plugin</artifactId><version>${maven.plugin.version}</version><configuration><source>${java.version}</source><target>${java.version}</target><encoding>UTF-8</encoding><compilerArgs><arg>-parameters</arg></compilerArgs></configuration></plugin></plugins></pluginManagement></build><!--配置Maven項目中需要使用的遠程倉庫--><repositories><repository><id>aliyun-repos</id><url>https://maven.aliyun.com/nexus/content/groups/public/</url><snapshots><enabled>false</enabled></snapshots></repository></repositories><!--用來配置maven插件的遠程倉庫--><pluginRepositories><pluginRepository><id>aliyun-plugin</id><url>https://maven.aliyun.com/nexus/content/groups/public/</url><snapshots><enabled>false</enabled></snapshots></pluginRepository></pluginRepositories></project>
7.3 Configuration files
(1) application.properties
```properties
# Kafka cluster address
kafka.bootstrapServers=localhost:9092
# Kafka topic
kafka.topic=topic_a
# Kafka consumer group
kafka.group=aurora_group
```
(2) log4j2.properties
```properties
rootLogger.level=INFO
rootLogger.appenderRef.console.ref=ConsoleAppender
appender.console.name=ConsoleAppender
appender.console.type=CONSOLE
appender.console.layout.type=PatternLayout
appender.console.layout.pattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
log.file=D:\\tmp
```
7.4 Creating the sink job
```java
package com.aurora;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;

/**
 * @author 淺夏的貓
 * @description Kafka connector demo job
 * @datetime 22:21 2024/2/1
 */
public class KafkaSinkStreamingJob {

    private static final Logger logger = LoggerFactory.getLogger(KafkaSinkStreamingJob.class);

    public static void main(String[] args) throws Exception {

        // =============== 1. Read parameters ===============
        // Properties file path
        String propertiesFilePath = "E:\\project\\aurora_dev\\aurora_flink_connector_kafka\\src\\main\\resources\\application.properties";
        // Option 1: use Flink's built-in ParameterTool
        ParameterTool paramsMap = ParameterTool.fromPropertiesFile(propertiesFilePath);

        // =============== 2. Initialize Kafka parameters ===============
        String bootstrapServers = paramsMap.get("kafka.bootstrapServers");
        String topic = paramsMap.get("kafka.topic");

        // =============== 3. Build the KafkaSink ===============
        KafkaSink<String> sink = KafkaSink.<String>builder()
                // Kafka broker addresses
                .setBootstrapServers(bootstrapServers)
                // Record serializer
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic(topic)
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // At-least-once delivery
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();

        // =============== 4. Create the Flink execution environment ===============
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        ArrayList<String> listData = new ArrayList<>();
        listData.add("test");
        listData.add("java");
        listData.add("c++");
        DataStreamSource<String> dataStreamSource = env.fromCollection(listData);

        // =============== 5. Simple processing ===============
        SingleOutputStreamOperator<String> flatMap = dataStreamSource.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String record, Collector<String> collector) throws Exception {
                logger.info("Processing record: {}", record);
                collector.collect(record);
            }
        });
        // Sink operator: write the stream to Kafka
        flatMap.sinkTo(sink);

        // =============== 6. Start the job ===============
        // Enable checkpointing: trigger a checkpoint every 1000 ms
        env.enableCheckpointing(1000);
        // Advanced checkpoint options
        // Checkpointing mode: exactly-once (the default)
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // At least 500 ms between two checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
        // A checkpoint must finish within 1 minute, otherwise it is discarded
        env.getCheckpointConfig().setCheckpointTimeout(60000);
        // Only one checkpoint may be in flight at a time
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        // Keep checkpoint data when the job is cancelled, so it can be restored on demand
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // Checkpoint storage location (state and checkpoint data must go to durable storage)
        env.getCheckpointConfig().setCheckpointStorage("file:///E:/flink/checkPoint");

        env.execute();
    }
}
```
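Assuming a local Kafka broker is reachable at localhost:9092 as configured above and the topic_a topic exists, the three records written by the job can be verified with Kafka's console consumer, e.g. `bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic_a --from-beginning`.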