中文亚洲精品无码_熟女乱子伦免费_人人超碰人人爱国产_亚洲熟妇女综合网

當(dāng)前位置: 首頁(yè) > news >正文

電腦網(wǎng)站建設(shè)網(wǎng)絡(luò)推廣公司口碑

電腦網(wǎng)站建設(shè),網(wǎng)絡(luò)推廣公司口碑,wordpress+培訓(xùn)模板下載,網(wǎng)站開(kāi)發(fā)試題庫(kù)AWS Dynamodb簡(jiǎn)介 Amazon DynamoDB 是一種完全托管式、無(wú)服務(wù)器的 NoSQL 鍵值數(shù)據(jù)庫(kù),旨在運(yùn)行任何規(guī)模的高性能應(yīng)用程序。DynamoDB能在任何規(guī)模下實(shí)現(xiàn)不到10毫秒級(jí)的一致響應(yīng),并且它的存儲(chǔ)空間無(wú)限,可在任何規(guī)模提供可靠的性能。DynamoDB 提…

AWS Dynamodb簡(jiǎn)介

  • Amazon DynamoDB 是一種完全托管式、無(wú)服務(wù)器的 NoSQL 鍵值數(shù)據(jù)庫(kù),旨在運(yùn)行任何規(guī)模的高性能應(yīng)用程序。
  • DynamoDB能在任何規(guī)模下實(shí)現(xiàn)不到10毫秒級(jí)的一致響應(yīng),并且它的存儲(chǔ)空間無(wú)限,可在任何規(guī)模提供可靠的性能。
  • DynamoDB 提供內(nèi)置安全性、連續(xù)備份、自動(dòng)多區(qū)域復(fù)制、內(nèi)存緩存和數(shù)據(jù)導(dǎo)出工具。

Redshift簡(jiǎn)介

  • Amazon Redshift是一個(gè)快速、功能強(qiáng)大、完全托管的PB級(jí)別數(shù)據(jù)倉(cāng)庫(kù)服務(wù)。用戶可以在剛開(kāi)始使用幾百GB的數(shù)據(jù),然后在后期擴(kuò)容到PB級(jí)別的數(shù)據(jù)容量。
  • Redshift是一種聯(lián)機(jī)分析處理OLAP(Online Analytics Processing)的類型,支持復(fù)雜的分析操作,側(cè)重決策支持,并且能提供直觀易懂的查詢結(jié)果。

資源準(zhǔn)備

VPC

  • vpc
    • cird block: 10.10.0.0/16
  • internet gateway
  • elastic ip address
  • nat gateway:使用elastic ip address作為public ip
  • public subnet
    • 三個(gè)Availability Zone
  • private subnet
    • 三個(gè)Availability Zone
  • public route table:public subnet關(guān)聯(lián)的route table
    • destination: 0.0.0.0/0 target: internet-gateway-id(允許與外界進(jìn)行通信)
    • destination:10.10.0.0/16 local(內(nèi)部通信)
  • private route table:private subnet關(guān)聯(lián)的route table
    • destination:10.10.0.0/16 local(內(nèi)部通信)
    • destination: 0.0.0.0/0 target: nat-gateway-id(允許內(nèi)部訪問(wèn)外界)
  • web server security group
    • 允許任意ip對(duì)443端口進(jìn)行訪問(wèn)
    • 允許自己的ipdui22端口進(jìn)行訪問(wèn),以便ssh到服務(wù)器上向數(shù)據(jù)庫(kù)插入數(shù)據(jù)
  • glue?redshift connection security group
    • 只包含一條self-referencing rule ,允許同一個(gè)security group對(duì)所有tcp端口進(jìn)行訪
    • 創(chuàng)建Glue connection時(shí)需要使用該security group:
    • Reference: glue connection security group must have a self-referencing rule to allow to allow AWS Glue components to communicate. Specifically, add or confirm that there is a rule of Type All TCP, Protocol is TCP, Port Range includes all ports, and whose Source is the same security group name as the Group ID.
  • private redshift security group
    • 允許vpc內(nèi)部(10.10.0.0/24)對(duì)5439端口進(jìn)行訪問(wèn)
    • 允許glue connection security group對(duì)5439端口進(jìn)行訪問(wèn)
  • public redshift security group
    • 允許vpc內(nèi)部(10.10.0.0/24)對(duì)5439端口進(jìn)行訪問(wèn)
    • 允許kenisis firehose所在region的public ip 對(duì)5439端口進(jìn)行訪問(wèn)
      • 13.58.135.96/27?for US East (Ohio)

      • 52.70.63.192/27?for US East (N. Virginia)

      • 13.57.135.192/27?for US West (N. California)

      • 52.89.255.224/27?for US West (Oregon)

      • 18.253.138.96/27?for AWS GovCloud (US-East)

      • 52.61.204.160/27?for AWS GovCloud (US-West)

      • 35.183.92.128/27?for Canada (Central)

      • 18.162.221.32/27?for Asia Pacific (Hong Kong)

      • 13.232.67.32/27?for Asia Pacific (Mumbai)

      • 13.209.1.64/27?for Asia Pacific (Seoul)

      • 13.228.64.192/27?for Asia Pacific (Singapore)

      • 13.210.67.224/27?for Asia Pacific (Sydney)

      • 13.113.196.224/27?for Asia Pacific (Tokyo)

      • 52.81.151.32/27?for China (Beijing)

      • 161.189.23.64/27?for China (Ningxia)

      • 35.158.127.160/27?for Europe (Frankfurt)

      • 52.19.239.192/27?for Europe (Ireland)

      • 18.130.1.96/27?for Europe (London)

      • 35.180.1.96/27?for Europe (Paris)

      • 13.53.63.224/27?for Europe (Stockholm)

      • 15.185.91.0/27?for Middle East (Bahrain)

      • 18.228.1.128/27?for South America (S?o Paulo)

      • 15.161.135.128/27?for Europe (Milan)

      • 13.244.121.224/27?for Africa (Cape Town)

      • 13.208.177.192/27?for Asia Pacific (Osaka)

      • 108.136.221.64/27?for Asia Pacific (Jakarta)

      • 3.28.159.32/27?for Middle East (UAE)

      • 18.100.71.96/27?for Europe (Spain)

      • 16.62.183.32/27?for Europe (Zurich)

      • 18.60.192.128/27?for Asia Pacific (Hyderabad)

VPC全部資源的serverless文件:

  • custom:bucketNamePrefix 替換為自己的創(chuàng)建的bucket
  • service: dynamodb-to-redshift-vpccustom:bucketNamePrefix: "jessica"provider:name: awsregion: ${opt:region, "ap-southeast-1"}stackName: ${self:service}deploymentBucket:name: com.${self:custom.bucketNamePrefix}.deploy-bucketserverSideEncryption: AES256resources:Parameters:VpcName:Type: StringDefault: "test-vpc"Resources:VPC:Type: "AWS::EC2::VPC"Properties:CidrBlock: "10.10.0.0/16"EnableDnsSupport: trueEnableDnsHostnames: trueInstanceTenancy: defaultTags:- Key: NameValue: !Sub "VPC_${VpcName}"# Internet GatewayInternetGateway:Type: "AWS::EC2::InternetGateway"Properties:Tags:- Key: NameValue: !Sub "VPC_${VpcName}_InternetGateway"VPCGatewayAttachment:Type: "AWS::EC2::VPCGatewayAttachment"Properties:VpcId: !Ref VPCInternetGatewayId: !Ref InternetGateway# web server security groupWebServerSecurityGroup:Type: AWS::EC2::SecurityGroupProperties:GroupDescription: Allow access from publicVpcId: !Ref VPCSecurityGroupIngress:- IpProtocol: tcpFromPort: 443ToPort: 443CidrIp: "0.0.0.0/0"Tags:- Key: NameValue: !Sub "VPC_${VpcName}_WebServerSecurityGroup"# public route tableRouteTablePublic:Type: "AWS::EC2::RouteTable"Properties:VpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_RouteTablePublic"RouteTablePublicInternetRoute:Type: "AWS::EC2::Route"DependsOn: VPCGatewayAttachmentProperties:RouteTableId: !Ref RouteTablePublicDestinationCidrBlock: "0.0.0.0/0"GatewayId: !Ref InternetGateway# public subnetSubnetAPublic:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [0, !GetAZs ""]CidrBlock: "10.10.0.0/24"MapPublicIpOnLaunch: trueVpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetAPublic"RouteTableAssociationAPublic:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetAPublicRouteTableId: !Ref RouteTablePublicSubnetBPublic:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [1, !GetAZs ""]CidrBlock: "10.10.32.0/24"MapPublicIpOnLaunch: trueVpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetBPublic"RouteTableAssociationBPublic:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetBPublicRouteTableId: !Ref RouteTablePublicSubnetCPublic:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [2, !GetAZs ""]CidrBlock: "10.10.64.0/24"MapPublicIpOnLaunch: trueVpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetCPublic"RouteTableAssociationCPublic:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetCPublicRouteTableId: !Ref RouteTablePublic# redshift security groupPrivateRedshiftSecurityGroup:Type: AWS::EC2::SecurityGroupProperties:GroupDescription: Allow access from inside vpcVpcId: !Ref VPCSecurityGroupIngress:- IpProtocol: tcpFromPort: 5439ToPort: 5439CidrIp: 10.10.0.0/24- IpProtocol: tcpFromPort: 5439ToPort: 5439SourceSecurityGroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupIdTags:- Key: NameValue: !Sub "VPC_${VpcName}_PrivateRedshiftSecurityGroup"# redshift security groupPublicRedshiftSecurityGroup:Type: AWS::EC2::SecurityGroupProperties:GroupDescription: Allow access from inside vpc and Kinesis Data Firehose CIDR blockVpcId: !Ref VPCSecurityGroupIngress:- IpProtocol: tcpFromPort: 5439ToPort: 5439CidrIp: 10.10.0.0/24- IpProtocol: tcpFromPort: 5439ToPort: 5439CidrIp: 13.228.64.192/27Tags:- Key: NameValue: !Sub "VPC_${VpcName}_PublicRedshiftSecurityGroup"GlueRedshiftConnectionSecurityGroup:Type: AWS::EC2::SecurityGroupProperties:GroupDescription: Allow self referring for all tcp portsVpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_GlueRedshiftConnectionSecurityGroup"GlueRedshiftConnectionSecurityGroupSelfReferringInboundRule:Type: "AWS::EC2::SecurityGroupIngress"Properties:GroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupIdIpProtocol: tcpFromPort: 0ToPort: 65535SourceSecurityGroupId: !GetAtt GlueRedshiftConnectionSecurityGroup.GroupIdSourceSecurityGroupOwnerId: !Sub "${aws:accountId}"# nat gatewayEIP:Type: "AWS::EC2::EIP"Properties:Domain: vpcNatGateway:Type: "AWS::EC2::NatGateway"Properties:AllocationId: !GetAtt "EIP.AllocationId"SubnetId: !Ref SubnetAPublic# private route tableRouteTablePrivate:Type: "AWS::EC2::RouteTable"Properties:VpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_RouteTablePrivate"RouteTablePrivateRoute:Type: "AWS::EC2::Route"Properties:RouteTableId: !Ref RouteTablePrivateDestinationCidrBlock: "0.0.0.0/0"NatGatewayId: !Ref NatGateway# private subnetSubnetAPrivate:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [0, !GetAZs ""]CidrBlock: "10.10.16.0/24"VpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetAPrivate"RouteTableAssociationAPrivate:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetAPrivateRouteTableId: !Ref RouteTablePrivateSubnetBPrivate:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [1, !GetAZs ""]CidrBlock: "10.10.48.0/24"VpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetBPrivate"RouteTableAssociationBPrivate:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetBPrivateRouteTableId: !Ref RouteTablePrivateSubnetCPrivate:Type: "AWS::EC2::Subnet"Properties:AvailabilityZone: !Select [2, !GetAZs ""]CidrBlock: "10.10.80.0/24"VpcId: !Ref VPCTags:- Key: NameValue: !Sub "VPC_${VpcName}_SubnetCPrivate"RouteTableAssociationCPrivate:Type: "AWS::EC2::SubnetRouteTableAssociation"Properties:SubnetId: !Ref SubnetCPrivateRouteTableId: !Ref RouteTablePrivateOutputs:VPC:Description: "VPC."Value: !Ref VPCExport:Name: !Sub "${self:provider.stackName}"SubnetsPublic:Description: "Subnets public."Value:!Join [",",[!Ref SubnetAPublic, !Ref SubnetBPublic, !Ref SubnetCPublic],]Export:Name: !Sub "${self:provider.stackName}-PublicSubnets"SubnetsPrivate:Description: "Subnets private."Value:!Join [",",[!Ref SubnetAPrivate, !Ref SubnetBPrivate, !Ref SubnetCPrivate],]Export:Name: !Sub "${self:provider.stackName}-PrivateSubnets"DefaultSecurityGroup:Description: "VPC Default Security Group"Value: !GetAtt VPC.DefaultSecurityGroupExport:Name: !Sub "${self:provider.stackName}-DefaultSecurityGroup"WebServerSecurityGroup:Description: "VPC Web Server Security Group"Value: !Ref WebServerSecurityGroupExport:Name: !Sub "${self:provider.stackName}-WebServerSecurityGroup"PrivateRedshiftSecurityGroup:Description: "The id of the RedshiftSecurityGroup"Value: !Ref PrivateRedshiftSecurityGroupExport:Name: !Sub "${self:provider.stackName}-PrivateRedshiftSecurityGroup"PublicRedshiftSecurityGroup:Description: "The id of the RedshiftSecurityGroup"Value: !Ref PublicRedshiftSecurityGroupExport:Name: !Sub "${self:provider.stackName}-PublicRedshiftSecurityGroup"GlueRedshiftConnectionSecurityGroup:Description: "The id of the self referring security group"Value: !Ref GlueRedshiftConnectionSecurityGroupExport:Name: !Sub "${self:provider.stackName}-GlueSelfRefringSecurityGroup"
    

Redshift Cluster

  • Private Cluster subnet group
    • 創(chuàng)建一個(gè)包含private subnet的private subnet group
  • Private Cluster:用于測(cè)試glue job同步數(shù)據(jù)到redshift,PubliclyAccessible必須設(shè)為false,否則glue job無(wú)法連接
    • ClusterSubnetGroupName
      • 使用private subnet group
    • VpcSecurityGroupIds
      • 使用private redshift security group
    • NodeType: dc2.large
    • ClusterType: single-node
    • PubliclyAccessible: false

  • Public Cluster subnet group
    • 創(chuàng)建一個(gè)包含public subnet的public subnet group
  • Public Cluster:用于測(cè)試glue job同步數(shù)據(jù)到redshift,PubliclyAccessible必須設(shè)為true,且security group允許kinesis firehose public ip對(duì)5439端口進(jìn)行訪問(wèn),否則firehose無(wú)法連接到redshift
    • ClusterSubnetGroupName
      • 使用public subnet group
    • VpcSecurityGroupIds
      • 使用public redshift security group
    • NodeType: dc2.large
    • ClusterType: single-node
    • PubliclyAccessible: true

redshift全部資源的serverless文件:

  • custom:bucketNamePrefix 替換為自己的創(chuàng)建的bucket
  • service: dynamodb-to-redshift-redshiftcustom:bucketNamePrefix: "jessica"provider:name: awsregion: ${opt:region, "ap-southeast-1"}stackName: ${self:service}deploymentBucket:name: com.${self:custom.bucketNamePrefix}.deploy-bucketserverSideEncryption: AES256resources:Parameters:ServiceName:Type: StringDefault: dynamodb-to-redshiftResources:PrivateClusterSubnetGroup:Type: "AWS::Redshift::ClusterSubnetGroup"Properties:Description: Private Cluster Subnet GroupSubnetIds:Fn::Split:- ","- Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateSubnetsTags:- Key: NameValue: private-subnetPrivateCluster:Type: "AWS::Redshift::Cluster"Properties:ClusterIdentifier: test-data-sync-redshiftClusterSubnetGroupName: !Ref ClusterSubnetGroupVpcSecurityGroupIds:- Fn::ImportValue: !Sub ${ServiceName}-vpc-PrivateRedshiftSecurityGroupDBName: devMasterUsername: adminMasterUserPassword: Redshift_admin_2022NodeType: dc2.largeClusterType: single-nodePubliclyAccessible: falsePublicClusterSubnetGroup:Type: "AWS::Redshift::ClusterSubnetGroup"Properties:Description: Public Cluster Subnet GroupSubnetIds:Fn::Split:- ","- Fn::ImportValue: !Sub ${ServiceName}-vpc-PublicSubnetsTags:- Key: NameValue: public-subnetPublicCluster:Type: "AWS::Redshift::Cluster"Properties:ClusterIdentifier: test-data-sync-redshift-publicClusterSubnetGroupName: !Ref PublicClusterSubnetGroupVpcSecurityGroupIds:- Fn::ImportValue: !Sub ${ServiceName}-vpc-PublicRedshiftSecurityGroupDBName: devMasterUsername: adminMasterUserPassword: Redshift_admin_2022NodeType: dc2.largeClusterType: single-nodePubliclyAccessible: trueOutputs:PrivateRedshiftEndpoint:Description: "Redshift endpoint"Value: !GetAtt Cluster.Endpoint.AddressExport:Name: !Sub "${self:provider.stackName}-PrivateRedshiftEndpoint"PrivateRedshiftPort:Description: "Redshift port"Value: !GetAtt Cluster.Endpoint.PortExport:Name: !Sub "${self:provider.stackName}-PrivateRedshiftPort"PublicRedshiftEndpoint:Description: "Public Redshift endpoint"Value: !GetAtt PublicCluster.Endpoint.AddressExport:Name: !Sub "${self:provider.stackName}-PublicRedshiftEndpoint"PublicRedshiftPort:Description: "Public Redshift port"Value: !GetAtt PublicCluster.Endpoint.PortExport:Name: !Sub "${self:provider.stackName}-PublicRedshiftPort"
    

使用AWS Glue ETL?Job進(jìn)行同步

適用場(chǎng)景

  • 一次性整表同步
  • 對(duì)于典型的時(shí)間序列數(shù)據(jù)(當(dāng)前的數(shù)據(jù)寫入和讀取頻率高,越老的數(shù)據(jù)讀寫頻率越低),通常會(huì)采用為每個(gè)時(shí)間段(每天)創(chuàng)建一張表的方式來(lái)合理的分配WCU和RCU。
  • 如果在當(dāng)時(shí)時(shí)間段結(jié)束之后,需要對(duì)該時(shí)間段內(nèi)的所有數(shù)據(jù)進(jìn)行復(fù)雜的分析操作,則需要將dynamodb的整表同步到redshift

架構(gòu)

優(yōu)點(diǎn)

  • 使用AWS Glue Crawler可以自動(dòng)管理源表和目標(biāo)表的scheme,在Glue Job script中可以省去mapping的過(guò)程,Glue Job script代碼易維護(hù)

資源部署

  • Dynamodb table: 源數(shù)據(jù)表
  • IAM role for glue crawler,crawler需要連接dynamodb和redshift的權(quán)限以讀取表的scheme
  • Dynamodb glue catalog database:用于存儲(chǔ)crawler生成的dynamodb table scheme
  • Redshift glue catalog database:用于存儲(chǔ)crawler生成的redshift table scheme
  • Dynamodb glue crawler:用于讀取dynamodb表,生成對(duì)應(yīng)的dynamodb table scheme???????
  • Redshift glue crawler:用于讀取redshift表,生成對(duì)應(yīng)的redshift table scheme
  • Glue connection:glue job連接redshift需要用到的connection
  • IAM role for glue job:Glue job需要
  • S3 bucket for glue job
  • glue job

如何部署:

  • sls deploy -c glue-etl.yml
    #replace ${bucketNamePrefix} to your own glue bucket name crate in glue-etl.yml
    aws s3 cp dynamodb-to-redshift.py s3://com.${bucketNamePrefix}.glue-temp-bucket/script/

部署文件:glue-etl.yml

  • service: dynamodb-to-redshift-glue-etlcustom:bucketNamePrefix: "jessica"provider:name: awsregion: ${opt:region, "ap-southeast-1"}stackName: ${self:service}deploymentBucket:name: com.${self:custom.bucketNamePrefix}.deploy-bucketserverSideEncryption: AES256resources:Parameters:DynamodbTableName:Type: StringDefault: "TestSyncToRedshift"ServiceName:Type: StringDefault: dynamodb-to-redshiftGlueBucketName:Type: StringDefault: com.${self:custom.bucketNamePrefix}.glue-etl-temp-bucketResources:TestTable:Type: AWS::DynamoDB::TableProperties:TableName: !Sub ${DynamodbTableName}BillingMode: PAY_PER_REQUESTAttributeDefinitions:- AttributeName: pkAttributeType: S- AttributeName: skAttributeType: SKeySchema:- AttributeName: pkKeyType: HASH- AttributeName: skKeyType: RANGECrawlerRole:Type: AWS::IAM::RoleProperties:RoleName: CrawlerRoleAssumeRolePolicyDocument:Version: "2012-10-17"Statement:- Effect: "Allow"Principal:Service:- "glue.amazonaws.com"Action:- "sts:AssumeRole"ManagedPolicyArns:- arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole- arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess- arn:aws:iam::aws:policy/AmazonRedshiftFullAccess- arn:aws:iam::aws:policy/AmazonS3FullAccessDynamodbDatabase:Type: AWS::Glue::DatabaseProperties:CatalogId: !Ref AWS::AccountIdDatabaseInput:Name: "dynamodb-database"DynamodbCrawler:Type: AWS::Glue::CrawlerProperties:Name: "dynamodb-crawler"Configuration:Role: !GetAtt CrawlerRole.ArnDatabaseName: !Ref DynamodbDatabaseTargets:DynamoDBTargets:- Path: !Sub ${DynamodbTableName}SchemaChangePolicy:UpdateBehavior: "UPDATE_IN_DATABASE"DeleteBehavior: "LOG"Schedule:ScheduleExpression: cron(0/10 * * * ? *) # run every 10 minutesGlueRedshiftConnection:Type: AWS::Glue::ConnectionProperties:CatalogId: !Sub "${aws:accountId}"ConnectionInput:Name: ${self:service}-redshift-connectionConnectionType: JDBCMatchCriteria: []PhysicalConnectionRequirements:SecurityGroupIdList:- Fn::ImportValue: !Sub ${ServiceName}-vpc-GlueSelfRefringSecurityGroupSubnetId:Fn::Select:- 1- Fn::Split:- ","- Fn::ImportValue: !Sub "${ServiceName}-vpc-PrivateSubnets"ConnectionProperties:JDBC_CONNECTION_URL:Fn::Join:- ""- - "jdbc:redshift://"- Fn::ImportValue: !Sub ${ServiceName}-redshift-PrivateRedshiftEndpoint- ":"- Fn::ImportValue: !Sub ${ServiceName}-redshift-PrivateRedshiftPort- "/dev"JDBC_ENFORCE_SSL: falseUSERNAME: adminPASSWORD: Redshift_admin_2022RedshiftDatabase:Type: AWS::Glue::DatabaseProperties:CatalogId: !Ref AWS::AccountIdDatabaseInput:Name: "redshift-database"RedshiftCrawler:Type: AWS::Glue::CrawlerProperties:Name: "redshift-crawler"Configuration:Role: !GetAtt CrawlerRole.ArnDatabaseName: !Ref RedshiftDatabaseTargets:JdbcTargets:- ConnectionName: !Ref GlueRedshiftConnectionPath: dev/public/test_sync_to_redshiftSchemaChangePolicy:UpdateBehavior: "UPDATE_IN_DATABASE"DeleteBehavior: "LOG"RedshiftGlueJobRole:Type: AWS::IAM::RoleProperties:RoleName: RedshiftGlueJobRoleAssumeRolePolicyDocument:Version: "2012-10-17"Statement:- Effect: AllowPrincipal:Service:- glue.amazonaws.comAction: sts:AssumeRoleManagedPolicyArns:- arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole- arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess- arn:aws:iam::aws:policy/AmazonRedshiftFullAccess- arn:aws:iam::aws:policy/AmazonS3FullAccess- arn:aws:iam::aws:policy/CloudWatchLogsFullAccessGlueTempBucket:Type: AWS::S3::BucketProperties:BucketName: !Sub ${GlueBucketName}GlueJob:Type: AWS::Glue::JobProperties:Name: dynamodb-to-redshift-glue-etl-jobRole: !GetAtt RedshiftGlueJobRole.ArnCommand:Name: glueetlScriptLocation: !Sub "s3://${GlueBucketName}/script/dynamodb-to-redshift.py"PythonVersion: 3DefaultArguments:--TempDir: !Sub "s3://${GlueBucketName}/tmp/dynamodb-to-redshift/"WorkerType: G.1XNumberOfWorkers: 2GlueVersion: "3.0"Connections:Connections:- !Ref GlueRedshiftConnection
    

glue job腳本:dynamodb-to-redshift.py

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Jobparams = ['JOB_NAME','TempDir',
]args = getResolvedOptions(sys.argv, params)
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)DynamoDBtable_node1 = glueContext.create_dynamic_frame.from_catalog(database="dynamodb-database",table_name="testsynctoredshift",transformation_ctx="DynamoDBtable_node1",
)RedshiftCluster_node2 = glueContext.write_dynamic_frame.from_catalog(frame=DynamoDBtable_node1,database="redshift-database",table_name="dev_public_test_sync_to_redshift",redshift_tmp_dir=args["TempDir"],transformation_ctx="RedshiftCluster_node2",
)job.commit()

測(cè)試

  1. insert some data to dynamodb table with aws web console first, otherwise, the crawler can not detect the table scheme

    2. run dynamodb-crawler, after run success, you can see the database and table in [glue console](https://ap-southeast-1.console.aws.amazon.com/glue/home?region=ap-southeast-1#catalog:tab=tables)

    3. create redshift table with [Redshift query editor v2](https://ap-southeast-1.console.aws.amazon.com/sqlworkbench/home?region=ap-southeast-1#/client)

    ```

    CREATE TABLE "public"."test_sync_to_redshift"(pk varchar(200) not null, sk varchar(200) NOT NULL, primary key(pk, sk));

    ```

    4. run redshift-crawler, if encounter no valid connection error, please update password in the redshift-connection manually with aws console, don't know why the password is not correct when deploy with cloudformation. After run success, you can see the database and table in [glue console](https://ap-southeast-1.console.aws.amazon.com/glue/home?region=ap-southeast-1#catalog:tab=tables)

    5. run glue etl job, after run success, you can check data in redshift table with [Redshift query editor v2](https://ap-southeast-1.console.aws.amazon.com/sqlworkbench/home?region=ap-southeast-1#/client).

    This glue etl job will `insert all data in dynamodb table` to redshift table directly, as for redshift, [primary key, and foreign key constraints are informational only; they are not enforced by Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html), so if you run the jon serval times, you will see duplicate data with some primary key in redshift table.

使用AWS Glue Streaming?Job進(jìn)行同步

適用場(chǎng)景

  • 持續(xù)增量同步
  • 表的操作支持插入,更新和刪除

架構(gòu)圖

資源部署

  • Dynamodb表
  • VPC
  • Redshift Cluster
  • Glue Crawler
  • GlueJob

優(yōu)點(diǎn)

  • 可以支持表的插入,更新和刪除操作的同步

缺點(diǎn)

使用AWS kinesis Firehose進(jìn)行同步

適用場(chǎng)景

  • 持續(xù)增量同步
  • 表的操作只支持插入,部分支持更新,不支持刪除,比如記錄傳感器每秒收集的數(shù)據(jù),記錄網(wǎng)站用戶的行為事件
    • 由于kinesis firehose是通過(guò)Redshift COPY命令與redshift進(jìn)行集成的,而redshift是不保證primary key的唯一性,對(duì)redshift來(lái)說(shuō),primary key只是提供信息,并沒(méi)有保證primary key的唯一性,如果在COPY命令的源數(shù)據(jù)中包含多條primary key相同的數(shù)據(jù)(比如對(duì)一條數(shù)據(jù)進(jìn)行多次修改),則會(huì)導(dǎo)致redshift表出現(xiàn)多條primary key相同的數(shù)據(jù)。
    • 部分支持更新的意思就是如果多條primary key相同的數(shù)據(jù)對(duì)你的業(yè)務(wù)邏輯沒(méi)有影響,那也可以使用AWS kinesis Firehose進(jìn)行同步,如果多條primary key對(duì)業(yè)務(wù)邏輯有影響,那就不可以使用
    • 由于kinesis firehose是通過(guò)Redshift COPY命令與redshift進(jìn)行集成的,COPY命令是不支持刪除的

架構(gòu)圖

資源部署

Reference

Setting up networking for development for AWS Glue - AWS Glue

http://www.risenshineclean.com/news/50221.html

相關(guān)文章:

  • 網(wǎng)站建設(shè)模板一次收費(fèi)如何搜索網(wǎng)頁(yè)關(guān)鍵詞
  • 重慶建網(wǎng)站多少錢百度指數(shù)是免費(fèi)的嗎
  • 網(wǎng)站廣告條動(dòng)畫 怎么做深圳關(guān)鍵詞排名優(yōu)化系統(tǒng)
  • 安全的響應(yīng)式網(wǎng)站建設(shè)聊城網(wǎng)站推廣公司
  • 揭陽(yáng)建設(shè)網(wǎng)站北京網(wǎng)站建設(shè)運(yùn)營(yíng)
  • 公司做賣網(wǎng)站有前景嗎最佳磁力吧ciliba搜索引擎
  • 站群管理系統(tǒng)cms推廣普通話的宣傳內(nèi)容
  • 青海網(wǎng)網(wǎng)站建設(shè)寧波網(wǎng)站關(guān)鍵詞優(yōu)化公司
  • 網(wǎng)站怎么做uc整合百度競(jìng)價(jià)排名的使用方法
  • 找事做網(wǎng)站怎么弄百度網(wǎng)頁(yè)版登錄入口官網(wǎng)
  • 泰安集團(tuán)網(wǎng)站建設(shè)元搜索引擎有哪些
  • 個(gè)人攝影網(wǎng)站制作設(shè)計(jì)培訓(xùn)學(xué)院
  • 鄧州企業(yè)網(wǎng)站有鏈接的網(wǎng)站
  • 商城網(wǎng)站建設(shè)預(yù)算免費(fèi)網(wǎng)頁(yè)制作網(wǎng)站
  • magento建站是傻瓜式的嗎今日疫情最新情況
  • 笑話類網(wǎng)站用什么做網(wǎng)絡(luò)運(yùn)營(yíng)怎么做
  • 長(zhǎng)沙市網(wǎng)站建設(shè)推廣谷歌推廣技巧
  • 南京專業(yè)網(wǎng)站建設(shè)整合營(yíng)銷傳播的概念
  • 世紀(jì)佳緣網(wǎng)站開(kāi)發(fā)公司網(wǎng)站大全軟件下載
  • 上海的外貿(mào)公司排名搜索引擎優(yōu)化不包括
  • 成都網(wǎng)站制作怎么樣網(wǎng)絡(luò)營(yíng)銷網(wǎng)站建設(shè)
  • 計(jì)算機(jī)應(yīng)用軟件開(kāi)發(fā)百度愛(ài)采購(gòu)優(yōu)化排名軟件
  • 公司手機(jī)網(wǎng)站制作seo收費(fèi)標(biāo)準(zhǔn)
  • 食品網(wǎng)站建設(shè)方案項(xiàng)目書百度推廣獲客成本大概多少
  • wordpress搜索標(biāo)簽頁(yè)seo推廣軟件排行榜前十名
  • 一家只做代購(gòu)的網(wǎng)站百度網(wǎng)盤app
  • 網(wǎng)站設(shè)計(jì)主題中文成都建設(shè)網(wǎng)官網(wǎng)
  • 最近一周新聞seo廣州工作好嗎
  • 備案網(wǎng)站內(nèi)容怎么寫寧波seo關(guān)鍵詞排名
  • yourphp企業(yè)網(wǎng)站管理系統(tǒng)抖音的商業(yè)營(yíng)銷手段