23. Implementing Scraper as a Service

1. Storing the containers in Amazon Elastic Container Registry (ECR)

1.1 Accessing ECR and listing the repositories

After logging in with the account created for ECS, you can access Elastic Container Registry. List the existing repositories with the following AWS CLI command:

$ aws ecr describe-repositories
{
    "repositories": []
}

There are no repositories yet, so the next step is to create them.

1.2 Creating the repositories

Create three repositories, one for each container: scraper-rest-api, scraper-microservice, and rabbitmq.

$ aws ecr create-repository --repository-name scraper-rest-api
{
  "repository": {
    "repositoryArn": "arn:aws:ecr:us-west-2:414704166289:repository/scraper-rest-api",
    "repositoryUri": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api",
    "repositoryName": "scraper-rest-api",
    "registryId": "414704166289",
    "createdAt": 1515632756.0
  }
}
$ aws ecr create-repository --repository-name scraper-microservice
{
  "repository": {
    "repositoryArn": "arn:aws:ecr:us-west-2:414704166289:repository/scraper-microservice",
    "registryId": "414704166289",
    "repositoryName": "scraper-microservice",
    "repositoryUri": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice",
    "createdAt": 1515632772.0
  }
}
$ aws ecr create-repository --repository-name rabbitmq
{
  "repository": {
    "repositoryArn": "arn:aws:ecr:us-west-2:414704166289:repository/rabbitmq",
    "repositoryName": "rabbitmq",
    "registryId": "414704166289",
    "createdAt": 1515632780.0,
    "repositoryUri": "414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq"
  }
}

Note each repository's URI (repositoryUri). The following steps use them.
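The outputs above show that each repositoryUri has the form <registryId>.dkr.ecr.<region>.amazonaws.com/<repositoryName>. The following is a minimal sketch that collects the URIs into a name-to-URI map for later steps. It assumes the create-repository output has been captured as JSON text, and the function names are hypothetical:

```python
import json


def repository_uri(registry_id: str, region: str, name: str) -> str:
    """Build an ECR repository URI in the same format as repositoryUri."""
    return f"{registry_id}.dkr.ecr.{region}.amazonaws.com/{name}"


def uris_from_outputs(outputs: list[str]) -> dict[str, str]:
    """Map repositoryName -> repositoryUri from raw create-repository JSON outputs."""
    uris = {}
    for raw in outputs:
        repo = json.loads(raw)["repository"]
        uris[repo["repositoryName"]] = repo["repositoryUri"]
    return uris


if __name__ == "__main__":
    # Trimmed copy of the rabbitmq output above (only the fields the sketch reads)
    sample = (
        '{"repository": {"repositoryName": "rabbitmq", '
        '"registryId": "414704166289", '
        '"repositoryUri": "414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq"}}'
    )
    uris = uris_from_outputs([sample])
    print(uris["rabbitmq"])
    # The stored URI matches the one rebuilt from its parts
    print(uris["rabbitmq"] == repository_uri("414704166289", "us-west-2", "rabbitmq"))
```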

1.3 Tagging the local container images

List the local Docker images:

$ docker images
REPOSITORY           TAG          IMAGE ID     CREATED        SIZE
scraper-rest-api     latest       b82653e11635 29 seconds ago 717MB
scraper-microservice latest       efe19d7b5279 11 minutes ago 4.16GB
rabbitmq             3-management 6cb6e2f951a8 2 weeks ago    151MB
python               3            c1e459c00dc3 3 weeks ago    692MB

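Tag the three images with the docker tag command (the python image does not need to be tagged). Each image is referenced here by a unique prefix of its image ID: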

$ docker tag b8 414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api
$ docker tag ef 414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice
$ docker tag 6c 414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq

Listing the Docker images again now shows the tagged images:

$ docker images
REPOSITORY                                            TAG          IMAGE ID     CREATED        SIZE
414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api  latest       b82653e11635 4 minutes ago 717MB
scraper-rest-api                                      latest       b82653e11635 4 minutes ago 717MB
414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice  latest       efe19d7b5279 15 minutes ago 4.16GB
scraper-microservice                                  latest       efe19d7b5279 15 minutes ago 4.16GB
414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq  latest       6cb6e2f951a8 2 weeks ago    151MB
rabbitmq                                              3-management 6cb6e2f951a8 2 weeks ago    151MB
python                                                3            c1e459c00dc3 3 weeks ago    692MB

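1.4 Pushing the images to ECR

Docker must be authenticated against the registry before it can push. With a current AWS CLI, this can be done with:

$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 414704166289.dkr.ecr.us-west-2.amazonaws.com

Then push the three images: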

$ docker push 414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api
$ docker push 414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice
$ docker push 414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq
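Tagging by image-ID prefix (b8, ef, 6c) only works while each prefix is unique on the machine. The sketch below takes a different approach: it generates the same tag and push commands but refers to each local image by repository:tag. The helper name is hypothetical. Because the target carries no tag, Docker applies latest, which is why rabbitmq:3-management shows up as latest in the listing above.

```python
# Registry from the repositoryUri values above
REGISTRY = "414704166289.dkr.ecr.us-west-2.amazonaws.com"

# Local image reference -> ECR repository name (from the docker images listing)
IMAGES = {
    "scraper-rest-api:latest": "scraper-rest-api",
    "scraper-microservice:latest": "scraper-microservice",
    "rabbitmq:3-management": "rabbitmq",
}


def push_commands(registry: str, images: dict[str, str]) -> list[str]:
    """Return docker tag/push command lines; the untagged target defaults to 'latest'."""
    commands = []
    for local_ref, repo in images.items():
        target = f"{registry}/{repo}"
        commands.append(f"docker tag {local_ref} {target}")
        commands.append(f"docker push {target}")
    return commands


if __name__ == "__main__":
    for line in push_commands(REGISTRY, IMAGES):
        print(line)
```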

1.5 Verifying that the images were pushed

$ aws ecr list-images --repository-name scraper-rest-api
{
  "imageIds": [
    {
      "imageTag": "latest",
      "imageDigest": "sha256:2fa2ccc0f4141a1473386d3592b751527eaccb37f035aa08ed0c4b6d7abc9139"
    }
  ]
}
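To check all three repositories at once, the list-images output for each can be tested for the expected tag. This is a small sketch that assumes the outputs have been collected into a dict keyed by repository name (a hypothetical arrangement). Untagged images carry only an imageDigest, which the sketch tolerates.

```python
import json


def has_tag(list_images_output: str, tag: str = "latest") -> bool:
    """True if `aws ecr list-images` output contains an image with the given tag."""
    image_ids = json.loads(list_images_output).get("imageIds", [])
    return any(image.get("imageTag") == tag for image in image_ids)


def missing_repositories(outputs: dict[str, str], tag: str = "latest") -> list[str]:
    """Names of repositories whose list-images output lacks the tag."""
    return sorted(name for name, out in outputs.items() if not has_tag(out, tag))


if __name__ == "__main__":
    pushed = '{"imageIds": [{"imageTag": "latest", "imageDigest": "sha256:2fa2ccc0f4141a1473386d3592b751527eaccb37f035aa08ed0c4b6d7abc9139"}]}'
    empty = '{"imageIds": []}'
    print(missing_repositories({"scraper-rest-api": pushed, "rabbitmq": empty}))
```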

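Process summary

graph LR
    A[Log in to the ECS account] --> B[List ECR repositories]
    B --> C{Any repositories?}
    C -- No --> D[Create repositories]
    C -- Yes --> E[Skip creation]
    D --> F[Tag local images]
    E --> F
    F --> G[Push images to ECR]
    G --> H[Verify the push]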

2. Creating an ECS cluster

2.1 Creating the ECS cluster

Use the AWS CLI to create an ECS cluster named scraper-cluster:

$ aws ecs create-cluster --cluster-name scraper-cluster
{
  "cluster": {
    "clusterName": "scraper-cluster",
    "registeredContainerInstancesCount": 0,
    "clusterArn": "arn:aws:ecs:us-west-2:414704166289:cluster/scraper-cluster",
    "status": "ACTIVE",
    "activeServicesCount": 0,
    "pendingTasksCount": 0,
    "runningTasksCount": 0
  }
}

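2.2 Creating a key pair

Create the key pair and save its private key to ScraperClusterKP.pem for SSH access to the instance. Then confirm that the key pair exists: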

$ aws ec2 create-key-pair --key-name ScraperClusterKP --query 'KeyMaterial' --output text > ScraperClusterKP.pem
$ aws ec2 describe-key-pairs --key-name ScraperClusterKP
{
  "KeyPairs": [
    {
      "KeyFingerprint": "4a:8a:22:fa:53:a7:87:df:c5:17:d9:4f:b1:df:4e:22:48:90:27:2d",
      "KeyName": "ScraperClusterKP"
    }
  ]
}

2.3 Creating a security group

Create a security group that opens port 22 (SSH), port 80 (HTTP), and RabbitMQ's two ports (5672 for AMQP and 15672 for the management UI):

$ aws ec2 create-security-group --group-name ScraperClusterSG --description "Scraper Cluster SG"
{
  "GroupId": "sg-5e724022"
}
$ aws ec2 authorize-security-group-ingress --group-name ScraperClusterSG --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name ScraperClusterSG --protocol tcp --port 80 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name ScraperClusterSG --protocol tcp --port 5672 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name ScraperClusterSG --protocol tcp --port 15672 --cidr 0.0.0.0/0

Confirm the security group's rules with:

$ aws ec2 describe-security-groups --group-names ScraperClusterSG
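The output of describe-security-groups lists each ingress rule under IpPermissions with FromPort and ToPort. The sketch below checks that all four ports from this section are open. The sample JSON is abbreviated and hypothetical, and it contains only the fields the sketch reads.

```python
import json

# Ports opened in this section: SSH, HTTP, AMQP, RabbitMQ management UI
REQUIRED_PORTS = {22, 80, 5672, 15672}


def open_tcp_ports(describe_output: str) -> set[int]:
    """Collect TCP ports allowed by the ingress rules in describe-security-groups output."""
    ports: set[int] = set()
    for group in json.loads(describe_output)["SecurityGroups"]:
        for rule in group.get("IpPermissions", []):
            # Rules with IpProtocol "-1" (all traffic) carry no port range; skipped here
            if rule.get("IpProtocol") != "tcp":
                continue
            # Each rule allows the inclusive range FromPort..ToPort
            ports.update(range(rule["FromPort"], rule["ToPort"] + 1))
    return ports


def missing_ports(describe_output: str) -> set[int]:
    """Required ports that no TCP ingress rule opens."""
    return REQUIRED_PORTS - open_tcp_ports(describe_output)


if __name__ == "__main__":
    # Abbreviated, hypothetical output with only the port 22 rule in place
    sample = json.dumps({
        "SecurityGroups": [{
            "GroupName": "ScraperClusterSG",
            "IpPermissions": [
                {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                 "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            ],
        }]
    })
    print(sorted(missing_ports(sample)))  # [80, 5672, 15672]
```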

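2.4 Setting up the IAM policies

Create an IAM role named ecsRole from the ecsPolicy.json file (the trust policy) and attach the permissions in rolePolicy.json. Then wrap the role in an instance profile so that EC2 instances can use it: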

$ aws iam create-role --role-name ecsRole --assume-role-policy-document file://ecsPolicy.json
$ aws iam put-role-policy --role-name ecsRole --policy-name ecsRolePolicy --policy-document file://rolePolicy.json
$ aws iam create-instance-profile --instance-profile-name ecsRole
$ aws iam add-role-to-instance-profile --instance-profile-name ecsRole --role-name ecsRole
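The chapter does not show the two policy files. The sketch below generates one plausible version of them. This is an assumption, not the author's files. The trust policy lets EC2 assume the role. The permissions are modeled on the AWS managed policy AmazonEC2ContainerServiceforEC2Role, so compare them against the current IAM documentation before relying on them. The function names are hypothetical.

```python
import json
import os

# ASSUMPTION: typical contents; the chapter does not show these files.

# Trust policy (ecsPolicy.json): lets EC2 instances assume the role
ECS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions (rolePolicy.json): lets the ECS agent register with the cluster,
# poll for tasks, pull images from ECR and write logs
ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ecs:CreateCluster",
            "ecs:DeregisterContainerInstance",
            "ecs:DiscoverPollEndpoint",
            "ecs:Poll",
            "ecs:RegisterContainerInstance",
            "ecs:StartTelemetrySession",
            "ecs:Submit*",
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
        ],
        "Resource": "*",
    }],
}


def policy_files() -> dict[str, str]:
    """File name -> JSON text for the two policy documents."""
    return {
        "ecsPolicy.json": json.dumps(ECS_POLICY, indent=2),
        "rolePolicy.json": json.dumps(ROLE_POLICY, indent=2),
    }


def write_policy_files(directory: str = ".") -> None:
    """Write both files so the file:// paths in the commands above resolve."""
    for name, text in policy_files().items():
        with open(os.path.join(directory, name), "w") as f:
            f.write(text + "\n")


if __name__ == "__main__":
    for name, text in policy_files().items():
        print(f"--- {name}\n{text}")
```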

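2.5 Launching an EC2 instance

Launch one instance using the key pair, the ecsRole instance profile, the security group, and the user data in userdata.txt:

$ aws ec2 run-instances --image-id ami-c9c87cb1 --count 1 --instance-type m4.large --key-name ScraperClusterKP --iam-instance-profile "Name=ecsRole" --security-groups ScraperClusterSG --user-data file://userdata.txt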
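The contents of userdata.txt are not shown. The explanation here assumes ami-c9c87cb1 is an Amazon ECS-optimized AMI. On such an AMI, the ECS agent reads /etc/ecs/ecs.config at startup, and without an ECS_CLUSTER entry it registers with a cluster named default. The user data therefore typically just sets ECS_CLUSTER. Below is a sketch that generates such a file; the function names are hypothetical.

```python
def ecs_user_data(cluster: str) -> str:
    """Shell script run at first boot: tells the ECS agent which cluster to join."""
    return (
        "#!/bin/bash\n"
        f"echo ECS_CLUSTER={cluster} >> /etc/ecs/ecs.config\n"
    )


def write_user_data(cluster: str, path: str = "userdata.txt") -> None:
    """Write the script to the file passed via --user-data file://userdata.txt."""
    with open(path, "w") as f:
        f.write(ecs_user_data(cluster))


if __name__ == "__main__":
    print(ecs_user_data("scraper-cluster"), end="")
```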

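2.6 Verifying that the instance is running

Once the ECS agent on the instance registers with scraper-cluster, the instance is listed as a container instance: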

$ aws ecs list-container-instances --cluster scraper-cluster
{
  "containerInstanceArns": [
    "arn:aws:ecs:us-west-2:414704166289:container-instance/263d9416-305f-46ff-a344-9e7076ca352a"
  ]
}

Step summary

| Step | Action | Command |
|------|--------|---------|
| 1 | Create the ECS cluster | aws ecs create-cluster --cluster-name scraper-cluster |
| 2 | Create the key pair | aws ec2 create-key-pair --key-name ScraperClusterKP --query 'KeyMaterial' --output text > ScraperClusterKP.pem, then aws ec2 describe-key-pairs --key-name ScraperClusterKP |
| 3 | Create the security group | A series of aws ec2 commands |
| 4 | Set up the IAM policies | A series of aws iam commands |
| 5 | Launch the EC2 instance | aws ec2 run-instances --image-id ami-c9c87cb1 --count 1 --instance-type m4.large --key-name ScraperClusterKP --iam-instance-profile "Name=ecsRole" --security-groups ScraperClusterSG --user-data file://userdata.txt |
| 6 | Verify that the instance is running | aws ecs list-container-instances --cluster scraper-cluster |

3. Creating a task to run the containers

3.1 The task definition file

The td.json file describes how to run the containers. Register the task with ECS using:

$ aws ecs register-task-definition --cli-input-json file://td.json

The output is:

{
    "taskDefinition": {
        "volumes": [],
        "family": "scraper",
        "memory": "4096",
        "placementConstraints": [],
        "cpu": "1024",
        "containerDefinitions": [
            {
                "name": "rabbitmq",
                "cpu": 0,
                "volumesFrom": [],
                "mountPoints": [],
                "portMappings": [
                    {
                        "hostPort": 15672,
                        "protocol": "tcp",
                        "containerPort": 15672
                    },
                    {
                        "hostPort": 5672,
                        "protocol": "tcp",
                        "containerPort": 5672
                    }
                ],
                "environment": [],
                "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq",
                "memory": 256,
                "essential": true
            },
            {
                "name": "scraper-microservice",
                "cpu": 0,
                "essential": true,
                "volumesFrom": [],
                "mountPoints": [],
                "portMappings": [],
                "environment": [
                    {
                        "name": "AMQP_URI",
                        "value": "pyamqp://guest:guest@rabbitmq"
                    }
                ],
                "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice",
                "memory": 256,
                "links": [
                    "rabbitmq"
                ]
            },
            {
                "name": "api",
                "cpu": 0,
                "essential": true,
                "volumesFrom": [],
                "mountPoints": [],
                "portMappings": [
                    {
                        "hostPort": 80,
                        "protocol": "tcp",
                        "containerPort": 8080
                    }
                ],
                "environment": [
                    {
                        "name": "AMQP_URI",
                        "value": "pyamqp://guest:guest@rabbitmq"
                    },
                    {
                        "name": "ES_HOST",
                        "value": "https://elastic:tduhdExunhEWPjSuH73O6yLS@7dc72d3327076cc4daf5528103c46a27.us-west-2.aws.found.io:9243"
                    }
                ],
                "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api",
                "memory": 128,
                "links": [
                    "rabbitmq"
                ]
            }
        ],
        "requiresCompatibilities": [
            "EC2"
        ],
        "status": "ACTIVE",
        "taskDefinitionArn": "arn:aws:ecs:us-west-2:414704166289:task-definition/scraper:7",
        "requiresAttributes": [
            {
                "name": "com.amazonaws.ecs.capability.ecr-auth"
            }
        ],
        "revision": 7,
        "compatibilities": [
            "EC2"
        ]
    }
}

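3.2 Anatomy of the task definition

A task definition has two main parts:
- Overall settings: task-wide settings, such as the total memory and CPU the task may use and any volumes to mount.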

{
    "family": "scraper-as-a-service",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "1024",
    "memory": "4096",
    "volumes": []
}
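- Container definitions: the three containers to run.
  - rabbitmq container: maps port 15672 (management UI) and port 5672 (AMQP) straight through to the host. "essential": true means that if this container stops, ECS stops the whole task.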
{
    "name": "rabbitmq",
    "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/rabbitmq",
    "cpu": 0,
    "memory": 256,
    "portMappings": [
        {
            "containerPort": 15672,
            "hostPort": 15672,
            "protocol": "tcp"
        },
        {
            "containerPort": 5672,
            "hostPort": 5672,
            "protocol": "tcp"
        }
    ],
    "essential": true
}
  - scraper-microservice container: links to rabbitmq, so the hostname rabbitmq in AMQP_URI resolves to the RabbitMQ container.
{
    "name": "scraper-microservice",
    "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-microservice",
    "cpu": 0,
    "memory": 256,
    "essential": true,
    "environment": [
        {
            "name": "AMQP_URI",
            "value": "pyamqp://guest:guest@rabbitmq"
        }
    ],
    "links": [
        "rabbitmq"
    ]
}
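  - api container: maps container port 8080 to host port 80, which makes the REST API reachable on port 80 of the EC2 instance. It reads the RabbitMQ address (AMQP_URI) and the Elasticsearch endpoint (ES_HOST) from its environment.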
{
    "name": "api",
    "image": "414704166289.dkr.ecr.us-west-2.amazonaws.com/scraper-rest-api",
    "cpu": 0,
    "memory": 128,
    "essential": true,
    "portMappings": [
        {
            "containerPort": 8080,
            "hostPort": 80,
            "protocol": "tcp"
        }
    ],
    "environment": [
        {
            "name": "AMQP_URI",
            "value": "pyamqp://guest:guest@rabbitmq"
        },
        {
            "name": "ES_HOST",
            "value": "https://elastic:tduhdExunhEWPjSuH73O6yLS@7dc72d3327076cc4daf5528103c46a27.us-west-2.aws.found.io:9243"
        }
    ],
    "links": [
        "rabbitmq"
    ]
}
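The three container definitions share most of their fields, so td.json can also be assembled programmatically. The sketch below rebuilds the structure shown above. It passes ES_HOST in rather than hard-coding the credential-bearing URL, and it checks that the containers' memory limits (in MiB) fit inside the task-level memory. The helper names and the placeholder ES_HOST default are hypothetical.

```python
import json
import os

IMAGE_ROOT = "414704166289.dkr.ecr.us-west-2.amazonaws.com"
AMQP_URI = "pyamqp://guest:guest@rabbitmq"


def container(name, image, memory, ports=(), env=None, links=()):
    """One containerDefinitions entry; ports are (containerPort, hostPort) pairs."""
    definition = {
        "name": name,
        "image": f"{IMAGE_ROOT}/{image}",
        "cpu": 0,
        "memory": memory,  # hard limit in MiB
        "essential": True,
    }
    if ports:
        definition["portMappings"] = [
            {"containerPort": c, "hostPort": h, "protocol": "tcp"} for c, h in ports
        ]
    if env:
        definition["environment"] = [{"name": k, "value": v} for k, v in env.items()]
    if links:
        definition["links"] = list(links)
    return definition


def build_task_definition(es_host: str) -> dict:
    """Assemble the task definition with the same values as the excerpts above."""
    return {
        "family": "scraper-as-a-service",
        "requiresCompatibilities": ["EC2"],
        "cpu": "1024",
        "memory": "4096",
        "volumes": [],
        "containerDefinitions": [
            container("rabbitmq", "rabbitmq", 256,
                      ports=[(15672, 15672), (5672, 5672)]),
            container("scraper-microservice", "scraper-microservice", 256,
                      env={"AMQP_URI": AMQP_URI}, links=["rabbitmq"]),
            container("api", "scraper-rest-api", 128, ports=[(8080, 80)],
                      env={"AMQP_URI": AMQP_URI, "ES_HOST": es_host},
                      links=["rabbitmq"]),
        ],
    }


def total_container_memory(task: dict) -> int:
    """Sum of the per-container memory limits."""
    return sum(c["memory"] for c in task["containerDefinitions"])


if __name__ == "__main__":
    task = build_task_definition(os.environ.get("ES_HOST", "https://user:password@es-host:9243"))
    # Sanity check: container memory must fit in the task memory (640 <= 4096)
    if total_container_memory(task) > int(task["memory"]):
        raise SystemExit("containers need more memory than the task allows")
    print(json.dumps(task, indent=4))
```

Writing the printed JSON to td.json gives a file for aws ecs register-task-definition --cli-input-json file://td.json.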

3.3 Task definition structure at a glance

graph LR
    A[Task definition] --> B[Overall settings]
    A --> C[Container definitions]
    B --> B1[family]
    B --> B2[requiresCompatibilities]
    B --> B3[cpu]
    B --> B4[memory]
    B --> B5[volumes]
    C --> C1[rabbitmq container]
    C --> C2[scraper-microservice container]
    C --> C3[api container]

4. Launching and accessing the containers in AWS

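4.1 Getting the latest task revision number

Each registration of a task definition creates a new revision of its family. List the task definitions to find the latest one: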

$ aws ecs list-task-definitions
{
    "taskDefinitionArns": [
        "arn:aws:ecs:us-west-2:414704166289:task-definition/scraper-as-a-service:17"
    ]
}
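When several revisions are listed, the latest is the one with the numerically highest suffix. Comparing the suffixes as strings would rank 9 above 17. Below is a sketch that picks it out of list-task-definitions output; the sample ARNs are hypothetical. The CLI can also narrow the list with --family-prefix, and run-task accepts a bare family name, in which case it uses the latest ACTIVE revision.

```python
import json


def latest_revision(list_output: str, family: str) -> str:
    """Return '<family>:<revision>' for the highest revision of `family`."""
    best = None
    for arn in json.loads(list_output)["taskDefinitionArns"]:
        # ARN ends in task-definition/<family>:<revision>
        fam, rev = arn.rsplit("/", 1)[1].rsplit(":", 1)
        if fam == family and (best is None or int(rev) > best):
            best = int(rev)  # compare numerically, not as strings
    if best is None:
        raise ValueError(f"no task definition found for family {family!r}")
    return f"{family}:{best}"


if __name__ == "__main__":
    sample = json.dumps({"taskDefinitionArns": [
        "arn:aws:ecs:us-west-2:414704166289:task-definition/scraper-as-a-service:9",
        "arn:aws:ecs:us-west-2:414704166289:task-definition/scraper-as-a-service:17",
        "arn:aws:ecs:us-west-2:414704166289:task-definition/scraper:7",
    ]})
    print(latest_revision(sample, "scraper-as-a-service"))  # scraper-as-a-service:17
```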

4.2 Running the task

$ aws ecs run-task --cluster scraper-cluster --task-definition scraper-as-a-service:17 --count 1

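The output includes the task's current status. The first run can take a while, because the container images must be pulled from ECR onto the EC2 instance.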
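The next step needs the task's ID, which is the last segment of taskArn in the run-task output. The sketch below extracts it and also surfaces the failures list that run-task returns when ECS cannot place the task. The sample output is abbreviated and hypothetical, built around the task ID used below.

```python
import json


def task_ids(run_task_output: str) -> list[str]:
    """Extract task IDs from `aws ecs run-task` output."""
    data = json.loads(run_task_output)
    if data.get("failures"):
        # e.g. when no container instance has enough free resources
        raise RuntimeError(f"run-task reported failures: {data['failures']}")
    # taskArn ends in task/<id> (older ARN format) or task/<cluster>/<id>
    # (newer long format); either way the ID is the last path segment
    return [task["taskArn"].rsplit("/", 1)[1] for task in data["tasks"]]


if __name__ == "__main__":
    sample = json.dumps({
        "tasks": [{
            "taskArn": "arn:aws:ecs:us-west-2:414704166289:task/00d7b868-1b99-4b54-9f2a-0d5d0ae75197",
            "lastStatus": "PENDING",
        }],
        "failures": [],
    })
    print(task_ids(sample))  # ['00d7b868-1b99-4b54-9f2a-0d5d0ae75197']
```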

4.3 Checking the task status

$ aws ecs describe-tasks --cluster scraper-cluster --task 00d7b868-1b99-4b54-9f2a-0d5d0ae75197

Replace the task GUID with the one at the end of the taskArn property in the run-task output. Once all containers are running, the API is ready to test.
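In the describe-tasks output, each task lists its containers, and each container has a lastStatus. The task is ready once every container reports RUNNING. The sketch below checks that condition on captured output; the sample JSON is hypothetical. Alternatively, the CLI's aws ecs wait tasks-running --cluster scraper-cluster --tasks <task-id> blocks until the task is running.

```python
import json


def all_containers_running(describe_output: str) -> bool:
    """True once every container of every described task reports RUNNING."""
    tasks = json.loads(describe_output).get("tasks", [])
    if not tasks:
        return False  # nothing described (e.g. wrong task ID or cluster)
    return all(
        bool(task.get("containers"))
        and all(c.get("lastStatus") == "RUNNING" for c in task["containers"])
        for task in tasks
    )


if __name__ == "__main__":
    sample = json.dumps({"tasks": [{"containers": [
        {"name": "rabbitmq", "lastStatus": "RUNNING"},
        {"name": "scraper-microservice", "lastStatus": "RUNNING"},
        {"name": "api", "lastStatus": "PENDING"},
    ]}]})
    print(all_containers_running(sample))  # False: api is still PENDING
```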

4.4 Getting the IP address or DNS name of the cluster instance

- List the cluster's container instances:
$ aws ecs list-container-instances --cluster scraper-cluster
{
    "containerInstanceArns": [
        "arn:aws:ecs:us-west-2:414704166289:container-instance/5959fd63-7fd6-4f0e-92aa-ea136dabd762"
    ]
}
- Look up the EC2 instance ID of the container instance:
$ aws ecs describe-container-instances --cluster scraper-cluster --container-instances 5959fd63-7fd6-4f0e-92aa-ea136dabd762 | grep "ec2InstanceId"
            "ec2InstanceId": "i-08614daf41a9ab8a2",
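With the EC2 instance ID in hand, aws ec2 describe-instances --instance-ids i-08614daf41a9ab8a2 returns the instance's PublicIpAddress and PublicDnsName under Reservations[].Instances[]. The sketch below covers the whole lookup on captured JSON. It takes the GUID from the container-instance ARN, reads ec2InstanceId without grep, and extracts the public address. The sample values, including the documentation-range IP, are hypothetical.

```python
import json


def guid_from_arn(arn: str) -> str:
    """The GUID passed to describe-container-instances is the last ARN segment."""
    return arn.rsplit("/", 1)[1]


def ec2_instance_ids(describe_ci_output: str) -> list[str]:
    """ec2InstanceId of each entry in describe-container-instances output."""
    return [ci["ec2InstanceId"] for ci in json.loads(describe_ci_output)["containerInstances"]]


def public_endpoints(describe_instances_output: str) -> list[tuple[str, str]]:
    """(PublicIpAddress, PublicDnsName) for each instance in describe-instances output."""
    return [
        (inst.get("PublicIpAddress", ""), inst.get("PublicDnsName", ""))
        for reservation in json.loads(describe_instances_output)["Reservations"]
        for inst in reservation["Instances"]
    ]


if __name__ == "__main__":
    arn = "arn:aws:ecs:us-west-2:414704166289:container-instance/5959fd63-7fd6-4f0e-92aa-ea136dabd762"
    print(guid_from_arn(arn))
    ci = json.dumps({"containerInstances": [{"ec2InstanceId": "i-08614daf41a9ab8a2"}]})
    print(ec2_instance_ids(ci))
    # Hypothetical describe-instances output
    inst = json.dumps({"Reservations": [{"Instances": [{
        "InstanceId": "i-08614daf41a9ab8a2",
        "PublicIpAddress": "203.0.113.10",
        "PublicDnsName": "ec2-203-0-113-10.us-west-2.compute.amazonaws.com",
    }]}]})
    print(public_endpoints(inst))
```

The API is then reachable on port 80 of that address, per the api container's port mapping.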

4.5 Summary of steps

| Step | Action | Command |
|------|--------|---------|
| 1 | Get the latest task revision number | aws ecs list-task-definitions |
| 2 | Run the task | aws ecs run-task --cluster scraper-cluster --task-definition scraper-as-a-service:17 --count 1 |
| 3 | Check the task status | aws ecs describe-tasks --cluster scraper-cluster --task [task-guid] |
| 4 | Get the cluster instance details | aws ecs list-container-instances --cluster scraper-cluster, then aws ecs describe-container-instances --cluster scraper-cluster --container-instances [instance-guid] \| grep "ec2InstanceId" |

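These steps deploy the scraper service to AWS and confirm that it is running. In practice, adjust the task definition and container settings (memory limits, port mappings, environment variables) to fit your workload.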
