Last edited by 蒋家升 Nov 09, 2021
This is an old version of this page. You can view the most recent version or browse the history.

environmental_protection_grade

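Basic information

Environmental protection grade

Data name (Chinese)

环保等级

Data name (English)

environmental_protection_grade

Collection sites (entry points)

Official website PC entry points:
Jiangsu: https://hblp.jsshbt.cn/shencai-envfacial-web/service/envFacial/hblp/tEnvBasKeylistModel/queryKeyListBusinessNumber
Zhejiang: http://223.4.71.96/portal/data/api/auto
Fujian: http://220.160.52.213:20071/api/template/page/p_list_eval_credit
Sichuan: http://103.203.219.138:18081/data/w/evaluateResults/list4Public
...

Storage path for collected files:
/data/gravel_spiders/environmental_protection

Collection frequency and strategy

Existing-data update strategy

Currently one round of full update is run

Incremental collection strategy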


Spider

Environmental protection grade spider: environmental_protection

Owner

蒋家升

Spider name

environmental_protection

Code location

Project URL: http://tech.pingansec.com/granite/project-gravel/-/tree/develop_environmental_protection_grade/scrapy_spiders

Queue name and queue address

  • redis host: redis://:utn@0818@bdp-mq-001.redis.rds.aliyuncs.com:6379/0
  • redis port: 6379
  • redis db: 0
  • redis key:
    • environmental_protection

Priority queue notes

  • environmental_protection supports queue priority

Task source

Task input parameters (samples)

The province field is required

{"province": "jiangsu", "step": "start"}
{"province": "zhejiang", "step": "start"}
{"province": "fujian", "step": "start"}
{"province": "sichuan", "step": "start"}
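
As a minimal sketch, the four start tasks above could be generated and serialized before being placed on the environmental_protection queue. The helper name build_start_tasks is hypothetical, and encoding tasks as JSON is an assumption; the push itself is left out because this page does not say which Redis command the spider consumes from.

```python
import json

# Provinces that appear in the task samples above.
PROVINCES = ["jiangsu", "zhejiang", "fujian", "sichuan"]


def build_start_tasks(provinces=PROVINCES):
    """Return one JSON-encoded start task per province (hypothetical helper)."""
    return [json.dumps({"province": p, "step": "start"}) for p in provinces]


if __name__ == "__main__":
    for payload in build_start_tasks():
        print(payload)
```

Keeping payload construction separate from the queue client makes it possible to check the task shape without a Redis connection.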

Task sample

Task parameter description

{"province": "jiangsu", "step": "start", "index": 0}
  • Main parameters
    • province: province name in pinyin
  • Optional parameters
    • step: step
    • index: page number used for pagination
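
The parameter rules above can be sketched as a small validator plus a pagination helper. Both function names are hypothetical, and the defaults used when the optional fields are missing (step "start", index 0) are assumptions taken from the samples rather than documented behaviour.

```python
def validate_task(task):
    """Check a task dict: province is required, step and index are optional."""
    province = task.get("province")
    if not isinstance(province, str) or not province:
        raise ValueError("province is required")
    index = task.get("index", 0)  # assumed default: first page
    if not isinstance(index, int) or index < 0:
        raise ValueError("index must be a non-negative integer")
    # Assumed default for step, mirroring the samples.
    return {"province": province, "step": task.get("step", "start"), "index": index}


def next_page_task(task):
    """Return a normalized copy of the task that points at the following page."""
    following = validate_task(task)
    following["index"] += 1
    return following


if __name__ == "__main__":
    print(next_page_task({"province": "jiangsu", "step": "start", "index": 0}))
```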

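data_type description

list: list-page data

Fujian:
list_of_normal: full-process public disclosure
list_of_red: red and black lists

... (in development)

Spider result metadata

Actual data structure of spider results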

Jiangsu:
{
  "data":
  [
    {
      "creditDataID": "202110242303108479a164c13b45caa51598e28aac5b69",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210144971",
      "fullName": "宿迁市-宿豫区-宿豫经济开发区",
      "manageLevel": 2,
      "manageName": "二星",
      "spCode": "142332828000",
      "spName": "宿迁市罐头食品有限责任公司",
      "time": "2021-10-25"
    },
    {
      "creditDataID": "202110242305284f9a3b85fc214f13b2cb5944fbec7e36",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210175105",
      "fullName": "常州市-武进区",
      "manageLevel": 5,
      "manageName": "五星",
      "spCode": "3101120200000099",
      "spName": "江苏绿浥农业科技股份有限公司",
      "time": "2021-10-25"
    },
    {
      "creditDataID": "202110242308191eaf0c8cb8174469bc192039d249540c",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210175285",
      "fullName": "苏州市-姑苏区",
      "manageLevel": 5,
      "manageName": "五星",
      "spCode": "3200000200000537",
      "spName": "中建三局第一建设工程有限责任公司",
      "time": "2021-10-25"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-25 17:17:52.098",
  "spider_end_time": "2021-10-25 17:17:54",
  "task_params":{
    "province": "jiangsu",
    "step": "start",
    "index": 1
  },
  "metadata":{
    "province": "jiangsu",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}
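
A consumer of this envelope would typically check the status fields before reading data. The sketch below assumes that http_code 200 together with task_result 1000 and an empty error_msg marks success, which is how every sample on this page looks; the meaning of other task_result values is not documented here. The function name extract_rows is hypothetical.

```python
import json


def extract_rows(raw_result):
    """Parse one result envelope and return (data_type, rows).

    Assumption: http_code 200, task_result 1000 and an empty error_msg
    together indicate a successful task.
    """
    result = json.loads(raw_result)
    succeeded = (
        result.get("http_code") == 200
        and result.get("task_result") == 1000
        and not result.get("error_msg")
    )
    if not succeeded:
        raise ValueError("task failed: %r" % result.get("error_msg"))
    return result.get("data_type"), result.get("data", [])


# Trimmed-down envelope modeled on the Jiangsu sample above.
SAMPLE = json.dumps({
    "data": [{"spName": "宿迁市罐头食品有限责任公司", "creditName": "一般守信"}],
    "http_code": 200,
    "error_msg": "",
    "task_result": 1000,
    "data_type": "list",
})

if __name__ == "__main__":
    data_type, rows = extract_rows(SAMPLE)
    print(data_type, len(rows))
```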

Zhejiang:
{
  "data":
  [
    {
      "city": "舟山市",
      "level_title": "优秀",
      "district": "普陀区",
      "level_code": "A",
      "social_credit_code": "91330903336897819J",
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "舟山丰瑞海洋生物制品有限公司",
      "region_code": "330903",
      "release_time": 1635264000000
    },
    {
      "city": "舟山市",
      "level_title": "优秀",
      "district": "普陀区",
      "level_code": "A",
      "social_credit_code": "913309031487170831",
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "中石化浙江舟山石油有限公司",
      "region_code": "330903",
      "release_time": 1635264000000
    },
    {
      "city": "舟山市",
      "level_title": "优秀",
      "district": "普陀区",
      "level_code": "A",
      "social_credit_code": "9133090307868393X1",
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "浙江荣生海洋生物制品有限公司",
      "region_code": "330903",
      "release_time": 1635264000000
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-27 11:28:51.445",
  "spider_end_time": "2021-10-27 11:28:58",
  "task_params":
  {
    "province": "zhejiang",
    "step": "start",
    "index": 2
  },
  "metadata":
  {
    "province": "zhejiang",
    "index": 2
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}
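
Unlike the other provinces, the Zhejiang rows carry release_time as a millisecond epoch timestamp. A minimal conversion sketch follows; it assumes the value should be read in China Standard Time (UTC+8), and the function name release_date is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are meant to be read in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def release_date(release_time_ms):
    """Convert a millisecond epoch timestamp to an ISO date string in UTC+8."""
    moment = datetime.fromtimestamp(release_time_ms / 1000, tz=CST)
    return moment.date().isoformat()


if __name__ == "__main__":
    print(release_date(1635264000000))  # prints 2021-10-27
```

Read this way, the sample value falls on the same day as the spider run recorded in spider_start_time.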

Fujian:
{
  "data":
  [
    {
      "id": 11795,
      "social_credit_code": "91350504MA2Y13F352",
      "credit_year_batch": "2021年第二批",
      "ent_name": "泉州佰份佰卫生用品有限公司",
      "county": "洛江区",
      "city": "泉州市",
      "deptName": "泉州市洛江生态环境局",
      "createTime": "2021-06-03",
      "credit": "79",
      "credit_type": "环保良好企业"
    },
    {
      "id": 11794,
      "social_credit_code": "91350504MA346T4N1T",
      "credit_year_batch": "2021年第二批",
      "ent_name": "泉州洛江凤栖石材厂",
      "county": "洛江区",
      "city": "泉州市",
      "deptName": "泉州市洛江生态环境局",
      "createTime": "2021-06-03",
      "credit": "79",
      "credit_type": "环保良好企业"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list_of_normal",
  "spider_start_time": "2021-10-27 14:50:13.824",
  "spider_end_time": "2021-10-27 14:50:14",
  "task_params":
  {
    "province": "fujian",
    "step": "start",
    "index": 2020
  },
  "metadata":
  {
    "province": "fujian",
    "index": 2020
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.6.51"
}

Sichuan:
{
  "data":
  [
    {
      "id": 994222,
      "enterprise":
      {
        "id": 354497052,
        "name": "阆中市枣碧大梁山页岩机砖厂",
        "creditCode": "92511381MA695TYH83",
        "orgCode": "MA695TYH8",
        "enterpriseType": "PRIVATE_OWNED",
        "industry": "OTHER",
        "polluteType": "EXHAUST_GAS",
        "controlType": "CITY_PREFECTURE",
        "enterpriseAttr": "PRODUCTION_ENTERPRISE",
        "legalName": "庄兆雄",
        "legalTel": "15881493231",
        "productScale": "7.5万吨/年",
        "businessScope": "页岩砖生产、销售",
        "headOfEPA": "庄剑平",
        "regTime": "2008-05-10T00:00:00+08:00",
        "enterpriseState": "PRODUCTING",
        "regionCity":
        {
          "code": "511300000",
          "name": "南充市",
          "parent":
          {
            "code": "510000000",
            "name": "四川省"
          }
        },
        "regionDistrict":
        {
          "code": "511381000",
          "name": "阆中市",
          "parent":
          {
            "code": "511300000",
            "name": "南充市"
          }
        },
        "longitude": 105.2579,
        "latitude": 31.0116,
        "isSSGS": false,
        "enable": true,
        "lastModifiedDate": "2021-03-31T09:34:40.045+08:00",
        "address": "阆中市枣碧乡大梁山村"
      },
      "evaluateScore": 84,
      "selfScore": 98,
      "countyScore": 88,
      "cityScore": 84,
      "evaluateResult": "HBLHQY",
      "evaluateYear": 2020,
      "last": true,
      "publishOrg": "南充市生态环境局",
      "evaluateState": "GSZ",
      "veto": false,
      "dataFrom": "NormalCreditEvaluation"
    },
    {
      "id": 994221,
      "enterprise":
      {
        "id": 354496929,
        "name": "阆中市金福旺页岩机砖厂",
        "creditCode": "92511381MA62HXHX0D",
        "orgCode": "MA62HXHX-0",
        "industry": "OTHER",
        "controlType": "OTHER",
        "enterpriseAttr": "PRODUCTION_ENTERPRISE",
        "legalName": "金跃伟",
        "legalTel": "",
        "headOfEPA": "金光禄",
        "regionCity":
        {
          "code": "511300000",
          "name": "南充市",
          "parent":
          {
            "code": "510000000",
            "name": "四川省"
          }
        },
        "regionDistrict":
        {
          "code": "511381000",
          "name": "阆中市",
          "parent":
          {
            "code": "511300000",
            "name": "南充市"
          }
        },
        "isSSGS": false,
        "enable": true,
        "lastModifiedDate": "2020-07-10T12:29:10.841+08:00",
        "address": "阆中市江南镇瓦房沟村十四社"
      },
      "evaluateScore": 95,
      "selfScore": 104,
      "countyScore": 97,
      "cityScore": 95,
      "evaluateResult": "HBLHQY",
      "evaluateYear": 2020,
      "last": true,
      "publishOrg": "南充市生态环境局",
      "evaluateState": "GSZ",
      "veto": false,
      "dataFrom": "NormalCreditEvaluation"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list_of_normal",
  "spider_start_time": "2021-10-27 14:54:17.216",
  "spider_end_time": "2021-10-27 14:54:17",
  "task_params":
  {
    "province": "sichuan",
    "step": "start",
    "index": 1
  },
  "metadata":
  {
    "province": "sichuan",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}
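
The Sichuan rows nest company details under enterprise, while the other provinces return flat rows. One possible way to flatten them is sketched below. The output key names are hypothetical, chosen to resemble the Zhejiang and Fujian samples, and missing fields are returned as None because the second sample row shows that not every enterprise field is always present.

```python
def flatten_sichuan_record(record):
    """Flatten one Sichuan row into a flat dict (output keys are hypothetical)."""
    enterprise = record.get("enterprise") or {}
    city = enterprise.get("regionCity") or {}
    district = enterprise.get("regionDistrict") or {}
    return {
        "ent_name": enterprise.get("name"),
        "social_credit_code": enterprise.get("creditCode"),
        "city": city.get("name"),
        "district": district.get("name"),
        "evaluate_year": record.get("evaluateYear"),
        "evaluate_score": record.get("evaluateScore"),
        "evaluate_result": record.get("evaluateResult"),
    }


if __name__ == "__main__":
    # Trimmed-down row modeled on the first Sichuan sample above.
    row = {
        "enterprise": {
            "name": "阆中市枣碧大梁山页岩机砖厂",
            "creditCode": "92511381MA695TYH83",
            "regionCity": {"code": "511300000", "name": "南充市"},
            "regionDistrict": {"code": "511381000", "name": "阆中市"},
        },
        "evaluateScore": 84,
        "evaluateResult": "HBLHQY",
        "evaluateYear": 2020,
    }
    print(flatten_sichuan_record(row))
```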

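Spider runtime environment

scrapy

Spider deployment information

target: node_51
spider_name: environmental_protection
5 processes

Taskhub address

Task submission address: 
Code authoring address: 

Taskhub scheduling rules

Spider monitoring metrics design

(Observe first; to be filled in later)
Index: 
Monitoring frequency: 
Monitoring start and end time: 
Alert condition: 
Alert group: 
Alert content: 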

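Data aggregation

Owner

Aggregation method

  • Spider writes directly to Kafka

  • Spider writes files that Logstash collects

Spider result directory

Storage path for collected files:
/data/gravel_spiders/environmental_protection

Directory after aggregation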

10.8.6.228:
/data2_227/grvael_spider_result/environmental_protection

Logstash configuration file name

Logstash file collection type

Topic for aggregated data

general-taxpayer

ES log index and filter conditions

gravel-spider-data-*
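
The filter conditions are not filled in on this page. As a hedged sketch only, a query body for the index pattern above might filter on the spider name and, optionally, the province. It assumes the logged documents keep the same field names as the result envelope (spider_name, task_params.province) and that those fields are indexed so that term filters match exactly; neither is confirmed here. The function name build_log_query is hypothetical.

```python
import json

INDEX_PATTERN = "gravel-spider-data-*"


def build_log_query(spider_name, province=None):
    """Build an Elasticsearch bool/filter query body (field names are assumed)."""
    filters = [{"term": {"spider_name": spider_name}}]
    if province is not None:
        filters.append({"term": {"task_params.province": province}})
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    body = build_log_query("environmental_protection", province="jiangsu")
    print(INDEX_PATTERN)
    print(json.dumps(body, indent=2))
```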

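Monitoring dashboard

Data retention policy


Data cleaning

Owner

Code location

Deployment address

Deployment method and notes

  • crontab + data_pump
  • supervisor + data_pump
  • supervisor + consumer

Data source

Data storage table location

  • Database address:
  • Table name: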