update: es.EsDocWriter documentation, authored Dec 24, 2021 by fanzx
# Writing data to ES
Set the **class** parameter to `es.EsDocWriter`.
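## init parameters
|Parameter|Default|Description|
|----|----|----|
|index| | Name of the ES index|
|doc_type| | Type of the document|
|hosts| | ES hosts|
|http_auth| | ES authentication credentials, e.g. [user, password]|
|doc_id| | Unique key used to insert, update, and delete data (the ES _id). When it is None (the default), ES generates one itself, e.g. {"_id": "Ru5QZXYBsqJa7tOBqcqR"}. When doc_id is not configured, each row is treated as an insert, i.e. action=index|
|action| update | One of the 4 operation types provided by the ES data-update API|
|index_fields| | When set, only the fields listed in index_fields are kept in the data written to ES|
|add_timestamp| True | Records the time of writing to ES. Enabled by default; adds an "@timestamp" field to the data|
|clear_null_field| False | When True, fields whose values are empty, None, NULL, etc. are removed from the data|
|update_on_create_fileds| | Fields that are set on creation only. Once set, their values do not change in later updates|
|append_array_fields| | Array-type fields. Values from later updates are appended to the end of the array. Only the last 10 values are kept|
|set_fields| | Set-type fields. Values from later updates are added to the field, and duplicate values are not stored again|
|timeout| 10| Timeout in seconds|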
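
The row-level options above (`index_fields`, `clear_null_field`, `add_timestamp`) can be pictured as a small preprocessing step. The following is only a sketch under stated assumptions, not the actual `es.EsDocWriter` code: the function name `prepare_row`, the exact set of values treated as "null", the timestamp format, and the decision to always keep the `doc_id` field are all hypothetical.

```python
from datetime import datetime, timezone


def prepare_row(row, index_fields=None, doc_id=None,
                add_timestamp=True, clear_null_field=False):
    """Hypothetical helper illustrating the documented options on a plain dict."""
    doc = dict(row)
    if index_fields:
        # Keep only the listed fields. Keeping the doc_id field as well
        # is an assumption made for this sketch.
        keep = set(index_fields)
        if doc_id:
            keep.add(doc_id)
        doc = {k: v for k, v in doc.items() if k in keep}
    if clear_null_field:
        # Assumed notion of "null": None, empty string, or the string "NULL".
        doc = {k: v for k, v in doc.items() if v not in (None, "", "NULL")}
    if add_timestamp:
        # The timestamp format is an assumption (UTC, ISO 8601).
        doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
    return doc
```

Calling `prepare_row({"a": 1, "b": None, "c": "x"}, index_fields=["a", "b"], clear_null_field=True)` would keep only `a` and add `@timestamp`.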
## Configuration example:
```yaml
es_test_nest_year:
  class: es.EsDocWriter
  init:
    hosts: es-cn-4591blu580004eavf.elasticsearch.aliyuncs.com:9200
    http_auth: [ '{user}', '{passwd}' ]
    index: test_nest
    doc_type: doc
    doc_id: company_name_digest
    add_timestamp: False
    timeout: 20
    set_fields: ['annual_report_years', 'no_ar_submitted']
    index_fields:
      - annual_report_years
      - no_ar_submitted
```
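
To make the roles of `index`, `doc_id`, and `action` more concrete, here is a rough sketch of how one input row might be turned into a bulk action. It is an illustration only: `build_action` is a hypothetical name, the dict layout follows the action format accepted by `elasticsearch.helpers.bulk` in elasticsearch-py, and whether `es.EsDocWriter` builds its requests this way is an assumption. `doc_type` is left out of the sketch.

```python
def build_action(row, index, doc_id=None, action="update"):
    """Hypothetical helper: map one row to a bulk action dict."""
    if doc_id is None:
        # No doc_id configured: the row is treated as an insert
        # (action=index) and ES generates the _id itself.
        return {"_op_type": "index", "_index": index, "_source": row}
    act = {"_op_type": action, "_index": index, "_id": row[doc_id]}
    if action == "update":
        # Partial update: the row is sent as the "doc" part of the request.
        act["doc"] = row
    elif action != "delete":
        # index / create carry the full document.
        act["_source"] = row
    return act
```

With the configuration above, a row containing `company_name_digest` would yield an `update` action whose `_id` is that field's value.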
## Working with set and array data in ES
```html
"sync_condition": {"operation": "remove"}
"sync_condition": {"operation": "add"}
The sync_condition field in the data defines how the set is modified. Only 2 operations exist, remove and add (sync_condition may be omitted, in which case the default is add).
1. With the following configuration:
set_fields: ['annual_report_years', 'no_ar_submitted']
index_fields:
  - annual_report_years
  - no_ar_submitted
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Same input again: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Note: set_fields ensures that the values in the no_ar_submitted field in ES are deduplicated
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": [2019]}
Note: set_fields ensures that the annual_report_years field in ES contains only a single 2019
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": 2020}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": [2019, 2020]}
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019, "sync_condition": {"operation": "remove"}}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": []}
Note: when sync_condition.operation=remove, the element 2019 is removed from no_ar_submitted
2. With the following configuration:
append_array_fields: ['annual_report_years', 'no_ar_submitted']
index_fields:
  - annual_report_years
  - no_ar_submitted
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Same input again: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019, 2019]}
Note: append_array_fields means that every input value is appended to the no_ar_submitted field in ES. es.EsDocWriter limits append_array_fields to at most 10 elements; beyond 10, each new element replaces the oldest one.
Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019, "sync_condition": {"operation": "remove"}}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": []}
Note: when sync_condition.operation=remove, all elements equal to 2019 are removed from no_ar_submitted
```
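
The add/remove behaviour walked through above can be reproduced with a small in-memory model. This is a sketch that mirrors the documented results on plain Python dicts; it does not talk to ES and is not the `es.EsDocWriter` implementation. The name `apply_update` and the `max_len` parameter are hypothetical, and leaving `set_fields` uncapped is an assumption.

```python
def apply_update(stored, row, set_fields=(), append_array_fields=(), max_len=10):
    """Merge one input row into a stored document (plain dicts, no ES involved)."""
    row = dict(row)
    # sync_condition is optional; the default operation is "add".
    op = row.pop("sync_condition", {}).get("operation", "add")
    result = dict(stored)
    for field, value in row.items():
        if field in set_fields or field in append_array_fields:
            current = list(result.get(field, []))
            if op == "remove":
                # Drop every element equal to the given value.
                current = [v for v in current if v != value]
            elif field in set_fields:
                # Set semantics: add only if not already present.
                if value not in current:
                    current.append(value)
            else:
                # Array semantics: always append, keep the last max_len values.
                current = (current + [value])[-max_len:]
            result[field] = current
        else:
            # Ordinary fields are simply overwritten.
            result[field] = value
    return result


key = {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d"}
doc = {}
for _ in range(2):
    doc = apply_update(doc, {**key, "no_ar_submitted": 2019},
                       append_array_fields=["no_ar_submitted"])
print(doc["no_ar_submitted"])  # [2019, 2019]
```

Passing the same field in `set_fields` instead would leave a single `2019` after both inputs, matching case 1.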