Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
K
kb
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 2
    • Issues 2
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • Operations
    • Operations
    • Incidents
  • Analytics
    • Analytics
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • granite
  • kb
  • Wiki
    • Data_pump
    • Readers
  • sql

sql · Changes

Page history
搬运collie.wiki中的内容 authored Feb 22, 2021 by 吴一博's avatar 吴一博
Hide whitespace changes
Inline Side-by-side
Showing with 184 additions and 0 deletions
+184 -0
  • data_pump/readers/sql.md data_pump/readers/sql.md +184 -0
  • No files found.
data_pump/readers/sql.md 0 → 100644
View page @ 9ae5ba2a
# 从mysql数据库作为数据输入源
**class**参数配置为```sql.SqlDocReader```
## init 参数
|配置参数|子项|说明|
|----|----|----|
|db||数据库配置|
|query| |查询条件配置|
| |table|要查询的数据库表名|
| |columns|查询返回的字段列表|
| |order_by|返回结果排序的字段|
| |limit|返回结果的条数|
| |condition|查询条件|
| |query_sql|完整的查询sql语句, 语句需必需包含占位符 $CONDITIONS。 |
| |pack|关联查询子表,并将子表记录以数据形式存放在结果数据的指定字段中|
|offset|查询偏移记录|
### 查询参数(query)
数据筛选的行为可以通过参数query_sql指定完整的SQL语句或分别指定 table, columns,condition等参数进行配置。
当指定了query_sql参数,table,columns,condition,order_by, limit等参数将被忽略。
#### 完整查询语句 (query_sql)
查询语句的Where子句中必需包含占位符 $CONDITIONS。
$CONDITIONS 可被认为是一个条件表达式, 且它应该与SQL其它条件以 AND 连接。
#### 关联查询(pack)
该功能用于将主表与关联子表数据一起查出。如: 将如下表的记录,进行关联查询,并打包为一条数据:
tb_team
| id |name |
| ---- |---- |
| 1 |marvel|
| 1 |marvel|
tb_members
|id|team_id|member_name|
|----|----|----|
|1 |1 |Iron Man|
|2 |1 |Spider Man|
打包后的数据为:
```json
{
"id": 1,
"name": "marvel",
"members": [
{
"id": 1,
"team_id": 1,
"member_name": "Iron Man"
},
{
"id": 2,
"team_id": 1,
"member_name": "Spider Man"
}
]
}
```
所需的配置如下:
```yaml
table: tb_team
pack:
- name: "members"
table: "tb_members"
join_on:
left: "id"
right: "team_id"
columns:
- id
- team_id
- member_name
```
tb_team
| id |name | sex |
| ---- |---- | --- |
| 1 |marvel| Man |
| 1 |marvel| Woman |
tb_members
|id|team_id|member_name| sex |
|----|----|----|----|
|1 |1 |Iron Man| Man |
|2 |1 |Spider Man| Man |
|3 |1 |Black widow| Woman |
打包后的数据为:
```json
{
"id": 1,
"name": "marvel",
"sex": "Woman",
"members": [
{
"id": 3,
"team_id": 1,
"member_name": "Black widow",
"sex": "Woman"
}
]
}
```
所需的配置如下:
```yaml
query_sql: "select * from tb_team where id='1' and sex='Woman' and $CONDITIONS"
pack:
- name: "members"
table: "tb_members"
join_on:
left: "id, sex"
right: "team_id, sex"
columns:
- id
- team_id
- member_name
- sex
```
|配置参数|说明|
|----|----|
|name|打包后数据的字段名|
|table|关联的子表名称|
|join_on|关联查询所使用的关联字段. left: 主表字段名, right: 子表字段名|
|columns|查询返回的字段列表|
## 配置示例:
```yaml
sql_qichacha_push:
class: sql.SqlDocReader
init:
db:
host: 192.168.109.220
port: 3306
database: collie
user: collie
password: eill
query:
table:
a: company_lawsuit
b: company_lawsuit_parsed_info
columns:
a: "*"
b:
- id as bid
- jd_case_reason
- plaintiffs
- defendants
- thirdParties
condition: a.uuid = b.lawsuit_uuid
offset:
field: id
store: "file:///home/collie/project-collie/offset_company_lawsuit.txt"
```
* **db**: mysql数据库连接配置
* **query**: 查询条件,支持直接完整sql语句、指定表名和列名两种方式
1. 完整sql语句
* **query_sql**: select查询语句,其中where子句必须包含**$CONDITIONS**关键字
2. 指定表名和列名
* **query.table**: 数据表,支持单表,多表join查询
* **query.columns**: select查询的列名列表,逗号(,)分割。默认是所有列
* **query.condition**: select查询的where条件
* **offset**: 记录上次查询结束时记录在表中的offset
1. **offset.field**: 记录offset的字段名称
2. **offset.store**: offset保存的路径,支持本地文件和mysql数据库,前缀分别是```file://```、```mysql://```
Clone repository
  • README
  • basic_guidelines
  • basic_guidelines
    • basic_guidelines
    • dev_guide
    • project_build
    • 开发流程
  • best_practice
  • best_practice
    • AlterTable
    • RDS
    • azkaban
    • create_table
    • design
    • elasticsearch
    • elasticsearch
      • ES运维
    • logstash
View All Pages