This article mainly covers:
tiup commands
TiDB is operated mainly through tiup; for details, see [TiUP Cluster Command Overview][2].
Common commands
# List the TiDB clusters that are currently installed
tiup cluster list
# Show the details of a specific cluster
tiup cluster display dgwhwtidb_1
# Start the cluster (all roles)
tiup cluster start dgwhwtidb_1
# Stop the cluster (all roles)
tiup cluster stop dgwhwtidb_1
# Start a single role
tiup cluster start dgwhwtidb_1 -R grafana
# Stop a single role
tiup cluster stop dgwhwtidb_1 -R grafana
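The stop and start commands above can be combined to bounce a single role. The following is only a sketch: the helper names `run` and `bounce_role` and the `DRY_RUN` switch are hypothetical, and the cluster name is the one used in this article. By default it only prints the commands so they can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch: stop, start, and then display one role of a TiUP-managed cluster.
# CLUSTER and DRY_RUN are illustrative settings, not tiup options.
CLUSTER="${CLUSTER:-dgwhwtidb_1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop and start the given role, then show the cluster status.
bounce_role() {
  role="$1"
  run tiup cluster stop "$CLUSTER" -R "$role" &&
    run tiup cluster start "$CLUSTER" -R "$role" &&
    run tiup cluster display "$CLUSTER"
}

bounce_role grafana
```

Running it with `DRY_RUN=0` executes the same three commands for real; tiup may then ask for confirmation interactively.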
View a system variable:
show variables like '%tidb_enable_rate_limit_action%';
Problems encountered
ERROR 1105 (HY000): Out Of Memory Quota![conn_id=4350796098411954611]
Running the following SQL reports OOM:
select instance.start_time, instance.*, process.name as process_instance_name
from t_ds_task_instance instance
join t_ds_process_instance process on process.id = instance.process_instance_id
join t_ds_process_definition define on instance.process_definition_id = define.id
where define.project_id = 11
order by instance.start_time DESC LIMIT 0,10;
View the execution plan:
EXPLAIN ANALYZE select instance.*, process.name as process_instance_name
from t_ds_task_instance instance force INDEX(idx_start_time)
join t_ds_process_instance process on process.id = instance.process_instance_id
join t_ds_process_definition define on instance.process_definition_id = define.id
where define.project_id = 11
order by instance.start_time DESC LIMIT 0,10;
The execution plans are as follows:
project_id = 9

| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Limit_23 | 10.00 | 2 | root | | time:2.69s, loops:2 | offset:0, count:10 | N/A | N/A |
| └─IndexJoin_63 | 10.00 | 2 | root | | time:2.69s, loops:2, inner:{total:1.59ms, concurrency:5, task:1, construct:16.1µs, fetch:1.56ms, build:4.02µs}, probe:9.4µs | inner join, inner:TableReader_60, outer key:test.t_ds_task_instance.process_instance_id, inner key:test.t_ds_process_instance.id, equal cond:eq(test.t_ds_task_instance.process_instance_id, test.t_ds_process_instance.id) | 278.4 KB | N/A |
|   ├─IndexJoin_73(Build) | 9.95 | 2 | root | | time:2.69s, loops:3, inner:{total:863.7ms, concurrency:5, task:54, construct:570.4ms, fetch:293.1ms, build:16.1µs}, probe:75.6ms | inner join, inner:TableReader_69, outer key:test.t_ds_task_instance.process_definition_id, inner key:test.t_ds_process_definition.id, equal cond:eq(test.t_ds_task_instance.process_definition_id, test.t_ds_process_definition.id) | 232.0 MB | N/A |
|   │ ├─IndexLookUp_80(Build) | 12044.56 | 1180896 | root | | time:2.58s, loops:1159, index_task: {total_time: 1.61s, fetch_handle: 51.9ms, build: 143.1ms, wait: 1.41s}, table_task: {total_time: 8.5s, num: 65, concurrency: 5} | | 586.3 MB | N/A |
|   │ │ ├─IndexFullScan_77(Build) | 12044.56 | 1180896 | cop[tikv] | table:instance, index:idx_start_time(start_time) | time:2.45ms, loops:1163, cop_task: {num: 40, max: 884.8µs, min: 189.4µs, avg: 278.8µs, p95: 724.7µs, max_proc_keys: 992, p95_proc_keys: 224, rpc_num: 40, rpc_time: 10.8ms, copr_cache_hit_ratio: 0.95, distsql_concurrency: 15}, tikv_task:{proc max:24ms, min:0s, avg: 12.4ms, p80:21ms, p95:23ms, iters:1312, tasks:40}, scan_detail: {total_process_keys: 1216, total_process_keys_size: 55936, total_keys: 1218, get_snapshot_time: 213.6µs, rocksdb: {key_skipped_count: 1218, block: {cache_hit_count: 19}}} | keep order:true, desc | N/A | N/A |
|   │ │ └─Selection_79(Probe) | 12044.56 | 1180896 | cop[tikv] | | time:8.22s, loops:1263, cop_task: {num: 90, max: 375.1ms, min: 265.8µs, avg: 97.2ms, p95: 175.7ms, max_proc_keys: 20481, p95_proc_keys: 20480, tot_proc: 5.04s, tot_wait: 36ms, rpc_num: 90, rpc_time: 8.75s, copr_cache_hit_ratio: 0.19, distsql_concurrency: 15}, tikv_task:{proc max:91ms, min:0s, avg: 34ms, p80:52ms, p95:66ms, iters:1562, tasks:90}, scan_detail: {total_process_keys: 1127045, total_process_keys_size: 1999223209, total_keys: 3760891, get_snapshot_time: 1.6ms, rocksdb: {key_skipped_count: 4883188, block: {cache_hit_count: 62897}}} | not(isnull(test.t_ds_task_instance.process_definition_id)), not(isnull(test.t_ds_task_instance.process_instance_id)) | N/A | N/A |
|   │ │   └─TableRowIDScan_78 | 12044.56 | 1180896 | cop[tikv] | table:instance | tikv_task:{proc max:90ms, min:0s, avg: 33.3ms, p80:51ms, p95:64ms, iters:1562, tasks:90} | keep order:false | N/A | N/A |
|   │ └─TableReader_69(Probe) | 9.95 | 1 | root | | time:284.9ms, loops:55, cop_task: {num: 66, max: 126ms, min: 257.3µs, avg: 4.26ms, p95: 18.3ms, max_proc_keys: 102, p95_proc_keys: 97, rpc_num: 66, rpc_time: 279.5ms, copr_cache_hit_ratio: 0.06, distsql_concurrency: 15} | data:Selection_68 | N/A | N/A |
|   │   └─Selection_68 | 9.95 | 1 | cop[tikv] | | tikv_task:{proc max:3ms, min:0s, avg: 424.2µs, p80:1ms, p95:2ms, iters:116, tasks:66}, scan_detail: {total_process_keys: 3111, total_process_keys_size: 15546354, total_keys: 3509, get_snapshot_time: 1.12ms, rocksdb: {key_skipped_count: 5147, block: {cache_hit_count: 2788}}} | eq(test.t_ds_process_definition.project_id, 9) | N/A | N/A |
|   │     └─TableRangeScan_67 | 9951.71 | 3425 | cop[tikv] | table:define | tikv_task:{proc max:3ms, min:0s, avg: 424.2µs, p80:1ms, p95:2ms, iters:116, tasks:66} | range: decided by [test.t_ds_task_instance.process_definition_id], keep order:false, stats:pseudo | N/A | N/A |
|   └─TableReader_60(Probe) | 9.95 | 2 | root | | time:1.18ms, loops:2, cop_task: {num: 2, max: 779.7µs, min: 336.9µs, avg: 558.3µs, p95: 779.7µs, max_proc_keys: 2, p95_proc_keys: 2, rpc_num: 2, rpc_time: 1.09ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:TableRangeScan_59 | N/A | N/A |
|     └─TableRangeScan_59 | 9.95 | 2 | cop[tikv] | table:process | tikv_task:{proc max:1ms, min:0s, avg: 500µs, p80:1ms, p95:1ms, iters:2, tasks:2}, scan_detail: {total_process_keys: 2, total_process_keys_size: 2462, total_keys: 3, get_snapshot_time: 351.6µs, rocksdb: {block: {cache_hit_count: 10}}} | range: decided by [test.t_ds_task_instance.process_instance_id], keep order:false | N/A | N/A |

project_id = 11

| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Limit_23 | 10.00 | 10 | root | | time:13.8ms, loops:2 | offset:0, count:10 | N/A | N/A |
| └─IndexJoin_63 | 10.00 | 10 | root | | time:13.8ms, loops:1, inner:{total:650.8µs, concurrency:5, task:1, construct:58µs, fetch:578.6µs, build:6.9µs}, probe:18.6µs | inner join, inner:TableReader_60, outer key:test.t_ds_task_instance.process_instance_id, inner key:test.t_ds_process_instance.id, equal cond:eq(test.t_ds_task_instance.process_instance_id, test.t_ds_process_instance.id) | 242.7 KB | N/A |
|   ├─IndexJoin_73(Build) | 9.95 | 78 | root | | time:13.7ms, loops:2, inner:{total:6.28ms, concurrency:5, task:6, construct:1.88ms, fetch:4.34ms, build:14.2µs}, probe:429.1µs | inner join, inner:TableReader_69, outer key:test.t_ds_task_instance.process_definition_id, inner key:test.t_ds_process_definition.id, equal cond:eq(test.t_ds_task_instance.process_definition_id, test.t_ds_process_definition.id) | 8.74 MB | N/A |
|   │ ├─IndexLookUp_80(Build) | 12044.56 | 8128 | root | | time:17.1ms, loops:11, index_task: {total_time: 11.7ms, fetch_handle: 4.08ms, build: 7.55ms, wait: 38.1µs}, table_task: {total_time: 42.3ms, num: 11, concurrency: 5} | | 35.0 MB | N/A |
|   │ │ ├─IndexFullScan_77(Build) | 12044.56 | 122304 | cop[tikv] | table:instance, index:idx_start_time(start_time) | time:1.66ms, loops:103, cop_task: {num: 10, max: 823.3µs, min: 221.4µs, avg: 349.4µs, p95: 823.3µs, rpc_num: 10, rpc_time: 3.41ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15}, tikv_task:{proc max:17ms, min:2ms, avg: 6ms, p80:12ms, p95:17ms, iters:158, tasks:10}, scan_detail: {get_snapshot_time: 486.4µs, rocksdb: {block: {}}} | keep order:true, desc | N/A | N/A |
|   │ │ └─Selection_79(Probe) | 12044.56 | 16840 | cop[tikv] | | time:33.5ms, loops:30, cop_task: {num: 8, max: 2.47ms, min: 259.4µs, avg: 836.4µs, p95: 2.47ms, max_proc_keys: 128, p95_proc_keys: 128, tot_proc: 1ms, rpc_num: 8, rpc_time: 6.56ms, copr_cache_hit_ratio: 0.75, distsql_concurrency: 15}, tikv_task:{proc max:15ms, min:0s, avg: 6.13ms, p80:12ms, p95:15ms, iters:51, tasks:8}, scan_detail: {total_process_keys: 224, total_process_keys_size: 410584, total_keys: 898, get_snapshot_time: 692.2µs, rocksdb: {key_skipped_count: 1118, block: {cache_hit_count: 33}}} | not(isnull(test.t_ds_task_instance.process_definition_id)), not(isnull(test.t_ds_task_instance.process_instance_id)) | N/A | N/A |
|   │ │   └─TableRowIDScan_78 | 12044.56 | 16840 | cop[tikv] | table:instance | tikv_task:{proc max:15ms, min:0s, avg: 6.13ms, p80:12ms, p95:15ms, iters:51, tasks:8} | keep order:false | N/A | N/A |
|   │ └─TableReader_69(Probe) | 9.95 | 21 | root | | time:3.82ms, loops:10, cop_task: {num: 6, max: 790.3µs, min: 468.8µs, avg: 588.6µs, p95: 790.3µs, max_proc_keys: 87, p95_proc_keys: 87, rpc_num: 6, rpc_time: 3.43ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:Selection_68 | N/A | N/A |
|   │   └─Selection_68 | 9.95 | 21 | cop[tikv] | | tikv_task:{proc max:1ms, min:0s, avg: 500µs, p80:1ms, p95:1ms, iters:12, tasks:6}, scan_detail: {total_process_keys: 363, total_process_keys_size: 1188705, total_keys: 408, get_snapshot_time: 42.2µs, rocksdb: {key_skipped_count: 623, block: {cache_hit_count: 283}}} | eq(test.t_ds_process_definition.project_id, 11) | N/A | N/A |
|   │     └─TableRangeScan_67 | 9951.71 | 363 | cop[tikv] | table:define | tikv_task:{proc max:1ms, min:0s, avg: 500µs, p80:1ms, p95:1ms, iters:12, tasks:6} | range: decided by [test.t_ds_task_instance.process_definition_id], keep order:false, stats:pseudo | N/A | N/A |
|   └─TableReader_60(Probe) | 9.95 | 18 | root | | time:497µs, loops:2, cop_task: {num: 1, max: 451µs, proc_keys: 18, rpc_num: 1, rpc_time: 436.8µs, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:TableRangeScan_59 | N/A | N/A |
|     └─TableRangeScan_59 | 9.95 | 18 | cop[tikv] | table:process | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 18, total_process_keys_size: 65040, total_keys: 21, get_snapshot_time: 7.14µs, rocksdb: {key_skipped_count: 15, block: {cache_hit_count: 42}}} | range: decided by [test.t_ds_task_instance.process_instance_id], keep order:false | N/A | N/A |

12 rows in set (0.02 sec)

As the plans show, with project_id = 11 the scan on idx_start_time is effective: the LIMIT 10 is filled after IndexLookUp_80 has returned 8128 rows (35.0 MB), and the query finishes in 13.8ms. With project_id = 9 the index does not help: only 2 rows match, so the LIMIT is never filled and the scan walks all of t_ds_task_instance (1180896 rows). The amount of data fed into the join is then large: IndexLookUp_80 uses 586.3 MB and IndexJoin_73 uses 232.0 MB.
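In the plans above, the operator that blows up is the one whose actRows is far larger than its estRows (IndexLookUp_80: 12044.56 estimated, 1180896 actual for project_id = 9). A rough way to spot such rows in pipe-delimited EXPLAIN ANALYZE output is sketched below. The function name `flag_misestimates` and the default factor of 10 are assumptions made for illustration, and the sample rows are abbreviated to the first four columns.

```shell
#!/usr/bin/env bash
# Sketch: print operators whose actRows exceeds estRows by more than a factor.
# Expects rows shaped like "| id | estRows | actRows | ... |" on stdin.
flag_misestimates() {
  awk -F'|' -v factor="${1:-10}" '
    # Header, separator, and border lines have a non-numeric estRows and are skipped.
    NF > 4 && ($3 + 0) > 0 && ($4 + 0) > ($3 + 0) * factor {
      id = $2; est = $3; act = $4
      gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim padding around the operator id
      gsub(/[ \t]/, "", est)
      gsub(/[ \t]/, "", act)
      printf "%s est=%s act=%s\n", id, est, act
    }'
}

# Abbreviated rows taken from the project_id = 9 plan.
flag_misestimates 10 <<'EOF'
| id | estRows | actRows | task |
| Limit_23 | 10.00 | 2 | root |
| IndexLookUp_80(Build) | 12044.56 | 1180896 | root |
| TableRangeScan_67 | 9951.71 | 3425 | cop[tikv] |
EOF
# prints: IndexLookUp_80(Build) est=12044.56 act=1180896
```

A large gap like this is only a hint about where to look; it does not by itself say whether statistics, the LIMIT, or the join order is the cause.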
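Solutions:
- Increase the TiDB memory quota
-- First raise the session-level mem-quota-query limit
set tidb_mem_quota_query=16073741824;
-- To make the setting persistent, also change the configuration.
- Change inner join to STRAIGHT_JOIN (without TiFlash enabled this seems to have little effect; with TiFlash enabled the speedup is very noticeable)
- Replicate the data to TiFlash (the index has to be dropped, otherwise the index is used first, which is slower and can still run into OOM)
-- Replicate the TiDB tables to TiFlash
ALTER TABLE `test`.`t_ds_process_definition` SET TIFLASH REPLICA 2;
ALTER TABLE `test`.`t_ds_process_instance` SET TIFLASH REPLICA 2;
ALTER TABLE `test`.`t_ds_task_instance` SET TIFLASH REPLICA 2;
-- Check the replication progress
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 't_ds_process_definition';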
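The statements for the memory quota and the TiFlash replicas can be generated with a few shell helpers, which also makes the byte arithmetic explicit. This is a sketch only: the function names are hypothetical, and the 16 GiB target is an assumption about the intended quota. Note that 16073741824 bytes is about 15329 MiB (roughly 14.97 GiB), while exactly 16 GiB is 17179869184 bytes.

```shell
#!/usr/bin/env bash
# Sketch: helpers that print SQL for the options above (nothing is executed).

# Convert GiB to bytes and bytes to whole MiB (integer arithmetic).
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# Print a session-level quota statement for the given number of GiB.
mem_quota_sql() {
  printf 'SET tidb_mem_quota_query = %s;\n' "$(gib_to_bytes "$1")"
}

# Print ALTER statements for each table, then one progress query for the schema.
# Usage: tiflash_replica_sql <schema> <replica_count> <table>...
tiflash_replica_sql() {
  schema="$1"; replicas="$2"; shift 2
  for t in "$@"; do
    printf 'ALTER TABLE `%s`.`%s` SET TIFLASH REPLICA %s;\n' "$schema" "$t" "$replicas"
  done
  printf "SELECT TABLE_NAME, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '%s';\n" "$schema"
}

bytes_to_mib 16073741824   # prints 15329
mem_quota_sql 16           # prints SET tidb_mem_quota_query = 17179869184;
tiflash_replica_sql test 2 t_ds_process_definition t_ds_process_instance t_ds_task_instance
```

The printed statements can be reviewed and then pasted into a SQL client. Replication is complete for a table once its AVAILABLE column reports 1.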