    PostgreSQL 13.1: getting INSERT INTO ... SELECT to use parallel query

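    This article is based on PostgreSQL 13.1.

    Parallel query has been supported since PostgreSQL 9.6. Starting with PostgreSQL 11, CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW can also use parallel plans.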

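    The conclusion first:

    In PostgreSQL 13.1, INSERT INTO ... SELECT does not use a parallel plan. Use CREATE TABLE AS or SELECT INTO instead, or export the result and import it.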

    First, look at the execution plan of the following query:

    select count(*) from test t1,test1 t2 where t1.id = t2.id ;
    postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                      QUERY PLAN                  
    -------------------------------------------------------------------------------------------
    Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
     -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
          Hash Cond: (t1.id = t2.id)
          -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
          -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
           Buckets: 131072 Batches: 16 Memory Usage: 3520kB
           -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
     Planning Time: 0.420 ms
     Execution Time: 715.447 ms
    (13 rows)
    

    The plan shows that two workers were planned and launched.
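    The article does not show how the tables were created. A minimal setup that would be consistent with the plans (1,000,000 rows per table, a 4-byte integer id, a single integer column in va) might look like the sketch below. The column name cnt and the use of generate_series are assumptions made for illustration, not the author's actual definitions.

    ```sql
    -- Hypothetical setup: the article does not show these definitions.
    -- Two 1,000,000-row tables with an integer id, matching the
    -- rows=1000000 width=4 estimates in the serial plans.
    CREATE TABLE test  AS SELECT g AS id FROM generate_series(1, 1000000) AS g;
    CREATE TABLE test1 AS SELECT g AS id FROM generate_series(1, 1000000) AS g;

    -- Target table for the INSERT examples; the column name "cnt" is made up.
    CREATE TABLE va (cnt integer);

    -- Refresh planner statistics so the row estimates are realistic.
    ANALYZE test;
    ANALYZE test1;

    -- Settings that limit how many workers a Gather node may use.
    SHOW max_parallel_workers_per_gather;
    SHOW max_parallel_workers;
    ```

    The default value of max_parallel_workers_per_gather is 2, which would agree with the "Workers Planned: 2" line in the plan.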

    Now look at INSERT INTO ... SELECT:

    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;   
                     QUERY PLAN                 
    -------------------------------------------------------------------------------------------
    Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
     -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
       -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
          Hash Cond: (t1.id = t2.id)
          -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
          -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
           Buckets: 131072 Batches: 16 Memory Usage: 3227kB
           -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
     Planning Time: 0.511 ms
     Execution Time: 3745.633 ms
    (11 rows)
    

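    The plan contains no Workers lines, so parallel query was not used.

    Even with parallel mode forced on, the statement still does not run in parallel.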

    postgres=# set force_parallel_mode =on;
    SET
    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                     QUERY PLAN                 
    -------------------------------------------------------------------------------------------
    Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
     -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
       -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
          Hash Cond: (t1.id = t2.id)
          -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
          -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
           Buckets: 131072 Batches: 16 Memory Usage: 3227kB
           -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
     Planning Time: 0.577 ms
     Execution Time: 3825.923 ms
    (11 rows)
    

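    The reason is given in the official documentation, which lists this among the cases where the planner will not generate a parallel plan: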

    The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.
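    The three situations named in the quoted rule can be checked without running anything, because plain EXPLAIN only plans a statement. The sketch below assumes the test, test1, and va tables used above, and assumes va accepts a single integer value. According to the quoted rule, none of the three plans should contain a Gather node.

    ```sql
    -- Undo the earlier SET so the planner decides on its own.
    RESET force_parallel_mode;

    -- 1. Data-modifying statement at the top level.
    EXPLAIN INSERT INTO va
    SELECT count(*) FROM test t1, test1 t2 WHERE t1.id = t2.id;

    -- 2. Data-modifying statement inside a CTE.
    EXPLAIN WITH ins AS (
        INSERT INTO va VALUES (0) RETURNING *
    )
    SELECT count(*) FROM test t1, test1 t2 WHERE t1.id = t2.id;

    -- 3. Query that locks rows.
    EXPLAIN SELECT t1.id FROM test t1, test1 t2
    WHERE t1.id = t2.id
    FOR UPDATE OF t1;
    ```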

    There are three workarounds:

    1. SELECT INTO

    postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                      QUERY PLAN                  
    -------------------------------------------------------------------------------------------
    Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
     -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
          Hash Cond: (t1.id = t2.id)
          -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
          -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
           Buckets: 131072 Batches: 16 Memory Usage: 3520kB
           -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
     Planning Time: 0.319 ms
     Execution Time: 783.300 ms
    (13 rows)
    

    2. CREATE TABLE AS

    postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                      QUERY PLAN                  
    -------------------------------------------------------------------------------------------
     Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
     -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
          Hash Cond: (t1.id = t2.id)
          -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
          -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
           Buckets: 131072 Batches: 16 Memory Usage: 3520kB
           -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
     Planning Time: 0.189 ms
     Execution Time: 565.448 ms
    (13 rows)
    

    3. Export the result and import it, for example:

    psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
    psql -h localhost -d postgres -U postgres -c "\copy va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"
    

    In some scenarios this is also faster than the non-parallel INSERT.
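    Workarounds 1 and 2 create a new table, whereas the original statement inserted into the existing table va. One way to combine the two is to build the result in a staging table with CREATE TABLE AS and then copy it over. This is only a sketch: va_stage is a hypothetical table name, and the final INSERT is still planned serially, so the approach helps mainly when the expensive part is the query rather than the volume of rows written.

    ```sql
    -- Step 1: compute the result with a statement that may use a parallel plan.
    -- "va_stage" is a hypothetical staging table name.
    CREATE TABLE va_stage AS
    SELECT count(*) AS cnt
    FROM test t1, test1 t2
    WHERE t1.id = t2.id;

    -- Step 2: copy the small result into the existing table.
    -- This INSERT runs serially, but it only reads va_stage.
    INSERT INTO va SELECT cnt FROM va_stage;

    -- Step 3: remove the staging table.
    DROP TABLE va_stage;
    ```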

    Addendum: PostgreSQL: can SELECT INTO not be used in dynamic SQL statements?

    My database version is PostgreSQL 8.4.7. Here is the stored procedure that fails:

    CREATE or Replace FUNCTION func_getnextid(
     tablename varchar(240),
     idname varchar(20) default 'id')
    RETURNS integer AS $funcbody$
    Declare
     sqlstring varchar(240);
     currentId integer;
    Begin
     sqlstring:= 'select max("' || idname || '") into currentId from "' || tablename || '";';
     EXECUTE sqlstring;
     if currentId is NULL or currentId = 0 then
      return 1;
     else
      return currentId + 1;
     end if;
    End;
    $funcbody$ LANGUAGE plpgsql;

    Running it produces this error:

    SQL error:

    ERROR: EXECUTE of SELECT ... INTO is not implemented

    CONTEXT: PL/pgSQL function "func_getnextbigid" line 6 at EXECUTE statement

    Moving the INTO clause out of the SQL string and onto the EXECUTE statement fixes it:

    CREATE or Replace FUNCTION func_getnextid(
     tablename varchar(240),
     idname varchar(20) default 'id')
    RETURNS integer AS $funcbody$
    Declare
     sqlstring varchar(240);
     currentId integer;
    Begin
     sqlstring:= 'select max("' || idname || '") from "' || tablename || '";';
     EXECUTE sqlstring into currentId;
     if currentId is NULL or currentId = 0 then
      return 1;
     else
      return currentId + 1;
     end if;
    End;
    $funcbody$ LANGUAGE plpgsql;
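    On newer servers the same function could be written with format(), whose %I placeholder quotes identifiers and so replaces the manual double-quote concatenation. This is a sketch under assumptions: format() is available from PostgreSQL 9.1 onward, so it would not work on the 8.4.7 server mentioned above, and the name func_getnextid_fmt is made up to avoid clashing with the original function.

    ```sql
    -- Hypothetical variant of func_getnextid; requires PostgreSQL 9.1 or later.
    CREATE OR REPLACE FUNCTION func_getnextid_fmt(
        tablename text,
        idname text DEFAULT 'id')
    RETURNS integer AS $funcbody$
    DECLARE
        current_id integer;
    BEGIN
        -- %I quotes each argument as an SQL identifier.
        -- The INTO clause belongs to EXECUTE, not to the SQL string.
        EXECUTE format('SELECT max(%I) FROM %I', idname, tablename)
           INTO current_id;

        -- NULL (empty table) and 0 both yield 1, as in the original logic.
        RETURN coalesce(current_id, 0) + 1;
    END;
    $funcbody$ LANGUAGE plpgsql;
    ```

    It would be called the same way as the original, for example SELECT func_getnextid_fmt('test');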

    The above is based on personal experience and is offered as a reference. Corrections for any mistakes or omissions are welcome.

