• 企业400电话
  • 微网小程序
  • AI电话机器人
  • 电商代运营
  • 全 部 栏 目

    企业400电话 网络优化推广 AI电话机器人 呼叫中心 网站建设 商标✡知产 微网小程序 电商运营 彩铃•短信 增值拓展业务
    postgresql 中的 like 查询优化方案

    当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比

    一、对比情况说明:

    1、数据量100w条数据

    2、执行sql

    二、对比结果

    explain analyze SELECT
     c_patent,
     c_applyissno,
     d_applyissdate,
     d_applydate,
     c_patenttype_dimn,
     c_newlawstatus,
     c_abstract 
    FROM
     public.t_knowl_patent_zlxx_temp 
    WHERE
     c_applicant LIKE '%本溪满族自治县连山关镇安平安养殖场%';

    1、未建索时执行计划:

    "Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)
      Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
      Rows Removed by Filter: 333333
    Planning time: 0.272 ms
    Execution time: 228.116 ms"

    2、btree索引

    建索引语句

    CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
    

    执行计划

    "Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)
      Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
      Rows Removed by Filter: 333333
    Planning time: 0.116 ms
    Execution time: 218.189 ms"

    但是如果将查询sql稍微改动一下,把like查询中的前置%去掉是这样的

    Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1)
     Index Cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场'::text) AND ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖圻'::text))
     Filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%'::text)
    Planning time: 0.710 ms
    Execution time: 0.378 ms

    3、gin索引

    创建索引语句(postgresql要求在9.6版本及以上)

    create extension pg_trgm;
    CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);

    执行计划

    Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1)
     Recheck Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
     -> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1)
      Index Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
    Planning time: 0.673 ms
    Execution time: 0.740 ms

    三、结论

    btree索引可以让后置% "abc%"的模糊匹配走索引,gin + gp_trgm可以让前后置% "%abc%" 走索引。但是gin 索引也有弊端,以下情况可能导致无法命中:

    搜索字段少于3个字符时,不会命中索引,这是gin自身机制导致。

    当搜索字段过长时,比如email检索,可能也不会命中索引,造成原因暂时未知。

    补充:PostgreSQL LIKE 查询效率提升实验

    一、未做索引的查询效率

    作为对比,先对未索引的查询做测试

    EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘';
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1)
     Filter: ((author)::text = '曹志耘'::text)
     Rows Removed by Filter: 71315
     Planning time: 0.194 ms
     Execution time: 39.879 ms
    (5 rows)
     
    Time: 40.599 ms
    EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘';
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1)
     Filter: ((author)::text ~~ '曹志耘'::text)
     Rows Removed by Filter: 71315
     Planning time: 0.188 ms
     Execution time: 41.669 ms
    (5 rows)
     
    Time: 42.457 ms
     
    EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%';
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1)
     Filter: ((author)::text ~~ '曹志耘%'::text)
     Rows Removed by Filter: 71315
     Planning time: 0.307 ms
     Execution time: 41.633 ms
    (5 rows)
     
    Time: 42.676 ms

    很显然都会做全表扫描

    二、创建btree索引

    PostgreSQL默认索引是btree

    CREATE INDEX ix_gallery_map_author ON gallery_map (author);
     
    EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘';  
                    QUERY PLAN                
    -------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.457..1.312 rows=1031 loops=1)
     Recheck Cond: ((author)::text = '曹志耘'::text)
     Heap Blocks: exact=438
     -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.358..0.358 rows=1031 loops=1)
       Index Cond: ((author)::text = '曹志耘'::text)
     Planning time: 0.416 ms
     Execution time: 1.422 ms
    (7 rows)
     
    Time: 2.462 ms
     
    EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘';
                    QUERY PLAN                
    -------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.752..2.119 rows=1031 loops=1)
     Filter: ((author)::text ~~ '曹志耘'::text)
     Heap Blocks: exact=438
     -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.560..0.560 rows=1031 loops=1)
       Index Cond: ((author)::text = '曹志耘'::text)
     Planning time: 0.270 ms
     Execution time: 2.295 ms
    (7 rows)
     
    Time: 3.444 ms
    EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%';
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..41.389 rows=1031 loops=1)
     Filter: ((author)::text ~~ '曹志耘%'::text)
     Rows Removed by Filter: 71315
     Planning time: 0.260 ms
     Execution time: 41.518 ms
    (5 rows)
     
    Time: 42.430 ms
    EXPLAIN ANALYZE select * from gallery_map where author like '%研究室';
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=2282 width=621) (actual time=0.064..52.824 rows=2152 loops=1)
     Filter: ((author)::text ~~ '%研究室'::text)
     Rows Removed by Filter: 70194
     Planning time: 0.254 ms
     Execution time: 53.064 ms
    (5 rows)
     
    Time: 53.954 ms

    可以看到,等于、like的全匹配是用到索引的,like的模糊查询还是全表扫描

    三、创建gin索引

    CREATE EXTENSION pg_trgm;
     
    CREATE INDEX ix_gallery_map_author ON gallery_map USING gin (author gin_trgm_ops);
    EXPLAIN ANALYZE select * from gallery_map where author like '曹%'; 
                    QUERY PLAN                
    -------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on gallery_map (cost=19.96..2705.69 rows=1028 width=621) (actual time=0.419..1.771 rows=1031 loops=1)
     Recheck Cond: ((author)::text ~~ '曹%'::text)
     Heap Blocks: exact=438
     -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..19.71 rows=1028 width=0) (actual time=0.312..0.312 rows=1031 loops=1)
       Index Cond: ((author)::text ~~ '曹%'::text)
     Planning time: 0.358 ms
     Execution time: 1.916 ms
    (7 rows)
     
    Time: 2.843 ms
    EXPLAIN ANALYZE select * from gallery_map where author like '%耘%'; 
                 QUERY PLAN             
    -----------------------------------------------------------------------------------------------------------------
     Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..51.641 rows=1031 loops=1)
     Filter: ((author)::text ~~ '%耘%'::text)
     Rows Removed by Filter: 71315
     Planning time: 0.268 ms
     Execution time: 51.957 ms
    (5 rows)
     
    Time: 52.899 ms
    EXPLAIN ANALYZE select * from gallery_map where author like '%研究室%';
                    QUERY PLAN                
    -------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on gallery_map (cost=31.83..4788.42 rows=2559 width=621) (actual time=0.914..4.195 rows=2402 loops=1)
     Recheck Cond: ((author)::text ~~ '%研究室%'::text)
     Heap Blocks: exact=868
     -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..31.19 rows=2559 width=0) (actual time=0.694..0.694 rows=2402 loops=1)
       Index Cond: ((author)::text ~~ '%研究室%'::text)
     Planning time: 0.306 ms
     Execution time: 4.403 ms
    (7 rows)
     
    Time: 5.227 ms

    gin_trgm索引的效果好多了

    由于pg_trgm的索引是把字符串切成多个3元组,然后使用这些3元组做匹配,所以gin_trgm索引对于少于3个字符(包括汉字)的查询,只有前缀匹配会走索引

    另外,还测试了btree_gin,效果和btree一样

    注意:

    gin_trgm要求数据库必须使用UTF-8编码

    demo_v1 # \l demo_v1
            List of databases
     Name | Owner | Encoding | Collate | Ctype | Access privileges
    ---------+-----------+----------+-------------+-------------+-------------------
     demo_v1 | wmpp_user | UTF8  | en_US.UTF-8 | en_US.UTF-8 |
     

    以上为个人经验,希望能给大家一个参考,也希望大家多多支持脚本之家。如有错误或未考虑完全的地方,望不吝赐教。

    您可能感兴趣的文章:
    • postgresql coalesce函数数据转换方式
    • postgresql 中的COALESCE()函数使用小技巧
    • postgresql 实现修改jsonb字段中的某一个值
    • postgresql 实现将字段为空的值替换为指定值
    • PostgreSQL 禁用全表扫描的实现
    • 解决PostgreSQL Array使用中的一些小问题
    • sql 实现将空白值替换为其他值
    上一篇:解析PostgreSQL中Oid和Relfilenode的映射问题
    下一篇:解决PostgreSQL Array使用中的一些小问题
  • 相关文章
  • 

    © 2016-2020 巨人网络通讯 版权所有

    《增值电信业务经营许可证》 苏ICP备15040257号-8

    postgresql 中的 like 查询优化方案 postgresql,中的,like,查询,