數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)

這篇文章主要講解了“數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)”，文中的講解內(nèi)容簡(jiǎn)單清晰，易于學(xué)習(xí)與理解，下面請(qǐng)大家跟著小編的思路慢慢深入，一起來(lái)研究和學(xué)習(xí)“數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)”吧！

成都做網(wǎng)站、網(wǎng)站制作、成都外貿(mào)網(wǎng)站建設(shè)的關(guān)注點(diǎn)不是能為您做些什么網(wǎng)站，而是怎么做網(wǎng)站，有沒(méi)有做好網(wǎng)站，給創(chuàng)新互聯(lián)一個(gè)展示的機(jī)會(huì)來(lái)證明自己，這并不會(huì)花費(fèi)您太多時(shí)間，或許會(huì)給您帶來(lái)新的靈感和驚喜。面向用戶友好，注重用戶體驗(yàn)，一切以用戶為中心。

數(shù)據(jù)結(jié)構(gòu)

時(shí)空行為數(shù)據(jù)包含三個(gè)屬性：時(shí)間、空間和對(duì)象。

非結(jié)構(gòu)化索引：

create table test(   id int8,   crt_time timestamp, -- Time   pos geometry, -- Location   obj jsonb -- Object description  );

除了應(yīng)用于JSON，結(jié)構(gòu)化數(shù)據(jù)還可以用于對(duì)象描述。例如：

create table test(   id int8,   crt_time timestamp, -- Time   pos geometry, -- Location   c1 int, -- Some property examples   c2 int,   c3 text,   c4 float8,   c5 int,   c6 date,   c7 text,   c8 int,   c9 int,   c10 int  );

時(shí)空行為數(shù)據(jù)的SQL查詢實(shí)例

select * from test   where   pos <-> ? < ?   and crt_time between ? and ?   and ( (c1 = ? and c2 between ? and ?) or c10=?)   ...   ;

優(yōu)化方法

考慮運(yùn)用以下知識(shí)：

時(shí)間序列BRIN索引

crt_time字段是一個(gè)時(shí)間序列字段，表示生成數(shù)據(jù)的時(shí)間。在PostgreSQL堆存儲(chǔ)中，存儲(chǔ)和該字段的值具有很強(qiáng)的線性相關(guān)性。

因此，BRIN索引很合適。

使用BRIN索引來(lái)代替分區(qū)表進(jìn)行TPC-H測(cè)試。大范圍搜索的性能甚至優(yōu)于使用分區(qū)表時(shí)的功能。

create index idx_test_1 on test using brin(crt_time);

空間索引

顯然，空間檢索需要空間索引。PostgreSQL中可以使用三種方法實(shí)現(xiàn)空間檢索。

1. 幾何類(lèi)型的GIST索引

create index idx_test_2 on test using gist(pos);

該索引支持空間KNN搜索和空間位置確定等功能。

2. 幾何類(lèi)型的主索引

create index idx_test_2 on test using spgist(pos);

該索引支持空間KNN搜索和空間位置確定等功能。

3. Geohash和B-tree索引(將經(jīng)度和緯度轉(zhuǎn)換為Geohash并為hash值創(chuàng)建B-tree索引)。只需使用表達(dá)式索引。

create index idx_test_3 on test using btree( ST_GeoHash(pos,15) );

此索引支持前綴搜索(其能落實(shí)編碼地理信息網(wǎng)格中包含的關(guān)系)。它屬于有損索引，需要二次過(guò)濾。

GiST和SPGiST空間索引能夠找到準(zhǔn)確的地理位置信息，優(yōu)于GEOHASH索引。但是，查詢信息時(shí)需要特別注意。

GIN 索引

此索引類(lèi)型的目標(biāo)是對(duì)象屬性字段JSONB或多個(gè)結(jié)構(gòu)化對(duì)象屬性字段。只需使用GIN索引。

例如：

create extension btree_gin;

非結(jié)構(gòu)化索引：

create index idx_test_4 on test using gin( obj );

結(jié)構(gòu)化索引：

create index idx_test_4 on test using gin( c1,c2,c3,c4,c5,c6,c7,c8,c9 );

BitmapAnd和BitmapOr

但是，可以同時(shí)使用這些索引嗎? PostgreSQL為多個(gè)索引提供bitmapAnd及bitmapOr接口。它們可以組合多個(gè)索引，減少需要掃描的數(shù)據(jù)庫(kù)數(shù)量。

Heap, one square = one page:  +---------------------------------------------+  |c____u_____X___u___X_________u___cXcc______u_|  +---------------------------------------------+  Rows marked c match customers pkey condition.  Rows marked u match username condition.  Rows marked X match both conditions.  Bitmap scan from customers_pkey:  +---------------------------------------------+  |100000000001000000010000000000000111100000000| bitmap 1  +---------------------------------------------+  One bit per heap page, in the same order as the heap  Bits 1 when condition matches, 0 if not  Bitmap scan from ix_cust_username:  +---------------------------------------------+  |000001000001000100010000000001000010000000010| bitmap 2  +---------------------------------------------+  Once the bitmaps are created a bitwise AND is performed on them:  +---------------------------------------------+  |100000000001000000010000000000000111100000000| bitmap 1  |000001000001000100010000000001000010000000010| bitmap 2   &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&  |000000000001000000010000000000000010000000000| Combined bitmap  +-----------+-------+--------------+----------+   | | |   v v v  Used to scan the heap only for matching pages:  +---------------------------------------------+  |___________X_______X______________X__________|  +---------------------------------------------+  The bitmap heap scan then seeks to the start of each page and reads the page:  +---------------------------------------------+  |___________X_______X______________X__________|  +---------------------------------------------+  seek------->^seek-->^seek--------->^   | | |   ------------------------   only these pages read

例如：

select * from test where   c1 ...   and crt_time between ? and ?   and test->> c1 in (?, ? ...);

根據(jù)統(tǒng)計(jì)數(shù)據(jù)自動(dòng)使用適當(dāng)?shù)乃饕?。如果需要，bitmapAnd和bitmapOr將在多個(gè)索引上自動(dòng)執(zhí)行合并掃描。跳過(guò)不需要掃描的頁(yè)面，重新檢查命中的頁(yè)面。

堆表存儲(chǔ)分級(jí)和分區(qū)

存儲(chǔ)可以分為一級(jí)分區(qū)或多級(jí)分區(qū)：

1. 單一分區(qū)

例如，按時(shí)間劃分。

create table test(   id int8,   crt_time timestamp, -- Time   pos geometry, -- Location   obj jsonb -- Object description  )  PARTITION BY range (crt_time)  ;  create table test_201701 PARTITION OF test for values FROM ( 2017-01-01 ) TO ( 2017-02-01 );  ......

2. 多層分區(qū)

例如，先按時(shí)間分區(qū)，然后按Geohash劃分。

create table test_201701 PARTITION OF test for values  FROM ( 2017-01-01 ) TO ( 2017-02-01 ) partition by range(st_geohash(pos,15));  ...  create table test_201701_prefix1 PARTITION OF test for values  FROM ( xxxx1 ) TO ( xxxx2 );  -- Generate BOX (GRID) on a map, find corresponding boundaries and use  -- boundaries as partitioning conditions

使用分區(qū)時(shí)，如果查詢條件包括分區(qū)鍵(如時(shí)間和空間范圍)，相應(yīng)的分區(qū)將自動(dòng)定位，這即為需要掃描的數(shù)據(jù)量。

創(chuàng)建面向?qū)ο髮傩缘腉IN索引，以實(shí)現(xiàn)高效查詢。

索引分級(jí)與分區(qū)

與數(shù)據(jù)一樣，索引在不使用分區(qū)表的情況下也支持分區(qū)邏輯。

空間索引+時(shí)間分區(qū)

create index idx_20170101  on tbl using gist (pos)  where crt_time between 2017-01-01 and 2017-01-02 ;  ...  create index idx_20170102  on tbl using gist (pos)  where crt_time between 2017-01-02 and 2017-01-03 ;  ...

通過(guò)使用前述分區(qū)索引，可以在輸入時(shí)間范圍后快速定位目標(biāo)數(shù)據(jù)，執(zhí)行空間搜索。

select * from tbl   where crt_time between 2017-01-01 and 2017-01-02 -- Time   and (pos <-> ?) < ? -- Distance to a point to be searched for   and ? -- Other conditions   order by pos <-> ? -- Sort by distance   limit ?; -- Number of results to be returned

可以使用更多的索引分區(qū)，比如用作搜索條件和商店類(lèi)型的維度(對(duì)象屬性)(假設(shè)它是可枚舉的或在范圍相對(duì)較小的情況下)。

create index idx_20170101_mod0 on tbl using gist (pos) where crt_time between 2017-01-01 and 2017-01-02 and dtype=0;  ...  create index idx_20170101_mod1 on tbl using gist (pos) where crt_time between 2017-01-01 and 2017-01-02 and dtype=1;  ...

通過(guò)使用前面的分區(qū)索引，在輸入時(shí)間范圍或特定條件以執(zhí)行空間搜索后，可以快速定位目標(biāo)數(shù)據(jù)。

select * from tbl   where crt_time between 2017-01-01 and 2017-01-02 -- Time   and (pos <-> ?) < ? -- Distance to a point to be searched for   and dtype=0 -- Object condition   and ? -- Other conditions   order by pos <-> ? -- Sort by distance   limit ?; -- Number of results to be returned

請(qǐng)注意，前面的SQL查詢可以實(shí)現(xiàn)最佳性能優(yōu)化。

索引組織形式(或索引結(jié)構(gòu))可以由邏輯分區(qū)重新構(gòu)造，可以用上述類(lèi)似的索引創(chuàng)建方法覆蓋所有條件。

CTID相交陣列連接掃描

如前所述，BitmapAnd和BitmapOr合并掃描是在多個(gè)索引或GIN索引中自動(dòng)執(zhí)行的。事實(shí)上，這種掃描也可以在SQL中顯式執(zhí)行。

每個(gè)條件滲透對(duì)應(yīng)的CTID。

使用Intersect或Union生成滿足總體需求的CTID。(Intersect對(duì)應(yīng)于“and”條件;union對(duì)應(yīng)于“or”條件。)

生成一個(gè)ctid數(shù)組。

示例

1. 創(chuàng)建對(duì)象提要數(shù)據(jù)表

postgres=# create table tbl (id int, info text, crt_time timestamp, pos point, c1 int , c2 int, c3 int );  CREATE TABLE

2. 將5000萬(wàn)條測(cè)試數(shù)據(jù)寫(xiě)入表中

postgres=# insert into tbl select generate_series(1,50000000), md5(random()::text), clock_timestamp(), point(180-random()*180, 90-random()*90), random()*10000, random()*5000, random()*1000;  INSERT 0 50000000

3. 創(chuàng)建對(duì)象索引

postgres=# create index idx_tbl_1 on tbl using gin (info, c1, c2, c3);  CREATE INDEX

4. 創(chuàng)建時(shí)間索引

postgres=# create index idx_tbl_2 on tbl using btree (crt_time);  CREATE INDEX

5. 創(chuàng)建空間索引

postgres=# create index idx_tbl_3 on tbl using gist (pos);  CREATE INDEX

6. 生成數(shù)據(jù)布局以方便后續(xù)查詢

postgres=# select min(crt_time),max(crt_time),count(*) from tbl;   min | max | count  ----------------------------+----------------------------+----------   2017-07-22 17:59:34.136497 | 2017-07-22 18:01:27.233688 | 50000000  (1 row)

7. 創(chuàng)建一個(gè)極限KNN查詢函數(shù)

create or replace function ff(point, float8, int) returns setof tid as  $ declare   v_rec record;   v_limit int := $3;  begin   set local enable_seqscan=off; -- Force index that exits when scanned rows reach a specific number   for v_rec in   select *,   (pos <-> $1) as dist,   ctid   from tbl   order by pos <-> $1   loop   if v_limit <=0 then   -- raise notice "Sufficient data obtained"   return;   end if;   if v_rec.dist > $2 then   -- raise notice "All matching points returned"   return;   else   return next v_rec.ctid;   end if;   v_limit := v_limit -1;   end loop;  end;  $  language plpgsql strict volatile;  postgres=# select * from ff(point (100,100) ,100,100) ;   ff  -------------   (407383,11)   (640740,9)   (26073,51)   (642750,34)  ...  (100 rows)  Time: 1.061 ms

8. CTID合并檢索

顯示符合以下條件的記錄

(  c1 in (1,2,3,4,100,200,99,88,77,66,55)   or  c2 < 10  )   and  pos <-> point (0,0) < 5   and  crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40 ;

首先，分別查看每個(gè)條件，找匹配一個(gè)條件的記錄數(shù)量，以及在索引掃描上所花時(shí)長(zhǎng)。

1. 54,907條記錄

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 in (1,2,3,4,100,200,99,88,77,66,55);   QUERY PLAN  -------------------------------------------------------------------------------------------------------------------------------   Bitmap Heap Scan on postgres.tbl (cost=820.07..65393.94 rows=54151 width=73) (actual time=23.842..91.911 rows=54907 loops=1)   Output: id, info, crt_time, pos, c1, c2, c3   Recheck Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))   Heap Blocks: exact=52778   Buffers: shared hit=52866   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=14.264..14.264 rows=54907 loops=1)   Index Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))   Buffers: shared hit=88   Planning time: 0.105 ms   Execution time: 94.606 ms  (10 rows)

2. 95,147條記錄

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c2<10;   QUERY PLAN  ---------------------------------------------------------------------------------------------------------------------------------   Bitmap Heap Scan on postgres.tbl (cost=835.73..112379.10 rows=99785 width=73) (actual time=69.243..179.388 rows=95147 loops=1)   Output: id, info, crt_time, pos, c1, c2, c3   Recheck Cond: (tbl.c2 < 10)   Heap Blocks: exact=88681   Buffers: shared hit=88734   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=53.612..53.612 rows=95147 loops=1)   Index Cond: (tbl.c2 < 10)   Buffers: shared hit=53   Planning time: 0.094 ms   Execution time: 186.201 ms  (10 rows)

3. 149930條記錄(為快速獲得結(jié)果，PostgreSQL使用位圖進(jìn)行合并掃描)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 in (1,2,3,4,100,200,99,88,77,66,55) or c2 <10;   QUERY PLAN  ------------------------------------------------------------------------------------------------------------------------------------   Bitmap Heap Scan on postgres.tbl (cost=1694.23..166303.58 rows=153828 width=73) (actual time=98.988..266.852 rows=149930 loops=1)   Output: id, info, crt_time, pos, c1, c2, c3   Recheck Cond: ((tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[])) OR (tbl.c2 < 10))   Heap Blocks: exact=134424   Buffers: shared hit=134565   -> BitmapOr (cost=1694.23..1694.23 rows=153936 width=0) (actual time=73.763..73.763 rows=0 loops=1)   Buffers: shared hit=141   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=16.733..16.733 rows=54907 loops=1)   Index Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))   Buffers: shared hit=88   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=57.029..57.029 rows=95147 loops=1)   Index Cond: (tbl.c2 < 10)   Buffers: shared hit=53   Planning time: 0.149 ms   Execution time: 274.548 ms  (15 rows)

4. 60,687條記錄(即使運(yùn)用出色的KNN性能優(yōu)化，仍然需要耗費(fèi)195毫秒)。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff(point (0,0) ,5,1000000);   QUERY PLAN  ----------------------------------------------------------------------------------------------------------------------   Function Scan on postgres.ff (cost=0.25..10.25 rows=1000 width=6) (actual time=188.563..192.114 rows=60687 loops=1)   Output: ff   Function Call: ff( (0,0) ::point, 5 ::double precision, 1000000)   Buffers: shared hit=61296   Planning time: 0.029 ms   Execution time: 195.097 ms  (6 rows)

讓我們看看不使用KNN優(yōu)化需要多長(zhǎng)時(shí)間。

結(jié)果非常令人驚訝——極限優(yōu)化性能提高了一個(gè)數(shù)量級(jí)。

5. 2,640,751條記錄

使用所有索引逐個(gè)掃描數(shù)據(jù)條件，得到ctid并執(zhí)行ctid掃描。

現(xiàn)在，讓我們來(lái)分解這個(gè)過(guò)程：

首先，讓我們看看時(shí)間和對(duì)象屬性的合并查詢，成果非常驚人。使用位圖BitmapOr時(shí)，查詢可以跳過(guò)大多數(shù)數(shù)據(jù)塊，并且掃描時(shí)間比單索引掃描要短。

注意，在此步驟中記錄的數(shù)量減少到7,847條。

postgres=# explain (analyze,verbose,timing,costs,buffers) select ctid from tbl   where crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   );   QUERY PLAN  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------   Bitmap Heap Scan on postgres.tbl (cost=35025.85..44822.94 rows=7576 width=6) (actual time=205.577..214.821 rows=7847 loops=1)   Output: ctid   Recheck Cond: (((tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[])) OR (tbl.c2 < 10)) AND (tbl.crt_time >= 2017-07-22 17:59:34 ::timestamp without time zone) AND (tbl.crt_time <= 2017-07-22 17:59:40 ::timestamp without time zone))   Heap Blocks: exact=6983   Buffers: shared hit=14343   -> BitmapAnd (cost=35025.85..35025.85 rows=7581 width=0) (actual time=204.048..204.048 rows=0 loops=1)   Buffers: shared hit=7360   -> BitmapOr (cost=1621.11..1621.11 rows=153936 width=0) (actual time=70.279..70.279 rows=0 loops=1)   Buffers: shared hit=141   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=15.860..15.860 rows=54907 loops=1)   Index Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))   Buffers: shared hit=88   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=54.418..54.418 rows=95147 loops=1)   Index Cond: (tbl.c2 < 10)   Buffers: shared hit=53   -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..33402.60 rows=2462443 width=0) (actual time=127.101..127.101 rows=2640751 loops=1)   Index Cond: ((tbl.crt_time >= 2017-07-22 17:59:34 ::timestamp without time zone) AND (tbl.crt_time <= 2017-07-22 17:59:40 ::timestamp without time zone))   Buffers: shared hit=7219   Planning time: 0.203 ms   Execution time: 216.697 ms  (20 rows)

然后，看KNN的掃描時(shí)間：

注意，60,687條記錄滿足KNN距離條件，所以接下來(lái)將解釋CTID合并掃描與原始掃描之間的性能比較。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff(point (0,0) ,5,1000000);   QUERY PLAN  ----------------------------------------------------------------------------------------------------------------------   Function Scan on postgres.ff (cost=0.25..10.25 rows=1000 width=6) (actual time=188.563..192.114 rows=60687 loops=1)   Output: ff   Function Call: ff( (0,0) ::point, 5 ::double precision, 1000000)   Buffers: shared hit=61296   Planning time: 0.029 ms   Execution time: 195.097 ms  (6 rows)

最后，將這些片段合并到ctid中。

select * from ff(point (0,0) ,5,1000000)   intersect  select ctid from tbl   where crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   );   ff  ------------   (1394,8)   (3892,50)   (6124,45)   (7235,8)   (7607,45)   (11540,8)   (13397,31)   (14266,36)   (18149,7)   (19256,44)   (24671,62)   (26525,64)   (30235,48)  (13 rows)  Time: 463.012 ms

取得最終紀(jì)錄。

select * from tbl where ctid = any  (  array( -- array start  select * from ff(point (0,0) ,5,1000000) intersect select ctid from tbl   where crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   )  ) -- array end  );   id | info | crt_time | pos | c1 | c2 | c3  ---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----   104558 | c4699c933d4e2d2a10d828c4ff0b3362 | 2017-07-22 17:59:34.362508 | (4.20534582808614,2.43749532848597) | 99 | 4858 | 543   291950 | 1c2901689ab1eb7653d8ad972f7aa376 | 2017-07-22 17:59:34.776808 | (2.5384977646172,1.09820357523859) | 3 | 2131 | 360   459345 | 9e46548f29d914019ce53a589be8ebac | 2017-07-22 17:59:35.148699 | (0.715781506150961,3.1486327573657) | 1 | 1276 | 8   542633 | c422d6137f9111d5c2dc723b40c7023f | 2017-07-22 17:59:35.334278 | (0.0631888210773468,2.2334903664887) | 4968 | 3 | 245   570570 | fc57bfc6b7781d89b17c90417bd306f7 | 2017-07-22 17:59:35.39653 | (3.14926156774163,1.04107855819166) | 88 | 2560 | 561   865508 | 34509c7f7640afaf288a5e1d38199701 | 2017-07-22 17:59:36.052573 | (3.12869547866285,2.34822122845799) | 2 | 65 | 875   1004806 | afe9f88cbebf615a7ae5f41180c4b33f | 2017-07-22 17:59:36.362027 | (1.13972157239914,3.28763140831143) | 3 | 1639 | 208   1069986 | 6b9f27bfde993fb0bae3336ac010af7a | 2017-07-22 17:59:36.507775 | (4.51995821669698,2.08761331625283) | 2 | 200 | 355   1361182 | 7c4c1c208c2b2b21f00772c43955d238 | 2017-07-22 17:59:37.155127 | (1.7334086727351,2.18367457855493) | 9742 | 0 | 232   1444244 | 41bf6f8e4b89458c13fb408a7db05284 | 2017-07-22 17:59:37.339594 | (0.52773853763938,2.16670122463256) | 1 | 2470 | 820   1850387 | 6e0011c6db76075edd2aa7f81ec94129 | 2017-07-22 17:59:38.243091 | (0.0168232340365648,0.420973123982549) | 100 | 4395 | 321   1989439 | 6211907ac254a4a3ca54f90822a2095e | 2017-07-22 17:59:38.551637 | (0.0274275150150061,0.490507003851235) | 1850 | 5 | 74   2267673 | 898fdd54dcc5b14c27cf1c8b9afe2471 | 2017-07-22 17:59:39.170035 | (0.394239127635956,2.86229319870472) | 2892 | 6 | 917  (13 rows)  Time: 462.715 ms

過(guò)程花費(fèi)462毫秒。

9. 測(cè)試原始SQL查詢的性能: PostgreSQL Multi-Index BitmapAnd and BitmapOr跳過(guò)掃描

直接編寫(xiě)SQL查詢，而不是使用多CTID掃描。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl   where   crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   )   and   pos <-> point (0,0) < 5;   Bitmap Heap Scan on postgres.tbl (cost=35022.06..44857.06 rows=2525 width=73) (actual time=205.542..214.547 rows=13 loops=1)   Output: id, info, crt_time, pos, c1, c2, c3   Recheck Cond: (((tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[])) OR (tbl.c2 < 10)) AND (tbl.crt_time >= 2017-07-22 17:59:34 ::timestamp without time zone) AND (tbl.crt_time <= 2017-07-22 17:59:40 ::timestamp without time zone))   Filter: ((tbl.pos <-> (0,0) ::point) < 5 ::double precision)   Rows Removed by Filter: 7834   Heap Blocks: exact=6983   Buffers: shared hit=14343   -> BitmapAnd (cost=35022.06..35022.06 rows=7581 width=0) (actual time=203.620..203.620 rows=0 loops=1)   Buffers: shared hit=7360   -> BitmapOr (cost=1618.58..1618.58 rows=153936 width=0) (actual time=71.660..71.660 rows=0 loops=1)   Buffers: shared hit=141   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=14.861..14.861 rows=54907 loops=1)   Index Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))   Buffers: shared hit=88   -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=56.797..56.797 rows=95147 loops=1)   Index Cond: (tbl.c2 < 10)   Buffers: shared hit=53   -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..33402.60 rows=2462443 width=0) (actual time=125.255..125.255 rows=2640751 loops=1)   Index Cond: ((tbl.crt_time >= 2017-07-22 17:59:34 ::timestamp without time zone) AND (tbl.crt_time <= 2017-07-22 17:59:40 ::timestamp without time zone))   Buffers: shared hit=7219   Planning time: 0.160 ms   Execution time: 216.797 ms  (22 rows)

性能如預(yù)期的那樣好，之前解釋過(guò)原因。KNN條件以外的條件已經(jīng)將結(jié)果收斂到7,000條記錄，因此沒(méi)有必要使用包含KNN條件的索引。(即使使用KNN索引也需要195毫秒，因?yàn)橛?0,687條記錄滿足KNN條件。)

校驗(yàn)結(jié)果：

select * from tbl   where   crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   )   and   pos <-> point (0,0) < 5;   id | info | crt_time | pos | c1 | c2 | c3  ---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----   104558 | c4699c933d4e2d2a10d828c4ff0b3362 | 2017-07-22 17:59:34.362508 | (4.20534582808614,2.43749532848597) | 99 | 4858 | 543   291950 | 1c2901689ab1eb7653d8ad972f7aa376 | 2017-07-22 17:59:34.776808 | (2.5384977646172,1.09820357523859) | 3 | 2131 | 360   459345 | 9e46548f29d914019ce53a589be8ebac | 2017-07-22 17:59:35.148699 | (0.715781506150961,3.1486327573657) | 1 | 1276 | 8   542633 | c422d6137f9111d5c2dc723b40c7023f | 2017-07-22 17:59:35.334278 | (0.0631888210773468,2.2334903664887) | 4968 | 3 | 245   570570 | fc57bfc6b7781d89b17c90417bd306f7 | 2017-07-22 17:59:35.39653 | (3.14926156774163,1.04107855819166) | 88 | 2560 | 561   865508 | 34509c7f7640afaf288a5e1d38199701 | 2017-07-22 17:59:36.052573 | (3.12869547866285,2.34822122845799) | 2 | 65 | 875   1004806 | afe9f88cbebf615a7ae5f41180c4b33f | 2017-07-22 17:59:36.362027 | (1.13972157239914,3.28763140831143) | 3 | 1639 | 208   1069986 | 6b9f27bfde993fb0bae3336ac010af7a | 2017-07-22 17:59:36.507775 | (4.51995821669698,2.08761331625283) | 2 | 200 | 355   1361182 | 7c4c1c208c2b2b21f00772c43955d238 | 2017-07-22 17:59:37.155127 | (1.7334086727351,2.18367457855493) | 9742 | 0 | 232   1444244 | 41bf6f8e4b89458c13fb408a7db05284 | 2017-07-22 17:59:37.339594 | (0.52773853763938,2.16670122463256) | 1 | 2470 | 820   1850387 | 6e0011c6db76075edd2aa7f81ec94129 | 2017-07-22 17:59:38.243091 | (0.0168232340365648,0.420973123982549) | 100 | 4395 | 321   1989439 | 6211907ac254a4a3ca54f90822a2095e | 2017-07-22 17:59:38.551637 | (0.0274275150150061,0.490507003851235) | 1850 | 5 | 74   2267673 | 898fdd54dcc5b14c27cf1c8b9afe2471 | 2017-07-22 17:59:39.170035 | (0.394239127635956,2.86229319870472) | 2892 | 6 | 917  (13 rows)

分區(qū)索引示例

假設(shè)前面的查詢條件保持不變，使用分區(qū)索引來(lái)測(cè)試性能。

這是為了演示分區(qū)索引的極端效果。在實(shí)際場(chǎng)景中，集合級(jí)別可能沒(méi)有那么高(例如按天集合或按ID散列集合)。只要集合是可能的，就可以展現(xiàn)出色的性能。

postgres=# create index idx_tbl_4 on tbl using gist (pos) where crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   ) ;  CREATE INDEX  Time: 8359.330 ms (00:08.359)

重構(gòu)極值KNN優(yōu)化函數(shù)

create or replace function ff(point, float8, int) returns setof record as  $ declare   v_rec record;   v_limit int := $3;  begin   set local enable_seqscan=off; -- Force index that exits when scanned rows reach a specific number   for v_rec in   select *,   (pos <-> $1) as dist   from tbl   where   crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40   and (   c1 in (1,2,3,4,100,200,99,88,77,66,55)   or   c2 < 10   )   order by pos <-> $1   loop   if v_limit <=0 then   -- raise notice "Sufficient data obtained"   return;   end if;   if v_rec.dist > $2 then   -- raise notice "All matching points returned"   return;   else   return next v_rec;   end if;   v_limit := v_limit -1;   end loop;  end;  $  language plpgsql strict volatile;

查詢性能：

postgres=# select * from ff(point (0,0) , 5, 10000000) as t(id int, info text, crt_time timestamp, pos point, c1 int, c2 int, c3 int, dist float8);   id | info | crt_time | pos | c1 | c2 | c3 | dist  ---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----+-------------------   1850387 | 6e0011c6db76075edd2aa7f81ec94129 | 2017-07-22 17:59:38.243091 | (0.0168232340365648,0.420973123982549) | 100 | 4395 | 321 | 0.421309141034319   1989439 | 6211907ac254a4a3ca54f90822a2095e | 2017-07-22 17:59:38.551637 | (0.0274275150150061,0.490507003851235) | 1850 | 5 | 74 | 0.49127323294376   1444244 | 41bf6f8e4b89458c13fb408a7db05284 | 2017-07-22 17:59:37.339594 | (0.52773853763938,2.16670122463256) | 1 | 2470 | 820 | 2.23004532710301   542633 | c422d6137f9111d5c2dc723b40c7023f | 2017-07-22 17:59:35.334278 | (0.0631888210773468,2.2334903664887) | 4968 | 3 | 245 | 2.23438404136508   291950 | 1c2901689ab1eb7653d8ad972f7aa376 | 2017-07-22 17:59:34.776808 | (2.5384977646172,1.09820357523859) | 3 | 2131 | 360 | 2.76586731309247   1361182 | 7c4c1c208c2b2b21f00772c43955d238 | 2017-07-22 17:59:37.155127 | (1.7334086727351,2.18367457855493) | 9742 | 0 | 232 | 2.78803520274409   2267673 | 898fdd54dcc5b14c27cf1c8b9afe2471 | 2017-07-22 17:59:39.170035 | (0.394239127635956,2.86229319870472) | 2892 | 6 | 917 | 2.88931598221975   459345 | 9e46548f29d914019ce53a589be8ebac | 2017-07-22 17:59:35.148699 | (0.715781506150961,3.1486327573657) | 1 | 1276 | 8 | 3.22896754478952   570570 | fc57bfc6b7781d89b17c90417bd306f7 | 2017-07-22 17:59:35.39653 | (3.14926156774163,1.04107855819166) | 88 | 2560 | 561 | 3.31688000783581   1004806 | afe9f88cbebf615a7ae5f41180c4b33f | 2017-07-22 17:59:36.362027 | (1.13972157239914,3.28763140831143) | 3 | 1639 | 208 | 3.47958123047986   865508 | 34509c7f7640afaf288a5e1d38199701 | 2017-07-22 17:59:36.052573 | (3.12869547866285,2.34822122845799) | 2 | 65 | 875 | 3.91188935630676   104558 | c4699c933d4e2d2a10d828c4ff0b3362 | 2017-07-22 17:59:34.362508 | (4.20534582808614,2.43749532848597) | 99 | 4858 | 543 | 4.86069100130757   1069986 | 6b9f27bfde993fb0bae3336ac010af7a | 2017-07-22 17:59:36.507775 | (4.51995821669698,2.08761331625283) | 2 | 200 | 355 | 4.97877009299311  (13 rows)  Time: 0.592 ms

太棒了!查詢時(shí)間從200毫秒減少到1毫秒以內(nèi)。

優(yōu)化方法綜述

優(yōu)化方法回顧：

1. 為不同的數(shù)據(jù)類(lèi)型構(gòu)建不同的索引。

例如，對(duì)空間使用GiST或SP-GiST索引，對(duì)時(shí)間使用B樹(shù)或BRIN索引，對(duì)多個(gè)對(duì)象屬性使用GIN索引。索引的目的是縮小數(shù)據(jù)掃描的范圍。

2. 方法五提到數(shù)據(jù)分區(qū)。

數(shù)據(jù)分區(qū)的目的是有意地組織數(shù)據(jù)，這意味著有意地組織數(shù)據(jù)以滿足搜索需求。例如，如果時(shí)間是必需的查詢條件或公共查詢條件，那么可以按時(shí)間(分區(qū))分割數(shù)據(jù)，以減少需要掃描的數(shù)據(jù)量。

3. 方法六描述了索引分區(qū)。

目的類(lèi)似于方法五。方法五和方法六的區(qū)別在于分區(qū)在索引級(jí)別使用，因此當(dāng)執(zhí)行索引掃描時(shí)，數(shù)據(jù)命中率會(huì)直接提高。

4.方法七中的ctid合并掃描類(lèi)似于PostgreSQL中的多索引bitmapAnd或bitmapOr掃描。

bitmapAnd/bitmapOr跳過(guò)不需要掃描的塊，方法七中的ctid合并掃描跳過(guò)不需要掃描的行。

合并從多個(gè)索引掃描獲得的ctid。跳過(guò)不需要掃描的行數(shù)。

如果當(dāng)其他條件為“AND”時(shí)，過(guò)濾條件可以顯著減少ctid(記錄)，則沒(méi)有必要使用ctid合并掃描。相反，使用FILTER作為另一個(gè)條件。(這將略微增加CPU開(kāi)銷(xiāo)。)

5. 最好的功夫總是以最大的靈活性、自由和對(duì)每一個(gè)動(dòng)作的無(wú)限想象為特征。

PostgreSQL實(shí)現(xiàn)多索引BitmapAnd或BitmapOr掃描，顯著提高了多種條件(索引)下的數(shù)據(jù)命中率。

此外，PostgreSQL具有出色的CBO估計(jì)機(jī)制，它允許PostgreSQL不總是使用位圖合并掃描的所有索引。這也是為什么在“測(cè)試原始SQL查詢的性能——PostgreSQL多索引BitmapAnd位圖或跳過(guò)掃描”一節(jié)中描述的性能更好。

6. 如何實(shí)現(xiàn)極端優(yōu)化

采用方法五或六，并使用可修復(fù)的條件作為分區(qū)鍵來(lái)分區(qū)數(shù)據(jù)或索引。

對(duì)于其他條件，可以使用PostgreSQL中的多索引BitmapAnd或BitmapOr掃描來(lái)提高多條件(索引)的數(shù)據(jù)命中率。

我們可以看到，按照時(shí)間、空間和對(duì)象屬性從5,000萬(wàn)數(shù)據(jù)塊中進(jìn)行多維檢索所需的時(shí)間減少到了0.592毫秒。

7. 對(duì)于空間數(shù)據(jù)，除了使用GiST索引，我們還可以使用BRIN索引，這降低了成本。有條理地組織數(shù)據(jù)后，會(huì)使濾波性能良好。

感謝各位的閱讀，以上就是“數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)”的內(nèi)容了，經(jīng)過(guò)本文的學(xué)習(xí)后，相信大家對(duì)數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)這一問(wèn)題有了更深刻的體會(huì)，具體使用情況還需要大家實(shí)踐驗(yàn)證。這里是創(chuàng)新互聯(lián)，小編將為大家推送更多相關(guān)知識(shí)點(diǎn)的文章，歡迎關(guān)注！

名稱欄目：數(shù)據(jù)庫(kù)中如何搜索時(shí)空行為數(shù)據(jù)
文章位置：http://bm7419.com/article0/jdscoo.html

成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián)，為您提供動(dòng)態(tài)網(wǎng)站、云服務(wù)器、軟件開(kāi)發(fā)、手機(jī)網(wǎng)站建設(shè)、做網(wǎng)站、外貿(mào)建站

聲明：本網(wǎng)站發(fā)布的內(nèi)容（圖片、視頻和文字）以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主，如果涉及侵權(quán)請(qǐng)盡快告知，我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng)，如需處理請(qǐng)聯(lián)系客服。電話：028-86922220；郵箱：631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載，或轉(zhuǎn)載時(shí)需注明來(lái)源：創(chuàng)新互聯(lián)

猜你還喜歡下面的內(nèi)容