PostgreSQL怎么支持豐富的NoSQL特性

PostgreSQL怎么支持豐富的NOSQL特性，很多新手對此不是很清楚，為了幫助大家解決這個難題，下面小編將為大家詳細(xì)講解，有這方面需求的人可以來學(xué)習(xí)下，希望你能有所收獲。

創(chuàng)新互聯(lián)是一家專注于網(wǎng)站設(shè)計制作、做網(wǎng)站與策劃設(shè)計,吉首網(wǎng)站建設(shè)哪家好?創(chuàng)新互聯(lián)做網(wǎng)站,專注于網(wǎng)站建設(shè)10余年,網(wǎng)設(shè)計領(lǐng)域的專業(yè)建站公司;建站業(yè)務(wù)涵蓋:吉首等地區(qū)。吉首做網(wǎng)站價格咨詢:13518219792

PostgreSQL不僅是關(guān)系型數(shù)據(jù)庫，同時支持豐富的NoSQL特性，所以小編將介紹PostgreSQL的NoSQL特性。

一、JSON和JSONB數(shù)據(jù)類型

PostgreSQL支持非關(guān)系數(shù)據(jù)類型json (JavaScript Object Notation)，本節(jié)介紹json類型、json與jsonb差異、json與jsonb操作符和函數(shù)以及jsonb鍵值的追加、刪除、更新。

1、JSON類型簡介

PotgreSQL早在9.2版本已經(jīng)提供了json類型，并且隨著大版本的演進(jìn)，PostgreSQL對json的支持趨于完善，例如提供更多的json函數(shù)和操作符方便應(yīng)用開發(fā)，一個簡單的json類型例子如下：

mydb=> SELECT '{"a":1,"b":2}'::json; json --------------- {"a":1,"b":2}

為了更好演示json類型，接下來創(chuàng)建一張表，如下所示：

mydb=> CREATE TABLE test_json1 (id serial primary key,name json);  CREATE TABLE

以上示例定義字段name為json類型，插入表數(shù)據(jù)，如下所示：

mydb=> INSERT INTO test_json1 (name)  VALUES ('{"col1":1,"col2":"francs","col3":"male"}');  INSERT 0 1  mydb=> INSERT INTO test_json1 (name)  VALUES ('{"col1":2,"col2":"fp","col3":"female"}');  INSERT 0 1

查詢表test_json1數(shù)據(jù)：

mydb=> SELECT * FROM test_json1;  id | name  ----+------------------------------------------  1 | {"col1":1,"col2":"francs","col3":"male"}  2 | {"col1":2,"col2":"fp","col3":"female"}

2、查詢JSON數(shù)據(jù)

通過->操作符可以查詢json數(shù)據(jù)的鍵值，如下所示：

mydb=> SELECT name -> 'col2' FROM test_json1 WHERE id=1;  ?column?  ----------  "francs"  (1 row)

如果想以文本格式返回json字段鍵值可以使用->>符，如下所示：

mydb=> SELECT name ->> 'col2' FROM test_json1 WHERE id=1;  francs  (1 row)

3、JSONB與JSON差異

PostgreSQL支持兩種JSON數(shù)據(jù)類型：json和jsonb，兩種類型在使用上幾乎完全相同，主要區(qū)別如下：

json存儲格式為文本，而jsonb存儲格式為二進(jìn)制，由于存儲格式的不同使得兩種json數(shù)據(jù)類型的處理效率不一樣，json類型以文本存儲并且存儲的內(nèi)容和輸入數(shù)據(jù)一樣，當(dāng)檢索json數(shù)據(jù)時必須重新解析，而jsonb以二進(jìn)制形式存儲已解析好的數(shù)據(jù)，當(dāng)檢索jsonb數(shù)據(jù)時不需要重新解析，因此json寫入比jsonb快，但檢索比jsonb慢，后面會通過測試驗(yàn)證兩者讀寫性能差異。

除了上述介紹的區(qū)別之外，json與jsonb在使用過程中還存在差異，例如jsonb輸出的鍵的順序和輸入不一樣，如下所示：

mydb=> SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb; jsonb -------------------------------------------------- {"bar": "baz", "active": false, "balance": 7.77} (1 row)

而json的輸出鍵的順序和輸入完全一樣，如下所示：

mydb=> SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;  json  -------------------------------------------------  {"bar": "baz", "balance": 7.77, "active":false}  (1 row)

另外，jsonb類型會去掉輸入數(shù)據(jù)中鍵值的空格，如下所示：

mydb=> SELECT ' {"id":1, "name":"francs"}'::jsonb;  jsonb  -----------------------------  {"id": 1, "name": "francs"}  (1 row)

上例中id鍵與name鍵輸入時是有空格的，輸出顯示空格鍵被刪除，而json的輸出和輸入一樣，不會刪掉空格鍵：

mydb=> SELECT ' {"id":1, "name":"francs"}'::json; json ------------------------------- {"id":1, "name":"francs"} (1 row)

另外，jsonb會刪除重復(fù)的鍵，僅保留***一個，如下所示：

mydb=> SELECT ' {"id":1,  "name":"francs",  "remark":"a good guy!",  "name":"test"  }'::jsonb;  jsonb  ----------------------------------------------------  {"id": 1, "name": "test", "remark": "a good guy!"}  (1 row)

上面name鍵重復(fù)，僅保留***一個name鍵的值，而json數(shù)據(jù)類型會保留重復(fù)的鍵值。

相比json大多數(shù)應(yīng)用場景建議使用jsonb，除非有特殊的需求，比如對json的鍵順序有特殊的要求。

4、JSONB與JSON操作符

PostgreSQL支持豐富的JSONB和JSON的操作符，舉例如下：

以文本格式返回json類型的字段鍵值可以使用->>符，如下所示：

字符串是否作為頂層鍵值，如下所示：

mydb=> SELECT '{"a":1, "b":2}'::jsonb ? 'a';  t  (1 row)

刪除json數(shù)據(jù)的鍵/值，如下所示：

mydb=> SELECT '{"a":1, "b":2}'::jsonb - 'a';  {"b": 2}  (1 row)

5、JSONB與JSON函數(shù)

json與jsonb相關(guān)的函數(shù)非常豐富，舉例如下：

擴(kuò)展最外層的json對象成為一組鍵/值結(jié)果集，如下所示：

mydb=> SELECT * FROM json_each('{"a":"foo", "b":"bar"}');  key | value  -----+-------  a | "foo"  b | "bar"  (2 rows)

以文本形式返回結(jié)果，如下所示：

mydb=> SELECT * FROM json_each_text('{"a":"foo", "b":"bar"}');  a | foo  b | bar  (2 rows)

一個非常重要的函數(shù)為row_to_json函數(shù)，能夠?qū)⑿凶鳛閖son對象返回，此函數(shù)常用來生成json測試數(shù)據(jù)，比如將一個普通表轉(zhuǎn)換成json類型表：

mydb=> SELECT * FROM test_copy WHERE id=1;  id | name  ----+------  1 | a  (1 row)  mydb=> SELECT row_to_json(test_copy) FROM test_copy WHERE id=1;  row_to_json  ---------------------  {"id":1,"name":"a"}  (1 row)

返回最外層的json對像中的鍵的集合，如下所示：

mydb=> SELECT * FROM json_object_keys('{"a":"foo", "b":"bar"}');  json_object_keys  ------------------  a  b  (2 rows)

6、jsonb鍵/值的追加、刪除、更新

jsonb鍵/值追加可通過||操作符，如下增加sex鍵/值：

mydb=> SELECT '{"name":"francs","age":"31"}'::jsonb ||  '{"sex":"male"}'::jsonb;  ?column?  ------------------------------------------------  {"age": "31", "sex": "male", "name": "francs"}  (1 row)

jsonb鍵/值的刪除有兩種方法，一種是通過操作符號-刪除，另一種通過操作符#-刪除指定鍵/值。

通過操作符號-刪除鍵/值如下：

mydb=> SELECT '{"name": "James", "email": "james@localhost"}'::jsonb  - 'email';  ?column?  -------------------  {"name": "James"}  (1 row)  mydb=> SELECT '["red","green","blue"]'::jsonb - 0;  ["green", "blue"]

第二種方法是通過操作符#-刪除指定鍵/值，通常用于有嵌套json數(shù)據(jù)刪除的場景，如下刪除嵌套contact中的fax鍵/值：

mydb=> SELECT '{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb #- '{contact,fax}'::text;  ?column?  ---------------------------------------------------------  {"name": "James", "contact": {"phone": "01234 567890"}}  (1 row)

刪除嵌套aliases中的位置為1的鍵/值，如下所示：

mydb=> SELECT '{"name": "James", "aliases": ["Jamie","The Jamester","J Man"]}'::jsonb #- '{aliases,1}'::text;  {"name": "James", "aliases": ["Jamie", "J Man"]}  (1 row)

鍵/值的更新也有兩種方式，***種方式為||操作符，||操作符可以連接json鍵，也可覆蓋重復(fù)的鍵值，如下修改age鍵的值：

mydb=> SELECT '{"name":"francs","age":"31"}'::jsonb ||  '{"age":"32"}'::jsonb;  ?column?  ---------------------------------  {"age": "32", "name": "francs"}  (1 row)

第二種方式是通過jsonb_set函數(shù)，語法如下：

jsonb_set(target jsonb, path text[], new_value jsonb[, create_missing boolean])

target指源jsonb數(shù)據(jù)，path指路徑，new_value指更新后的鍵值，create_missing 值為 true表示如果鍵不存在則添加，create_missing 值為 false表示如果鍵不存在則不添加，示例如下：

mydb=> SELECT jsonb_set('{"name":"francs","age":"31"}'::jsonb,'{age}','"32"'::jsonb,false);  jsonb_set  mydb=> SELECT jsonb_set('{"name":"francs","age":"31"}'::jsonb,'{sex}','"male"'::jsonb,true);

7、給JSONB類型創(chuàng)建索引

這一小節(jié)介紹給jsonb數(shù)據(jù)類型創(chuàng)建索引，jsonb數(shù)據(jù)類型支持GIN索引，為了便于說明，假如一個json字段內(nèi)容如下，并且以jsonb格式存儲。

{  "id": 1,  "user_id": 1440933,  "user_name": "1_francs",  "create_time": "2017-08-03 16:22:05.528432+08"  }

假如存儲以上jsonb數(shù)據(jù)的字段名為user_info，表名為tbl_user_jsonb，在user_info字段上創(chuàng)建GIN索引語法如下：

CREATE INDEX idx_gin ON tbl_user_jsonb USING gin(user_info);

jsonb上的GIN索引支持@>、?、 ?&、?|操作符，例如以下查詢將會使用索引：

SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "1_frans"}'

但是以下基于jsonb鍵值的查詢不會走索引idx_gin，如下所示：

SELECT * FROM tbl_user_jsonb WHERE user_info->>'user_name'= '1_francs';

如果要想提升基于jsonb類型的鍵值檢索效率，可以在jsonb數(shù)據(jù)類型對應(yīng)的鍵值上創(chuàng)建索引，如下所示：

CREATE INDEX idx_gin_user_infob_user_name ON tbl_user_jsonb USING btree ((user_info ->> 'user_name'));

創(chuàng)建以上索引后，上述根據(jù)user_info->>'user_name'鍵值查詢的SQL將會走索引。

二、JSON與JSONB讀寫性能測試

前面介紹了jsonb數(shù)據(jù)類型索引創(chuàng)建相關(guān)內(nèi)容，本部分將對json、jsonb讀寫性能進(jìn)行簡單對比。json與jsonb讀寫性能存在差異，主要表現(xiàn)為json寫入時比jsonb快，但檢索時比jsonb慢，主要原因?yàn)椋?/p>

json存儲格式為文本，而jsonb存儲格式為二進(jìn)制，存儲格式的不同使得兩種json數(shù)據(jù)類型的處理效率不一樣，json類型存儲的內(nèi)容和輸入數(shù)據(jù)一樣，當(dāng)檢索json數(shù)據(jù)時必須重新解析，而jsonb以二進(jìn)制形式存儲已解析好的數(shù)據(jù)，當(dāng)檢索jsonb數(shù)據(jù)時不需要重新解析。

1、構(gòu)建JSON、JSONB測試表

下面通過一個簡單的例子測試下json、jsonb的讀寫性能差異，計劃創(chuàng)建以下三張表：

quser_ini：基礎(chǔ)數(shù)據(jù)表，并插入200萬測試數(shù)據(jù)；
qtbl_user_json: json 數(shù)據(jù)類型表，200萬數(shù)據(jù)；
qtbl_user_jsonb：jsonb 數(shù)據(jù)類型表，200萬數(shù)據(jù)。

首先創(chuàng)建user_ini表并插入200萬測試數(shù)據(jù)，如下：

mydb=> CREATE TABLE user_ini(id int4 ,user_id int8, user_name character varying(64),create_time timestamp(6) with time zone default clock_timestamp);  CREATE TABLE  mydb=> INSERT INTO user_ini(id,user_id,user_name)  SELECT r,round(random*2000000), r || '_francs' FROM generate_series(1,2000000) as r;  INSERT 0 2000000

計劃使用user_ini表數(shù)據(jù)生成json、jsonb數(shù)據(jù)，創(chuàng)建user_ini_json、user_ini_jsonb表，如下所示：

mydb=> CREATE TABLE tbl_user_json(id serial, user_info json);  CREATE TABLE  mydb=> CREATE TABLE tbl_user_jsonb(id serial, user_info jsonb);  CREATE TABLE

2、JSON與JSONB表寫性能測試

根據(jù)user_ini數(shù)據(jù)通過row_to_json函數(shù)向表user_ini_json插入200萬json數(shù)據(jù)，如下：

mydb=> iming  Timing is on.  mydb=> INSERT INTO tbl_user_json(user_info) SELECT row_to_json(user_ini)  FROM user_ini;  INSERT 0 2000000  Time: 13825.974 ms (00:13.826)

從以上結(jié)果看出tbl_user_json插入200萬數(shù)據(jù)花了13秒左右；接著根據(jù)user_ini表數(shù)據(jù)生成200萬jsonb數(shù)據(jù)并插入表tbl_user_jsonb，如下：

mydb=> INSERT INTO tbl_user_jsonb(user_info)  SELECT row_to_json(user_ini)::jsonb FROM user_ini;  INSERT 0 2000000  Time: 20756.993 ms (00:20.757)

從以上看出tbl_user_jsonb表插入200萬jsonb數(shù)據(jù)花了20秒左右，正好驗(yàn)證了json數(shù)據(jù)寫入比jsonb快，比較兩表占用空間大小，如下所示：

mydb=> dt+ tbl_user_json  List of relations  Schema | Name | Type | Owner | Size | Description  --------+---------------+-------+--------+--------+-------------  pguser | tbl_user_json | table | pguser | 281 MB |  (1 row)  mydb=> dt+ tbl_user_jsonb  --------+----------------+-------+--------+--------+-------------  pguser | tbl_user_jsonb | table | pguser | 333 MB |  (1 row)

從占用空間來看，同樣的數(shù)據(jù)量jsonb數(shù)據(jù)類型占用空間比json稍大。

查詢tbl_user_json表的一條測試數(shù)據(jù)，如下：

mydb=> SELECT * FROM tbl_user_json LIMIT 1;  id | user_info  ---------+------------------------------------------------------------------------------------  2000001 | {"id":1,"user_id":1182883,"user_name":"1_francs","create_time":"2017-08-03T20:59:27.42741+08:00"}  (1 row)

3、JSON與JSONB表讀性能測試

對于json、jsonb讀性能測試我們選擇基于json、jsonb鍵值查詢的場景，例如，根據(jù)user_info字段的user_name鍵的值查詢，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_jsonb WHERE user_info->>'user_name'='1_francs';  QUERY PLAN  -------------------------------------------------------------------------------------  Seq Scan on tbl_user_jsonb (cost=0.00..72859.90 rows=10042 width=143) (actual time=0.023..524.843 rows=1 loops=1)  Filter: ((user_info ->> 'user_name'::text) = '1_francs'::text)  Rows Removed by Filter: 1999999  Planning time: 0.091 ms  Execution time: 524.876 ms  (5 rows)

上述SQL執(zhí)行時間為524毫秒左右，基于user_info字段的user_name鍵值創(chuàng)建btree索引如下：

mydb=> CREATE INDEX idx_jsonb ON tbl_user_jsonb USING btree ((user_info->>'user_name'));

再次執(zhí)行上述查詢，如下所示：

Bitmap Heap Scan on tbl_user_jsonb (cost=155.93..14113.93 rows=10000 width=143) (actual time=0.027..0.027 rows=1 loops=1)  Recheck Cond: ((user_info ->> 'user_name'::text) = '1_francs'::text)  Heap Blocks: exact=1  -> Bitmap Index Scan on idx_jsonb (cost=0.00..153.43 rows=10000 width=0) (actual time=0.021..0.021 rows=1 loops=1)  Index Cond: ((user_info ->> 'user_name'::text) = '1_francs'::text)  Planning time: 0.091 ms  Execution time: 0.060 ms  (7 rows)

根據(jù)上述執(zhí)行計劃看出走了索引，并且SQL時間下降到0.060ms。為更好地對比tbl_user_json、tbl_user_jsonb表基于鍵值查詢的效率，計劃根據(jù)user_info字段id鍵進(jìn)行范圍掃描對比性能，創(chuàng)建索引如下：

mydb=> CREATE INDEX idx_gin_user_info_id ON tbl_user_json USING btree  (((user_info ->> 'id')::integer));  CREATE INDEX  mydb=> CREATE INDEX idx_gin_user_infob_id ON tbl_user_jsonb USING btree

索引創(chuàng)建后，查詢tbl_user_json表如下：

mydb=> EXPLAIN ANALYZE SELECT id,user_info->'id',user_info->'user_name' FROM tbl_user_json  WHERE (user_info->>'id')::int4>1 AND (user_info->>'id')::int4<10000;  Bitmap Heap Scan on tbl_user_json (cost=166.30..14178.17 rows=10329 width=68) (actual time=1.167..26.534 rows=9998 loops=1)  Recheck Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))  Heap Blocks: exact=338  -> Bitmap Index Scan on idx_gin_user_info_id (cost=0.00..163.72 rows=10329 width=0) (actual time=1.110..1.110 rows=19996 loops= 1)  Index Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000)) Planning time: 0.094 ms  Execution time: 27.092 ms  (7 rows)

根據(jù)以上看出，查詢表tbl_user_json的user_info字段id鍵值在1到10000范圍內(nèi)的記錄走了索引，并且執(zhí)行時間為27.092毫秒，接著測試tbl_user_jsonb表同樣SQL的檢索性能，如下所示：

mydb=> EXPLAIN ANALYZE SELECT id,user_info->'id',user_info->'user_name'  FROM tbl_user_jsonb  Bitmap Heap Scan on tbl_user_jsonb (cost=158.93..14316.93 rows=10000 width=68) (actual time=1.140..8.116 rows=9998 loops=1)  Heap Blocks: exact=393  -> Bitmap Index Scan on idx_gin_user_infob_id (cost=0.00..156.43 rows=10000 width=0) (actual time=1.058..1.058 rows=18992 loops =1)  Planning time: 0.104 ms  Execution time: 8.656 ms  (7 rows)

根據(jù)以上看出，查詢表tbl_user_jsonb的user_info字段id鍵值在1到10000范圍內(nèi)的記錄走了索引并且執(zhí)行時間為8.656毫秒，從這個測試看出jsonb檢索比json效率高。

從以上兩個測試看出，正好驗(yàn)證了“json寫入比jsonb快，但檢索時比jsonb慢”的觀點(diǎn)，值得一提的是如果需要通過key/value進(jìn)行檢索，例如以下：

SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "2_francs"}';

這時執(zhí)行計劃為全表掃描，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "2_francs"}';  QUERY PLAN  ------------------------------------------------------------------------------------  Seq Scan on tbl_user_jsonb (cost=0.00..67733.00 rows=2000 width=143) (actual time=0.018..582.207 rows=1 loops=1)  Filter: (user_info @> '{"user_name": "2_francs"}'::jsonb)  Rows Removed by Filter: 1999999  Planning time: 0.065 ms  Execution time: 582.232 ms  (5 rows)

從以上看出執(zhí)行時間為582毫秒左右，在tbl_user_jsonb字段user_info上創(chuàng)建gin索引，如下所示：

mydb=> CREATE INDEX idx_tbl_user_jsonb_user_Info ON tbl_user_jsonb USING gin(user_Info); CREATE INDEX

索引創(chuàng)建后，再次執(zhí)行以下，如下所示：

Bitmap Heap Scan on tbl_user_jsonb (cost=37.50..3554.34 rows=2000 width=143) (actual time=0.079..0.080 rows=1 loops=1)  Recheck Cond: (user_info @> '{"user_name": "2_francs"}'::jsonb)  Heap Blocks: exact=1  -> Bitmap Index Scan on idx_tbl_user_jsonb_user_info (cost=0.00..37.00 rows=2000 width=0) (actual time=0.069..0.069 rows=1 loops=1)  Index Cond: (user_info @> '{"user_name": "2_francs"}'::jsonb)  Planning time: 0.094 ms  Execution time: 0.114 ms  (7 rows)

從以上看出走了索引，并且執(zhí)行時間下降到了0.114毫秒。

這部分內(nèi)容測試了json、jsonb數(shù)據(jù)類型讀寫性能差異，驗(yàn)證了json寫入時比jsonb快，但檢索時比jsonb慢的觀點(diǎn)。

三、全文檢索支持JSON和JSONB

接下來我們來介紹PostgreSQL 10的一個新特性：全文檢索支持json、jsonb數(shù)據(jù)類型。這部分我們會分兩部分來說明，***部分簡單介紹PostgreSQL全文檢索，第二部分演示全文檢索對json、jsonb數(shù)據(jù)類型的支持。

1、PostgreSQL全文檢索簡介

對于大多數(shù)應(yīng)用全文檢索很少放到數(shù)據(jù)庫中實(shí)現(xiàn)，一般使用單獨(dú)的全文檢索引擎，例如基于SQL全文檢索引擎Sphinx。PostgreSQL支持全文檢索，對于規(guī)模不大的應(yīng)用如果不想搭建專門的搜索引擎，PostgreSQL的全文檢索也可以滿足需求。

如果沒有使用專門的搜索引擎，大部檢索需要通過數(shù)據(jù)庫like操作匹配，這種檢索方式主要缺點(diǎn)在于：

不能很好的支持索引，通常需全表掃描檢索數(shù)據(jù)，數(shù)據(jù)量大時檢索性能很低；
不提供檢索結(jié)果排序，當(dāng)輸出結(jié)果數(shù)據(jù)量非常大時表現(xiàn)更加明顯。

PostgreSQL全文檢索能有效地解決這個問題，PostgreSQL全文檢索通過以下兩種數(shù)據(jù)類型來實(shí)現(xiàn)。

Tsvector

tsvector全文檢索數(shù)據(jù)類型代表一個被優(yōu)化的可以基于搜索的文檔，將一串字符串轉(zhuǎn)換成tsvector全文檢索數(shù)據(jù)類型，如下：

mydb=> SELECT 'Hello,cat,how are u? cat is smiling! '::tsvector;  tsvector  --------------------------------------------------  'Hello,cat,how' 'are' 'cat' 'is' 'smiling!' 'u?'  (1 row)

可以看到，字符串的內(nèi)容被分隔成好幾段，但通過::tsvector只是做類型轉(zhuǎn)換，沒有進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化處理，對于英文全文檢索可通過函數(shù)to_tsvector進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化，如下所示：

mydb=> SELECT to_tsvector('english','Hello cat,');  to_tsvector  -------------------  'cat':2 'hello':1  (1 row)

Tsquery

tsquery表示一個文本查詢，存儲用于搜索的詞，并且支持布爾操作&、|、!，將字符串轉(zhuǎn)換成tsquery，如下所示：

mydb=> SELECT 'hello&cat'::tsquery;  tsquery  -----------------  'hello' & 'cat'  (1 row)

上述只是轉(zhuǎn)換成tsquery類型，而并沒有做標(biāo)準(zhǔn)化，使用to_tsquery函數(shù)可以執(zhí)行標(biāo)準(zhǔn)化，如下所示：

mydb=> SELECT to_tsquery( 'hello&cat' );

to_tsquery

一個全文檢索示例如下，檢索字符串是否包括hello和cat字符，本例中返回真。

mydb=> SELECT to_tsvector('english','Hello cat,how are u') @@to_tsquery( 'hello&cat' );

檢索字符串是否包含字符hello和dog，本例中返回假。

mydb=> SELECT to_tsvector('english','Hello cat,how are u') @@ to_tsquery( 'hello&dog' );  f  (1 row)

有興趣的讀者可以測試tsquery的其他操作符，例如|、!等。

注意：這里使用了帶雙參數(shù)的to_tsvector函數(shù)，函數(shù)to_tsvector雙參數(shù)的格式如下：

to_tsvector([ config regconfig , ] document text)，本節(jié)to_tsvector函數(shù)指定了config參數(shù)為english，如果不指定config參數(shù)，則默認(rèn)使用default_text_search_config參數(shù)的配置。

英文全文檢索例子

下面演示一個英文全文檢索示例，創(chuàng)建一張測試表并插入200萬測試數(shù)據(jù)，如下所示：

mydb=> CREATE TABLE test_search(id int4,name text);  CREATE TABLE  mydb=> INSERT INTO test_search(id,name) SELECT n, n||'_francs'  FROM generate_series(1,2000000) n;  INSERT 0 2000000

執(zhí)行以下SQL，查詢test_search表name字段包含字符1_francs的記錄。

mydb=> SELECT * FROM test_search WHERE name LIKE '1_francs';  id | name  ----+----------  1 | 1_francs  (1 row)

執(zhí)行計劃如下：

mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE name LIKE '1_francs';  QUERY PLAN  -------------------------------------------------------------------------------------Seq Scan on test_search (cost=0.00..38465.04 rows=204 width=18) (actual time=0.022..261.766 rows=1 loops=1)  Filter: (name ~~ '1_francs'::text)  Rows Removed by Filter: 1999999  Planning time: 0.101 ms  Execution time: 261.796 ms  (5 rows)

以上執(zhí)行計劃走了全表掃描，執(zhí)行時間為261毫秒左右，性能很低，接著創(chuàng)建索引，如下所示：

mydb=> CREATE INDEX idx_gin_search ON test_search USING gin (to_tsvector('english',name)); mydb=> SELECT * FROM test_search WHERE to_tsvector('english',name) @@ to_tsquery('english','1_francs');

再次查看執(zhí)行計劃和執(zhí)行時間，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE to_tsvector('english',name) @@ Bitmap Heap Scan on test_search (cost=18.39..128.38 rows=50 width=36) (actual time=0.071..0.071 rows=1 loops=1)  Recheck Cond: (to_tsvector('english'::regconfig, name) @@ '''1'' & ''franc'''::tsquery)  Heap Blocks: exact=1  -> Bitmap Index Scan on idx_gin_search (cost=0.00..18.38 rows=50 width=0) (actual time=0.064..0.064 rows=1 loops=1)  Index Cond: (to_tsvector('english'::regconfig, name) @@ '''1'' & ''franc'''::tsquery)  Planning time: 0.122 ms  Execution time: 0.104 ms  (7 rows)

創(chuàng)建索引后，以上查詢走了索引并且執(zhí)行時間下降到0.104毫秒，性能提升了3個數(shù)量級，值得一提的是如果SQL改成以下，則不走索引，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE to_tsvector(name) @@ to_tsquery('1_francs');  Seq Scan on test_search (cost=0.00..1037730.00 rows=50 width=18) (actual time=0.036..10297.764 rows=1 loops=1)  Filter: (to_tsvector(name) @@ to_tsquery('1_francs'::text))  Rows Removed by Filter: 1999999  Planning time: 0.098 ms  Execution time: 10297.787 ms  (5 rows)

由于創(chuàng)建索引時使用的是to_tsvector('english',name)函數(shù)索引，帶了兩個參數(shù)，因此where條件中的to_tsvector函數(shù)帶兩個參數(shù)才能走索引，而to_tsvector(name)不走索引。

2、JSON、JSONB全文檢索實(shí)踐

在PostgreSQL 10版本之前全文檢索不支持json和jsonb數(shù)據(jù)類型，10版本的一個重要特性是全文檢索支持json和jsonb數(shù)據(jù)類型。

10版本與9.6版本to_tsvector函數(shù)的差異

先來看下9.6版本to_tsvector函數(shù)，如下：

[postgres@pghost1 ~]$ psql francs francs  psql (9.6.3)  Type "help" for help.  mydb=> df *to_tsvector*  List of functions  Schema | Name | Result data type | Argument data types | Type  ------------+-------------------+------------------+---------------------+--------  pg_catalog | array_to_tsvector | tsvector | text | normal  pg_catalog | to_tsvector | tsvector | regconfig, text | normal  pg_catalog | to_tsvector | tsvector | text | normal  (3 rows)

從以上看出9.6版本to_tsvector函數(shù)的輸入?yún)?shù)僅支持text、text數(shù)據(jù)類型，接著看下10版本的to_tsvector函數(shù)，如下所示：

[postgres@pghost1 ~]$ psql mydb pguser  psql (10.0)  pg_catalog | to_tsvector | tsvector | json | normal  pg_catalog | to_tsvector | tsvector | jsonb | normal  pg_catalog | to_tsvector | tsvector | regconfig, json | normal  pg_catalog | to_tsvector | tsvector | regconfig, jsonb | normal

從以上看出，10版本的to_tsvector函數(shù)支持的數(shù)據(jù)類型增加了json和jsonb。

創(chuàng)建數(shù)據(jù)生成函數(shù)

為了便于生成測試數(shù)據(jù)，創(chuàng)建以下兩個函數(shù)用來隨機(jī)生成指定長度的字符串，創(chuàng)建random_range(int4, int4)函數(shù)如下：

CREATE OR REPLACE FUNCTION random_range(int4, int4)  RETURNS int4  LANGUAGE SQL  AS $$  SELECT ($1 + FLOOR(($2 - $1 + 1) * random ))::int4;  $$;

接著創(chuàng)建random_text_simple(length int4)函數(shù)，此函數(shù)會調(diào)用random_range(int4, int4)函數(shù)。

CREATE OR REPLACE FUNCTION random_text_simple(length int4)  RETURNS text  LANGUAGE PLPGSQL  AS $$  DECLARE  possible_chars text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';  output text := '';  i int4;  pos int4;  BEGIN  FOR i IN 1..length LOOP  pos := random_range(1, length(possible_chars));  output := output || substr(possible_chars, pos, 1);  END LOOP;  RETURN output;  END; $$;

random_text_simple(length int4)函數(shù)可以隨機(jī)生成指定長度字符串，如下隨機(jī)生成含三位字符的字符串：

mydb=> SELECT random_text_simple(3);  random_text_simple  --------------------  LL9  (1 row)

隨機(jī)生成含六位字符的字符串，如下所示：

mydb=> SELECT random_text_simple(6);  B81BPW  (1 row)

后面會用到這個函數(shù)生成測試數(shù)據(jù)。

創(chuàng)建JSON測試表

創(chuàng)建user_ini測試表，并通過random_text_simple(length int4)函數(shù)插入100萬隨機(jī)生成六位字符的字符串測試數(shù)據(jù)，如下所示：

mydb=> CREATE TABLE user_ini(id int4 ,user_id int8,  user_name character varying(64),  create_time timestamp(6) with time zone default clock_timestamp);  SELECT r,round(random*1000000), random_text_simple(6)  FROM generate_series(1,1000000) as r;  INSERT 0 1000000

創(chuàng)建tbl_user_search_json表，并通過row_to_json函數(shù)將表user_ini行數(shù)據(jù)轉(zhuǎn)換成json數(shù)據(jù)，如下所示：

mydb=> CREATE TABLE tbl_user_search_json(id serial, user_info json);  CREATE TABLE  mydb=> INSERT INTO tbl_user_search_json(user_info)  SELECT row_to_json(user_ini) FROM user_ini;  INSERT 0 1000000

生成的數(shù)據(jù)如下：

mydb=> SELECT * FROM tbl_user_search_json LIMIT 1;  id | user_info  ----+-----------------------------------------------------------------------------------------------  1 | {"id":1,"user_id":186536,"user_name":"KTU89H","create_time":"2017-08-05T15:59:25.359148+08:00"}  (1 row)

JSON數(shù)據(jù)全文檢索測試

使用全文檢索查詢表tbl_user_search_json的user_info字段中包含KTU89H字符的記錄，如下所示：

mydb=> SELECT * FROM tbl_user_search_json  WHERE to_tsvector('english',user_info) @@ to_tsquery('ENGLISH','KTU89H');  id | user_info  ----+----------------------------------------------------------------------------------------

以上SQL能正常執(zhí)行說明全文檢索支持json數(shù)據(jù)類型，只是上述SQL走了全表掃描性能低，執(zhí)行時間為8061毫秒，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_search_json  -----------------------------------------------------------------------------------  Seq Scan on tbl_user_search_json (cost=0.00..279513.00 rows=5000 width=104) (actual time=0.046..8061.858 rows=1 loops=1)  Filter: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)  Rows Removed by Filter: 999999  Planning time: 0.091 ms  Execution time: 8061.880 ms  (5 rows)

創(chuàng)建如下索引：

mydb=> CREATE INDEX idx_gin_search_json ON tbl_user_search_json USING  gin(to_tsvector('english',user_info));  CREATE INDEX

索引創(chuàng)建后，再次執(zhí)行以下SQL，如下所示：

mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_search_json WHERE to_tsvector('english',user_info) @@ to_tsquery('ENGLISH','KTU89H');  Bitmap Heap Scan on tbl_user_search_json (cost=50.75..7876.06 rows=5000 width=104) (actual time=0.024..0.024 rows=1 loops=1)  Recheck Cond: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)  Heap Blocks: exact=1  -> Bitmap Index Scan on idx_gin_search_json (cost=0.00..49.50 rows=5000 width=0) (actual time=0.018..0.018 rows=1 loops=1)  Index Cond: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)  Planning time: 0.113 ms  Execution time: 0.057 ms  (7 rows)

從上述執(zhí)行計劃看出走了索引，并且執(zhí)行時間降為0.057毫秒，性能非常不錯。

這一小節(jié)前一部分對PostgreSQL全文檢索的實(shí)現(xiàn)做了簡單介紹，并且給出了一個英文檢索的例子，后一部分通過示例介紹了PostgreSQL10的一個新特性，即全文檢索支持json、jsonb類型。

看完上述內(nèi)容是否對您有幫助呢？如果還想對相關(guān)知識有進(jìn)一步的了解或閱讀更多相關(guān)文章，請關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道，感謝您對創(chuàng)新互聯(lián)的支持。

文章題目：PostgreSQL怎么支持豐富的NoSQL特性
本文地址：http://bm7419.com/article44/gipshe.html

成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián)，為您提供手機(jī)網(wǎng)站建設(shè)、面包屑導(dǎo)航、Google、服務(wù)器托管、虛擬主機(jī)、品牌網(wǎng)站設(shè)計

聲明：本網(wǎng)站發(fā)布的內(nèi)容（圖片、視頻和文字）以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主，如果涉及侵權(quán)請盡快告知，我們將會在第一時間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場，如需處理請聯(lián)系客服。電話：028-86922220；郵箱：631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載，或轉(zhuǎn)載時需注明來源：創(chuàng)新互聯(lián)

猜你還喜歡下面的內(nèi)容