ES in Depth - Queries: Term-Level Queries in the Query DSL
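Another very common kind of DSL query searches on terms; the official documentation calls these "term-level" queries. This article walks through term-level search in detail. @pdai
Introduction to term queries
As described in the previous article, queries fall into two groups, full-text queries and term-level queries:
[Image: es-dsl-full-text-3.png]
This article covers the term-level queries.
[Image: es-dsl-term-1.png]
Term queries
Many of these queries are commonly used and not difficult, but they are easiest to understand through examples. Drawing on the official documentation, I designed one set of test data that covers every example below.
Preparing the data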
PUT /test-dsl-term-level
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
POST /test-dsl-term-level/_bulk
{ "index": { "_id": 1 }}
{"name": "Jane Smith", "programming_languages": [ "c++", "java" ], "required_matches": 2}
{ "index": { "_id": 2 }}
{"name": "Jason Response", "programming_languages": [ "java", "php" ], "required_matches": 2}
{ "index": { "_id": 3 }}
{"name": "Dave Pdai", "programming_languages": [ "java", "c++", "php" ], "required_matches": 3, "remarks": "hello world"}
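Field existence: exists
An indexed value may be missing for a document field for several reasons:
- The field in the source JSON is null or []
- The field has "index": false set in the mapping
- The length of the field value exceeds the ignore_above setting in the mapping
- The field value is malformed and ignore_malformed is defined in the mapping
The exists query therefore finds the documents that do contain an indexed value for a field.
[Image: es-dsl-term-2.png]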
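The null and empty-array rule from the list above can be modelled in a few lines of Python. This is only a sketch against the test data, not Elasticsearch code: `has_indexed_value` is a hypothetical helper, and the mapping-dependent cases (index, ignore_above, ignore_malformed) are deliberately left out. The dictionary `EXISTS_QUERY` shows the request body an exists query on the remarks field would take.

```python
# Sample documents from the bulk request above, keyed by _id.
DOCS = {
    1: {"name": "Jane Smith", "programming_languages": ["c++", "java"], "required_matches": 2},
    2: {"name": "Jason Response", "programming_languages": ["java", "php"], "required_matches": 2},
    3: {"name": "Dave Pdai", "programming_languages": ["java", "c++", "php"],
        "required_matches": 3, "remarks": "hello world"},
}

# Request body of an exists query on the "remarks" field.
EXISTS_QUERY = {"query": {"exists": {"field": "remarks"}}}


def has_indexed_value(source, field):
    """Hypothetical model: a field is absent if it is missing, null, [] or holds only nulls."""
    value = source.get(field)
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


field = EXISTS_QUERY["query"]["exists"]["field"]
matching_ids = [doc_id for doc_id, src in DOCS.items() if has_indexed_value(src, field)]
print(matching_ids)  # [3]
```

Only document 3 carries a remarks value, so it is the only hit. Note that an empty string still counts as a value in this model; only null and empty arrays do not.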
ID query: ids
The ids query looks documents up by their _id.
GET /test-dsl-term-level/_search
{
  "query": {
    "ids": {
      "values": [3, 1]
    }
  }
}
[Image: es-dsl-term-3.png]
Prefix: prefix
The prefix query finds documents whose field value starts with the given prefix.
GET /test-dsl-term-level/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "Jan"
      }
    }
  }
}
[Image: es-dsl-term-4.png]
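As a rough illustration of what the prefix query above selects, the following sketch applies the same test to the name values of the sample data. `prefix_match` is a hypothetical helper, and the comparison is case-sensitive because name is a keyword field whose values are stored unanalyzed.

```python
# name values of the three test documents, keyed by _id.
NAMES = {1: "Jane Smith", 2: "Jason Response", 3: "Dave Pdai"}


def prefix_match(names, prefix):
    """Return the ids whose stored value starts with the prefix (case-sensitive)."""
    return [doc_id for doc_id, name in names.items() if name.startswith(prefix)]


print(prefix_match(NAMES, "Jan"))  # [1]
print(prefix_match(NAMES, "Ja"))   # [1, 2]
print(prefix_match(NAMES, "jan"))  # []
```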
Term match: term
This is the most common lookup seen in the earlier articles: matching documents by a single exact term.
GET /test-dsl-term-level/_search
{
  "query": {
    "term": {
      "programming_languages": "php"
    }
  }
}
[Image: es-dsl-term-5.png]
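A minimal sketch of the matching rule, assuming the sample data above: a document matches when one of the values in its keyword field equals the query term exactly. `term_match` is a hypothetical helper; the second call shows that the query term is not lowercased or otherwise analyzed.

```python
# programming_languages values of the three test documents, keyed by _id.
LANGS = {1: ["c++", "java"], 2: ["java", "php"], 3: ["java", "c++", "php"]}


def term_match(langs_by_id, term):
    """Return the ids whose field contains the term as an exact value."""
    return [doc_id for doc_id, langs in langs_by_id.items() if term in langs]


print(term_match(LANGS, "php"))  # [2, 3]
print(term_match(LANGS, "PHP"))  # []
```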
Multiple term match: terms
The terms query matches against several terms at once; the terms are combined with OR.
GET /test-dsl-term-level/_search
{
  "query": {
    "terms": {
      "programming_languages": ["php", "c++"]
    }
  }
}
[Image: es-dsl-term-6.png]
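The OR relationship can be sketched as a set intersection: a document is a hit as soon as it shares at least one value with the query terms. `terms_match` is a hypothetical helper working on the sample data, not an Elasticsearch API.

```python
# programming_languages values of the three test documents, keyed by _id.
LANGS = {1: ["c++", "java"], 2: ["java", "php"], 3: ["java", "c++", "php"]}


def terms_match(langs_by_id, terms):
    """Return the ids whose field shares at least one value with the query terms."""
    wanted = set(terms)
    return [doc_id for doc_id, langs in langs_by_id.items() if wanted & set(langs)]


print(terms_match(LANGS, ["php", "c++"]))  # [1, 2, 3]
print(terms_match(LANGS, ["php"]))         # [2, 3]
```

Document 1 has only c++ and document 2 has only php, yet both match, which is what distinguishes OR from AND here.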
Term match driven by a numeric field: terms_set
This query was designed so that a numeric field in each document dynamically sets how many of the query terms that document must match.
GET /test-dsl-term-level/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
[Image: es-dsl-term-7.png]
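To make the per-document threshold concrete, here is a small sketch of the rule applied to the sample data: count how many query terms each document contains and compare that count with the document's own required_matches value. `terms_set_match` is a hypothetical helper that only models the matching rule.

```python
# Relevant fields of the three test documents, keyed by _id.
DOCS = {
    1: {"programming_languages": ["c++", "java"], "required_matches": 2},
    2: {"programming_languages": ["java", "php"], "required_matches": 2},
    3: {"programming_languages": ["java", "c++", "php"], "required_matches": 3},
}


def terms_set_match(docs, field, terms, minimum_should_match_field):
    """Return the ids whose number of matched terms reaches their own threshold."""
    hits = []
    for doc_id, source in docs.items():
        matched = len(set(terms) & set(source[field]))
        if matched >= source[minimum_should_match_field]:
            hits.append(doc_id)
    return hits


print(terms_set_match(DOCS, "programming_languages", ["java", "php"], "required_matches"))  # [2]
```

Document 1 matches one term but needs two, and document 3 matches two terms but needs three, so only document 2 satisfies its own threshold.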
Wildcard: wildcard
The wildcard query matches with wildcard characters such as *, which stands for any sequence of characters (including none).
GET /test-dsl-term-level/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "D*ai",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
[Image: es-dsl-term-8.png]
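The pattern semantics can be approximated by translating the wildcard into a Python regular expression that must cover the whole stored value. This is a sketch under simplifying assumptions: `wildcard_to_regex` and `wildcard_match` are hypothetical helpers, only * and ? are treated as operators, and escape sequences are ignored.

```python
import re

# name values of the three test documents, keyed by _id.
NAMES = {1: "Jane Smith", 2: "Jason Response", 3: "Dave Pdai"}


def wildcard_to_regex(pattern):
    """Translate * (any run of characters) and ? (one character) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(names, pattern):
    """Return the ids whose whole value matches the wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [doc_id for doc_id, name in names.items() if regex.fullmatch(name)]


print(wildcard_match(NAMES, "D*ai"))   # [3]
print(wildcard_match(NAMES, "Ja?e*"))  # [1]
```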
Range: range
The range query is most often used for numeric or date ranges.
GET /test-dsl-term-level/_search
{
  "query": {
    "range": {
      "required_matches": {
        "gte": 3,
        "lte": 4
      }
    }
  }
}
[Image: es-dsl-term-9.png]
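The bounds in a range query are either inclusive (gte, lte) or exclusive (gt, lt). The sketch below applies them to the required_matches values of the sample data; `range_match` is a hypothetical helper that only mirrors the comparison logic.

```python
# required_matches values of the three test documents, keyed by _id.
REQUIRED_MATCHES = {1: 2, 2: 2, 3: 3}


def range_match(values, gte=None, gt=None, lte=None, lt=None):
    """Return the ids whose value satisfies every bound that was supplied."""
    hits = []
    for doc_id, value in values.items():
        checks = [
            gte is None or value >= gte,  # inclusive lower bound
            gt is None or value > gt,     # exclusive lower bound
            lte is None or value <= lte,  # inclusive upper bound
            lt is None or value < lt,     # exclusive upper bound
        ]
        if all(checks):
            hits.append(doc_id)
    return hits


print(range_match(REQUIRED_MATCHES, gte=3, lte=4))  # [3]
print(range_match(REQUIRED_MATCHES, gt=1, lt=3))    # [1, 2]
```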
Regex: regexp
The regexp query matches terms against a regular expression.
The example below finds name values that start with "Ja", ignoring case:
GET /test-dsl-term-level/_search
{
  "query": {
    "regexp": {
      "name": {
        "value": "Ja.*",
        "case_insensitive": true
      }
    }
  }
}
[Image: es-dsl-term-10.png]
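One detail worth seeing in code is that the pattern has to match the entire term, not just part of it. The sketch below imitates this with Python's `re.fullmatch` on the sample names. It is an approximation: `regexp_match` is a hypothetical helper, and Python's regex syntax is not identical to the Lucene syntax Elasticsearch uses, although the simple patterns here behave the same way in both.

```python
import re

# name values of the three test documents, keyed by _id.
NAMES = {1: "Jane Smith", 2: "Jason Response", 3: "Dave Pdai"}


def regexp_match(names, pattern, case_insensitive=False):
    """Return the ids whose whole value matches the regular expression."""
    flags = re.IGNORECASE if case_insensitive else 0
    regex = re.compile(pattern, flags)
    return [doc_id for doc_id, name in names.items() if regex.fullmatch(name)]


print(regexp_match(NAMES, "Ja.*", case_insensitive=True))  # [1, 2]
print(regexp_match(NAMES, "ja.*", case_insensitive=True))  # [1, 2]
print(regexp_match(NAMES, "Jan.*"))                        # [1]
print(regexp_match(NAMES, "Ja"))                           # [] because the pattern must cover the whole value
```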
Fuzzy match: fuzzy
The official documentation describes fuzzy matching in terms of edit distance: the number of single-character changes needed to turn one term into another. These changes can include:
- Changing a character (box → fox)
- Removing a character (black → lack)
- Inserting a character (sic → sick)
- Transposing two adjacent characters (act → cat)
GET /test-dsl-term-level/_search
{
  "query": {
    "fuzzy": {
      "remarks": {
        "value": "hell"
      }
    }
  }
}
[Image: es-dsl-term-11.png]
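The four kinds of change listed above can be checked with a small edit-distance function. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance as a stand-in; `edit_distance` is a hypothetical helper, not the implementation Elasticsearch uses. It also assumes that remarks, which is not in the explicit mapping, is dynamically mapped as a text field and analyzed into the terms "hello" and "world", and that the default fuzziness allows one edit for a four-character value such as "hell".

```python
def edit_distance(a, b):
    """Count substitutions, deletions, insertions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


# Each pair from the list above is exactly one edit apart.
for source, target in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(source, "->", target, edit_distance(source, target))

# Under the assumptions stated above, "hell" is one insertion away from "hello".
print(edit_distance("hell", "hello"))  # 1
```

Under those assumptions, the query for "hell" reaches document 3 through the term "hello", while "world" is too far away to contribute.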
References
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html