Elasticsearch

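  1. Elasticsearch Overview
    1.1. Core Concepts
  2. Installing Elasticsearch
    2.1. Installing with Docker (Recommended)
    2.2. Docker Compose Cluster
    2.3. Manual Installation
  3. Basic Operations
    3.1. Index Management
    3.2. Document Operations
    3.3. Search Queries
  4. Advanced Features
    4.1. Mapping Definition
    4.2. Analyzer Configuration
    4.3. Index Aliases
  5. Performance Optimization
    5.1. Index Optimization
    5.2. Query Optimization
  6. Cluster Management
    6.1. Cluster Health Checks
    6.2. Shard Management
  7. Summary of Key Points
    7.1. Elasticsearch Architecture
    7.2. Core API Examples
      7.2.1. CRUD Operations
      7.2.2. Search Queries
      7.2.3. Aggregations
    7.3. Data Type Comparison
    7.4. Query Type Comparison
    7.5. Performance Tuning Checklist
    7.6. Common Operations Commands
    7.7. Best Practices
    7.8. Core Principles
  8. References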

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.

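Elasticsearch Overview

Elasticsearch is a distributed search and analytics engine built on Lucene and developed by Elastic. Its core features include:

  • Full-text search: powerful full-text retrieval
  • Distributed architecture: scales horizontally to petabyte-scale data
  • Near real-time: search and analytics results are available in near real time
  • RESTful API: a simple HTTP interface
  • Multi-tenancy: queries can run across multiple indices in parallel
  • High availability: automatic replica and shard management

Core Concepts

  • Index: a collection of related documents, loosely comparable to a database in a relational system
  • Document: a single record in an index, in JSON format
  • Field: a key-value pair in a document
  • Mapping: defines the document structure and field types
  • Shard: a horizontal partition of an index, which enables distributed storage
  • Replica: a copy of a shard, which provides high availability and read throughput
  • Node: a single server in an Elasticsearch cluster
  • Cluster: a collection of nodes

Installing Elasticsearch

Installing with Docker (Recommended)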

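# Single-node development mode
# Security is disabled here so the plain-HTTP check below works; 8.x enables it by default.
docker run -d \
  --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

# Verify the installation
curl http://localhost:9200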

Docker Compose Cluster

version: '3'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    # bootstrap.memory_lock=true needs an unlimited memlock ulimit on every node
    ulimits:
      memlock:
        soft: -1
        hard: -1

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1

Manual Installation

# Download and unpack
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
cd elasticsearch-8.11.0

# Start in the foreground
./bin/elasticsearch

# Run as a daemon and write the process ID to the file "pid"
./bin/elasticsearch -d -p pid

Basic Operations

Index Management

# Create an index
PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "publish_date": { "type": "date" },
      "content": { "type": "text" },
      "views": { "type": "integer" }
    }
  }
}

# Show an index
GET /my-index

# List all indices
GET /_cat/indices?v

# Delete an index
DELETE /my-index

# Close / open an index
POST /my-index/_close
POST /my-index/_open

# Update index settings
PUT /my-index/_settings
{
  "number_of_replicas": 1
}

Document Operations

# Create a document (explicit ID)
PUT /my-index/_doc/1
{
  "title": "Elasticsearch Guide",
  "author": "John Doe",
  "publish_date": "2024-01-01",
  "content": "Introduction to Elasticsearch",
  "views": 1000
}

# Create a document (auto-generated ID)
POST /my-index/_doc
{
  "title": "Advanced Search",
  "author": "Jane Smith",
  "publish_date": "2024-01-15",
  "content": "Deep dive into search",
  "views": 500
}

# Get a document
GET /my-index/_doc/1

# Update a document (full replacement)
PUT /my-index/_doc/1
{
  "title": "Elasticsearch Guide Updated",
  "author": "John Doe",
  "publish_date": "2024-01-01",
  "content": "Updated introduction",
  "views": 1500
}

# Update a document (partial)
POST /my-index/_update/1
{
  "doc": {
    "views": 2000
  }
}

# Delete a document
DELETE /my-index/_doc/1

# Bulk operations (one JSON object per line; the body must end with a newline)
POST /_bulk
{"index":{"_index":"my-index","_id":"1"}}
{"title":"Doc 1","author":"Author 1"}
{"index":{"_index":"my-index","_id":"2"}}
{"title":"Doc 2","author":"Author 2"}
{"delete":{"_index":"my-index","_id":"3"}}
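The bulk body above is newline-delimited JSON: an action line, then (for actions that carry a document) a source line. As a minimal sketch of how a client might assemble such a body, the helper below serializes a list of actions with the standard library only. The name `build_bulk_body` and the tuple layout are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into an NDJSON _bulk body.

    `source` is None for actions that carry no document, such as delete.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The _bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "my-index", "_id": "1"}, {"title": "Doc 1", "author": "Author 1"}),
    ("index", {"_index": "my-index", "_id": "2"}, {"title": "Doc 2", "author": "Author 2"}),
    ("delete", {"_index": "my-index", "_id": "3"}, None),
])
print(body)
```

The resulting string can be sent as the request body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header.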

Search Queries

# Simple full-text query
GET /my-index/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

# Multi-field query
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "content"]
    }
  }
}

# Exact match ("author" is mapped as keyword in my-index, so no ".keyword" sub-field exists)
GET /my-index/_search
{
  "query": {
    "term": {
      "author": "John Doe"
    }
  }
}

# Range query
GET /my-index/_search
{
  "query": {
    "range": {
      "views": {
        "gte": 100,
        "lte": 1000
      }
    }
  }
}

# Boolean query
GET /my-index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "range": { "views": { "gte": 500 } } }
      ],
      "should": [
        { "term": { "author": "John Doe" } }
      ],
      "must_not": [
        { "match": { "content": "deprecated" } }
      ]
    }
  }
}

# Aggregation query
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "authors": {
      "terms": {
        "field": "author"
      }
    },
    "avg_views": {
      "avg": {
        "field": "views"
      }
    }
  }
}

Advanced Features

Mapping Definition

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "float"
      },
      "stock": {
        "type": "integer"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "location": {
        "type": "geo_point"
      },
      "specifications": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "value": { "type": "text" }
        }
      }
    }
  }
}

Analyzer Configuration

PUT /articles
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop", "my_synonym"]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["the", "a", "an"]
        },
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast", "jumps,leaps"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
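To make the analyzer chain above more concrete, here is a rough simulation of what `my_custom_analyzer` does to a piece of text: tokenize, lowercase, drop stop words, then expand synonyms. It is only a sketch under simplifying assumptions: a regular expression stands in for the standard tokenizer, and position gaps left by removed stop words are ignored. The function name `analyze` is hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an"}                            # mirrors my_stop
SYNONYM_GROUPS = [{"quick", "fast"}, {"jumps", "leaps"}]  # mirrors my_synonym


def analyze(text):
    """Return one sorted list of equivalent tokens per surviving position."""
    # Crude stand-in for the standard tokenizer: split on non-word characters.
    tokens = re.findall(r"\w+", text)
    # lowercase filter
    tokens = [t.lower() for t in tokens]
    # stop filter
    tokens = [t for t in tokens if t not in STOPWORDS]
    # synonym filter: every member of a group is emitted at the same position
    positions = []
    for t in tokens:
        group = next((g for g in SYNONYM_GROUPS if t in g), {t})
        positions.append(sorted(group))
    return positions


print(analyze("The quick fox jumps"))
# [['fast', 'quick'], ['fox'], ['jumps', 'leaps']]
```

On a running cluster, the real token stream can be inspected with the `_analyze` API against the `articles` index.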

Index Aliases

# Create an alias
POST /_aliases
{
  "actions": [
    { "add": { "index": "my-index-v1", "alias": "my-index" } }
  ]
}

# Switch an alias (zero downtime; both actions are applied atomically)
POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

# Filtered alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logs",
        "alias": "logs-error",
        "filter": { "term": { "level": "error" } }
      }
    }
  ]
}
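When alias switches are scripted (for example at the end of a reindex job), the request body for the zero-downtime swap above can be generated rather than written by hand. The helper below is a small sketch; `alias_swap_actions` is a hypothetical name, and it only builds the JSON body without sending it.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from old to new."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = alias_swap_actions("my-index", "my-index-v1", "my-index-v2")
print(json.dumps(swap, indent=2))
```

Because both actions travel in one request, clients querying `my-index` never see a moment where the alias points at no index.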

Performance Optimization

Index Optimization

PUT /optimized-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "index": {
      "codec": "best_compression",
      "max_result_window": 10000
    }
  }
}

# Temporarily disable refresh during bulk indexing
PUT /my-index/_settings
{
  "refresh_interval": "-1"
}

# Restore refresh once indexing is finished
PUT /my-index/_settings
{
  "refresh_interval": "1s"
}

Query Optimization

# Use filter context instead of query context when scoring is not needed (filter results can be cached)
GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 1000 } } }
      ]
    }
  }
}

# Limit the returned fields
GET /products/_search
{
  "_source": ["name", "price"],
  "query": { "match_all": {} }
}

# Use the scroll API to walk through large result sets
POST /my-index/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

# Fetch the next page with the scroll ID returned by the previous response
GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAA..."
}
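The scroll requests above are meant to be repeated: each response carries a `_scroll_id` that is passed to the next call until a page comes back with no hits. The loop below sketches that control flow. The `search` and `scroll` callables are hypothetical stand-ins for the two HTTP requests, and the in-memory backend exists only so the sketch runs without a cluster.

```python
def scroll_all(search, scroll, page_size=1000, keep_alive="1m"):
    """Yield every hit, following scroll IDs until an empty page is returned."""
    response = search(page_size, keep_alive)
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        response = scroll(response["_scroll_id"], keep_alive)


def make_fake_backend(docs):
    """In-memory stand-in for the initial search call and the follow-up scroll calls."""
    state = {"pos": 0, "size": 0}

    def page():
        chunk = docs[state["pos"]:state["pos"] + state["size"]]
        state["pos"] += len(chunk)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": chunk}}

    def search(size, keep_alive):
        state["pos"], state["size"] = 0, size
        return page()

    def scroll(scroll_id, keep_alive):
        return page()

    return search, scroll


docs = [{"_id": str(i)} for i in range(2500)]
search, scroll = make_fake_backend(docs)
hits = list(scroll_all(search, scroll, page_size=1000))
print(len(hits))  # 2500
```

In a real client the two callables would wrap `POST /my-index/_search?scroll=1m` and `GET /_search/scroll`, and the scroll context should be cleared once iteration finishes.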

Cluster Management

Cluster Health Checks

# Check cluster health
GET /_cluster/health

# List nodes
GET /_cat/nodes?v

# Show shard allocation
GET /_cat/shards?v

# Show pending cluster tasks
GET /_cluster/pending_tasks

# Show cluster statistics
GET /_cluster/stats

Shard Management

# Manually move a shard between nodes
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}

# Set the shard allocation rule
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

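Summary of Key Points

Elasticsearch Architecture

  • Cluster structure: master nodes (cluster management) + data nodes (storage) + coordinating nodes (request routing)
  • Sharding: primary shards + replica shards
  • Inverted index: built on Lucene, enabling fast full-text search
  • Near real-time: new documents become searchable within about 1 second (the default refresh_interval)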
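The inverted index mentioned above maps each term to the set of documents that contain it, so a search looks up terms instead of scanning documents. The following is a toy sketch of that idea in plain Python. It assumes whitespace tokenization and AND semantics between query terms; Lucene's real structures are far more elaborate, and the function names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Elasticsearch Guide",
    2: "Advanced Search",
    3: "Elasticsearch advanced search",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "advanced search"))  # {2, 3}
print(search_all_terms(index, "elasticsearch"))    # {1, 3}
```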

Core API Examples

1. CRUD Operations

# Create or replace
PUT /products/_doc/1
{
  "name": "Laptop",
  "price": 999.99,
  "category": "electronics"
}

# Read
GET /products/_doc/1

# Partial update
POST /products/_update/1
{
  "doc": { "price": 899.99 }
}

# Delete
DELETE /products/_doc/1

2. Search Queries

# Full-text match combined with filters, sorting, and pagination
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 1000 } } },
        { "term": { "category": "electronics" } }
      ]
    }
  },
  "sort": [
    { "price": "asc" }
  ],
  "from": 0,
  "size": 10
}

3. Aggregations

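# Price range buckets plus average price per category
# ("category" is mapped as keyword in the products index, so it is aggregated directly)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "avg_price_by_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}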
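To show what the two aggregations above compute, the sketch below reproduces them over a small in-memory list. The sample products are invented for illustration. As in the range aggregation, each bucket includes its `from` value and excludes its `to` value.

```python
from collections import defaultdict

# Hypothetical sample documents, used only for this illustration.
products = [
    {"name": "Cable", "price": 9.99, "category": "electronics"},
    {"name": "Laptop", "price": 999.99, "category": "electronics"},
    {"name": "Desk", "price": 100.0, "category": "furniture"},
    {"name": "Chair", "price": 300.0, "category": "furniture"},
]

# (from, to) pairs; None means the bucket is open on that side.
RANGES = [(None, 100), (100, 500), (500, None)]


def price_ranges(docs):
    """Count documents per price bucket (from inclusive, to exclusive)."""
    counts = {bucket: 0 for bucket in RANGES}
    for doc in docs:
        for low, high in RANGES:
            above = low is None or doc["price"] >= low
            below = high is None or doc["price"] < high
            if above and below:
                counts[(low, high)] += 1
    return counts


def avg_price_by_category(docs):
    """Group by category (terms aggregation), then average the price in each group."""
    prices = defaultdict(list)
    for doc in docs:
        prices[doc["category"]].append(doc["price"])
    return {category: sum(values) / len(values) for category, values in prices.items()}


print(price_ranges(products))
print(avg_price_by_category(products))
```

Note that the desk priced at exactly 100 lands in the 100 to 500 bucket, not the bucket below it.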

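Data Type Comparison

| Type         | Description                    | Example use          |
|--------------|--------------------------------|----------------------|
| text         | Full-text indexed, analyzed    | Article body         |
| keyword      | Exact match, not analyzed      | Tags, categories     |
| integer/long | Integers                       | Quantities, IDs      |
| float/double | Floating-point numbers         | Prices, ratings      |
| date         | Date and time                  | Creation time        |
| boolean      | Boolean value                  | Enabled flag         |
| geo_point    | Geographic coordinates         | Latitude/longitude   |
| nested       | Nested objects                 | Complex structures   |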

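Query Type Comparison

| Query type | Use case                     | Cached                  |
|------------|------------------------------|-------------------------|
| match      | Full-text search, analyzed   |                         |
| term       | Exact match, not analyzed    | Yes (in filter context) |
| range      | Range queries                | Yes (in filter context) |
| bool       | Combining queries            | Partially               |
| wildcard   | Wildcard matching            |                         |
| prefix     | Prefix matching              |                         |
| fuzzy      | Fuzzy matching               |                         |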

Performance Tuning Checklist

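# 1. Choose a sensible shard count
# Rule of thumb: shard count = ceil(data size / 50 GB)
# Recommended size per shard: 20-50 GB

# 2. Index in bulk
POST /_bulk
{"index":{"_index":"products"}}
{"name":"Product 1","price":100}
{"index":{"_index":"products"}}
{"name":"Product 2","price":200}

# 3. Disable refresh during bulk imports (restore it afterwards)
PUT /products/_settings
{
  "refresh_interval": "-1"
}

# 4. Prefer filter context over query context
# Filter clauses skip scoring and their results can be cached; scored query clauses cannot

# 5. Limit the result set size
GET /products/_search
{
  "size": 100,
  "from": 0
}

# 6. Use routing to optimize queries
# (the document body is an illustrative placeholder)
PUT /logs/_doc/1?routing=user123
{
  "message": "example log entry"
}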
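The sizing rule in item 1 of the checklist is simple arithmetic, sketched below. The function name and the 50 GB default are taken from the rule of thumb above and are illustrative only; real sizing also depends on query load and hardware.

```python
import math


def primary_shard_count(total_gb, target_shard_gb=50):
    """Shard count = ceil(data size / target shard size), with a minimum of 1."""
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    return max(1, math.ceil(total_gb / target_shard_gb))


print(primary_shard_count(120))   # 3
print(primary_shard_count(50))    # 1
print(primary_shard_count(1000))  # 20
```

For 120 GB of data this gives 3 primary shards of roughly 40 GB each, which falls inside the 20-50 GB range quoted in the checklist.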

Common Operations Commands

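# Cluster health
GET /_cluster/health

# Node statistics
GET /_nodes/stats

# Index statistics
GET /my-index/_stats

# Rank indices by cumulative search query time
# (a rough proxy only; the slow log itself is written to the Elasticsearch log files)
GET /_cat/indices?v&h=index,search.query_total,search.query_time&s=search.query_time:desc

# Clear caches
POST /my-index/_cache/clear

# Force merge (merge segments)
POST /my-index/_forcemerge?max_num_segments=1

# Reindex
POST /_reindex
{
  "source": { "index": "old-index" },
  "dest": { "index": "new-index" }
}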

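Best Practices

  • Index design: split indices by time (for example logs-2024-01) so they are easy to manage and delete
  • Shard planning: the number of primary shards is fixed at index creation, while the number of replicas can be changed at any time
  • Mapping design: disable indexing for fields that do not need it ("enabled": false for object fields, "index": false for other field types)
  • Query optimization: prefer filter context, and let frequently used filters be cached
  • Monitoring and alerting: monitor cluster health, node load, and query latency
  • Backup and recovery: take regular snapshots (Snapshot & Restore)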

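Core Principles

  • Inverted index: maps each term to a list of document IDs, so matching documents are located quickly
  • Shard routing: shard = hash(routing) % number_of_primary_shards
  • Replica consistency: a write is applied to the primary shard first and then replicated to the replica shards
  • Segment merging: small segment files are merged automatically in the background, which improves query performance
  • Scoring: relevance is scored with BM25 by default (TF-IDF was the default before version 5.0)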
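The routing formula above can be sketched in a few lines. This is an illustration only: `zlib.crc32` is used as a stand-in hash (Elasticsearch itself uses a Murmur3 hash), and `pick_shard` is a hypothetical name. By default the routing value is the document ID unless a custom `routing` parameter is supplied.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards (crc32 as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always maps to the same shard ...
first = pick_shard("user123", 3)
second = pick_shard("user123", 3)
print(first == second)  # True

# ... but the mapping depends on the shard count, so changing the count
# would generally send existing routing values to different shards.
keys = [f"user{i}" for i in range(1000)]
moved = sum(pick_shard(k, 3) != pick_shard(k, 4) for k in keys)
print(f"{moved} of {len(keys)} keys change shard when going from 3 to 4 shards")
```

This dependency on the shard count is why the number of primary shards cannot simply be edited on an existing index, while the replica count can.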

References