Elasticsearch

2023-05-14
Author: Sirius

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.

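Elasticsearch Overview

Elasticsearch is a distributed search and analytics engine built on Lucene and developed by Elastic. Its core features include:

Full-text search: powerful full-text retrieval
Distributed architecture: scales horizontally to petabytes of data
Near real-time: search and analytics results are available in near real time
RESTful API: a simple, easy-to-use HTTP interface
Multi-tenancy: queries can run across multiple indices in parallel
High availability: automatic replica and shard management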

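Core Concepts

Index: a collection of related documents, loosely comparable to a database in a relational system
Document: a single record in an index, stored as JSON
Field: a key-value pair within a document
Mapping: defines a document's structure and field types
Shard: a horizontal partition of an index, enabling distributed storage
Replica: a copy of a shard, providing high availability and read throughput
Node: a single running Elasticsearch instance within a cluster, typically one per server
Cluster: a group of nodes working together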

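Installing Elasticsearch

Installing with Docker (Recommended)

# Single-node development mode
# Security is disabled here so the plain-HTTP check below works;
# 8.x enables TLS and authentication by default. Development only.
docker run -d \
  --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

# Verify the installation
curl http://localhost:9200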

Docker Compose Cluster

version: '3'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ulimits:  # needed because bootstrap.memory_lock=true
      memlock:
        soft: -1
        hard: -1

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ulimits:  # needed because bootstrap.memory_lock=true
      memlock:
        soft: -1
        hard: -1

Manual Installation

# Download
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
cd elasticsearch-8.11.0

# Start in the foreground
./bin/elasticsearch

# Run as a daemon, writing the process ID to the file "pid"
./bin/elasticsearch -d -p pid

Basic Operations

Index Management

# Create an index
PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "publish_date": { "type": "date" },
      "content": { "type": "text" },
      "views": { "type": "integer" }
    }
  }
}

# View an index
GET /my-index

# List all indices
GET /_cat/indices?v

# Delete an index
DELETE /my-index

# Close / open an index
POST /my-index/_close
POST /my-index/_open

# Update index settings
PUT /my-index/_settings
{
  "number_of_replicas": 1
}

Document Operations

# Create a document (explicit ID)
PUT /my-index/_doc/1
{
  "title": "Elasticsearch Guide",
  "author": "John Doe",
  "publish_date": "2024-01-01",
  "content": "Introduction to Elasticsearch",
  "views": 1000
}

# Create a document (auto-generated ID)
POST /my-index/_doc
{
  "title": "Advanced Search",
  "author": "Jane Smith",
  "publish_date": "2024-01-15",
  "content": "Deep dive into search",
  "views": 500
}

# Get a document
GET /my-index/_doc/1

# Update a document (full replacement)
PUT /my-index/_doc/1
{
  "title": "Elasticsearch Guide Updated",
  "author": "John Doe",
  "publish_date": "2024-01-01",
  "content": "Updated introduction",
  "views": 1500
}

# Update a document (partial)
POST /my-index/_update/1
{
  "doc": {
    "views": 2000
  }
}

# Delete a document
DELETE /my-index/_doc/1

# Bulk operations
POST /_bulk
{"index":{"_index":"my-index","_id":"1"}}
{"title":"Doc 1","author":"Author 1"}
{"index":{"_index":"my-index","_id":"2"}}
{"title":"Doc 2","author":"Author 2"}
{"delete":{"_index":"my-index","_id":"3"}}
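The bulk body above is newline-delimited JSON: each action line is followed by its source line (a delete has none), and the last line must also end with a newline. The following is a minimal Python sketch of assembling such a body with the standard library only. The helper name `build_bulk_body` and the tuple layout are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a _bulk NDJSON body.

    `source` is None for actions that carry no document line, such as delete.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The _bulk API expects the final line to be newline-terminated too.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "my-index", "_id": "1"}, {"title": "Doc 1", "author": "Author 1"}),
    ("index", {"_index": "my-index", "_id": "2"}, {"title": "Doc 2", "author": "Author 2"}),
    ("delete", {"_index": "my-index", "_id": "3"}, None),
])
print(body, end="")
```

When sending this body over HTTP, the request should carry the `Content-Type: application/x-ndjson` header.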

Search Queries

# Simple query
GET /my-index/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

# Multi-field query
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "content"]
    }
  }
}

# Exact match
# "author" is mapped as keyword in my-index, so it has no ".keyword" subfield
GET /my-index/_search
{
  "query": {
    "term": {
      "author": "John Doe"
    }
  }
}

# Range query
GET /my-index/_search
{
  "query": {
    "range": {
      "views": {
        "gte": 100,
        "lte": 1000
      }
    }
  }
}

# Boolean query
GET /my-index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "range": { "views": { "gte": 500 } } }
      ],
      "should": [
        { "term": { "author": "John Doe" } }
      ],
      "must_not": [
        { "match": { "content": "deprecated" } }
      ]
    }
  }
}

# Aggregation query
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "authors": {
      "terms": {
        "field": "author"
      }
    },
    "avg_views": {
      "avg": {
        "field": "views"
      }
    }
  }
}

Advanced Features

Mapping Definition

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "float"
      },
      "stock": {
        "type": "integer"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "location": {
        "type": "geo_point"
      },
      "specifications": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "value": { "type": "text" }
        }
      }
    }
  }
}

Analyzer Configuration

PUT /articles
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop", "my_synonym"]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["the", "a", "an"]
        },
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast", "jumps,leaps"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}

Index Aliases

# Create an alias
POST /_aliases
{
  "actions": [
    { "add": { "index": "my-index-v1", "alias": "my-index" } }
  ]
}

# Switch an alias (zero downtime: both actions are applied atomically)
POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

# Filtered alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logs",
        "alias": "logs-error",
        "filter": { "term": { "level": "error" } }
      }
    }
  ]
}
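The alias switch shown above avoids downtime because the remove and add actions travel in a single `_aliases` request and are applied atomically. As a rough sketch, the request body can be generated in Python like this; the function name `swap_alias_body` is hypothetical and the code only builds the JSON, it does not send it.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build an _aliases request body that repoints `alias` in one atomic call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


alias_body = swap_alias_body("my-index", "my-index-v1", "my-index-v2")
print(json.dumps(alias_body, indent=2))
```

Clients keep querying `my-index` throughout, so a reindex into `my-index-v2` can be cut over without changing application code.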

Performance Optimization

Index Optimization

PUT /optimized-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "index": {
      "codec": "best_compression",
      "max_result_window": 10000
    }
  }
}

# Temporarily disable refresh during bulk indexing
PUT /my-index/_settings
{
  "refresh_interval": "-1"
}

# Restore it once indexing completes
PUT /my-index/_settings
{
  "refresh_interval": "1s"
}
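The disable-then-restore pattern above is easy to get wrong if the bulk load fails halfway and the interval is never restored. One way to guard against that, sketched in Python, is a context manager that always restores the setting. Here `put_settings` is a hypothetical callable standing in for whatever sends `PUT /<index>/_settings`; the recording stub below exists only to show the call order.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(put_settings, index, restore_to="1s"):
    """Disable refresh for the duration of the block, then restore it."""
    put_settings(index, {"refresh_interval": "-1"})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index is never left without refresh.
        put_settings(index, {"refresh_interval": restore_to})


calls = []


def record_settings(index, body):
    """Stand-in for a real settings call; it only records what would be sent."""
    calls.append((index, body))


with refresh_disabled(record_settings, "my-index"):
    pass  # bulk indexing would happen here

print(calls)
```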

Query Optimization

# Use filter context rather than query context (filters skip scoring and can be cached)
GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 1000 } } }
      ]
    }
  }
}

# Limit the returned fields
GET /products/_search
{
  "_source": ["name", "price"],
  "query": { "match_all": {} }
}

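# Use the scroll API to process large result sets
# (recent Elasticsearch documentation recommends search_after with a point in time for deep pagination)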
POST /my-index/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAA..."
}
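The two scroll requests above form a loop: the first search returns a page plus a `_scroll_id`, and each follow-up call passes that ID back until a page comes back empty. A minimal Python sketch of the loop follows. The `start_search` and `next_page` callables are hypothetical stand-ins for the two HTTP calls, and the fake backend uses readable offsets as scroll IDs, whereas real scroll IDs are opaque strings.

```python
def scroll_all(start_search, next_page):
    """Yield every hit, following scroll IDs until a page comes back empty."""
    page = start_search()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = next_page(page["_scroll_id"])


# Stand-in backend: five documents served two at a time.
docs = [{"_id": str(i)} for i in range(5)]
PAGE_SIZE = 2


def fake_start():
    return {"_scroll_id": str(PAGE_SIZE), "hits": {"hits": docs[0:PAGE_SIZE]}}


def fake_next(scroll_id):
    offset = int(scroll_id)
    return {
        "_scroll_id": str(offset + PAGE_SIZE),
        "hits": {"hits": docs[offset:offset + PAGE_SIZE]},
    }


ids = [hit["_id"] for hit in scroll_all(fake_start, fake_next)]
print(ids)  # ['0', '1', '2', '3', '4']
```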

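Cluster Management

Cluster Health Checks

# View cluster health
GET /_cluster/health

# View node information
GET /_cat/nodes?v

# View shard allocation
GET /_cat/shards?v

# View pending tasks
GET /_cluster/pending_tasks

# View cluster statistics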
GET /_cluster/stats

Shard Management

# Manually move a shard to another node
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}

# Set shard allocation rules
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

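Summary of Key Points

Elasticsearch Architecture

Cluster structure: master nodes (cluster management) + data nodes (storage) + coordinating nodes (request routing)
Sharding: primary shards + replica shards
Inverted index: built on Lucene, enabling fast full-text search
Near real-time: newly indexed documents become searchable within about 1 second (the default refresh_interval)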

Core API Examples

1. CRUD Operations

# Create/Update
PUT /products/_doc/1
{
  "name": "Laptop",
  "price": 999.99,
  "category": "electronics"
}

# Read
GET /products/_doc/1

# Update
POST /products/_update/1
{
  "doc": { "price": 899.99 }
}

# Delete
DELETE /products/_doc/1

2. Search Queries

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 1000 } } },
        { "term": { "category": "electronics" } }
      ]
    }
  },
  "sort": [
    { "price": "asc" }
  ],
  "from": 0,
  "size": 10
}

3. Aggregations

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    },
    "avg_price_by_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
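To make the semantics of the two aggregations above concrete, here is a small Python sketch that computes the same buckets in memory. The sample documents are hypothetical, and the bucket keys are simplified compared with the keys Elasticsearch returns. The point it illustrates is that a range bucket includes its `from` value and excludes its `to` value.

```python
from collections import defaultdict

# Hypothetical sample documents, for illustration only.
products = [
    {"name": "Cable", "price": 9.99, "category": "accessories"},
    {"name": "Mouse", "price": 100.0, "category": "accessories"},
    {"name": "Phone", "price": 500.0, "category": "electronics"},
    {"name": "Laptop", "price": 999.99, "category": "electronics"},
]

# Same boundaries as the "price_ranges" aggregation; None means unbounded.
ranges = [(None, 100), (100, 500), (500, None)]


def range_buckets(docs, field, ranges):
    """Count documents per range; `from` is inclusive, `to` is exclusive."""
    counts = {}
    for low, high in ranges:
        key = f"{'*' if low is None else low}-{'*' if high is None else high}"
        counts[key] = sum(
            1 for doc in docs
            if (low is None or doc[field] >= low) and (high is None or doc[field] < high)
        )
    return counts


def avg_by(docs, group_field, value_field):
    """Average `value_field` per distinct `group_field` value (terms + avg)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


bucket_counts = range_buckets(products, "price", ranges)
category_avgs = avg_by(products, "category", "price")
print(bucket_counts)
print(category_avgs)
```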

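Data Type Comparison

Type	Description	Example
`text`	Full-text indexed, analyzed	Article body
`keyword`	Exact match, not analyzed	Tags, categories
`integer/long`	Integers	Quantities, IDs
`float/double`	Floating-point numbers	Prices, ratings
`date`	Date and time	Creation time
`boolean`	Boolean value	Enabled flag
`geo_point`	Geographic coordinates	Latitude/longitude
`nested`	Nested objects	Complex structures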

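Query Type Comparison

Query type	Use case	Cached
`match`	Full-text search, analyzed	No
`term`	Exact match, not analyzed	Yes (in filter context)
`range`	Range queries	Yes (in filter context)
`bool`	Combining queries	Partially (filter and must_not clauses)
`wildcard`	Wildcard matching	No
`prefix`	Prefix matching	No
`fuzzy`	Fuzzy matching	No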

Performance Tuning Checklist

# 1. Choose a sensible shard count
# Shard count = ceil(data size / 50 GB)
# Recommended size per shard: 20-50 GB

# 2. Index in bulk
POST /_bulk
{"index":{"_index":"products"}}
{"name":"Product 1","price":100}
{"index":{"_index":"products"}}
{"name":"Product 2","price":200}

# 3. Disable refresh (during bulk imports)
PUT /products/_settings
{
  "refresh_interval": "-1"
}

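# 4. Prefer filter context over query context
# Filter clauses skip scoring and can be cached; query clauses are scored and are not cached

# 5. Limit the result set size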
GET /products/_search
{
  "size": 100,
  "from": 0
}

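# 6. Use routing to optimize queries (search with the same ?routing= value to hit a single shard)
PUT /logs/_doc/1?routing=user123
{
  "message": "example log entry"
}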
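The shard-count rule of thumb and the routing parameter from the checklist above can both be expressed as small calculations. The Python sketch below is illustrative only: the function names are hypothetical, and `zlib.crc32` is a stand-in hash chosen because it ships with the standard library, whereas Elasticsearch itself hashes the routing value with Murmur3. The shard numbers it prints will therefore not match a real cluster.

```python
import math
import zlib


def estimate_primary_shards(data_size_gb, target_shard_gb=50):
    """Rule of thumb from the checklist: ceil(data size / 50 GB), at least 1."""
    return max(1, math.ceil(data_size_gb / target_shard_gb))


def route_to_shard(routing, number_of_primary_shards):
    """Simplified routing: hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in for the Murmur3 hash Elasticsearch uses internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


shards = estimate_primary_shards(120)
print(shards)  # 3

# Documents with the same routing value always land on the same shard,
# which is why a search with ?routing=user123 only needs to visit one shard.
first = route_to_shard("user123", shards)
second = route_to_shard("user123", shards)
print(first == second)  # True
```

The modulo also shows why the primary shard count cannot change after index creation: a different divisor would send existing routing values to different shards.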

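Common Operations Commands

# Cluster health
GET /_cluster/health

# Node statistics
GET /_nodes/stats

# Index statistics
GET /my-index/_stats

# Rank indices by total query time
# (slow logs themselves are written to the node's log files once the index.search.slowlog.threshold.* settings are configured)
GET /_cat/indices?v&h=index,search.query_time&s=search.query_time:desc

# Clear caches
POST /my-index/_cache/clear

# Force merge (merge segments); best reserved for indices that no longer receive writes
POST /my-index/_forcemerge?max_num_segments=1

# Reindex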
POST /_reindex
{
  "source": { "index": "old-index" },
  "dest": { "index": "new-index" }
}

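Best Practices

Index design: split indices by time (e.g. logs-2024-01) so they are easy to manage and delete
Shard planning: the primary shard count is fixed at index creation; the replica count can be changed at any time
Mapping design: skip indexing for fields you never search ("index": false on a field, or "enabled": false on an object field)
Query optimization: prefer filter context so frequently used filters can be cached
Monitoring and alerting: track cluster health, node load, and query latency
Backup and recovery: take regular snapshots (Snapshot & Restore)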

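Core Principles

Inverted index: maps each term to a list of document IDs, so matching documents are located quickly
Shard routing: shard = hash(routing) % number_of_primary_shards, where routing defaults to the document _id
Replica consistency: a write is applied on the primary shard first, then replicated to the replica shards
Segment merging: small segment files are merged automatically in the background, improving query performance
Scoring: relevance scores are computed with BM25 by default (classic TF-IDF was the default before version 5.0)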
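The inverted index and relevance scoring described in the list above can be illustrated with a toy implementation. The Python sketch below builds a term-to-postings map and scores documents with the textbook BM25 formula. It is a simplification under stated assumptions: tokenization is a lowercase whitespace split rather than a real analyzer, the sample documents are hypothetical, and `k1=1.2`, `b=0.75` follow Lucene's defaults. Absolute scores will not match what Elasticsearch reports.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough analyzer stand-in: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def bm25_scores(query, docs, index, k1=1.2, b=0.75):
    """Score documents for `query` with the textbook BM25 formula."""
    lengths = {doc_id: len(tokenize(text)) for doc_id, text in docs.items()}
    avg_len = sum(lengths.values()) / len(lengths)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        n = len(postings)  # number of documents containing the term
        idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return dict(scores)


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch Guide",
    2: "Advanced Search",
    3: "Elasticsearch search engine guide for search",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))  # [1, 3]

scores = bm25_scores("search", docs, index)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

Document 2 ranks above document 3 here even though document 3 contains "search" twice, because BM25 normalizes by document length and saturates repeated term occurrences.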

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.