Elasticsearch

What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It provides full-text search, structured search, and analytics.

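Core concepts

Elasticsearch concept	Relational database analogue
Index	Database (in practice closer to a Table now that types are gone)
Type (deprecated in 7.x, removed in 8.0)	Table
Document	Row
Field	Column
Mapping	Schema
Shard	Partition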

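Inverted index

Forward index: document → terms
Inverted index: term → documents

Documents:
- Doc 1: Apple phone is good to use
- Doc 2: Huawei phone is also good to use
- Doc 3: Apple is good to eat

Inverted index (selected terms, lowercased):
┌────────┬──────────────┐
│ Term   │ Document IDs │
├────────┼──────────────┤
│ apple  │ [1, 3]       │
│ phone  │ [1, 2]       │
│ good   │ [1, 2, 3]    │
│ huawei │ [2]          │
└────────┴──────────────┘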
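
The table above can be reproduced with a few lines of Java. This is only a minimal sketch: it assumes the three example documents are already tokenized into lowercase English words (a real analyzer does the tokenizing), and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {

    // Builds term -> sorted set of document IDs from already-tokenized documents.
    static Map<String, SortedSet<Integer>> build(Map<Integer, List<String>> tokenizedDocs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : tokenizedDocs.entrySet()) {
            for (String term : doc.getValue()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical tokenization of the three example documents.
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, List.of("apple", "phone", "good", "use"));
        docs.put(2, List.of("huawei", "phone", "also", "good", "use"));
        docs.put(3, List.of("apple", "good", "eat"));

        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(index.get("apple")); // [1, 3]
        System.out.println(index.get("phone")); // [1, 2]
        System.out.println(index.get("good"));  // [1, 2, 3]
    }
}
```

Looking up a term is then a single map access, which is why term lookups stay fast however many documents are indexed.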

Basic operations

Create an index

PUT /user
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "createTime": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Index a document

POST /user/_doc/1
{
  "name": "张三",
  "age": 25,
  "email": "zhangsan@example.com",
  "createTime": "2024-01-01"
}

Query documents

// Get a document by ID
GET /user/_doc/1
 
// Search with a single condition
GET /user/_search
{
  "query": {
    "match": {
      "name": "张三"
    }
  }
}
 
// Search with multiple conditions
GET /user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "张三" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 20, "lte": 30 } } }
      ]
    }
  }
}

Update a document

POST /user/_update/1
{
  "doc": {
    "age": 26
  }
}

Delete a document

DELETE /user/_doc/1

Java client

Dependency

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.11.0</version>
</dependency>

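Configuration

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsConfig {

    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Low-level REST client. Plain HTTP assumes security is disabled (local development);
        // Elasticsearch 8 enables TLS and authentication by default.
        RestClient restClient = RestClient.builder(
            HttpHost.create("http://localhost:9200")
        ).build();

        // Transport that (de)serializes request and response bodies with Jackson
        ElasticsearchTransport transport = new RestClientTransport(
            restClient, new JacksonJsonpMapper()
        );

        return new ElasticsearchClient(transport);
    }
}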

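Usage

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.GetResponse;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

@Service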
public class UserService {
    
    @Autowired
    private ElasticsearchClient client;
    
    // Index (create or overwrite) a document, using the user ID as the document ID
    public void indexUser(User user) throws IOException {
        client.index(i -> i
            .index("user")
            .id(user.getId().toString())
            .document(user)
        );
    }
    
    // Get a document by ID; returns null when it does not exist
    public User getUser(Long id) throws IOException {
        GetResponse<User> response = client.get(g -> g
            .index("user")
            .id(id.toString()),
            User.class
        );
        return response.found() ? response.source() : null;
    }
    
    // Full-text search on the name field
    public List<User> searchUser(String name) throws IOException {
        SearchResponse<User> response = client.search(s -> s
            .index("user")
            .query(q -> q
                .match(m -> m
                    .field("name")
                    .query(name)
                )
            ),
            User.class
        );
        
        return response.hits().hits().stream()
            .map(Hit::source)
            .collect(Collectors.toList());
    }
}

Query syntax

match (full-text search)

{
  "query": {
    "match": {
      "name": "张三"
    }
  }
}

term (exact match)

{
  "query": {
    "term": {
      "email": "zhangsan@example.com"
    }
  }
}
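
The difference between match and term comes down to analysis: a match query runs the query text through the field's analyzer before looking up terms, while a term query looks up the value exactly as given. The sketch below illustrates this with a toy analyzer (lowercase, split on non-alphanumeric characters). That analyzer and all names here are assumptions for illustration and far simpler than Elasticsearch's standard analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TermVsMatchSketch {

    // Toy analyzer (assumption): lowercase, then split on non-letter/digit characters.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // term: the query value is compared to the indexed terms as-is.
    static boolean termMatches(Set<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    // match: the query text is analyzed first; any resulting term may match (OR semantics).
    static boolean matchMatches(Set<String> indexedTerms, String queryText) {
        return analyze(queryText).stream().anyMatch(indexedTerms::contains);
    }

    public static void main(String[] args) {
        // A "text" field stores analyzed terms; a "keyword" field stores the whole value.
        Set<String> textTerms = new TreeSet<>(analyze("Quick Brown Fox"));
        Set<String> keywordTerms = Set.of("Quick Brown Fox");

        System.out.println(termMatches(textTerms, "Quick"));              // false
        System.out.println(matchMatches(textTerms, "Quick"));             // true
        System.out.println(termMatches(keywordTerms, "Quick Brown Fox")); // true
    }
}
```

This is why term queries are normally used on keyword fields such as email, and match queries on text fields such as name.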

range (range query)

{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}

bool (boolean query)

{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "张三" } }
      ],
      "should": [
        { "term": { "email": "zhangsan@example.com" } }
      ],
      "must_not": [
        { "term": { "status": "deleted" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 20 } } }
      ]
    }
  }
}

Aggregations

{
  "aggs": {
    "age_stats": {
      "stats": { "field": "age" }
    },
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 10
      }
    }
  }
}
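
To make the histogram aggregation concrete: each value is assigned to the bucket whose key is floor(value / interval) * interval (with the default offset of 0). The following sketch applies that rule to a few hypothetical ages. It only mimics the bucketing arithmetic and is not how Elasticsearch executes the aggregation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HistogramSketch {

    // Bucket key rule of the histogram aggregation, assuming offset = 0.
    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Counts values per bucket, keyed by the bucket's lower bound.
    static Map<Double, Long> histogram(List<Integer> values, double interval) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (int v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical ages; interval 10 mirrors the request above.
        System.out.println(histogram(List.of(25, 26, 31, 47), 10));
        // {20.0=2, 30.0=1, 40.0=1}
    }
}
```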

Shards and replicas

Index = multiple primary shards + replicas of each primary shard

┌─────────────────────────────────────────────┐
│                    Index                    │
├──────────────┬──────────────┬───────────────┤
│   Shard 0    │   Shard 1    │    Shard 2    │
│  (primary)   │  (primary)   │   (primary)   │
├──────────────┼──────────────┼───────────────┤
│  Replica 0   │  Replica 1   │   Replica 2   │
│  (replica)   │  (replica)   │   (replica)   │
└──────────────┴──────────────┴───────────────┘

Purpose:
1. Shards: horizontal scaling and higher throughput
2. Replicas: high availability and failure recovery; they also serve search requests
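
How a document ends up on a particular primary shard can be sketched as a hash of its routing value (by default the document ID) modulo the number of primary shards. The code below is a simplified illustration: String.hashCode() stands in for the hash Elasticsearch actually uses, and the real formula also involves routing shards, so the shard numbers printed here will not match a real cluster.

```java
class ShardRoutingSketch {

    // Simplified routing: shard = hash(routing) mod number_of_primary_shards.
    // String.hashCode() is a stand-in (assumption); Elasticsearch uses its own hash of _routing.
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 3; // same value as number_of_shards in the index settings earlier
        for (String id : new String[] {"1", "2", "3", "4"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

Because the shard number depends on the modulus, the number of primary shards is fixed when the index is created; the number of replicas can be changed at any time.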

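Frequently asked interview questions

Q1: How does Elasticsearch differ from MySQL?

Aspect	MySQL	Elasticsearch
Index structure	B+ tree	Inverted index
Full-text search	Weak	Strong
Transactions	Supported	Not supported
Typical use	OLTP	Search and analytics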

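Q2: How does an inverted index work?

1. Analysis: split each document into terms
2. Indexing: map each term to the list of documents that contain it
3. Querying: look up the query terms to locate matching documents quickly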

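Q3: How do you optimize Elasticsearch queries?

1. Use filter context instead of query context when relevance scoring is not needed (filters skip scoring and their results can be cached)
2. Avoid deep pagination
3. Choose a sensible number of shards
4. Use bulk operations for writes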
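
As an illustration of the bulk operations point: the _bulk endpoint takes a newline-delimited JSON body with one action line followed by one source line per document. The sketch below builds such a body as a plain string. It is dependency-free on purpose, the helper name is hypothetical, it does no JSON escaping of the index name or IDs, and in a real application the Java client's bulk API would normally be used instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BulkBodySketch {

    // Builds an NDJSON body for POST /_bulk: one action line plus one source line per document.
    // Sources are passed in as ready-made JSON strings to keep the sketch dependency-free.
    static String bulkIndexBody(String index, Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : jsonById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            body.append(e.getValue()).append('\n'); // every line, including the last, ends with a newline
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical documents for the user index.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"Zhang San\",\"age\":25}");
        docs.put("2", "{\"name\":\"Li Si\",\"age\":30}");
        System.out.print(bulkIndexBody("user", docs));
    }
}
```

Sending many documents in one request like this avoids paying the per-request overhead once per document.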

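Q4: What is the deep pagination problem?

Problem: fetch 10 hits starting at offset 10000 (from=10000, size=10).
Each shard has to collect its top 10010 hits; the coordinating node then merges and sorts the results from all shards and keeps only the last 10. The cost grows with from + size, and by default index.max_result_window rejects requests where from + size exceeds 10000.

Solutions:
1. scroll API (suited to bulk export, not to real-time paging)
2. search_after (suited to paging forward; needs a sort with a unique tiebreaker)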
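
The cost difference between from/size paging and search_after can be sketched in plain Java. The example below is a simulation over an in-memory list of unique sort values, not a call to Elasticsearch, and the method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class DeepPagingSketch {

    // With from/size, every shard must collect its top (from + size) hits.
    static long hitsPerShard(long from, long size) {
        return from + size;
    }

    // search_after idea: resume strictly after the last sort value already seen,
    // so each request only needs `size` hits per shard.
    static List<Integer> pageAfter(List<Integer> sortedKeys, Integer after, int size) {
        return sortedKeys.stream()
                .filter(k -> after == null || k > after)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(hitsPerShard(10_000, 10)); // 10010

        List<Integer> keys = List.of(1, 2, 3, 4, 5, 6, 7); // hypothetical unique sort values
        List<Integer> page1 = pageAfter(keys, null, 3);
        List<Integer> page2 = pageAfter(keys, page1.get(page1.size() - 1), 3);
        System.out.println(page1); // [1, 2, 3]
        System.out.println(page2); // [4, 5, 6]
    }
}
```

The trade-off is that search_after can only move forward from the last hit seen; it cannot jump straight to an arbitrary page number.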

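Summary

Key points about Elasticsearch:
1. Core concepts: Index, Document, Field, Mapping
2. Inverted index: term → list of documents
3. Queries: match, term, range, bool
4. Shards provide horizontal scaling; replicas provide high availability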
