Recently our Elasticsearch cluster kept going down. Checking its health showed that one index had a shard stuck in the UNASSIGNED state, so I set out to fix it, and this post records the solution.

Finding the index with the broken shard

curl -s -X GET '10.10.161.1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

The result:

test_index 3 r UNASSIGNED ALLOCATION_FAILED

So it is the replica (`r`) of shard 3 of test_index that cannot be allocated, with reason ALLOCATION_FAILED.
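Grep works, but the same check is easy to script if you want it in a monitoring job. A minimal sketch; the field order follows the h=index,shard,prirep,state,unassigned.reason columns used above:

```python
# Scan `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output
# and collect the shards that are stuck in UNASSIGNED.
def find_unassigned(cat_output):
    hits = []
    for line in cat_output.splitlines():
        fields = line.split()
        # columns: index, shard, prirep (p/r), state, unassigned.reason
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            hits.append({
                "index": fields[0],
                "shard": int(fields[1]),
                "prirep": fields[2],
                "reason": fields[4],
            })
    return hits

sample = "test_index 3 r UNASSIGNED ALLOCATION_FAILED"
print(find_unassigned(sample))
# → [{'index': 'test_index', 'shard': 3, 'prirep': 'r', 'reason': 'ALLOCATION_FAILED'}]
```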

First attempts

In the Kibana Dev Tools console, first try retrying the failed allocations:

POST _cluster/reroute?retry_failed

The shard that fails to allocate comes back in the response like this:

{
  "state": "INITIALIZING",
  "primary": false,
  "node": "KbRtC52uQWS_6C5l1kocKA",
  "relocating_node": null,
  "shard": 3,
  "index": "test_index",
  "recovery_source": {
    "type": "PEER"
  },
  "allocation_id": {
    "id": "V6ePWlEBSCGve4sIXXWm1w"
  },
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2019-09-12T06:38:16.568Z",
    "failed_attempts": 5,
    "delayed": false,
    "details": "failed recovery, failure RecoveryFailedException[[test_index][3]: Recovery failed from {5cCOK1o}{5cCOK1oWT_KrCgerQGfcaA}{Lk2vCAhzQJK6r5C6UzXLZw}{10.10.161.102}{10.10.161.102:9300} into {KbRtC52}{KbRtC52uQWS_6C5l1kocKA}{0KKoe57ATfyA6oKehRoArw}{10.10.161.103}{10.10.161.103:9300}]; nested: RemoteTransportException[[5cCOK1o][10.10.161.102:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [test_index][3] from primary shard with sync id but number of docs differ: 4609345 (5cCOK1o, primary) vs 4609267(KbRtC52)]; ",
    "allocation_status": "no_attempt"
  }
}

The key part is at the end of details: the recovery uses a sync id, but the primary and the replica have different doc counts (4609345 on the primary vs 4609267 on the replica), so the replica cannot be re-allocated this way.
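The interesting numbers are buried deep inside the details string; a small helper can pull them out. The regex is an assumption based on the exact message format shown above:

```python
import re

# Extract primary/replica doc counts from a recovery failure message such as
# "... number of docs differ: 4609345 (5cCOK1o, primary) vs 4609267(KbRtC52)"
def doc_count_mismatch(details):
    m = re.search(r"number of docs differ: (\d+) \([^)]*primary\) vs (\d+)", details)
    if m is None:
        return None
    primary, replica = int(m.group(1)), int(m.group(2))
    return {"primary": primary, "replica": replica, "diff": primary - replica}

details = ("try to recover [test_index][3] from primary shard with sync id "
           "but number of docs differ: 4609345 (5cCOK1o, primary) vs 4609267(KbRtC52)")
print(doc_count_mismatch(details))
# → {'primary': 4609345, 'replica': 4609267, 'diff': 78}
```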

Check the index's settings:

GET /test_index/_settings

Output:

{
  "test_index": {
    "settings": {
      "index": {
        "creation_date": "1567406954255",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "1g3CpeaVRX-fp3-PBniCJg",
        "version": {
          "created": "5020299"
        },
        "provided_name": "test_index"
      }
    }
  }
}

So the index has 5 primary shards and 1 replica of each. Next, try allocating the replica by hand:

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "test_index",
        "shard": 3,
        "node": "10.10.161.103"
      }
    }
  ]
}

This returns the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[5cCOK1o][10.10.161.102:9300][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[allocate_replica] allocation of [test_index][3] on node {KbRtC52}{KbRtC52uQWS_6C5l1kocKA}{0KKoe57ATfyA6oKehRoArw}{10.10.161.103}{10.10.161.103:9300} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-12T09:02:57.343Z], failed_attempts[6], delayed=false, details[failed recovery, failure RecoveryFailedException[[test_index][3]: Recovery failed from {5cCOK1o}{5cCOK1oWT_KrCgerQGfcaA}{Lk2vCAhzQJK6r5C6UzXLZw}{10.10.161.102}{10.10.161.102:9300} into {KbRtC52}{KbRtC52uQWS_6C5l1kocKA}{0KKoe57ATfyA6oKehRoArw}{10.10.161.103}{10.10.161.103:9300}]; nested: RemoteTransportException[[5cCOK1o][10.10.161.102:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [test_index][3] from primary shard with sync id but number of docs differ: 4609345 (5cCOK1o, primary) vs 4609267(KbRtC52)]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(target node version [5.2.2] is the same or newer than source node version [5.2.2])][YES(the shard is not being snapshotted)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [3tb], shard size: [0b], free after allocating shard: [3tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"
  },
  "status": 400
}

So manual allocation is blocked too: the shard has already exceeded the maximum of 5 failed allocation attempts, and the underlying doc-count mismatch means any retry would fail the same way.

The final fix

First set the replica count to 0, which discards the out-of-sync replica copies entirely:

PUT /test_index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

Check the index's shards again:

GET /_cat/shards/test_index

Now no shard has a replica, so there is no primary/replica (p/r) split left. Note that shard 3 kept the primary copy with its 4609345 docs; the stale replica was simply discarded:

test_index 3 p STARTED 4609345 1.3gb 10.10.161.102 5cCOK1o
test_index 2 p STARTED 4609050 1.6gb 10.10.161.103 KbRtC52
test_index 1 p STARTED 4607156 1.5gb 10.10.161.103 KbRtC52
test_index 4 p STARTED 4596510 1.3gb 10.10.161.102 5cCOK1o
test_index 0 p STARTED 4605846 1.4gb 10.10.161.102 5cCOK1o
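With replicas switched off, it is easy to sanity-check the listing: every copy should now be a started primary. A quick sketch over the table above:

```python
# `_cat/shards/test_index` listing with number_of_replicas set to 0;
# columns: index shard prirep state docs store ip node
listing = """\
test_index 3 p STARTED 4609345 1.3gb 10.10.161.102 5cCOK1o
test_index 2 p STARTED 4609050 1.6gb 10.10.161.103 KbRtC52
test_index 1 p STARTED 4607156 1.5gb 10.10.161.103 KbRtC52
test_index 4 p STARTED 4596510 1.3gb 10.10.161.102 5cCOK1o
test_index 0 p STARTED 4605846 1.4gb 10.10.161.102 5cCOK1o"""

rows = [line.split() for line in listing.splitlines()]
# all five shards are primaries and started
assert all(r[2] == "p" and r[3] == "STARTED" for r in rows)
print(sum(int(r[4]) for r in rows))  # total docs across all primaries → 23027907
```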

Then set the replica count back to 1:

PUT /test_index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

After a short wait the replicas are rebuilt from the now-authoritative primaries. Checking again:

curl -s -X GET '10.10.161.1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

Output:

test_index 3 r UNASSIGNED REPLICA_ADDED
test_index 2 r UNASSIGNED REPLICA_ADDED
test_index 1 r UNASSIGNED REPLICA_ADDED
test_index 4 r UNASSIGNED REPLICA_ADDED

The reason is now REPLICA_ADDED rather than ALLOCATION_FAILED: these are fresh replicas waiting to be recovered from the primaries, and the entries disappear once recovery finishes and the cluster goes back to green.
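The whole drop-and-restore step can also be scripted. Below is a minimal sketch using only Python's standard library; the cluster address is the one used throughout this post, and `put_settings` is illustrative only (it is not executed here):

```python
import json
import urllib.request

ES = "http://10.10.161.1:9200"  # cluster address used in this post

def replica_settings(n):
    # Body for PUT /<index>/_settings changing the replica count.
    return {"index": {"number_of_replicas": n}}

def put_settings(index, n):
    # Sketch: sends the settings change to the cluster (not run here).
    req = urllib.request.Request(
        "%s/%s/_settings" % (ES, index),
        data=json.dumps(replica_settings(n)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return urllib.request.urlopen(req)

# The fix is just these two payloads, sent in order, with a wait in between:
print(json.dumps(replica_settings(0)))  # → {"index": {"number_of_replicas": 0}}
print(json.dumps(replica_settings(1)))  # → {"index": {"number_of_replicas": 1}}
```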