I wrote a blog about How Confluence Data Center Manages Index Files. Now let’s have a quick look at how Jira manages index files. Compared to Confluence, Jira manages them quite differently.
In a multi-node Jira Data Center cluster, each node keeps its index files locally and the cluster aims for eventual consistency. When a change is made on one of the nodes (e.g. a new issue is created), that node updates its own index and also adds an entry to the replicatedindexoperation table in the database, so that other nodes can replay the operation. The entry is removed after two days. That means at any given point in time, the index files could be inconsistent across nodes.
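To make the replay idea concrete, here is a minimal in-memory sketch in Python. It is not Jira's actual implementation: the class names, the fields (`op_id`, `node_id`, `issue_key`, `created`) and the `last_seen_id` bookkeeping are all my own assumptions, and a plain list stands in for the replicatedindexoperation table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RETENTION = timedelta(days=2)  # entries are removed after two days


@dataclass
class IndexOperation:
    op_id: int         # hypothetical, monotonically increasing ID
    node_id: str       # node that made the change
    issue_key: str     # what was changed
    created: datetime  # when the entry was written


@dataclass
class OperationLog:
    """In-memory stand-in for the replicatedindexoperation table."""
    entries: list = field(default_factory=list)
    next_id: int = 1

    def record(self, node_id: str, issue_key: str, now: datetime) -> IndexOperation:
        op = IndexOperation(self.next_id, node_id, issue_key, now)
        self.entries.append(op)
        self.next_id += 1
        return op

    def purge(self, now: datetime) -> None:
        # Drop entries that are older than the retention window.
        self.entries = [e for e in self.entries if now - e.created <= RETENTION]

    def pending_for(self, node_id: str, last_seen_id: int) -> list:
        # Operations written by other nodes that this node has not replayed yet.
        return [e for e in self.entries
                if e.node_id != node_id and e.op_id > last_seen_id]


@dataclass
class Node:
    node_id: str
    index: set = field(default_factory=set)  # stand-in for local index files
    last_seen_id: int = 0

    def change(self, log: OperationLog, issue_key: str, now: datetime) -> None:
        self.index.add(issue_key)                 # update the local index first
        log.record(self.node_id, issue_key, now)  # then publish for other nodes

    def replay(self, log: OperationLog) -> None:
        for op in log.pending_for(self.node_id, self.last_seen_id):
            self.index.add(op.issue_key)
            self.last_seen_id = op.op_id
```

Between a `change()` on one node and the next `replay()` on another, the two `index` sets differ. That window is the temporary inconsistency described above.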
When a new Jira node joins the cluster, it queries the database to find out which node has the latest operation. The new node then sends a request to that node asking for a copy of the index files. That node takes a snapshot and copies the snapshot file to the shared home folder, so the new Jira node can restore from it. If the new node is unable to restore the index files from another node, it will neither restore from the index backup nor automatically trigger a re-index. This means you always have to re-index, or restore the index files from a backup, for the first node in the cluster.
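The join flow above can be sketched roughly as follows. Again this is only an illustration under my own assumptions: `latest_op_by_node` is a hypothetical mapping of node ID to its latest recorded operation ID, and `request_snapshot` is a hypothetical callback that returns the snapshot path in the shared home folder, or `None` if the request fails.

```python
from typing import Callable, Dict, Optional


def find_source_node(latest_op_by_node: Dict[str, int], joining: str) -> Optional[str]:
    """Pick the other node with the highest recorded operation ID."""
    candidates = {n: op for n, op in latest_op_by_node.items() if n != joining}
    if not candidates:
        return None  # first node in the cluster: nobody to copy from
    return max(candidates, key=candidates.get)


def join_cluster(
    joining: str,
    latest_op_by_node: Dict[str, int],
    request_snapshot: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the snapshot path the node restored from, or None.

    None means there is no automatic fallback: someone has to re-index
    or restore the index from a backup by hand.
    """
    source = find_source_node(latest_op_by_node, joining)
    if source is None:
        return None
    return request_snapshot(source)
```

The important design point is the `None` branch: there is no fallback to the backup folder and no automatic re-index, which is exactly the gap you have to automate around.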
When an old Jira node re-joins the cluster, it compares the ID of its local index files with the latest ID recorded in the database. If the delta is less than two days, it replays the operations that are kept in the replicatedindexoperation table. If the delta is more than two days, it sends a request to another node to get a copy of the latest index files.
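The re-join decision boils down to a single comparison. Here is a small sketch, assuming each ID can be mapped to a timestamp (that mapping, the function name and the enum are mine, not Jira's). I only know the behaviour for "less than" and "more than" two days, so how the exact boundary is treated here is a guess.

```python
from datetime import datetime, timedelta
from enum import Enum

RETENTION = timedelta(days=2)  # how long replicated operations are kept


class CatchUp(Enum):
    REPLAY = "replay operations from replicatedindexoperation"
    SNAPSHOT = "request a fresh index snapshot from another node"


def choose_catch_up(local_last_op: datetime, latest_op: datetime) -> CatchUp:
    """Decide how a re-joining node should bring its index up to date."""
    delta = latest_op - local_last_op
    # Assumption: a delta of exactly two days is treated as too old.
    return CatchUp.REPLAY if delta < RETENTION else CatchUp.SNAPSHOT
```

For example, a node that was down for five hours would get `CatchUp.REPLAY`, while one that was down for three days would get `CatchUp.SNAPSHOT`, because the operations it missed may already have been removed from the table.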
It is also worth mentioning that a Jira node will push its index files to the other nodes after it has completed a foreground re-index.
With all that said, I think Jira is really designed for a static environment, one where nodes change infrequently. To make it work in a dynamic environment, e.g. a cluster built on an AWS Auto Scaling group where EC2 instances come and go, a fair bit of automation work is needed for index file and stale node management.
We have worked out a solution to run Jira in AWS, which I will share in another blog soon. As preparation for that blog, here are a few things you need to be familiar with first:
- local index caches folder
- index backup folder in the shared home folder
- .zip vs .sz (snappy) format
- clusternode table
- replicatedindexoperation table