Replace Flannel with Weave Net in Kubernetes


The main reason I replaced Flannel with Weave Net as the Kubernetes CNI plugin is that Flannel does not support multicast.

I did not realize this until my test Confluence DC cluster stopped working. The cluster has two Confluence nodes, each running as a pod. They work fine as long as both pods run on the same k8s node. But when they run on different k8s nodes, it is like two kings who cannot wear the same crown: they take turns being killed by the liveness probe, then start over, again and again.
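
To see the ping-pong for yourself, watching the restart counters and the probe events is enough. A small sketch, assuming the pods carry the app=confluence label used later in this post:

# Watch the two pods take turns restarting (label is an assumption).
kubectl get pods -l app=confluence -o wide -w

# Confirm the restarts come from the liveness probe rather than a crash.
kubectl describe pod confluence-0 | grep -iA2 liveness
kubectl get events --field-selector involvedObject.name=confluence-0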

From the logs I can see that Hazelcast (the underlying cluster management library) throws an error. The node complains that an instance outside the current cluster is modifying the database:

Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.

2019-11-01 00:00:50,832 WARN [Caesium-1-4] [analytics.client.listener.ProductEventListener] processEventWithTiming Processing a critical event: com.atlassian.confluence.cluster.safety.ClusterPanicAnalyticsEvent@7f215a2f
2019-11-01 00:00:50,836 ERROR [Caesium-1-4] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Received a panic event, stopping processing on the node: [Origin node: 538c39e6 listening on /10.244.1.4:5801] Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
2019-11-01 00:00:50,836 WARN [Caesium-1-4] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent com.atlassian.confluence.cluster.hazelcast.HazelcastClusterInformation@7167aab3
2019-11-01 00:00:50,837 WARN [Caesium-1-4] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Shutting down scheduler
2019-11-01 00:00:50,860 INFO [Caesium-1-4] [confluence.cluster.hazelcast.HazelcastClusterManager] stopCluster Shutting down the cluster
2019-11-01 00:00:50,916 WARN [Caesium-1-4] [confluence.cluster.hazelcast.HazelcastClusterSafetyManager] rateLimitHazelcastLogging Rate limiting logging caused by HazelcastInstanceNotActiveException
2019-11-01 00:00:50,921 WARN [Caesium-1-4] [confluence.cluster.hazelcast.HazelcastClusterSafetyManager] rateLimitHazelcastLogging Rate limiting logging caused by HazelcastInstanceNotActiveException
2019-11-01 00:00:50,922 WARN [Caesium-1-4] [confluence.cluster.hazelcast.HazelcastClusterSafetyManager] rateLimitHazelcastLogging Rate limiting logging caused by HazelcastInstanceNotActiveException
2019-11-01 00:00:50,922 ERROR [Caesium-1-4] [impl.schedule.caesium.JobRunnerWrapper] runJob Scheduled job ClusterSafetyJob#ClusterSafetyJob failed to run
com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
	at com.hazelcast.spi.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:105)
	at com.hazelcast.spi.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:100)
	at com.hazelcast.spi.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:94)
	at com.hazelcast.map.impl.proxy.MapProxyImpl.unlock(MapProxyImpl.java:332)
	at com.atlassian.confluence.cluster.hazelcast.HazelcastDualLock.unlock(HazelcastDualLock.java:53)
	at com.atlassian.confluence.impl.schedule.caesium.JobRunnerWrapper.doRunJob(JobRunnerWrapper.java:150)
	at com.atlassian.confluence.impl.schedule.caesium.JobRunnerWrapper.lambda$runJob$0(JobRunnerWrapper.java:87)
	at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContextInternal(VCacheRequestContextManager.java:84)
	at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContext(VCacheRequestContextManager.java:68)
	at com.atlassian.confluence.impl.schedule.caesium.JobRunnerWrapper.runJob(JobRunnerWrapper.java:87)
	at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
	at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
	at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeLocalJob(CaesiumSchedulerService.java:402)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:380)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
	at java.lang.Thread.run(Unknown Source)

This is a Confluence cluster running on Kubernetes: the pod IPs are dynamic and AWS role-based discovery does not apply here, so multicast was the only option when I first set up the Confluence cluster. The first thing that came to mind was that multicast traffic does not pass between the k8s nodes, and I confirmed it by finding this open issue in the Flannel GitHub repository.
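
Before swapping CNIs, you can verify the multicast gap directly on a k8s node. A minimal sketch, assuming the Flannel VXLAN interface is flannel.1 and Hazelcast is using its default multicast port 54327 (take the real group and port from confluence.cfg.xml if yours differ):

# Run on the k8s node hosting confluence-0. If multicast crossed nodes,
# packets sent by the pod on the other node would show up here; with
# Flannel, nothing arrives.
sudo tcpdump -ni flannel.1 'multicast and udp port 54327'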

The fix is to replace Flannel with another CNI plugin that supports multicast; in my case, Weave Net. Here is how I replaced it. Just be aware that it requires an outage.

First, install the Weave Net CNI plugin:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
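
Before removing Flannel, it is worth checking that the Weave Net DaemonSet actually came up on every node. A quick check, assuming the standard manifest above, which labels its pods name=weave-net in kube-system:

# One weave-net pod should be Running on each k8s node.
kubectl -n kube-system get daemonset weave-net
kubectl -n kube-system get pods -l name=weave-net -o wide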

Second, uninstall the Flannel CNI plugin:

# Assume kube-flannel.yml is the config file that you used to install Flannel

kubectl delete -f kube-flannel.yml
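
A quick way to confirm the Flannel pods are gone before moving on (the app=flannel label and kube-system namespace are assumptions matching the stock kube-flannel.yml of that time):

# Should return nothing once the Flannel DaemonSet has been removed.
kubectl -n kube-system get pods -l app=flannel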

Third, delete the interfaces and the config file that Flannel created on each k8s node:

ip link delete cni0
ip link delete flannel.1
rm /etc/cni/net.d/10-flannel.conflist
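
These commands need root on each node. Afterwards, a quick check that nothing Flannel-related is left behind; I assume Weave drops its own config (typically 10-weave.conflist) into the same directory:

# Run as root on every k8s node. The grep should print nothing,
# and only Weave's own CNI config should remain.
ip link show | grep -E 'cni0|flannel'
ls /etc/cni/net.d/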

Fourth, restart the kubelet service on each k8s node:

$ systemctl restart kubelet
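
After the restart the nodes should report Ready again, and the kubelet should pick up the Weave CNI config. A couple of quick checks (a sketch for systemd-based nodes, which is what systemctl above implies):

# Nodes should come back Ready within a minute or so.
kubectl get nodes -o wide

# Optionally watch the kubelet pick up the new CNI configuration.
journalctl -u kubelet -f | grep -i cni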

Lastly, you need to kill all Pods, because they still hold IPs from the old Flannel CIDR. In my case, I simply scaled the StatefulSet down to 0 and then back up.

kubectl scale --replicas=0 sts/confluence
kubectl scale --replicas=3 sts/confluence
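
If anything else in the cluster still holds an IP from the old Flannel range (10.244.0.0/16 by default, matching the 10.244.1.4 seen in the log above), it needs the same treatment. A small sketch to spot such pods:

# List pods in all namespaces that still hold an address from the old
# Flannel CIDR (10.244.0.0/16 assumed); delete or rescale them so they
# get a Weave-assigned IP.
kubectl get pods --all-namespaces -o wide | grep ' 10\.244\.'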

Now look at the three happy nodes 🙂

$ kubectl get pods -l app=confluence -o wide

NAME           READY   STATUS    RESTARTS   AGE    IP           NODE     NOMINATED NODE   READINESS GATES
confluence-0   1/1     Running   1          133m   10.32.0.3    node-2   <none>           <none>
confluence-1   1/1     Running   0          85m    10.40.0.19   node-1   <none>           <none>
confluence-2   1/1     Running   0          78m    10.32.0.5    node-2   <none>           <none>

Some interesting reading: Multi CNI and Containers with Multi Network Interfaces on Kubernetes with CNI-Genie

References:

https://kubernetes.io/docs/concepts/cluster-administration/addons/
https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#network-plugin-requirements
