ElastiCache Redis Unreachable Issue


We have an ElastiCache Redis replication group with two nodes: one primary and one replica. Last week, the primary Redis node suddenly stopped working: every connection to it eventually timed out.

According to the logs, there was a load burst, and following that Redis rebooted itself.


Unfortunately, the Redis node stopped responding after that. The weird thing was that replication between the nodes still worked. So I promoted the original replica to primary and logged into it. The ROLE and INFO commands showed that replication was working fine and that the slave IP was 10.0.x.x. That was interesting, because my VPC network is 172.31.x.x, which meant something was wrong with the instance's 172.31.x.x NIC. I contacted AWS, their service team restarted that NIC, and things went back to normal.
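For reference, here is a minimal sketch of the kind of check I ran, using the redis-py client (the hostname below is a placeholder, not our real endpoint). It prints the node's replication role and the IP of each connected replica, so an address outside your VPC CIDR stands out immediately:

import redis

# Hypothetical endpoint; substitute your ElastiCache node's address.
HOST = "my-redis-node.example.cache.amazonaws.com"

# Short timeouts so a dead NIC fails fast instead of hanging.
r = redis.Redis(host=HOST, port=6379,
                socket_connect_timeout=5, socket_timeout=5)

info = r.info("replication")
print("role:", info["role"])

# On a primary, INFO replication lists each replica as slave0, slave1, ...
# with fields like ip=10.0.x.x, port=6379, state=online.
for key, value in info.items():
    if key.startswith("slave"):
        print(key, "->", value)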

The ironic thing is that the AWS console still showed everything as green while the 172.31.x.x NIC was not functional. It looks to me like AWS only monitors their internal network (in this case, the 10.0.x.x network). I have submitted a feature request suggesting they improve the monitoring.
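Until that happens, a cheap workaround is to run your own reachability probe from inside the VPC, so you alert on the customer-facing NIC instead of trusting the console. A minimal sketch, again with a placeholder endpoint; wire the failure branch into whatever alerting you already use:

import redis

HOST = "my-redis-primary.example.cache.amazonaws.com"  # placeholder endpoint

def redis_reachable(host, port=6379, timeout=3):
    """PING the node over its VPC-facing address with a hard timeout."""
    try:
        client = redis.Redis(host=host, port=port,
                             socket_connect_timeout=timeout,
                             socket_timeout=timeout)
        return client.ping()
    except redis.exceptions.RedisError:
        # Covers connection refusals and timeouts alike.
        return False

if not redis_reachable(HOST):
    print("ALERT: primary unreachable on its VPC address")  # hook up real alerting here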
