We noticed that one of our Java applications that is deployed on Elastic Beanstalk has hourly service interruption. Each outage only lasts for about 40 – 60 seconds. By looking at the CloudWatch, the ELB has intermittent 5xx errors every one hour, like 09:01, 10:01, 11:01 …
After some investigations, I found it out that it is by httpd logrotation. More specifically, it is the post script in the logrotation configuration (/etc/logrotate.d/logrotate.elasticbeanstalk.httpd.conf)
/sbin/service httpd reload > /dev/null 2>/dev/null || true
Under the hood ‘service httpd reload‘ runs as ‘/etc/init.d/httpd reload’ which send HUP signal to httpd parent process to tell it kill all child processes immediately then reload the http config file. That’s exactly how the service interruption happens. As it is not a graceful reload. Graceful reload signal should be USR1 which let all the child processes to finish the in flight requests before exit. Once I changed HUP to USR1 in the init script, the issues did not happen again (starting 21:00 in the above graph).
reload() { echo -n $"Reloading $prog: " if ! LANG=$HTTPD_LANG $httpd $OPTIONS -t >&/dev/null; then RETVAL=6 echo $"not reloading due to configuration syntax error" failure $"not reloading $httpd due to configuration syntax error" else # Force LSB behaviour from killproc LSB=1 killproc -p ${pidfile} $httpd -HUP RETVAL=$? if [ $RETVAL -eq 7 ]; then failure $"httpd shutdown" fi fi echo }
References:
https://httpd.apache.org/docs/2.4/stopping.html
https://www.datadoghq.com/blog/top-elb-health-and-performance-metrics/