GCP – Graceful shutdowns on Cloud Run: Deep dive
Cloud Run now sends a SIGTERM signal to your container instance before the instance terminates because of an event such as a scale-down or a deleted revision. By handling this signal, you can gracefully terminate your application and run cleanup tasks, as opposed to having the container shut down abruptly.
In this blog, we will explore some use cases for this and how you can try it out.
Graceful shutdowns
When a container instance is shut down on Cloud Run, a SIGTERM signal will be sent to the container and your application will have 10 seconds to exit. If the container does not exit by then, a SIGKILL signal (which you cannot capture) will be sent to abruptly close your application. If you choose not to write a signal handler for SIGTERM, your process is terminated instantly.
Using this termination signal, you can perform various “graceful shutdown” tasks in your application code:
- Flush monitoring data: If you use Cloud Trace or upload metrics from your application, you can write a signal handler that calls the function that flushes collected trace spans. Otherwise, any in-memory spans that have not been uploaded are lost when your container quits.
- Log termination of your container: By logging the termination event of the container, you can refer to your application logs to see when a specific container instance started and exited, and get full visibility into the lifecycle of individual container instances.
- Close file descriptors or database connections: Connections that are dropped abruptly can confuse the servers on the other end and cause them to keep the connections open for longer than they would after a graceful disconnect.
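The tasks above can be combined in a single handler. The following is a minimal sketch rather than an official sample: `onShutdown`, `runCleanup`, and the 8000 ms budget are hypothetical names and values chosen for illustration, and the actual flush or close calls depend on the libraries your application uses. The budget is deliberately shorter than the 10 seconds Cloud Run allows before sending SIGKILL.

```javascript
// Registry of cleanup tasks. Each task is a name plus an async function.
// (Hypothetical helper, not part of any Cloud Run or Node.js API.)
const cleanupTasks = [];

function onShutdown(name, fn) {
  cleanupTasks.push({ name, fn });
}

// Run every registered task in order, but give up once timeoutMs has passed
// so the process can still exit before SIGKILL arrives.
async function runCleanup(timeoutMs) {
  const results = [];
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  const work = (async () => {
    for (const { name, fn } of cleanupTasks) {
      try {
        await fn();
        results.push(`${name}: ok`);
      } catch (err) {
        // One failing task should not prevent the remaining ones from running.
        results.push(`${name}: failed (${err.message})`);
      }
    }
    return 'done';
  })();
  const outcome = await Promise.race([work, deadline]);
  clearTimeout(timer);
  return { outcome, results };
}

process.on('SIGTERM', async () => {
  console.log('received SIGTERM, running cleanup');
  const { outcome, results } = await runCleanup(8000); // assumed budget
  console.log(`cleanup ${outcome}: ${results.join(', ')}`);
  process.exit(0);
});
```

You would register tasks at startup, for example `onShutdown('db', () => pool.end())`, where `pool` stands for whatever connection object your application holds.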
Cloud Run primarily sends the termination signal when it scales down container instances that are not receiving traffic. Therefore, in most cases you don’t need to drain in-flight requests in your signal handler. However, you might sometimes receive this signal because the container is being shut down for underlying infrastructure reasons, and in that case your container might still have in-flight connections. A graceful termination is therefore not always guaranteed.
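For the less common case where requests are still in flight, one option is to stop accepting new connections and wait for the existing ones to finish before exiting. The sketch below assumes a standard Node.js `http.Server` (or anything with a compatible `close(callback)` method); `closeServer`, `installDrainHandler`, and the `graceMs` default are hypothetical names and values, not part of Cloud Run.

```javascript
// Wrap server.close() in a promise. close() stops the server from accepting
// new connections and invokes the callback once existing ones have ended.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}

// Install a one-time SIGTERM handler that drains the server and then exits.
// graceMs is an assumed budget, kept below Cloud Run's 10-second limit.
// exit is injectable so the handler can be exercised without ending the process.
function installDrainHandler(server, options = {}) {
  const graceMs = options.graceMs === undefined ? 8000 : options.graceMs;
  const exit = options.exit || ((code) => process.exit(code));

  process.once('SIGTERM', async () => {
    console.log('received SIGTERM, draining in-flight requests');
    // Safety net: exit anyway if draining takes longer than the budget.
    const timer = setTimeout(() => {
      console.error('drain timed out, exiting anyway');
      exit(1);
    }, graceMs);
    try {
      await closeServer(server);
    } catch (err) {
      console.error(`server close failed: ${err.message}`);
    }
    clearTimeout(timer);
    exit(0);
  });
}
```

A typical call would be `installDrainHandler(server)` right after `server.listen(...)`. Long-lived keep-alive connections can delay `close()`, which is why the timeout is there.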
Trapping signals, the right way
Most programming languages provide libraries to trap termination signals like SIGTERM and run routines before your program terminates.
If your application does not receive the termination signal on Cloud Run, the most likely reason is that it is not running as the init process (PID 1) and its parent process is not forwarding the signal to it.
The leading reason why this happens is that the ENTRYPOINT statement in your container image’s Dockerfile is not set directly to your application process. For example, the Dockerfile statement:
ENTRYPOINT node server.js
internally is translated to:
ENTRYPOINT ["/bin/sh", "-c", "node server.js"]
when your Dockerfile is executed to build a container image.
Most notably, /bin/sh and other shells such as bash do not forward signals to child processes by default. Therefore, you should write your entrypoint statements in the exec (array) form, like the following, to prevent your app from being executed as a sub-process of a shell:
ENTRYPOINT ["node", "server.js"]
Similarly, if you use an entrypoint script to kick off background processes in your containers, consider using a proper init process that can forward signals to child processes, such as tini, dumb-init or supervisord. (I have compared init process alternatives for the multi-process container use case here.)
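To make "forwarding signals" concrete, here is a rough Node.js illustration of the relaying step that such an init process performs. It is only a sketch for understanding the idea: `forwardSignals` and `runWithForwarding` are hypothetical names, and the code does not do the other jobs of a real init process (such as reaping orphaned processes), so it is not a substitute for tini or dumb-init.

```javascript
const { spawn } = require('child_process');

// Relay each listed signal from this wrapper process to the child process.
function forwardSignals(child, signals = ['SIGTERM', 'SIGINT']) {
  for (const signal of signals) {
    process.on(signal, () => child.kill(signal));
  }
}

// Start the real application as a child process and mirror its exit status.
function runWithForwarding(command, args) {
  const child = spawn(command, args, { stdio: 'inherit' });
  forwardSignals(child);
  child.on('exit', (code) => {
    // code is null when the child was terminated by a signal.
    process.exit(code === null ? 1 : code);
  });
  return child;
}

// Example (not executed here): runWithForwarding('node', ['server.js']);
```

With the relay in place, a SIGTERM delivered to the wrapper (PID 1) reaches the application, which can then run its own handler.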
See graceful shutdowns in action
To try trapping SIGTERM signals in Cloud Run, let’s use a small Node.js server app: the Node.js sample application for Cloud Run. You can download it from this repository on GitHub.
Add this snippet of code to index.js:
process.on('SIGTERM', function () {
  console.log('helloworld: received SIGTERM, exiting gracefully');
  process.exit(0);
});
After completing this step, you can now build and push this container image, and deploy it to Cloud Run. As part of your new deployment a container instance is spun up to handle the request.
After some time passes, your container will scale to zero since it is not getting any requests. (If you want to trigger a scale-to-zero event, you can also edit your Cloud Run application’s settings such as CPU or memory on the Google Cloud Console. This will deploy a new revision, and the old revision will be turned off.)
The scale-to-zero event causes a SIGTERM signal to be sent to your container before it is shut down, and you can see the output of the graceful shutdown routine in the Logs tab.
As you can see, Cloud Run does not require additional settings to enable graceful shutdowns. It’s turned on by default for all Cloud Run services.
Conclusion
If you have cleanup tasks or have monitoring data to push out before your serverless container instances on Cloud Run shut down, give termination signals a try. Check out our documentation for more information about this feature.