Graceful Shutdown
Original: https://blog.eightnoteight.dev/p/graceful-shutdown
If you are working on a Go service, it is essential that it shuts down properly: without causing errors for clients and without losing any information. Without a graceful shutdown, clients get errors whenever a container shuts down on the server side, and in the worst case data is lost as well. Let's work through an example to understand this better.
Simple HTTP Server
For example, imagine a simple HTTP service that logs some payloads to Kafka and saves some in MySQL. Running as a container in prod, this service is essentially a system whose inputs are the incoming requests and whose outputs are the responses to those requests, the messages produced to a Kafka topic, and the rows inserted into MySQL over a MySQL connection.
// pseudo code
func (svc Service) HandleRequest(ctx context.Context, request http.Request) (*http.Response, error) {
    // first output: audit row in mysql
    err := svc.mysqlClient.Query(
        "INSERT INTO request_audit_log(request_payload, received_timestamp) "+
            "VALUES (?, ?)",
        request.Payload,
        time.Now(),
    )
    if err != nil {
        return nil, err
    }
    // second output: message on the kafka topic
    // (err is reused here; a second `err :=` would not compile)
    err = svc.kafkaClient.Produce(encode(Message{
        Payload:           request.Payload,
        ReceivedTimestamp: time.Now(),
    }))
    if err != nil {
        return nil, err
    }
    // final output: the response
    return &http.Response{
        Payload: "success",
    }, nil
}
As soon as this system receives a request, it can't instantly produce all the outputs, so there is a delay between receiving a request and generating all of its outputs. From the above handler we can infer the order of the outputs: first the MySQL insert, then the Kafka message, and finally the response.
So to gracefully shut down the application, we can stop accepting incoming requests and then wait until a response has been sent for every request that is already in flight. This is exactly what http.Server does when you call srv.Shutdown(ctx).
https://pkg.go.dev/net/http#Server.Shutdown
func (srv *Server) Shutdown(ctx context.Context) error
Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down. If the provided context expires before the shutdown is complete, Shutdown returns the context's error, otherwise it returns any error returned from closing the Server's underlying Listener(s).
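To make the quoted behaviour concrete, here is a minimal sketch that calls Shutdown while one request is still being handled and checks that the client still receives its response. The helper name inFlightSurvivesShutdown, the 200ms sleep standing in for MySQL/Kafka work, and the 5 second shutdown timeout are assumptions made for illustration; in a real service the Shutdown call would be triggered by a signal rather than by the test code itself.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// inFlightSurvivesShutdown starts a server with a deliberately slow handler,
// calls Shutdown while one request is still being handled, and returns the
// body the client received for that request.
func inFlightSurvivesShutdown() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick a free port
	if err != nil {
		return "", err
	}
	started := make(chan struct{})
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // the request is now in flight
		time.Sleep(200 * time.Millisecond) // hypothetical slow MySQL/Kafka work
		io.WriteString(w, "success")
	})}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	type result struct {
		body string
		err  error
	}
	done := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/")
		if err != nil {
			done <- result{"", err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		done <- result{string(b), err}
	}()

	select {
	case <-started:
	case r := <-done: // the request ended before it reached the handler
		return r.body, r.err
	}

	// Shutdown closes the listener and then waits for the active connection
	// to become idle, bounded by the context timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	r := <-done
	return r.body, r.err
}

func main() {
	body, err := inFlightSurvivesShutdown()
	fmt.Println(body, err)
}
```

If srv.Close() were used in place of Shutdown, the active connection would be closed immediately and the client would typically see an error instead of the response body.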
Outputs of the System outside the Request/Response Lifecycle
So it is simple: we hook up the Shutdown function to the SIGTERM and SIGINT signals and we are set, right? Well, no. Let's tweak the above HTTP server a little by changing the Kafka producer from a sync producer to an async one. The async producer buffers the produced records in memory and flushes them to Kafka at regular intervals, so the Kafka output of the system can come after the response output. The same principle applies to other kinds of in-memory buffers, worker pools, stray goroutines, MySQL connections, Redis connections, etc.
In this case, you would start noticing dropped messages during deploys, container rotation, and normal scale-down.
So graceful shutdown now needs to cover the Kafka producer as well, along with every other component that holds buffers or other resources (connections, goroutines, etc.) beyond the lifecycle of a request/response.
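The effect of such a buffer can be sketched with a toy producer. The asyncProducer type below is a hypothetical in-memory stand-in, not a real Kafka client API: Produce only places the record in a buffer, a background goroutine "delivers" it later, and Close is the only call that guarantees the buffer has been drained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("producer closed")

// asyncProducer is a hypothetical stand-in for an async Kafka producer:
// Produce only buffers the record and a background goroutine delivers it later.
type asyncProducer struct {
	mu        sync.Mutex
	closed    bool
	buf       chan string
	done      chan struct{}
	delivered []string // stands in for records that reached the Kafka topic
}

func newAsyncProducer(size int) *asyncProducer {
	p := &asyncProducer{buf: make(chan string, size), done: make(chan struct{})}
	go func() {
		defer close(p.done)
		for rec := range p.buf { // runs until buf is closed and fully drained
			p.delivered = append(p.delivered, rec)
		}
	}()
	return p
}

// Produce returns as soon as the record is buffered, possibly before delivery.
func (p *asyncProducer) Produce(rec string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return errClosed
	}
	p.buf <- rec
	return nil
}

// Close stops accepting records and blocks until every buffered record has
// been delivered. Only read p.delivered after Close has returned.
func (p *asyncProducer) Close() {
	p.mu.Lock()
	if !p.closed {
		p.closed = true
		close(p.buf)
	}
	p.mu.Unlock()
	<-p.done
}

func main() {
	p := newAsyncProducer(8)
	for i := 0; i < 100; i++ {
		_ = p.Produce(fmt.Sprintf("record-%d", i))
	}
	// Without this Close, the process could exit with records still buffered.
	p.Close()
	fmt.Println("delivered:", len(p.delivered))
}
```

In this sketch a handler could already have returned "success" for a record that is still sitting in buf, which is why waiting for responses alone is not enough.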
Order of Packages Graceful Shutdown
In the above example, if we gracefully stop the Kafka producer first, then until the HTTP server is stopped, requests that write data to Kafka will get errors. The same happens if we try to stop the async Kafka producer and the HTTP server in parallel. So the order in which the individual packages of the system are shut down is essential.
Here the order comes from the fact that the Kafka producer is passed to the HTTP server handler, so until the server is shut down it is not safe to close the Kafka producer, as the handler could use the producer object at any time.
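The two orders can be compared in a small simulation. Everything here is a hypothetical stand-in: producer mimics a client that rejects records once closed, and the WaitGroup plays the role of the HTTP server tracking in-flight requests. The release channel only exists to make the interleaving deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var errProducerClosed = errors.New("producer is closed")

// producer is a hypothetical stand-in for the Kafka producer passed to the handler.
type producer struct {
	mu     sync.Mutex
	closed bool
}

func (p *producer) Produce(payload string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return errProducerClosed
	}
	return nil // a real client would buffer or send the payload here
}

func (p *producer) Close() {
	p.mu.Lock()
	p.closed = true
	p.mu.Unlock()
}

// simulate starts n in-flight handlers, shuts down in the chosen order, and
// returns how many handlers failed to produce.
func simulate(n int, closeProducerFirst bool) int {
	p := &producer{}
	release := make(chan struct{}) // closed when the handlers may reach Produce
	var inFlight sync.WaitGroup    // stands in for the server tracking open requests
	var failed atomic.Int64

	for i := 0; i < n; i++ {
		inFlight.Add(1)
		go func() {
			defer inFlight.Done()
			<-release
			if err := p.Produce("payload"); err != nil {
				failed.Add(1)
			}
		}()
	}

	if closeProducerFirst {
		p.Close() // wrong order: handlers still hold the producer
		close(release)
		inFlight.Wait()
	} else {
		close(release)
		inFlight.Wait() // right order: drain the handlers first
		p.Close()
	}
	return int(failed.Load())
}

func main() {
	fmt.Println("producer closed first, failed handlers:", simulate(5, true))
	fmt.Println("server drained first, failed handlers:", simulate(5, false))
}
```

Closing the producer first makes every in-flight handler fail, while draining the handlers first lets all of them succeed.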
For more complex systems, there could be multiple levels of buffers and ongoing operations like the above HTTP server example. So we must strictly follow a proper order when shutting down the packages.
Dependency Injection FTW
After some thinking, it is straightforward to reason that we need to gracefully shut down all the dependents of a package before shutting down the package itself. We already know the dependents at program startup, since we wired them together manually using dependency injection. The shutdown order needs to replay the startup dependency injection in reverse; this way, whenever we gracefully shut down a component, we can be sure that no one else still holds that component object and is using it.
// pseudo code
func main() {
    mysqlClient := NewMySQLClient()
    kafkaProducerClient := NewKafkaProducerClient()
    svc := NewService(
        mysqlClient,
        kafkaProducerClient,
    )
    httpServer := http.NewServer()
    httpServer.Register("/", svc.HandleRequest)
    httpServer.ListenAndServeNonBlocking()

    // unblock once there is a graceful shutdown signal
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    <-sigCh

    // creation order is
    // 1. mysql client
    // 2. kafka producer client
    // 3. http request handler service
    // 4. http server
    // so now the shutdown order needs to be exactly reverse
    // 1. http server
    // 2. http request handler service
    // 3. kafka producer client
    // 4. mysql client

    // http server stops accepting new requests and waits for the in-flight ones
    httpServer.Shutdown(context.Background())
    // in case the request handler maintains any buffers or state, this will trigger the necessary cleanup
    svc.Close()
    // kafka producer client will flush all the data it buffered to kafka and close its connections
    kafkaProducerClient.Close()
    // mysql client will gracefully close all the connections
    mysqlClient.Close()
}
Not following dependency injection, using global packages, or having cyclic dependencies in the code significantly complicates the graceful shutdown logic and makes it harder to shut down gracefully in all scenarios.
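One possible way to keep the shutdown order in sync with the wiring is to register each component's cleanup at the point where it is created and then run the registrations in reverse. The closerStack type and its Push and Shutdown methods below are hypothetical helpers written for this sketch, not part of any library, and errors.Join assumes Go 1.20 or newer.

```go
package main

import (
	"errors"
	"fmt"
)

// component pairs a name with the function that shuts it down.
type component struct {
	name    string
	closeFn func() error
}

// closerStack is a hypothetical helper that records components in creation
// order and shuts them down in the reverse order.
type closerStack struct {
	components []component
}

// Push registers a component; call it right after the component is created.
func (s *closerStack) Push(name string, closeFn func() error) {
	s.components = append(s.components, component{name: name, closeFn: closeFn})
}

// Shutdown closes the components last-created-first. It keeps going when one
// of them fails and returns the order it used plus all errors joined together.
func (s *closerStack) Shutdown() ([]string, error) {
	var order []string
	var errs []error
	for i := len(s.components) - 1; i >= 0; i-- {
		c := s.components[i]
		order = append(order, c.name)
		if err := c.closeFn(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.name, err))
		}
	}
	return order, errors.Join(errs...)
}

func main() {
	var stack closerStack
	noop := func() error { return nil } // placeholder for a real Close/Shutdown call

	// Register in creation order, mirroring the dependency injection wiring.
	stack.Push("mysql client", noop)
	stack.Push("kafka producer client", noop)
	stack.Push("http request handler service", noop)
	stack.Push("http server", noop)

	order, err := stack.Shutdown()
	fmt.Println(order, err)
}
```

Because a component can only be pushed after its dependencies exist, the reverse of the push order never closes a dependency before its dependents. A plain sequence of defer statements in main gives the same last-in-first-out behaviour for small programs.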
Conclusion
While graceful shutdown is widely documented, it is also very important to pay attention to the shutdown order of the components. Following dependency injection indirectly provides a way to figure out the right shutdown order.