AWS – Amazon SageMaker Inference now supports rolling update for inference component endpoints
Amazon SageMaker Inference now supports rolling updates for inference component (IC) endpoints. Customers can update running IC endpoints without interrupting traffic while provisioning only a small number of extra instances, rather than doubling the instance count as was previously required. SageMaker Inference makes it easy to deploy ML models, including foundation models (FMs). As a capability of SageMaker Inference, IC enables customers to deploy multiple FMs on the same endpoint and control accelerator allocation for each model.
Rolling updates now enable customers to update ICs within an endpoint batch by batch, instead of all at once as with the previous blue/green update method. Blue/green updates required provisioning a new fleet of ICs with the updated model before shifting traffic from the old fleet to the new one, effectively doubling the number of required instances. With rolling updates, new ICs are created in smaller batches, significantly reducing the number of additional instances needed during updates. This helps customers minimize costs from extra capacity and maintain smaller buffer requirements in their capacity reservations.
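The capacity difference between the two strategies can be illustrated with a rough back-of-the-envelope sketch. It is a simplified model, not SageMaker's actual scheduling logic: it assumes each batch of new IC copies is fully created before the matching old copies are retired, and it counts IC copies rather than instances (how copies pack onto instances depends on accelerator allocation). The function names are hypothetical and chosen only for this illustration.

```python
import math


def blue_green_peak_extra_copies(copies: int) -> int:
    """Extra IC copies needed at peak when a full new fleet is built first."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # The whole updated fleet exists alongside the old one before traffic shifts.
    return copies


def rolling_peak_extra_copies(copies: int, batch_size: int) -> int:
    """Extra IC copies needed at peak when updating one batch at a time.

    Assumption: a batch of new copies is created, then the same number of
    old copies is removed, before the next batch starts.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # At most one batch of new copies coexists with the full old fleet.
    return min(batch_size, copies)


def rolling_batch_count(copies: int, batch_size: int) -> int:
    """Number of batches needed to replace every copy."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(copies / batch_size)


if __name__ == "__main__":
    copies, batch_size = 8, 2
    print("blue/green extra copies:", blue_green_peak_extra_copies(copies))
    print("rolling extra copies:", rolling_peak_extra_copies(copies, batch_size))
    print("rolling batches:", rolling_batch_count(copies, batch_size))
```

Under these assumptions, a smaller batch size lowers the peak extra capacity at the cost of more batches, so the update takes longer to complete.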
Rolling update for IC is available in all regions where IC is supported: Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney, Jakarta), Canada (Central), Europe (Frankfurt, Stockholm, Ireland, London), Middle East (UAE), South America (Sao Paulo), US East (N. Virginia, Ohio), and US West (N. California, Oregon). To learn more, see the documentation.