
AWS S3 Consistency Model

Published Jul 13, 2020

AWS S3 is a highly scalable and durable storage service provided by Amazon. For individuals and companies moving to AWS, S3 is an obvious choice for object storage. It is important to understand S3's consistency model, its limitations, and how it affects applications.

S3's data consistency model provides read-after-write consistency for PUTs of new objects, with one caveat: if you make a HEAD or GET request for an object before it's created and then create the object shortly afterwards, a subsequent GET might not return the object because of eventual consistency. For overwrite PUTs and DELETEs, S3 offers eventual consistency in all Regions.
Before delving into what this consistency model means, let's see how AWS's global infrastructure is structured.

[Image: AWS Infrastructure]

AWS infrastructure is organized into Regions, Availability Zones, and data centers. A Region consists of multiple Availability Zones, typically 2 to 6. An Availability Zone is a logical grouping of one or more data centers.
As of this writing, S3 Standard storage is designed for 99.99% availability and 99.999999999% durability (11 9's). S3 achieves this durability and availability by replicating objects across multiple devices spanning a minimum of 3 Availability Zones (except for One Zone-IA) in a given Region.
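To make the 11 9's figure concrete, here is a small back-of-the-envelope sketch. It assumes the durability number is an annual, per-object design target and that object losses are independent; the bucket size of 10,000,000 objects is a hypothetical value chosen for illustration.

```python
from fractions import Fraction

# Durability figure quoted above for S3 Standard: 99.999999999% (eleven 9's).
# Assumption: read as the yearly probability that a single object survives.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects lost per year, treating losses as independent."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of stored objects
    print(float(expected_annual_losses(n)))  # 0.0001
    print(int(years_per_lost_object(n)))     # 10000
```

Under these assumptions, storing ten million objects corresponds to losing one object roughly every 10,000 years on average. Exact fractions are used so that the tiny loss probability is not distorted by floating-point rounding.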
The trade-off for achieving high availability and durability together with high throughput is the consistency model. Now let's look at the S3 consistency model.

Read-after-write consistency

S3 provides read-after-write consistency for new objects. This means that a read issued after a successful write of a new object will return that object. To make this possible, a newly created object is replicated across multiple Availability Zones before S3 returns success.

PUT /myObjects/file-1.jpg — 200 
GET /myObjects/file-1.jpg — 200

There is a caveat here: if you make a HEAD or GET request for an object before it's created and then create the object shortly afterwards, a subsequent GET might not return the object.

GET /myObjects/file-1.jpg — 404
PUT /myObjects/file-1.jpg — 200
GET /myObjects/file-1.jpg — 404

Caching may be the reason for this behavior (we don't know the internal implementation of S3).
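As a purely illustrative sketch of how caching could produce the 404, PUT, 404 sequence above, the toy model below keeps a short-lived cache of "not found" answers that a PUT does not invalidate. This is not S3's actual implementation, which is not public; the class name ToyObjectStore, the negative_ttl parameter, and the simulated clock are all hypothetical.

```python
class ToyObjectStore:
    """Toy model only: a store fronted by a cache that remembers recent 404s."""

    def __init__(self, negative_ttl: float = 2.0):
        self._objects = {}         # key -> bytes, stands in for durable storage
        self._missing_until = {}   # key -> time until which a cached 404 is served
        self._negative_ttl = negative_ttl
        self._now = 0.0            # simulated clock, in seconds

    def advance(self, seconds: float) -> None:
        """Move the simulated clock forward."""
        self._now += seconds

    def get(self, key: str) -> int:
        # Serve a cached "not found" answer while it is still fresh.
        if self._missing_until.get(key, -1.0) > self._now:
            return 404
        if key in self._objects:
            return 200
        # Remember the miss for a short time (hypothetical negative caching).
        self._missing_until[key] = self._now + self._negative_ttl
        return 404

    def put(self, key: str, body: bytes) -> int:
        # The write succeeds but does not clear any cached 404 for the key.
        self._objects[key] = body
        return 200


if __name__ == "__main__":
    store = ToyObjectStore(negative_ttl=2.0)
    key = "myObjects/file-1.jpg"
    print(store.get(key))         # 404, and the miss is now cached
    print(store.put(key, b"x"))   # 200
    print(store.get(key))         # 404, served from the cached miss
    store.advance(3.0)            # cached miss expires
    print(store.get(key))         # 200
```

In this model, a key that was never requested before its PUT is readable immediately, which matches the plain read-after-write case.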

Eventual consistency

S3 offers eventual consistency for overwrite PUTs and DELETEs. This means that a GET or HEAD issued shortly after an overwrite PUT or a DELETE may or may not see the change.

PUT /myObjects/file-2.jpg — 200
DELETE /myObjects/file-2.jpg — 204
GET /myObjects/file-2.jpg — (200 or 404)
PUT /myObjects/file-3.jpg — 200
PUT /myObjects/file-3.jpg — 200 (updated content)
GET /myObjects/file-3.jpg — 200 (can be old content or updated content)

Update and delete operations return success before the change has propagated to all of the object's Availability Zones, which results in inconsistent behavior for operations executed shortly after an update or delete. Once the change has propagated to all Availability Zones, reads behave consistently.
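The propagation behavior described here can be sketched with a toy replicated store. It is a simplified model under stated assumptions, not S3's real design: the class ToyReplicatedStore, its three in-memory replicas, and the explicit propagate() step are hypothetical names introduced for illustration. New objects are written to every replica before the call returns, while overwrites and deletes are acknowledged after reaching only one replica.

```python
from collections import deque


class ToyReplicatedStore:
    """Toy model: synchronous create, asynchronous overwrite and delete."""

    _DELETED = object()  # marker for a queued delete

    def __init__(self, replica_count: int = 3):
        self._replicas = [{} for _ in range(replica_count)]
        self._pending = deque()  # queued (replica_index, key, value) changes

    def _apply(self, index: int, key: str, value) -> None:
        if value is self._DELETED:
            self._replicas[index].pop(key, None)
        else:
            self._replicas[index][key] = value

    def _write_async(self, key: str, value) -> None:
        # Acknowledge after updating one replica; queue the rest for later.
        self._apply(0, key, value)
        for index in range(1, len(self._replicas)):
            self._pending.append((index, key, value))

    def put(self, key: str, body: bytes) -> None:
        is_new = not any(key in replica for replica in self._replicas)
        if is_new:
            # New object: replicate everywhere before returning.
            for index in range(len(self._replicas)):
                self._apply(index, key, body)
        else:
            self._write_async(key, body)

    def delete(self, key: str) -> None:
        self._write_async(key, self._DELETED)

    def propagate(self) -> None:
        """Apply all queued changes, in order, to the remaining replicas."""
        while self._pending:
            self._apply(*self._pending.popleft())

    def get(self, key: str, replica: int):
        """Read from one specific replica; returns (status, body)."""
        store = self._replicas[replica]
        return (200, store[key]) if key in store else (404, None)


if __name__ == "__main__":
    s = ToyReplicatedStore()
    s.put("file-3.jpg", b"v1")
    print(s.get("file-3.jpg", replica=2))  # (200, b'v1')
    s.put("file-3.jpg", b"v2")
    print(s.get("file-3.jpg", replica=0))  # (200, b'v2')
    print(s.get("file-3.jpg", replica=2))  # (200, b'v1'), stale read
    s.propagate()
    print(s.get("file-3.jpg", replica=2))  # (200, b'v2')
```

Which replica serves a read is chosen explicitly here to keep the example deterministic; in a real system the caller would have no such control, which is why the result of a read shortly after an overwrite or delete is unpredictable.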

One thing to note here is that the consistency model above is independent of which client sends the requests. The GET can come from the same client that made the PUT or DELETE request, or from a different one.
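Since even the writing client may read stale data, an application that needs the updated version has to check for it explicitly. One possible approach, shown here only as a sketch, is to poll until the content matches a known checksum. The helper wait_for_content, its parameters, and the fetch callable are hypothetical; in a real application fetch would wrap an actual GET against S3.

```python
import hashlib
import time
from typing import Callable, Optional


def wait_for_content(
    fetch: Callable[[], Optional[bytes]],
    expected_sha256: str,
    attempts: int = 5,
    initial_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch() until the body hashes to expected_sha256.

    fetch returns the object body, or None if the object was not found.
    The delay doubles after every failed attempt (exponential backoff).
    Returns True on a match, False once the attempts are used up.
    """
    delay = initial_delay
    for attempt in range(attempts):
        body = fetch()
        if body is not None and hashlib.sha256(body).hexdigest() == expected_sha256:
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False


if __name__ == "__main__":
    # Simulated reads: two stale responses, then the updated content.
    responses = iter([b"old", b"old", b"new"])
    wanted = hashlib.sha256(b"new").hexdigest()
    print(wait_for_content(lambda: next(responses), wanted, sleep=lambda _: None))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting. Polling only reduces the chance of acting on stale data within the retry window; it does not turn an eventually consistent read into a guaranteed one.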

Please refer to https://en.wikipedia.org/wiki/Consistency_model for various consistency models of distributed systems.
