Is cross-region replication 100% fool-proof for S3 region outages?

19

2

Amazon S3 has a cross-region replication option, which should be pretty fault-tolerant against region/zone outages.

Does that mean those who are ranting about the recent us-east-1 outage did not make use of this feature?

Or is it that cross-region replication is not completely fool-proof and would not have helped?

Dawny33

Posted 2017-03-01T04:02:59.617

Reputation: 2 554

@Evgeny Thank you. I was reading that same post before asking this :)

Dawny33 2017-03-01T04:09:23.820

Answers

11

The drawback of relying on replication comes from the note below:

Amazon S3 routes any virtual hosted–style requests to the US East (N. Virginia) region by default if you use the US East (N. Virginia) endpoint (s3.amazonaws.com), instead of the region-specific endpoint (for example, s3-eu-west-1.amazonaws.com).

When you use replication, you usually let AWS take care of routing the alias to one region by targeting s3.amazonaws.com in the REST requests from your servers, and let the redirect do its job.

Whenever N. Virginia is down, the magic ceases to work: you're out of luck accessing your data and have to update your configuration to target a specific regional endpoint.

The problem does not come from DNS (a request to the bucket itself will work) but from S3 clients, which connect to the S3 API endpoint before accessing the bucket. In that case, DNS resolution is done on s3.amazonaws.com, which is the us-east-1 endpoint.
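A client-side alternative is to skip s3.amazonaws.com and address each replica through its regional endpoint, trying them in order. This is a minimal sketch under several assumptions: the dash-style hostname mirrors the s3-eu-west-1.amazonaws.com example quoted in this thread (newer AWS documentation uses the bucket.s3.<region>.amazonaws.com form), each replica bucket has its own name because bucket names are globally unique, and `fetch` is a hypothetical transport callable that raises `OSError` on failure. Request signing for private objects is left out.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# The global endpoint; per the quoted note it routes to US East (N. Virginia).
GLOBAL_ENDPOINT = "s3.amazonaws.com"


def virtual_hosted_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    """Build a virtual-hosted-style URL; region=None uses the global endpoint."""
    # Assumption: dash-style regional hostname, as in s3-eu-west-1.amazonaws.com.
    host = GLOBAL_ENDPOINT if region is None else f"s3-{region}.amazonaws.com"
    return f"https://{bucket}.{host}/{key}"


def fetch_with_failover(
    replicas: Iterable[Tuple[str, str]],  # (region, bucket) pairs, primary first
    key: str,
    fetch: Callable[[str], bytes],  # hypothetical transport, raises OSError on failure
) -> Tuple[str, bytes]:
    """Try each replica's regional endpoint in order; return (region, body)."""
    errors: List[str] = []
    for region, bucket in replicas:
        url = virtual_hosted_url(bucket, key, region)
        try:
            return region, fetch(url)
        except OSError as exc:  # region unreachable or request refused: try the next one
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))
```

For public objects, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=5).read()`, since `urllib.error.URLError` is a subclass of `OSError`. This only covers reads; it does nothing for writes, which still go to the source bucket.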

When you use region-specific aliases instead, you lose the convenience of load balancing across regions with AWS's health checks included.

If you use a DNS CNAME targeting the regional endpoints to switch quickly, you are responsible for your DNS TTL, but nothing guarantees that the caching servers of your clients' ISPs (one of many caches a client may encounter) will honor that value.

Lastly, if you try to load balance yourself, you'll probably recreate the same single point of failure (SPOF) that AWS already has, with the added burden of maintaining it.

AWS is working on it, but that's all the information I have at the time of writing.

Tensibai

Posted 2017-03-01T04:02:59.617

Reputation: 9 733

According to http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html, it is possible to use 'bucketname.s3-eu-west-1.amazonaws.com' (substitute your favourite region) as a DNS alias. If that works, it can be a way to switch quickly (as quickly as your pre-set TTL allows).

Michael Bravo 2017-03-01T11:07:58.487

@MichaelBravo I've extended the answer to address your concern :)

Tensibai 2017-03-01T12:31:37.253

"Even if in theory you could use CNAME to region endpoints, the authoritative answer rely on the service in N.Virginia as far as I know and I've read about it" You're taking the quote about routing to the US East region by default out of context. Before the example-bucket bucket exists, example-bucket.s3.amazonaws.com already points to US East in DNS. Within a few minutes of initial bucket creation, this permanently changes to point to the correct regional endpoint. The warning here is that this hostname may initially be briefly misrouted immediately after bucket creation, not later.

Michael - sqlbot 2017-03-02T12:21:10.067

...so "Whenever N.Virginia is down, the magic cease to work and you're out of luck to access your data in any region by the DNS alias method" is therefore incorrect. Buckets in other regions were not impacted by the us-east-1 outage, including those referenced using this style of hostname.

Michael - sqlbot 2017-03-02T12:25:40.433

No, you don't. They change the DNS entry for your bucket in the s3.amazonaws.com zone within a few minutes of bucket creation, and this change persists independently of us-east-1. Create a bucket in another region and watch how your-bucket-name.s3.amazonaws.com resolves before, during, and a few minutes after bucket creation. The information is pushed to the s3-1.amazonaws.com zone in Route 53 after bucket creation and persists there, without further reliance on us-east-1.

Michael - sqlbot 2017-03-02T12:29:07.630

@Michael-sqlbot I see where the confusion comes from; indeed, the quote may not be the best support for my position. I'll try to find another one and edit.

Tensibai 2017-03-02T12:42:35.370

@Michael-sqlbot I've reworded the answer and cleaned up the comments. Indeed, focusing on the DNS part was misleading when the API endpoint is the root cause.

Tensibai 2017-03-02T14:34:52.647

10

Many big companies would be at fault for not using this feature. It does add cost, and historically, disaster recovery solutions often go untested even when they are implemented.

Beyond cost, companies actively using cross-region replication can raise a valid concern about replication latency, that is, how long it takes for an object to be replicated. As far as I know, S3 does not offer read-after-write consistency for replicated objects, while it does for a bucket within a single region.

This SE question raises a concern where objects are not replicated properly or take too long to replicate. Given that cross-region replication is eventually consistent, there are a lot of concerns to address.
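One way to live with that lag is to not assume the replica exists until S3 says so. A minimal sketch, assuming you can read the source object's replication status (S3 exposes it as the `x-amz-replication-status` header); `get_status` is a hypothetical stand-in for that lookup. It polls with exponential backoff until the copy is reported done, reported failed, or a deadline passes. Treating both "COMPLETE" and "COMPLETED" as done is a precaution, not a documented guarantee.

```python
import time
from typing import Callable

# Assumption: treat both spellings seen in S3 documentation as "done".
DONE_STATUSES = {"COMPLETE", "COMPLETED"}


def wait_for_replication(
    get_status: Callable[[], str],  # hypothetical: returns the source object's replication status
    timeout_s: float = 900.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll with exponential backoff until replication completes, fails, or times out."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in DONE_STATUSES:
            return status
        if status == "FAILED":
            raise RuntimeError("S3 reported replication as FAILED")
        # Give up instead of sleeping past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # double the wait, capped at max_delay_s
```

With boto3, `get_status` might be `lambda: s3.head_object(Bucket=src_bucket, Key=key).get("ReplicationStatus", "")`, where `s3` is a boto3 S3 client and `src_bucket`/`key` are placeholders. If the header is absent (the object is not covered by a replication rule), this sketch simply times out rather than returning.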

Evgeny

Posted 2017-03-01T04:02:59.617

Reputation: 7 247

I would put further emphasis on the fact that S3 cross-region replication offers eventual consistency for some operations. That is not trivial to take into account. Depending on the application, it might be downright unacceptable. In any case, it is not fool-proof (it can lead to bigger issues if someone assumes it's magic).

Alexandre 2017-03-01T04:19:13.020