A Study on using Google Cloud Storage with the S3 Compatibility API

Vamsi Ramakrishnan
6 min read · Sep 24, 2021


Google Cloud Storage’s XML API provides interoperability with some of the client libraries that use S3. If you have existing applications that read and write data using the S3 API, SDKs, or client libraries, you can point them at Cloud Storage with configuration changes and minimal code changes. While compatibility offers ease, it is important to be aware of where things break. This post outlines some of those scenarios.

Sections

The post has 4 major sections

1. Where things break ( TL;DR )
2. Server Side Config
3. Client Side Config
4. Code Samples

Where things break

ACLs

There are some minor differences between the way AWS’s predefined (canned) ACLs work and the way GCP’s canned ACLs work. Before that, a small refresher on ACL concepts.

ACLs have 2 Properties
1. Grantees ( Who gets access )
2. Scope ( How much access do they get )
ACLs are of 2 types
1. Canned ACLs ( Predefined Scopes & Grantees )
2. Custom ACLs ( Custom Scope & Grantees )
ACLs can be Applied at 2 Levels
1. Bucket
2. Object

Differences in Canned ACLs

The rows that have no counterpart on the other side (marked with -) are the ones that break compatibility when upstream programs rely on those permissions.

| AWS Canned ACL            | GCP Canned ACL            | Applies |
|---------------------------|---------------------------|---------|
| private | private | Both |
| public-read | public-read | Both |
| public-read-write | public-read-write | Both |
| aws-exec-read | - | Both |
| authenticated-read | authenticated-read | Both |
| bucket-owner-read | bucket-owner-read | Object |
| bucket-owner-full-control | bucket-owner-full-control | Object |
| log-delivery-write | - | Bucket |
| - | project-private | Both |
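
The table can be turned into a small pre-flight check that flags a canned ACL with no GCS counterpart before the request is sent. This is a minimal sketch built only from the rows above; `GCS_CANNED_ACLS`, `OBJECT_ONLY` and `check_canned_acl` are hypothetical helper names, not part of boto3 or any Google library.

```python
# Canned ACLs that have a GCP column entry in the table above.
GCS_CANNED_ACLS = {
    "private", "public-read", "public-read-write", "authenticated-read",
    "bucket-owner-read", "bucket-owner-full-control", "project-private",
}
# ACLs listed as applying to objects only in the "Applies" column.
OBJECT_ONLY = {"bucket-owner-read", "bucket-owner-full-control"}


def check_canned_acl(acl: str, level: str) -> bool:
    """Return True if `acl` has a GCS equivalent at `level` ('bucket' or 'object')."""
    if level not in ("bucket", "object"):
        raise ValueError("level must be 'bucket' or 'object'")
    if acl not in GCS_CANNED_ACLS:
        return False  # e.g. aws-exec-read, log-delivery-write
    if acl in OBJECT_ONLY and level == "bucket":
        return False
    return True
```

A caller could run this check before passing an `ACL=` argument to an S3 client and fall back to `private` or fail early when it returns False.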

CORS

2 Types of CORS 
1. Simple
2. Preflighted

While GCS and AWS S3 have the same fields in CORS, the way the CORS configuration is specified differs between the two, so the S3 client libraries cannot be reused in this case.

| AWS CORS       | GCP CORS        |
|----------------|-----------------|
| AllowedHeaders | ResponseHeaders |
| AllowedMethods | Methods |
| AllowedOrigins | Origins |
| MaxAgeSeconds | MaxAgeSec |

So when setting up a CORS configuration on the bucket through an S3 client library, an error of the following kind pops up:

ClientError: An error occurred (MalformedLifecycleConfiguration) when calling the PutBucketLifecycleConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema.
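
One way around the mismatch is to build the GCS-flavoured XML yourself and send it with a plain HTTP client instead of the S3 library. The sketch below renders one S3-style CORS rule as a GCS `CorsConfig` document using the field mapping from the table. The function name `s3_cors_rule_to_gcs_xml` is hypothetical, and the nested element names (`Cors`, `Origin`, `Method`, `ResponseHeader`) follow the GCS XML API as I understand it, so check them against the current documentation before relying on this.

```python
import xml.etree.ElementTree as ET


def s3_cors_rule_to_gcs_xml(rule: dict) -> str:
    """Render one S3-style CORS rule (boto3 dict shape) as GCS CorsConfig XML."""
    root = ET.Element("CorsConfig")
    cors = ET.SubElement(root, "Cors")
    # (S3 field, GCS container element, GCS item element), per the table above.
    mapping = [
        ("AllowedOrigins", "Origins", "Origin"),
        ("AllowedMethods", "Methods", "Method"),
        ("AllowedHeaders", "ResponseHeaders", "ResponseHeader"),
    ]
    for s3_key, container, item in mapping:
        values = rule.get(s3_key, [])
        if values:
            parent = ET.SubElement(cors, container)
            for value in values:
                ET.SubElement(parent, item).text = value
    if "MaxAgeSeconds" in rule:
        ET.SubElement(cors, "MaxAgeSec").text = str(rule["MaxAgeSeconds"])
    return ET.tostring(root, encoding="unicode")
```

The resulting string would be the body of a PUT request to the bucket's `cors` subresource, signed with the same HMAC credentials.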

Object Lifecycle Policy

While setting Object Lifecycle Policies is supported by the XML API, the request structure is different in the case of GCS, and you will receive this common error:

ClientError: An error occurred (MalformedLifecycleConfiguration) when calling the PutBucketLifecycleConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema.
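
To illustrate the structural difference, the sketch below converts the simplest kind of S3 lifecycle rule (expire after N days) into the Action/Condition shape that the GCS XML API uses. It is an illustration under assumptions, not a complete converter: `s3_expiration_to_gcs_xml` is a hypothetical name, only enabled rules with `Expiration.Days` are handled, S3 prefix filters are ignored, and the GCS element names (`Rule`, `Action`, `Delete`, `Condition`, `Age`) should be verified against the current GCS documentation.

```python
import xml.etree.ElementTree as ET


def s3_expiration_to_gcs_xml(rules: list) -> str:
    """Convert S3 'expire after N days' rules into a GCS lifecycle XML document."""
    root = ET.Element("LifecycleConfiguration")
    for rule in rules:
        # Skip anything this sketch does not understand.
        if rule.get("Status") != "Enabled":
            continue
        days = rule.get("Expiration", {}).get("Days")
        if days is None:
            continue
        gcs_rule = ET.SubElement(root, "Rule")
        action = ET.SubElement(gcs_rule, "Action")
        ET.SubElement(action, "Delete")          # S3 Expiration -> GCS Delete action
        condition = ET.SubElement(gcs_rule, "Condition")
        ET.SubElement(condition, "Age").text = str(days)  # age in days
    return ET.tostring(root, encoding="unicode")
```

An S3 rule carries `ID`, `Filter` and `Status` next to `Expiration`, whereas the GCS document is just a list of action and condition pairs, which is why the S3 payload fails schema validation.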

Object Integrity Check

If your application uses object integrity checks in its upload logic, then you may want to read this. We encounter 2 broad scenarios in object integrity checks:

2 Scenarios
1. Single Part Upload
2. Multi-Part Upload

There are different types of File Integrity Checks

Types of File Integrity Checks
1. CRC32C
2. MD5
3. ETags

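How do these components relate? While uploading, the client side needs to validate the object against the checksum the service reports, and the way that checksum is computed differs between the two services.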

AWS Multi-Part Upload

On multipart uploads, the ETag is computed by taking the binary encoding of each part’s MD5 hash, concatenating them together, computing an MD5 of that, hex-encoding the result, then appending a - followed by the number of parts.
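
The steps in that description can be sketched with the standard library. The function name `s3_multipart_etag` is hypothetical, and the part size passed in has to match the part size the uploader actually used, otherwise the result will differ from what S3 reports.

```python
import hashlib


def s3_multipart_etag(data: bytes, part_size: int) -> str:
    """Compute an AWS-style multipart ETag for `data` split into `part_size` chunks."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # Concatenate the binary (not hex) MD5 digests of every part ...
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    # ... then hash that, hex-encode, and append "-<number of parts>".
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

Code that compares this value against the ETag of an object uploaded to GCS should not assume the two match; that is exactly the kind of check that breaks when moving from S3.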

GCP Multi-Part Upload

GCP’s composite CRC32C is computed by taking the individual parts’ CRC32Cs and combining them into a single checksum for the whole object.
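
On the client side, a running CRC32C over the parts in upload order yields the checksum of the whole object, which can then be base64-encoded for comparison with the value GCS reports. Below is a dependency-free sketch; the names `crc32c` and `gcs_crc32c_b64` are hypothetical, the table-driven loop is slow compared with a native library, and the big-endian byte order before base64 encoding is my understanding of the GCS convention.

```python
import base64


def _make_table():
    poly = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(chunks) -> int:
    """CRC32C of the concatenation of `chunks` (an iterable of bytes objects)."""
    crc = 0xFFFFFFFF
    for chunk in chunks:
        for byte in chunk:
            crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c_b64(chunks) -> str:
    """Base64 of the 4-byte big-endian CRC32C, the form used for comparison."""
    return base64.b64encode(crc32c(chunks).to_bytes(4, "big")).decode("ascii")
```

Feeding the parts one after another gives the same result as hashing the file in one go, so the check does not depend on how the upload was split.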

A Table comparing the differences

|     | Single Part | Multi-Part                            |
|-----|-------------|---------------------------------------|
| GCP | MD5, CRC32C | b64 encoded CRC32C (CRC32Cs of Parts) |
| AWS | MD5 | Hex encoded MD5 ( MD5 of Parts ) |

Server Side Configuration

Step 1: Create a Custom Role (Not Mandatory), skip this step if a pre-defined role can be assigned to the Principal.

Step 2: Create a Service Account/Accounts

Step 3: Add Storage Admin Role to Service Account

Or Assign the Custom Role that was created

Step 4: Go to Cloud Storage, Copy Storage Endpoint

Step 5: Create HMAC Keys

Once you have the HMAC keys and the interop endpoint set up for that project, you are all set to use S3 interoperability.

Client Side Configuration

Call out to this Stack Overflow Post

As a Resource

s3_resource = boto3.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)

As a Session

from boto3.session import Session
session = Session()
s3_session = session.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)

As a Client

s3_client = boto3.client('s3', endpoint_url= GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)

Or alternatively change the boto.cfg

cat /etc/boto.cfg

Add the right values

[Credentials]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
s3_host = storage.googleapis.com
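
Since the resource, session and client variants all take the same keyword arguments, those can be assembled in one place and kept out of the source code. The sketch below reads them from environment variables. The variable names (`GCS_HMAC_ACCESS_KEY`, `GCS_HMAC_SECRET`, `GCS_ENDPOINT`, `GCS_REGION`) and the helper `gcs_s3_kwargs` are hypothetical, and the `auto` region default is an assumption that should be replaced with your bucket location if your setup requires one.

```python
import os

# Default interop endpoint, matching the s3_host value shown above.
GCS_INTEROP_ENDPOINT = "https://storage.googleapis.com"


def gcs_s3_kwargs(env=os.environ) -> dict:
    """Build the keyword arguments shared by boto3.client / boto3.resource."""
    return {
        "service_name": "s3",
        "endpoint_url": env.get("GCS_ENDPOINT", GCS_INTEROP_ENDPOINT),
        "aws_access_key_id": env["GCS_HMAC_ACCESS_KEY"],    # HMAC access ID
        "aws_secret_access_key": env["GCS_HMAC_SECRET"],    # HMAC secret
        "region_name": env.get("GCS_REGION", "auto"),       # assumed default
    }
```

With boto3 installed, usage would look like `s3_client = boto3.client(**gcs_s3_kwargs())` or `s3_resource = boto3.resource(**gcs_s3_kwargs())`.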

Code Samples

Skipping the simple List Bucket, List Objects based examples as they are repetitive.

1. Create Bucket
2. Multipart Upload
3. Signed URLs
4. Object Versioning Enable

Create Bucket

Please note that, by default, a bucket created against a GCP region that is part of a multi-region is placed in that multi-region. Specifying a LocationConstraint in the request, as in the call here, sets the bucket location explicitly.

import boto3
s3 = boto3.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)
s3.create_bucket(Bucket=BUCKET_NAME, CreateBucketConfiguration={'LocationConstraint': GCP_REGION_NAME})

Multi-part Upload

import os
import boto3
from boto3.s3.transfer import TransferConfig
s3 = boto3.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)
# Threshold and chunk size are given in bytes
config = TransferConfig(multipart_threshold=1024 * 25,
                        max_concurrency=10,
                        multipart_chunksize=1024 * 25,
                        use_threads=True)
# FILE_NAME is a path relative to the directory of this script
file_path = os.path.join(os.path.dirname(__file__), FILE_NAME)
s3.Object(BUCKET_NAME, OBJECT_NAME).upload_file(file_path,
                        ExtraArgs={'ContentType': 'xxx/yyy'},
                        Config=config)

Signed URLs

A signed URL grants limited permission, for a limited time, to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource.

import boto3
s3 = boto3.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)
response = s3.meta.client.generate_presigned_url('get_object', Params={'Bucket': BUCKET_NAME, 'Key': OBJECT_NAME}, ExpiresIn=EXPIRATION)
# generate_presigned_url returns the URL as a plain string
print(response)

Object Versioning

Enable object versioning in a bucket

import boto3
s3 = boto3.resource(service_name='s3', endpoint_url=GCP_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=GCP_REGION_NAME)
versioning = s3.BucketVersioning(BUCKET_NAME)
versioning.enable()
# status is an attribute of the resource, not a method
print(versioning.status)


Written by Vamsi Ramakrishnan

I work for Google. All views expressed in this publication are my own. Google Cloud | ex-Oracle | https://goo.gl/aykaPB
