In an earlier article I wrote on Mastering the AWS CLI for Amazon S3, I covered the basics of working with Amazon's Simple Storage Service from the command line. I looked at commands for creating and deleting buckets, creating, uploading and deleting objects, and more. Now that we are familiar with the fundamentals, it's time to look at more advanced S3 capabilities.

Imagine the following scenario: you are working for a company that deals with large multimedia files, and you have been given a 2 terabyte (TB) file that you need to upload to Amazon S3. Uploading a file of this size in a single operation would be risky. If the transfer failed at any point, you would need to restart the entire upload, wasting time and resources. This is where S3 Multipart Uploads come to the rescue.

S3 Multipart Uploads allow you to upload a large object as a set of smaller parts. This is required for files larger than 5 GB, the maximum size for a single PUT operation in S3, and AWS recommends it for objects over 100 MB. Multipart uploads also offer improved throughput and the ability to pause and resume uploads if there are any network issues. In this article, I will explain how you can use the AWS CLI to leverage multipart uploads for large files or enhanced performance. I will cover initiating a multipart upload, uploading its parts, and completing it. By the end, you'll have the knowledge to improve upload times and work with very large objects in S3 from the command line. So let's get started!


S3 Multipart Upload Commands

To start using Multipart Upload for S3, you will first need to ensure that you have installed the AWS CLI on your machine and have the necessary permissions to use S3. If you are practicing using your own AWS account, you will need to attach the AdministratorAccess policy (or another policy granting the required S3 permissions) to your IAM user. You will also need to make sure that you have generated an access key and secret access key for that user. If you have not already done so, run the aws configure command, which will prompt you for your access key, secret access key, preferred AWS region and default CLI output format (e.g. us-east-1 and json).
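For reference, this is roughly what the prompts look like (the access key shown here is a placeholder, not a real credential):

aws configure
AWS Access Key ID [None]: AKIAEXAMPLEKEYID
AWS Secret Access Key [None]: your_secret_access_key
Default region name [None]: us-east-1
Default output format [None]: json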

Once you have done this, you can then start uploading your files. You will of course need a bucket! To create one, run the following:

aws s3 mb s3://name_of_your_new_bucket

step_1.png

Remember, your bucket name will need to be globally unique across all of AWS, otherwise the AWS CLI will return an error.
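If you want to double-check that the bucket was created, you can list the buckets in your account:

aws s3 ls

Your new bucket should appear in the output alongside any existing buckets.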


Initiating a Multipart Upload

To start a multipart upload, enter the command:

aws s3api create-multipart-upload --bucket your_bucket_name --key your_file_name

step_2.png

This will initialise the upload and return an upload ID that you will use when uploading each part. In the above command, the --key option refers to the name (object key) the file will have in S3. Take note of the UploadId, as you will need it in each of the following steps.
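The response will look something like the following (the UploadId here is shortened and purely illustrative; yours will be a much longer string):

{
    "Bucket": "your_bucket_name",
    "Key": "your_file_name",
    "UploadId": "EXAMPLEUPLOADID1234abcd..."
}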

If you don't have any large files to hand for testing, you can download various sized files from https://fastest.fish/test-files.
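Alternatively, you can generate a test file of an arbitrary size locally. For example, the following creates a 105 MB file of random data (GNU dd syntax; the file name is arbitrary):

dd if=/dev/urandom of=large_test_file bs=1M count=105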

Splitting your original file into smaller parts

The next step is to divide your original file into smaller parts so that each part can be uploaded to S3 individually (e.g. if your original file is 105 MB, you could break it up into 30 MB pieces; note that every part except the last must be at least 5 MB). To do this, run the following command, which splits the file into 100 MB parts:

split -b 100M -d your_original_large_file_name your_new_small_file_part_name
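Here, -b sets the size of each part and -d tells split to use numeric suffixes (00, 01, 02 and so on). Note that -d is a GNU split option; older BSD versions of split, such as the one shipped with some versions of macOS, may not support it. You can check the resulting parts with:

ls -lh your_new_small_file_part_name*

For a 105 MB file split into 100 MB parts, for example, you would see two parts: your_new_small_file_part_name00 at 100 MB and your_new_small_file_part_name01 containing the remaining 5 MB.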