Implement a simple AWS S3 multi-file downloader in Ruby

Published Jul 30, 2021Last updated Aug 09, 2021

Recently, I was requested to implement a multi-file download feature like Google drive's. If you use AWS S3 like me, you know that there isn't a direct way to do this, so in this post I'll show you a reference implementation for that.

We are going to need a few dependencies in our project:

# Gemfile
gem 'aws-sdk-s3', '~> 1.17'
gem 'rubyzip', '~> 2.2'

This post assumes you know how to configure your AWS credentials.

Now let's get to the code! The goal here is to create a zip file which has all our desired files so we can send it to our users. The first thing to do is to download the files, with the official SDK this is quite simple:

@filepaths = @s3_keys.map do |key|
  new_path = "#{@dir}/#{key.split('/').last}" # Keep the filename but avoid the fill path
  # This should go in a separate service
  Aws::S3::Resource.new.bucket(@bucket).object(key).download_file(new_path)
  new_path
end

Now that we have the files in the machine, lets create a zip with all of them. Following the rubyzip's documentation, we can do create the zipped file like this:

Zip::File.open('Archive.zip', Zip::File::CREATE) do |zipfile|
  @filepaths.each do |filepath|
    zipfile.add(filepath.split('/').last, filepath)
  end
end

These simple steps will create our desired zip file! Now we need to put the file somewhere our users can access them. In my case I upload the zip in S3 and create a presigned url for it, but you can change this to fit your needs:

object = Aws::S3::Resource.new.bucket(@bucket).object('Archive.zip')
object.upload_file('Archive.zip')

download_url = object.presigned_url(:get)

Now that you get the idea, let me join everything to get this code working:

It's better to have a class that hides the S3 related code as much as possible:

# s3_service.rb
require 'aws-sdk-s3' # This is not needed in Rails

module S3Service
  class << self
    def upload_file(from:, to:, bucket:)
      object = object(to, bucket: bucket)
      object.upload_file(from)
      object.presigned_url(:get)
    end

    def download_file(key:, to:, bucket:)
      object = object(key, bucket: bucket)
      object.download_file(to)
    end

    def get_download_link(file_name, bucket:)
      object(file_name, bucket: bucket).presigned_url(:get).to_s
    end

    private

    def object(file_name, bucket:)
      bucket(bucket).object(file_name)
    end

    def bucket(bucket)
      Aws::S3::Resource.new.bucket(bucket)
    end
  end
end

Now we are ready to build another class that handles the download, zip and upload of the zipped file:

# multi_file_zipper_download.rb
require 'zip'

class MultiFileZipperDownload
  ZIPPED_FILE_NAME = 'Archive.zip' # Same as Google drive :P

  def initialize(s3_keys, bucket)
    @s3_keys = s3_keys
    @bucket = bucket
  end

  def call
    zip_files(download_objects)
    build_zipped_s3_key
    upload_zip
    delete_tmp_file
    @zipped_s3_key
  end

  private

  def download_objects
    @s3_keys.map.each_with_index do |key, idx|
      # Avoid replacing files with same name by using the index, you can skip this if you like
      new_path = "#{tmp_dir}/#{idx} - #{key.split('/').last}"
      S3Service.download_file(key: key, to: new_path, bucket: @bucket)
      new_path
    end
  end

  def zip_files(files)
    ::Zip::File.open(zipped_file_path, Zip::File::CREATE) do |zipfile|
      files.each do |filepath|
        zipfile.add(filepath.split('/').last, filepath)
      end
    end
  end

  def tmp_dir
    @tmp_dir ||= Dir.mktmpdir
  end

  def zipped_file_path
    "#{tmp_dir}/#{ZIPPED_FILE_NAME}"
  end

  def build_zipped_s3_key
    # I use the hash of the file to avoid collisions, but you can change this to whatever you like
    hash = Digest::SHA256.file(zipped_file_path).to_s

    @zipped_s3_key = "multi_downloads/#{hash}/#{ZIPPED_FILE_NAME}"
  end

  def upload_zip
    # I upload the zipped file to S3 so we can send a link to tje file afterwards
    S3Service.upload_file(from: zipped_file_path, to: @zipped_s3_key, bucket: @bucket)
  end

  def delete_tmp_file
    # Remove all the files to avoid disk usage leaks
    FileUtils.rm_rf(tmp_dir)
  end
end

In order to use this class you can create a simple Sinatra server to test it:

# app.rb
require 'sinatra'
require 'json'

# Here you can initialize the AWS config, use ENV variables or any other valid configuration method

require_relative 's3_service'
require_relative 'multi_file_zipper_download'

BUCKET = 'my-bucket'

get '/keys' do
  # Here you can send the list of available s3 keys using your desired storage
end

post '/download' do
  # Get the keys from the JSON body
  params = JSON.parse(request.body.read)

  zipped_key = MultiFileZipperDownload.new(params['keys'], BUCKET).call
  url = S3Service.get_download_link(zipped_key, bucket: BUCKET)

  [200, { url: url }.to_json]
end

This code will get you going with this feature, but you need to consider that downloading and uploading files from S3 will block your Ruby's server, so you should run this code in background. I'll leave that as an exercise for the reader.

Ruby AWS Amazon s3 Ruby on Rails

Report

Enjoy this post? Give Alejandro Bezdjian a like if it's helpful.

Alejandro Bezdjian

Senior Software Engineer

I have a background building backend distributed systems for different kinds of projects in several industries (financial, social networks, e-commerce, travel, car rental, etc). I also have some experience building frontend appli...

Discover and read more posts from Alejandro Bezdjian

get started