Codementor Events

Ruby on Rails multitenancy in 2020

Published Jul 02, 2020

2020 is definitely the craziest year I have seen in my entire life. What is happening around the world, no one could see that coming. It's a mad time.

Here are some crazy things that happened:

  • Almost WWIII
  • Australian wildfires
  • COVID-19
  • Trump impeachment
  • Kobe Bryant death
  • George Floyd riots

Meantime, in the Ruby On Rails we are stuck in the old ways. An earlier solution for multitenancy in Rails is still widely used even though it is not optimal in contemporary times.

I am talking about you, https://github.com/influitive/apartment. You were quite a good fellow over the years and truly useful, but it's time to move on. Let's start this new decade with something that doesn't cause so many problems.

What is multitenancy?

Multitenancy usually means handling many independent tenants from one app. Most popular in SAAS models. There are several types of implementation. Let's look at them closer.

Row-level approach

You have a tenant_id column in every DB table and filter by tenant_id in each query.

Advantages:

  • Just another row in the table
  • No brainer
  • Migrations run in O(1) complexity
  • Comes out of the box in Rails
  • No additional costs
  • No operational overhead
  • No cost of switching

Disadvantages:

  • tenant_id keys everywhere
  • Leaking of data between tenants possible, if you forget about WHERE

Schema-level approach (Apartment gem)

Every tenant has a separate namespace. In practice, we are creating more tables like:

  • tenant1.users
  • tenant2.users
  • tenant3.users

Namespacing is easily achievable with PostgreSQL schemas.

Advantages:

  • No data leakages between tenants
  • Easy extraction of single-tenant data
  • No additional costs

Disadvantages:

Database-level approach

New database set up for every tenant. Not practical, including just for comparison purposes.

Advantages:

  • Almost impossible to get data leakage between tenants
  • Non-invasive
  • Easy extraction of single-tenant data

Disadvantages:

  • Additional costs of maintenance of multiple databases
  • Hard to create databases on the fly
  • Establishing a separate database connection is required
  • Migrations run in O(n) complexity

As you see, the row-level is a clear winner in most cases.

What if you already use an apartment gem?

The cost of switching may look intimidating. Great that you have PTSD Engineer with you. I don't think it was ever done on the internet. I never came around this kind of guide. Let's talk about switching from Apartment to a row-level approach. Mission Impossible? Don't have nerves from steel? Just keep reading, you may be amazed. I want to warn that it is only one approach to this kind of migration. There are other options, but this seems most straight-forward.

Starting from scratch or using a library

I will not write about starting from scratch. We already have enough problems on our hands. We can trust one company that never disappointed me https://www.citusdata.com/ They are an incredible company with significant expertise in databases. I believe them, so should you. They came up with a clean solution to our problem https://github.com/citusdata/activerecord-multi-tenant

This gem solves our problems; syntax looks kinda similar to Apartment, so that should also be useful.

Where to start?

Right now, we have probably a hundred calls to Apartment, like this: Apartment::Tenant.switch(tenant) do. It would be tough to change all of them. What if we want to go back?

My solution, let's create a wrapper that will help with the transition.

class Multitenancy
  def self.switch(tenant)
    Apartment::Tenant.switch(tenant) do
      yield
    end
  end

  def self.switch!(tenant)
    Apartment::Tenant.switch!(tenant)
  end

  def self.current
    Apartment::Tenant.current
  end

  def self.tenant_names
    Apartment.tenant_names
  end

  def self.drop(tenant)
    Apartment::Tenant.drop(tenant)
  end

  def self.create(tenant)
    Apartment::Tenant.create(tenant)
  end

  def self.reset
    Apartment::Tenant.reset
  end
end

Now we can call our wrapper and get ready for making first changes.

How to merge all the schemas into one?

The main problem of our migration, we have multiple schemas. There will be broken constraints if we try to merge into one: primary keys, foreign keys.

Our options:

  • Calculate exactly on which id every table is ending and add them using a complicated script
  • Exchange id in tables and foreign keys *_id with UUID. That would probably be hard and would require changes on the frontend
  • Use composite keys, a fantastic option. Unfortunately, not well supported in Ruby on Rails. Sorry, it's too risky to go with this one.

Let's think outside the box and approach our problem from a different perspective. Why are we caring so much that ids have to be in consecutive order? What if there is another way?

Do gaps happen in databases? Yes, they do. If we have users from 1 to 1 000 000 , and users between 20 000 and 100 000 will be hard deleted we will have a gap. Databases are ready for those cases to happen a lot.

Let's use a simple formula for migration. Find the highest id in all the tables across all schemas and use it as our offset number. Then we need to add a tenant_id column to all tables, that are used per schema.

The formula

def offset_record(practice_id)
    # 10_000 as our arbitrary number in these case, yours will be different
    # tenant_id will be the id of our new column
    10_000 * tenant_id
end

Formula for calculating offsets

Let's create a migration that will add the practice_id column to tables.

class AddTenantIdToTables < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :tenant_id, :integer
    add_column :customers, :tenant_id, :integer
    ...
  end
end

Now we are left with the task of filling those ids based on schemas. We could approach it this way.

SCHEMA_MODELS = [
      Users,
      Clients,
    ].freeze

    ActiveRecord::Base.transaction do
      Tenant.all.each do |tenant|
        Multitenancy.switch(tenant.domain) do
          update_model_tables(tenant_id: tenant.id)
        end
      end
    end
  end
end

def update_model_tables(tenant_id:, )
  SCHEMA_MODELS.each do |model|
    model.update_all(tenant_id: tenant_id)
  end
end

Filling practice id for each tenant

Running this will seed our database with correct tenant_ids that will be helpful in migration.

What about our images?

Yes, you are right; if we don't figure something out, we will break them. Let's migrate images too. In this guide, I am assuming that we are using Carrierwave.

Our uploaders should be ready for this change. That's how BaseUploader should look like.

class BaseUploader < CarrierWave::Uploader::Base
  # our base uploader to keep a code a little bit dry
  OFFSET = 10_000

  def current_tenant_domain
    Multitenancy.current
  end

  def tenant_id
    Tenant.find_by(domain: current_tenant_domain).id
  end

  def migration_offset
    tenant_id * OFFSET
  end

  def model_new_id
    model.id + migration_offset
  end
end

And this will be our typical uploader. We will have many of them, but I think you will figure out what to do next.

class CVUploader < BaseUploader
  def store_dir
    "tenants/#{current_tenant_domain}/#{model.class.to_s.underscore}/#{model.id}/#{mounted_as}"
  end

  before :store, :swap_out_store_dir

  def swap_out_store_dir(_test)
    self.class_eval do
      def store_dir
        "tenant/#{current_tenant_domain}/#{model.class.to_s.underscore}/#{model_new_id}/#{mounted_as}"
      end
    end
  end
end

Now let's write a script to migrate everything. At this point, I should add that you should have a testing environment with production backup and production s3 bucket clone , don't believe you can do it on your first approach. It's a complicated process; I'm trying to make it as easy as possible.

MODELS_AND_FIELDS = {
    cv_uploader: [:file]
  }.freeze

  ActiveRecord::Base.transaction do
    Tenant.find_each do |tenant|
      Apartment::Tenant.switch(tenant.domain) do
        puts "<<<<<<<<<<<<< Tenant #{tenant.name} START"

        MODELS_AND_FIELDS.each do |key, tab|
          puts " <<<<<<<< Model #{key} START"

          key.to_s.classify.constantize.find_each do |record|
            puts record.id
            tab.each do |resource_name|
              if record.public_send(resource_name).present? && record.public_send(resource_name).file.exists?
                record.public_send(resource_name).cache!
                record.public_send(resource_name).store!
              end
            end
          end

          puts " <<<<<<<< Model #{key} FINISHED"
        end
      end

      puts "<<<<<<<<<<<<< Tenant #{tenant.name} FINISHED"
    end

After migration passes, you need to change uploaders so that they won't use the before_store callbacks.

This code should be deleted:

before :store, :swap_out_store_dir

  def client_new_id
    model.client_case.client.id + migration_offset
  end

  def swap_out_store_dir(_test)
    self.class_eval do
      def store_dir
        "practices/#{current_practice_domain}/client/#{client_new_id}/#{model.class.to_s.underscore}/#{model_new_id}/#{mounted_as}"
      end
    end
  end

Migrate offset-ids for Rails models

The code is straight-forward. All my examples are in code, but obviously, all should be in separate classes. It's just for more fundamental understanding.

SCHEMA_BASED_MODELS = [
    Document,
    Task,
  ].freeze

  IRREGULAR_COLUMNS = ['third_api_account_id', ''tenant_id',].freeze

  def call
    # I am disabling triggers so won't get any messages about breaking constraints
    disable_db_triggers

    tenants.each do |tenant|
      Rails.logger.info "Migrating ids for tenant #{tenant.domain}"

      Apartment::Tenant.switch(tenant.domain) do
        update_ids_for_models(tenant)
      end
      Rails.logger.info "Finished migrating ids for tenant #{tenant.domain}"
    end
  ensure
    enable_db_triggers
  end

  private

  def tenants
    @tenants ||= Tenant.all
  end

  def update_ids_for_models(tenant)
    SCHEMA_BASED_MODELS.each do |model|
      columns = columns_for_model(model)
      model.update_all(sql_update_all(columns, tenant.id))
      model.update_all(tenant_id: tenant.id)
    end
  end

  def columns_for_model(model)
    model.columns.select do |column|
      column.name.match?(/^(.+_id|id){1}$/)
    end.map(&:name) - IRREGULAR_COLUMNS
  end

  def sql_update_all(columns, tenant_id)
    # generates update all query
    columns.map do |column|
      "#{column} = #{column} + #{offset_id(tenant_id)}"
    end.join(',')
  end

  def offset_id(tenant_id)
    # simple formula
    10_000 * tenant_id
  end

  def disable_db_triggers
    ActiveRecord::Base.connection.execute("SET session_replication_role = 'replica';")
  end

  def enable_db_triggers
    ActiveRecord::Base.connection.execute("SET session_replication_role = 'origin';")
  end

Creating a temporary schema dump

Now let's merge those schemas into one, called temporary.

We need to create our temporary schema first  Apartment::Tenant.create('temporary')

TABLE_NAMES = ['documents',
                   'invitations',
                   'messages',
                   'notifications',
                 ].freeze

    PUBLIC_TABLE_NAMES = ['tags',
                          'taggings',
                          'super_admins',
                          'users'].freeze

    schemas_sql = "SELECT nspname AS schema_name FROM pg_namespace WHERE NOT (nspname ~ '^pg_' or nspname = 'information_schema' or nspname = 'public' or nspname = 'temporary')"
    schemas = ActiveRecord::Base.connection.execute(schemas_sql).to_a.map do |row|
      row['schema_name']
    end

# again to avoid problems with constraints
    ActiveRecord::Base.connection.execute("SET session_replication_role = 'replica';")
    TABLE_NAMES.each do |table_name|
      columns_sql = <<~SQL
        select column_name
        from INFORMATION_SCHEMA.COLUMNS
        where table_name = '#{table_name}' AND table_schema = 'name of primary schema, with proper data structure';
      SQL

      columns = ActiveRecord::Base.connection.execute(columns_sql).to_a.map do |row|
        row['column_name']
      end.uniq

      schemas.each do |schema|
        columns_joined = columns.join(', ')

        insert_into_sql = <<~SQL
          INSERT INTO temporary.#{table_name} (#{columns_joined}) (
            SELECT #{columns_joined} from "#{schema}".#{table_name}
          )
        SQL

        ActiveRecord::Base.connection.execute(insert_into_sql)
      end
    end

    PUBLIC_TABLE_NAMES.each do |table_name|
      columns_sql = <<~SQL
        select column_name
        from INFORMATION_SCHEMA.COLUMNS
        where table_name = '#{table_name}' AND table_schema = 'primary schema, with good data structure';
      SQL

      columns = ActiveRecord::Base.connection.execute(columns_sql).to_a.map do |row|
        row['column_name']
      end.uniq

      columns_joined = columns.join(', ')

      insert_into_sql = <<~SQL
        INSERT INTO temporary.#{table_name} (#{columns_joined}) (
          SELECT #{columns_joined} from "public".#{table_name}
        )
      SQL

      ActiveRecord::Base.connection.execute(insert_into_sql)
    end

# and back to normal session replication role
    ActiveRecord::Base.connection.execute("SET session_replication_role = 'origin';")

Okay, we have all schemas merged into a temporary one. Let's create dump.

pg_dump --username postgres -n temporary database_production > ~/temporary.sql

We are after the hardest steps, almost over. Let's add our new gem to Gemfile

gem 'activerecord-multi-tenant'

And let's update our wrapper. This migration will take place during multiple deploys; the maintenance window is recommended. If you can't allow yourself for a maintenance window, you will have to create particular triggers in the database that will sync data.

class Multitenancy
  def self.switch(tenant)
    MultiTenant.switch(tenant) do
      yield
    end
  end

  def self.switch!(tenant)
    MultiTenant.current_tenant = tenant
  end

  def self.current
    MultiTenant.current_tenant
  end

The database has to be recreated now, commands to use in Postgres. Remember about backups before this step:

  1. dropdb -U postgres database_production
  2. createdb -U postgres database_production

Now let's restore the database from our dump.

cat temporary.sql | psql -U postgres -d database_production

Oh, but we don't have public schema now. Don't worry, I thought about everything. Let's login into psql console and do renaming.

psql -U postgres -d database_production

DROP SCHEMA public;

ALTER SCHEMA temporary RENAME to public;

Now you can enjoy your new multitenancy free from Apartment. It's 2020, and I hope this will help so many companies stuck with the old solution.

Please share this article and also my message.

I am the PTSD Engineer, and this is my story.

My story of struggling with PTSD
Drugs harm reduction guide:

Links to social media:

Facebook page

Instagram page

Twitter

Discover and read more posts from Przemyslaw Mroczek
get started
post comments1Reply
Bailey Hunter
9 months ago

This is very amazing post thanks for sharing this. https://mcdvoice.me/