Ruby on Rails multitenancy in 2020
2020 is definitely the craziest year I have seen in my entire life. What is happening around the world, no one could see that coming. It's a mad time.
Here are some crazy things that happened:
- Almost WWIII
- Australian wildfires
- COVID-19
- Trump impeachment
- Kobe Bryant death
- George Floyd riots
Meantime, in the Ruby On Rails we are stuck in the old ways. An earlier solution for multitenancy in Rails is still widely used even though it is not optimal in contemporary times.
I am talking about you, https://github.com/influitive/apartment. You were quite a good fellow over the years and truly useful, but it's time to move on. Let's start this new decade with something that doesn't cause so many problems.
What is multitenancy?
Multitenancy usually means handling many independent tenants from one app. Most popular in SAAS models. There are several types of implementation. Let's look at them closer.
Row-level approach
You have a tenant_id column in every DB table and filter by tenant_id in each query.
Advantages:
- Just another row in the table
- No brainer
- Migrations run in O(1) complexity
- Comes out of the box in Rails
- No additional costs
- No operational overhead
- No cost of switching
Disadvantages:
- tenant_id keys everywhere
- Leaking of data between tenants possible, if you forget about WHERE
Schema-level approach (Apartment gem)
Every tenant has a separate namespace. In practice, we are creating more tables like:
- tenant1.users
- tenant2.users
- tenant3.users
Namespacing is easily achievable with PostgreSQL schemas.
Advantages:
- No data leakages between tenants
- Easy extraction of single-tenant data
- No additional costs
Disadvantages:
Database-level approach
New database set up for every tenant. Not practical, including just for comparison purposes.
Advantages:
- Almost impossible to get data leakage between tenants
- Non-invasive
- Easy extraction of single-tenant data
Disadvantages:
- Additional costs of maintenance of multiple databases
- Hard to create databases on the fly
- Establishing a separate database connection is required
- Migrations run in O(n) complexity
As you see, the row-level is a clear winner in most cases.
What if you already use an apartment gem?
The cost of switching may look intimidating. Great that you have PTSD Engineer with you. I don't think it was ever done on the internet. I never came around this kind of guide. Let's talk about switching from Apartment to a row-level approach. Mission Impossible? Don't have nerves from steel? Just keep reading, you may be amazed. I want to warn that it is only one approach to this kind of migration. There are other options, but this seems most straight-forward.
Starting from scratch or using a library
I will not write about starting from scratch. We already have enough problems on our hands. We can trust one company that never disappointed me https://www.citusdata.com/ They are an incredible company with significant expertise in databases. I believe them, so should you. They came up with a clean solution to our problem https://github.com/citusdata/activerecord-multi-tenant
This gem solves our problems; syntax looks kinda similar to Apartment
, so that should also be useful.
Where to start?
Right now, we have probably a hundred calls to Apartment, like this: Apartment::Tenant.switch(tenant) do
. It would be tough to change all of them. What if we want to go back?
My solution, let's create a wrapper that will help with the transition.
class Multitenancy
def self.switch(tenant)
Apartment::Tenant.switch(tenant) do
yield
end
end
def self.switch!(tenant)
Apartment::Tenant.switch!(tenant)
end
def self.current
Apartment::Tenant.current
end
def self.tenant_names
Apartment.tenant_names
end
def self.drop(tenant)
Apartment::Tenant.drop(tenant)
end
def self.create(tenant)
Apartment::Tenant.create(tenant)
end
def self.reset
Apartment::Tenant.reset
end
end
Now we can call our wrapper and get ready for making first changes.
How to merge all the schemas into one?
The main problem of our migration, we have multiple schemas. There will be broken constraints if we try to merge into one: primary keys, foreign keys.
Our options:
- Calculate exactly on which id every table is ending and add them using a complicated script
- Exchange id in tables and foreign keys *_id with UUID. That would probably be hard and would require changes on the frontend
- Use composite keys, a fantastic option. Unfortunately, not well supported in Ruby on Rails. Sorry, it's too risky to go with this one.
Let's think outside the box and approach our problem from a different perspective. Why are we caring so much that ids have to be in consecutive order? What if there is another way?
Do gaps happen in databases? Yes, they do. If we have users from 1 to 1 000 000 , and users between 20 000 and 100 000 will be hard deleted we will have a gap. Databases are ready for those cases to happen a lot.
Let's use a simple formula for migration. Find the highest id in all the tables across all schemas and use it as our offset number. Then we need to add a tenant_id column to all tables, that are used per schema.
The formula
def offset_record(practice_id)
# 10_000 as our arbitrary number in these case, yours will be different
# tenant_id will be the id of our new column
10_000 * tenant_id
end
Formula for calculating offsets
Let's create a migration that will add the practice_id
column to tables.
class AddTenantIdToTables < ActiveRecord::Migration[5.2]
def change
add_column :users, :tenant_id, :integer
add_column :customers, :tenant_id, :integer
...
end
end
Now we are left with the task of filling those ids based on schemas. We could approach it this way.
SCHEMA_MODELS = [
Users,
Clients,
].freeze
ActiveRecord::Base.transaction do
Tenant.all.each do |tenant|
Multitenancy.switch(tenant.domain) do
update_model_tables(tenant_id: tenant.id)
end
end
end
end
end
def update_model_tables(tenant_id:, )
SCHEMA_MODELS.each do |model|
model.update_all(tenant_id: tenant_id)
end
end
Filling practice id for each tenant
Running this will seed our database with correct tenant_ids that will be helpful in migration.
What about our images?
Yes, you are right; if we don't figure something out, we will break them. Let's migrate images too. In this guide, I am assuming that we are using Carrierwave
.
Our uploaders should be ready for this change. That's how BaseUploader should look like.
class BaseUploader < CarrierWave::Uploader::Base
# our base uploader to keep a code a little bit dry
OFFSET = 10_000
def current_tenant_domain
Multitenancy.current
end
def tenant_id
Tenant.find_by(domain: current_tenant_domain).id
end
def migration_offset
tenant_id * OFFSET
end
def model_new_id
model.id + migration_offset
end
end
And this will be our typical uploader. We will have many of them, but I think you will figure out what to do next.
class CVUploader < BaseUploader
def store_dir
"tenants/#{current_tenant_domain}/#{model.class.to_s.underscore}/#{model.id}/#{mounted_as}"
end
before :store, :swap_out_store_dir
def swap_out_store_dir(_test)
self.class_eval do
def store_dir
"tenant/#{current_tenant_domain}/#{model.class.to_s.underscore}/#{model_new_id}/#{mounted_as}"
end
end
end
end
Now let's write a script to migrate everything. At this point, I should add that you should have a testing environment with production backup and production s3 bucket clone , don't believe you can do it on your first approach. It's a complicated process; I'm trying to make it as easy as possible.
MODELS_AND_FIELDS = {
cv_uploader: [:file]
}.freeze
ActiveRecord::Base.transaction do
Tenant.find_each do |tenant|
Apartment::Tenant.switch(tenant.domain) do
puts "<<<<<<<<<<<<< Tenant #{tenant.name} START"
MODELS_AND_FIELDS.each do |key, tab|
puts " <<<<<<<< Model #{key} START"
key.to_s.classify.constantize.find_each do |record|
puts record.id
tab.each do |resource_name|
if record.public_send(resource_name).present? && record.public_send(resource_name).file.exists?
record.public_send(resource_name).cache!
record.public_send(resource_name).store!
end
end
end
puts " <<<<<<<< Model #{key} FINISHED"
end
end
puts "<<<<<<<<<<<<< Tenant #{tenant.name} FINISHED"
end
After migration passes, you need to change uploaders so that they won't use the before_store callbacks.
This code should be deleted:
before :store, :swap_out_store_dir
def client_new_id
model.client_case.client.id + migration_offset
end
def swap_out_store_dir(_test)
self.class_eval do
def store_dir
"practices/#{current_practice_domain}/client/#{client_new_id}/#{model.class.to_s.underscore}/#{model_new_id}/#{mounted_as}"
end
end
end
Migrate offset-ids for Rails models
The code is straight-forward. All my examples are in code, but obviously, all should be in separate classes. It's just for more fundamental understanding.
SCHEMA_BASED_MODELS = [
Document,
Task,
].freeze
IRREGULAR_COLUMNS = ['third_api_account_id', ''tenant_id',].freeze
def call
# I am disabling triggers so won't get any messages about breaking constraints
disable_db_triggers
tenants.each do |tenant|
Rails.logger.info "Migrating ids for tenant #{tenant.domain}"
Apartment::Tenant.switch(tenant.domain) do
update_ids_for_models(tenant)
end
Rails.logger.info "Finished migrating ids for tenant #{tenant.domain}"
end
ensure
enable_db_triggers
end
private
def tenants
@tenants ||= Tenant.all
end
def update_ids_for_models(tenant)
SCHEMA_BASED_MODELS.each do |model|
columns = columns_for_model(model)
model.update_all(sql_update_all(columns, tenant.id))
model.update_all(tenant_id: tenant.id)
end
end
def columns_for_model(model)
model.columns.select do |column|
column.name.match?(/^(.+_id|id){1}$/)
end.map(&:name) - IRREGULAR_COLUMNS
end
def sql_update_all(columns, tenant_id)
# generates update all query
columns.map do |column|
"#{column} = #{column} + #{offset_id(tenant_id)}"
end.join(',')
end
def offset_id(tenant_id)
# simple formula
10_000 * tenant_id
end
def disable_db_triggers
ActiveRecord::Base.connection.execute("SET session_replication_role = 'replica';")
end
def enable_db_triggers
ActiveRecord::Base.connection.execute("SET session_replication_role = 'origin';")
end
Creating a temporary schema dump
Now let's merge those schemas into one, called temporary.
We need to create our temporary schema first Apartment::Tenant.create('temporary')
TABLE_NAMES = ['documents',
'invitations',
'messages',
'notifications',
].freeze
PUBLIC_TABLE_NAMES = ['tags',
'taggings',
'super_admins',
'users'].freeze
schemas_sql = "SELECT nspname AS schema_name FROM pg_namespace WHERE NOT (nspname ~ '^pg_' or nspname = 'information_schema' or nspname = 'public' or nspname = 'temporary')"
schemas = ActiveRecord::Base.connection.execute(schemas_sql).to_a.map do |row|
row['schema_name']
end
# again to avoid problems with constraints
ActiveRecord::Base.connection.execute("SET session_replication_role = 'replica';")
TABLE_NAMES.each do |table_name|
columns_sql = <<~SQL
select column_name
from INFORMATION_SCHEMA.COLUMNS
where table_name = '#{table_name}' AND table_schema = 'name of primary schema, with proper data structure';
SQL
columns = ActiveRecord::Base.connection.execute(columns_sql).to_a.map do |row|
row['column_name']
end.uniq
schemas.each do |schema|
columns_joined = columns.join(', ')
insert_into_sql = <<~SQL
INSERT INTO temporary.#{table_name} (#{columns_joined}) (
SELECT #{columns_joined} from "#{schema}".#{table_name}
)
SQL
ActiveRecord::Base.connection.execute(insert_into_sql)
end
end
PUBLIC_TABLE_NAMES.each do |table_name|
columns_sql = <<~SQL
select column_name
from INFORMATION_SCHEMA.COLUMNS
where table_name = '#{table_name}' AND table_schema = 'primary schema, with good data structure';
SQL
columns = ActiveRecord::Base.connection.execute(columns_sql).to_a.map do |row|
row['column_name']
end.uniq
columns_joined = columns.join(', ')
insert_into_sql = <<~SQL
INSERT INTO temporary.#{table_name} (#{columns_joined}) (
SELECT #{columns_joined} from "public".#{table_name}
)
SQL
ActiveRecord::Base.connection.execute(insert_into_sql)
end
# and back to normal session replication role
ActiveRecord::Base.connection.execute("SET session_replication_role = 'origin';")
Okay, we have all schemas merged into a temporary one. Let's create dump.
pg_dump --username postgres -n temporary database_production > ~/temporary.sql
We are after the hardest steps, almost over. Let's add our new gem to Gemfile
gem 'activerecord-multi-tenant'
And let's update our wrapper. This migration will take place during multiple deploys; the maintenance window is recommended. If you can't allow yourself for a maintenance window, you will have to create particular triggers in the database that will sync data.
class Multitenancy
def self.switch(tenant)
MultiTenant.switch(tenant) do
yield
end
end
def self.switch!(tenant)
MultiTenant.current_tenant = tenant
end
def self.current
MultiTenant.current_tenant
end
The database has to be recreated now, commands to use in Postgres. Remember about backups before this step:
dropdb -U postgres database_production
createdb -U postgres database_production
Now let's restore the database from our dump.
cat temporary.sql | psql -U postgres -d database_production
Oh, but we don't have public schema now. Don't worry, I thought about everything. Let's login into psql console and do renaming.
psql -U postgres -d database_production
DROP SCHEMA public;
ALTER SCHEMA temporary RENAME to public;
Now you can enjoy your new multitenancy free from Apartment. It's 2020, and I hope this will help so many companies stuck with the old solution.
Please share this article and also my message.
I am the PTSD Engineer, and this is my story.
My story of struggling with PTSD
Drugs harm reduction guide:
Links to social media:
This is very amazing post thanks for sharing this. https://mcdvoice.me/