Software Architecture 7 min read

Multi-Tenant Database Architecture Patterns for SaaS Platforms

How we made tenant isolation a compile-time property in a Scala SaaS backend, using an implicit QueryContext that the repository layer refuses to function without.

Multi-Tenant Database Architecture Patterns for SaaS Platforms

When we sat down to design the data layer for the SaaS backend, the multi-tenancy question came first and pre-empted almost every other decision. Database per tenant, schema per tenant, or shared database with tenant_id on every row — that choice fixes your migration story, your connection pool sizing, your support workflows, and your blast radius for one bad query. We went with the cheapest option of the three and spent the saved complexity budget on making the boundary something the compiler enforces rather than something a developer remembers under deadline.

The shape of the answer is a single sealed trait every read method demands as an implicit. The interesting work isn’t the trait — it’s that you can’t construct one accidentally and you can’t query without one.

Why row-level

Database-per-tenant gives the strongest isolation but blows up connection pools and turns every migration into a fleet operation. Schema-per-tenant inherits most of the migration pain without much of the safety upside. Shared database with row-level isolation is cheap and operationally simple, and the only honest objection to it is that it’s only as safe as the discipline of the person writing the query. We took that objection seriously enough to make the discipline structural.

QueryContext as the seam

sealed trait QueryContext {
  def tenantIdOpt: Option[Long]
  def shouldIncludeDeleted: Boolean
  def queryingService: QueryingService
  def accessLevel: AccessLevel
}

case class TenantScopedQueryContext(
  tenantId: Long,
  queryingService: QueryingService,
  accessLevel: AccessLevel = AccessLevel.UserLevel,
  shouldIncludeDeleted: Boolean = false
) extends QueryContext {
  override def tenantIdOpt: Option[Long] = Some(tenantId)
}

case class AllScopedQueryContext(
  queryingService: QueryingService,
  accessLevel: AccessLevel = AccessLevel.SystemLevel,
  shouldIncludeDeleted: Boolean = false
) extends QueryContext {
  override def tenantIdOpt: Option[Long] = None
}

There are two flavours. TenantScopedQueryContext carries a specific tenant — that’s what most request handlers run with, derived from the authenticated session. AllScopedQueryContext skips the tenant filter and is reserved for admin tools, batch jobs, and analytics. The split matters because the second one is dangerous and we want it visible at every call site that uses it.

AccessLevel rides on the context to encode “who’s allowed to do this” — UserLevel can only see its own tenant, AdminLevel can target any single tenant, SystemLevel is the only path that legally skips the filter. QueryingService is the audit field, an enum of WebApi, BackgroundWorker, MigrationScript, AdminTool. Every metric and log line gets tagged with it, which matters when something looks weird at 3am and we need to know whether the noisy queries came from the API or from a misbehaving job.

The compile-time check

The repository layer is where the implicit becomes load-bearing. Every read method requires it:

abstract class ReadRepository[M <: BaseModel](collection: MongoCollection[M]) {
  def findById(id: String)(implicit queryContext: QueryContext): Future[Option[M]] =
    applyQueryContext(
      new ReadQuery(collection).where(BaseModel.IdField, Operator.Equal, id)
    ).fetchOne

  def findAll(implicit queryContext: QueryContext): Future[Seq[M]] =
    applyQueryContext(new ReadQuery(collection)).fetchList

  protected def applyQueryContext(query: ReadQuery[M])(
    implicit queryContext: QueryContext
  ): ReadQuery[M] = query
}

Models split into TenantScopedModel, OptionalTenantScopedModel, and GlobalModel, each with its own repository subclass. The tenant-scoped one is the interesting case:

class TenantScopedRepository[M <: TenantScopedModel](
  collection: MongoCollection[M]
) extends ReadRepository[M](collection) {

  override protected def applyQueryContext(query: ReadQuery[M])(
    implicit queryContext: QueryContext
  ): ReadQuery[M] = queryContext.tenantIdOpt match {
    case Some(tenantId) =>
      query.where(TenantScopedModel.TenantIdField, Operator.Equal, tenantId)

    case None if queryContext.accessLevel == AccessLevel.SystemLevel =>
      query

    case None =>
      throw new SecurityException(
        "Tenant-scoped query requires either tenantId or SystemLevel access"
      )
  }
}

Three branches, no fourth. Tenant id present, the filter goes on. Tenant id absent and access is SystemLevel, no filter. Anything else throws — not as a security boundary on its own, but as a tripwire so a missing branch in some future refactor doesn’t silently start leaking rows.

The compile-time piece is what you don’t write. A handler that forgets the implicit looks like this:

def brokenQuery(userId: String): Future[Seq[Order]] = {
  orderRepository
    .where(Order.UserIdField, Operator.Equal, userId)
    .fetchList  // compile error: no implicit QueryContext in scope
}

That doesn’t run. It doesn’t build. There’s no path through the type system that lets you query a tenant-scoped table without naming a tenant or explicitly claiming system access.

Cross-tenant work, deliberately visible

Most of the code is single-tenant. Some isn’t — support engineers look at a customer’s data when there’s a ticket, analytics jobs aggregate across tenants, monthly report jobs walk every tenant in a loop. We wanted each of those to be visible in code review and in logs, not papered over with a generic “admin” flag.

Support gets its own context type that requires a reason string:

case class SupportContext(
  supportUserId: String,
  targetTenantId: Long,
  reason: String
) extends QueryContext {
  override def tenantIdOpt: Option[Long] = Some(targetTenantId)
  override def queryingService: QueryingService = QueryingService.AdminTool
  override def accessLevel: AccessLevel = AccessLevel.AdminLevel
  override def shouldIncludeDeleted: Boolean = true
}

Constructing a SupportContext is the line a code reviewer pays attention to. The reason string ends up in an audit log so there’s a paper trail of who looked at what and why.

For jobs over all tenants, the pattern is to start with AllScoped to enumerate, then narrow to a per-tenant scope inside the loop:

def sendMonthlyReports(): Future[Unit] = {
  implicit val ctx: AllScopedQueryContext = AllScopedQueryContext(
    queryingService = QueryingService.BackgroundWorker,
    accessLevel = AccessLevel.SystemLevel
  )

  for {
    allTenants <- tenantRepository.findAll
    _ <- FutureUtil.groupedSequence(allTenants, concurrentPoolSize = 10) { tenant =>
      implicit val tenantCtx: TenantScopedQueryContext = TenantScopedQueryContext(
        tenantId = tenant.id,
        queryingService = QueryingService.BackgroundWorker
      )
      generateAndSendReport(tenant.id)
    }
  } yield ()
}

The narrowing matters. By the time we’re generating a report, the in-scope implicit is single-tenant — anything called from inside generateAndSendReport gets the same compile-time guarantee as a normal request handler.

Indexes follow the access pattern

Row-level isolation is only cheap if the queries are cheap, and that means every compound index on a tenant-scoped table starts with tenant_id:

CREATE INDEX idx_orders_tenant_user    ON orders(tenant_id, user_id);
CREATE INDEX idx_orders_tenant_created ON orders(tenant_id, created_at);

An index on (user_id) alone looks fine in review and is wrong for our access pattern — every real query filters by tenant_id first, and an index that doesn’t know that costs a sort or a scan. We caught one of these by accident and now we lint for it: any index on a TenantScopedModel table whose leading column isn’t tenant_id gets flagged.

And the boring rule

The tenant id never comes from the client. It comes from the authenticated session:

// Wrong — tenant id from request body
val tenantId = request.body.as[GetOrdersRequest].tenantId

// Right — tenant id from authenticated session
val tenantId = request.user.tenantId

None of the compile-time scaffolding above is worth anything if the input to the constructor is attacker-controlled. This isn’t novel; it’s just the line that breaks if you let it.

What this is

The pattern works for us because the tenants are similar enough in shape that one shared database is genuinely the right answer, and the team is small enough that conventions don’t drift across groups. If a single tenant were going to become most of the data, we’d be looking at a hybrid — shared-everything stops working the moment one tenant can starve the rest. And under regulatory rules that mandate physical isolation, none of this matters; the architecture would have been chosen for us.

What I’d take from this into a differently-shaped system is the seam, not the table layout. There’s a single type every read method demands, and that type can’t be constructed accidentally — every path that produces one is either a request handler reading from an authenticated session, a support tool with an audit reason, or a system context that we deliberately call out. The compiler handles the boring half of code review. That’s been worth more, day to day, than anything else in the stack.

Back to Blog

Related Posts

View All Posts »

Some Decisions Aren't Decisions

Someone senior pushed back on why we'd isolated our callback servers instead of just scaling the API vertically. I stopped arguing mid-explanation — not because he was right, but because I couldn't put words to defaults my team had long since stopped questioning.

We Tripled the Test Suite. Then Everything Else Had to Change.

I started the quarter trying to raise test coverage on one monorepo. I finished it having also rewritten the test pipeline, the coverage gate, the deploy workflows, and most of the backend's dependency stack. None of it was on the ticket. All of it followed from the ticket.