Database Engineering 10 min read

Type-safe SQL in Scala with Doobie

A Doobie abstraction layer that survives schema migrations — explicit column ordering, type-safe operators, and tenant-scoped queries by default.

Type-safe SQL in Scala with Doobie

We had a Scala service backed by MySQL where every model went through Doobie. Doobie itself is fine — it’s a thin functional wrapper over JDBC. The problem isn’t Doobie. The problem is what you build on top of it.

The raw pattern is sql"select id, user_id, amount from transactions" glued together with string interpolation. That works until someone runs SELECT * somewhere, or someone does an ALTER TABLE that shifts column positions, or someone forgets to add the team_id clause on a multi-tenant query. None of those break the compile. They break in prod, quietly, with the wrong data going into the wrong fields.

The abstraction layer in this post is what we landed on after enough of those near-misses. It’s not an ORM. It’s a thin set of traits and a query builder that makes the dangerous things harder to write than the safe things.

The migration that ruins your week

The pattern that started this whole exercise is the SELECT * problem. You have a table:

CREATE TABLE transactions (
  id BIGINT PRIMARY KEY,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  amount DECIMAL(10,2)
);

And a case class whose field order matches:

case class Transaction(
  id: Long,
  createdAt: DateTime,
  updatedAt: DateTime,
  amount: BigDecimal
)

Now imagine the migration adds a column in the middle — ADD COLUMN user_id BIGINT AFTER id. SELECT * happily returns five columns in DB order; the case class still has four fields in its old order; Doobie’s Read[Transaction] maps positionally. The query compiles. The query runs. created_at ends up reading user_id, and your amount column is now whatever was in updated_at. Nothing throws. The data is just wrong.

You can argue that no one should ever ADD COLUMN ... AFTER. They will. You can argue no one should SELECT *. They will. The cheapest defense is making the column list explicit and making the explicit version the easy default.

The model trait

Every persisted entity extends a base trait. The trait does two useful things: it gives every model a uniform partitionKey for caching/logging, and it forces the timestamp fields to live at the type level so the framework can manage them.

abstract class BaseMySqlModel extends Product {
  def partitionKey: String
  def entityType: EntityType
}

abstract class IdBasedMySqlModel extends BaseMySqlModel {
  def mySqlId: Long
  def createdAt: DateTime
  def updatedAt: DateTime

  override def partitionKey: String = s"${entityType.modelStr}-$mySqlId"
}

Extending Product matters — the diff machinery further down uses productElementNames and productIterator to compare old and new objects field-by-field. Case classes give us that for free.

Companion objects carry the schema

Each model has a companion object that owns the table name, the JSON codecs, and — crucially — columnOrdering. That sequence is the source of truth for which columns the SELECT clause emits and in which order:

abstract class BaseMySqlModelCompanionObject[T <: BaseMySqlModel, IdType](
  val TableName: String
) {
  implicit val reads: Reads[T]
  implicit val writes: Writes[T]

  def columnOrdering: Seq[F[_, _]]

  def extractId(element: T): IdType
  def setIdAndReturn(id: IdType, model: T): T

  def preCreateInMemoryChanges(t: T): T
  def preCheckForUpdates(oldObj: T, updatedObj: T): Unit

  def fieldsToAddForUpdates: Seq[UpdatableField]
}

F[A, B] is our typed column reference — A is the Scala type, B is the type as it appears in queries (often the same, but Option[T] columns differ). Each F knows the column name, the table it belongs to, and how to render itself in select vs. non-select positions.

A concrete model wires it together:

case class Transaction(
  mySqlId: Long,
  userId: Long,
  createdAt: DateTime,
  updatedAt: DateTime,
  amount: BigDecimal,
  entityType: EntityType = EntityType.Transaction
)

object Transaction extends IdBasedMySqlModelCompanionObject[Transaction]("transactions") {
  val UserId: F[Long, Long] = F("user_id", TableName)
  val Amount: F[BigDecimal, BigDecimal] = F("amount", TableName)

  override val columnOrdering: Seq[F[_, _]] = Seq(
    Id, UserId, CreatedAt, UpdatedAt, Amount
  )

  implicit val reads: Reads[Transaction] = Json.reads[Transaction]
  implicit val writes: Writes[Transaction] = Json.writes[Transaction]

  override def extractId(t: Transaction): Long = t.mySqlId
  override def setIdAndReturn(id: Long, model: Transaction): Transaction =
    model.copy(mySqlId = id)
}

columnOrdering has to match the case class field order. That’s a constraint we test for explicitly (see below) — it’s the one piece of redundancy in the design and the test is what keeps it honest.

Safe and unsafe selection

The query builder exposes two select methods. One emits SELECT *, one emits an explicit column list derived from columnOrdering:

// Breaks when columns shift. Kept around for ad-hoc debugging.
queryBuilder
  .where(Transaction.Id, Operator.Equal, transactionId)
  .selectAllUnsafe[Transaction]
  .runSingle

// What everyone uses.
queryBuilder
  .where(Transaction.Id, Operator.Equal, transactionId)
  .selectAllExplicit[Transaction](Transaction)
  .runSingle

The implementation of selectAllExplicit is the whole point of the exercise — it’s a few lines:

def selectAllExplicit[A](
  companionObject: BaseMySqlModelCompanionObject[_, _]
)(implicit a: Read[A]): SelectQueryExecutor[A] = {
  val columnFragments = companionObject.columnOrdering.map(_.fNameForSelectClauses)
  val fragment = fragments.comma(NonEmptyList.fromListUnsafe(columnFragments.toList))
  queryExecutorCreator.createSelectQueryExecutor(getCompleteFragment(fragment).query[A])
}

That’s it. It generates SELECT id, user_id, created_at, updated_at, amount FROM transactions instead of SELECT *, and the positional Read[Transaction] decoder lines up with both the case class and the column list because we control both.

The Unsafe name on the other method is deliberate. People will avoid the word.

Type-aware fragment interpolation

Doobie’s fr interpolator wants types with Put instances. When you’re building queries dynamically — a generic UPDATE generator, for instance — you don’t have the type at compile time, you have an Any. The escape hatch is a runtime dispatch that hands off to the right Put:

def interpolate(any: Any): Fragment = any match {
  case str: String => fr"$str"
  case long: Long => fr"$long"
  case int: Int => fr"$int"
  case bool: Boolean => fr"$bool"
  case bigDecimal: BigDecimal => fr"$bigDecimal"
  case Some(value) => interpolate(value)
  case None => Fragment.const("NULL")
  case value if DoobieGetsAndPuts.putMap.contains(value.getClass) =>
    implicit val put: Meta[value.type] =
      DoobieGetsAndPuts.putMap(value.getClass).asInstanceOf[Meta[value.type]]
    toPutOps[value.type](value).fr
  case _ =>
    throw new IllegalArgumentException(s"Unsupported type: ${any.getClass}")
}

This isn’t beautiful — runtime type dispatch on Any never is — but it lives in exactly one place. New domain types register their Meta in DoobieGetsAndPuts.putMap and become usable everywhere:

object DoobieGetsAndPuts {
  implicit val dateTimeMeta: Meta[DateTime] =
    Meta[Timestamp].timap(ts => new DateTime(ts.getTime))(dt => new Timestamp(dt.getMillis))

  implicit val entityTypeMeta: Meta[EntityType] =
    Meta[String].timap(EntityType.fromString)(_.modelStr)

  implicit val jsonMeta: Meta[JsValue] =
    Meta[String].timap(Json.parse)(Json.stringify)

  val putMap: Map[Class[_], Meta[_]] = Map(
    classOf[DateTime] -> dateTimeMeta,
    classOf[EntityType] -> entityTypeMeta,
    classOf[JsValue] -> jsonMeta
  )
}

Diff-based UPDATE

The other half of the framework is diffFragment. Given the old object and the new object, it walks the case class fields, finds the ones that changed, and emits column = newValue clauses for each. The fieldsToAddForUpdates set covers managed columns like updated_at = NOW() that should always be updated regardless of user changes:

def diffFragment(
  oldObj: T,
  newObj: T,
  shouldUpdateDefaultFieldsForEmptyUserDefinedUpdates: Boolean
): Option[Fragment] = {
  preCheckForUpdates(oldObj, newObj)

  val defaultUpdateFragments = fieldsToAddForUpdates
  val fieldsToIgnore = defaultUpdateFragments.map(_.columnName).toSet

  val userDefinedUpdateFragments = newObj
    .productElementNames
    .zip(oldObj.productIterator)
    .zip(newObj.productIterator)
    .flatMap { case ((name, oldVal), newVal) =>
      if (oldVal != newVal) {
        val dbName = convertCamelToSnake(name)
        if (fieldsToIgnore.contains(dbName)) None
        else Some(Fragment.const(s"$dbName =") ++ interpolate(newVal))
      } else None
    }
    .toList

  if (userDefinedUpdateFragments.isEmpty &&
      !shouldUpdateDefaultFieldsForEmptyUserDefinedUpdates) {
    None
  } else {
    val allFragments = userDefinedUpdateFragments ++
                       defaultUpdateFragments.map(_.getFragment)
    Some(fragments.set(NonEmptyList.fromListUnsafe(allFragments)))
  }
}

Returning Option[Fragment] is the small detail that paid off most often — when nothing changed, callers get None and can skip the round-trip entirely instead of issuing a no-op UPDATE.

The camelCase-to-snake_case translation is mechanical, with a small override map for the cases where the database disagrees with the convention — mySqlId maps to id, etc. The Opt suffix gets dropped so userIdOpt reads as user_id:

def convertCamelToSnake(name: String): String = {
  val caseClassName = if (name.endsWith("Opt")) name.dropRight(3) else name
  CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, caseClassName)
}

override def overriddenMap: Map[String, String] = Map(
  "mySqlId" -> "id",
  "jsonData" -> "data"
)

Operators carry their arity

The where method is overloaded by operator kind. Each operator type knows how many arguments it takes, so IsNull can’t accidentally be called with a value and Equal can’t be called without one:

sealed trait BinaryOperator extends Operator {
  def getFragment[T: Put](f: F[T, _], value: T): Fragment
}

sealed trait UnaryOperator extends Operator {
  def getFragment[T](f: F[T, _]): Fragment
}

sealed trait ManyOperator extends Operator {
  def getFragment[T: Put](f: F[T, _], value1: T, remaining: Iterable[T]): Fragment
}

sealed trait ThreeOperator extends Operator {
  def getFragment[T: Put](f: F[T, _], low: T, high: T): Fragment
}

Concrete operators go in their own case objects:

case object Equal extends BinaryOperator {
  override def getFragment[T: Put](f: F[T, _], value: T): Fragment =
    f.fNameForNonSelectClauses ++ fr"= $value"
}

case object IsNull extends UnaryOperator {
  override def getFragment[T](f: F[T, _]): Fragment =
    f.fNameForNonSelectClauses ++ Fragment.const("IS NULL")
}

case object In extends ManyOperator {
  override def getFragment[T: Put](f: F[T, _], value1: T, remaining: Iterable[T]): Fragment =
    f.fNameForNonSelectClauses ++ fr"IN (" ++
    fragments.comma(NonEmptyList(fr"$value1", remaining.map(v => fr"$v").toList)) ++
    Fragment.const(")")
}

The ManyOperator signature insisting on value1 plus remaining: Iterable[T] is deliberate — IN () is a SQL error, so we make it impossible to construct an empty list at the type level.

In use, queries read close to the SQL they emit:

Transaction
  .getDefaultQuery(queryContext, queryExecutor)
  .leftJoin(
    User.getBaseQuery(queryExecutor),
    Transaction.UserId,
    User.Id,
    shouldSearchForeignTable = false
  )
  .where(Transaction.Amount, Operator.GreaterThan, BigDecimal(100))
  .where(Transaction.CreatedAt, Operator.Between, startDate, endDate)
  .orderBy(Transaction.CreatedAt, SortOrder.Desc)
  .withPageNoAndPageSize(pageNo = 1, pageSize = 50)
  .selectAllExplicit(Transaction, User)
  .runList

Tenancy by default

The other class of bug we wanted out of the way is forgetting the team_id clause on a multi-tenant query. The fix is a query context that gets threaded through every call, plus a model trait that auto-injects the filter:

sealed trait MySqlQueryContext {
  def teamIdOpt: Option[Long]
  def shouldIncludeDeletedRecords: Boolean
}

case class TeamScopedQueryContext(
  teamId: Long,
  shouldIncludeDeletedRecords: Boolean = false
) extends MySqlQueryContext {
  override def teamIdOpt: Option[Long] = Some(teamId)
}

abstract class TeamScopedMySqlModelCompanionObject[T <: TeamScopedMySqlModel](
  override val TableName: String
) extends IdBasedMySqlModelCompanionObject[T](TableName) {
  val TeamId: F[Long, Long] = F("team_id", TableName)

  override def getDefaultQuery(
    qc: MySqlQueryContext,
    queryExecutorCreator: QueryExecutorCreator
  ): QueryBuilder = {
    val queryBuilder = super.getDefaultQuery(qc, queryExecutorCreator)
    qc.teamIdOpt match {
      case Some(teamId) => queryBuilder.where(TeamId, Operator.Equal, teamId)
      case None => queryBuilder
    }
  }
}

The AllScopedQueryContext exists for admin tooling that genuinely needs to read across tenants — and grepping for it in code review is how we caught a couple of unintentional cross-tenant queries before they shipped. If teamIdOpt is always wrapped in a context type, the choice to skip the filter has to be visible.

Soft deletes get the same treatment — a separate trait that adds deleted_at IS NULL to the default query unless the context opts in:

override def getDefaultQuery(qc: MySqlQueryContext, queryExecutorCreator: QueryExecutorCreator): QueryBuilder = {
  val queryBuilder = super.getDefaultQuery(qc, queryExecutorCreator)
  if (qc.shouldIncludeDeletedRecords) queryBuilder
  else queryBuilder.where(DeletedAtOpt, Operator.IsNull)
}

The test that holds it together

The whole design hinges on columnOrdering matching the case class. We pin that with one test per model:

"Transaction companion object" should {
  "have columnOrdering matching case class field order" in {
    val expectedFields = Seq("mySqlId", "userId", "createdAt", "updatedAt", "amount")
    val actualFields = Transaction.columnOrdering.map(_.columnName)

    actualFields should contain theSameElementsInOrderAs expectedFields.map(
      CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, _)
    )
  }
}

It’s the kind of test that exists purely to hold up an invariant the compiler can’t see. When someone adds a field and forgets to update columnOrdering, the test fails before CI ever touches a database.

Migrations are still annoying

The migration discipline that this design encourages is: add columns at the end of the table, add the F[_, _] reference, append it to columnOrdering. If you absolutely must ADD COLUMN ... AFTER, update columnOrdering in the same commit. Case class field order doesn’t matter for correctness — only columnOrdering does — but it matters for readability, so we keep them aligned anyway.

The framework doesn’t make migrations safe. It makes them visible. The columnOrdering line shows up as a diff in code review next to the Flyway script, and that’s enough.

What it cost

The downside is the boilerplate. Every model has a companion object that’s mostly mechanical. The F references repeat the column name. The columnOrdering list repeats the field list. We considered a macro, decided it wasn’t worth the debugging tax, and lived with the repetition. The test catches the one thing that goes wrong.

The other tradeoff is that interpolate(any: Any) is a runtime dispatch — it’s the seam where the type-safety guarantee breaks down. We kept it small and central; new domain types get added to one map; everything else routes through Put instances at compile time. Not a perfect story, but a confined one.

If you’re starting a Scala service today, Doobie plus a thin wrapper of this shape is still the move I’d make. ORMs do too much. Raw sql"" interpolation does too little. The middle is small and lives mostly in your repo, where you can read it.

Back to Blog

Related Posts

View All Posts »

We Tripled the Test Suite. Then Everything Else Had to Change.

I started the quarter trying to raise test coverage on one monorepo. I finished it having also rewritten the test pipeline, the coverage gate, the deploy workflows, and most of the backend's dependency stack. None of it was on the ticket. All of it followed from the ticket.