Designing Type-Safe Query DSLs in Scala

Notes on the Scala query DSL we built over MongoDB to stop renames from silently breaking queries at runtime.

The first time a field rename took down a query in production, the diff looked harmless. Someone renamed price to basePrice somewhere in the model, the unit tests passed, the integration tests passed, and a dashboard quietly started returning empty arrays a few hours later. The query string still referenced "price". There was no compile error to catch, because there was nothing for the compiler to check. It was just text inside a BsonDocument.

That’s the case for moving queries into the type system. The MongoDB Scala driver is happy to accept this:

collection.find(
  BsonDocument("""{
    "userId": "12345",
    "status": { "$in": ["active", "pending"] },
    "createdAt": { "$gte": { "$date": "2024-01-01T00:00:00Z" } }
  }""")
)

Field name typos go undetected until runtime. A string where an int was expected gives you a cryptic Bson error. Rename a field on the case class and the JSON literal sails through unchanged. None of the IDE features that make refactoring tolerable — autocomplete, jump-to-definition, find-usages — work on text inside a triple-quoted string.

This post walks through the shape of the DSL we ended up with. It’s MongoDB-specific, and it’s not novel (Slick, Quill, and Phantom solve adjacent problems for other stores), but it’s small enough to build in-house and worth understanding from the bottom up.

Field references with types

The foundation is a field reference parameterised by both the model it belongs to and the value type it holds:

case class Field[Model, FieldType](
  name: String,
  converter: StrConverter[FieldType]
) {
  def path: String = name

  def toBson(value: FieldType): BsonValue =
    converter.toBson(value)
}

object Field {
  def apply[M, T](name: String)(implicit conv: StrConverter[T]): Field[M, T] =
    new Field[M, T](name, conv)
}

Model is a phantom type — it never appears at runtime, but it lets us refuse to compile a query that mixes Product fields into an Order query. FieldType is what carries the type-safety we actually wanted: when you say where(Product.PriceField, GreaterThan, x), the compiler insists x is a BigDecimal.
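To make the two guarantees concrete, here is a minimal sketch that runs without the driver. The simplified Query class and the value names are stand-ins invented for the illustration, not the real builder; the two commented-out lines are the ones the compiler rejects.

```scala
object PhantomFieldSketch {
  final case class Product(name: String, price: BigDecimal)
  final case class Order(customerId: String)

  // Model is never instantiated through Field; it only tags the field at compile time.
  final case class Field[Model, FieldType](name: String)

  val ProductPrice: Field[Product, BigDecimal] = Field[Product, BigDecimal]("price")
  val OrderCustomer: Field[Order, String] = Field[Order, String]("customerId")

  // Hypothetical, simplified query: it records (path, value) pairs as strings.
  final case class Query[Model](clauses: Vector[(String, String)] = Vector.empty) {
    def where[T](field: Field[Model, T], value: T): Query[Model] =
      copy(clauses = clauses :+ (field.name -> value.toString))
  }

  val ok: Query[Product] = Query[Product]().where(ProductPrice, BigDecimal(100))

  // Neither of these compiles:
  // Query[Product]().where(OrderCustomer, "c-1")  // Field[Order, _] is not a Field[Product, _]
  // Query[Product]().where(ProductPrice, "cheap") // String is not a BigDecimal
}
```

Because Field is invariant in both parameters, the compiler cannot widen T to Any to make a mismatched value fit.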

The conversion to BSON is delegated to a StrConverter type class, instances of which we summon implicitly:

trait StrConverter[T] {
  def fromString(str: String): T
  def toString(value: T): String
  def toBson(value: T): BsonValue
  def fromBson(bson: BsonValue): T
}

object StrConverter {
  implicit val stringConverter: StrConverter[String] = new StrConverter[String] {
    override def fromString(str: String): String = str
    override def toString(value: String): String = value
    override def toBson(value: String): BsonValue = BsonString(value)
    override def fromBson(bson: BsonValue): String = bson.asString().getValue
  }

  implicit val longConverter: StrConverter[Long] = new StrConverter[Long] {
    override def fromString(str: String): Long = str.toLong
    override def toString(value: Long): String = value.toString
    override def toBson(value: Long): BsonValue = BsonInt64(value)
    override def fromBson(bson: BsonValue): Long = bson.asInt64().getValue
  }

  implicit val dateTimeConverter: StrConverter[DateTime] = new StrConverter[DateTime] {
    override def fromString(str: String): DateTime = DateTime.parse(str)
    override def toString(value: DateTime): String = value.toString()
    override def toBson(value: DateTime): BsonValue = BsonDateTime(value.getMillis)
    override def fromBson(bson: BsonValue): DateTime = new DateTime(bson.asDateTime().getValue)
  }

  implicit def optionConverter[T](implicit inner: StrConverter[T]): StrConverter[Option[T]] =
    new StrConverter[Option[T]] {
      // None is written as "", so "" must read back as None to keep the round trip symmetric.
      override def fromString(str: String): Option[T] =
        if (str.isEmpty) None else Some(inner.fromString(str))
      // Explicit lambda: toString is overloaded with Any#toString(), so avoid eta-expansion.
      override def toString(value: Option[T]): String =
        value.map(v => inner.toString(v)).getOrElse("")
      override def toBson(value: Option[T]): BsonValue =
        value.map(inner.toBson).getOrElse(BsonNull())
      override def fromBson(bson: BsonValue): Option[T] =
        if (bson.isNull) None else Some(inner.fromBson(bson))
    }
}
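Fields typed as BigDecimal or List[...] need instances beyond the four shown. The sketch below shows how those could be derived, under loud assumptions: Bson, BStr, BDec, and BArr are a stand-in ADT so the code runs on its own, and BsonConverter is a trimmed, hypothetical version of StrConverter with only the BSON half. With the real driver the natural targets would presumably be BsonDecimal128 and BsonArray.

```scala
object ExtraConverterSketch {
  // Stand-in for the driver's BsonValue hierarchy.
  sealed trait Bson
  final case class BStr(value: String) extends Bson
  final case class BDec(value: BigDecimal) extends Bson
  final case class BArr(values: List[Bson]) extends Bson

  // Trimmed to the BSON half of the type class.
  trait BsonConverter[T] {
    def toBson(value: T): Bson
    def fromBson(bson: Bson): T
  }

  private def unexpected(expected: String, got: Bson): Nothing =
    throw new IllegalArgumentException("expected " + expected + ", got " + got)

  object BsonConverter {
    implicit val stringConverter: BsonConverter[String] = new BsonConverter[String] {
      def toBson(value: String): Bson = BStr(value)
      def fromBson(bson: Bson): String = bson match {
        case BStr(s) => s
        case other   => unexpected("string", other)
      }
    }

    implicit val bigDecimalConverter: BsonConverter[BigDecimal] = new BsonConverter[BigDecimal] {
      def toBson(value: BigDecimal): Bson = BDec(value)
      def fromBson(bson: Bson): BigDecimal = bson match {
        case BDec(d) => d
        case other   => unexpected("decimal", other)
      }
    }

    // Derived instance: any element type with a converter gets a List converter for free.
    implicit def listConverter[T](implicit inner: BsonConverter[T]): BsonConverter[List[T]] =
      new BsonConverter[List[T]] {
        def toBson(value: List[T]): Bson = BArr(value.map(v => inner.toBson(v)))
        def fromBson(bson: Bson): List[T] = bson match {
          case BArr(vs) => vs.map(v => inner.fromBson(v))
          case other    => unexpected("array", other)
        }
      }
  }

  def encode[T](value: T)(implicit conv: BsonConverter[T]): Bson = conv.toBson(value)
  def decode[T](bson: Bson)(implicit conv: BsonConverter[T]): T = conv.fromBson(bson)

  val encodedTags: Bson = encode(List("sale", "new"))
  val decodedTags: List[String] = decode[List[String]](encodedTags)
  val encodedPrice: Bson = encode(BigDecimal("9.99"))
}
```

Each extra layer of derivation (List inside Option, and so on) is one more implicit search for the compiler, which is where the compile-time cost discussed at the end comes from.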

Field definitions live on the model’s companion object. One file, one schema:

case class Product(
  id: String,
  name: String,
  price: BigDecimal,
  category: String,
  createdAt: DateTime,
  tagsOpt: Option[List[String]]
)

object Product {
  val IdField = Field[Product, String]("_id")
  val NameField = Field[Product, String]("name")
  val PriceField = Field[Product, BigDecimal]("price")
  val CategoryField = Field[Product, String]("category")
  val CreatedAtField = Field[Product, DateTime]("created_at")
  val TagsOptField = Field[Product, Option[List[String]]]("tags")

  // Nested field access
  val PricingBaseField = Field[Product, BigDecimal]("pricing.basePrice")
}

The dot-path string for nested fields is the one place we couldn’t push everything into the type system — Mongo’s nested syntax is just a string, and chasing it deeper than this got expensive in compile time without much payoff.

Operators

Three shapes cover most of what we need from Mongo: binary (field op value), unary (field op), and many (field op [values]). Sealed traits make the dispatch on where(...) total:

sealed trait Operator

sealed trait BinaryOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T], value: T): BsonDocument
}

case object Equal extends BinaryOperator {
  override def mongoOp: String = "$eq"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> field.toBson(value))
}

case object GreaterThan extends BinaryOperator {
  override def mongoOp: String = "$gt"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

case object LessThan extends BinaryOperator {
  override def mongoOp: String = "$lt"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

case object GreaterThanOrEqual extends BinaryOperator {
  override def mongoOp: String = "$gte"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

sealed trait UnaryOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T]): BsonDocument
}

case object Exists extends UnaryOperator {
  override def mongoOp: String = "$exists"
  override def toBson[T](field: Field[_, T]): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> BsonBoolean(true)))
}

case object IsNull extends UnaryOperator {
  override def mongoOp: String = "$eq"
  // { field: null } matches documents where the field is null and where it is missing.
  override def toBson[T](field: Field[_, T]): BsonDocument =
    BsonDocument(field.path -> BsonNull())
}

sealed trait ManyOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument
}

case object In extends ManyOperator {
  override def mongoOp: String = "$in"
  override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
    BsonDocument(
      field.path -> BsonDocument(
        mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
      )
    )
}

case object NotIn extends ManyOperator {
  override def mongoOp: String = "$nin"
  override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
    BsonDocument(
      field.path -> BsonDocument(
        mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
      )
    )
}

The where method on the query then has three overloads, one per operator shape. The compiler picks the overload from the operator’s static type (BinaryOperator, UnaryOperator, or ManyOperator) together with the arguments that follow it: a single value, nothing, or one or more values.
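The overload resolution can be tried in isolation. This is a sketch, not the real builder: filters are rendered as JSON-like strings instead of BsonDocuments, the Query class and the lit/doc helpers are invented for the illustration, and the price field is typed Int purely to keep the rendering trivial.

```scala
object OperatorDispatchSketch {
  sealed trait Operator
  sealed trait BinaryOperator extends Operator { def mongoOp: String }
  sealed trait UnaryOperator extends Operator
  sealed trait ManyOperator extends Operator { def mongoOp: String }

  case object Equal extends BinaryOperator { val mongoOp = "$eq" }
  case object GreaterThan extends BinaryOperator { val mongoOp = "$gt" }
  case object Exists extends UnaryOperator
  case object In extends ManyOperator { val mongoOp = "$in" }

  final case class Field[Model, T](path: String)

  // Render a value and a one-key document as JSON-like text.
  private def lit(value: Any): String = value match {
    case s: String => "\"" + s + "\""
    case other     => other.toString
  }
  private def doc(key: String, value: String): String =
    "{\"" + key + "\": " + value + "}"

  final case class Query[Model](filters: Vector[String] = Vector.empty) {
    // Chosen when the operator is a BinaryOperator and one value follows.
    def where[T](field: Field[Model, T], op: BinaryOperator, value: T): Query[Model] = {
      val filter = op match {
        case Equal => doc(field.path, lit(value))
        case other => doc(field.path, doc(other.mongoOp, lit(value)))
      }
      copy(filters = filters :+ filter)
    }

    // Chosen when the operator is a UnaryOperator and nothing follows.
    def where[T](field: Field[Model, T], op: UnaryOperator): Query[Model] =
      op match {
        case Exists => copy(filters = filters :+ doc(field.path, doc("$exists", "true")))
      }

    // Chosen when the operator is a ManyOperator and at least one value follows.
    def where[T](field: Field[Model, T], op: ManyOperator, first: T, rest: T*): Query[Model] = {
      val values = (first +: rest).map(v => lit(v)).mkString("[", ", ", "]")
      copy(filters = filters :+ doc(field.path, doc(op.mongoOp, values)))
    }
  }

  final class Product
  val Category = Field[Product, String]("category")
  val Price = Field[Product, Int]("price")

  val query: Query[Product] = Query[Product]()
    .where(Category, Equal, "electronics")
    .where(Price, GreaterThan, 100)
    .where(Category, In, "electronics", "computers")
    .where(Price, Exists)
}
```

Passing In with a single value still resolves to the ManyOperator overload, because In is not a BinaryOperator; the operator type, not the argument count, is what rules the other overloads out.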

The query builder

ReadQuery is immutable and builder-shaped. Every method returns a new instance with one more filter, sort, or limit attached. We were using Scala 2.12 with Future-based MongoDB driver calls, so execution methods return Future[…]:

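// The driver's find needs a ClassTag for the result type, hence the context bound.
class ReadQuery[Model: scala.reflect.ClassTag](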
  collection: MongoCollection[Model],
  filters: Seq[BsonDocument] = Seq.empty,
  sortOpt: Option[BsonDocument] = None,
  limitOpt: Option[Int] = None,
  skipOpt: Option[Int] = None
) {

  def where[T](field: Field[Model, T], operator: BinaryOperator, value: T): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field, value),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def where[T](field: Field[Model, T], operator: UnaryOperator): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def where[T](
    field: Field[Model, T],
    operator: ManyOperator,
    value1: T,
    remaining: T*
  ): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field, value1 +: remaining),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def orderBy[T](field: Field[Model, T], ascending: Boolean = true): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters,
      Some(BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))),
      limitOpt,
      skipOpt
    )

  def limit(n: Int): ReadQuery[Model] =
    new ReadQuery(collection, filters, sortOpt, Some(n), skipOpt)

  def skip(n: Int): ReadQuery[Model] =
    new ReadQuery(collection, filters, sortOpt, limitOpt, Some(n))

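  def fetchOne: Future[Option[Model]] =
    // toFutureOption() turns "no match" into None without a Future.map,
    // which would need an implicit ExecutionContext that this class does not take.
    buildFindIterable.limit(1).first().toFutureOption()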

  def fetchList: Future[Seq[Model]] = {
    val query = buildFindIterable
    query.toFuture()
  }

  def fetchAll: Future[Seq[Model]] = fetchList

  def count: Future[Long] = {
    val filter = combineFilters
    collection.countDocuments(filter).toFuture()
  }

  private def buildFindIterable: FindIterable[Model] = {
    val filter = combineFilters
    var query = collection.find(filter)

    sortOpt.foreach(sort => query = query.sort(sort))
    limitOpt.foreach(limit => query = query.limit(limit))
    skipOpt.foreach(skip => query = query.skip(skip))

    query
  }

  private def combineFilters: BsonDocument = {
    if (filters.isEmpty) {
      BsonDocument()
    } else if (filters.size == 1) {
      filters.head
    } else {
      BsonDocument("$and" -> BsonArray.fromIterable(filters))
    }
  }
}

Multiple where calls combine with $and. We never had a need to expose $or at the top level — the few places that wanted it were rare enough to drop down to raw BsonDocument for that one query.
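The three cases of combineFilters are easy to check on their own. In this sketch the filters are pre-rendered JSON strings rather than BsonDocuments, and the combine function is a stand-in written for the illustration.

```scala
object CombineFiltersSketch {
  // Mirrors combineFilters: no filter, a single filter passed through, or an $and of all of them.
  def combine(filters: Seq[String]): String = filters match {
    case Seq()       => "{}"
    case Seq(single) => single
    case many        => many.mkString("{\"$and\": [", ", ", "]}")
  }

  val none: String = combine(Seq.empty)
  val one: String = combine(Seq("{\"category\": \"electronics\"}"))
  val range: String = combine(Seq("{\"price\": {\"$gt\": 100}}", "{\"price\": {\"$lt\": 1000}}"))
}
```

Two conditions on the same field stay as separate $and entries rather than being merged into one sub-document, which keeps the builder free of any per-field bookkeeping.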

At the call site:

Product.repository
  .where(Product.CategoryField, Equal, "electronics")
  .fetchList

Product.repository
  .where(Product.PriceField, GreaterThan, BigDecimal(100))
  .where(Product.PriceField, LessThan, BigDecimal(1000))
  .orderBy(Product.PriceField, ascending = true)
  .fetchList

Product.repository
  .where(Product.CategoryField, In, "electronics", "computers", "gadgets")
  .fetchList

Product.repository
  .where(Product.TagsOptField, IsNull)
  .fetchList

Product.repository
  .where(Product.CategoryField, Equal, "electronics")
  .where(Product.PriceField, GreaterThanOrEqual, BigDecimal(50))
  .where(Product.TagsOptField, Exists)
  .orderBy(Product.CreatedAtField, ascending = false)
  .limit(100)
  .fetchList

Aggregations

Aggregation pipelines have more shapes than find, so the builder is wider but the same idea — accumulate stages, hand them to the driver at the end:

class AggregateQueryBuilder[Model](
  collection: MongoCollection[Model],
  stages: Seq[BsonDocument] = Seq.empty
) {

  def matchStage(filters: BsonDocument): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$match" -> filters)
    )

  def groupBy[T](
    field: Field[Model, T],
    accumulators: (String, BsonDocument)*
  ): AggregateQueryBuilder[Model] = {
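    // org.bson.BsonDocument has no ++, so build the element list first:
    // "_id" carries the "$field" reference and the accumulators follow as further keys.
    val groupElems: Seq[(String, BsonValue)] =
      ("_id" -> BsonString(s"$$${field.path}")) +: accumulators
    val groupDoc = BsonDocument(groupElems)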

    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$group" -> groupDoc)
    )
  }

  def addFields(fields: (String, BsonValue)*): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      // Pass the pairs as one collection rather than splatting them as varargs.
      stages :+ BsonDocument("$addFields" -> BsonDocument(fields))
    )

  def sort[T](field: Field[Model, T], ascending: Boolean = true): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument(
        "$sort" -> BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))
      )
    )

  def limit(n: Int): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$limit" -> BsonInt32(n))
    )

  def skip(n: Int): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$skip" -> BsonInt32(n))
    )

  def lookup(
    from: String,
    localField: String,
    foreignField: String,
    as: String
  ): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument(
        "$lookup" -> BsonDocument(
          "from" -> BsonString(from),
          "localField" -> BsonString(localField),
          "foreignField" -> BsonString(foreignField),
          "as" -> BsonString(as)
        )
      )
    )

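  // The driver's aggregate needs a ClassTag for R; R's codec must also be registered on the collection.
  def execute[R](implicit ct: scala.reflect.ClassTag[R]): Future[Seq[R]] =
    collection.aggregate[R](stages).toFuture()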
}

$lookup is another place we chose not to push into the type system. Cross-collection joins want a second Field[OtherModel, _], and at that point you’re modelling foreign keys, which is a different post. The string-typed from/localField/foreignField is the escape hatch, and we accepted it.

Product.repository
  .aggregate
  .groupBy(
    Product.CategoryField,
    "count" -> BsonDocument("$sum" -> BsonInt32(1)),
    "avgPrice" -> BsonDocument("$avg" -> BsonString("$price"))
  )
  .sort(Product.CategoryField, ascending = true)
  .limit(10)
  .execute[CategoryStats]
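The "$price" literal in that example is the same rename hazard the DSL exists to remove. One way to close it, sketched here and not part of the builder above, is to derive the reference from the field's path. The helpers fieldRef, avg, count, and groupStage are hypothetical names, and the stage is rendered as a JSON-like string so the sketch runs without the driver.

```scala
object AccumulatorSketch {
  final case class Field[Model, T](path: String)

  // "$" + path is how an aggregation expression refers to a document field.
  def fieldRef(field: Field[_, _]): String = "\"$" + field.path + "\""

  // Hypothetical accumulator helpers returning (output name, rendered expression).
  def avg[M](name: String, field: Field[M, _]): (String, String) =
    name -> ("{\"$avg\": " + fieldRef(field) + "}")
  def count(name: String): (String, String) =
    name -> "{\"$sum\": 1}"

  def groupStage[M](by: Field[M, _], accumulators: (String, String)*): String = {
    val entries = ("_id" -> fieldRef(by)) +: accumulators
    entries
      .map { case (key, value) => "\"" + key + "\": " + value }
      .mkString("{\"$group\": {", ", ", "}}")
  }

  final class Product
  val Category = Field[Product, String]("category")
  val Price = Field[Product, BigDecimal]("price")

  val stage: String = groupStage(Category, count("count"), avg("avgPrice", Price))
}
```

With this shape, changing the path in the Price field definition updates the $avg expression as well, with no string to hunt down.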

Element match for nested arrays

Querying inside arrays needs $elemMatch. We didn’t model it as an operator on Field — it takes a sub-document, not a single value, and the type machinery to express “predicate on the element type of a list field” wasn’t worth the compile-time cost. We exposed a couple of helpers per model:

case class Order(
  id: String,
  customerId: String,
  items: List[OrderItem],
  createdAt: DateTime
)

case class OrderItem(
  productId: String,
  quantity: Int,
  price: BigDecimal
)

object Order {
  val ItemsField = Field[Order, List[OrderItem]]("items")

  def itemsWithProductId(productId: String): BsonDocument =
    BsonDocument(
      "items" -> BsonDocument(
        "$elemMatch" -> BsonDocument(
          "productId" -> BsonString(productId)
        )
      )
    )

  def itemsWithQuantityGreaterThan(quantity: Int): BsonDocument =
    BsonDocument(
      "items" -> BsonDocument(
        "$elemMatch" -> BsonDocument(
          "quantity" -> BsonDocument("$gt" -> BsonInt32(quantity))
        )
      )
    )
}

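These helpers are the seam between raw documents and the typed builder: each returns a plain BsonDocument that can be handed to matchStage on the aggregation builder. They are rare enough to be acceptable.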

Pagination

Two pagination shapes ended up in the codebase. Skip/limit for pages we knew were small, cursor-based for anything that could grow:

sealed trait PaginationStrategy
case class Skip(offset: Int) extends PaginationStrategy
case class PageNumber(page: Int, pageSize: Int) extends PaginationStrategy {
  def toSkip: Skip = Skip((page - 1) * pageSize)
}

def fetchPage(strategy: PaginationStrategy, pageSize: Int): Future[Seq[Product]] =
  strategy match {
    case Skip(offset) =>
      Product.repository
        .orderBy(Product.CreatedAtField, ascending = false)
        .limit(pageSize)
        .skip(offset)
        .fetchList

    case p: PageNumber =>
      // Reuse the matched value; the page's own pageSize wins over the outer argument.
      fetchPage(p.toSkip, p.pageSize)
  }

def fetchAfterCursor(
  lastSeenId: Option[String],
  pageSize: Int
): Future[Seq[Product]] = {
  val query = Product.repository

  val withCursor = lastSeenId match {
    case Some(id) =>
      query.where(Product.IdField, GreaterThan, id)
    case None =>
      query
  }

  withCursor
    .orderBy(Product.IdField, ascending = true)
    .limit(pageSize)
    .fetchList
}

Skip/limit gets slow past a few thousand documents, because Mongo still walks every skipped document before returning the page, so the cost grows with the offset. Cursor-based is the default we’d reach for now.
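The calling side of cursor pagination is a loop that feeds the last id of each page back in as the next cursor. Here is a sketch against an in-memory stand-in; the ids vector and the allPages helper are invented for the illustration, and the real version would return Futures.

```scala
object CursorPagingSketch {
  // In-memory stand-in for a collection's _id values.
  val ids: Vector[String] = Vector("a03", "a01", "a05", "a02", "a04")

  // Mirrors fetchAfterCursor: ids strictly greater than the cursor, ascending, capped at pageSize.
  def fetchAfterCursor(lastSeenId: Option[String], pageSize: Int): Vector[String] =
    ids.sorted
      .filter(id => lastSeenId.forall(last => id.compareTo(last) > 0))
      .take(pageSize)

  // Walks every page; stops at the first empty one.
  def allPages(pageSize: Int): Vector[Vector[String]] = {
    @scala.annotation.tailrec
    def loop(cursor: Option[String], acc: Vector[Vector[String]]): Vector[Vector[String]] = {
      val page = fetchAfterCursor(cursor, pageSize)
      if (page.isEmpty) acc else loop(page.lastOption, acc :+ page)
    }
    loop(None, Vector.empty)
  }

  val pages: Vector[Vector[String]] = allPages(2)
}
```

The approach relies on the cursor field being unique and sorted the same way on every call, which is why _id is the natural choice: no document is skipped or repeated between pages.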

Repositories

The DSL doesn’t replace repositories; it makes them shorter:

class ProductRepository(collection: MongoCollection[Product]) {
  def findById(id: String): Future[Option[Product]] =
    new ReadQuery(collection)
      .where(Product.IdField, Equal, id)
      .fetchOne

  def findByCategory(category: String, page: Int, pageSize: Int): Future[Seq[Product]] =
    new ReadQuery(collection)
      .where(Product.CategoryField, Equal, category)
      .orderBy(Product.CreatedAtField, ascending = false)
      .limit(pageSize)
      .skip((page - 1) * pageSize)
      .fetchList

  def findExpensiveProducts(minPrice: BigDecimal): Future[Seq[Product]] =
    new ReadQuery(collection)
      .where(Product.PriceField, GreaterThanOrEqual, minPrice)
      .orderBy(Product.PriceField, ascending = false)
      .fetchList

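  def searchByName(query: String): Future[Seq[Product]] = {
    // Reuse the field's path so the raw filter still follows a change to NameField.
    val regex = BsonDocument("$regex" -> BsonString(query), "$options" -> BsonString("i"))
    collection.find(BsonDocument(Product.NameField.path -> regex)).toFuture()
  }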

  def countByCategory(category: String): Future[Long] =
    new ReadQuery(collection)
      .where(Product.CategoryField, Equal, category)
      .count
}

searchByName drops out of the DSL on purpose — $regex with options didn’t justify a typed wrapper, and the one place we use it is internal-tools-only.

What we got, what we paid

The DSL is worth building on a long-lived collection that several people touch and rename over time. It’s not worth building for a single-purpose service with three fixed queries that won’t change — there, the string queries are honest and the indirection costs more than it saves.

The cost is mostly compile time. Implicit resolution for StrConverter instances is fine, but stack a few of them inside Option[List[Foo]] and Scala 2.12 starts taking longer than I’d like. We capped nested generics at one level for that reason. In Scala 3 the implicit machinery is faster and a lot of this would be cleaner with given and using, but we never ported it.

The benefit, for us, was the boring kind: renames stopped being scary. Each stored field name lives in exactly one Field definition, the compiler flags every query that breaks when a field’s type or its definition changes, and the diff for adding a new query is short enough to read in code review without squinting at JSON.
