Designing type-safe query DSLs in Scala

The first time a field rename took down a query in production, the diff looked harmless. Someone renamed price to basePrice somewhere in the model, the unit tests passed, the integration tests passed, and a dashboard quietly started returning empty arrays a few hours later. The query string still referenced "price". There was no compile error to catch, because there was nothing for the compiler to check. It was just text inside a BsonDocument.

That’s the case for moving queries into the type system. The MongoDB Scala driver is happy to accept this:

collection.find(
  BsonDocument("""{
    "userId": "12345",
    "status": { "$in": ["active", "pending"] },
    "createdAt": { "$gte": { "$date": "2024-01-01T00:00:00Z" } }
  }""")
)

Field name typos go undetected until runtime. A string where an int was expected gives you a cryptic Bson error. Rename a field on the case class and the JSON literal sails through unchanged. None of the IDE features that make refactoring tolerable — autocomplete, jump-to-definition, find-usages — work on text inside a triple-quoted string.

This post walks through the shape of the DSL we ended up with. It’s MongoDB-specific, and it’s not novel (Slick, Quill, and Phantom solve adjacent problems for other stores), but it’s small enough to build in-house and worth understanding from the bottom up.

Field references with types

The foundation is a field reference parameterised by both the model it belongs to and the value type it holds:

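import org.mongodb.scala.bson._

// A plain class rather than a case class: a case class would synthesise
// apply(name, converter), which erases to the same signature as the
// implicit-taking apply below and is rejected as a double definition.
final class Field[Model, FieldType](
  val name: String,
  converter: StrConverter[FieldType]
) {
  // Dot-separated path as Mongo expects it, e.g. "pricing.basePrice".
  def path: String = name

  def toBson(value: FieldType): BsonValue =
    converter.toBson(value)
}

object Field {
  // The converter is resolved implicitly from T where the field is defined.
  def apply[M, T](name: String)(implicit conv: StrConverter[T]): Field[M, T] =
    new Field[M, T](name, conv)
}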

Model is a phantom type — it never appears at runtime, but it lets us refuse to compile a query that mixes Product fields into an Order query. FieldType is what carries the type-safety we actually wanted: when you say where(Product.PriceField, GreaterThan, x), the compiler insists x is a BigDecimal.
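
To see the two type parameters doing their jobs without a database in the way, here is a minimal sketch. It assumes a cut-down Field with no converter and a toy Query that renders clauses as strings; the Product and Order marker traits and the Query class are hypothetical stand-ins, not the DSL built in the rest of this post.

```scala
object PhantomFieldSketch {
  // Stand-in for the post's Field: Model is a phantom type, T is the value type.
  final case class Field[Model, T](name: String)

  // Hypothetical marker types standing in for the real case classes.
  sealed trait Product
  sealed trait Order

  val price: Field[Product, BigDecimal] = Field("price")
  val customerId: Field[Order, String] = Field("customerId")

  // A toy query that only accepts fields tagged with its own Model.
  final case class Query[Model](clauses: Vector[String] = Vector.empty) {
    def where[T](field: Field[Model, T], value: T): Query[Model] =
      copy(clauses = clauses :+ s"${field.name} = $value")
  }

  val ok: Query[Product] = Query[Product]().where(price, BigDecimal(100))

  // Neither of these compiles:
  //   Query[Product]().where(customerId, "c-1") // an Order field in a Product query
  //   Query[Product]().where(price, "100")      // a String where a BigDecimal is required
}
```

The two commented-out lines are the point: the first is rejected because of Model, the second because of T. Nothing about either parameter survives to runtime beyond the field name.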

The conversion to BSON is delegated to a StrConverter type class, instances of which we summon implicitly:

trait StrConverter[T] {
  def fromString(str: String): T
  def toString(value: T): String
  def toBson(value: T): BsonValue
  def fromBson(bson: BsonValue): T
}

object StrConverter {
  implicit val stringConverter: StrConverter[String] = new StrConverter[String] {
    override def fromString(str: String): String = str
    override def toString(value: String): String = value
    override def toBson(value: String): BsonValue = BsonString(value)
    override def fromBson(bson: BsonValue): String = bson.asString().getValue
  }

  implicit val longConverter: StrConverter[Long] = new StrConverter[Long] {
    override def fromString(str: String): Long = str.toLong
    override def toString(value: Long): String = value.toString
    override def toBson(value: Long): BsonValue = BsonInt64(value)
    override def fromBson(bson: BsonValue): Long = bson.asInt64().getValue
  }

  implicit val dateTimeConverter: StrConverter[DateTime] = new StrConverter[DateTime] {
    override def fromString(str: String): DateTime = DateTime.parse(str)
    override def toString(value: DateTime): String = value.toString()
    override def toBson(value: DateTime): BsonValue = BsonDateTime(value.getMillis)
    override def fromBson(bson: BsonValue): DateTime = new DateTime(bson.asDateTime().getValue)
  }

  // Instances for the other field types used in this post (BigDecimal, List[T])
  // follow the same pattern and are omitted here.

  implicit def optionConverter[T](implicit inner: StrConverter[T]): StrConverter[Option[T]] =
    new StrConverter[Option[T]] {
      // Mirrors toString below, so the empty string round-trips to None.
      override def fromString(str: String): Option[T] =
        if (str.isEmpty) None else Some(inner.fromString(str))
      override def toString(value: Option[T]): String =
        value.map(v => inner.toString(v)).getOrElse("")
      override def toBson(value: Option[T]): BsonValue =
        value.map(inner.toBson).getOrElse(BsonNull())
      override def fromBson(bson: BsonValue): Option[T] =
        if (bson.isNull) None else Some(inner.fromBson(bson))
    }
}
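
The derived Option instance is the part that tends to surprise people, so here is a dependency-free sketch of the same mechanism. It assumes a tiny Bson ADT (BInt64, BNull) in place of the driver’s BsonValue and a Converter trait reduced to the two BSON methods; all of those names are hypothetical and exist only for this illustration.

```scala
object ConverterSketch {
  // Tiny stand-in for the driver's BsonValue hierarchy, so the sketch has no dependencies.
  sealed trait Bson
  final case class BInt64(value: Long) extends Bson
  case object BNull extends Bson

  // Reduced to the two BSON methods of the post's StrConverter.
  trait Converter[T] {
    def toBson(value: T): Bson
    def fromBson(bson: Bson): T
  }

  object Converter {
    implicit val longConverter: Converter[Long] = new Converter[Long] {
      def toBson(value: Long): Bson = BInt64(value)
      def fromBson(bson: Bson): Long = bson match {
        case BInt64(v) => v
        case other     => throw new IllegalArgumentException(s"expected BInt64, got $other")
      }
    }

    // Derived instance: built from whatever Converter[T] is in implicit scope.
    implicit def optionConverter[T](implicit inner: Converter[T]): Converter[Option[T]] =
      new Converter[Option[T]] {
        def toBson(value: Option[T]): Bson = value.map(inner.toBson).getOrElse(BNull)
        def fromBson(bson: Bson): Option[T] =
          if (bson == BNull) None else Some(inner.fromBson(bson))
      }
  }

  // Resolves to optionConverter(longConverter) without naming either.
  val optLong: Converter[Option[Long]] = implicitly[Converter[Option[Long]]]
}
```

Each extra layer of nesting is one more implicit the compiler has to search for, which is where the compile-time cost discussed at the end of this post comes from.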

Field definitions live on the model’s companion object. One file, one schema:

case class Product(
  id: String,
  name: String,
  price: BigDecimal,
  category: String,
  createdAt: DateTime,
  tagsOpt: Option[List[String]]
)

object Product {
  val IdField = Field[Product, String]("_id")
  val NameField = Field[Product, String]("name")
  val PriceField = Field[Product, BigDecimal]("price")
  val CategoryField = Field[Product, String]("category")
  val CreatedAtField = Field[Product, DateTime]("created_at")
  val TagsOptField = Field[Product, Option[List[String]]]("tags")

  // Nested field access
  val PricingBaseField = Field[Product, BigDecimal]("pricing.basePrice")
}

The dot-path string for nested fields is one place we couldn’t push into the type system. Mongo’s nested syntax is just a string, and chasing it deeper than this got expensive in compile time without much payoff.

Operators

Three shapes cover most of what we need from Mongo: binary (field op value), unary (field op), and many (field op [values]). Sealed traits make the dispatch on where(...) total:

sealed trait Operator

sealed trait BinaryOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T], value: T): BsonDocument
}

case object Equal extends BinaryOperator {
  override def mongoOp: String = "$eq"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> field.toBson(value))
}

case object GreaterThan extends BinaryOperator {
  override def mongoOp: String = "$gt"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

case object LessThan extends BinaryOperator {
  override def mongoOp: String = "$lt"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

case object GreaterThanOrEqual extends BinaryOperator {
  override def mongoOp: String = "$gte"
  override def toBson[T](field: Field[_, T], value: T): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}

sealed trait UnaryOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T]): BsonDocument
}

case object Exists extends UnaryOperator {
  override def mongoOp: String = "$exists"
  override def toBson[T](field: Field[_, T]): BsonDocument =
    BsonDocument(field.path -> BsonDocument(mongoOp -> BsonBoolean(true)))
}

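// Renders as { field: null }, which is an implicit $eq. Mongo matches this
// against documents where the field is null and where it is missing entirely.
case object IsNull extends UnaryOperator {
  override def mongoOp: String = "$eq"
  override def toBson[T](field: Field[_, T]): BsonDocument =
    BsonDocument(field.path -> BsonNull())
}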

sealed trait ManyOperator extends Operator {
  def mongoOp: String
  def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument
}

case object In extends ManyOperator {
  override def mongoOp: String = "$in"
  override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
    BsonDocument(
      field.path -> BsonDocument(
        mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
      )
    )
}

case object NotIn extends ManyOperator {
  override def mongoOp: String = "$nin"
  override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
    BsonDocument(
      field.path -> BsonDocument(
        mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
      )
    )
}

The where method on the query then has three overloads, one per operator shape, and the compiler picks the right one from the operator’s static type: a BinaryOperator takes exactly one value, a ManyOperator takes one or more, and a UnaryOperator takes none.
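
A quick way to convince yourself the overloads never collide is to strip out the BSON and have each one return a label. This is a sketch under that assumption: the operator hierarchy is cut down to one case object per shape, Field carries only a name, and where is a free-standing method rather than a member of the query class.

```scala
object OverloadSketch {
  sealed trait Operator
  sealed trait BinaryOperator extends Operator
  sealed trait UnaryOperator extends Operator
  sealed trait ManyOperator extends Operator

  case object Equal extends BinaryOperator
  case object Exists extends UnaryOperator
  case object In extends ManyOperator

  final case class Field[Model, T](name: String)

  // Each overload returns a label so the chosen one is visible at runtime.
  def where[M, T](field: Field[M, T], op: BinaryOperator, value: T): String =
    s"binary(${field.name}, $value)"

  def where[M, T](field: Field[M, T], op: UnaryOperator): String =
    s"unary(${field.name})"

  // first + rest (instead of a bare T*) makes an empty value list a compile error.
  def where[M, T](field: Field[M, T], op: ManyOperator, first: T, rest: T*): String =
    s"many(${field.name}, ${(first +: rest).mkString(", ")})"

  val category = Field[Any, String]("category")

  val a = where(category, Equal, "electronics")
  val b = where(category, Exists)
  val c = where(category, In, "electronics", "computers")
  val d = where(category, In, "electronics")

  // Rejected by the compiler, because no overload fits:
  //   where(category, In)               // ManyOperator needs at least one value
  //   where(category, Equal, "a", "b")  // BinaryOperator takes exactly one value
}
```

Note that a single value with In still lands on the many-valued overload: the operator type decides, not the argument count.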

The query builder

ReadQuery is immutable and builder-shaped. Every method returns a new instance with one more filter, sort, or limit attached. We were using Scala 2.12 with Future-based MongoDB driver calls, so execution methods return Future[…]:

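import scala.concurrent.Future
import scala.reflect.ClassTag
import org.mongodb.scala.{FindObservable, MongoCollection}
import org.mongodb.scala.bson._

// The ClassTag bound is required by collection.find to pick its result type.
class ReadQuery[Model: ClassTag](
  collection: MongoCollection[Model],
  filters: Seq[BsonDocument] = Seq.empty,
  sortOpt: Option[BsonDocument] = None,
  limitOpt: Option[Int] = None,
  skipOpt: Option[Int] = None
) {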

  def where[T](field: Field[Model, T], operator: BinaryOperator, value: T): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field, value),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def where[T](field: Field[Model, T], operator: UnaryOperator): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def where[T](
    field: Field[Model, T],
    operator: ManyOperator,
    value1: T,
    remaining: T*
  ): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters :+ operator.toBson(field, value1 +: remaining),
      sortOpt,
      limitOpt,
      skipOpt
    )

  def orderBy[T](field: Field[Model, T], ascending: Boolean = true): ReadQuery[Model] =
    new ReadQuery(
      collection,
      filters,
      Some(BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))),
      limitOpt,
      skipOpt
    )

  def limit(n: Int): ReadQuery[Model] =
    new ReadQuery(collection, filters, sortOpt, Some(n), skipOpt)

  def skip(n: Int): ReadQuery[Model] =
    new ReadQuery(collection, filters, sortOpt, limitOpt, Some(n))

  def fetchOne: Future[Option[Model]] =
    // headOption completes with None when nothing matches, so there is no
    // null to wrap and no ExecutionContext needed for a map.
    buildFindIterable.first().headOption()

  def fetchList: Future[Seq[Model]] = {
    val query = buildFindIterable
    query.toFuture()
  }

  def fetchAll: Future[Seq[Model]] = fetchList

  def count: Future[Long] = {
    val filter = combineFilters
    collection.countDocuments(filter).toFuture()
  }

  // The name is a holdover from the Java driver; the Scala driver's find
  // returns a FindObservable.
  private def buildFindIterable: FindObservable[Model] = {
    val base = collection.find(combineFilters)
    val sorted = sortOpt.fold(base)(base.sort(_))
    val skipped = skipOpt.fold(sorted)(sorted.skip(_))
    limitOpt.fold(skipped)(skipped.limit(_))
  }

  private def combineFilters: BsonDocument = {
    if (filters.isEmpty) {
      BsonDocument()
    } else if (filters.size == 1) {
      filters.head
    } else {
      BsonDocument("$and" -> BsonArray.fromIterable(filters))
    }
  }
}

Multiple where calls combine with $and. We never had a need to expose $or at the top level — the few places that wanted it were rare enough to drop down to raw BsonDocument for that one query.
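
The combining rule and the immutability are easy to check in isolation. The sketch below assumes filters are pre-rendered JSON strings rather than BsonDocuments, and Query is a hypothetical stand-in for ReadQuery with only where and the combining step.

```scala
object ImmutableQuerySketch {
  // Filters are pre-rendered JSON strings here; the post's ReadQuery holds BsonDocuments.
  final case class Query(filters: Vector[String] = Vector.empty) {
    // Returns a new Query; the receiver is left untouched.
    def where(filter: String): Query = copy(filters = filters :+ filter)

    // Same three cases as combineFilters: none, one, or several wrapped in $and.
    def combined: String = filters match {
      case Vector()       => "{}"
      case Vector(single) => single
      case many           => many.mkString("{\"$and\": [", ", ", "]}")
    }
  }

  val base: Query = Query().where("""{"category": "electronics"}""")
  val expensive: Query = base.where("""{"price": {"$gte": 50}}""")
}
```

Because where never mutates, base can be shared between several derived queries without one leaking filters into another.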

At the call site, where Product.repository (not shown) hands back a ReadQuery[Product] for the collection:

Product.repository
  .where(Product.CategoryField, Equal, "electronics")
  .fetchList

Product.repository
  .where(Product.PriceField, GreaterThan, BigDecimal(100))
  .where(Product.PriceField, LessThan, BigDecimal(1000))
  .orderBy(Product.PriceField, ascending = true)
  .fetchList

Product.repository
  .where(Product.CategoryField, In, "electronics", "computers", "gadgets")
  .fetchList

Product.repository
  .where(Product.TagsOptField, IsNull)
  .fetchList

Product.repository
  .where(Product.CategoryField, Equal, "electronics")
  .where(Product.PriceField, GreaterThanOrEqual, BigDecimal(50))
  .where(Product.TagsOptField, Exists)
  .orderBy(Product.CreatedAtField, ascending = false)
  .limit(100)
  .fetchList

Aggregations

Aggregation pipelines have more shapes than find, so the builder is wider, but the idea is the same. Stages accumulate and are handed to the driver at the end:

class AggregateQueryBuilder[Model](
  collection: MongoCollection[Model],
  stages: Seq[BsonDocument] = Seq.empty
) {

  def matchStage(filters: BsonDocument): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$match" -> filters)
    )

  def groupBy[T](
    field: Field[Model, T],
    accumulators: (String, BsonDocument)*
  ): AggregateQueryBuilder[Model] = {
    // BsonDocument has no ++, so build the entry list first: _id, then the accumulators.
    val entries: Seq[(String, BsonValue)] =
      ("_id" -> BsonString(s"$$${field.path}")) +: accumulators

    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$group" -> BsonDocument(entries: _*))
    )
  }

  def addFields(fields: (String, BsonValue)*): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$addFields" -> BsonDocument(fields: _*))
    )

  def sort[T](field: Field[Model, T], ascending: Boolean = true): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument(
        "$sort" -> BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))
      )
    )

  def limit(n: Int): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$limit" -> BsonInt32(n))
    )

  def skip(n: Int): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument("$skip" -> BsonInt32(n))
    )

  def lookup(
    from: String,
    localField: String,
    foreignField: String,
    as: String
  ): AggregateQueryBuilder[Model] =
    new AggregateQueryBuilder(
      collection,
      stages :+ BsonDocument(
        "$lookup" -> BsonDocument(
          "from" -> BsonString(from),
          "localField" -> BsonString(localField),
          "foreignField" -> BsonString(foreignField),
          "as" -> BsonString(as)
        )
      )
    )

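  // R needs a codec in the collection's registry; the ClassTag (scala.reflect)
  // is what the driver's aggregate uses to decide which class to decode into.
  def execute[R: ClassTag]: Future[Seq[R]] =
    collection.aggregate[R](stages).toFuture()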
}

$lookup is another place we chose not to push into the type system. Cross-collection joins want a second Field[OtherModel, _], and at that point you’re modelling foreign keys, which is a different post. The string-typed from/localField/foreignField is the escape hatch, and we accepted it.

Product.repository
  .aggregate
  .groupBy(
    Product.CategoryField,
    "count" -> BsonDocument("$sum" -> BsonInt32(1)),
    "avgPrice" -> BsonDocument("$avg" -> BsonString("$price"))
  )
  // After $group the category lives in _id, so that is the key to sort on.
  .sort(Product.IdField, ascending = true)
  .limit(10)
  .execute[CategoryStats]
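
The raw "$price" string in that accumulator is exactly the kind of thing a rename would miss. One way to close the gap is to derive the field reference from the Field itself. The sketch below assumes a cut-down Field holding only a path, and the ref and avg helpers are hypothetical names; the accumulator is rendered as a plain Map instead of a BsonDocument so the block runs without the driver.

```scala
object FieldRefSketch {
  // Stand-in for the post's Field; only the path matters here.
  final case class Field[Model, T](path: String)

  // In an s-interpolator "$$" is a literal dollar sign, so this yields "$" + path.
  def ref(field: Field[_, _]): String = s"$$${field.path}"

  // Hypothetical helper: an $avg accumulator entry keyed by its output name.
  def avg(name: String, field: Field[_, _]): (String, Map[String, String]) =
    name -> Map("$avg" -> ref(field))

  val price = Field[Any, BigDecimal]("price")
  val nested = Field[Any, BigDecimal]("pricing.basePrice")
}
```

With something like this in place, the call site would pass avg("avgPrice", Product.PriceField) and the string would follow the field definition.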

Element match for nested arrays

Querying inside arrays needs $elemMatch. We didn’t model it as an operator on Field — it takes a sub-document, not a single value, and the type machinery to express “predicate on the element type of a list field” wasn’t worth the compile-time cost. We exposed a couple of helpers per model:

case class Order(
  id: String,
  customerId: String,
  items: List[OrderItem],
  createdAt: DateTime
)

case class OrderItem(
  productId: String,
  quantity: Int,
  price: BigDecimal
)

object Order {
  val ItemsField = Field[Order, List[OrderItem]]("items")

  def itemsWithProductId(productId: String): BsonDocument =
    BsonDocument(
      "items" -> BsonDocument(
        "$elemMatch" -> BsonDocument(
          "productId" -> BsonString(productId)
        )
      )
    )

  def itemsWithQuantityGreaterThan(quantity: Int): BsonDocument =
    BsonDocument(
      "items" -> BsonDocument(
        "$elemMatch" -> BsonDocument(
          "quantity" -> BsonDocument("$gt" -> BsonInt32(quantity))
        )
      )
    )
}

These helpers are the seam between the typed DSL and raw documents: the field names are strings again, but they stay in one place on the companion object. Rare enough to be acceptable.

Pagination

Two pagination shapes ended up in the codebase. Skip/limit for pages we knew were small, cursor-based for anything that could grow:

sealed trait PaginationStrategy
case class Skip(offset: Int) extends PaginationStrategy
case class PageNumber(page: Int, pageSize: Int) extends PaginationStrategy {
  def toSkip: Skip = Skip((page - 1) * pageSize)
}

def fetchPage(strategy: PaginationStrategy, pageSize: Int): Future[Seq[Product]] =
  strategy match {
    case Skip(offset) =>
      Product.repository
        .orderBy(Product.CreatedAtField, ascending = false)
        .limit(pageSize)
        .skip(offset)
        .fetchList

    case p: PageNumber =>
      // A PageNumber carries its own page size, which takes precedence here.
      fetchPage(p.toSkip, p.pageSize)
  }

def fetchAfterCursor(
  lastSeenId: Option[String],
  pageSize: Int
): Future[Seq[Product]] = {
  val query = Product.repository

  val withCursor = lastSeenId match {
    case Some(id) =>
      query.where(Product.IdField, GreaterThan, id)
    case None =>
      query
  }

  withCursor
    .orderBy(Product.IdField, ascending = true)
    .limit(pageSize)
    .fetchList
}

Skip/limit gets ugly past a few thousand documents, because Mongo still has to walk everything it skips before it returns the page. Cursor-based is the default we’d reach for now.
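
The two strategies return the same pages when the data is static; the difference is in how much work precedes the first result. Here is an in-memory sketch of that equivalence. It assumes a ten-element Vector in place of a collection and zero-padded string ids so that string order matches insertion order; skipPage and cursorPage are hypothetical names that mirror fetchPage and fetchAfterCursor.

```scala
object PaginationSketch {
  final case class Product(id: String)

  // Ids are zero-padded so lexicographic order matches numeric order.
  val products: Vector[Product] = (1 to 10).map(i => Product(f"p$i%03d")).toVector

  // Skip/limit: the first `offset` documents are still visited, then discarded.
  def skipPage(offset: Int, pageSize: Int): Vector[Product] =
    products.sortBy(_.id).slice(offset, offset + pageSize)

  // Cursor: keep only ids after the last one seen, then take a page.
  def cursorPage(lastSeenId: Option[String], pageSize: Int): Vector[Product] =
    products.sortBy(_.id).filter(p => lastSeenId.forall(p.id > _)).take(pageSize)
}
```

In this toy version both functions scan the whole Vector. Against a real collection with an index on the cursor field, the cursor form can seek straight to the boundary, while skip still has to count its way there.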

Repositories

The DSL doesn’t replace repositories; it makes them shorter:

class ProductRepository(collection: MongoCollection[Product]) {
  def findById(id: String): Future[Option[Product]] =
    new ReadQuery(collection)
      .where(Product.IdField, Equal, id)
      .fetchOne

  def findByCategory(category: String, page: Int, pageSize: Int): Future[Seq[Product]] =
    new ReadQuery(collection)
      .where(Product.CategoryField, Equal, category)
      .orderBy(Product.CreatedAtField, ascending = false)
      .limit(pageSize)
      .skip((page - 1) * pageSize)
      .fetchList

  def findExpensiveProducts(minPrice: BigDecimal): Future[Seq[Product]] =
    new ReadQuery(collection)
      .where(Product.PriceField, GreaterThanOrEqual, minPrice)
      .orderBy(Product.PriceField, ascending = false)
      .fetchList

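  def searchByName(query: String): Future[Seq[Product]] = {
    // Case-insensitive regex match; query is used as a pattern and is not escaped.
    val pattern = BsonDocument("$regex" -> BsonString(query), "$options" -> BsonString("i"))
    collection.find(BsonDocument(Product.NameField.path -> pattern)).toFuture()
  }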

  def countByCategory(category: String): Future[Long] =
    new ReadQuery(collection)
      .where(Product.CategoryField, Equal, category)
      .count
}

searchByName drops out of the DSL on purpose — $regex with options didn’t justify a typed wrapper, and the one place we use it is internal-tools-only.

What we got, what we paid

The DSL is worth building on a long-lived collection that several people touch and rename over time. It’s not worth building for a single-purpose service with three fixed queries that won’t change — there, the string queries are honest and the indirection costs more than it saves.

The cost is mostly compile time. Implicit resolution for StrConverter instances is fine, but stack a few of them inside Option[List[Foo]] and Scala 2.12 starts taking longer than I’d like. We capped nested generics at one level for that reason. In Scala 3 the implicit machinery is faster and a lot of this would be cleaner with given and using, but we never ported it.

The benefit, for us, was the boring kind: renames stopped being scary. The compiler tells you which queries broke when you change a field, and the diff for adding a new query is short enough to read in code review without squinting at JSON.
