Designing type-safe query DSLs in Scala
The first time a field rename took down a query in production, the diff looked harmless. Someone renamed price to basePrice somewhere in the model, the unit tests passed, the integration tests passed, and a dashboard quietly started returning empty arrays a few hours later. The query string still referenced "price". There was no compile error to catch, because there was nothing for the compiler to check. It was just text inside a BsonDocument.
That’s the case for moving queries into the type system. The MongoDB Scala driver is happy to accept this:
collection.find(
BsonDocument("""{
"userId": "12345",
"status": { "$in": ["active", "pending"] },
"createdAt": { "$gte": { "$date": "2024-01-01T00:00:00Z" } }
}""")
)
Field name typos go undetected until runtime. A string where an int was expected gives you a cryptic BSON error. Rename a field on the case class and the JSON literal sails through unchanged. None of the IDE features that make refactoring tolerable (autocomplete, jump-to-definition, find-usages) work on text inside a triple-quoted string.
This post walks through the shape of the DSL we ended up with. It’s MongoDB-specific, and it’s not novel (Slick, Quill, and Phantom solve adjacent problems for other stores), but it’s small enough to build in-house and worth understanding from the bottom up.
Field references with types
The foundation is a field reference parameterised by both the model it belongs to and the value type it holds:
case class Field[Model, FieldType](
name: String,
converter: StrConverter[FieldType]
) {
def path: String = name
def toBson(value: FieldType): BsonValue =
converter.toBson(value)
}
object Field {
def apply[M, T](name: String)(implicit conv: StrConverter[T]): Field[M, T] =
new Field[M, T](name, conv)
}
Model is a phantom type: it never appears at runtime, but it lets us refuse to compile a query that mixes Product fields into an Order query. FieldType is what carries the type-safety we actually wanted: when you say where(Product.PriceField, GreaterThan, x), the compiler insists x is a BigDecimal.
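To make the phantom-type point concrete, here is a minimal sketch that compiles without the driver. It assumes a cut-down Field with no converter and a toy Query class; both are stand-ins invented for this illustration, not the real builder.

```scala
// Simplified stand-in: the converter is dropped so the sketch compiles on its own.
final case class Field[Model, FieldType](name: String)

final case class Product(name: String, price: BigDecimal)
final case class Order(customerId: String)

object Fields {
  val ProductPrice: Field[Product, BigDecimal] = Field("price")
  val OrderCustomer: Field[Order, String]      = Field("customerId")
}

// A toy query that only accepts fields of its own Model and values of the field's type.
final case class Query[Model](clauses: List[String] = Nil) {
  def where[T](field: Field[Model, T], value: T): Query[Model] =
    copy(clauses = clauses :+ s"${field.name} = $value")
}

object PhantomDemo {
  val ok: Query[Product] =
    Query[Product]().where(Fields.ProductPrice, BigDecimal(100))

  // Neither of these compiles:
  // Query[Product]().where(Fields.OrderCustomer, "c-1")   // Order field in a Product query
  // Query[Product]().where(Fields.ProductPrice, "cheap")  // String where BigDecimal is required
}
```

Model exists only in the type parameters. Nothing in Field or Query stores a Product or an Order, yet the two commented-out lines are rejected at compile time.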
The conversion to BSON is delegated to a StrConverter type class, instances of which we summon implicitly:
trait StrConverter[T] {
def fromString(str: String): T
def toString(value: T): String
def toBson(value: T): BsonValue
def fromBson(bson: BsonValue): T
}
object StrConverter {
implicit val stringConverter: StrConverter[String] = new StrConverter[String] {
override def fromString(str: String): String = str
override def toString(value: String): String = value
override def toBson(value: String): BsonValue = BsonString(value)
override def fromBson(bson: BsonValue): String = bson.asString().getValue
}
implicit val longConverter: StrConverter[Long] = new StrConverter[Long] {
override def fromString(str: String): Long = str.toLong
override def toString(value: Long): String = value.toString
override def toBson(value: Long): BsonValue = BsonInt64(value)
override def fromBson(bson: BsonValue): Long = bson.asInt64().getValue
}
implicit val dateTimeConverter: StrConverter[DateTime] = new StrConverter[DateTime] {
override def fromString(str: String): DateTime = DateTime.parse(str)
override def toString(value: DateTime): String = value.toString()
override def toBson(value: DateTime): BsonValue = BsonDateTime(value.getMillis)
override def fromBson(bson: BsonValue): DateTime = new DateTime(bson.asDateTime().getValue)
}
implicit def optionConverter[T](implicit inner: StrConverter[T]): StrConverter[Option[T]] =
new StrConverter[Option[T]] {
// The empty string maps back to None, mirroring toString below.
// Without this, None -> "" -> inner.fromString("") fails for types such as Long.
override def fromString(str: String): Option[T] =
if (str.isEmpty) None else Some(inner.fromString(str))
override def toString(value: Option[T]): String = value.map(v => inner.toString(v)).getOrElse("")
override def toBson(value: Option[T]): BsonValue =
value.map(inner.toBson).getOrElse(BsonNull())
override def fromBson(bson: BsonValue): Option[T] =
if (bson.isNull) None else Some(inner.fromBson(bson))
}
}
Field definitions live on the model’s companion object. One file, one schema:
case class Product(
id: String,
name: String,
price: BigDecimal,
category: String,
createdAt: DateTime,
tagsOpt: Option[List[String]]
)
object Product {
val IdField = Field[Product, String]("_id")
val NameField = Field[Product, String]("name")
val PriceField = Field[Product, BigDecimal]("price")
val CategoryField = Field[Product, String]("category")
val CreatedAtField = Field[Product, DateTime]("created_at")
val TagsOptField = Field[Product, Option[List[String]]]("tags")
// Nested field access
val PricingBaseField = Field[Product, BigDecimal]("pricing.basePrice")
}
The dot-path string for nested fields is the one place we couldn’t push everything into the type system. Mongo’s nested syntax is just a string, and chasing it deeper than this got expensive in compile time without much payoff.
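Product declares BigDecimal and Option[List[String]] fields, so converter instances for those types have to exist alongside the String, Long, and DateTime ones listed earlier. As a rough sketch of how a list instance could be derived from its element converter, here is a cut-down version that runs without the driver. B, Conv, and the list instance are stand-ins invented for this illustration; they are not the driver's BsonValue types or the original code.

```scala
// Stand-in for the driver's BsonValue hierarchy, so the sketch runs without MongoDB.
sealed trait B
final case class BStr(value: String) extends B
final case class BArr(values: List[B]) extends B
case object BNull extends B

// Trimmed-down converter: only the BSON half of StrConverter.
trait Conv[T] {
  def toB(value: T): B
  def fromB(b: B): T
}

object Conv {
  implicit val string: Conv[String] = new Conv[String] {
    def toB(value: String): B = BStr(value)
    def fromB(b: B): String = b match {
      case BStr(s) => s
      case other   => throw new IllegalArgumentException(s"expected string, got $other")
    }
  }

  // Hypothetical list instance, derived from the element converter.
  implicit def list[T](implicit inner: Conv[T]): Conv[List[T]] = new Conv[List[T]] {
    def toB(value: List[T]): B = BArr(value.map(inner.toB))
    def fromB(b: B): List[T] = b match {
      case BArr(vs) => vs.map(inner.fromB)
      case other    => throw new IllegalArgumentException(s"expected array, got $other")
    }
  }

  // Mirrors optionConverter: None is stored as null.
  implicit def option[T](implicit inner: Conv[T]): Conv[Option[T]] = new Conv[Option[T]] {
    def toB(value: Option[T]): B = value.map(inner.toB).getOrElse(BNull)
    def fromB(b: B): Option[T] = if (b == BNull) None else Some(inner.fromB(b))
  }
}

object ConvDemo {
  // Resolved as option(list(string)) without naming any instance.
  val tags: Conv[Option[List[String]]] = implicitly[Conv[Option[List[String]]]]
}
```

Each extra layer of nesting adds one more step to implicit resolution, which is the compile-time cost discussed at the end of the post.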
Operators
Three shapes cover most of what we need from Mongo: binary (field op value), unary (field op), and many (field op [values]). Sealed traits make the dispatch on where(...) total:
sealed trait Operator
sealed trait BinaryOperator extends Operator {
def mongoOp: String
def toBson[T](field: Field[_, T], value: T): BsonDocument
}
case object Equal extends BinaryOperator {
override def mongoOp: String = "$eq"
override def toBson[T](field: Field[_, T], value: T): BsonDocument =
BsonDocument(field.path -> field.toBson(value))
}
case object GreaterThan extends BinaryOperator {
override def mongoOp: String = "$gt"
override def toBson[T](field: Field[_, T], value: T): BsonDocument =
BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}
case object LessThan extends BinaryOperator {
override def mongoOp: String = "$lt"
override def toBson[T](field: Field[_, T], value: T): BsonDocument =
BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}
case object GreaterThanOrEqual extends BinaryOperator {
override def mongoOp: String = "$gte"
override def toBson[T](field: Field[_, T], value: T): BsonDocument =
BsonDocument(field.path -> BsonDocument(mongoOp -> field.toBson(value)))
}
sealed trait UnaryOperator extends Operator {
def mongoOp: String
def toBson[T](field: Field[_, T]): BsonDocument
}
case object Exists extends UnaryOperator {
override def mongoOp: String = "$exists"
override def toBson[T](field: Field[_, T]): BsonDocument =
BsonDocument(field.path -> BsonDocument(mongoOp -> BsonBoolean(true)))
}
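case object IsNull extends UnaryOperator {
// Rendered with the implicit-equality shorthand, so mongoOp is informational only.
// Note: { field: null } matches documents where the field is null or missing.
override def mongoOp: String = "$eq"
override def toBson[T](field: Field[_, T]): BsonDocument =
BsonDocument(field.path -> BsonNull())
}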
sealed trait ManyOperator extends Operator {
def mongoOp: String
def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument
}
case object In extends ManyOperator {
override def mongoOp: String = "$in"
override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
BsonDocument(
field.path -> BsonDocument(
mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
)
)
}
case object NotIn extends ManyOperator {
override def mongoOp: String = "$nin"
override def toBson[T](field: Field[_, T], values: Seq[T]): BsonDocument =
BsonDocument(
field.path -> BsonDocument(
mongoOp -> BsonArray.fromIterable(values.map(field.toBson))
)
)
}
The where method on the query then has three overloads, one per operator shape. The compiler picks the right one from the operator’s static type and the arguments that follow it: a single value, one or more values, or nothing.
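The overload selection can be seen in isolation with a small sketch. It assumes a plain Map as a stand-in for BsonDocument and erases value types to Any to stay short, so it shows the dispatch only, not the type-safety. All names here are illustrative.

```scala
object OperatorDemo {
  // Stand-in for BsonDocument so the sketch runs without the driver.
  type Doc = Map[String, Any]

  sealed trait BinaryOp { def render(path: String, value: Any): Doc }
  sealed trait UnaryOp  { def render(path: String): Doc }
  sealed trait ManyOp   { def render(path: String, values: Seq[Any]): Doc }

  case object GreaterThan extends BinaryOp {
    def render(path: String, value: Any): Doc = Map(path -> Map("$gt" -> value))
  }
  case object Exists extends UnaryOp {
    def render(path: String): Doc = Map(path -> Map("$exists" -> true))
  }
  case object In extends ManyOp {
    def render(path: String, values: Seq[Any]): Doc =
      Map(path -> Map("$in" -> values.toList))
  }

  // One overload per operator shape; the operator's static type selects the overload.
  def where(path: String, op: BinaryOp, value: Any): Doc = op.render(path, value)
  def where(path: String, op: UnaryOp): Doc = op.render(path)
  def where(path: String, op: ManyOp, first: Any, rest: Any*): Doc =
    op.render(path, first +: rest)
}
```

Passing In with a single value still resolves to the ManyOp overload, because In is not a BinaryOp. Requiring a first value before the varargs also rules out an empty $in list at compile time.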
The query builder
ReadQuery is immutable and builder-shaped. Every method returns a new instance with one more filter, sort, or limit attached. We were using Scala 2.12 with Future-based MongoDB driver calls, so execution methods return Future[…]:
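// The context bound needs scala.reflect.ClassTag imported: the driver's find call asks for a ClassTag of the result type.
class ReadQuery[Model: ClassTag](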
collection: MongoCollection[Model],
filters: Seq[BsonDocument] = Seq.empty,
sortOpt: Option[BsonDocument] = None,
limitOpt: Option[Int] = None,
skipOpt: Option[Int] = None
) {
def where[T](field: Field[Model, T], operator: BinaryOperator, value: T): ReadQuery[Model] =
new ReadQuery(
collection,
filters :+ operator.toBson(field, value),
sortOpt,
limitOpt,
skipOpt
)
def where[T](field: Field[Model, T], operator: UnaryOperator): ReadQuery[Model] =
new ReadQuery(
collection,
filters :+ operator.toBson(field),
sortOpt,
limitOpt,
skipOpt
)
def where[T](
field: Field[Model, T],
operator: ManyOperator,
value1: T,
remaining: T*
): ReadQuery[Model] =
new ReadQuery(
collection,
filters :+ operator.toBson(field, value1 +: remaining),
sortOpt,
limitOpt,
skipOpt
)
def orderBy[T](field: Field[Model, T], ascending: Boolean = true): ReadQuery[Model] =
new ReadQuery(
collection,
filters,
Some(BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))),
limitOpt,
skipOpt
)
def limit(n: Int): ReadQuery[Model] =
new ReadQuery(collection, filters, sortOpt, Some(n), skipOpt)
def skip(n: Int): ReadQuery[Model] =
new ReadQuery(collection, filters, sortOpt, limitOpt, Some(n))
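def fetchOne: Future[Option[Model]] = {
val query = buildFindIterable
// Option(_) turns a null result (no matching document) into None.
// The map call needs an implicit ExecutionContext in scope.
query.limit(1).first().toFuture().map(Option(_))
}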
def fetchList: Future[Seq[Model]] = {
val query = buildFindIterable
query.toFuture()
}
def fetchAll: Future[Seq[Model]] = fetchList
def count: Future[Long] = {
val filter = combineFilters
collection.countDocuments(filter).toFuture()
}
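// The Scala driver's find returns a FindObservable (FindIterable is the Java sync driver's type).
private def buildFindIterable: FindObservable[Model] = {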
val filter = combineFilters
var query = collection.find(filter)
sortOpt.foreach(sort => query = query.sort(sort))
limitOpt.foreach(limit => query = query.limit(limit))
skipOpt.foreach(skip => query = query.skip(skip))
query
}
private def combineFilters: BsonDocument = {
if (filters.isEmpty) {
BsonDocument()
} else if (filters.size == 1) {
filters.head
} else {
BsonDocument("$and" -> BsonArray.fromIterable(filters))
}
}
}
Multiple where calls combine with $and. We never had a need to expose $or at the top level. The few places that wanted it were rare enough to drop down to raw BsonDocument for that one query.
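The three cases in combineFilters can be sketched on their own. This assumes a plain Map as a stand-in for BsonDocument; combine, category, minPrice, and maxPrice are names made up for the example.

```scala
object CombineDemo {
  // Stand-in for BsonDocument: a plain immutable map, so this runs without the driver.
  type Doc = Map[String, Any]

  def combine(filters: Seq[Doc]): Doc = filters match {
    case Seq()       => Map.empty            // no filters: match everything
    case Seq(single) => single               // one filter: use it as-is
    case many        => Map("$and" -> many.toList)
  }

  val category: Doc = Map("category" -> "electronics")
  val minPrice: Doc = Map("price" -> Map("$gt" -> 100))
  val maxPrice: Doc = Map("price" -> Map("$lt" -> 1000))
}
```

Wrapping in $and rather than merging keys matters when two filters target the same field, as with a price range: a single document cannot carry two "price" keys, while an $and array keeps both conditions.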
At the call site:
Product.repository
.where(Product.CategoryField, Equal, "electronics")
.fetchList
Product.repository
.where(Product.PriceField, GreaterThan, BigDecimal(100))
.where(Product.PriceField, LessThan, BigDecimal(1000))
.orderBy(Product.PriceField, ascending = true)
.fetchList
Product.repository
.where(Product.CategoryField, In, "electronics", "computers", "gadgets")
.fetchList
Product.repository
.where(Product.TagsOptField, IsNull)
.fetchList
Product.repository
.where(Product.CategoryField, Equal, "electronics")
.where(Product.PriceField, GreaterThanOrEqual, BigDecimal(50))
.where(Product.TagsOptField, Exists)
.orderBy(Product.CreatedAtField, ascending = false)
.limit(100)
.fetchList
Aggregations
Aggregation pipelines have more shapes than find, so the builder is wider, but the idea is the same. Accumulate stages, then hand them to the driver at the end:
class AggregateQueryBuilder[Model](
collection: MongoCollection[Model],
stages: Seq[BsonDocument] = Seq.empty
) {
def matchStage(filters: BsonDocument): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument("$match" -> filters)
)
def groupBy[T](
field: Field[Model, T],
accumulators: (String, BsonDocument)*
): AggregateQueryBuilder[Model] = {
// BsonDocument has no ++ operator, so collect the entries first and build the document once.
val idEntry: (String, BsonValue) = "_id" -> BsonString(s"$$${field.path}")
val groupDoc = BsonDocument(idEntry +: accumulators)
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument("$group" -> groupDoc)
)
}
def addFields(fields: (String, BsonValue)*): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
// Pass the pairs as a collection; splatting them with : _* does not fit the varargs overload.
stages :+ BsonDocument("$addFields" -> BsonDocument(fields))
)
def sort[T](field: Field[Model, T], ascending: Boolean = true): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument(
"$sort" -> BsonDocument(field.path -> BsonInt32(if (ascending) 1 else -1))
)
)
def limit(n: Int): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument("$limit" -> BsonInt32(n))
)
def skip(n: Int): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument("$skip" -> BsonInt32(n))
)
def lookup(
from: String,
localField: String,
foreignField: String,
as: String
): AggregateQueryBuilder[Model] =
new AggregateQueryBuilder(
collection,
stages :+ BsonDocument(
"$lookup" -> BsonDocument(
"from" -> BsonString(from),
"localField" -> BsonString(localField),
"foreignField" -> BsonString(foreignField),
"as" -> BsonString(as)
)
)
)
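// aggregate[R] also needs a ClassTag[R] (scala.reflect.ClassTag) to pick the result class.
def execute[R](implicit decoder: Decoder[R], ct: ClassTag[R]): Future[Seq[R]] = {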
collection.aggregate[R](stages).toFuture()
}
}
$lookup is another place we chose not to push into the type system. Cross-collection joins want a second Field[OtherModel, _], and at that point you’re modelling foreign keys, which is a different post. The string-typed from/localField/foreignField is the escape hatch, and we accepted it.
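The "$"-prefixed strings used by groupBy and by accumulators such as $avg are field references, and they can be derived from a Field instead of typed by hand. A minimal sketch, with a stand-in Field that carries only a path; ref is a hypothetical helper, not part of the builder above:

```scala
object FieldRefDemo {
  // Stand-in for Field: only the path matters here.
  final case class Field(path: String)

  // In the s-interpolator, "$$" is an escaped dollar sign, so this yields "$" + path.
  def ref(field: Field): String = s"$$${field.path}"

  val Price     = Field("price")
  val BasePrice = Field("pricing.basePrice")
}
```

With a helper like this, an accumulator could reference Product.PriceField rather than a hand-typed "$price" literal, which keeps aggregation expressions in step with the field definitions.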
Product.repository
.aggregate
.groupBy(
Product.CategoryField,
"count" -> BsonDocument("$sum" -> BsonInt32(1)),
"avgPrice" -> BsonDocument("$avg" -> BsonString("$price"))
)
.sort(Product.CategoryField, ascending = true)
.limit(10)
.execute[CategoryStats]
Element match for nested arrays
Querying inside arrays needs $elemMatch. We didn’t model it as an operator on Field — it takes a sub-document, not a single value, and the type machinery to express “predicate on the element type of a list field” wasn’t worth the compile-time cost. We exposed a couple of helpers per model:
case class Order(
id: String,
customerId: String,
items: List[OrderItem],
createdAt: DateTime
)
case class OrderItem(
productId: String,
quantity: Int,
price: BigDecimal
)
object Order {
val ItemsField = Field[Order, List[OrderItem]]("items")
def itemsWithProductId(productId: String): BsonDocument =
BsonDocument(
"items" -> BsonDocument(
"$elemMatch" -> BsonDocument(
"productId" -> BsonString(productId)
)
)
)
def itemsWithQuantityGreaterThan(quantity: Int): BsonDocument =
BsonDocument(
"items" -> BsonDocument(
"$elemMatch" -> BsonDocument(
"quantity" -> BsonDocument("$gt" -> BsonInt32(quantity))
)
)
)
}
These helpers are the seam between raw documents and the typed builders: each returns a plain BsonDocument that can be handed to something like matchStage. Rare enough to be acceptable.
Pagination
Two pagination shapes ended up in the codebase. Skip/limit for pages we knew were small, cursor-based for anything that could grow:
sealed trait PaginationStrategy
case class Skip(offset: Int) extends PaginationStrategy
case class PageNumber(page: Int, pageSize: Int) extends PaginationStrategy {
def toSkip: Skip = Skip((page - 1) * pageSize)
}
def fetchPage(strategy: PaginationStrategy, pageSize: Int): Future[Seq[Product]] =
strategy match {
case Skip(offset) =>
Product.repository
.orderBy(Product.CreatedAtField, ascending = false)
.limit(pageSize)
.skip(offset)
.fetchList
case PageNumber(page, size) =>
fetchPage(PageNumber(page, size).toSkip, size)
}
def fetchAfterCursor(
lastSeenId: Option[String],
pageSize: Int
): Future[Seq[Product]] = {
val query = Product.repository
val withCursor = lastSeenId match {
case Some(id) =>
query.where(Product.IdField, GreaterThan, id)
case None =>
query
}
withCursor
.orderBy(Product.IdField, ascending = true)
.limit(pageSize)
.fetchList
}
Skip/limit gets ugly past a few thousand documents, because Mongo still walks the skipped documents before returning anything. Cursor-based is the default we’d reach for now.
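The difference between the two strategies shows up even on an in-memory list. The sketch below is a simulation under simple assumptions (string ids, sorted lexicographically); offsetFor and pageAfter are illustrative names, and nothing here talks to MongoDB.

```scala
object PaginationDemo {
  // Page numbers are 1-based, matching PageNumber.toSkip.
  def offsetFor(page: Int, pageSize: Int): Int = {
    require(page >= 1, "page numbers start at 1")
    (page - 1) * pageSize
  }

  // In-memory stand-in for "where _id > lastSeenId, order by _id, limit pageSize".
  def pageAfter(ids: Seq[String], lastSeenId: Option[String], pageSize: Int): Seq[String] =
    ids.sorted
      .filter(id => lastSeenId.forall(last => id > last))
      .take(pageSize)

  val ids: Seq[String] = Seq("a1", "a2", "a3", "a4", "a5")

  // Each page is requested with the last id of the previous page as the cursor.
  val first: Seq[String]  = pageAfter(ids, None, 2)
  val second: Seq[String] = pageAfter(ids, first.lastOption, 2)
  val third: Seq[String]  = pageAfter(ids, second.lastOption, 2)
}
```

The cursor version never needs to know how many documents came before the current page; it only needs the last id it returned. With an index on the cursor field, the database can seek straight to that position instead of counting past skipped documents.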
Repositories
The DSL doesn’t replace repositories; it makes them shorter:
class ProductRepository(collection: MongoCollection[Product]) {
def findById(id: String): Future[Option[Product]] =
new ReadQuery(collection)
.where(Product.IdField, Equal, id)
.fetchOne
def findByCategory(category: String, page: Int, pageSize: Int): Future[Seq[Product]] =
new ReadQuery(collection)
.where(Product.CategoryField, Equal, category)
.orderBy(Product.CreatedAtField, ascending = false)
.limit(pageSize)
.skip((page - 1) * pageSize)
.fetchList
def findExpensiveProducts(minPrice: BigDecimal): Future[Seq[Product]] =
new ReadQuery(collection)
.where(Product.PriceField, GreaterThanOrEqual, minPrice)
.orderBy(Product.PriceField, ascending = false)
.fetchList
def searchByName(query: String): Future[Seq[Product]] =
collection
.find(BsonDocument("name" -> BsonDocument("$regex" -> BsonString(query), "$options" -> BsonString("i"))))
.toFuture()
def countByCategory(category: String): Future[Long] =
new ReadQuery(collection)
.where(Product.CategoryField, Equal, category)
.count
}
searchByName drops out of the DSL on purpose. $regex with options didn’t justify a typed wrapper, and the one place we use it is internal-tools-only.
What we got, what we paid
The DSL is worth building on a long-lived collection that several people touch and rename over time. It’s not worth building for a single-purpose service with three fixed queries that won’t change — there, the string queries are honest and the indirection costs more than it saves.
The cost is mostly compile time. Implicit resolution for StrConverter instances is fine, but stack a few of them inside Option[List[Foo]] and Scala 2.12 starts taking longer than I’d like. We capped nested generics at one level for that reason. In Scala 3 the implicit machinery is faster and a lot of this would be cleaner with given and using, but we never ported it.
The benefit, for us, was the boring kind: renames stopped being scary. The compiler tells you which queries broke when you change a field, and the diff for adding a new query is short enough to read in code review without squinting at JSON.