2017-06-09

An Interesting Use for Type Classes in Scala

This is a technical writeup of the major new feature in the longevity 0.23 release.

There are more than enough writeups on using type classes in Scala already. I normally wouldn't bother to write about it, but I will now, for three reasons. First, the usage of type classes in longevity seems unique to me; I haven't seen type classes used quite this way before. Second, I want to write this up as documentation for any future contributors to the longevity project. Third, I'd really like to get your feedback on what I've done here, and hear your advice about how I might further improve on things.

The Old Way

The best way to get started is to look at how things in longevity worked before I introduced the type classes. I'm going to gloss over a lot of stuff here, and even present code anachronistically (i.e., present code features that never actually lived together in the same commit), so as to keep things simple and avoid bringing up any issues that are not necessary for understanding the case at hand.

Domain Model Elements

In longevity, we talk a lot about our domain model - these are all the Scala traits and case classes that represent the data we want to persist. We divide these into three categories: persistent objects, or the root objects we want to persist in a table or collection; components, or elements embedded in our persistent objects; and key values, or values used to look up our persistent objects. In the companion objects to our persistent classes, we also describe the details of the keys that we can use to look them up by key value. We construct all these types in the same package or subpackages, and annotate them with corresponding longevity annotations. Before type classes, this might have looked something like this:
package myModel

import longevity.model.annotations._

@persistent
case class User(username: Username, email: Email, fullname: Fullname)

object User {
  val keySet = Set(primaryKey(props.username), key(props.email))
}

@keyVal[User]
case class Username(username: String)

@keyVal[User]
case class Email(email: String)

@component
case class Fullname(last: String, first: String)
That probably looks quite intuitive to you, aside from the contents of the User companion object. The @persistent annotation extended the User companion object with longevity.model.PType[User], which contains various information about our persistent type that we need in order to do persistence, including creating the database schema, looking up users by key values, and looking up users by query. The PType has an abstract val keySet: Set[Key[User, _]] that we need to fill in. It also provides methods such as key and primaryKey to build the keys. Finally, the @persistent annotation creates an object props inside object User, that contains properties, which we can use to reflectively describe the fields of a user. As you can see above, we made use of these properties to define our two keys.
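To make the shape of that companion object concrete, here is a minimal hand-written sketch of roughly what the annotation might leave behind. Prop, Key, and PType below are hypothetical, much-simplified stand-ins defined for this example only; they are not longevity's actual definitions.

```scala
object PTypeSketch {
  // Hypothetical, much-simplified stand-ins for longevity's Prop, Key, and PType.
  final case class Prop[P, A](path: String)
  final case class Key[P, V](prop: Prop[P, V], primary: Boolean)

  abstract class PType[P] {
    // Every persistent type has to list its keys.
    val keySet: Set[Key[P, _]]
    def key[V](prop: Prop[P, V]): Key[P, V] = Key(prop, primary = false)
    def primaryKey[V](prop: Prop[P, V]): Key[P, V] = Key(prop, primary = true)
  }

  case class Username(username: String)
  case class Email(email: String)
  case class Fullname(last: String, first: String)
  case class User(username: Username, email: Email, fullname: Fullname)

  // Assumed shape of the companion after the annotation has done its work.
  object User extends PType[User] {
    object props {
      val username: Prop[User, Username] = Prop("username")
      val email: Prop[User, Email] = Prop("email")
    }
    val keySet: Set[Key[User, _]] =
      Set(primaryKey(props.username), key(props.email))
  }
}
```

The keys here are anonymous members of a set, so they exist only as runtime values inside keySet.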

Domain Model and Longevity Context

Once we get this far, we need to collect all our elements into what used to be called a longevity.model.DomainModel. We would do this like so:
import longevity.model.annotations.domainModel

package object myModel {
  @domainModel object MyDomainModel
}
The @domainModel annotation would scan the myModel package, and all its subpackages, to gather up artifacts describing all the persistents, components, and key values in our model. (What it actually scans for are companion objects that were made into PTypes, CTypes, and KVTypes by the @persistent, @component, and @keyVal annotation macros.) It would make MyDomainModel extend longevity.model.DomainModel, and pass in all the artifacts to the DomainModel constructor. We then passed on our domain model to a longevity.context.LongevityContext:
import longevity.context.LongevityContext

val myContext = LongevityContext(MyDomainModel)
The longevity context builds a handful of tools that you can use to work with your model. The most important of these is the Repo, which provides a complete set of persistence operations:
val repo: longevity.persistence.Repo = myContext.repo

The API Ugly


Now, I'd like to focus in on two repository methods that used to be annoyingly un-typesafe. Most of the other Repo methods displayed one of these two problems in an analogous way. I spent a lot of thought, time, and effort trying to improve this situation, but none of the standard OO techniques I am familiar with were any help. First, here's the method for inserting a new persistent object in the database:
Repo.create[P](p: P)(implicit c: ExecutionContext): Future[PState[P]]
One thing that glares at me right now is the hardcoding of Scala futures for an effect. My next major push will be to tackle this, allowing for other effects. Right now, I want to focus on the fact that P is completely untyped. Due to one of those anachronisms I mentioned before, it never looked quite so bad as it does above, because there was an extra level of indirection in the API that I have also removed. But even if it was hidden a bit, it was just as bad. This is really ugly! If you pass in any object that is not known by your domain model to be a persistent object, then you will get no compiler error, and some kind of runtime exception here.
It used to be that I had the @persistent annotation cause the User class to extend a no-longer existent longevity.model.Persistent trait. But I decided it was ugly and untenable to force user classes to extend a library trait, even when the trait is completely empty. And while forcing P <: Persistent provided some type safety, it did not actually make the method typesafe, because you could still pass in a Persistent that was not known by the domain model!
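The failure mode is easy to reproduce in miniature. The sketch below is an assumption-laden stand-in, not longevity code: the hypothetical Repo knows its persistent classes only as a runtime set, and the Future and PState from the real signature are dropped to keep it short.

```scala
object UntypedCreateSketch {
  // Hypothetical stand-in for the old Repo: the set of persistent classes
  // exists only at runtime, so the compiler cannot check P against it.
  class Repo(knownPersistents: Set[Class[_]]) {
    def create[P](p: P): P = {
      if (!knownPersistents.contains(p.getClass))
        throw new IllegalArgumentException(
          s"${p.getClass.getName} is not a persistent in this domain model")
      p
    }
  }

  case class User(username: String)
  case class Stranger(name: String)

  val repo = new Repo(Set(classOf[User]))
}
```

Both repo.create(User("smith")) and repo.create(Stranger("x")) compile; only the second one throws, and only when it runs.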
Let's look at the second repository method that was also causing me grief:
Repo.retrieve[P, V <: KeyVal[P, V]](v: V)(implicit c: ExecutionContext): Future[Option[PState[P]]]
Here, I faced a similar problem, but chose to stay with my original solution: The @keyVal annotation would mark the Username class by making it extend longevity.model.KeyVal[User, Username]. So here, I was still forcing user classes to extend a library class. And it still did not actually provide type safety. Again, P might not have been a persistent class that the domain model knew about. And V <: KeyVal[P, V] might not have been associated with an actual key that the domain model knew about either. The reason why I chose to leave KeyVal in when I pulled Persistent is that I had made the following error myself more than once:
val fopUser = repo.retrieve[User](user) // oops I meant username, not user
I should note that I am glossing over one point here: The Repo.retrieve API is actually broken into two parts, so that longevity users do not have to spell out the key value type when using retrieve. This is why the above line of code only has one type parameter, but there are two in the Repo.retrieve API as I wrote it a bit further up.
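One common way to split a two-type-parameter method like this is to have the first call fix P and return a small helper whose apply method infers V from its argument. The sketch below shows that general technique under simplifying assumptions; the names Retrieve and store are hypothetical, the lookup is an in-memory map, and this may not match how longevity actually splits the call.

```scala
object TwoPartRetrieveSketch {
  // Simplified stand-in for the old marker trait.
  trait KeyVal[P, V <: KeyVal[P, V]]

  case class Username(username: String) extends KeyVal[User, Username]
  case class User(username: Username)

  class Repo(store: Map[Any, Any]) {
    // Part one: the caller names P explicitly.
    def retrieve[P]: Retrieve[P] = new Retrieve[P]

    final class Retrieve[P] {
      // Part two: V is inferred from the argument and checked against P.
      def apply[V <: KeyVal[P, V]](v: V): Option[P] =
        store.get(v).map(_.asInstanceOf[P])
    }
  }

  val smith = User(Username("smith"))
  val repo = new Repo(Map(smith.username -> smith))

  val found = repo.retrieve[User](Username("smith"))
  // repo.retrieve[User](smith) // does not compile: User is not a KeyVal[User, User]
}
```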

A "Phantom" Type for Domain Models

After some more exposure to type classes, and a good deal of thought, I finally came up with a way that I could overcome these problems. But it's going to take a few steps to get there. The first realization is that I am going to need a type to represent my domain model in type parameter positions in a number of signatures, so the old @domainModel object MyDomainModel wasn't going to do it any more. So I changed this object MyDomainModel into a trait MyDomainModel:
package myModel

import longevity.model.annotations.domainModel

@domainModel trait MyDomainModel
This annotation no longer augments the thing it annotates by making it a longevity.model.DomainModel. Instead, it sticks a couple of things into the companion object for MyDomainModel. One of those things is what I used to call the DomainModel - the thing that contains all the information about the classes in the model the longevity user wants to persist. I renamed this from DomainModel to ModelType because, type classes. DomainModel wasn't really appropriate any more, since this is now something that describes the model at the type level, instead of being an object that knew all the details of the domain model.
I make the ModelType take the model as a type parameter, and make it implicit, like so:
object MyDomainModel {
  implicit object modelType extends longevity.model.ModelType[MyDomainModel](/* ... */)
}
The /* ... */ comment above is filled in with lists of persistent types, component types, and key value types that the @domainModel macro found while scanning the package and sub-packages. The LongevityContext factory method has changed from this (simplified):
object LongevityContext {
  def apply(domainModel: DomainModel): LongevityContext = ???
}
To this:
object LongevityContext {
  def apply[M](implicit modelType: ModelType[M]): LongevityContext[M] = ???
}
And when we call it like so:
val context = LongevityContext[MyDomainModel]()
The implicit ModelType[M] is easily found by the compiler in the MyDomainModel companion object.
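The lookup works because the companion object of a type argument is part of the implicit scope. Here is a self-contained sketch of that mechanism, assuming a drastically simplified ModelType that only carries a list of names; the real class takes the scanned persistent, component, and key value types.

```scala
object MarkerTypeSketch {
  // Hypothetical simplification: the real ModelType carries far more than names.
  abstract class ModelType[M](val persistentNames: Seq[String])

  class LongevityContext[M](val modelType: ModelType[M])

  object LongevityContext {
    def apply[M](implicit modelType: ModelType[M]): LongevityContext[M] =
      new LongevityContext[M](modelType)
  }

  // Never instantiated; it only stands for the model in type parameter positions.
  trait MyDomainModel

  object MyDomainModel {
    implicit object modelType extends ModelType[MyDomainModel](Seq("User"))
  }

  // The compiler searches MyDomainModel's companion for a ModelType[MyDomainModel].
  val context = LongevityContext[MyDomainModel]
}
```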
One interesting thing here is that MyDomainModel is a sort of "phantom" type, as it is never actually instantiated. But it's quite different from what is normally called a phantom type in Scala, as described in many places, including here. Is MyDomainModel a phantom type? I don't think so. So what do we call it? I've just been calling it a marker type.

Evidence that the Persistent is Part of the Domain Model

There's a second implicit object put into the companion object by the @domainModel annotation - the model evidence. The companion object actually looks something like this:
object MyDomainModel {
  implicit object modelType extends longevity.model.ModelType[MyDomainModel](/* ... */)
  private[myModel] implicit object modelEv extends longevity.model.ModelEv[MyDomainModel]
}
The "ev" in ModelEv here is short for "evidence". You might wonder, why not use just a single type class here, merging ModelType and ModelEv? There are two reasons for this. The first is initialization order - the ModelEv has to be used to construct the elements that are supplied to the constructor of the ModelType. The second is, it is important that the ModelEv is package private, as we will see. It's also important that the ModelType is not, because it needs to be found to construct the LongevityContext, as we saw above, and the context typically lives in another package.
We discussed earlier how the @persistent annotation doctors up the User companion object as a PType[User], or persistent type. In the new API, the PType takes a second type parameter for the domain model, and the PType constructor requires an implicit ModelEv. Something like this:
abstract class PType[M : ModelEv, P] {
  // ...
}
Because the ModelEv is private to the myModel package, PTypes can only be constructed in the same package. This is exactly where the PTypes are found by the macro that scans packages to find PTypes. So the PTypes that make it into the ModelType are now exactly the PTypes that can actually be constructed. (With a couple of caveats - see the "Type Holes that Still Exist" section below.)
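The scoping trick can be tried out in a single file. In this sketch, nested objects stand in for packages (qualified private works against an enclosing object as well), and PType is reduced to the one constructor requirement under discussion; the evidence and sameEvidence members are hypothetical additions that exist only so the result can be inspected.

```scala
object ModelEvSketch {
  class ModelEv[M]

  // Constructing a PType demands an implicit ModelEv[M].
  abstract class PType[M: ModelEv, P] {
    val evidence: ModelEv[M] = implicitly[ModelEv[M]]
  }

  // Stands in for package myModel.
  object myModel {
    trait MyDomainModel
    object MyDomainModel {
      private[myModel] implicit object modelEv extends ModelEv[MyDomainModel]
    }

    case class User(username: String)
    // Compiles: modelEv is accessible from inside myModel.
    object User extends PType[MyDomainModel, User]

    val sameEvidence: Boolean = User.evidence eq MyDomainModel.modelEv
  }

  // Stands in for some other package.
  object elsewhere {
    import myModel.MyDomainModel
    case class Foo(bar: String)
    // object Foo extends PType[MyDomainModel, Foo]
    // ^ does not compile: the only ModelEv[MyDomainModel] is not accessible here
  }
}
```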

Persistent Evidence

Next step is to create another kind of evidence - this time for the persistent class. It's called longevity.model.PEv[M, P], and it is found as an implicit val within the PType[M, P]. The PEv constructor is private to package longevity.model, so users are unable to subvert the type system by creating their own persistent evidence. I next modify the signature for Repo.create as follows:
Repo.create[P](p: P)(implicit pEv: PEv[M, P], c: ExecutionContext): Future[PState[P]]
This causes a compiler error if the evidence is not found. If your persistent class is actually part of the model, then its companion object is a PType[M, P], and because of this, it has an implicit PEv[M, P] inside of it. And the Scala compiler will find it. We've managed to make the Repo.create method typesafe using type classes, where standard OO approaches failed us.
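A stripped-down version of this arrangement looks as follows. It is a sketch under stated assumptions: the ModelEv requirement on PType is left out, create returns the value itself instead of a Future[PState[P]], and the lib and app objects are hypothetical stand-ins for the library package and the user's package.

```scala
object PEvSketch {
  // Stands in for package longevity.model.
  object lib {
    // Only code inside lib can construct evidence.
    final class PEv[M, P] private[lib] ()

    abstract class PType[M, P] {
      implicit val pEv: PEv[M, P] = new PEv[M, P]
    }

    class Repo[M] {
      def create[P](p: P)(implicit pEv: PEv[M, P]): P = p
    }
  }

  // Stands in for user code.
  object app {
    import lib._

    trait MyDomainModel

    case class User(username: String)
    object User extends PType[MyDomainModel, User]

    case class Stranger(name: String)

    val repo = new Repo[MyDomainModel]

    // Resolves User.pEv, inherited from PType, via User's companion object.
    val created = repo.create(User("smith"))

    // repo.create(Stranger("x")) // does not compile: no PEv[MyDomainModel, Stranger]
    // new PEv[MyDomainModel, Stranger] // does not compile: constructor is private[lib]
  }
}
```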

Evidence for a Key Value

Let's take a look at the old signature for Repo.retrieve again:
Repo.retrieve[P, V <: KeyVal[P, V]](v: V)(implicit c: ExecutionContext): Future[Option[PState[P]]]
Of course, the KeyVal[P, V] has to go. We do not want to force longevity users to extend their classes with library traits, even if they are just marker traits. So what do we need to make this typesafe? For one, we need to know that P is part of the domain model, as before. But we also need to know that V is a key value for an actual key. The Key itself, that lives within the PType, is perfect evidence for this. But there's a problem: We used to create keys as anonymous members of a set, like so:
object User {
  val keySet = Set(primaryKey(props.username), key(props.email))
}
These keys will never be found by the compiler via implicit resolution. So I changed the API so that the user has to write the following instead:
object User {
  implicit val usernameKey = primaryKey(props.username)
  implicit val emailKey = key(props.email)
}
longevity still needs the PType.keySet to do things like building out the database schema required to support the keys. But the keySet is now private, and it is constructed by reflecting over the PType, scanning for keys. (Unlike with the @domainModel scanning, which uses compile-time reflection, I'm using runtime reflection here for now. I'll probably change that to compile-time reflection soon.)
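As an illustration of scanning a PType for its keys at runtime, here is a small sketch that swaps in plain Java reflection for the Scala runtime reflection mentioned above. Everything in it is an assumption made for the example: Key is reduced to a property name and a flag, and keySet is left public (rather than private) so that it can be inspected.

```scala
object KeyScanSketch {
  // Hypothetical, simplified key: just a property name and a primary flag.
  final case class Key[P](propName: String, primary: Boolean)

  abstract class PType[P] {
    protected def key(propName: String): Key[P] = Key(propName, primary = false)
    protected def primaryKey(propName: String): Key[P] = Key(propName, primary = true)

    // Find every zero-argument member declared on the concrete class whose
    // declared type is Key. Lazy, so that it runs after the subclass vals exist.
    lazy val keySet: Set[Key[P]] =
      getClass.getDeclaredMethods.toSeq
        .filter(m => m.getParameterCount == 0 &&
          classOf[Key[_]].isAssignableFrom(m.getReturnType))
        .map { m => m.setAccessible(true); m.invoke(this).asInstanceOf[Key[P]] }
        .toSet
  }

  case class User(username: String, email: String)

  object User extends PType[User] {
    implicit val usernameKey: Key[User] = primaryKey("username")
    implicit val emailKey: Key[User] = key("email")
  }
}
```

A scan like this only sees members declared directly on the companion object's class, which is why where a key is declared matters.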
Now, keys can only be created using protected methods in the abstract class PType. So the keys that we can create for a persistent are exactly the keys that are found by the ModelType. (Again, with caveats. See the "Type Holes" section below.) So we can use the Keys themselves as evidence for the key value type V. I rewrote the Repo.retrieve method to look like this:
Repo.retrieve[P, V](v: V)(
  implicit pEv: PEv[M, P],
  kvEv: Key[M, P, V],
  c: ExecutionContext): Future[Option[PState[P]]]
As before, the implicit PEv assures that the ModelType actually knows P as a persistent. The implicit Key assures that the ModelType actually knows about a Key of type V. Once again, we've solved a type safety problem with type classes that we were unable to solve with OO techniques.
(I'll mention again that I am glossing over a detail of the Repo.retrieve API that prevents longevity users from having to explicitly specify a value for the V type parameter.)
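Putting both kinds of evidence together, a toy version of the new retrieve might look like the sketch below. It rests on several assumptions: the repository is an in-memory list, PEv carries a runtime class and Key carries an extractor function purely so the toy lookup can work, the Future and PState are omitted, and both type parameters are spelled out at the call site for brevity.

```scala
object KeyEvidenceSketch {
  // Hypothetical evidence classes, enriched just enough to drive a toy lookup.
  final class PEv[M, P](val runtimeClass: Class[P])
  final class Key[M, P, V](val keyValOf: P => V)

  abstract class PType[M, P](cls: Class[P]) {
    implicit val pEv: PEv[M, P] = new PEv[M, P](cls)
    protected def key[V](keyValOf: P => V): Key[M, P, V] =
      new Key[M, P, V](keyValOf)
  }

  class Repo[M](store: List[Any]) {
    def retrieve[P, V](v: V)(
      implicit pEv: PEv[M, P],
      key: Key[M, P, V]): Option[P] =
      store.collectFirst {
        case x if pEv.runtimeClass.isInstance(x) &&
            key.keyValOf(pEv.runtimeClass.cast(x)) == v =>
          pEv.runtimeClass.cast(x)
      }
  }

  trait MyDomainModel
  case class Username(username: String)
  case class Email(email: String)
  case class Fullname(last: String, first: String)
  case class User(username: Username, email: Email, fullname: Fullname)

  object User extends PType[MyDomainModel, User](classOf[User]) {
    implicit val usernameKey: Key[MyDomainModel, User, Username] = key(_.username)
    implicit val emailKey: Key[MyDomainModel, User, Email] = key(_.email)
  }

  val smith = User(Username("smith"), Email("smith@example.com"), Fullname("Smith", "Al"))
  val repo = new Repo[MyDomainModel](List(smith))

  val byUsername = repo.retrieve[User, Username](Username("smith"))
  val byEmail = repo.retrieve[User, Email](Email("nobody@example.com"))
  // repo.retrieve[User, Fullname](smith.fullname) // does not compile: no such Key
}
```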

Type Holes that Still Exist

This type system is not perfect. I have come up with four ways so far that the type system could be subverted by the user, bringing on the runtime exception. Let's take a look.

Forging Model Evidence

Suppose a user creates a persistent class, but they put it outside the domain model package. Code like this:
@persistent[MyDomainModel]
case class Foo(bar: String, baz: String)
Will create or augment the Foo companion object to look like this:
object Foo extends longevity.model.PType[MyDomainModel, Foo] {
  object props {
    // ...
  }
}
You will recall that the constructor for abstract class PType[M, P] takes an implicit argument of type ModelEv[M]. If the Foo case class lives outside the myModel package, the ModelEv inside the MyDomainModel companion object will not be found, resulting in an "implicit not found" compiler error. A clumsy user might decide at this point to create a ModelEv[MyDomainModel] themselves, that is in the appropriate scope. Or they might even do something more crazy, like this:
@domainModel trait MyDomainModel

object MyDomainModel {
  // remember, the following commented out line is put in by the @domainModel annotation:
  // private[myModel] implicit object modelEv extends ModelEv[MyDomainModel]
  implicit val publicizedEvidence = modelEv
}
I'm not sure I can do anything to thwart these kinds of "workarounds". No matter what I do, there has to be some way for the user to create ModelEvs manually. (Remember that Scala macros always have to expand into compiling code, and so can always be rewritten by hand.) This will necessarily lead to opportunities for subversion of the approach to type safety described here.
I've spelt out the situation in the manual and the Scaladocs for ModelEv, and I'm hoping that's good enough.

The PType that Couldn't Be Found

It's also possible to create a PType that lives in the right package, but still cannot be found by the package scanning macro. This way, the user would have evidence for a persistent class that the ModelType does not know about. Here's one way:
import longevity.model.annotations.persistent

package object myModel {
  trait Unscanned {
    @persistent[MyDomainModel] case class Foo(bar: String, baz: String)
  }
}
For better or worse, even if my package scanning found the PType[MyDomainModel, Foo] generated here, it's going to create other problems down the line. The reflective techniques used by longevity to determine the structure of a Foo depend on the case class not being an inner class. I do plan on replacing those reflective techniques with a shapeless approach, and I'm not sure now if shapeless is going to be able to handle something like this either.
Even if migrating to shapeless makes reflecting over the shape of a Foo like this possible, it's still not going to help. Consider the Repo.retrieve method we have been discussing. It needs to create a P from a V. Where is it going to find an Unscanned to build the Foo from? Shall we require that the key value class V is declared inside the same Unscanned trait, so that I can build the Foo from the Unscanned that the key value comes from? This is getting needlessly complicated. I'm not dead set against making this kind of thing work, but at the moment, it seems far beyond the call of duty for a persistence API.
Quick note to say that the manual is pretty explicit about the fact that this kind of thing won't work.

Exposing PType.key

We depend on the keys for a persistent being found by reflective methods within the PType for that persistent. This is mostly guaranteed by the fact that the only way you can create keys is via protected methods in the abstract class PType. But a user could easily work around this, for instance:
@persistent[MyDomainModel] case class Foo(bar: String, baz: String)

object Foo {
  def exposedKey[V : KVEv[MyDomainModel, Foo, ?]](
    keyValProp: Prop[Foo, V]): Key[MyDomainModel, Foo, V] = key(keyValProp)
}
The details of the signature for Foo.exposedKey do not matter - only the fact that it exactly replicates the signature for key. And exposedKey can be called from absolutely anywhere, creating keys that the ModelType will never be aware of. These keys could then be used as evidence in the call to Repo.retrieve, causing havoc. Maybe a database error percolating up about some column that doesn't exist.
I consider an approach like this employed by a user to be outright malicious, and I don't feel obligated to do anything about it.

The Key that Couldn't Be Found

Here's another way to produce a key that the ModelType doesn't know about:
object Foo {
  class Unscanned {
    val barKey = key(props.bar)
  }
  val unscanned = new Unscanned
}
I can almost imagine a user doing something like this without knowing that they were actually subverting the type system. Maybe if I make my reflective scanning more robust, I could catch cases like this?

Closing Thoughts on Type Holes

In general, the holes in the type system that I've managed to come up with all seem pretty extreme, edgy cases, and I'm not losing a lot of sleep over them. The thing that bothers me the most here is that there are so many of them. I can sort of ease my mind by imagining the things I might pull off if I, for instance, started writing code in my own project in package scala.collection. There are over a hundred private[collection] thingies I might start to mess around with.
All the same, I still wonder if there is a better way to approach all of this. Is there some way I could make all this not just typesafe for the responsible user, but ironclad?

What Makes this Usage of Type Classes Unique

In most uses of type classes that I have seen, the type class provides extra functionality for dealing with your types. Like the way Seq.sortBy and Seq.sorted take implicit scala.math.Ordering arguments. We're doing something quite different here. The type classes ModelEv, PEv, and Key are all being used as evidence that model elements are built out in the correct locations, so that they can be found by reflective scanning techniques.
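The contrast can be shown side by side. In the sketch below, the Ordering instance supplies behavior that sorted actually calls, while Registered is a hypothetical evidence class invented for this example: it has no members at all, and its only job is to exist (or not) for a given type.

```scala
object TypeClassContrastSketch {
  case class Fullname(last: String, first: String)

  object Fullname {
    // A type class that provides functionality: sorted calls into it.
    implicit val ordering: Ordering[Fullname] =
      Ordering.by((n: Fullname) => (n.last, n.first))
  }

  val sortedNames: List[Fullname] =
    List(Fullname("Smith", "Al"), Fullname("Jones", "Bo")).sorted

  // A type class used purely as evidence: it carries no behavior.
  final class Registered[A]

  object Registered {
    implicit val fullnameRegistered: Registered[Fullname] = new Registered[Fullname]
  }

  // The evidence is never used in the body; it only gates compilation.
  def store[A](a: A)(implicit ev: Registered[A]): A = a

  val stored = store(Fullname("Smith", "Al"))
  // store("just a string") // does not compile: no implicit Registered[String]
}
```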
I suppose reflective scanning techniques themselves are a little oddball in Scala, but I can't think of any other sensible way to gather up the elements of your domain model. I've often considered that putting them into some kind of cake-like structure might help, but I am assuming that users do not want to have to declare the types they want to persist in some kind of cake. And while I've never dwelled on it for too long, I've also never figured out how a cake would prevent the need to go around collecting the persistent element types. A while back, longevity users were actually forced to list out all of their domain model elements when constructing the DomainModel (renamed to ModelType, as described here). But this was not satisfactory. It's an error-prone case of asking users to repeat themselves.
Can you think of a way I might pull this off without resorting to reflective scanning, and without forcing the user to manually list out their model types in some kind of collection?

4 comments:

  1. Just a quick note to say that the code for this work is in longevity master branch here: https://github.com/longevityframework/longevity

    It will appear in the longevity 0.23 release, which should be out by Monday at the latest. This means the user manual on the website does not reflect these changes yet:

    http://longevityframework.org/manual/

    1. Release is out: http://scabl.blogspot.com/2017/06/longevity-release-023-use-type-classes.html
