Saturday, October 18, 2008

Two-Minutes Scala Rules Engine

Scala is really seductive. To drive my learning of this JVM programming language, I thought I should go beyond the obvious experiments and try to tackle a subject dear to my heart: building a general purpose rules engine.

Of course, the implementation I am going to talk about here is naive and inefficient but the rules engine is just a pretext here: this is about Scala and how it shines. And it is also a subset of what Scala is good at: I mainly scratch the surface by focusing on pattern matching and collections handling.

The fact representation I use is the same than the one used by RuleML: a fact is a named relationship between predicates. For example, in "blue is the color of the sky", color is the relationship between blue and sky. The rules engine matches these facts on predefined patterns and infers new facts out of these matches.

In Scala, I naturally opted for using a case class to define a fact, as the pattern matching feature it offers is all what is needed to build inference rules:
Normally you do not need to implement your own hashcode functions, as Scala's default ones are very smart about your objects' internals, but there is a known issue with case classes that forced me to do so.

For rules, I opted to have them created as objects constrained by a particular trait:
The method of note in this trait is run which accepts a list of n facts and returns zero (None) or one (Some) new fact ; with n equals to the value of the arity function.

So how is the list of facts fed to the rule run method created? This is where the naive and inefficient part comes into play: each rule tells the engine the number of facts it needs to receive as arguments when called. The engine then feeds the rule with all the unique combinations of facts that match its arity.

This is an inefficient but working design. The really amazing part here is how Scala shines again by providing all what is required to build these combinations in a concise manner. This is achieved thanks to the for-comprehension mechanism as shown here:
Notice how this code is generic and knows nothing about the objects it combines. With this combination function in place, the engine itself is a mere three functions:
With this in place, let us look at the facts and rules I created to solve the classic RuleML discount problem (I added an extra customer). Here is the initial facts definition:
And here are the two rules:
With this in place, calling:
produces this new fact base with the correct discount granted to good old Peter Miller but not to the infamously cheap John Doe:
This is truly a two minutes rules engine: don't ship it to production! Thanks to the standard pattern matching and collection handling functions in Scala, this was truly a bliss to develop. On top of that, Scala's advanced type inference mechanism and very smart and explicit compiler take strong typing to another dimension, one that could seduce ducks addicts.

So why does Scala matter? I believe the JVM is still the best execution platform available as of today. It is stable, fast, optimized, cross-platform and manageable. An alternative byte-code compiled language like Scala opens the door to new approaches in software development on the JVM.

Closing note: an interesting next step will consist in parallelizing the engine by using Scala actors. But this is another story...

3 comments:

Arrgh said...

Just in the past few days I've been playing with Alloy as a way of thinking about dependency analysis and I immediately realized that a lot of the syntax it uses could credibly be replaced with valid Scala programs.

David Dossot said...

I did not know Alloy, thanks for the pointer.

Your remark raises an interesting point: is there any value in building a "neutral" representation of rules, like RuleML, if you can write them in a concise manner directly in an executable language like Scala?

Arrgh said...

There's definitely a lot of value in having an abstract syntax that simultaneously satisfies, to the extent feasible, human and machine readers, while retaining as much extensibility as is appropriate for the language's use cases. XML has turned out to be pretty good for the machine readers and extensibility, but not so much for human readers.

Scala case classes are a reasonable encoding for a lot of very simple problem domains, but when you need to start adding in constraints you often end up either cooking up your own half-baked constraint language that's tightly coupled to your domain model, or your domain model starts to accrete so many constraint annotations that the domain model's meaning is obscured.

Here are some things I think you might be interested in:

Microsoft's "M" language

JVSTM, which is primarily a Java STM library, but also has some interesting ideas for writing domain models in an extremely simple syntax, then layering on declarative constraints in a later code generation phase that tries hard not to break code-level extensibility.