In my discussions with James Taylor about using wizards to write rules, I was reminded of some cases where it was necessary to create a wizard for this purpose. They all had two things in common: the need for justifications and non-short-circuited ORs. Let me give you an example.
Let's suppose you are designing a system that decides whether or not to recommend prisoners for early release. Your facts include the prisoners' criminal records, the details of their behavior behind bars, recommendations by prison officials and the district attorney, enrollment in treatment and rehabilitation programs, and mitigating circumstances such as aged parents or infirm spouses or children. You formulate the state's sentencing and release guidelines as business rules and implement them in your vendor platform of choice. When a feed of data comes from the department of corrections, you add it to your database and run it through your rule system, which issues recommendations to a board charged with making the final decision. This board makes a judgment call based on the information you provide.
When board members look at your recommendation, they don't just want to see a yes or no; they want to see an explanation or justification. Something like "Prisoner Joe Smith, #54-56, is not recommended for release because he had at least 3 incidents of misbehavior in the last 6 months and was convicted of at least a Class C felony." Ideally, you'd want to spell out exactly what the issues were:
Prisoner: Joe Smith
Number: 54-56
Recommended for Early Release: No
Reasons: At least 3 incidents of misbehavior in the last 6 months (stabbing of guard, 4/5/2006; assault of fellow prisoner, 4/22/2006; contraband in cell, 5/17/2006) AND convicted of at least a Class C felony (Criminal Possession of Stolen Property, 6/11/2004)
Now as a board member, I can look at the prisoner and see that while he was convicted of a nonviolent crime, he appears to have been violent while in prison. If he had only been flagged for contraband, I might consider releasing him.
OK, so justification is useful in this context, as it is in many other fields, like healthcare, insurance underwriting and claims, criminal science, etc. How do we produce a justification with our rule engine? The sad truth is that none of the commercial vendors of forward-chaining BREs will do that for you automatically. You have to roll your own. One approach is to use the Fact Harvest pattern to build a ReleaseRecommendation object. This fact would tell us whether a particular prisoner was recommended for release (true or false) and all of the conditions that contributed to this recommendation. It might contain a list of all the incidents of misbehavior over the last 6 months and the crime for which he was convicted.
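A minimal sketch of such a harvested fact, written in Python for brevity. The class and field names (`Reason`, `evidence`, `add_reason`, `report`) are hypothetical; on a real rule platform this would be a fact class in the vendor's own object model, and rules would call something like `add_reason` from their action parts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reason:
    """One condition that counted against release (hypothetical shape)."""
    description: str
    evidence: List[str] = field(default_factory=list)  # facts that satisfied the condition


@dataclass
class ReleaseRecommendation:
    """Harvested fact that rules update as they fire."""
    prisoner_name: str
    prisoner_number: str
    recommended: bool = True  # assume release until some rule objects
    reasons: List[Reason] = field(default_factory=list)

    def add_reason(self, reason: Reason) -> None:
        # In this sketch, any recorded reason counts against release.
        self.recommended = False
        self.reasons.append(reason)

    def report(self) -> str:
        """Render the justification in the board-facing format shown above."""
        lines = [
            f"Prisoner: {self.prisoner_name}",
            f"Number: {self.prisoner_number}",
            f"Recommended for Early Release: {'Yes' if self.recommended else 'No'}",
        ]
        if self.reasons:
            parts = [f"{r.description} ({'; '.join(r.evidence)})" for r in self.reasons]
            lines.append("Reasons: " + " AND ".join(parts))
        return "\n".join(lines)


if __name__ == "__main__":
    rec = ReleaseRecommendation("Joe Smith", "54-56")
    rec.add_reason(Reason("Convicted of at least a Class C felony",
                          ["Criminal Possession of Stolen Property, 6/11/2004"]))
    print(rec.report())
```

Keeping the evidence alongside each description is what lets the board see *why* a condition held, not just that it did.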
Here we come to the first reason for using a wizard to write rules: if you are testing a particular condition in the 'IF' part of your rule and then updating your ReleaseRecommendation object with the details of that same condition in the 'THEN' part of the rule, you're essentially recapitulating the condition in the action part of the rule. Aside from being a pain in the neck, it's also a maintenance problem. Every time you update a condition, you have to make sure to update the corresponding action. If you don't, Charles Manson gets released, not by accident but because of your bug. It violates the DRY principle (Don't Repeat Yourself). One way of handling this is to have your wizard generate the rule for you, with both the condition and action parts. Now you make one change and the rule is generated in a way that is guaranteed to maintain consistency.
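To illustrate the idea, here is a minimal Python sketch of what a wizard might do behind the scenes: capture a condition once, then emit a rule whose IF part and THEN part are both derived from that single definition. Everything here is hypothetical (the `ConditionSpec` shape, the fact classes, the fixed evaluation date, and "6 months" approximated as 183 days); a real wizard would emit rules in the vendor's rule language rather than Python closures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple

AS_OF = date(2006, 7, 1)   # hypothetical evaluation date
SIX_MONTHS = 183           # "6 months" approximated as 183 days


@dataclass
class Incident:            # hypothetical fact shape
    description: str
    when: date


@dataclass
class Prisoner:
    name: str
    incidents: List[Incident]


@dataclass
class ConditionSpec:
    """What a rule-writing wizard might capture once (hypothetical)."""
    description: str
    select: Callable[[Prisoner], List[str]]  # returns the evidence that matched
    minimum: int = 1                         # matches needed for the condition to hold


Reasons = List[Tuple[str, List[str]]]


def generate_rule(spec: ConditionSpec) -> Callable[[Prisoner, Reasons], None]:
    """Build a rule whose IF part and THEN part both come from the same spec."""
    def rule(prisoner: Prisoner, reasons: Reasons) -> None:
        evidence = spec.select(prisoner)
        if len(evidence) >= spec.minimum:                 # IF: derived from the spec
            reasons.append((spec.description, evidence))  # THEN: the same evidence, reused
    return rule


recent_misbehavior = ConditionSpec(
    description="At least 3 incidents of misbehavior in the last 6 months",
    select=lambda p: [f"{i.description}, {i.when:%m/%d/%Y}"
                      for i in p.incidents if (AS_OF - i.when).days <= SIX_MONTHS],
    minimum=3,
)
```

Changing the threshold or the time window means editing `recent_misbehavior` in one place; the generated condition and the generated justification cannot drift apart.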
The second reason for using a wizard has to do with the short-circuited OR of the title. If our rule above actually tests for
has had at least 3 incidents of misbehavior in the last 6 months OR is a drug addict
then what happens with our rule? If you've done much programming, you know that most languages use what is known as a "short-circuited OR": if you are testing A OR B and A evaluates to true, then B is never tested. If you are depending on a side effect of B (such as building a list of incidents), you can forget about it ever happening: the 'is a drug addict' part of the condition is never tested. To make things worse, if we are constructing a justification, we have to test in the action which parts of the OR clause in the condition actually evaluated to true. Now imagine a complex condition with lots of ORs, NOTs, ANDs and parentheses. Yuck.
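The effect is easy to reproduce in any language with a short-circuiting `or`. In this minimal Python sketch, the two test functions are hypothetical stand-ins for the two halves of the condition above, each recording its evidence as a side effect; `eager_or` is one simple way to force every branch to run.

```python
from typing import Callable, List

Branch = Callable[[List[str]], bool]


def has_recent_misbehavior(evidence: List[str]) -> bool:
    # Side effect: record what was found (hypothetical data).
    evidence.append("3 incidents of misbehavior in the last 6 months")
    return True


def is_drug_addict(evidence: List[str]) -> bool:
    evidence.append("is a drug addict")
    return True


def eager_or(branches: List[Branch], evidence: List[str]) -> bool:
    """Non-short-circuited OR: run every branch, then combine the results."""
    results = [branch(evidence) for branch in branches]  # the list forces every call
    return any(results)


if __name__ == "__main__":
    lazy: List[str] = []
    has_recent_misbehavior(lazy) or is_drug_addict(lazy)
    print(lazy)    # only the first branch ran, so only its evidence is here

    eager: List[str] = []
    eager_or([has_recent_misbehavior, is_drug_addict], eager)
    print(eager)   # both branches ran
```

Note that `any()` over a generator expression would short-circuit as well, which is why the results are collected into a list first. Python's bitwise `|` on booleans is another way to evaluate both operands.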
A way around this mess is to break the OR clause apart into separate rules, each accumulating its findings in the ReleaseRecommendation object. The drawback, of course, is that now we have to create several rules where before we had only one. Again, generating these rules with a wizard is a more or less elegant way out.
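Here is a minimal Python sketch of that split: the single "misbehavior OR drug addict" rule becomes two independent rules, and each one that fires adds its own reason to the recommendation. The fact shapes and rule names are hypothetical, and the `run` loop is only a stand-in for a real engine's match-and-fire cycle (no agenda, no conflict resolution).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Prisoner:                     # hypothetical fact shape
    name: str
    recent_incidents: List[str]     # incidents in the last 6 months
    drug_addict: bool


@dataclass
class ReleaseRecommendation:
    recommended: bool = True
    reasons: List[str] = field(default_factory=list)


def misbehavior_action(p: Prisoner, rec: ReleaseRecommendation) -> None:
    rec.recommended = False
    rec.reasons.append("At least 3 incidents of misbehavior in the last 6 months ("
                       + "; ".join(p.recent_incidents) + ")")


def drug_addict_action(p: Prisoner, rec: ReleaseRecommendation) -> None:
    rec.recommended = False
    rec.reasons.append("Is a drug addict")


Rule = Tuple[str,
             Callable[[Prisoner], bool],
             Callable[[Prisoner, ReleaseRecommendation], None]]

# The single "A OR B" rule, split into two independent rules.
RULES: List[Rule] = [
    ("deny-misbehavior", lambda p: len(p.recent_incidents) >= 3, misbehavior_action),
    ("deny-drug-addict", lambda p: p.drug_addict, drug_addict_action),
]


def run(prisoner: Prisoner, rules: List[Rule]) -> ReleaseRecommendation:
    """Stand-in for an engine cycle: fire every rule whose condition holds."""
    rec = ReleaseRecommendation()
    for _name, condition, action in rules:
        if condition(prisoner):      # each former OR branch is tested on its own
            action(prisoner, rec)
    return rec
```

Because each branch is now its own rule, a prisoner who satisfies both conditions gets both reasons, and no action has to work out after the fact which half of the OR was true.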
If anyone has a niftier solution to justification or non-short-circuited OR, I'm more than happy to hear about it.
Update 1: Michael Chermside correctly points out in the comments that it would be more elegant to have a justification engine that worked directly with the BRE's internals. I agree. Unfortunately, none of the forward-chaining BREs I've worked with have very good support for justification, nor do they supply hooks into their internals to support building a justification engine through AOP or something equivalent. If you work with an open source engine like JBoss Rules, you can build in your own justification logic.
Topics: Best Practices, Business Rules Engines
A more elegant solution would be for the engine that builds your justification to be driven by the actual data your BRE is working from. These rules contain “or” clauses, and it’s perfectly appropriate for the BRE to follow only one branch of the “or” (to short-circuit). But it is NOT appropriate for the justification engine to do so, and it need not.
Comment by Michael Chermside, Thursday, July 27, 2006 @ 8:40 am
Engines like Jess, CLIPS, even JRules and Drools all have their OR conditional element as non-short-circuiting (both sides of the OR are effectively split into 2 rules, so both consequences could fire). I always thought it was a side effect, but you have provided the best explanation of non-short-circuiting I have yet seen, thanks!
Comment by Michael Neale, Friday, July 28, 2006 @ 9:15 am
Just to add a bit more information to the discussion: I’ve seen procedural rule engines that have the OR short-circuit issue. To the best of my knowledge, CLIPS generates 2 rules for an OR disjunction. The same goes for Jess and Drools 3. As for the benefits of rule generation, I like using it because it makes testing and validation easier. If users are constrained to a specific set of patterns, ruleset validation can be more focused, which should make it easier. Within those patterns, a user can write all the rules they need, not all the rules they “want”. There are many rules a user might “want” to write, but many of them may be garbage or poorly organized.
In my biased thinking, part of the job of the IDE is to help users write good rules by providing good guidelines. If a rule IDE uses a DSL based on the terms of the given domain, a user might not even realize that all the rules they need to write are covered by a dozen templates.
peter
Comment by Peter Lin, Friday, July 28, 2006 @ 9:40 am
JRules? I remember running into the short-circuited OR problem with JRules not too long ago and again in their .NET version. I can go back and confirm on the .NET version. Can someone test on a recent version of JRules?
Comment by Dietrich Kappe, Friday, July 28, 2006 @ 4:11 pm
I think there are 2 types of or:
Foo(a equals b) or Foo(a equals c)
and
Foo(a equals b | c)
(excuse my pidgin JRules).
In the former, 2 rules are “generated”; in the latter, it may short-circuit.
But I don’t know from recent experience (I no longer use JRules); perhaps they have optimised it to make OR conditional elements short-circuiting (I was thinking of it at one point, as it does confuse people).
Comment by Michael, Wednesday, August 2, 2006 @ 5:24 am
Jess, CLIPS et al. avoid short-circuiting, and split rules internally into separate rules. There is a problem, though. This approach will often result in duplicate productions being placed on the agenda. This is because a particular set of tuples may match on both sides of the OR, i.e., the same set matches the two internal rules created from that OR. This logically compromises the basic refraction strategy common to almost all production engines, and is therefore generally unwelcome. A few engines have built-in de-duplication features within their agendas to get around this problem. This is a fairly elegant solution. It avoids short-circuiting, maintains all side effects, but avoids breaking the engine’s refraction strategy.
De-duplication on the agenda appears to be fairly rare. One reason is that engines that consume LISP-like ordered and unordered tuples generally have no very convincing way of determining duplicates. Typically, there is no strong identity at the tuple level. One engine I work with that does support de-duplication is only able to do so because it consumes facts as objects rather than relational tuples. Even when used directly over relational data sets or a database connection, it represents individual data rows using objects. It uses OO identity (in part) to identify duplicates.
Comment by Charles Young, Tuesday, August 8, 2006 @ 9:47 am
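To make the de-duplication point concrete, here is a minimal Python sketch of an agenda that refuses a second activation of the same original rule on the same facts, using object identity as the comment describes. It is not modeled on any particular engine; the class names, the `rule`/`branch` fields and the split rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(eq=False)   # eq=False keeps object identity for == and hash()
class Prisoner:        # hypothetical fact shape
    name: str
    incidents: int
    drug_addict: bool


@dataclass
class Activation:
    rule: str                        # the rule as written, before the OR was split
    branch: str                      # which internal rule produced this match
    facts: Tuple[Prisoner, ...]


class DedupAgenda:
    """Agenda that refuses a second activation of the same rule on the same facts."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, Tuple[Prisoner, ...]]] = set()
        self.activations: List[Activation] = []

    def add(self, act: Activation) -> bool:
        key = (act.rule, act.facts)  # fact identity, not field values, decides duplicates
        if key in self._seen:
            return False             # same tuple already matched another branch of the OR
        self._seen.add(key)
        self.activations.append(act)
        return True


def match_split_rule(facts: List[Prisoner], agenda: DedupAgenda) -> None:
    """The two internal rules generated from 'misbehavior OR drug addict'."""
    for p in facts:
        if p.incidents >= 3:
            agenda.add(Activation("deny-release", "misbehavior", (p,)))
        if p.drug_addict:
            agenda.add(Activation("deny-release", "drug-addict", (p,)))
```

Because identity rather than field values decides what counts as a duplicate, two distinct prisoner records with identical data still produce separate activations. Dropping the duplicate also discards which second branch matched; a justification engine that needs every branch could record it on the surviving activation instead.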