-
Get a monthly update on best practices for delivering successful software.
Previously on Agile Ajax: I wrote about a complicated problem that went beyond ActiveRecord. I resolved to add a database column with pre-generated results to make the search logic easier. Commenters questioned my reluctance to use SQL directly.
I wanted to come back to this topic and talk about the process of creating and maintaining the pre-generated data. To reiterate the problem, I needed a report that would display:
The solution being pursued is to add a "most_recent_relay" field to the database and use that in a two-phase database search to display the report.
Turned out it was very easy to adapt the report code to use the new field -- just a question of adding a criteria for swimmer_id being in the already found list and the most_recent_relay field being true.
It was a bit more interesting to manage the new field. ActiveRecord callbacks are an obvious choice for the code, since they are automatically triggered when a new record is created or an existing one is saved. The basic idea is to check whether the record being saved is the most recent relay, and if so, mark it and then make sure that all other relays for the same user are correctly unmarked.
There are just a couple of problems that I had to work around.
The first one is that saving any other object in the middle of one object's callback triggers a new set of callbacks for the other object. Since those callbacks will pass through the same method, infinite loops are pretty easy to stumble into.
Secondly, changing an objects values in it's after_save callback is problematic -- it seems to change the database, but not the in-memory object, and does not return the object you expect.
Finally, I also wanted the code to be stable in the face of previous errors -- in other words, the current state of the most_recent_relay flag was not considered when determining the new most recent relay. This is to make migration and data recovery easier.
I built this test-first. My test cases, sketched out, and roughly in the order I wrote them were:
I should say that by the time I got to the fourth and fifth tests, i expected them to pass, but was guarding against regressions in refactoring.
The code itself was a bit tricker, as I stumbled through attempts to do the whole thing in the before filter, resulting in a few infinite loops, then tried to do the whole thing in the after filter, causing different problems.
Eventually I settled on a two phase process -- I should say, I'm not prepared to defend this code as the final word on the topic. The before filter determines if the current record is the most recent relay.
before_save :update_if_most_recent_relay
def update_if_most_recent_relay
return true unless relay?
current_mrr = user.most_recent_relay(event)
self.most_recent_relay = (current_mrr.nil?) ||
(current_mrr == self) ||
(race_date > current_mrr.race_date)
end
end
The callback asks the user for their most recent relay for the given event -- very roughly, here's that code, using some named scopes that you can probably figure out.
def most_recent_relay(event) records.relays.for_event(event).by_race_date.limited_to(1) end
At that point Rails saves the record, but now there could be other relay records for that user in an invalid state -- I covered that in an after save callback:
after_save :clean_up_relays
def clean_up_relays
return true unless relay?
records = user.records.relays.by_event(event).by_race_date
if !records.first.most_recent_relay
records.first.update_attributes(:most_recent_relay => true)
end
records[1..-1].each do |record|
if record.most_recent_relay
record.update_attributes(:most_recent_relay => false)
end
end
true
end
What this does is grab all relays for the user and the event, sorted by date. If the most recent one is incorrectly tagged false, switch it. If any other record is incorrectly tagged true, switch it. Those records go through the same filter, but would not trigger cascades, because correctly tagged records don't get resaved. This method is stable even if the most_recent flag gets messed up somewhere (or if a date gets edited).
Performance is much better than the per-user method in the last post, though since pagination is still happening after the database, large calls are still a problem that we're working on. Write performance is obviously a hair slower, but not something that a user would notice.
The best part about this filter setup is that the rake task to initially populate the most recent field is super simple:
task :reset_relays => :environment do
Record.relays.each do |record|
record.save
end
end
Touching the record triggers the filters, and the filters do the right thing.
I'm now wondering if, with the new database field in hand, it makes sense to try for a single SQL method -- it's still probably requiring a subquery, but it's probably easier with the new field.
Related posts:
Topics: Ruby on Rails