Agile Ajax

Resolved: Should schema.rb be included in your source control?


This issue provoked a comment in two separate recent posts, and hey, when the people talk in vast numbers like... well, two, we respond.

As you know, Bob, there are two ways a standard Rails application tracks your database schema. The first is the collected set of your migrations, which can all be run with rake db:migrate. In theory, the set of migrations run sequentially results in your database schema. Rails also maintains an automatically generated file schema.rb, which is updated when migrations are run. This file is a Ruby script representing the current snapshot of the database and can be loaded into a blank database using the command rake db:schema:load.
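To make the two paths concrete, here is a toy sketch in plain Ruby. It is not Rails code: the hashes, the lambdas, and every name in it (MIGRATIONS, SNAPSHOT, migrate_from_scratch, and so on) are made up for illustration. It only models the idea that replaying the migrations in order and loading a snapshot ought to land on the same schema.

```ruby
# Toy model only: plain Ruby hashes stand in for a database, and none of
# these names are Rails APIs.

# Each "migration" is a lambda that takes a schema hash (table name =>
# array of column names) and returns a new, changed schema.
MIGRATIONS = [
  ->(schema) { schema.merge("users" => ["id", "name"]) },
  ->(schema) { schema.merge("users" => schema["users"] + ["email"]) },
  ->(schema) { schema.merge("posts" => ["id", "user_id", "title"]) }
].freeze

# Path one: replay every migration in order against an empty schema.
def migrate_from_scratch(migrations)
  migrations.reduce({}) { |schema, migration| migration.call(schema) }
end

# Path two: a snapshot of the end state, playing the role of schema.rb.
SNAPSHOT = {
  "users" => ["id", "name", "email"],
  "posts" => ["id", "user_id", "title"]
}.freeze

# Loading the snapshot just copies it into a fresh schema.
def load_snapshot(snapshot)
  snapshot.transform_values(&:dup)
end

# If both paths are healthy, they produce the same schema.
def schemas_agree?(migrations, snapshot)
  migrate_from_scratch(migrations) == load_snapshot(snapshot)
end

puts schemas_agree?(MIGRATIONS, SNAPSHOT)
```

In a real project the two paths are rake db:migrate against an empty database and rake db:schema:load, and whether they still agree is exactly what is at stake in the rest of this post.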

The question that arises is which of these two methods -- migrations or schema.rb -- should be considered the ultimate source of truth about the application's expected database schema? The answer has implications for your group practices. If you consider schema.rb to be the final source, then you will want that file in source control, and you are less likely to place effort into ensuring your migrations are always runnable from scratch. If you consider the migrations to be the final source, then you probably don't want schema.rb in source control but you do always want to know that the migrations run.


I've gone back and forth on this myself. Here's what I wrote in my book:

There are a couple of other derived files that are worth addressing -- the database configuration files. Specifically the database setup file database.yml and the working schema file schema.rb. Although schema.rb is a generated file created by Rails during the database migration process, no less a personage than David Heinemeier Hansson himself said that the file should be placed in source control anyway to allow somebody checking out the code for the first time to set the database up in one shot without walking through all the database migrations. Who am I to argue? You can leave schema.rb alone.

Digression: Have I mentioned that the book is out in an edition for the Amazon Kindle? Seems like a small handful of people have purchased that version, which is currently a whopping $2.92 cheaper than the print version on Amazon. That seems... like not much of a discount to me for not having a paper version. Kindle readers -- if you are out there, how does the book look on the device? End of digression.

Anyway, a couple of weeks ago in the code audit post I wrote that I would consider it a big problem if the code base I was investigating couldn't run the migrations from scratch. Truth be told, I wasn't even thinking about the schema.rb file, I was thinking that a development team that had let the migrations break was probably sloppy enough to let other things break. The comment on that got me thinking a bit.

As I see it, the arguments go like this:

Migrations should be the source of database truth because:

  • They are a direct record of developer intentions.
  • You can include data in a migration set, to seed the database with semi-static values. Although, having had some issues with this being fragile, I'd consider putting the data someplace else, and having a rake task that updated the schema, then added the data. Ideally, that'd be someplace where tests could get at it.
  • The schema file can get messed up by source control conflicts between developers. Since Rails 2.1 switched to timestamp-based migration names, that's much less likely for migrations. I find this to be an occasional irritant with the schema file in practice.

Schema.rb should be the source of database truth because:

  • It takes less time to load than the migrations take to run. I think you can make too much of this, actually, since on my projects I don't think I've ever had a migration set that took so long to run, or that we ran so often, that it was preventing work from getting done.
  • It's a direct snapshot of the needed schema rather than the meandering path that the migrations might take to the final goal. When provisioning a new machine, why should you care what the users table looked like six months ago?

Hrm. I was hoping to have a stronger conclusion at this point, but I'm not finding either set of arguments conclusive. I still think it's a bad sign for the project if the migrations have been allowed to decay, but I can see where it is easier and more straightforward to provision from schema.rb, especially where seeding the database isn't an issue. I still tend to provision new developer setups with migrations, although that's because I think it's valuable as a developer to see the migration trace. Using schema.rb might be a better choice on production, though.


Comments: 19 so far

  1. schema.rb is the correct (and official, as told by DHH) answer. Migrations are for, just that, migrating the database to a new schema. If you are starting with a fresh database, there is nothing to migrate.

    If you are worried about migrations breaking as they age, the “decay” is easily solved by re-implementing the needed models in the migration file.

    Comment by John, Friday, July 18, 2008 @ 1:15 pm

  2. I do not believe that schema.rb should be the source. This scheme “may” work during development, but how do you update your production version if schema.rb is your source?

    Worse, if you use the generated schema.rb, it may be incomplete if, for example, you used execute to run some DB-specific code (think triggers, stored procs, etc.).

    Bernard Gallet

    Comment by Bernard Gallet, Friday, July 18, 2008 @ 1:25 pm

  3. @Bernard: If you need DB-specific code in your schema, then the solution is

    Rails::Initializer.run do |config|
      config.active_record.schema_format = :sql
    end

    which replaces schema.rb with development_structure.sql.

    ///ark

    Comment by Mark Wilden, Friday, July 18, 2008 @ 2:02 pm

  4. @Bernard: schema.rb is used as the source when creating a database; migrations are used when a DB with an earlier schema and containing data already exists. Foreign keys are dealt with by the RedHillOnRails Core plugin.

    “[schema.rb] is the authoritative source for your database schema”, “running all the migrations from scratch … is a flawed and unsustainable approach” - schema.rb

    Andrew

    Comment by Andrew, Friday, July 18, 2008 @ 2:09 pm

  5. Seriously, why is this an argument? Do whatever works for you or your team. The customers, folks using your site, will never know the difference.

    Comment by Stephen Waits, Friday, July 18, 2008 @ 2:21 pm

  6. The other developers I work with don’t even want database.yml checked into source control. Oh, and we don’t have migrations either.

    Comment by pete, Friday, July 18, 2008 @ 3:31 pm

  7. I think schema.rb is like a database test.

    The database after the migrations should == schema.rb. Any discrepancies could be the result of some improper migrations, or of changes made to the deployed database that are not reflected in the development schema.

    That being said, I don’t employ this as a test currently. :-)

    Comment by Allen, Friday, July 18, 2008 @ 5:13 pm

  8. We ran into the situation where a migration used a model class directly (to change the data). Eventually, we deleted that model, which made that migration fail. At that point, schema.rb was the only way to ‘create’ the database.

    Migrations are meant to be run over time, as your code changes. That is, a migration should be run with the same source set that it was intended to be run with. Be very careful running them with newer code (i.e., running old migrations with a fresh up to date copy of source).

    Comment by Darren, Friday, July 18, 2008 @ 5:26 pm

  9. I don’t really think either are definitive, but the migrations are more important than the schema.rb. For us, at least, the data is the most important part. We use the production backups to restore the ‘current’ state of the database, including the base database structure, and migrations to move that ‘current’ state into the state the branch we are working in expects. I would never consider using schema.rb or migrations to recreate a database from scratch, because it’d be missing the most important part — the data.

    Comment by Justin Weiss, Friday, July 18, 2008 @ 6:51 pm

  10. Bernard, to move a production box between versions of your schema, you would run migrations like anywhere else, but as a tool to get a new production box deployed the schema:load task is a very useful tool.

    When it comes to seeding your database, I tend to use rake tasks, since they can be more easily maintained, and don’t depend on a nine month old migration still working with what may be radically different models. I just don’t see the value in making sure your migrations still run long after the purpose they were written for has passed.

    Apologies for any typos - it seems my iPod doesn’t want me to be able to see what I’m typing

    Jon

    Comment by Jon Wood, Friday, July 18, 2008 @ 6:59 pm

  11. One argument in favor of migrations over schema.rb: raw SQL statements called with execute don’t seem to maintain their special arguments when they travel from migration form to schema form.

    For example, we have a database that uses memory tables to hold data that only needs to persist for about 10 minutes, but if we were to do a db:schema:load, the tables would be created as standard INNODB tables that live on the file system.

    Comment by Jared, Friday, July 18, 2008 @ 7:21 pm

  12. I’ve written a really simple rake task to build the schema.rb file from migrations. It creates a temporary database, runs all the migrations in it and dumps the database schema back into schema.rb.

    We needed this because we do a lot of our experimental development in branches and therefore development database can contain things you don’t want to make their way to production right away.

    We have a system where each new client gets its own database schema, and we don’t want the user to wait while migrations are building up their schema on registration. This is where schema.rb really pays off.

    Comment by Priit, Saturday, July 19, 2008 @ 5:57 am

  13. One of the things that I found frustrating about rails is trying to get a good grasp of the database schema by figuring out multiple migration files. Which is why I found the plugin automigrations to be so useful.

    I believe the incremental nature of migrations has some advantages but it kinda falls in an area of “should be handled by SCM”. Looking at increments makes it very difficult to grasp the whole picture, and as development grows a lot of defunct migrations get piled up which serve no purpose other than causing errors during migrations.

    It would be interesting to see if one could come up with a solution that provided several of the advantages of migrations (data entry, raw SQL) into a solution that does not pile up incremental files.

    Maybe if migrations could be flushed at certain stable points. And the schema.rb becomes a single migration file at that point.

    Comment by darkstego, Saturday, July 19, 2008 @ 3:40 pm

  14. @Darren: We never use an external model class from inside a migration because it might go away. If we need to access a model, we define the model class inside of the migration class where it is used. This allows us to always be able to run our migrations from scratch, even if an old model has gone away. I treat schema.rb as a generated file.

    Comment by Stephen Veit, Sunday, July 20, 2008 @ 8:31 am

  15. I’m a fan of auto-migrations. http://errtheblog.com/posts/65-automatically

    As for being able to see the progression of the DB, isn’t that what source control is for?

    Comment by Brennan, Sunday, July 20, 2008 @ 7:32 pm

  16. We do what Justin does - recreate from production DB.
    Migrations aren’t feasible to keep clean (or even restore from if they were) once you have 700 of them.
    schema.rb would then seem appropriate, but
    - it doesn’t include static data (a separate rake task would need to be kept up to date for this)
    - more often than not it results in conflicts. Ideally every developer should have exactly the same DB but in practice this does not work for us. Copying tables for backup purposes, quickly switching on to a branch with migrations, subtly different encoding settings/MySQL revisions, etc.

    For a long time we tried keeping schema.rb in source control, but we never once restored from it, and it was the #1 source of conflicts when committing, so now we’ve ignored it.

    Comment by Xavier Shay, Sunday, July 20, 2008 @ 11:45 pm

  17. I consider db/schema.rb to be the official database structure; however, I also verify that my migrations can execute properly with an empty database. It is not that difficult to keep the migrations working even when removing models from your code.

    Comment by Matt, Monday, July 21, 2008 @ 10:20 am

  18. For those of you who mention that your migrations will fail after modifying the model/removing models, I have two comments:

    1) In order to ensure that your migrations work with a clean DB, you can set up cruise to always start with a clean DB. I always configure cruise to run “rake db:migrate:reset” before it runs the tests. If the migrations have been broken, cruise breaks and you can fix the migrations.

    2) I do not think that you should be doing any sort of manipulation in the migrations which would break if the model changes. That means that loading fixtures or what-not in the migration should be avoided; the only data manipulation should be when you need to massage data to fit into the new schema (i.e. migrating existing data in your DB to fit the new schema). Make sure that your ‘down’ does this migration in reverse! If you really need to load up fixtures or something outside of pure data migration (loading up default values is NOT data migration!), then create separate rake tasks to do this. You can then customize your build to run these rake tasks as needed.

    I’ve seen far too many migrations break because there wasn’t proper separation of concerns in the migrations.

    Comment by Anthony Caliendo, Monday, July 21, 2008 @ 5:29 pm

  19. [...] Rappin has written a post at Pathfinder about whether or not schema.rb belongs in source control.  I’m going to put my two cents in and say it does.  The debate centers on the role of [...]

    Pingback by Antares Traders Blog » The schema.rb Source Control Debate, Wednesday, July 23, 2008 @ 2:40 pm
