Weblog entry #154 for Steve

Managing database updates
Posted by Steve on Fri 2 Feb 2007 at 03:17
Tags:

One thing I'm curious about is how people doing incremental development on "live" sites handle changes in their database schemas.

The code behind this site sees semi-regular updates to the database tables which are used, slowly working towards a better structure than the historically bad one.

This means that code is updated, tested locally, then pushed out to the live site.

There is no facility for showing differences in the tables, etc. Just a static dump which is updated and stored in CVS - which is useful for a fresh installation, but not for people upgrading from older copies of the code.

Generally my development looks like this:

  • Plan the development of a new feature.
  • Make the required database changes, using "alter table", et al.
  • Write a test case or two.
  • Write the code.
  • Commit the code + test case(s).
  • Test.
  • Update the live site from the CVS repository, manually running the upgrade SQL after running "cvs upgrade", flushing the memcache, and reloading Apache if there have been any new rewrite rules installed.

This works for me because I control both the test environment, the CVS repository, and the live site.

But for anybody else running the code they'll going to have to reverse engineer any upgrade statements from the static table dumps. Which may be non-trivial.

For example the upgrade I'm going to run tomorrow will look like this:

# give each user an ID.
ALTER TABLE users DROP PRIMARY KEY;
ALTER TABLE users ADD id int(11) NOT NULL AUTO_INCREMENT,  
  ADD PRIMARY KEY (id);

# Add the key on username, just because it is still heavily used.
ALTER TABLE users ADD KEY (`username`);

# create a generic preferences table.  Inefficient, but extensible.
CREATE TABLE `preferences` 
(
`id`        int(11) NOT NULL auto_increment,
 user_id    int(11),
 pref_name  varchar( 25 )  default '',
 pref_value varchar( 125 ) default '',
 PRIMARY KEY  (`id`),
 KEY( `user_id` )  ,
 KEY( `pref_name` )  
);

# migrate users stylesheet preferences.
insert into preferences (user_id,pref_name,pref_value)
   (SELECT id,"stylesheet", stylesheet from users WHERE stylesheet != '');
alter table users drop stylesheet;

What does that do? It:

  • Adds a new "ID" column to the users table.
  • Creates a "preferences" table which will be keyed on the new user id column.
  • Inserts the current stylesheet setting for each user who is using a custom CSS file
  • Removes the "stylesheet" column from the users table - where it should never have lived.

Why do I need the preferences table? So that I can store the users chosen text formatting option. Something I've been working on for a day or two. It is also the best place to store peoples advertising preferences, display options, etc.

I could imagine storing a database schema-version number somewhere and a collection of upgrade scripts - like Ruby on Rails does, but I wonder if there are other approaches I should investigate?

Any tips or pointers welcome.

 

Comments on this Entry

Posted by dkg (216.254.xx.xx) on Fri 2 Feb 2007 at 05:28
[ Send Message | View dkg's Scratchpad | View Weblogs ]
For one project i've worked on, we decided that the file containing the database schema could never be edited, just appended to.

this makes the schema difficult to read for folks who expect the current schema to be visible, but it actually makes the upgrades really easy: just feed the diff between revisions (without the diff markup) directly into the database engine on the live site.

If folks want to see a clean view of the current schema, that's something you can generate: just create an empty database, feed it the schema file, dump the schema, and drop the database. This strategy treats the schema history like any other source file under revision control: if you want a processed/compiled version you need to apply some post-processing to get it. You could even include a makefile to auto-generate it (assuming you have database creation privileges) before packing the code for publication.

[ Parent | Reply to this comment ]

Posted by Steve (80.68.xx.xx) on Fri 2 Feb 2007 at 10:43
[ Send Message | View Steve's Scratchpad | View Weblogs ]

That is a pretty neat solution, which I guess does the job!

That would suggest I have two files, the "initial" file which creates the tables, and then the "upgrade" file showing the changes which had been applied since that starting point.

(The upgrades should be careful not to disturb any existing data).

The only downside I see to this approach is that you really have to be careful that you don't run the upgrade.sql script when it no longer applies - hence your suggestion to use the diff.

eg:

Initial State
CREATE TABLE `users` (
  `id` INT(11) NOT NULL AUTO_INCREMENAT,
  `username` VARCHAR(25) NOT NULL DEFAULT  '',
);
Upgrade File
UPDATE USERS SET username = 'skx' + username WHERE id=1;

Running that more than once would be a bad thing.

I think having versioned schemas would solve that problem. Keep the pristine starting point, and then create a new upgrade file for each new revision - these could then be used to upgrade from X to schema X+N, and since it would be scripted it would be possible to ensure things weren't run more than once.

Steve

[ Parent | Reply to this comment ]

Posted by dkg (216.254.xx.xx) on Fri 2 Feb 2007 at 15:03
[ Send Message | View dkg's Scratchpad | View Weblogs ]
That would suggest I have two files,

As long as you only apply diffs, you can treat the whole thing as a single file, i think. The updates just append to the existing original schema.

Seen another way, the "original schema" is just an "update" to a blank database.

[ Parent | Reply to this comment ]

Posted by Steve (80.68.xx.xx) on Fri 2 Feb 2007 at 15:22
[ Send Message | View Steve's Scratchpad | View Weblogs ]

True .. I just dislike the idea of having to use diff - especially when considering upgrading multiple times.

Steve

[ Parent | Reply to this comment ]

Posted by mvanbaak (80.126.xx.xx) on Sat 3 Feb 2007 at 09:39
[ Send Message ]
We have 1 file with the original schema.
After that everytime we have to patch the database we create a file called: YYYYmmddNN.sql (2007020300.sql would be the first database patch today)
In a small settings table for our applications we also have a field called 'latest_patch'
If there are files newer than this field content we apply all the new files to the database first time a user accesses the page/app.
We have a script that first 'svn up' in the sql patches dir so we are sure the patches are applied before we 'svn up' the code so we know for sure we wont get sql errors because of missing fields.

Hope this helps.

[ Parent | Reply to this comment ]

Posted by Steve (62.30.xx.xx) on Sun 4 Feb 2007 at 22:08
[ Send Message | View Steve's Scratchpad | View Weblogs ]

That seems similar to the solution proposed by dkg - albeit using dates instead of just versions!

I've implemented this solution now, with a script db-migrate and a collection of migration files. So I guess we'll leave the table dump alone now, and see how things work out.

Steve

[ Parent | Reply to this comment ]

Posted by mvanbaak (80.126.xx.xx) on Sun 4 Feb 2007 at 22:27
[ Send Message ]
correct.
I simply like to have the information about when a database was last updated.

Thanks for the update about the db-migrate function. Keep up the good work!

[ Parent | Reply to this comment ]

Posted by redbeard (216.49.xx.xx) on Thu 19 Apr 2007 at 14:56
[ Send Message | View redbeard's Scratchpad | View Weblogs ]

Sorry I didn't check this out sooner... I'm way behind on my feeds :)

I don't now what your backend is written, but there is a PEAR package for PHP called MDB2_Schema that is designed to handle just this. I haven't used it in production yet (no time... we're doing something similar to what you're currently doing at the moment). I have played with it some, though.

Basically, you do the following.

  1. Take snapshots of the schema for every release
  2. Use MDB2_Schema to apply changes between releases

MDB2_Schema will actually grab the current schema from the database to update and figure out what needs to be done to bring it to the new schema.

Again, I don't know how well it works in practice, but the theory looks good.

Michael

[ Parent | Reply to this comment ]

Posted by Steve (80.68.xx.xx) on Thu 19 Apr 2007 at 14:59
[ Send Message | View Steve's Scratchpad | View Weblogs ]

Thanks for the comment!

The Yawns backend is all perl-based, and I've been very happy with the way that the incremental updates have worked out.

Steve

[ Parent | Reply to this comment ]

User Login

Username:

Password:

[ Advanced Login ]

Register Account

Quick Site Search