Schema API
From StatusNet
Contents |
[edit] The old
Current system introduced in 0.9: Plugin schema changes
[edit] Problems
- have to duplicate specifications between several ways:
- ColumnDef arrays for creating/update schema
- column type bitfields for DB_DataObject
- sequence key for DB_DataObject
- keys list for DB_DataObject
- keys list for Memcache_DataObject
- for core tables, also .sql files for MySQL and Postgres
- no way to specify compound indexes other than primary keys
[edit] Drupal-style
http://api.drupal.org/api/group/schemaapi/7
Good:
- explicit syntax for multi-column keys
- some canonical types to map from
- explicit syntax for specifying foreign keys
- gives enough info to generate tables and all the DB_DataObject metadata
- proven support for mysql, postgres, sqlite, and more.
Need to add:
- enum syntax
- possibly fiddle with timestamp vs datetime issues...
- auto-updating timestamps are used a lot
- couple of freak things where funny-size / ascii stuff is specified?
- foreign keys
- booleans
[edit] Example
ostatus_profile:
static function schemaDef()
{
return array(
'fields' => array(
'uri' => array('type' => 'varchar', 'length' => 255, 'not null' => true),
'profile_id' => array('type' => 'integer'),
'group_id' => array('type' => 'integer'),
'feeduri' => array('type' => 'varchar', 'length' => 255),
'salmonuri' => array('type' => 'varchar', 'length' => 255),
'avatar' => array('type' => 'text'),
'created' => array('type' => 'datetime', 'not null' => true),
'modified' => array('type' => 'datetime', 'not null' => true),
),
'primary key' => array('uri'),
'unique keys' => array(
'ostatus_profile_profile_id_idx' => array('profile_id'),
'ostatus_profile_group_id_idx' => array('group_id'),
'ostatus_profile_feeduri_idx' => array('feeduri'),
),
'foreign keys' => array(
// Ok think these examples are correct now:
'ostatus_profile_profile_id_fkey' => array('profile', array('profile_id' => 'id')),
'ostatus_profile_group_id_fkey' => array('user_group', array('group_id' => 'id')),
),
);
}
[edit] Work in progress
What's implemented:
- preliminary table defs for core
- dumpschema.php test script to check live schemas
- ./scripts/schemacheck.php notice # detect live structure of notice table and dump it
- ./scripts/schemacheck.php --diff notice # detect live structure of notice table and compare it against our core definition
- ./scripts/schemacheck.php --all # detect live structure of all core tables and dump them
- ./scripts/schemacheck.php --all -diff # detect live structure of all core tables and compare them against our core definitions
- schema.php mysqlschema.php pgsqlschema.php
- Still partially in conversion from the old system
- Basic detection of MySQL and PostgreSQL table info
- not all type conversions are correct yet
-
primary, multi-value, and foreign key info not yet retrieved for PG(done 2010-10-11) - field comments are not stored or retrieved on postgresql
- foreign key info not yet retrieved for MySQL (though these are not currently used live, our table defs do list some)
- auto-update timestamps, some other bits still need to be worked out
- Auto-conversion of old-style table defs for plugins
- needs further testing to make sure this actually works
- Table creation and update logic is incomplete, but table updating should theoretically work on PG once done (old code was unable to update individual fields on PG)
- 'Update buddy' script to run the updater code but producing an .sql file instead of live changes; this can be helpful at large-scale situations where some sites may need manual running of some code.
Still to do:
- finish all type conversions/definitions
- allow for 'enum' types
- allow consistent way to specify auto-update timestamps
- ensure primary keys & autoupdates are consistently set
- see if we can get prefix lengths on prefix indices (eg first 255 bytes of a blob field)
- do a supported-feature filter/strip step before schema comparison
- patch up installer/updater to use the schema functions for core tables
Higher-level work:
- hash the schema layouts and keep a list of last-known-state for each table
- then checkschema events can be much cheaper in the common case; instead of examining the live tables we can just pull a list of hashes out of a single table. If any don't match or are missing, we can run the full schema comparisons and udpate those tables.
- make sure checkschema functions currectly in background threads, when switching sites
- we want cheap checkschemas to be able to run on-demand in queue threads to aid in maintenance.
- add hook points to provide for specific data-conversion steps after certain actions?
- eg create this field: then fill it out with this data?