Creating a library of CRUD API functions for Drupal Entities [#79684]

I talked to chx and JonBob about the ideas brought up in this thread: http://lists.drupal.org/archives/development/2006-08/msg00719.html

We all agreed that this is a useful direction for Drupal to go in, however it's unrealistic to get it in before the code freeze, and has too much potential to break things everywhere after the code freeze. This issue is therefore marked as "postponed" so that we can look at this in detail in Brussels. In the meantime, the form pull changes can be a reasonable alternative for those hoping to build an install profile during this release.

However, it would be great to get some discussion going about overall strategies for how this should work, so we can crank out code in Brussels.

Note: hereforth "target" refers to a "thing" in Drupal, such as a user role, or a taxonomy term, or whatever. I would use "object" but that term means too many things. :P

Defining the API functions

Stratregy 1: Many little functions

The first way this could be done is by adding a lot of little functions:

create_user_role($properties)
read_user_role($id)
update_user_role($properties)
delete_user_role($id)
// repeat for taxonomy_term and block and...

You would call the functions like the following:

create_user_role(array('name' => 'administrator'));
delete_user_role(4);

Inside, each function would just call its SQL commands directly, like:

function create_user_role($properties) {
  db_query("INSERT INTO {role} (name) VALUES ('%s')", $properties['name']);
}

function delete_user_role($id) {
  db_query("DELETE FROM {role} WHERE rid = %d", $properties['rid']);
  db_query("DELETE FROM {user_roles} WHERE rid = %d", $properties['rid']);
  db_query("DELETE FROM {permission} WHERE rid = %d", $properties['rid']);
}

Strategy 2: One CRUD function per target

This approach makes

Or one function per target, where an $op (create/read/update/delete) is passed in, along with an array containing properties of the target:

user_role_crud($op, $properties)
taxonomy_term_crud($op, $properties)

And you would call them like:

user_role_crud('add', array('name' => 'administrator'));
user_role_crud('delete', array('rid' => 4));

Internally, these drupal_crud functions would be logic to handle what to do in each specific instance:

function user_role_crud($op, $properties) {
  switch ($op) {
    case 'create':
      db_query("INSERT INTO {role} (name) VALUES ('%s')", $properties['name']);
      break;
    case 'read':
      db_query("SELECT rid, name FROM {role} WHERE rid = %d", $properties['rid']);
  }
  ...
}

Strategy 3: One mega CRUD function which acts kind of as a query builder

Something like:

drupal_crud($op, $target, $properties)

Which you would call as:

drupal_crud('create', 'user_role', array('name' => 'administrator'));
drupal_crud('delete', 'user_role', array('rid' => 4));

This one's a bit over my head so I can't really envision how it looks on the inside. I think basically you'd switch on $target, then define the affected tables for that thing and how they relate, then call some generic SELECT/INSERT/UPDATE/DELETE statements. The idea here is there will be a lot of duplicated code for these operations, so abstract it out.

Defining input/output

Of paramount importance no matter which way we do this is all targets need to be able to be acted upon in a consistent way, and return predictable information. So as soon as you know something is a crud function, you know how to get from it the information you need, and what format that information will be stored in (an array, a string, etc.).

Currently, this is what I'm thinking. Very much open to suggestions here.

create

Parameters: An array of target properties.
Return value: The ID of the target that was just added, or FALSE if the query failed.

read

Parameters: An array consisting of an ID, corresponding to the primary key of the target.
Return value: An object containing the properties of the target, or FALSE if the query failed.

update

Parameters: An array of target properties.
Return value: The number of rows affected, or FALSE if the query failed.

delete

Parameters: An array consisting of an ID, corresponding to the primary key of the target.
Return value: The number of rows affected, or FALSE if the query failed.

Potential List of Targets

block
comment
file
locale
menu
node
node revision
role
taxonomy term
taxonomy vocabulary
url alias
user
watchdog event
other?

Remaining questions

Here are some remaining questions, some of which might be difficult to answer without actually trying to code this and see what happens:

Which approach outlined above is best in terms of defining the API? The second seems the most "Drupalish" but the third is kind of interesting if it doesn't incur too high a performance penalty.
Does this cover the gamut in terms of what we'll need for input/output values?
Are these return values sane?
Other targets?
Other?

Any input/insight anyone has would be great. :)

Comments

Comment #1

kkaefer commented 20 August 2006 at 19:02

*subscribe*

Comment #2

jonbob commented 20 August 2006 at 19:27

I like this direction a lot. Anything to promote standardization will lead to fewer trips to the API docs, fewer errors, and faster development. In addition, breaking the API up into smaller chunks (i.e. ensuring that these tasks can be done independent of the interface) allows for better unit tests. If we can get it right we could even have some small corner of the Drupal API that won't change radically from one release to the next. :-)

I generally prefer the listed strategy 2. I'm not sure if it totally holds up in this case, though, and #1 might be better for this reason.

The first issue is that the arguments needed are inconsistent. An API that needs to have a function like

function user_role_crud($op, $properties) {}

is problematic if the $properties required depend too much on the $op. In the strategy as proposed, the $properties are sometimes an array, sometimes a scalar. This makes for strange code; at the very simplest level you have a plural variable name that is used for singular things. Instead we could always force an array which happens to be always one element long for 'read' and 'delete'; this brings back memories of node_load(array('nid' => 52)) which we agreed was needless typing and simplified.

Do we allow for the possibility of the ID itself changing? If so, we'd need an additional parameter for "update" for the original ID so the new one could be in the array. Some systems allow for this; we probably don't have to since it doesn't make sense for most objects.

I strongly prefer user_role_crud() to drupal_crud_user_role() given the choice between the two. Contrib modules will be using this standard if adopted, and they shouldn't be injecting anything into the drupal_ namespace in my opinion.

At the moment, my vote is for strategy 1. Strategy 2 is fine if we resolve/live with the above issues. The third option adds complication with no benefit that I can see at the moment, but maybe I'm missing something. It looks like we'd need the individual functions anyway, as hooks, if we went this route.

Comment #3

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 19:45

One clarification:

In the strategy as proposed, the $properties are sometimes an array, sometimes a scalar.

Sorry, I meant for $properties to _always_ be an array, and instead of passing in a bunch of them like:

$node = array(
  'title' => 'Foo',
  'body' => 'Bar',
  ...
)

You would pass in:

$node = array('nid' => 8);

So yes, this would involve some needless typing, though the plus side would be that at least it would be consistent throughout, as opposed to now when you have no idea what you need to pass in to retrieve details about something without reading API documentation. ;) An is_numeric check could potentially work, but I agree that would get icky with the various ops.

Comment #4

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 19:50

Oops, and I forgot...

Very good point about the naming convention. I've updated the doc to reflect this.

Do we allow for the possibility of the ID itself changing? If so, we'd need an additional parameter for "update" for the original ID so the new one could be in the array. Some systems allow for this; we probably don't have to since it doesn't make sense for most objects.

Hm. I can't think of an instance where this would be used and not error prone. I'm going to say "no" for now, with the possibility that we could add in another API function sometime down the road for changing IDs if we so choose. Unless someone can think up a good use case for it being included with the rest of the CRUD stuff.

Comment #5

quicksketch

he/him

commented 20 August 2006 at 21:05

Does this cover the gamut in terms of what we'll need for input/output values?

I like method 2 the best, though the $properties param is going to get a little crazy at times. On the issue of scalar/array input, it's much less likely that a single scalar value will be sufficient to describe the desired task. Other than $op = 'delete', most queries take more than a single variable to perform their task. Bottom line, we're not going to be 'extra' typing like we were for node_load, since most cases we'll need to pass more than one value.

I don't like that $properties takes on different usage depending on the context of the operation. Like in an insert operation, it contains the values to be inserted. On a delete, it contains items to be matched in the WHERE clause. This seems pretty straight-forward until we get to an update operation. Now we've got values that need to be in both places (values to be updated and values that need to be matched in the WHERE clause).

I'm brainstorming solutions. Suggestions?

Comment #6

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 21:45

Now we've got values that need to be in both places (values to be updated and values that need to be matched in the WHERE clause).

I don't think this should happen. Most UPDATEs that I can think of wll just be:

db_query("UPDATE {table} SET col1 ='%s', col2 = '%s', col3 = %d WHERE primary_key = %d", $propertes['col1'], $properties['col2'], $properties['col3'], $properties['id']);

Are there tables/targets you're thinking of where there is not an unchangable value which can be used to uniquely identify the record in question?

Rather than hard-coding every query, we could even do some kind of fancy "helper" function that would loop through the array properties, query the table definition to find out if each one should be a strings or int, and then write the SQL string dynamically. I swear I've seen code that does this in core someplace but I can't seem to remember where now...

Comment #7

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 21:55

In code...


// Retrieve all details about a role
$role = user_role_read('read', array('rid' => 4));

// Give it a new name
$role->name = 'foobar';

// Update it in the table -- casting to array is kind of nasty :( yet people like the convenience of object syntax...
user_role_crud('update', (array)$role);

...

/**
 * CRUD for user roles.
 */
function user_role_crud($op, $properties) {
  switch ($op) {
    ...
    case 'update':
      db_query("UPDATE {role} SET name = '%s' WHERE rid = %d", $properties['name'], $properties['rid']);
  }
  ...
}

Comment #8

quicksketch

he/him

commented 20 August 2006 at 22:00

I ran through all the core modules and it looks like my concerns where unfounded, with one exception: the blocks module, which uses that crazy 'module'-'delta' scheme.UPDATE {blocks} SET visibility = %d, pages = '%s', custom = %d WHERE module = '%s' AND delta = '%s'Has there been any discussion on giving this table a primary key? Other than this one, we're in good shape.

Comment #9

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 22:03

Block's actually ok because module and delta won't (shouldn't) change -- module specifies which module declared the block, and delta specifies which of the module's declared blocks it is.

Thanks for looking into that, Nate! :D

Comment #10

Crell commented 20 August 2006 at 22:22

Personally I'd prefer the "lots of little functions" approach in general. When doing that sort of API function in my non-Drupal code, I frequently find myself just stubbing it out with a call_user_func() call and passing on the parameters, because I hate working with long switch statements. :-) It would also follow the node system, which currently has a separate hook for every action (load, insert update, delete, view).

As for a helper function to dynamically build the insert/update statement, you may be thinking of this issue:

http://drupal.org/node/53488

Comment #11

webchick

she/they

English

Vancouver 🇨🇦

commented 20 August 2006 at 22:43

No, it was user_save. Although that's pretty freaking cool, too!

I'm cool either way, we just need to make sure that all these functions are a) named the same, b) take the same parameters, and c) return the same values.

Comment #12

jonbob commented 21 August 2006 at 02:29

While it's fine for specific cases to use some kind of auto-SQL magic, we shouldn't try to cover all the cases this way. In particular, many instances will have ramifications that need to be acted on. When you delete a role, for example, you may have to reassign roles to users that had it. There are also possible cases where the data may not be stored in a single table, even if it is a single conceptual "thing". For some gotchas when designing an ORM layer like this, see:

http://blogs.tedneward.com/PermaLink,guid,33e0e84c-1a82-4362-bb15-eb18a1...

I favor an approach where we make the common case easy, but leave the "write to the metal" approach available for cases that need it.

On a related note: do we have the API functions all work on one "thing" at a time? That certainly simplifies the API, and I think I'm in favor of that approach. But we could also extend these easily to work on multiples; read operations could return all targets that match the search criteria. We should make the expected behavior clear in this regard.

Comment #13

nickl commented 21 August 2006 at 05:44

I would also like to see CRUD seperated into its own files creating an abstraction layer which will more easily fascilitate porting to other databases or other data stores.

Comment #14

Chris Johnson commented 22 August 2006 at 03:10

I highly prefer strategy 1. JonBob's criticism of strategy 2 provides much of the reason. I also like functions to do just one thing whenever possible. Easier to remember, easier to optimize, easier to test.

Great topic and effort, Angie!

Comment #15

boris mann commented 22 August 2006 at 07:01

Yep, option 1 looks good. It may seem like many more little functions...but highly likely that each function will remain correct over time. Option 2 is really just option 1 with a wrapper...and then extra logic to figure out what you're trying to do.

Comment #16

adrian commented 22 August 2006 at 21:35

This is actually very similar to some of the stuff I had in mind for FAPI 3 (which is getting ever so badly named).

Basically you'd build the 'model' of say a node, and then (using standard naming, or defined callbacks) just build the CRUD for each of the models.
You can then have a form for any of them, and remix the functionality and use it in workflows etc.

Thought of introducing a 'model_' namespace, so you can define new 'things' either through naming convention, or through a similar system as the _forms hook. (ie: function model_node($node) , function model_node_read($nid), function form_node_update($fields), or in the case of system settings, you would only have the model_modulename_settings(), and it would use the same CRUD for all settings forms.

an additional interesting feature i had in mind was conversions. Say you know you have a node object, and you want to do something to the owner of that node (like demote user), you could have function model_user_load_from_node($node).

An example of that being useful is if you have an image thing, and a file thing, but there's an action which only takes files, you could just create a conversion function and have it work.

Having a full set of crud functions will make it far easier for us to build proper unit tests too =)

Comment #17

webchick

she/they

English

Vancouver 🇨🇦

commented 23 August 2006 at 00:15

Strategy 1 is the most straight-forward to implement and easiest to grasp, and probably also improves performance since you're calling the thing directly and not first going through a switch. So let's talk about that one.

What's the naming conventions for our functions? (Sorry I keep harping on this, but the sooner we figure that out, the sooner we can start implementing. ;))

#1:

user_role_create()
user_role_read()
user_role_update()
user_role_delete()

Obviously, this is the most straight-forward. However, this can also run you into problems with existing functions/hooks, especially with something like node_delete().

However, if "crud" is in the title of the function, it makes it easier to tell when you're looking at a contrib module that "this is a function that will do X crud operation on data," although it looks kind of, well, "cruddy." ;)

user_role_crud_crete()
user_role_crud_read()
user_role_crud_update()
user_role_crud_delete()

or:

curd_role_crete()
crud_user_role_read()
crud_user_role_update()
crud_user_role_delete()

(I think I like that one better, and all "crud" functions would be stored together in a list in an IDE)

Finally, "create", "read", "update", and "delete" are kind of obtuse terms and not used often in the Drupal API. Do we instead want to do something like "add", "load", "save", and "remove" ?

#4:

user_role_add
user_role_load
user_role_save
user_role_remove

These are certainly more "user friendly" words, but they also run into the same problem as the other non-namespaced functions (node_load()) and also aren't immediately obvious that they perform simple CRUD operations and not something else.

I'm not crazy about this idea, but bring it up because even Drupal developers who've been doing this a long time didn't know what the heck "CRUD" was when we were talking about this in IRC.

Comment #18

adrian commented 23 August 2006 at 11:38

how about :

read_user_role();
update_user_role();
create_user_role();
delete_user_role();

Similar to how we put things in the theme namespace now, and it also
reads as cleaner english =)

Comment #19

adrian commented 23 August 2006 at 11:44

Also, for stuff like node loading , where we do the loading from arrays :

read_user_role_from_array() // do the array('nid'=>12, 'user' => 12) etc thing.

Also, i think if we have a consistent CRUD mechanism, we finally have somewhere to put in an 'item' top level type

ie: the crud functions fire to add things into an 'items' table, with an iid | type | oid table.

Comment #20

webchick

she/they

English

Vancouver 🇨🇦

commented 23 August 2006 at 15:52

Duh. create_user_role() and their ilk make too much sense. :P I've updated the original posting with that, thanks. :)

I'm not sure how crazy I am though about read_user_role_from_array() though. I think it's imperative that the input/output of these functions all remain constant, so read_user_role() and read_user() should each have exactly the same signature, and return the same results. Not sure if that's totally possible, but that's what I'd like to shoot for.

I might experiment and go through user.module adding these functions for users, roles, and permissions and see what happens.

Comment #21

webchick

she/they

English

Vancouver 🇨🇦

commented 23 August 2006 at 16:01

One other nice thing...

naming functions as create_*, read_*, and delete_* doesn't run into any name-space conflicts with anything else in Drupal core.

One not-so-nice thing...

update_* certainly does, because of the update system. :( As long as there aren't any contrib modules called info_page or

However, update_* certainly does, because of the update system. :(

update_access_denied_page
update_convert_table_utf8
update_data
update_do_updates
update_do_update_page
update_finished_page	
update_fix_access_table	
update_fix_schema_version
update_fix_sessions
update_fix_system_table	
update_fix_watchdog
update_fix_watchdog_115
update_info_page	
update_progress_page	
update_progress_page_nojs
update_script_selection_form
update_selection_page
update_sql
update_update_page

While most of these are probably not that big of a deal (I can't really think of a contrib module that might want to crud stuff called "fix_watchdog_115") others of them could potentially conflict, such as "info_page." Do we not care about this, and just trust people not to name their targets in such a way that they'll conflict with core functions, or should we change update_* to save_* which is currently an unused namespace?

Comment #22

adrian commented 23 August 2006 at 16:37

If we are considering renaming update to save, could we also consider renaming read to load.

load / save is what we already use in Drupal. (ie: user_load, and user_save).

And what i meant with the from_array() thing, is by have a consistent mechanism for conversions, so we can
find out exactly what an object can be loaded from.

It's all a bit premature, but I'll explain at DrupalCon why I would like to see this.

Also, i see this working in a load('user', $param); format. where the load() code identifies what the input is ,
and then passes it off to the right function.

Thinking of how nodeapi works, if we provide hooks into the load() , read() etc functions, we can have all of nodeapi's
benefits across all drupal objects (nodes, users, terms, whatever).

Comment #23

jonbob commented 23 August 2006 at 18:08

-1 for adding new prefixes to the namespace.

Right now every Drupal function begins with the name of the module. This means no modules can have name conflicts. The only exceptions to this rule are the theme_ prefix, which is special because it is always invoked through the theme() mechanism, and a handful of core functions like t() and format_*. I really dislike further confusion in the namespace, and wonder where we would draw the line. You can do many more things to these "objects" than the four CRUD operations. Say you can "move" a foo. Then you'd have update_foo(), but foo_move()? I think the inconsistency makes the code less legible, not more legible.

So long as we are tied to the single global function namespace due to the language features we have decided to use, I plead that we keep module prefixing intact.

Comment #24

webchick

she/they

English

Vancouver 🇨🇦

commented 23 August 2006 at 18:47

Ok, how about this? :)

user_create_role()
path_read_url_alias()
node_update_revision()
taxonomy_delete_vocabulary()

1. Module namespace in tact.
2. Reads more "English-like." (although that last sentence certainly didn't ;))
3. No chance of collisions with existing function names (afaik).

Comment #25

jonbob commented 23 August 2006 at 19:11

That looks fine to me. The only weirdness will be for Things that are named the same as their module. We would have node_delete_node() and the like. Personally I feel that's a lesser evil.

Comment #26

adrian commented 23 August 2006 at 19:15

As I stated jonbob, I feel that we should have a theme() like function for each of the CRUD's.

ie: load_[module]_[thing].

Additionally, similarly to how theme functions work, the module still features in the function name.

Comment #27

nofue commented 23 August 2006 at 19:17

I'm not actually coding PHP, but reading this very interesting thread I missed some

whatever_query_by_property()

call. Maybe I don't understand the API, but wouldn't it be convenient to query let's say a user id by it's email-address, like in

user_query_by_email("xyz@abc.fg");

Comment #28

webchick

she/they

English

Vancouver 🇨🇦

commented 23 August 2006 at 19:29

@nofue: Interesting! I like that. However, I don't like the idea of weird "one-off" functions for things (querying users by email, querying nodes by title, etc.) because my goal is to make this interface simple and predicatable no matter what "thing" in Drupal you're dealing with. Less trips to the API docs to see what's available.

What we could do, however, is change the parameter of the read function to take an array. We could something like:

function user_read_user($properties) {
  $result = db_result(db_query("SELECT uid FROM {user} WHERE '%s'", drupal_parse_properties($properties)));
  return user_load(array('uid' => $result));
}

function drupal_parse_properties($array) {
  $conditions = array();
  foreach ($array as $key => $value) {
    $conditions[] = "$key = '$value'";
  }
  return implode(' AND ', $conditions);
}

And you'd call it like:

$john = user_read_user(array('email' => 'foo@example.com', 'name' => 'john'));

Maybe? Just throwing ideas out. ;)

Comment #29

kkaefer commented 23 August 2006 at 19:30

I have two suggestions:

Using objects for targets
```
$user = new user();
$user->load(1);
$user->status = 0;
$user->save();
```
For such things, classes do make perfect sense to me. If the object is created without a parameter (like in the example above), an empty user is created (i.e. uid = -1). If you just fill in values and run $user->save(), a new user is created. If you load a user, the properties of the object are set according to the values for the specified user. If you now run $user->save();, the user with the uid in the object is altered. You could also load a user by writing $user = new user(1);.
Overloaded load method
That means, I could run either $user = new user(1) to load the user with uid 1 or $user = new user(array('uid' => 1));. In the constructor/load method, the first type is expanded to an array, like the second type has. But here’s the clue: The array acts as a kind of “query builder”, meaning that you can specify any parameters that should act as a condition for the SQL query: array('status' => 0) retrieves all blocked users.
That implies also that a user object can contain more than one user at a time. If you alter values, all loaded users are affected. To unblock all blocked users, you could for example write:
```
$blocked_users = new user(array('status' => 0));
$blocked_users->status = 0;
$blocked_users->save();
```

Comment #30

adrian commented 23 August 2006 at 20:01

Hate to say it, but tim is correct. This is what objects were made for.

It's the activerecord pattern that ruby is so famous for. We still sit with a namespace issue though, in that
classes would have to be 'module'_'thing'. IE: node_node.

I'm also not sure we want to get on this slippery slope right now , since we would have to take things into consideration such as

$event = new event_node();
$event->field = '123';
event->create();

and also inheritance and the like, which I'm still not too sure i right for Drupal. Especially with our new 'spawn node types' code,
and the fact that what has been coded has very little to do with what is actually in the object (all our layers of AOP)

Comment #31

forngren commented 23 August 2006 at 20:13

+1 for this

(subscribe)

Comment #32

Crell commented 24 August 2006 at 02:18

Drupal core has an informal/semi-formal no-classes policy, IIRC. Changing that would be a major undertaking, especially as it would also mean redoing a ton of other data structures in Drupal. I simply don't see that happening. Plus, PHP 4's object system is quite lame. :-)

I think load_, save_, makes the most sense, and is most consistent with existing code. For namespace pollution, what all do we expect to include here? Users and Nodes already have _save() and _load() routines. (Actually there's a patch in queue that's trying to gut them, which is sadness. :-( ) Roles, OK. Just how many are we going to have that we need to worry about the namespace? Or are we really planning to have every contrib module able to setup their own new entity types? I thought everything is a node. :-)

And -1 to anything that puts "crud" into a function name. That's just ugly to have to write or explain to people.

Comment #33

webchick

she/they

English

Vancouver 🇨🇦

commented 19 January 2007 at 18:59

Version:	x.y.z	» 6.x-dev
Status:	Postponed	» Active

Marking this back to active again.

Comment #34

pwolanin commented 23 January 2007 at 02:17

My e-mail and at least a few others on the devel list related to this topic

http://lists.drupal.org/pipermail/development/2007-January/021964.html
http://lists.drupal.org/pipermail/development/2007-January/021979.html
http://lists.drupal.org/pipermail/development/2007-January/021983.html
http://lists.drupal.org/pipermail/development/2007-January/022008.html

Seem to be fairly convergent

Comment #35

boris mann commented 24 January 2007 at 06:26

Throwing my 2¢ in here. Webchick, I've been diving deep into install profile creation, where I want to create nodes, menus, and other Drupal-isms programmatically. drupal_execute is your friend....but it's also not your friend. Look at creating a menu, and then trying to get its menu ID!

CRUD++ -- and maybe we want to prototype them as wrappers (or SOMETHING) that can be used to help in install profile creation for the 5.x series.

Comment #36

Paul Natsuo Kishimoto commented 24 January 2007 at 19:06

*subscribing*

Comment #37

fgm

French

Paris, France

commented 24 January 2007 at 21:42

I like a variant strategy 3 a lot, especially as it opens the way to the introduction of uniform callbacks: consider this variant:

4 calls, for the 4 CRUD ops, plus formatting
- drupal_create('some_entity', array(<params>))
- drupal_retrieve(('some_entity', array(<params>))
- drupal_update('some_entity', array(<old params>, <new params>))
- drupal_delete(('some_entity', array(<params>))
note that a call to render is not necessary, since we happen to already have a function built on this model:
theme('some_entity', array(<params>)) which happens to be built on the same model
automatic callback available, PRE, and POST event
- module implementing such a CRUD type could register like
  drupal_register_callback('some entity', $when, $event)
  where $when is PRE | POST (or BEFORE | AFTER),
  and $event is (CREATE | RETRIEVE | UPDATE | DELETE).
- each of the 4 CRUD functions runs like:
  1. check whether a registered PRE callback exists for entity
  2. if so, invoke it
  3. do work
  4. check whether a registered POST callback exists for entity
  5. if so, invoke it
the PRE / POST callback mechanism might be extended using a result selection mechanism : typically, I think PRE callbacks could prevent the CRUD op from running by returning a specific value, and some results (failure) in the execution of the CRUD function could prevent the POST callback from running, or the callback should be different, or invoked with a status parameter mentioning failure.

Comment #38

fgm

French

Paris, France

commented 24 January 2007 at 21:57

One thing I forgot to mention is that callbacks can obviously be stacked: several registrations can be chained in a call array, and the CRUD method will fire the hooks in order: first PRE to last PRE, then last POST to first POST.

The reversal for POST callbacks is to take into account the fact that each callback is likely to expect a consistent situation, while invoking them in original order for POST callbacks would actually reverse the contexts (not sure I'm being very clear here ?).

Comment #39

RobRoy commented 24 January 2007 at 22:43

@Boris: Regarding drupal_execute() in install profiles, I'm with you there! There was a patch that got rolled back a while ago (by Moshe) that would allow any node/taxonomy/menu creation forms to return the newly created ID. This is needed if you're going to add a #submit handler after the node submit handler for example. You can't cleanly get the ID of the newly created node now.

I'm referring to http://drupal.org/node/94139 and http://drupal.org/node/68418. IMHO we should focus there first since even with a super API, we still need to have that functionality for forms.

Comment #40

Nick Lewis commented 25 January 2007 at 04:44

Here's an idea on returning the id of a taxonomy, node, or menu_id(from db_next_sequence) . Have a common function handle the sequences that our pgsql support requires anyway. for example

function _nick_plays_capt_obvious(&$values, $schema = array('#db_table' => 'da_table_name', '#db_primary_key' => 'dtnid')) {
$id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $schema['#db_table'].'_'.$schema['#db_primary_key'])) + 1; 
$required_form_key = $schema['#db_primary_key']; 
$values[$required_form_key] = $id;
}

One thing that is clear to me, is that we need to abstract our install profiles even further. I'm not sure the best answer is to actually require people to write *insert*, update, load, and delete crud functions like a hook. I think our install files. insert, update, delete, and load functions should CREATE the crudapi automatically.

I don't know how closely ya'll have looken into jaza's importexportapi, but it provides a really great starting place for thinking about how we can abstract everything so that we can perform this kind of magic automatically.A better approach might be to bind folks to writing out their database schema in such a way that any drupal module can execute crud functions. I know that sounds batsh#t crazy, but I've seen such possibilities in the still-beta importexport api. Its a lot easier for us to actually, finally, its bloody time, do something about hook_load, update, insert, delete, and form. Each one of those depends on each others complete agreement to operate properly -- why not think about putting a lot of that information in one place? If you ask me, it would be a hell of a lot easier to specify my inserts, and updates (and thus deletes, and loads) in the form api, than constantly having to rewrite all three hooks each time a new field gets added. I think that solving that problem would make this problem a lot easier.

But, I ramble... especially when I drink whiskey. Carry on!

Comment #41

bdragon commented 25 January 2007 at 05:32

subscribing

Comment #42

webchick

she/they

English

Vancouver 🇨🇦

commented 25 January 2007 at 05:49

I sat down tonight (when I went to make an install profile of my own) to setup a simple example with just user roles. I chose this because it's probably the simplest construct we have in Drupal: one table, two fields: rid (PK) and name.

My original approach was to do what was (to me) the simplest, which was:

user_create_role($name)
user_read_role($rid)
user_update_role($properties)
user_delete_role($rid)

I hit a number of lovely snafus:

Naming function parameters as user_create_role($name) is great when there are only one or two properties. But, it royally sucks if there are more than that, as is the case with the nodes system. The FAPI was built at least in part to get away from these functions with 11 billion arguments.
However, the alternate approach to that -- passing in something like an array for every operation, like we planned to do for update anyway -- means that now you most assuredly need to make many trips to the API docs to find out, "just what do I need to pass into this thing anyway?" This is why we have the gargantuan FAPI reference, and the same problem occurs with drupal_execute. Meh.
user_read_role($rid) is nice and all when you have the $rid. But, wouldn't it be nice to be able to get role properties by name, or user properties by e-mail, or...? Same with updating and deleting. However, adding this kind of flexibility made the inside of my function very scary... and that was just for a simple table with two columns; again, this will only become increasingly complicated with more complex objects like nodes.

I think we might be jumping the gun here... we should possibly be working on other issues such as programmatic schema creation and getting a Views "lite" query.inc file in core. Unless someone else has any really great ideas? I'm pretty sure this is what Adrian keeps trying to tell me. ;)

Alternately, a challenge: everyone take a stab at writing your own CRUD suite of functions for user roles. ;) Let's see what we can come up with.

Comment #43

pwolanin commented 25 January 2007 at 15:24

Hi webchick,

I'm trying to write some sandbox-type code to implement my ideas (I'll post when I get something working).

One thought I had for hook naming, was naming like:

hook_objectapi_X($obj)

where X= load, update, insert, submit, prepare, delete, etc.

So an implementation of this function might look like:


function mymodule_objectapi_load($obj) {

  switch($obj->class) { 
    case 'node':
      return db_fetch_array(db_query("SELECT * FROM {mymodule_node} WHERE nid = %d", $obj->nid));

     case 'comment':
        break;
     case 'user':
        break;
    case 'term'
       break;
  }
}

Thus, the taxonomy module (for example) could create a term table for every core or contrib module-defined object type (term_node, term_user, term_comment, etc)

Assuming you map the id column name to something more generic (e.g. rename nid to node_id, uid to user_id, etc), your ability to load taxonomy for any object becomes something like:


function taxonomy_obj_get_terms($obj, $key = 'tid') {
  static $terms;

  $id_name = $obj->class .'_id';
  $id = $obj->$id_name;

  if (!isset($terms[$obj->class][$id])) {
    $result = db_query(db_rewrite_sql("SELECT t.* FROM {term_". $obj->class ."} r INNER JOIN {term_data} t ON r.tid = t.tid INNER JOIN {vocabulary} v ON t.vid = v.vid WHERE r.". $id_name ."= %d ORDER BY v.weight, t.weight, t.name", 't', 'tid'), $id);
    $terms$terms[$obj->class][$id] = array();
    while ($term = db_fetch_object($result)) {
      $terms[$obj->class][$id][$term->$key] = $term;
    }
  }
  return $terms[$obj->class][$id];
}

function taxonomy_objectapi_load($obj) {

   $extra['taxonomy'] = taxonomy_obj_get_terms($obj);
   return $extra;
}

Comment #44

coreb commented 25 January 2007 at 16:10

+1, subscribing

Comment #45

pwolanin commented 26 January 2007 at 03:05

Ok, so don't try to actually run this code yet (it's neither complete nor tested), but take a look if you're interested:

http://cvs.drupal.org/viewcvs/drupal/contributions/modules/object_driver/

It's obviously still rough, but is suggestive of using an additional abstraction layer to create the table (per http://drupal.org/node/63049), and I hope shows why it's useful to be able to put the whole table definition in one big array.

It seems like with just a little work you could replace most or all of the core load/save functions (node, user, taxonomy, comment, etc).

Comment #46

Nick Lewis commented 27 January 2007 at 06:21

+1 X 1,000,000 -- better handling of loads, saves, deletes would make every drupal programmer 10 times more productive. Of course, I'm at an extreme that thinks that productivity trumps supporting php4.

Comment #47

kkaefer commented 27 January 2007 at 13:20

Inspired from all the posts here and on the mailing list, I fiddled something together. It’s basically an “entity API”: Modules can define their own entities and extend other module’s entities. For example, comment.module can extend the node entity with its own fields and tables (!). These tables are joined in the actual first/only query for the entity as opposed to hook_load() or hook_nodeapi(). This entity API can be used to load arbitrary “objects” from Drupal’s database.

Note that it is not necessary to change any data structures in your database.

http://cvs.drupal.org/…/sandbox/timcn/entities.php

It’s partially documented in the source code.

Comment #48

nedjo

he/him/his

English

commented 28 January 2007 at 20:03

This issue http://drupal.org/node/113435, General _load, _save, and _delete methods for all objects, is a parallel approach, but less ambitious. Rather than replacing our existing object APIs, it aims to the minimal reworking to make them consistent and provide a cross-object set of methods. The approach requires tweaks to but not fundamental rewriting of existing implementations.

I'm impressed with both pwolanin and kkaefer's approaches.

We seem to have a choice: (a) generalize and improve consistency in our existing APIs, as perhaps an interim step to later, bigger changes; (b) replace our existing APIs from scratch (this issue), or (c) do nothing.

(a) is not completely satisfying. It gets us somewhere, it gives us many of the benefits we're after, but it leaves much of the work undone. We have to be satisfied retaining a lot of legacy inconsistency between our different APIs (or we follow up with more patches to try to reduce this consistency).

(b) is more satisfying, but ambitious. Is it feasible for the 6.0 cycle? Looking at our existing API implementations, there's really a lot to rewrite there. And of course means a major break with existing code--comparable in scale to the Forms API. If we tried it for 6.0, this change would be big enough that it should really be one of the only major changes we attempt in this cycle. It would require sustained focus by the community.

We could, of course, aim for 7.0. And sometimes gradual steps are a timewaster--if we're going to break things, let's do it all at once!

Still, I wonder if we can try both (a) and (b) at the same time. This would look like: complete and apply (a), but structure it in such a way that it can be overridden by contrib implementations of (b). At its simplest, this could mean isolating the core drupal_save(), drupal_load(), drupal_invoke_entity(), etc. (or whatever we call them) methods in an include file then conditionally loading it as follows:


  // Allow for custom object handling methods.
  if (variable_get('drupal_entity_include', '') && file_exists(variable_get('drupal_entity_include', ''))) {
    require_once './' . variable_get('drupal_entity_include', '');
  }
  else {
    require_once './includes/entity.inc';
  }

Like e.g. drupal_mail(), this would allow contrib overrides of the core functionality. So, maybe, we could do both: get some major core improvements for 6.0 (without major breaks) while providing an incubation period where we could develop and mature one or more of the promising (b) approaches, possibly for inclusion in 7.0.

Comment #49

ximo commented 11 February 2007 at 04:44

+1 for this issue, and strategy 3.

I hope this will be ready for D6. As a gimmick one could call this the SLAD API (Save, Load, Add, Delete) ;) Seriously, I believe abbreviations are confusing to users, and I think we should avoid that. An alternative name might be "Actions API"?

Comment #50

fgm

French

Paris, France

commented 11 February 2007 at 17:47

Actually, although the summer 2007 time frame has been planned for D6, I don't think there is any specific reason for this period over another, possibly longer period: since D6 is breaking various things anyway, it could make sense to accept the breakage, and the fact that it needs more time to be done properly than just break some parts while leaving others undone.

Comment #51

nedjo

he/him/his

English

commented 12 February 2007 at 17:51

hook_user() and user_save() are major barriers since they pass the user object in two separate arguments and include a double pass by reference. Fixing this looks to be a necessary piece of improving or replacing our object handling methods. I've opened a new issue, http://drupal.org/node/118345. Tackling that would put is a good ways ahead toward the longer term goal of streamlining our object handling. Anyone willing/able to take it on?

Comment #52

pwolanin commented 16 April 2007 at 16:12

FYI, I worked the code in #45 into a working module for Drupal 5.x: http://drupal.org/project/object_driver

Comment #53

criznach commented 22 May 2007 at 18:34

Subscribing big time!

Comment #54

beginner commented 11 November 2007 at 15:20

Comment #55

pancho

UTC+2 🇪🇺 EU

commented 11 December 2007 at 07:44

Version:

6.x-dev

» 7.x-dev

I second nedjo and vote for a from-the-scratch approach aiming at D7.
Particularly I am convinced that Drupal should open up to using objects where it makes sense. By the end of 2008, PHP4 will play no big role anymore, so we can rely on the much better OOP support of PHP5.

Comment #56

yojoe commented 8 January 2008 at 13:10

Comment #57

Crell commented 8 January 2008 at 16:00

Additional thoughts on this front: http://groups.drupal.org/node/8001

Comment #58

DougKress commented 24 February 2008 at 19:24

I think there might be some lessons we could learn from RoR here.

Why not centralize all CRUD routines? It's fairly easy to create self-discovery of tables and fields - even cache the information - to dynamically build the appropriate insert, update and delete statements. We could have a few simple requirements for tables - such as integer identity fields, either auto_incrementing or standard sequence names and special "created_on" and "updated_on" timestamps.

This module would expose the following functions:

crud_save($table, $data, $type = null) (where $type is 'insert' or 'update' - leave blank for either)
crud_delete($table, $key)
crud_get($table, $key)

It could expose hooks - for security traps:
hook_crud_insert($table, $key, $data)
hook_crud_update($table, $key, $data)
hook_crod_delete($table, $key, $data)

I'm working on the beginnings of this module now and I'll be contributing it soon.

- Doug

Comment #59

webchick

she/they

English

Vancouver 🇨🇦

commented 25 February 2008 at 14:59

We have something like crud_save in D6 as drupal_write_record. It has a couple of problems though, particularly with tricky things like the node_revisions table.

I think also the first step to having "intelligent" functions like this is adding in referential integrity to our schema. Drupal atm doesn't have the concept of foreign keys, so you're stuck matching on field name, and vid could mean "node revision ID", "vocabulary ID", etc.