Enable loading and rendering into JSON, XML, etc.

nedjo - May 21, 2007 - 01:28
Project: Drupal
Version: 7.x-dev
Component: base system
Category: feature request
Priority: normal
Assigned: Unassigned
Status: patch (code needs review)
Description

Our recent theme improvements bring us a lot closer to a clean separation of content and presentation. But rendering is still hard-coded to XHTML templates. Themed data - e.g., of page requests - are accessible only in their fully rendered output. The structured data are prepared for rendering, but we don't have a way to specify e.g. that a particular page request should be rendered in a different format.

A specific problem is how to fetch a page in JSON rather than rendered HTML format, to enable selective updating of specific page elements as opposed to full page refreshes.

The attached patch enables AJAX/AHAH requests through our regular page request handlers. The approach:

1. Enable override of the theme function in theme() to specify a different renderer.

Just before rendering, we call a new function, drupal_set_renderer(). This function calls hook_renderer(), which enables modules to specify a renderer. The theme hook and variables are passed, the variables by reference, so that any preparation of variables for output can be done.

2. Provide a single hook_renderer() implementation, system_renderer(). Here we handle jQuery requests for pages. If a page is being requested through a jQuery request and has reached the theme('page') stage, we can assume it's not the full page content that's wanted. Instead, we redirect rendering to JSON, through a new function, drupal_render_js(). Before sending to JS rendering, we filter the page variables so as to limit the size of the return value as well as preventing exposure of unfiltered data.

3. Move calls from index.php into a new function, drupal_page_content(), to make them reusable. For example, a module might wish to fetch a page render after resetting the value of $_GET['q'].

4. Have drupal_not_found(), drupal_access_denied(), and drupal_site_offline() return content rather than directly rendering it, so that their content can be returned in JSON format via regular page requests.

5. Add a new argument, 'menu_result', to theme('page') calls. This value will be TRUE for regular page renders; for not found, access denied, and site offline renders it will be the appropriate menu constant. This is useful e.g. in AJAX responses.
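As a rough illustration of steps 1 and 2, here is a minimal standalone sketch of how a renderer hook might be consulted just before rendering. It is not the code from the attached patch: the function names (sketch_set_renderer(), sketch_system_renderer()) and the allow-list of page variables are assumptions made for this example. It relies only on the fact that jQuery sends an X-Requested-With: XMLHttpRequest header.

```php
<?php
// Hypothetical sketch; these functions are not taken from the patch.

/**
 * Ask each callback whether it wants to take over rendering of a theme hook.
 * Variables are passed by reference so a callback can prepare or filter them.
 */
function sketch_set_renderer(array $callbacks, string $hook, array &$variables): ?callable {
  foreach ($callbacks as $callback) {
    $renderer = $callback($hook, $variables);
    if ($renderer !== null) {
      return $renderer;
    }
  }
  // Nobody claimed the hook: the caller falls back to the normal theme function.
  return null;
}

/**
 * Example callback: send theme('page') to JSON for jQuery (XHR) requests.
 */
function sketch_system_renderer(string $hook, array &$variables): ?callable {
  $is_xhr = ($_SERVER['HTTP_X_REQUESTED_WITH'] ?? '') === 'XMLHttpRequest';
  if ($hook !== 'page' || !$is_xhr) {
    return null;
  }
  // Keep an assumed allow-list of keys to limit size and avoid exposing raw data.
  $allowed = array_flip(['title', 'content', 'menu_result']);
  $variables = array_intersect_key($variables, $allowed);
  return 'json_encode';
}

// Usage: simulate a jQuery request from the command line.
$_SERVER['HTTP_X_REQUESTED_WITH'] = 'XMLHttpRequest';
$variables = ['title' => 'Hello', 'content' => '<p>Body</p>', 'node' => ['uid' => 1]];
$renderer = sketch_set_renderer(['sketch_system_renderer'], 'page', $variables);
echo $renderer ? $renderer($variables) : 'fall back to the XHTML template';
```

The full node object is dropped before encoding, which is the filtering concern described in step 2.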

Attachment: drupal-dynamic-load.patch (9.22 KB)

#1

nedjo - May 21, 2007 - 01:38

Here is a module designed for testing dynamic page loading as per this patch.

To use, unpack (first rename, removing the .txt extension), install and enable the module and then navigate to the page /dynamicloadtest.

You should get two links, one to an existing node on your site and one to a nonexistent one. Click the links to initiate a jQuery AJAX request.

The full result will appear in a prompt, after which the data are rendered below the link you clicked on.

In my testing, this works for existing nodes but fails (no response) for the not-found request. I'm not sure why yet.

Attachment: dynamicloadtest.tar_.gz_.txt (1.34 KB)

#2

Dries - May 21, 2007 - 07:23

Interesting! :)

Rails 2 has AJAX and REST support natively built in. That is, an AJAX and REST API is built at the same time as the application is built. In future versions of Rails, Atom will also be natively built in, and other formats might be too. I think that is the direction we have to take. In future, things like an AJAX API, a REST API and/or an Atom API will be assumed, and we currently don't have the system in place to make that happen.

This patch is a step in the right direction. I'd like to take this one step further -- but maybe in follow-up patches.

(I haven't looked at the actual implementation yet.)

#3

moshe weitzman - May 21, 2007 - 13:20

Seems like a good idea. I read the code and it seems like it can be tightened up a little. I will look closer as time permits. Thanks, nedjo.

#4

Wim Leers - May 21, 2007 - 14:01

Very interesting patch. This will definitely open new use cases for Drupal.

Subscribing.

#5

Stefan Nagtegaal - May 21, 2007 - 14:07

Well, in the first place I would love to see such a thing hit the trunk!

The question remains what the best way is to test this patch.
I'm not a JS guru, but as I really want to make use of this functionality, perhaps you could explain to me what the best way would be to properly test this?

#6

Eaton - May 21, 2007 - 14:08

I'd be interested in seeing how this might connect to http://drupal.org/node/144608 -- perhaps allowing specific nodes to be rendered as JSON/Atom/etc. as needed. The 'styles' concept might work for that, as well as passing in $options = array('renderer' => 'json')...

#7

nedjo - May 21, 2007 - 15:56

@Stefan Nagtegaal, for testing, see the test module I attached to comment #1.

@moshe, thanks, I'm keen to see any improvements you can come up with

@Eaton, yes. I included code to render nodes in JSON in my first draft of the patch but left it out as I didn't find an elegant approach. Part of the general issue is that the structured data we pass to theming generally includes a lot of data we don't want to expose (e.g., the full node object).

I've put the output overriding at a late stage, in the theme() function. Could we put it earlier? Could e.g. a JSON 'style' return a filtered array of node properties?

What is the performance hit of having an additional call that includes module invoking in every theme call? Can this be avoided? Should in fact the overrides be part of the original cached theme data?

#8

Eaton - May 21, 2007 - 16:32

> @Eaton, yes. I included code to render nodes in JSON in my first draft of the patch but left it out as I didn't find an elegant approach. Part of the general issue is that the structured data we pass to theming generally includes a lot of data we don't want to expose (e.g., the full node object).

Absolutely. I think we might want to touch base and talk about this at some point -- between the 'node styles' patch, which would enable modules to explicitly request nodes in JSON format, etc... and the Node Rendering patch, which would allow them to be built without any unnecessary chunks when that rendering mode is specified, I think we could do some really cool stuff.

This patch stands on its own without those pieces, but it's definitely an interesting nexus...

#9

bjaspan - May 21, 2007 - 17:43

subscribe

#10

nedjo - May 26, 2007 - 00:41

Refreshing the patch.

What I'm thinking we need to do here is build this off the menu system.

The question "what renderer should be used" probably depends on a combination of three factors:

1. What header was set in the page request (indicating the expected return format)?

2. What is the theme hook? For example, for JavaScript requests expecting JSON, we don't want every theme function rendered in JSON.

3. What is the relation of the theme hook to the page we currently are on? For example, at node/21 we may expect a return of the rendered node in JSON. We can't rely simply on the node theme hook, because other nodes may be rendered. What we need to know is whether this is the primary target of the page request, a question the menu system is well placed to answer.

Rather than the way I've done this (call a hook with every theme call), we need a way for such a hook to be called once only, early in the page request, creating a registry that the theme system can consult to see if the current theme call should be rendered in a different format (and, possibly, the page request terminated at that point, since e.g. JSON can contain only a single rendered response).

Menu wisdom, anyone? Chx?

Attachment: drupal-dynamic-load_0.patch (9.27 KB)

#11

bjaspan - May 26, 2007 - 01:56

I haven't looked at this patch much though I very much approve of the goal; my Magic Tabs module (see the New York Observer writeup comments) is a very poor approximation of where this patch could go.

Here's an offhand thought: why not have the request specify its desired output format as the first arg? So we register the paths json, xml, other_format, whatever as top-level paths. When they are called as (e.g.) json/rest/of/the/args it sets the global output format then re-invokes the menu handler for rest/of/the/args. If you do not happen to request a path registered as a format, you get the current default output format (XHTML).
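The prefix idea can be sketched in a few lines of standalone PHP. The helper name and the list of registered formats are assumptions for illustration; re-invoking the menu handler for the remaining path is left out.

```php
<?php
// Hypothetical helper; the format names are examples, not an existing registry.

/**
 * Split "json/rest/of/the/args" into an output format and the remaining path.
 */
function sketch_split_format_prefix(string $path, array $formats = ['json', 'xml']): array {
  $path = trim($path, '/');
  $parts = explode('/', $path, 2);
  if (count($parts) === 2 && in_array($parts[0], $formats, true)) {
    return ['format' => $parts[0], 'path' => $parts[1]];
  }
  // No registered format prefix: keep the current default output format.
  return ['format' => 'xhtml', 'path' => $path];
}

print_r(sketch_split_format_prefix('json/node/21'));
print_r(sketch_split_format_prefix('node/21'));
```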

#12

Dries - May 26, 2007 - 07:32

It's a complex issue to grasp (which also makes it an interesting problem to solve). :)

I'd just try to be 'extreme' and set a 'simple' goal. For example: I want node, comments, users, profiles and taxonomies to be exportable using XML. In my view, the location/path of the callback doesn't really matter. It could be 'http://example/node/42/xml', it could be 'http://example/node/42.xml' or even 'http://example/xml/node/42'. What matters at this point, is that we have a system in place that allows objects to describe themselves in XML (Atom, JSON, etc).

The core of the problem is not how it integrates with the menu system, but how Drupal objects can be declarative. The answer to that might be in the schema API -- and in particular, an extension to that.

It's exactly why I committed the schema API patch -- I've never been a fan of abstracting the abstraction layer but I do believe in the things the schema API enables us to do with regard to a data API.

Maybe this question gets you started: take a schema definition from system.schema and ask yourself "how can we translate a record of that schema to a correct XML blurb?", or specifically, "how can we use the node schema definition to turn node 42 into a valid Atom message"?

At what callback that message becomes available, doesn't really matter at this point.
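To make the question above concrete, here is a minimal sketch of turning one record into XML by walking a schema definition. The 'fields' / 'type' layout follows the Schema API array structure; the function name and the element naming are assumptions, and the output is generic XML rather than valid Atom, which would need a further mapping step.

```php
<?php
// Hypothetical sketch: generic XML driven by a Schema API style definition.

/**
 * Emit one XML element per schema field that is present in the record.
 */
function sketch_record_to_xml(string $table, array $schema, array $record): string {
  $xml = "<$table>";
  foreach ($schema['fields'] as $name => $spec) {
    if (!array_key_exists($name, $record)) {
      continue;
    }
    // Escape the value so the result stays well-formed.
    $value = htmlspecialchars((string) $record[$name], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $xml .= "<$name type=\"{$spec['type']}\">$value</$name>";
  }
  return $xml . "</$table>";
}

$schema = ['fields' => [
  'nid' => ['type' => 'serial'],
  'title' => ['type' => 'varchar', 'length' => 255],
]];
echo sketch_record_to_xml('node', $schema, ['nid' => 42, 'title' => 'A & B']);
```

Fields not declared in the schema are ignored, so only described data is exported.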

#13

bjaspan - May 26, 2007 - 12:14

Dries, I completely agree re: using the schema data structure to be able to load complete objects. I was working on this path in April until I was convinced to switch to getting the Schema API into core. (It turns out that none of the cool schema-driven features we want require the Schema API in core b/c the schema.module could just export the core tables as e.g. views does for node and other core tables. However, now that Schema API is in core it will get way more visibility, all other major modules will support it sooner, etc. etc.)

Look at http://drupal.org/node/136171#comment-229414 in the original schema issue. I talk about "loading nodes in a single query." That goal is basically isomorphic to "output any object in JSON/XML" and is a major step towards "incremental database migration." The code is still in schema.module in contrib. It's pretty broken but now that the dust has settled on getting Schema API into core I can revisit it.

This issue/patch, I thought, is primarily about the theme and menu issues. We already do have a somewhat data-driven node representation with $node->content and drupal_render. The ability to request *just* the rendered node contents (or rendered view contents, or whatever) instead of a fully rendered page is in itself quite valuable. Look at http://druplinars.com/seminars. The "All/Upcoming/Suggestions" tabs use a pretty hacky custom module called Magic Tabs that lets me set up static+dynamic tabs based on a defined set of menu paths. It would be great to make getting this kind of data and features out of Drupal easier without first having to tackle the harder problem of a completely general schema-structure query builder.

That said, I do agree with you, and I'm working on it. :-)

#14

nedjo - May 26, 2007 - 15:52

@bjaspan

> why not have the request specify its desired output format as the first arg?

That's the approach I started out with, and is possible with no core patch. I've implemented it in the dynamicload module (part of Javascript Tools, but not part of the stable release yet.) It uses the pagearray helper module. Any page request to a path prepended with dynamicload/ will return a JSON representation of the page. (There is support in the jstools tabs.module for AJAX-loaded primary task tabs via dynamicload, similar in effect to Magic Tabs.) Printable module uses the same logic--prepend 'printable/' to any path and get a printable version, again through pagearray.

This works, but feels clunky. It requires initializing the menu system twice, once for the first prepended path and again after resetting the value of $_GET['q']. I was struck by the fact that we already have all the information we need without a prepended path: the requested page and (through the header) the fact that this is a jQuery request.

But, yes, requiring a header (or POST or GET variable) to specify return format is probably wrong. We need to support regular browser page requests, e.g. for an XML encoding, where we can't expect a header specific enough to answer the question "which particular encoding (e.g, Atom) are you expecting?" So a prepended path may indeed be the best. And by building in support in core, we should be able to avoid the need to initialize the menu system twice.

@Dries

> The core of the problem is not how it integrates with the menu system, but how Drupal objects can be declarative. ... "how can we translate a record of that schema to a correct XML blurb"

Thanks, that is a good statement of goal.

My main concern with most of the AJAX and XML approaches I've seen so far is that they bypass our existing access control and rendering systems. This means either (a) the costly need to replicate what we already have, or (b) too often, a near complete lack of access or rendering control, opening up a Drupal install to various exploits and providing unsafe content.

Whatever the desired output format, we still need most or all of the following:

* User-level access control over what can be fetched
* The ability to fetch specific objects or lists of objects by id or other parameters
* Returned content that is filtered
* Various layers of themed rendering between the database calls and the returned content. (Seldom do we actually want raw data. Mostly we want it selectively prerendered, e.g. the node body rendered, but short of a full XHTML page render.)

This is why I'm looking first to the menu and theme systems. They already answer these needs. E.g., the menu system is what answers the question "What sort of access should the current user have for a given request?". Obviously we won't want to build a separate access control mechanism for each supported rendering format.

That said, leveraging the menu system shouldn't necessarily limit us to existing paths. We could build in e.g. a mapping whereby http://example.com/atom/term/32 maps to the menu item taxonomy/term/32 and its access controls.

Or (obviously not for D6!) we could completely redo our menu paths so that a path of the form

objectname/numeric_id

always maps to the object identified by numeric_id for which the primary table is objectname.

I'm struck by the fact that we already have RSS XML support in core, but not implemented in a way that particularly opens up the way for further XML encodings. Maybe our short term aim should be an approach that:

1. Opens up regular page requests to be returned in JSON (that's what the patch already does).
2. Refactors current RSS support in a generic way that follows the same logic as the JSON approach and facilitates parallel XML encodings.

A tall order however for under a week....

#15

nedjo - May 26, 2007 - 17:32

Hmm, taking a step further back though, we will want to support e.g. SOAP requests where - like XMLRPC - the request parameters are completely independent of the path. They are passed in through XML encoding (a raw POST).

Like XMLRPC, some XML encodings need to support a range of transactions beyond load/render (insert, update, delete).

The XMLRPC approach in core is to create an entirely parallel system. But, like our RSS implementation, this doesn't help much in expanding XML support, beyond creating a model that could be copied.

Can we come up with a single way of handling incoming requests that is flexible enough to recognize XMLRPC, RSS, and JSON requests - and, by extension, any other kind - and dispatch control and rendering appropriately?

My hunch is that the first step would be early in the menu process to enable modules to claim a request, based on a hierarchy of criteria (say, first a raw XML POST, then a request header, then the path).
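The hierarchy of criteria could look roughly like the sketch below: a raw XML POST is checked first, then a request header, then the path. The function name, the handler names it returns, and the specific header and path checks are all assumptions for illustration.

```php
<?php
// Hypothetical sketch of "claiming" a request by a hierarchy of criteria.

/**
 * Decide which handler should take a request: body first, then header, then path.
 */
function sketch_claim_request(string $method, string $body, array $headers, string $path): string {
  // 1. A raw XML POST (e.g. XML-RPC or SOAP) is independent of the path.
  if ($method === 'POST' && strpos(ltrim($body), '<?xml') === 0) {
    return 'xmlrpc';
  }
  // 2. A request header naming the expected return format.
  if (strpos($headers['Accept'] ?? '', 'application/json') !== false) {
    return 'json';
  }
  // 3. Finally, the path itself.
  if ($path === 'rss.xml' || substr($path, -8) === '/rss.xml') {
    return 'rss';
  }
  return 'xhtml';
}

echo sketch_claim_request('GET', '', ['Accept' => 'application/json'], 'node/21');
```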

#16

Stefan Nagtegaal - May 28, 2007 - 18:59

This needs a re-roll...

#17

Eaton - May 28, 2007 - 20:13

nedjo, I feel your pain re: the 'lots to do with less than a week until freeze'; we ran into the same problem with the node rendering patch. I'm beginning to wonder if we all need to spend some time to hammer out some sort of 'output type' or 'renderer' as a full-fledged Drupal core concept. Something that pages, nodes, comments, etc. could use for output. It seems like there is a lot of value to be found in making the concept expandable: pdf, email, rss, plain-text, html, xml, json... all those are potential output formats or rendering types.

Solving the problem of 'how a callback can say it wants content [foo] in format [bar]' seems like something that needs to follow those other decisions, doesn't it..?

the foo.xml, foo.json, foo.rss, etc syntax seems like it makes sense for that...

#18

nedjo - June 8, 2007 - 14:14
Status: patch (code needs review) » patch (code needs work)

@eaton:

> the foo.xml, foo.json, foo.rss, etc syntax seems like it makes sense for that...

Yes! What about this approach:

Rendering is determined firstly by the last path argument. On initial page load, the last path arg is evaluated. If it contains an extension, that is used as the content type to be returned, e.g., .xml, .js. The string before the extension determines the rendering format. (And if no extension, default to XHTML.) E.g.:

  • rss.xml: handler/renderer is RSS, return format is XML.
  • xmlrpc.xml: handler/renderer is XMLRPC, return format is XML.
  • json.js: handler/renderer is JSON, return format is Javascript.

So the work in the short term would be to convert our existing JSON, RSS, and XMLRPC core support to use this approach.
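A small standalone sketch of evaluating the last path argument, using PHP's pathinfo(). The function name and the returned array keys are assumptions; the fallback to XHTML follows the rule described above.

```php
<?php
// Hypothetical sketch: derive renderer and return format from the last path arg.

/**
 * "node/21/json.js" gives renderer "json" and extension "js".
 * A path without an extension defaults to XHTML.
 */
function sketch_parse_last_arg(string $path): array {
  $info = pathinfo(basename($path));
  if (empty($info['extension'])) {
    return ['renderer' => 'xhtml', 'extension' => null];
  }
  return ['renderer' => $info['filename'], 'extension' => $info['extension']];
}

print_r(sketch_parse_last_arg('taxonomy/term/32/rss.xml'));
print_r(sketch_parse_last_arg('node/21'));
```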

#19

Grugnog2 - June 12, 2007 - 00:58

Subscribing

#20

Wim Leers - June 28, 2007 - 16:40

Any chance this will make it into D6?

#21

nedjo - June 29, 2007 - 16:03
Status: patch (code needs work) » postponed

This obviously isn't going in before code freeze.

For the JSON/AJAX component, I've been working in a contrib module, Dynamicload (part of Javascript Tools) on how to enable full AJAX loading for a Drupal site (no full page refreshes). There I've found a number of issues I didn't fully think through when I started this patch.

E.g., when loading a JSON representation of a page, how can we load new CSS and Javascript files that weren't on the original page? How can we merge in new Drupal.settings JSON data? For these, we need to preprocess the page data in specific ways, so simply overriding the output format of the existing theme('page') call isn't enough.

Altogether, this needs some discussion and better conceptualizing.

#22

moshe weitzman - September 11, 2007 - 19:40
Version: 6.x-dev » 7.x-dev
Status: postponed » active

I have been doing some Mediawiki integration and they are pretty far along with their implementation of this. See http://www.mediawiki.org/wiki/API. Note the links on the right-hand side, which describe the features of the API further.

If you download Mediawiki, api.php is in the root directory; that's the entry point.

Anyone gonna be speaking about this at DrupalCon? Such a juicy topic.

#23

Somes - September 17, 2007 - 16:36

Hi moshe, could you explain further what's going on in the API?

nedjo has been doing some sterling work on fleshing out the problems on the architecture and hopefully his work on javascript behaviors can help with this post

http://drupal.org/node/114774#javascript-behaviors

Can't wait for Drupal 6 to get out the door so we can see some more posts here.
Keep up the good work.

#24

sin - October 8, 2007 - 19:39

> how can we use the node schema definition to turn node 42 into a valid Atom message

Imagine Drupal could dispatch a GET request for a node page to node_view_text_html (= node_view), node_view_text_plain, node_view_application_json, etc., according to a URL prefix/suffix or the request MIME type text/html, text/plain, application/json, etc. (this is more RESTful and does not violate the opaque URI rule). The default implementation of these hooks in the node module would then iterate over the node model's field definitions (node Schema fields and CCK fields) and concatenate markup. theme() would be called only in node_view_text_html. The other hooks would encapsulate the rules of their own markup and possibly add type information from the model (the node schema definition). The whole response must then be wrapped as the markup type requires, so that the message is valid.
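The naming scheme above maps a MIME type onto a callback name. A minimal sketch of that mapping follows; the helper names, the demo_view_* callbacks, and the fallback to a text/plain implementation are assumptions for illustration, not existing node module functions.

```php
<?php
// Hypothetical sketch of dispatching a view callback by MIME type.

/**
 * Build a callback name such as node_view_application_json.
 */
function sketch_view_callback(string $base, string $mime): string {
  $suffix = preg_replace('/[^a-z0-9]+/', '_', strtolower($mime));
  return $base . '_' . trim($suffix, '_');
}

/**
 * Call the format-specific callback, falling back to the text/plain one.
 */
function sketch_dispatch(string $base, string $mime, array $data): string {
  $callback = sketch_view_callback($base, $mime);
  if (!function_exists($callback)) {
    $callback = $base . '_text_plain';
  }
  return $callback($data);
}

// Demo implementations standing in for per-format node hooks.
function demo_view_text_plain(array $data): string {
  return implode("\n", $data);
}

function demo_view_application_json(array $data): string {
  return json_encode($data);
}

echo sketch_dispatch('demo_view', 'application/json', ['title' => 'Hi']);
```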

#25

bjaspan - October 15, 2007 - 04:56

I found myself thinking about this topic recently and want to record my thoughts. I suspect I'm not the first to think them but I haven't seen them spelled out in exactly this way yet.

I started with: Why do menu callbacks return HTML? e.g.:

<?php
function myobj_page_view($obj_id) {
  return theme('myobj', myobj_load($obj_id)); // very simplified, of course
}
?>

"Clearly," the menu callback should just return the data (myobj_load($obj_id)) or perhaps an array of theme-hook and data (array('myobj', myobj_load($obj_id))).

But returning the object itself isn't necessarily what we want. The menu callbacks are the business logic. They should turn data in the database into a nicely digested format for the renderer. This is what $node->content is: a nicely ordered list of data to be returned to the client, whether in HTML, JSON, XML, whatever.

So, perhaps menu callbacks should return a nested renderable content array along with its theme hook. Each element within the renderable content array would also contain its theme hook, a la $node->content. So:

<?php
function node_page_view($nid) {
  $node = node_view($nid);
  return array('#theme' => 'node', '#value' => $node->content);
}
?>

The HTML version of #theme node (e.g. theme_html_node) would output the enclosing <div class="node ..."> and then call drupal_render(#value). #value ($node->content) of course could contain elements with plain markup, or #item_lists, or #tables, or whatever. The menu hook builds up that data, tagging each #value with a theme hook name indicating its format, but not rendering any of it into any format.

This would seem to make it pretty easy to render into multiple formats. We select the format via some property of the request (path, headers, whatever). Say we choose JSON. Then, when rendering #theme node, it calls theme_json_node() (instead of theme_html_node), which happens to call drupal_render(), which calls theme_json_item_list(), or theme_json_table(), etc. There'd be a default theme_json that just outputs the #value as a string, or whatever the most useful default render is for that output format. And ditto for other output formats.
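Here is a minimal standalone sketch of that lookup: try the format-specific theme function for the hook, then a per-format default, then the plain #value. The function names follow the theme_FORMAT_HOOK pattern described above but are assumptions for this example, and nested elements are not rendered recursively.

```php
<?php
// Hypothetical sketch of format-prefixed theme function lookup.

/**
 * Render one tagged element in the requested output format.
 */
function sketch_render(array $element, string $format): string {
  $hook = $element['#theme'] ?? '';
  // Most specific first: theme_json_item_list(), then the default theme_json().
  foreach (["theme_{$format}_{$hook}", "theme_{$format}"] as $function) {
    if (function_exists($function)) {
      return $function($element);
    }
  }
  return (string) ($element['#value'] ?? '');
}

// Per-format default: output #value as JSON.
function theme_json(array $element): string {
  return json_encode($element['#value']);
}

// HTML implementation of one standard hook.
function theme_html_item_list(array $element): string {
  $items = array_map('htmlspecialchars', $element['#value']);
  return '<ul><li>' . implode('</li><li>', $items) . '</li></ul>';
}

$list = ['#theme' => 'item_list', '#value' => ['a', 'b']];
echo sketch_render($list, 'html'), "\n";
echo sketch_render($list, 'json'), "\n";
```

The same data array produces both outputs; only the selected format changes which function is called.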

#26

bjaspan - October 15, 2007 - 05:04

I used node as an example in my previous comment but, in case it wasn't clear, I was suggesting that every menu callback return data in that format, basically the format $node->content is already in. Each element in the returned array is tagged with its #theme hook, and each output format is responsible for implementing the core set of standard theme hooks. Modules would still provide base HTML functions, and hopefully for most tagged-data formats (json, xml), a standard per-output format theme function would suffice for most cases, but if not the module would also have to provide a custom theme function per output format, not just HTML; ultimately, someone has to write the code to render the data into an output format. And, of course, a theme could override not just the HTML theme function but any output format theme function for a given hook.

I actually doubt I'm making any sense. Good night.

#27

moshe weitzman - October 15, 2007 - 13:47

Yes, we discussed the idea of the whole page being an array that gets passed to drupal_render(). Eaton and Dries and others have nodded and agreed that it is a good idea. Eaton has worked on node content and was trying to expand that to include the whole node (node links ...) but we ran out of time. We might want a parallel effort to do the page, which shouldn't be too hard if you just define the page as the set of regions (and let's make the center content a region already).

Your notion of #theme is a little different from what we have today but it sounds like a good proposal to prepend an output format when looking for the proper theme function.

#28

moshe weitzman - October 15, 2007 - 14:20

Eaton's node_view refactor patch is at http://drupal.org/node/134478

#29

samuelwan - January 5, 2008 - 21:13

Subscribing

#30

sinasalek - January 27, 2008 - 06:01

Subscribing

#31

slantview - January 29, 2008 - 19:35

Subscribing

#32

Dave Cohen - January 30, 2008 - 18:35

This sounds like it could help produce Facebook canvas pages. For those who don't know, they can be written in FBML, which is yet another XML document type.

In my work with Facebook so far, I've made a theme to produce FBML, and this does a pretty good job. In fact, as I read in this thread about renderers, it's not clear to me how they are different from themes. Is there some shortcoming of themes that makes a renderer necessary and if so, why not improve themes to solve the problem? How do I know, in the case of FBML, whether it's best to make an FBML theme or an FBML renderer (or both)?

In producing FBML for Facebook canvas pages, I've run into a problem I haven't seen mentioned here, so I will try to explain it. Let's say my node body contains:

link to <a href="internal:node/123">another node</a>

where the "internal:node/123" normally expands to something like /drupal/node/123. Now, when producing FBML, I need that link href to expand to something like "http://apps.facebook.com/myapp/node/123". This is tricky with Drupal, because the results of node filters are cached. I end up with two filter cache tables; one is used for normal pages and the other for Facebook canvas pages. I mention that filter issue because with the content described in this thread it is likely to be an issue, too. For example, if I'm rendering a page as Atom, do I want that link to be "/drupal/node/123/atom"? This is something to think about: perhaps the renderer needs to be known early in the request, and affect the way content is filtered.

Note also that with FBML, I would not want to have to append "/fbml.xml" to every local URL (I can tell by the incoming request that FBML is expected).

I'm just catching up with this, so please edify me if I'm missing any of the big picture.
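The link-expansion problem described above can be sketched as follows. Both helpers are hypothetical; in particular, keying a single filter cache by output format is only one possible alternative to the two cache tables mentioned, not something the thread settles on.

```php
<?php
// Hypothetical sketch: expand "internal:" links per output format.

/**
 * Expand href="internal:..." against the base URL of the current renderer.
 */
function sketch_expand_internal_links(string $html, string $base): string {
  return str_replace('href="internal:', 'href="' . rtrim($base, '/') . '/', $html);
}

/**
 * Include the output format in the cache ID so formats do not share results.
 */
function sketch_filter_cache_id(string $format, string $text): string {
  return $format . ':' . md5($text);
}

$body = 'link to <a href="internal:node/123">another node</a>';
echo sketch_expand_internal_links($body, '/drupal'), "\n";
echo sketch_expand_internal_links($body, 'http://apps.facebook.com/myapp'), "\n";
```

This only works if the renderer is known before filtering runs, which is the ordering concern raised above.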

#33

Barry Ramage - February 14, 2008 - 02:20

Subscribing

#34

Stefan Nagtegaal - February 14, 2008 - 14:54

@Nedjo: What does this need to get into the spotlight again? I would love to see this in core (D7 this time), but after reading the whole thread I'm unsure how I could help you...

#35

starbow - February 14, 2008 - 23:58

I bet this issue could also be extended to cover what I am trying to do over at #218830: Popups in Drupal 7: Plugable renderers for generating content

There might also be synergy with chx's: #218770: Drupal Pipes

#36

kbahey - February 15, 2008 - 02:52

Subscribe.

#37

Arancaytar - February 18, 2008 - 12:32

Subscribing.

#38

Rob Loach - February 25, 2008 - 20:04

This might overlap with the Services stuff in Drupal 7. JSON Server is a good example of an implementation of a web server implemented using the Services API.

#39

noahb - February 26, 2008 - 10:13

track

#40

nedjo - February 26, 2008 - 21:12
Title: Enable dynamic page loading and rendering into different formats (JSON, XML) » Enable loading and rendering into JSON, XML, etc.; adapt Services module to core?

The Services module indeed provides a strong model for what we're trying to achieve in this issue. Personally, I think a good next step would be to identify what a minimal adaptation of services into core could look like. It might include the following:

* services API
* XMLRPC server to replace existing xmlrpc.php
* JSON server to replace existing JSON request handling
* RSS server to replace existing RSS request handling
* some of the services (e.g., node.load) implemented in services module, probably relatively few to start with.

We would at least learn a lot by critiquing and analyzing this existing solution. At best, we would come out with a concrete plan for what refactoring is needed.

#41

Rob Loach - February 28, 2008 - 22:13

There should be two parts of this Services API:

Servers
The servers provide the means of communication between the local server and the external source requesting the data. They are the medium in which the data is communicated. This would include an XML-RPC server, a JSON server, and an RSS server, as nedjo mentioned. Contributed modules, as well as potential future core services, could provide a REST server, a SOAP server, an RDF server, and many others.
Services
The services would describe what data could be transported. The Data API will help us here, as it will (probably) provide all CRUD methods for any data we want to manipulate (users, taxonomy, nodes, blocks, etc). The drupal_load(), drupal_save() and drupal_delete() functions will help us load/manipulate information from any data source within the Drupal database, and pass it along a service through to the various servers available (XML-RPC, JSON, RSS, etc). The service I described here would end up being called the Data Service.

All of this should adhere to the user permissions system. This means that in order for services to provide administrative rights (to do things like save new nodes), they should authenticate with a session, and then hold onto their own session ID.
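The two-part split can be illustrated with a small standalone sketch: services are named callbacks that return data, and servers are encoders that put that data on the wire. The registry arrays, the 'node.load' stand-in, and sketch_serve() are assumptions for this example and are not the Services module API; permission and session checks are deliberately omitted.

```php
<?php
// Hypothetical sketch of the servers/services split (no access control shown).

// Services: what data can be transported. This closure stands in for a real load.
$services = [
  'node.load' => function (int $nid): array {
    return ['nid' => $nid, 'title' => "Node $nid"];
  },
];

// Servers: the medium. Each one only knows how to encode a result.
$servers = [
  'json' => 'json_encode',
  'php' => 'serialize',
];

/**
 * Run a service method and encode its result with the chosen server.
 */
function sketch_serve(array $services, array $servers, string $server, string $method, array $args): string {
  if (!isset($services[$method], $servers[$server])) {
    throw new InvalidArgumentException("Unknown server '$server' or method '$method'");
  }
  $result = $services[$method](...$args);
  return $servers[$server]($result);
}

echo sketch_serve($services, $servers, 'json', 'node.load', [42]);
```

Adding a new wire format means adding one entry to the servers list, without touching any service.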

#42

Dimegga - February 29, 2008 - 01:35

subscribe

#43

BioALIEN - March 1, 2008 - 05:57

I somewhat agree with Rob Loach's comment about the Services API having two distinct parts. It would be great to have an incarnation of services.module in core. I'll be keeping a close eye on how this issue unfolds.

#44

Grugnog2 - March 1, 2008 - 22:58

I haven't really looked at the services module code in detail. I agree that it provides a useful functionality template but I am not sure it is necessarily the model we would want to use for some of the web services we might provide in Drupal core.

The reason for this is that after reading the O'Reilly REST book, it is clear that 'resource oriented' web services (primarily RESTful, be it XML, XHTML, JSON etc) have a fundamentally different model to 'remote procedure' based services (XML-RPC, SOAP etc).

I think the 'resource oriented' service types are best organized around our page creation and serving mechanism (with existing URLs as the 'base'). It seems that this can best be done through refactoring the rendering engine in a way that will allow us to (as much as possible) transparently provide multiple delivery formats (XML, XHTML, JSON etc) with basically zero additional code in each content module. In other words, we would ensure that there would be a 1-1 mapping between the data format returned by the module to Drupal core, and the output format (I think I need to put together an example to demonstrate what I mean here). REST libraries in other languages normally work by automatically translating the data object/array into the desired format - similar to how language native serialization works, but using open formats instead. I propose that we do just this, but enhance it by allowing per-format override callbacks to allow a module (or theme) to manually create this format's textual output. This could (I think) completely replace our standard theme functions for HTML output, and could also be used when more control is needed over the exact output structure for other formats (e.g. when you need to adhere to a standard XML format).
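A rough sketch of that proposal: the data array is translated automatically into the requested format unless a per-format override callback is supplied. The function name, the flat XML layout with an <item> wrapper, and the shape of the overrides array are assumptions for illustration.

```php
<?php
// Hypothetical sketch: automatic serialization with per-format overrides.

/**
 * Serialize a flat data array, letting a module or theme override a format.
 */
function sketch_output(array $data, string $format, array $overrides = []): string {
  // A module or theme may take full control of one format.
  if (isset($overrides[$format])) {
    return $overrides[$format]($data);
  }
  switch ($format) {
    case 'json':
      return json_encode($data);

    case 'xml':
      $xml = '';
      foreach ($data as $key => $value) {
        $xml .= "<$key>" . htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</$key>";
      }
      return "<item>$xml</item>";
  }
  throw new InvalidArgumentException("No serializer for format '$format'");
}

$data = ['title' => 'Hi'];
$overrides = [
  'html' => function (array $d): string {
    return '<h2>' . htmlspecialchars($d['title']) . '</h2>';
  },
];
echo sketch_output($data, 'json'), "\n";
echo sketch_output($data, 'xml'), "\n";
echo sketch_output($data, 'html', $overrides), "\n";
```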

For the 'remote procedure' based services (which I would add, can also be RESTful, but are not resource oriented) there is a need to map the exposed remote API call to an internal function call. This obviously needs some coding for each remote call to be implemented correctly (and securely) and so I think needs a fundamentally different architecture to the resource oriented services. We already have something available for XML-RPC in Drupal core, although I am sure that there are techniques from services module that could be used to make this easier and more flexible.

#45

snelson - March 2, 2008 - 01:35

I was going over this whole thread, and was basically thinking exactly what you just wrote. We're talking about 2 different needs here. Personally, I prefer RESTful resource oriented services over RPC, but there still will be a need for both in Drupal. Ruby on Rails does it so nice and simple, it'd be great to have the same thing. For CRUD, it doesn't get much better ... especially if it were to support client libraries like Rails ActiveResource. But, for things that aren't crud based like sending an email, or hooking into various contrib module functionality, or even just being able to provide alternate web service protocols, we need something else. This is where Services can come in and replace XML-RPC. Services is essentially the same concept as existing XML-RPC, it just expands upon it, splitting the serving mechanism and service callbacks into separate swappable components, which allows us to do SOAP, AMFPHP, etc.

Really happy to see all this moving forward.

#46

Arancaytar - March 3, 2008 - 11:28

A minor code-related point.

+        // This construct ensures that we can keep a reference through
+        // call_user_func_array.
+        $args = array(&$variables, $hook);
+        foreach ($info['preprocess functions'] as $preprocess_function) {
+          if (function_exists($preprocess_function)) {
+            call_user_func_array($preprocess_function, $args);
+          }
+        }

I am not convinced that call_user_func_array is required here. The following should accomplish the same thing more elegantly, unless I'm missing something here:

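<?php
foreach ($info['preprocess functions'] as $preprocess_function) {
  if (function_exists($preprocess_function)) {
    // $variables is still passed by reference, provided the preprocess
    // function declares its first parameter as a reference. Call-time
    // pass-by-reference (&$variables at the call site) is not allowed
    // in current PHP versions.
    $preprocess_function($variables, $hook);
  }
}
?>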

#47

steamedpenguin - March 6, 2008 - 22:43

Subscribing

#48

steamedpenguin - March 7, 2008 - 01:53

Wim Leers was kind enough to point out this thread in the comments to Centralized module to control feeds and feed types.

It seems that in some ways many people share an understanding of what this will accomplish. It would be great if someone can break down how this will make generating Atom and RSS feeds easier. Also, will this make my patch for a Syndicate module in core not necessary?

#49

Grugnog2 - March 7, 2008 - 17:07

I am working on this at the code sprint...anyone else wanna work on this let's get together. I am in a black SoC t-shirt ;)

#50

Rob Loach - March 7, 2008 - 21:55

A number of us at Drupalcon put together a quick design sketch of how the proposed API would look. There must, of course, be better ways of doing it. Any input would be greatly appreciated.

I unfortunately missed Boris' session regarding RDF, but any input from that crowd would be good. Since the RDF module already implements its rdf_service method, the two efforts seem like a good fit for collaboration.

#51

Grugnog2 - March 7, 2008 - 21:59

Several of us discussed this, and various use cases that may want to use it at the sprint today - here are my notes.

Here is a small random sample of the use cases:
- Pulling structured content into another site - e.g. XML (possibly with XHTML rendered fields), but just the content - not the whole page
- Getting a representation of a resource that can be PUT (or POSTed) back to edit the resource
- Jumping in on a form submit and offering a user a FAPI form in a modal dialog (e.g. confirmation of deletion)
- Updating the page following a FAPI submit or other operation
- Submitting a random form from javascript

There are several aspects that need to be considered for this to handle the use cases we discussed.
1. Data format (XHTML, XML, JSON, RSS, RDF...)
2. Part of page (Full page, a part of a page, data associated with a form)
3. Load vs. render (If you want to submit to forms you want 'load', if you want to show content to users you mostly want prerendered stuff)

Most of the use cases would probably break down something like this.
Page
- xhtml

Part page
- xml
- json
- xhtml

Form Data (input and output)
- xml
- json

We noted that the hook registry is really critical to implementing a lot of the more lightweight calls rapidly. We also discussed a lot of the workflow for AJAX/AHAH stuff, but this is out of scope for this issue.

Regarding URLs and format selection, the preferred approach for most people is to integrate with our current schema, and build off that. The desired format could ideally be selected either through the HTTP Accept header, or an extension suffix. A lot of the details still need to be discussed and defined, but here are some examples:
- node/123 gives the page in the default format (normally XHTML). This is the canonical URL for the 'user-friendly' representation of this resource.
- node/123.xml gives the node in XML format with parts rendered where appropriate. The format definition would control if the full page or just the resource itself is included in the output (normally only XHTML would need the full page with blocks etc).
- node/123/edit is the canonical URL for the data representation of this resource (normally an HTML form by default).
- node/123/edit.xml is the XML representation of this resource's data, that can be PUT (or POSTed, if we can't get PUT to work right!) in the same XML format to submit edits to this resource.
- node/123.dc.xml is a variant XML representation of this resource, normally to conform with an alternative representation (e.g. Dublin Core), or perhaps to request a subset of the entire resource.
- search is the canonical URL for a form. Each form that we want to access using APIs needs a URL - I think pretty much all core forms have this already.
- search.xml is an XML representation of this form (perhaps using WRDL), which could be submitted to and results would be returned in this format.
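
The extension-suffix examples above could be resolved with something like the following sketch. The helper name example_split_format() and the list of recognised suffixes are assumptions made for illustration, not part of any patch in this issue.

```php
<?php
/**
 * Split a Drupal-style path into the internal path and a requested format.
 *
 * Hypothetical helper; the list of known formats is an assumption.
 * Returns array(internal path, format).
 */
function example_split_format(string $path, array $formats = ['xml', 'json', 'dc.xml'], string $default = 'xhtml'): array {
  // Try longer suffixes first so that "dc.xml" wins over "xml".
  usort($formats, function ($a, $b) {
    return strlen($b) - strlen($a);
  });
  foreach ($formats as $format) {
    $suffix = '.' . $format;
    if (strlen($path) > strlen($suffix) && substr($path, -strlen($suffix)) === $suffix) {
      return [substr($path, 0, -strlen($suffix)), $format];
    }
  }
  // No recognised suffix: serve the default representation.
  return [$path, $default];
}
```

Here node/123.dc.xml would resolve to the internal path node/123 with format dc.xml, and a plain node/123 keeps the default format, so modules would only ever see the canonical path.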

We could also provide alternative, more function oriented URLs, such as form/formid - perhaps as an intermediate step.

Implementation notes still to come....

#52

ezra-g - March 9, 2008 - 05:43

Subscribing.

#53

WorldFallz - March 11, 2008 - 19:28

subscribing

#54

nonsie - April 17, 2008 - 17:35

subscribing

#55

heydere - May 5, 2008 - 03:19

subscribing

#56

Gábor Hojtsy - May 8, 2008 - 12:59
Assigned to:nedjo» Anonymous
Status:active» patch (code needs review)

Here is a reroll of the latest patch. I am still working out the smaller details though and will post a more detailed writeup soon.

Attachment: drupal_renderer.patch (9.13 KB)

#57

starbow - May 8, 2008 - 16:29

Wow, there is a lot of overlap between this approach and Quicksketch's proposed reworking of theme() over at: http://drupal.org/node/218830#comment-791590
(Which I used in my latest popup patch over at: http://drupal.org/node/193311#comment-835854)

#58

recidive - May 9, 2008 - 05:14

The cleanest way to tell the system which representation you want is to set the Accept HTTP header. This is what other systems use and is a recommendation for RESTful services.

Also, this is what your browser already does, and it is completely in line with other HTTP headers used for other purposes, e.g. Accept-Language for l10n and i18n and Accept-Encoding for output compression.

Comparing request headers with response ones we have something like this:

Accept (e.g. text/xml,application/xml,...) + Accept-Charset (e.g. ISO-8859-1,utf-8...) -> Content-Type (e.g. text/html; charset=utf-8).

Accept-Encoding (e.g. gzip,deflate) -> Content-Encoding (e.g. gzip).

To make every jQuery ajax call request a JSON representation, we can use something like this:

jQuery.ajaxSetup({
  'beforeSend': function(xhr) {
    xhr.setRequestHeader('Accept', 'text/javascript');
  }
});

But telling the system whether this is an ajax call is a different story, so maybe we can rely on the X-Requested-With header or explore other possibilities.
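
On the server side, a check for that header might look like the sketch below. The function name is hypothetical; the header is set by the client, so it can be missing or spoofed and should only be treated as a hint.

```php
<?php
/**
 * Guess whether a request came from an Ajax library.
 *
 * Hypothetical helper. $server is expected to look like PHP's $_SERVER
 * array, where the X-Requested-With header appears as
 * HTTP_X_REQUESTED_WITH.
 */
function example_is_ajax_request(array $server): bool {
  $value = $server['HTTP_X_REQUESTED_WITH'] ?? '';
  // Compare case-insensitively against the value jQuery sends.
  return strcasecmp($value, 'XMLHttpRequest') === 0;
}
```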

#59

robertDouglass - May 10, 2008 - 18:55

Subscribe

#60

Gábor Hojtsy - May 14, 2008 - 22:02

After much thinking on it, here is a simplified version:

- it eliminates the special handling of page data, which was somewhat copied from theme preprocessing, and instead truly relies on the proven theme preprocessor
- keeps printing the page output, as by that time, we need to generate the output... it would make much more sense to keep structures before, but by this time, we generate the output and send it out, so no need to work with those arrays
- adds a new "success" parameter/variable to the page theme, so we don't need to try and make that up ourselves... those generating the page should know whether it was a success or not... also, the resulting HTTP response includes the error code (404, 403, etc), so no need to encode that ourselves in a special format

Still works fine for me. Please empty your cache before you test (after you apply the patch).

#61

Gábor Hojtsy - May 14, 2008 - 22:09

And the actual patch.

Attachment: drupal_renderer_simplified.patch (7.43 KB)

#62

Crell - May 15, 2008 - 04:53

Subscribing. Need time to test. *sigh*

#63

Gábor Hojtsy - May 15, 2008 - 14:09

Added XML rendering support with the XML-RPC formatting we built into Drupal already. This is a "generic" but well defined XML serialization format. It could be simpler, but this was already there and the XML generated should be easy to handle. There is no renderer trigger for that, I am looking for ideas on what should trigger that. (Accept: text/xml is something browsers send along, so it should not trigger the XML generation itself).

Attachment: renderer_with_xml.patch (7.69 KB)

#64

nedjo - May 15, 2008 - 16:43
Title:Enable loading and rendering into JSON, XML, etc.; adapt Services module to core?» Enable loading and rendering into JSON, XML, etc.

Thanks Gábor for picking up this patch.

I haven't done any testing. From a quick read:

* The simplification looks to make sense.
* In rendering JSON we should use drupal_json(), either directly or in drupal_render_json().
* The $type argument in drupal_set_renderer() isn't used.
* If we're reusing the XMLRPC XML generation outside of XMLRPC then probably we should pull this into a separate include file, e.g., xml.inc.
* if ($hook == 'page') in system_renderer(): should this include 'maintenance_page'?

Yes, a key remaining issue is how to determine when a particular renderer should be triggered. We've had lots of suggestions: headers, special arguments (?renderer=xml), extensions (/node/21.xml), prepended path segments (xml/node/21). So far we're using a jQuery-set header to trigger JSON output. I suppose I favour those that don't require headers, as they present lower barriers. But which? I liked the extension suggestion when it was made.

I'm removing the Services module adaptation from the issue title. If that's done, it should be in a separate issue (referencing this one).

#65

nedjo - May 15, 2008 - 17:28

I'd prefer to avoid the use of exit() in the JSON and XML renderers. It feels like we're doing something wrong if we have to resort to explicitly terminating the page request.

#66

coupet - May 15, 2008 - 17:54

subscribing.

#67

Gábor Hojtsy - May 15, 2008 - 18:09

I am not sure we'd like to use the exact XML-RPC serialization; e.g. Java uses simpler formats, but those do not include type information, not even as much as this one. Although there you can rely on the data being loaded into an object, so it is properly cast from/to the defined data types. We don't have that to do here (and we are only managing output).

I was also wondering about maintenance_page, but I was not sure whether we intend to have that in XML or JSON. Although we probably should, thinking of external applications, right.

Also agreed that services module integration is a separate issue, this one is "just data export", while services would be configurable, named endpoints, with possibly customized protocols.

#68

moshe weitzman - May 15, 2008 - 19:33

I also like using a file extension as trigger. So node/12.xml returns an xml doc.

#69

Crell - May 15, 2008 - 19:41

I actually favor the HTTP Header mechanism. HTTP headers exist for this sort of thing, let's use them. Pseudo-file-extensions are a hack.

There's also the envelope question, though. We need a way to differentiate between "HTML rendered version of a node as an HTML fragment" and "HTML rendered version of a node within an HTML page". For that, I see no good alternative to a custom HTTP header.

#70

moshe weitzman - May 15, 2008 - 20:04

I guess this isn't an either/or question. We could honor an HTTP header if one is sent, and if not fall back to file extension. The utility of file extension is that you don't need code to properly make a request. You can't easily alter HTTP headers in a browser. I know some extensions/tools exist for this, but they aren't nearly as ubiquitous as an address bar. So, let's support both.

#71

recidive - May 15, 2008 - 21:14
Title:Enable loading and rendering into JSON, XML, etc.» Enable loading and rendering into JSON, XML, etc.; adapt Services module to core?

@Gábor: "... I am looking for ideas on what should trigger that. (Accept: text/xml is something browsers send along, so it should not trigger the XML generation itself)."

Yes, but it is up to the application to weigh which format it prefers to render, e.g. if a browser sends 'Accept: text/xml, text/html, */*' the application can choose to serve 'text/html'. But if a client application requests 'Accept: text/xml' the system should serve an XML representation or return a 406 (Not Acceptable) error if a representation of this type is not available.

Some browsers Accept headers:

Internet Explorer 7: */*

Firefox 2.0.0.14 / Safari 3.1: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Firefox 3.0b5: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

More here: http://johnex.se/header_accept.php?cmd=list

As you can see, Firefox 3 moved to preferably accept text/html.
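
One way to do that kind of weighing is sketched below: the client's q-values are multiplied by server-side preference weights, and no match means a 406. The function names and the weights are assumptions of this sketch, and media ranges such as text/* are deliberately ignored to keep it short.

```php
<?php
/**
 * Parse an Accept header into a map of media type => q-value.
 */
function example_parse_accept(string $header): array {
  $accepted = [];
  foreach (explode(',', $header) as $part) {
    $params = array_map('trim', explode(';', $part));
    $type = strtolower(array_shift($params));
    if ($type === '') {
      continue;
    }
    // A media type without an explicit q parameter has q=1.
    $q = 1.0;
    foreach ($params as $param) {
      if (strpos($param, 'q=') === 0) {
        $q = (float) substr($param, 2);
      }
    }
    $accepted[$type] = $q;
  }
  return $accepted;
}

/**
 * Choose a media type the server can produce, or NULL for a 406 response.
 *
 * $server_weights maps each available media type to a server-side
 * preference between 0 and 1 (an assumption of this sketch).
 */
function example_negotiate(string $header, array $server_weights): ?string {
  $accepted = example_parse_accept($header);
  $best = null;
  $best_score = 0.0;
  foreach ($server_weights as $type => $weight) {
    // Exact match first, then the "*/*" wildcard; "type/*" is ignored here.
    $q = $accepted[$type] ?? $accepted['*/*'] ?? 0.0;
    $score = $q * $weight;
    if ($score > $best_score) {
      $best = $type;
      $best_score = $score;
    }
  }
  return $best;
}

// Hypothetical server preferences: HTML is favoured over XML and JSON.
$weights = ['text/html' => 1.0, 'text/xml' => 0.8, 'application/json' => 0.8];
$firefox2 = 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
```

With these weights the Firefox 2 header still resolves to text/html, a client that sends only 'Accept: text/xml' gets XML, and a client that accepts none of the available types gets NULL, which the caller would turn into a 406.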

jQuery is well aware of Accept headers, e.g. if you call $.ajax({dataType: 'xml'...}) it will request with 'Accept: application/xml, text/xml, */*', and 'Accept: application/json, text/javascript, */*' for dataType = json.

@Gábor: "I was also wondering on maintenance_page, but I was not sure whether we intend to have that in XML or JSON. Although we will probably should, thinking of external applications, right."

IMO, in this case we should issue a 503 (Service Unavailable) error along with a representation of the maintenance page in the requested format.

@Crell: "There's also the envelope question, though. We need a way to differentiate between "HTML rendered version of a node as an HTML fragment" and "HTML rendered version of a node within an HTML page". For that, I see no good alternative to a custom HTTP header."

I too think this should be a header, like 'X-Requested-With', which many Ajax libraries set to 'XMLHttpRequest'. Maybe we should check for other values or abstract that too?

@moshe: "I guess this isn't an either or question. We could honor an http header is sent, and if not fall back to file extension."

I think that should be the reverse, if a file extension or query param is set, use that. If not, fall back to header. If no Accept header is found, then fall back to the default renderer (html).
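
That fallback order can be sketched as follows. The function name, the renderer names and the media-type map are assumptions for illustration, and q-values in the Accept header are ignored here for brevity.

```php
<?php
/**
 * Decide which renderer to use for a request.
 *
 * Hypothetical sketch; renderer names and the media-type map are
 * assumptions. $query is expected to look like PHP's $_GET array.
 */
function example_requested_renderer(array $query, ?string $accept, array $renderers = ['html', 'xml', 'json']): string {
  // 1. An explicit "format" query argument wins.
  if (isset($query['format']) && in_array($query['format'], $renderers, true)) {
    return $query['format'];
  }
  // 2. Otherwise use the first recognised media type in the Accept header.
  $by_type = [
    'text/html' => 'html',
    'application/xml' => 'xml',
    'text/xml' => 'xml',
    'application/json' => 'json',
    'text/javascript' => 'json',
  ];
  if ($accept !== null) {
    foreach (explode(',', $accept) as $part) {
      $type = strtolower(trim(explode(';', $part)[0]));
      if (isset($by_type[$type]) && in_array($by_type[$type], $renderers, true)) {
        return $by_type[$type];
      }
    }
  }
  // 3. Fall back to the default renderer.
  return 'html';
}
```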

#72

macgirvin - May 15, 2008 - 21:18
Title:Enable loading and rendering into JSON, XML, etc.; adapt Services module to core?» Enable loading and rendering into JSON, XML, etc.

As a person who routinely has to debug stuff like this, I'm not crazy about using http-headers to select content. It's very convenient to be able to go through a broad range of functionality using URLs, and manipulating headers isn't as convenient when things go wrong - which they do. Extensions can get confusing, as can anything tacked onto the end; because if you think of a URL like 'command +varargs' things always get added onto the right to provide more granular results, and in the most degradative case these are human phrases.

Which brings me back to the left. xml/node/21, atom/node/21, rss/node/21, json/node/21, mobi/node/21, print/node/21, lisp/node/21 - you get the idea; with the default representation being xhtml. This is fairly trivial to parse at a high level for a finite set of representations, and the URL accurately conveys what is being represented. If this parsing takes place before processing, it can be stripped out of the query string (and a global variable set) so that from a module perspective it's all node/21. They don't care until it's time to theme the data.
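
A minimal sketch of that early parsing step is shown below. The function name and the list of recognised prefixes are assumptions; in Drupal the stripped path would presumably be written back to $_GET['q'] before modules see it.

```php
<?php
/**
 * Strip a leading format segment from a path.
 *
 * Hypothetical helper; the format list is an assumption.
 * Returns array(internal path, format).
 */
function example_strip_format_prefix(string $path, array $formats = ['xml', 'atom', 'rss', 'json', 'mobi', 'print'], string $default = 'xhtml'): array {
  $segments = explode('/', trim($path, '/'));
  // Only treat the first segment as a format when a real path follows it.
  if (count($segments) > 1 && in_array($segments[0], $formats, true)) {
    $format = array_shift($segments);
    return [implode('/', $segments), $format];
  }
  return [implode('/', $segments), $default];
}
```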

Atom publishing protocol might force how some of the underlying things are done, so it would be a good idea to consult that documentation as this progresses - mostly to avoid painting oneself into a corner. It's kind of an application wrapper itself around the site data and fully supporting it could have a bit of impact how the data and XML architecture should be best organized so it's tightly integrated rather than an awkward bolt-on package as it is with WordPress (for instance). Many other representations are purely structural renderings and don't have as much impact on how things are done.

#73

Dries - May 16, 2008 - 01:46

I agree with macgirvin that using Atom as a use case makes a ton of sense.

I also think we should support both headers and URL-based format requests.

#74

recidive - May 16, 2008 - 03:03

Attached is a draft implementation of a drupal_requested_renderer() function that returns a renderer based on (a) a 'format' query argument or (b) the Accept HTTP header.

Two more thoughts:

- Can we use mod_rewrite to change from an extension to a query argument? E.g. from http://somesite.com/node/1.xml to http://somesite.com/?q=node/1&format=xml.

- Can we pass the 'format' argument as the first (optional) argument for a menu_callback? This would allow menu callbacks to respond to different requested formats.

Attachment: accept_header.php_.txt (2.28 KB)

#75

Dave Cohen - May 16, 2008 - 20:27

regarding #72, there are two modules I know of which rely on values prepended to the path. They are i18n and Drupal for Facebook. Both use custom_url_rewrite in accomplishing this.

Custom_url_rewrite is handy to have, but its implementation does not support multiple modules. (It's not a hook, it's a single function you can define in settings.php). So it would be a pain to combine two (or more) modules which rely on custom_url_rewrite. We have an opportunity here to improve this situation, perhaps completely replacing custom_url_rewrite with something more flexible. As we consider it, let's bear in mind...

  • Multiple prepended values need to work together, i.e. either atom/en/node/21 or en/atom/node/21.
  • If mobi/en/node/21 links to "internal:node/42" (using pathfilter style syntax), the link that's generated might need to be mobi/en/node/42. While rss/en/node/21 might link to xhtml/en/node/42. That is, some links need to be of the same type, while others do not.
  • The previous point applies to links that come from node bodies, and also blocks, breadcrumbs, etc. Currently, drupal caches the results of most input filters, making this sort of treatment impossible.
  • Whatever technique is used to introduce these prepended paths, third-party modules should be able to define their own additional ones.

I've given this some thought lately, because Drupal for Facebook has a complicated url rewrite scheme. Because a request for a facebook canvas page can come from facebook directly, or from a user's browser, AND because both fbml and iframe type pages need to be supported, AND because a single drupal instance can support multiple facebook apps, I end up with paths like "fb_cb/1/fb_cb_type/iframe/node/21". What I mean is, there are more use cases for this than initially meet the eye.

#76

macgirvin - May 16, 2008 - 22:18

@Dave Cohen - thanks for the info, though it pretty much invalidates my reasoning. (Which was basically that putting a representation on the left of the URL would be less messy and easier to implement than putting it on the right - so we wouldn't have to tag it.)

If we can't rely on having the pole position in the URL I agree that a 'format=xxx' tagged option somewhere in the path unfortunately makes the most sense. I would still prefer this as a pseudo path element rather than an option string so that it doesn't negate clean URLs completely... something like is done with the page=2 option.

Is there an issue for making custom_url_rewrite a proper hook? Though it isn't needed here, it's going to keep coming up. It was intended as a kind of administrator's pathauto swiss army knife IIRC, but now that we've got multiple modules using it they're all going to keep colliding until it's fixed.

#77

Dave Cohen - May 16, 2008 - 23:06

Is there an issue for making custom_url_rewrite a proper hook?

Not that I know of. I think in some cases custom_url_rewrite needs to do its thing before modules are loaded. Perhaps a new method should be introduced in addition to custom_url_rewrite rather than trying to replace it.

Most of the issues I mentioned are relevant whether we use the pole position or not. For example, links from ?format=mobi pages would still want ?format=mobi appended to their URLs and I think there will be some challenges to doing that.

#78

Crell - May 16, 2008 - 23:55

Clean URLs only matter to the "HTML / page" use case, I think. There's no SEO advantage to an AHAH request or an XML-RPC request having a non-clean URL. So that point is moot.

Using the HTTP Accept header with an override from a GET parameter makes sense to me. That gives us both a "proper standard" mechanism and a "probably actually works in the real world" mechanism, and we can use whichever makes the most sense. Trying to cram it into the path on either end is going to be too messy.

Regarding passing the request type into the page handler, I disagree. We should allow a given path to have multiple request handlers depending on the request type. See: http://drupal.org/node/218830#comment-792841 (Early version, it's evolved a bit in my head since then but I haven't written it down.) That way node/$nid always returns a node; it could be HTML/page, HTML/fragment, HTML/json, json/json, SOAP, AMF, or whatever else, all at the same path, depending on the headers and GET parameters. We also get envelope handling that way.
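
To make the "multiple handlers per path" idea concrete, here is a minimal sketch of a registry keyed by path and request type. It is not the design from the linked comment; the registry layout, the type names and example_dispatch() are all hypothetical.

```php
<?php
// Hypothetical registry: path pattern => request type => handler callback.
$handlers = [
  'node/%' => [
    'html/page' => function (int $nid): string {
      return "<html><body><div class=\"node\">Node $nid</div></body></html>";
    },
    'html/fragment' => function (int $nid): string {
      return "<div class=\"node\">Node $nid</div>";
    },
    'json' => function (int $nid): string {
      return json_encode(['nid' => $nid]);
    },
  ],
];

/**
 * Dispatch node/<nid> to the handler registered for the request type.
 *
 * Only the node/% pattern is handled in this sketch. Returns NULL when
 * the path or the request type has no handler.
 */
function example_dispatch(array $handlers, string $path, string $type): ?string {
  if (!preg_match('#^node/(\d+)$#', $path, $matches)) {
    return null;
  }
  $handler = $handlers['node/%'][$type] ?? null;
  return $handler ? $handler((int) $matches[1]) : null;
}
```

The same path, node/21, then yields a full page, a fragment or JSON depending only on the request type, and the envelope (page versus fragment) is just another key in the registry.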

 
 
