So we talked a lot about this but we haven't got a feature request for it so I'll just say it:

I'd like Hostmaster2 to be able to automatically configure the DNS when creating a website. Right now DNS is completely externalized (ie. ignored) but we should be able to manage "zone files".

This should probably be done through hooks in the bind.module, but those currently present are really just form handlers and would need to be abstracted a bit more.

Note that some concerns have been expressed about hosting the DNS server on the same machine as the webserver and everything else: indeed, we should take care to allow the nameserver to live on a different machine. For this, we thought of a few mechanisms:

  • have a provision.module to write zone files that get rsync'd
  • the provision module queues changes to the zone files that are applied remotely through a simple shell script or similar package
  • some dns server that talks mysql

Comments

anarcat’s picture

I created an issue in the Bind module that blocks this:

http://drupal.org/node/326716

spiderman’s picture

Title: nameserver provisionning » nameserver provisioning
Assigned: Unassigned » spiderman

It looks like we're aiming for the first mechanism (rsync) at the moment, in conjunction with an abstracted double API to accomplish this elegantly.

First, Aegir will have a generic "DNS Server" node type, which will be associated with a site and handle management of the zonefile/RRs. Underneath that, there will be a "DNS API" which implements the actual distribution/maintenance of the zonefiles on behalf of Drupal/Aegir. Different "engines" will implement the integration with different types of DNS software (BIND, djbdns, MyDNS, etc.)

Initially, we will only support BIND, but the idea is to define an abstracted DNS management tool that Aegir can integrate with, independent of the underlying DNS software.
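
To make the layering concrete, here is a minimal sketch of how a generic "DNS API" could sit on top of pluggable engines. Every name in it (DnsEngine, InMemoryEngine, DnsApi, provisionSite) is hypothetical and not taken from the hosting or provision modules, and it uses present-day PHP syntax rather than what Drupal 5 ran on. The in-memory engine only stands in for a real BIND engine so the example can run on its own.

```php
<?php
// Hypothetical sketch: none of these names exist in the hosting or provision modules.

/** Contract that each DNS "engine" (BIND, djbdns, MyDNS, ...) would implement. */
interface DnsEngine {
  /** Create or replace the zone named $origin with the given resource records. */
  public function writeZone(string $origin, array $records): void;

  /** Remove the zone named $origin. */
  public function deleteZone(string $origin): void;
}

/** Stand-in engine that keeps zones in memory instead of writing BIND files. */
class InMemoryEngine implements DnsEngine {
  public $zones = array();

  public function writeZone(string $origin, array $records): void {
    $this->zones[$origin] = $records;
  }

  public function deleteZone(string $origin): void {
    unset($this->zones[$origin]);
  }
}

/** The generic layer: it knows about sites and zones, not about BIND. */
class DnsApi {
  private $engine;

  public function __construct(DnsEngine $engine) {
    $this->engine = $engine;
  }

  /** Publish a single A record for a site inside the given zone. */
  public function provisionSite(string $zone, string $host, string $ip): void {
    $this->engine->writeZone($zone, array(
      array('name' => $host, 'type' => 'A', 'data' => $ip),
    ));
  }
}

$engine = new InMemoryEngine();
$api = new DnsApi($engine);
$api->provisionSite('example.com', 'www', '192.0.2.10');
```

Swapping BIND for another server would then mean writing another class that implements the same two methods, while the calling code stays unchanged.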

More to come on this shortly..

ac’s picture

subscribing

spiderman’s picture

Version: 5.x-0.1-alpha1 » 5.x-0.1-beta2
Attachments: 2 new files (81.19 KB, 16.23 KB)

After much ado, I've got a very rough outline of the code required to achieve this feature ready for review. Please note that this code is currently not working properly for me, and I'm only posting it for development review and debugging!

The attached patches have been rolled against the latest (D5) revisions of both the hosting and provision modules.

A quick outline:

* Minor patches to both provision and hosting incorporate the concept of a DNS server that is associated with each platform.
* The new hosting/dns_server/hosting_dns.module is the frontend node module that allows the admin to setup all the defaults for newly-created zones and records
* The new provision/dns_server/provision_dns.module provides an abstract implementation of DNS management facility for Aegir, and uses an "engine" (currently only provision/dns_server/engines/provision_bind.module exists) to do the heavy lifting
* Many of the provision hooks are not implemented, but I've made an attempt at the all-important "verify" hook, which is what is currently broken: it doesn't appear to run at all when I install a new site
* Once installed, you will need to:
  * create a DNS server node, and set up appropriate defaults for it
  * edit your platform node, to set the DNS server as your default for the platform
  * add a line to your sudoers file similar to the apache2ctl one, except for the rndc utility which the aegir user will use to restart BIND

Also note that I've implemented a simple admin interface at admin/build/provision_dns, but it will only work if you break the secure permissions on the /var/aegir/config/named directory and files so that the web server can write to these directly.

Obviously this code will require some heavy revisions, but I'm hopeful that this is a good start to adding this feature to an increasingly powerful project :)

spiderman’s picture

Attachments: 2 new files (15.97 KB, 43.23 KB)

with help from adrian and anarcat, i've managed to fiddle some of the core provision stuff and re-rolled this patch so it is now working (relatively speaking) on my testbed. attached are the new patch files. same caveats apply as above, except you should no longer see segfaults ;)

testing and feedback hugely appreciated :)

spiderman’s picture

Attachments: 2 new files (44.28 KB, 22.74 KB)

Here's a fresh patch with some bugfixes and a more complete implementation of the provision hooks. Some important notes for making this work follow. Eventually, I'll figure out how to roll these things into the install/configuration process, but for now, you need to:

  1. Hit update.php to get the DNS-specific updates to provision tables
  2. Manually enable the provision_dns and provision_bind modules from the admin/build/modules page, and configure/save the defaults on the admin/settings/provision_bind page
  3. Create a DNS server node (node/add/dns_server) after the configuration wizard has completed. Make sure you set a default IP, email address and primary/secondary NS records, or your zonefiles will be malformed.
  4. Edit and re-save the default platform node (node/6/edit) so that it will store a default DNS server for the platform (the DNS server doesn't show up in the form if there is only one DNS server node in the system)
  5. Re-verify the platform in order to setup the new perms on $config and $config/named dirs. Note: The verify task is currently failing as per #343428, but it should still do everything required here

I think that's all, but please let me know if you have any trouble getting the patch to work- I'm very eager to get feedback on this! :)

spiderman’s picture

Attachments: 2 new files (45.95 KB, 23.29 KB)

Thanks to ac for finding some logic errors in the FQDN parsing routine (_provision_dns_split_url) which determines what zone to provision the site within. I've added a new settings page (currently at admin/settings/provision_dns) to allow the admin to define valid TLDs for Aegir to accept. These are used to validate site nodes on creation, and subsequently used to determine the zone at provision time.

One open question for me is: can we assume that the 'zone' for a given URL is simply the TLD + the last component preceding it? ie. what is the right 'zone' for a FQDN like www.foo.bar.com?

Anyway, fresh patch attached, with replacement _provision_dns_split_url function, error-checking in the _provision_pre_install and _provision_verify hooks, along with a hook_nodeapi implementation in the hosting_dns_server.module to validate sites upon creation. Also moved the BIND provision settings to a local task under the provision_dns ones, so they're now found at admin/settings/provision_dns/bind.

anarcat’s picture

Quick review.

1. the define()s added to provision.module and hosting.module are not required, put that in the hook_init() of the new module
2. the modifications to the hook_block() should probably be factored out using a cleaner API. Same with the hosting_platform_form() hook, which should probably be form_alter()'d instead. hook_update(), hook_insert() and hook_view() are clearly a pain here, but should be overridden too, I don't know how.
3. hosting_node_help() has a similar issue. The content type should probably be defined by the module itself and not hosting.module. Similarly, web_server, platform and db_server could also define their own hook_help()
4. the modification to the hosting_platform table is trickier. I don't exactly know how to handle this with Drupal "modularly": can we just define a hook_update_N() that modifies the table of another module and leave it at that?

I guess that's about it for now. Just by looking at the patches, I think it's good, and I would be ready to see this checked in, but if it goes in, this issue should not be closed unless the above API issues are resolved or we should create a specific issue for those.

I'll go around actually testing the thing now.

anarcat’s picture

Status: Active » Needs work

I actually forgot to review the new files added with the patch, so here goes more comments.

The database structure for the hosting_dns_server table worries me:

+        ns1 char(255) NOT NULL,
+        ns2 char(255) NOT NULL,

That's not extensible enough. What if I have 3 nameservers? 4? That should probably be a TEXT field, with nameservers space-separated or something.

+        default_ip char(255) NOT NULL,

That is also tricky. The default_ip doesn't belong in the DNS server, it's in the web server. The webserver is the server the A records should point to. So I think this may be a good start, but it's a patch, what happens when you have two webservers?

+        refresh int(10) unsigned NOT NULL default '7200',
+        retry int(10) unsigned NOT NULL default '300',
+        expire int(10) unsigned NOT NULL default '604800',
+        minimum int(10) unsigned NOT NULL default '86400',
+        ttl int(10) unsigned NOT NULL default '86400',

I don't get what those fields are for... They are the defaults for the zones created under that dns server? Do we really need this per server or shouldn't this be hardcoded in the source? I guess it's a nice addition, but it wasn't necessary. :)
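
For reference, these values correspond to the timers of a standard zonefile: refresh, retry, expire and minimum are the last four numbers of the SOA record, and ttl is the zone's default $TTL. Below is a minimal sketch of how per-server defaults could end up in a zone header. The function name, its arguments and the serial number are made up for illustration and are not what provision_bind does; the default values are the ones from the table definition quoted above.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function example_render_soa($origin, $primary_ns, $email, $serial, array $timers = array()) {
  // Fall back to the table defaults quoted above for any timer not supplied.
  $timers += array(
    'refresh' => 7200,
    'retry' => 300,
    'expire' => 604800,
    'minimum' => 86400,
    'ttl' => 86400,
  );
  // Zonefiles write the contact address with the "@" replaced by a dot.
  // (Dots inside the local part would need escaping; this sketch ignores that.)
  $rname = preg_replace('/@/', '.', $email, 1);
  $lines = array(
    sprintf('$TTL %d', $timers['ttl']),
    sprintf('$ORIGIN %s.', $origin),
    sprintf('@ IN SOA %s. %s. (', $primary_ns, $rname),
    sprintf('    %d ; serial', $serial),
    sprintf('    %d ; refresh', $timers['refresh']),
    sprintf('    %d ; retry', $timers['retry']),
    sprintf('    %d ; expire', $timers['expire']),
    sprintf('    %d ) ; minimum', $timers['minimum']),
  );
  return implode("\n", $lines) . "\n";
}

// Override only the retry timer; everything else uses the defaults.
$zone = example_render_soa('example.com', 'ns1.example.com', 'hostmaster@example.com', 2008121501, array('retry' => 600));
echo $zone;
```

Keeping the defaults in one place like this would work equally well whether they live in the DNS server node or are hardcoded in the source.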

A few things in the dns server form, again minor:

* maxlength should match the database setting for the xfer field (255)
* maxlength should be lower for the IP address, 15? (3 digits * 4 + 3 dots)

Now for the new provision modules...

* hosting_site_zone should probably be named hosting_dns_server_zone, since it maps dns servers to zones and not sites to zones
* having an enum for the RR type is nice, but i think it's too much trouble, and the current enum doesn't cover all RFC-defined types, which are extendable anyways. So let's make this a string and keep validation at the source layer here. (see also http://www.dns.net/dnsrd/rr.html and http://www.bind9.net/dns-parameters, which would be nice additions to comments in the code ;))
* i would allow the TTL in the RR to be null, in which case it's not specified, and falls back to the zonefile default (IIRC)
* the "class" (generally IN) is not in the database structure, not sure if that's relevant, as I've never seen anything else in there :)
* also note that most of those records are actually binary in RFC 1035 and could therefore be stored much more efficiently in the database. I don't think we want to optimize that at this point, but it's worth mentioning
* the zone manipulation routines do not check for write errors
* provision_bind_create_zone() is confusing. What it's actually creating is the bind configuration that references the zonefile. So maybe create_zone() should be renamed to create_zone_config() and create_zone() just be an alias to update_zone().
* the earlier comments about the NS records also apply here: we should allow for an arbitrary number of DNS servers in the zone. in fact, NS records are RR types like any others. it's just that we need at least 1 (or two?)
* why do you need to unlink the zonefile (and bind config file) before recreating it? In fact, why recreate the whole thing every time when you have neat functions like _provision_bind_editfile? only the SOA needs to be templatized, basically, and that, I guess, could be parsed and checked for consistency... The nice feature there is right now with the AlternC bind zone manipulation scripts is that it respects changes you would do manually. It doesn't go as far as updating the underlying data structure (ie. the database), but it's still useful...
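
To illustrate the points about a string RR type, a nullable TTL and the class, here is a minimal sketch of formatting one record as a zonefile line. The function name and the tab-separated layout are assumptions made for this example, not code from the patch. When the TTL is null the field is simply left out, so in a zonefile that carries a $TTL directive the record takes that default.

```php
<?php
// Hypothetical formatter: name, signature and layout are illustrative only.
function example_format_rr($name, $type, $data, $ttl = null, $class = 'IN') {
  $fields = array($name);
  // A NULL TTL means "not specified": omit the field entirely.
  if ($ttl !== null) {
    $fields[] = (string) $ttl;
  }
  $fields[] = $class;
  // The type is a free-form string rather than an enum; only its case is normalized.
  $fields[] = strtoupper($type);
  $fields[] = $data;
  return implode("\t", $fields);
}

echo example_format_rr('www', 'A', '192.0.2.10'), "\n";
echo example_format_rr('mail', 'a', '192.0.2.20', 300), "\n";
```

Validation of the type string against the known RR types would then happen before this function is called, at the source layer.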

Other nitpicking stuff:

* functions should be commented with usual phpdoc strings

Also, to answer earlier questions about what is a "zone", I'll paste my discussion with spiderman earlier on IRC:

11:54:50 <anarcat> what i think is that we should have the notion of zone in the module, that's a certainty
11:55:01 <anarcat> that shold be mostly hidden from the user when he creates a drupal site
11:55:07 <anarcat> because we don't want the trouble
11:55:21 <anarcat> but some heuristics need to be established on how to determine the TLD
11:55:25 <anarcat> in general, it's fairly easy
11:55:38 <anarcat> foo.com and www.foo.com, the zone is foo.com
11:55:39 <anarcat> why?
11:55:43 <anarcat> because it's .com
11:56:04 <anarcat> same for .org, .net, ... 
11:56:09 <anarcat> it's 'general practice'
11:56:10 <anarcat> however
11:56:17 <anarcat> www.anarcat.koumbit.org also exists
11:56:24 <anarcat> (and we use it extensively here)
11:56:34 <spyd> yep
11:56:37 <anarcat> actually
11:56:39 <anarcat> it's very simple
11:56:47 <anarcat> to determine the zone:
11:56:52 <anarcat> strip the first part
11:56:57 <anarcat> if the remaining hostname is a TLD
11:57:02 <anarcat> then add the first part back
11:57:03 <anarcat> else
11:57:09 <anarcat> that's the zone
11:57:25 <anarcat> i could make that more pseudo-code if that's not clear enough
11:57:34 <anarcat> so in the case of yahoo.com, it works
11:57:38 <anarcat> www.yahoo.com also works
11:57:47 <anarcat> anarcat.koumbit.org works, because you add anarcat to the koumbit.org zone
11:58:03 <anarcat> www.anarcat.koumbit.org also works because you create the anarcat.koumbit.org zone and add www to it
11:58:19 <anarcat> also note that in that case, you may want to setup delegation for anarcat.koumbit.org in the koumbit.org zone
11:58:29 <anarcat> and all sorts of 'ownership' issues come up
11:59:32 <spyd> ya, that's the case that's messing with me- cuz in most cases, you'd just define a 'www.anarcat' record in the koumbit.org zonefile, no?
11:59:40 <spyd> ie. the 'zone' should still be treated as koumbit.org
11:59:42 <anarcat> hum
11:59:45 <anarcat> no
11:59:49 <anarcat> i would rather not, in fact
12:00:02 <anarcat> i'd rather do it cleanly and allow users to have proper delegation and their own zones
12:00:56 <spyd> so.. we create a second zonefile for 'anarcat.koumbit.org', with its own SOA/NS stuff, and put a 'www' record in there?
12:01:01 <anarcat> yep
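
The heuristic from the log above can be sketched in a few lines. This is only an illustration: the function name, the return format and the use of "@" for a record at the top of the zone are assumptions, and it is not the _provision_dns_split_url code from the patch. The list of TLDs is passed in by the caller.

```php
<?php
// Hypothetical sketch of the zone heuristic; not the module's own function.
function example_split_fqdn($fqdn, array $tlds) {
  $fqdn = strtolower(rtrim($fqdn, '.'));
  // Strip the first part of the hostname.
  $parts = explode('.', $fqdn, 2);
  $rest = isset($parts[1]) ? $parts[1] : '';
  // If what remains is a TLD (or nothing), add the first part back:
  // the whole name is the zone and the record sits at its top ("@").
  if ($rest === '' || in_array($rest, $tlds, true)) {
    return array('zone' => $fqdn, 'host' => '@');
  }
  // Otherwise the remainder is the zone and the first part is the host.
  return array('zone' => $rest, 'host' => $parts[0]);
}

$tlds = array('com', 'org', 'net');
$a = example_split_fqdn('yahoo.com', $tlds);               // zone yahoo.com, host @
$b = example_split_fqdn('www.yahoo.com', $tlds);           // zone yahoo.com, host www
$c = example_split_fqdn('anarcat.koumbit.org', $tlds);     // zone koumbit.org, host anarcat
$d = example_split_fqdn('www.anarcat.koumbit.org', $tlds); // zone anarcat.koumbit.org, host www
```

Multi-part suffixes such as co.uk work the same way as long as they are listed among the TLDs. Delegation of anarcat.koumbit.org from the koumbit.org zone is not handled by this sketch.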

Bottomline: this is excellent work, and a mighty good start. I think the code still needs work, but can be committed.

Again, I haven't tested functionality yet.

anarcat’s picture

Note that I had to verify the platform again to make sure the named config directory was created.

anarcat’s picture

Oh, and installing the module right now gives me this fatal error now:

Fatal error: Cannot redeclare hosting_dns_server_install() (previously declared in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/dns_server/hosting_dns_server.install:5) in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/hosting_dns_server.install on line 26

Am I seeing double?

Update: meh, i fucked up, I had the file twice in there for some reason.

anarcat’s picture

So this is not working for me, in fact. I have installed the module, re-verified the platform, and now I can't create a new site:

    * warning: Invalid argument supplied for foreach() in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/package/hosting_package.module on line 138.
    * warning: key() [function.key]: Passed variable is not an array or object in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/site/hosting_site.module on line 665.
    * warning: Invalid argument supplied for foreach() in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/package/hosting_package.module on line 138.
    * warning: array_key_exists() [function.array-key-exists]: The second argument should be either an array or an object in /var/hostmaster/drupal-5.x/sites/default/modules/hosting/site/hosting_site.module on line 331.
    * Please fill in a valid profile

that's rather weird. Also, I do not understand how dns server nodes are supposed to be associated with platforms. I tried verifying the platform, editing it, no luck...

So this shouldn't be committed as is just yet. Especially since it will generate warnings if the patch is applied but the DNS module is not installed.

spiderman’s picture

@anarcat: thanks for these comments. i've begun incorporating changes to my patch, but will leave this issue open until a decision is made on the server-type abstraction direction.

1. I've moved the define()s out of hosting.module and provision.module and put them in hook_init()s instead.

2. leaving this one until a hook exists ;)

3. I've moved the hook_help() into the hosting_dns_server.module and out of hosting_help.inc

4. To avoid patching hosting_platform, I've removed the dns_server field from hosting_platform.install for the moment, and used your suggestion of incorporating the ALTER TABLEs into hosting_dns_server.install instead.

However, this still leaves 5 chunks in the patch of hosting_platform.module, which all relate to the new dns_server field, so I'm not sure this move makes much sense, until we abstract the 'servers' and how they relate to the platform better.

new patch forthcoming, after I go through your other set of comments, attempt to debug the error you got, and get my network connection stabilized again ;)

adrian’s picture

As i've just mentioned on irc.

Do not use alter to modify other modules' tables. There's no way to keep track of that stuff, and will cause problems later on.

Instead use a join table to keep the records, and write your own nodeapi implementation.

http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/hosting/ali...

That is an example to do it.

adrian’s picture

I created a hook_hosting_summary, and updated the existing modules to use that.

Here is an example :


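/**
 * Implementation of hook_hosting_summary().
 */
function hosting_web_server_hosting_summary() {
  $summary = array();
  $web_servers = _hosting_get_web_servers();
  // Turn each web server into a node link and render the links as an item list.
  $summary['web_servers'] = theme('item_list', array_map('_hosting_node_link', array_keys($web_servers)), t('Web servers'));
  return $summary;
}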

spiderman’s picture

@Adrian: thanks for the feedback and new hook. I've rejigged hosting_dns_server.module to use a join table and nodeapi implementation for keeping track of each platform node's DNS server. I've also implemented the new hosting_summary hook.

I believe this addresses all of the comments in anarcat's first set of feedback (#8 above), and leaves my patch with only one small change to provision.module:

-    _provision_create_dir(PROVISION_CONFIG_PATH, t('Provision configuration'), 0700);
+    _provision_create_dir(PROVISION_CONFIG_PATH, t('Provision configuration'), 0755);

Which was something Adrian and I discussed on IRC as a way to allow BIND to see the $config/named contents generated by the provision_dns.module. Everything else is now self-contained in my new modules. Yay!

I'm still working on re-rolling the patch with all the other comments and feedback incorporated, as well as trying to reproduce or track down the error anarcat saw during his testing. More to come soon..

spiderman’s picture

i've done my best to incorporate most of the changes in #9 above, so here's a breakdown of what i've done:

1. replaced ns1/ns2 with a 'ns' field in hosting_dns_server, a text field in the table corresponding to a textarea in the form, which takes one DNS server per line.

2. matched the maxlength for the form fields with the corresponding fields in the hosting_dns_server table.

3. renamed hosting_site_zone to hosting_dns_server_zone, although I realized I'm not actually using this table as yet, and I've kinda lost track of how/where to do so :P

4. replaced the 'type' enum with a regular string, which defaults to 'A'

5. altered ttl field in the RR table to default to NULL, as suggested.

6. added error-checking to all the routines that write/manipulate the zone/named files.

7. renamed provision_bind_create_zone to provision_bind_create_zone_config as suggested.

8. I've refined the provision_bind_update_zone routine with respect to the unlinking comments above. The reason for the unlinking was in case the origin had changed on a new revision of the zone, so I've made this logic explicit: if the zone exists, and there is an 'old_origin' field (set by the caller of this func), then unlink the old zonefile and remove the zone from named.conf before re-creating it.

9. Added useful phpdoc strings to all the functions (at least, I hope I got 'em all!)

10. Finally, and perhaps most importantly, I've tweaked the "zone" parsing logic in _provision_dns_split_url to match that discussed in the IRC snippet anarcat pasted above. The 'host' is now determined to be the first component of the FQDN provided, unless removing that component leaves us with a TLD. I haven't looked at setting up delegation/ownership for the cases where the "zone" is actually a subdomain of a larger zone (but not a TLD), so this will naively produce a 'foo.bar.com' zonefile for the url 'www.foo.bar.com' as it stands.
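
The 'old_origin' logic from item 8 can be sketched as follows. This is a simplified illustration, not the provision_bind_update_zone code: the function name, the ".zone" file suffix and the 'body' key are assumptions, and the step that removes the old zone from named.conf is left out.

```php
<?php
// Hypothetical sketch of the unlink-on-rename logic; names are illustrative only.
function example_update_zone($dir, array $zone) {
  // Only when the origin has changed is the old zonefile removed.
  if (!empty($zone['old_origin']) && $zone['old_origin'] !== $zone['origin']) {
    $old = $dir . '/' . $zone['old_origin'] . '.zone';
    if (file_exists($old) && !unlink($old)) {
      return false; // Report the failed unlink instead of carrying on.
    }
  }
  // Write (or overwrite) the zonefile for the current origin and check for write errors.
  $path = $dir . '/' . $zone['origin'] . '.zone';
  return file_put_contents($path, $zone['body']) !== false;
}

// Usage: create foo.com, then rename it to bar.com in a scratch directory.
$dir = sys_get_temp_dir() . '/example_named_' . uniqid();
mkdir($dir);
$first = example_update_zone($dir, array('origin' => 'foo.com', 'body' => "; zone foo.com\n"));
$second = example_update_zone($dir, array('origin' => 'bar.com', 'old_origin' => 'foo.com', 'body' => "; zone bar.com\n"));
```

After the second call only bar.com.zone is left in the directory. A plain update without 'old_origin' just rewrites the existing file in place.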

I have not dealt with the following:

1. default_ip: will need some help and further guidance to relate this to the web_server nodes, but hopefully this will do as-is for now.

2. the extra "default" fields in hosting_dns_server table are left alone for now

3. for the time being, i'm leaving the ns1/ns2 in the provision modules, just for simplicity. clearly this needs to be refined further, but i'm not sure exactly how that should work as yet. in any case, we can handle arbitrary numbers of NS records via the admin UI which lets you create records of any type within a given zone.

That's all for now- some quick tests seem to indicate all of this is working reasonably well, but I'll do a more thorough review in the morning, and post a revised patch for review.

spiderman’s picture

Attachments: 2 new files (54.05 KB, 19.06 KB)

attached is a more recent version of the patch, with most of the feedback above applied (as described in my previous two comments). unfortunately, this is not quite working for me yet, but i'm posting it in hopes that somebody can help me work out the kinks ;)

known issues: the weighting of the two new modules in the system table (provision_bind and provision_dns) needs to be manually altered to '4' and '5' respectively, otherwise you end up with a dir called PROVISION_CONFIG_PATH/named in your drupal root, instead of $config/named (i'm not totally sure how/where to make this change automatic, but i'm guessing in my install hooks?)

beyond that, this code seems to significantly slow (and sometimes stall) the verify and install tasks when run through drush. not sure why, but it seems to take an inordinately long time for the drush process to restart apache.

finally, the node/add/site form (after the patch is applied) seems to lose any installed 'profile' packages in the system, making it impossible to validate said form and create a new site.

all of this started happening after Vertice's most recent commits (http://drupal.org/project/cvs/196005 see Dec 11) on provision.module related to proc_open. before that, things seemed to mostly be working, except that the zonefile's NS records were slightly malformed.

anarcat’s picture

I have merged in the patch to the provision.module so that this patch is now independent from the core. I'll review the other changes shortly.

anarcat’s picture

So I haven't reviewed the code thoroughly again, since I trust the

One thing Adrian and I discussed was that, unfortunately, the Node API and database backend for the zonefiles will need to be migrated up to the hosting module, since we're trying to make the provision modules able to function without a database backend. The Bind API can stay in provision, but the idea is that Hosting has the "reference data" (the actual data structures of the zones) and calls provision to update whatever backend we have (bind zonefiles, djbdns text files, or powerdns stuff, which could even not need drush and talk directly to the db).

So concretely, that would mean moving the provision_dns.admin.inc stuff up into the hosting project. I *guess* from the looks of it, that this would mean that the dns.api stuff would also need to be moved up. I think this would remove some code duplication with hosting, but it may mean that we duplicate the DNS API between the provision and hosting module.

Regarding the PROVISION_PATH stuff you mentioned above, that seems really strange and problematic. Maybe that would be fixed by migrating some things into the hosting module, I have no idea.

Regarding the performance / reliability issues with proc_open, that's problematic. We were thinking of releasing the current CVS as is for a first release candidate and we're probably going to do just that anyways. Please try to confirm that this is really the issue because that would mandate a release-critical issue to be filed, and maybe reverting the code for the 0.1 release.

Thanks for your hard work, derek.

PS: Note that the exec-to-popen switch is this commit: http://drupal.org/cvs?commit=158270.

ac’s picture

Has this been removed from cvs for RC1?

anarcat’s picture

This wasn't committed yet, as it creates problems with the platform. Derek needs to post a new patch for us to reproduce the issue at this point.

spiderman’s picture

Attached is a fresh version of this patch, which breaks the hosting_platform/package functionality that detects modules/themes/profiles for new site creation.

When applied after checking out the hosting and provision modules, but before installing the system with the hostmaster install profile, the subsequently installed system fails to detect the "packages" properly, resulting in no profile/module/theme nodes appearing in the admin/content/node page. For me, this seems to happen even if I don't enable the DNS-related modules at install time.

I've no idea why or how this is happening, so I'm hoping someone can provide some insight into what my code is doing to cause such havoc!

spiderman’s picture

I've committed this latest patch, having tested an install on a pristine server and confirmed that nothing breaks provided the modules aren't turned on. The code still needs a lot of work, but at least we can do that in cvs and collaborate more easily now.

The commits are here:
http://drupal.org/cvs?commit=164899
http://drupal.org/cvs?commit=164901

adrian’s picture

Ok. I fixed the issue you were having with package breaks.

You were initializing the $data array in the verify task.

As it stands atm, this does nothing because the tables need to be migrated up.

adrian’s picture

This code looks sound, apart from the issues related to the migration of things to hosting.

The one problem i noticed is lack of documentation. It doesn't instruct you how to add the necessary lines to your named.conf etc.

anarcat’s picture

Status: Needs work » Fixed

I'm closing this issue, for two reasons:

1. the code has been checked in and "mostly works" (famous last words)
2. there have been too many comments here to be able to follow up on specific issues

So further problems with this code should be reported as separate issues (which I will do in seconds :).

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.