Hi maintainers,

I've got a large patch for the D7 branch.

It fixes a number of bugs that I found, including some from the issue list, and adds several new features:
- Extend the Views selector to include multiple page Views.
- Selectively remove paths.
- Limit the saved pages to fewer than a particular number (useful for testing).
- Optionally include .htaccess.
- Export to Amazon S3, with the ability to export to multiple buckets (useful for streaming assets).
- Upload all files or only changes to S3.
- Invalidate uploads on CloudFront.
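
To illustrate the "all or only changes" idea above, here is a minimal sketch in Python (not the module's PHP, and not taken from the patch). It assumes changes are detected by comparing content hashes against a manifest saved from the previous export; the names `file_digest`, `plan_upload`, and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_upload(export_dir: Path, previous: dict, changes_only: bool = True):
    """Decide which exported files to upload.

    `previous` maps relative path -> digest from the last export
    (a hypothetical manifest format). Returns (keys_to_upload, new_manifest).
    """
    manifest = {}
    to_upload = []
    for path in export_dir.rglob("*"):
        if not path.is_file():
            continue
        key = path.relative_to(export_dir).as_posix()
        manifest[key] = file_digest(path)
        # Upload everything, or only files that are new or whose content changed.
        if not changes_only or previous.get(key) != manifest[key]:
            to_upload.append(key)
    return sorted(to_upload), manifest
```

The same list of changed keys, each prefixed with "/", could serve as the set of paths to invalidate on CloudFront, so that only the files that were actually re-uploaded are invalidated.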

Here's an uberpatch against 7.x-2.x head. I'd suggest applying it to a new working branch.

For people considering using this patch: treat this code as a -dev version, even though it is in use in production, and make sure nobody except an administrator has access rights to HTML export.

I can also supply an example module with an easy interface, suitable for end-user use: two radio choices for uploading to a staging S3 site/bucket-set or to live, and radios for choosing whether to upload everything or only the changes.

Attached file: many_changes.patch (119.63 KB, Jeff Veit)

Comments

btopro:

Priority: Normal » Major
Status: Active » Needs review

Wow Jeff! Thank you for jumping in and taking a stab at this. I'll have to check it out, but those all sound like really useful enhancements. Considering the scope of this patch, I agree that it should probably go into a 3.x so as not to conflict with the 50 sites that 2.x is on.

That example radio module would also be useful and I'd be happy to include it. If you'd be interested in becoming a co-maintainer, I'd be happy to grant that to you as well. Too many modules and never enough love given to this one.

btopro:

Status: Needs review » Needs work

Hmm, is there anything special that has to be done to use this? I haven't tried with S3, but locally I couldn't get it to export correctly on a site I had.

Closest thing I had was 200+ nodes rendered their content as anonymous correctly and then after about 2 files being added to the folder structure it failed.

Non-anonymous fails immediately as well as doing 10 or so items individually. Ever try this locally?

Jeff Veit:

#1: It's based on 7.x-2.x, and the bulk of that is the same. I'm not sure I'd call it 7.x-3.x, but that's fine if you want to.

#2:

"Closest thing I had was 200+ nodes rendered their content as anonymous correctly and then after about 2 files being added to the folder structure it failed."

I added an option for greedy/non-greedy. It should default to greedy copying of resources. That may be the problem. I was going to use non-greedy to read CSS files. At the moment it's only files directly used in the HTML.
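
As a rough illustration of what "only files directly used in the HTML" could mean, here is a minimal Python sketch (not the module's PHP and not the patch's actual logic). It collects the resources that a page references through `img`, `script`, and `link` tags; the class and function names are hypothetical, and which of the greedy/non-greedy modes this corresponds to is not something the thread states.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs referenced directly by img/script/link tags."""

    # Tag name -> attribute that carries the resource URL.
    ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Keep first-seen order and skip duplicates and empty values.
            if name == wanted and value and value not in self.resources:
                self.resources.append(value)


def direct_resources(html: str) -> list:
    """Return resource URLs referenced directly in the given HTML."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources
```

A collector like this would not find files that are referenced only from inside a stylesheet (for example, background images), which is why reading the CSS files would need a separate pass.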

"Non-anonymous fails immediately as well as doing 10 or so items individually. Ever try this locally?"

I haven't tried that for ages. I'll have a look to see what the problem is. One of my to-dos is to make a proper test pack.

ShaneEaston:

I applied this patch but I'm not able to export anything. All I get is an empty folder. I'm trying to export one node type. I've tried all the export types: tar, folder, and S3. It says it exported 13 pages and 0 files, but nothing is there. Any ideas?

Jeff Veit:

Well, that means that it thought it went through the list of pages and found something to write to the output sink. I can't tell what the problem is from this much detail, but in general, when debugging, work with the folder output option. All the other output options write a folder, then process it further. What is odd is that you are seeing 0 files. You'd expect to see some CSS files and images. I'm about to do another round of development, so I'll test that particular path. One thing I have to do soon is develop a test pack, which should add to the robustness. Overall, I'd say that this dev code is usable if you are a software developer and can sort out the inevitable problems. I wouldn't use it yet if you aren't able to debug.

ochab:

Hey there, same issue here. I applied the patch, then ran the module. It found all the nodes and seemed to work, but then only the folder was created, with no files in it.
Any ideas about what might have gone wrong?
Please, Jeff Veit, you are our only hope! ;)

sohail.aslam@gmail.com:

I ran into the same error, i.e., 0 files. The log in 'Administration->Reports' contains the error:

Notice: Undefined index: path in _do_export_output() (line 654 of /opt/drupal-7.24/sites/all/modules/html_export/html_export.pages.inc).

I am looking through the patched html_export code to see whether everything is OK.

sohail.aslam@gmail.com:

This seems to be the Firefox issue reported in https://drupal.org/node/1988432, which involves $_SERVER['HTTP_ORIGIN']. Exporting pages works OK when I use Chrome.
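
For anyone hitting the same thing: a browser does not necessarily send an Origin header on every request, so code that reads it unconditionally can end up with an unset value. Below is a minimal sketch of the usual defensive pattern, written in Python against a WSGI-style environ (where the header also appears as `HTTP_ORIGIN`) rather than in the module's PHP. The function name `request_origin` and the fallback rule are assumptions for illustration, not what the patch or the linked issue does.

```python
def request_origin(environ: dict) -> str:
    """Return the request origin, falling back to scheme + host.

    `environ` is a WSGI-style dict; the fallback rule is an assumption.
    """
    origin = environ.get("HTTP_ORIGIN")
    if origin:
        return origin
    # No Origin header was sent: rebuild one from the scheme and host.
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    return f"{scheme}://{host}"
```

The point of the sketch is only that the header is treated as optional, so that a browser which omits it still gets a usable base URL.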