The tidy extension is loaded from PHP 5, but the error still appears.
The module is installed under sites > all > modules (the preferred path for contributions).

Comment   File          Size      Author
#2        phpinfo.PNG   8.91 KB   thierry.beeckmans

Comments

dman’s picture

The binary is only looked for if the php extension is not detected.
Can you confirm it's showing up in your phpinfo?

"Found 'tidy' binary" - implies that a file called 'tidy' was found in your PATH, or in the executable path set in the import_html configs.
"It didn't run right" says that running it with -v (to retrieve the version) failed.

It should have displayed the command that failed (in fact I'm sure it did but you didn't post that vital bit of info). Try running that command from the command line (when in drupal root) and seeing what the problem is.
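To reproduce that version check outside Drupal, something along these lines may help. It is only a rough sketch: example_tidy_version_check() is a made-up helper name, not a function from import_html, and it assumes exec() is allowed on your host. Pass it whatever path you have configured.

```php
<?php
// Hypothetical helper, not part of import_html: run "<binary> -v" and
// report the exit status and any output, including error text.
function example_tidy_version_check($tidy_path) {
  $output = array();
  $status = 0;
  // escapeshellarg() protects paths containing spaces; 2>&1 folds stderr
  // into the captured output so shell error messages are visible too.
  exec(escapeshellarg($tidy_path) . ' -v 2>&1', $output, $status);
  return array(
    'ok' => ($status === 0),
    'status' => $status,
    'output' => implode("\n", $output),
  );
}

// 'tidy' relies on the PATH; substitute the full path to the executable.
$result = example_tidy_version_check('tidy');
print ($result['ok'] ? 'tidy ran: ' : 'tidy failed (status ' . $result['status'] . '): ');
print $result['output'] . "\n";
```

If the configured path points at a directory rather than at the executable itself, the shell cannot run it and the status will be non-zero.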

thierry.beeckmans’s picture

Status   File size
new      8.91 KB

The PHP extension shows up in phpinfo.

I haven't set the path correctly; I thought it wasn't necessary because the module should use the extension that's delivered with PHP 5.
When I placed include_once drupal_get_path('module','import_html').'/install-htmltidy.inc'; in import_html.module the error disappeared, and I was told "PHP Tidy Extension enabled OK".

Then I tried the demo, and a lot of errors showed up in a debug div block:

'c:/www/php/ext/' is not recognized as an internal or external command,
operable program or batch file.
: in debug_pre(), line 95 debug.inc
: in xml_tidy_file(), line 155 tidy-functions.inc
: in xml_tidy_string(), line 76 tidy-functions.inc
: in parse_in_xml_file(), line 69 xml-transform.inc
: in _import_html_process_html_page(), line 1705 import_html.module
: in _import_html_import_files(), line 1513 import_html.module
: in import_html_demo_form_submit(), line 915 import_html.module
: in drupal_submit_form(), line 428 form.inc
0.09s elapsed. (9 total)

The first drupal-style error shows user warning: HTMLTidy failed to parse the input at all! It's probably very problematic HTML. A working version of tidy IS at c:/www/php/ext/ isn't it? I ran c:/www/php/ext/ -q -config D:/drupal/sites/all/modules/import_html/coders_php_library/xhtml_tidy.conf "/htm8399.tmp" and it returned: 1 in D:\drupal\sites\all\modules\import_html\coders_php_library\tidy-functions.inc on line 156.

So I guess I have to set the path correctly, BUT because the extension is recognized I cannot set the path (though I can still change it directly in the database, of course).
The path is currently set to c:/www/php/ext/ because I tried it that way first. Only later did I notice that the module couldn't find the required file install-htmltidy.inc.

thierry.beeckmans’s picture

Title: Found 'tidy' binary, but it didn't run right. » PHP Tidy Extension enabled OK after change

I changed include_once 'install-htmltidy.inc'; into include_once drupal_get_path('module','import_html').'/install-htmltidy.inc'; and I get the 'PHP Tidy Extension enabled OK' message.

A part of #2 was probably caused by caching...
The errors I get are:

# warning: Missing argument 1 for htmltidy_test(), called in D:\drupal\sites\all\modules\import_html\coders_php_library\tidy-functions.inc on line 127 and defined in D:\drupal\sites\all\modules\import_html\install-htmltidy.inc on line 36.
# Shouldnt get here!

Passing in a variable that holds an empty string solves error 1.
But what about the rest...

dman’s picture

Clearly, if the code is progressing past

function xml_tidy_string($data,$xhtml=TRUE) {
...
  // Tidy
  if ( extension_loaded( "tidy" ) && function_exists('tidy') ) {
...
    return (string)tidy_get_output($tidy) ;
  }
...
// No tidy [error happens around here - line 76 tidy-functions.inc ]
}
xml_tidy_file(), line 155
tidy-functions.inc : in xml_tidy_string(), line 76 tidy-functions.inc 

then the check for extension_loaded( "tidy" ) isn't behaving.

For some reason (possibly this one) I'm a little more paranoid the second time around.

When testing in the settings page:
extension_loaded ( "tidy" ) ? OK. Cool

Just before actually doing the action:
extension_loaded( "tidy" ) && function_exists('tidy') ? No. Damn.

So your system claims to have the extension, but it doesn't support the basic function I expect the extension to provide. WTF?
I wonder if PHP did something sneaky to start hiding objects and I should use class_exists() or something instead...?

Try removing that second check : function_exists('tidy') from tidy-functions.inc
Otherwise I really dunno. Try the commandline option instead :-(
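For what it's worth, the PHP tidy extension exposes a class named tidy and procedural functions such as tidy_parse_string() and tidy_get_output(), but no function literally called tidy(), so function_exists('tidy') would be expected to return FALSE even when the extension is working. A small diagnostic sketch (the example_* names are invented for illustration and are not in the module):

```php
<?php
// Hypothetical diagnostic: report which parts of the tidy extension are visible.
function example_tidy_capabilities() {
  return array(
    'extension_loaded' => extension_loaded('tidy'),
    // Object-oriented API: the tidy class.
    'class_tidy' => class_exists('tidy'),
    // Procedural API, e.g. tidy_parse_string() and tidy_get_output().
    'function_tidy_parse_string' => function_exists('tidy_parse_string'),
    // The check discussed above: looks for a function literally named tidy().
    'function_tidy' => function_exists('tidy'),
  );
}

// Treat tidy as usable if the extension is loaded and either API is present.
function example_tidy_usable() {
  $caps = example_tidy_capabilities();
  return $caps['extension_loaded']
    && ($caps['class_tidy'] || $caps['function_tidy_parse_string']);
}

var_export(example_tidy_capabilities());
print "\n";
```

Dropping this into a scratch PHP file on the affected server shows at a glance which of the checks is the one that fails.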

thierry.beeckmans’s picture

Title: PHP Tidy Extension enabled OK after change » Added check -> code needs to be revised

I added a manual check for php5-tidy to my copy of function xml_tidy_file($filepath) and left your check in the else branch...

if (extension_loaded('tidy')) {
  // Specify configuration
  $config = array(
    'tidy-mark' => false,
    'tab-size' => 2,
    'indent' => false,
    'wrap' => 0,
    'output-xml' => false,
    'output-xhtml' => true,
    'doctype' => 'omit',
    'input-xml' => false,
    'show-warnings' => false,
    'numeric-entities' => true,
    'quote-marks' => false,
    'quote-nbsp' => true,
    'quote-ampersand' => false,
    'break-before-br' => false,
    'uppercase-tags' => false,
    'uppercase-attributes' => false,
    'clean' => true,
    'drop-font-tags' => true,
    'enclose-text' => false,
    'quiet' => true
  );

  // Tidy
  $tidy = new tidy;
  $tidy->parseString($html, $config, 'utf8');
  $tidy->cleanRepair();
  $out = $tidy;
} else {

Now I can see the preview, but the body contains only an empty line:
'

'
The code definitely goes into that branch.

thierry.beeckmans’s picture

Title: Added check -> code needs to be revised » I was wrong

I was looking in the wrong place; replacing function_exists with class_exists does indeed do the trick.
Now I get an error about improper UTF-8... probably everything will be solved once the pages are in the right charset.

dman’s picture

It certainly sounds like progress.
Try adding
'output-encoding' => 'utf8'
as one of the configs

http://tidy.sourceforge.net/docs/quickref.html#output-encoding
http://tidy.sourceforge.net/docs/quickref.html

I've not encountered this before, but it is certainly something to handle. I've hit XML vs non-UTF8 a few times in other places

dman’s picture

Title: I was wrong » Problems with parsing HTML using the php 'tidy' extension

(try not to change the title unless it's actually explanatory)

thierry.beeckmans’s picture

I removed the code I had placed in function xml_tidy_file($filepath); that was the wrong place I was talking about ;-)
I also suddenly realized that I not only had to change the charset of the pages but also had to strip out all the 'microshit'. Now the pages go through your module successfully.

Just a question: I thought the module handled images. Or does it not for pages processed through the demo?

btw, fabulous module, congrats. Sometimes I really like the DOM, it has so many possibilities.

thierry.beeckmans’s picture

About the UTF-8 problems: I still had weird characters in the HTML.
I thought you had to pass the charset, as in $tidy->parseString($data, $config, 'utf8');
I did that and the characters are now converted correctly.

One problem is left: these characters still appear: �
I don't know if I can skip those with the template...
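One possible cause, and this is only a guess, is that some source bytes are not valid UTF-8, for example if the pages were saved in Windows-1252. Under that assumption, a pre-processing step like the sketch below could be tried before the data reaches tidy. example_ensure_utf8() is a made-up name, the default source charset is an assumption, and the mbstring extension must be available.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from an
// ASSUMED source charset. 'Windows-1252' is a guess, not a known fact
// about the pages discussed in this thread.
function example_ensure_utf8($data, $assumed_source = 'Windows-1252') {
  if (mb_check_encoding($data, 'UTF-8')) {
    return $data;
  }
  return mb_convert_encoding($data, 'UTF-8', $assumed_source);
}

// "\x93" and "\x94" are curly quotes in Windows-1252 and invalid as UTF-8.
$fixed = example_ensure_utf8("\x93quoted\x94");
print bin2hex($fixed) . "\n";
```

If the � character is already stored literally in the source file, the original byte is gone and no conversion will bring it back.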

Could it be that the images only get copied when the page is submitted?

dman’s picture

No images in the demo - it's just single-file text/template testing.

When importing images we need subdirectories, relative paths, etc. That doesn't happen until you attempt a full directory tree. The demo is just there to get you through these sorts of issues :) and to tune your import template.

As you've seen, choosing XSL processing has a bit of overhead attached to it, but when it works ... :). Damn glad I'm not doing it via regexps any more!

dman’s picture

Status: Active » Closed (fixed)