hi everyone
100 euro bounty via PayPal for a GPL Drupal 4.6 module similar to the "htmlcorrector module" but which also cleans up bad Word HTML (like the htmltidy Linux command-line utility). I want my users to be able to cut-and-paste Word-generated or Outlook-generated HTML output into the "create module" text box and have it stripped of all rubbish.

the module will, of course, be released to the community
my contacts:
email - gabriele.ferriREMOVETHISTEXT@gmail.com
mobile - +393397550504 but keep in mind that I live in Italy (watch the timezone) and my first language is not English so speak slowly ;-)

Comments

paddy_deburca’s picture

FCKeditor already has a paste-from-Word button that removes superfluous garbage.

Paddy.

http://deburca.org, and http://amadain.net

gabro1980’s picture

actually I've tried tinyMCE, I think it's almost the same isn't it?
but tinyMCE is a wysiwyg editor, and I cannot enter <!--break--> manually...

is FCKeditor different? can I enter plain but ugly word-html code in it?

paddy_deburca’s picture

FCKeditor is a full wysiwyg environment, with the ability to remove buttons that you consider unnecessary - or downright dangerous.

FCKeditor should clean your code for you.

Give it a test at http://fckeditor.net

Paddy.


gabro1980’s picture

yes, I know, but how do I enter <!--break--> in the text I'm editing? that's an important feature for me

sangamreddi’s picture

The latest module does everything you need. There's a plugin for the <!--break--> tag available in that module.

http://drupal.org/node/42452

Sunny
www.gleez.com | www.sandeepone.com

kbahey’s picture

Even the older versions of TinyMCE can add the <!--break--> tag.

Just click the "HTML" button, and enter it wherever you want.
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com


dman’s picture

I use htmltidy in my Import HTML module.

Works well on making stuff pure XHTML.
It wouldn't take much to wrap that into a plugin if you want.

.dan.

http://www.coders.co.nz/

gabro1980’s picture

I'm serious, I've recently been paid for some other (non-programming) jobs. I'm using Drupal a lot and I'd like to help the community and, besides, I really need such a module for 4.6. But I'm not a programmer :-( If you can adapt htmltidy to run just like the htmlcorrector module, then the bounty is yours

G:

amstercad’s picture

I've tested TinyMCE and FCKeditor extensively, and FCKeditor is by far the more robust.

The only fault I can give it, in production (i.e. just trying to get the html in, and doing it well) is it makes the page slow to load as you work. It does do the best job of anything I've ever seen cleaning up Word HTML, and if you wanted/needed further refinement you could copy/paste the text into html-kit for color coding, which I use to manually clean what remains of the cruft.

Once I'm done with FCKeditor, I disable it in modules until the next time I need it, to speed up the page loading.

If you are seeking production or support services, please use the contact form to send me a private message.

gabro1980’s picture

if you make such a module, what should I load on my server? does it require the unix executable? I don't know if I can use it on my server...

but if the module is just like the other drupal modules (for example like htmlcorrector), I know I can install it and the bounty for it is still valid.

And, no, I don't want wysiwyg editors, sorry...
G:

dman’s picture

There is a dependency issue, but not too bad.

If you've got PHP5, it comes as an extension that can be enabled pretty easily (just need to uncomment it in php.ini). Even if you have to beg your host to do it, there is no (good) reason why they would refuse.
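
For readers wondering what the PHP5 route looks like in practice, here is a minimal sketch using the tidy extension's tidy_repair_string(). The helper name example_tidy_clean() and the particular set of options are assumptions for illustration, not what the module actually uses.

```php
<?php
// Hypothetical helper (not from the htmltidy module): cleans an HTML
// fragment with PHP's tidy extension when that extension is enabled.
function example_tidy_clean($html) {
  // Without the extension there is nothing to call, so pass the text through.
  if (!function_exists('tidy_repair_string')) {
    return $html;
  }
  // Assumed option set: XHTML output, body content only, Word cleanup.
  $config = array(
    'output-xhtml'   => true,
    'show-body-only' => true,
    'word-2000'      => true,
    'bare'           => true,
    'wrap'           => 0,
  );
  return tidy_repair_string($html, $config, 'utf8');
}

// Example call with a Word-style fragment.
print(example_tidy_clean('<p class=MsoNormal>Hello <b>world</p>'));
```

If the extension is missing, the function simply returns its input, so it is safe to try on a host where you are not sure what is enabled.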

If not, yeah, you need to get the binary working somehow, but it should be only a small hassle if you have any shell access at all. (Famous last words)

Basically, I knew that this little program was so tried and tested that there was no way any other hand-rolled HTML parser could be anything but trouble.

So, it's not quite as simple as most modules. Tell me about your hosting environment. phpinfo()?

.dan.


gabro1980’s picture

damn, it's PHP 4.4.1. all info is at http://62.149.140.47/ver.php

a rough cut-n-paste from my host knowledge base (originally partly in Italian, translated here):
- PHP 4.4.1
modules: MySQL, gettext, JPEG and PNG image handling, GDlib (Graphic Development) versions 1 and 2, Netpbm, FreeType fonts, encryption with Mcrypt, xslt-Sablotron. Backward compatibility for global variables (register_global = on), enabled file extensions: php, php3, phtml
- PERL5.6.1
modules: DBI, DBD::mysql, DBD::Pg, DBD::CVS, LWP, CGI, Crypt, Digest, Net and others. Enabled extensions: any, as long as the files have the right execute permissions
- SSI: enabled, with a restriction on the "exec cmd" directive for security reasons; in its place you can use
<!--#include virtual="cgi-bin/nomescript" --> Enabled extensions: shtml
- RUBY-1.6.6: modules for interfacing with mysql and postgresql
- PYTHON-2.0.1: modules for mysql and postgresql; the same applies as for Perl when installing new ones.
- TCL-8.3.3: no additional modules
- BASH-2.0.5: with commonly used commands: sed, awk, grep, cat, ls, sleep and others
- C: standard libraries (stdio.h, math.h, zlib.h), gdbm, mysql, pgsql. (CGI programs written in C must be compiled and transferred to the server in binary form.) In case of missing libraries, the binary must be statically linked on GNU/Linux i386 and then uploaded.

I don't have shell access but it seems I can upload and run executables. If so, can the module be made?

gabro1980’s picture

by the way, I've got an empty cgi-bin directory. I assume executables go in there, don't they?

gabro1980’s picture

hi everyone
the bounty is still there for anyone who can help me.
if someone is willing to code it I'd ask him to announce his participation both on this forum and by sending me a mail

dman’s picture

I've explored the implications for the server, and I can be 80% sure it will be no hassle!

If you have shell access, I can walk you through the test process.

I'm also currently attacking the existing legacy htmltidy.module, which works, but is very inefficient as it runs the process every time for the whole page.
I'm patching it now to optionally work as an output filter (identical to htmlcorrector, as requested), but also as a validator at edit time, so the crap code doesn't even make it into the system. (Current filters only process on display.)
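
To make the "output filter" half of that concrete, here is a rough sketch of a filter hook in the style of Drupal 4.6. The module name tidyexample, the hook signature as written, and the placeholder cleaner are all assumptions for illustration; this is not the patched htmltidy.module.

```php
<?php
// Stand-in for Drupal's t() so the sketch also runs outside Drupal.
if (!function_exists('t')) {
  function t($string) { return $string; }
}

// Placeholder cleaner (hypothetical): a real module would pass $text
// to tidy here. Trimming whitespace just keeps the sketch self-contained.
function tidyexample_clean($text) {
  return trim($text);
}

// Filter hook for a hypothetical module called "tidyexample".
// The signature is assumed from the Drupal 4.6 filter API.
function tidyexample_filter($op, $delta = 0, $format = -1, $text = '') {
  switch ($op) {
    case 'list':
      // Names of the filters this module provides.
      return array(0 => t('HTML Tidy (example)'));
    case 'description':
      return t('Cleans up untidy HTML when content is displayed.');
    case 'process':
      // Runs when content is displayed, like htmlcorrector does.
      return tidyexample_clean($text);
    default:
      return $text;
  }
}
```

A filter like this only touches the text on display. Validation at edit time would need a separate hook and is not shown here.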

Looks good so far.
An hour or less more to go stable, then testing. I'll have to write up a bit about deployment however.
All options are go!

.dan.


gabro1980’s picture

hi Dan!
as I wrote before, I don't have shell access. there's a cgi-bin directory and I've got ftd access... is that a problem?

gabro1980’s picture

actually that's ftp...

dman’s picture

It just means we can't test in real-time, but there's ways around that. I'll try writing up some instructions ...

  • First, make a folder called 'htmltidy' in your modules directory.
  • Get the executable from http://tidy.sourceforge.net/#binaries for the target system (today it's Linux) and place it there
  • Upload that to the appropriate place on your server. I'll send you the module later, we need to test first.

Test the environment

  • Enable PHP Code Input filter. Make a test page through Drupal. Ensure PHP executes OK.
  • Paste this in the test page
    print('<h1>Test 1 OK</h1>');
    
  • Next, try
    print('Install Dir is '.dirname(__FILE__));
    
  • Then, check that the binary is where we expect it
    $htmltidypath = dirname(__FILE__).'/modules/htmltidy/htmltidy';
    if (file_exists($htmltidypath)){
      print("binary found at $htmltidypath");
    } else {
      print("NO binary found at $htmltidypath");
    }
    

... yes, I'm taking it slow here, but we're almost there!

You start on that and tell me if it works OK. I'll go test exactly what version of the last step will work for us...
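
For anyone following along, the "last step" (actually running the binary from PHP) could look roughly like the sketch below. The function names are made up for illustration, the /tmp location is an assumption about the host, and the tidy flags are just one plausible choice rather than the ones the module ended up using.

```php
<?php
// Hypothetical helper: builds the shell command for one tidy run.
// -q = quiet, -asxhtml = convert to XHTML; the two long options ask
// for Word cleanup and for the body content only.
function example_tidy_command($tidy_path, $input_file) {
  return escapeshellarg($tidy_path)
    .' -q -asxhtml --word-2000 yes --show-body-only yes '
    .escapeshellarg($input_file).' 2>/dev/null';
}

// Hypothetical helper: writes $html to a temp file, runs tidy on it and
// returns the cleaned markup, or FALSE if tidy is missing or fails.
function example_run_tidy($tidy_path, $html) {
  if (!is_executable($tidy_path)) {
    return FALSE;
  }
  $tmp = tempnam('/tmp', 'tidy');  // assumes /tmp is writable
  if ($tmp === FALSE) {
    return FALSE;
  }
  $fh = fopen($tmp, 'w');
  fwrite($fh, $html);
  fclose($fh);

  $output = array();
  $status = 2;
  exec(example_tidy_command($tidy_path, $tmp), $output, $status);
  unlink($tmp);

  // tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
  if ($status > 1) {
    return FALSE;
  }
  return implode("\n", $output);
}
```

Going through a temp file rather than a pipe keeps the sketch usable on older PHP versions, where exec() is about the only tool available.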

.dan.


gabro1980’s picture

will do right now

gabro1980’s picture

I'm going to reply to you by email

dman’s picture

If anyone's interested, this is what I did to get a binary installed on a machine without shell access.

It fetches and unpacks the executable.
Actually testing it is still to do, but it ended up working just like this!


/**
 * Attempt to download the tidy binary from the sourceforge repository.
 * Flying blind, and with no introspection on the actual platform,
 * I just make some guesses about what should work.
 * Tested on two different locked-down commercial hosts so far.
 */
function install_tidy() {
  $wd = dirname(__FILE__).'/';
  print("Working dir is $wd<br/>");

  $source = 'http://tidy.sourceforge.net/cf/tidy_linux_x86.tgz';
  // Saved with a .gz suffix so that gunzip leaves a plain tar file behind.
  $tar = 'tidy_linux_x86.gz';
  $unzipped = 'tidy_linux_x86';

  // Fetch with the GET command-line tool (installed with Perl's LWP).
  if ( ! file_exists($wd.$tar) || filesize($wd.$tar) == 0 ){
    system('GET '.escapeshellarg($source).' > '.escapeshellarg($wd.$tar));
  }
  // A failed GET still leaves an empty file behind, so check the size too.
  if ( ! file_exists($wd.$tar) || filesize($wd.$tar) == 0 ){
    print("Failed to fetch the remote package <br/>");
    return FALSE;
  }

  print("We have the package <br/>");

  $command = 'gunzip '.escapeshellarg($wd.$tar);
  print("running $command  <br/>");
  system($command);

  if ( ! file_exists($wd.$unzipped) ){
    print("didn't unzip <br/>");
    return FALSE;
  }

  print("unzipped OK<br/>");
  $command = 'cd '.escapeshellarg($wd).'; tar -xf '.escapeshellarg($wd.$unzipped);
  print("running $command  <br/>");
  system($command);

  $apppath = $wd.'bin/tidy';
  if ( file_exists($apppath) ){
    print("Unpacked the binary to '$apppath' OK! Tres cool<br/>");
    variable_set("htmltidy_apppath", $apppath);

    // TODO TEST IT!

    return TRUE;
  } else {
    print("Failed to unpack binary, it should have been at '$apppath' by now");
    return FALSE;
  }
}
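
The "TODO TEST IT!" step could be filled in along these lines. This is only a sketch: example_tidy_selftest() is a made-up name, and it assumes the version banner printed by "tidy -v" contains the words "HTML Tidy".

```php
<?php
// Hypothetical self-test (not part of the module): returns TRUE if the
// tidy binary at $apppath exists, is executable and answers "tidy -v".
function example_tidy_selftest($apppath) {
  if (!file_exists($apppath)) {
    return FALSE;
  }
  // Unpacked files can lose the execute bit; try to restore it.
  // This may fail quietly on locked-down hosts.
  if (!is_executable($apppath)) {
    @chmod($apppath, 0755);
  }
  if (!is_executable($apppath)) {
    return FALSE;
  }
  $output = array();
  $status = 1;
  // "tidy -v" should print a version banner and exit with status 0.
  exec(escapeshellarg($apppath).' -v 2>&1', $output, $status);
  // Assumption: the banner mentions "HTML Tidy".
  return $status === 0
    && strpos(implode("\n", $output), 'HTML Tidy') !== FALSE;
}
```

It could be called right after the unpack step, with the path only saved if the self-test passes.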



gabro1980’s picture

thanks to Dan, I officially declare this bounty hunt closed!