Hi

I'm trying to scrape an old site that has bad markup, and I always get an error (an end tag problem) when importing the fetched HTML.

  • Is Tidy enabled automatically or is there something to configure?
  • Shouldn't these errors go away if Tidy was applied?
  • And if Tidy is applied, is it applied before anything else touches the fetched HTML?

Comments

twistor

Status: Active » Fixed
Issue tags: -tidy, -errors, -automatic, -markup, -end tag

If the Tidy extension is installed, the "Use Tidy" option will appear on the parser settings page.

As to whether it will fix your problem, I don't know. Tidy can help, but it's not perfect.

ETENTION

Great, thanks for the clarification :) I'll check Tidy status and see if it appears.

twistor

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

ETENTION

Status: Closed (fixed) » Needs review

I'm back in business, and I ended up wondering: should there also be a notice or field shown when Tidy isn't installed?

It would be quite informative if it said something like "Hey, maybe you would like to install the Tidy extension in order to use it?"
Or something more formal: "Tidy extension not found. Install it to enable the Tidy parser."
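
To make the suggestion concrete, here is a rough sketch of the behaviour I have in mind. It is written in Python purely for illustration and is not the module's actual PHP code. The function names, the settings keys, and the `tidylib` module name are all hypothetical; the only idea taken from this thread is that the "Use Tidy" option appears when the extension is present, with the proposed notice shown otherwise.

```python
import importlib.util


def tidy_available(module_name: str = "tidylib") -> bool:
    """Return True if the (assumed) Tidy binding can be found.

    "tidylib" is a placeholder name for this sketch, not a claim about
    what the real module checks.
    """
    return importlib.util.find_spec(module_name) is not None


def parser_settings(has_tidy: bool) -> dict:
    """Build a hypothetical description of the parser settings form."""
    if has_tidy:
        # Current behaviour described above: the option only shows up
        # when the extension is installed.
        return {"use_tidy": {"type": "checkbox", "title": "Use Tidy"}}
    # Proposed behaviour: explain why the option is missing.
    return {
        "tidy_notice": {
            "type": "message",
            "text": "Tidy extension not found. "
                    "Install it to enable the Tidy parser.",
        }
    }


if __name__ == "__main__":
    print(parser_settings(tidy_available()))
```

With something like this, the settings page would always say something about Tidy, either the checkbox or the reason it is not there.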