Hi everybody,

I have to deal with a site that links to many documents in PDF format (some scanned as page images) and, less often, in PPT and DOC formats. Source documents are not available in most cases.

To make the content more findable and keep visitors on the site, it makes sense to try to parse the PDFs into nodes. Running them through OCR is an option, but someone has to clean up the output. It makes sense to let visitors do it when they solve CAPTCHAs - sort of reCAPTCHA but for my own site and not for

I am not a developer and have no idea how much effort this could take, but it seems to me that other sites might have a similar need to cope with legacy content, so the feature could be of interest to them too.

Feedback would be welcome.

Comments

soxofaan

Project: CAPTCHA Pack » reCAPTCHA
Component: Code » General
Status: Active » Closed (won't fix)

reCAPTCHA is not a part of CAPTCHA pack.

Apart from that, I think this feature request is way out of scope for the reCAPTCHA module for Drupal, which is just a wrapper around the reCAPTCHA web service. What you are asking for is far from trivial and involves things that do not fit very well in the architecture and workflow of a Drupal module.

I would recommend contacting the reCAPTCHA web service developers; maybe they already offer products and services for processing your own content.

kremena

Status: Closed (won't fix) » Closed (fixed)

Thanks for the clarification. In this case, there's no issue for this module.