Ocr image, parser file

The main purpose of this module is to convert text in images, and the regular content of PDF, Word, Excel and PowerPoint documents, into text and save it in a textarea field. For an image, the text is saved in the image field's title and alt attributes; for a document file, it is saved in the file description. You can also map a textarea field to store the text, which makes the content much easier to search with Views.
You can add a text field and hide it in the form. When you upload a file, the extracted text is filled into the mapped form field.

Key features and benefits of this module include:

Text extraction from common image formats such as JPG, PNG and TIFF, as well as PDF documents, and content extraction from office files (doc, docx, xls, xlsx, ppt, pptx, pdf, ...). The extracted text can be stored and manipulated within Drupal.
Integration with Views for searching and filtering content based on text extracted from images. No need for external OCR services.
Bulk update of the title and alt of existing files with Views and VBO.
Support for multiple languages through OCR engines such as Tesseract.
A robust set of APIs and hooks to leverage OCR capabilities throughout the site.

How to Get Started

Install Tesseract in your environment.
Example on Ubuntu:
sudo apt-get install tesseract-ocr
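Before configuring the widgets, it helps to confirm that PHP itself can run Tesseract, which is what the permission step further down is about. The sketch below is a hypothetical helper (`tesseract_available()` is not part of the module) that runs `tesseract --version` through PHP's `exec()` and checks the exit code. Run it from a web request rather than the CLI if you want to test the web server user's PATH and permissions.

```php
<?php

/**
 * Returns TRUE when PHP can execute the given Tesseract binary.
 *
 * Hypothetical helper, not part of the module: runs "<binary> --version"
 * through the shell and treats exit code 0 as success.
 */
function tesseract_available(string $binary = 'tesseract'): bool {
  // On PHP 8+, functions listed in disable_functions are undefined,
  // so this guard also covers hardened hosts.
  if (!function_exists('exec')) {
    return FALSE;
  }
  $output = [];
  $exit_code = 1;
  // Send stderr to the captured output; only the exit code matters here.
  exec(escapeshellarg($binary) . ' --version 2>&1', $output, $exit_code);
  return $exit_code === 0;
}

var_dump(tesseract_available());
```

A missing binary makes the shell exit with a non-zero code (127 on most systems), so the helper returns FALSE.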
The main purpose of this module is to read the content of an image or document and assign it to the title, alt or description of the file, or to a specified text field.
If you use a file field, the module also reads the content of uploaded images, PDFs and office files (doc, docx, xls, xlsx, ...).
How to use it
- Set up permissions so that PHP can run Tesseract.
- Install the "OCR Image" module with Composer, which also installs any libraries it depends on.

- Add an image or file field.
- For an image field:

Turn on "Enable Alt field" and "Enable Title field".
On the "Manage form display" tab, select the "OCR image" widget.

- For a file field:

Turn on "Enable Description field".
On the "Manage form display" tab, select the "OCR / parser file" widget.

- In the widget settings, select your language and set the text limit (0 for the full text).
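The language you pick must be installed for Tesseract. As a sketch, assuming the widget expects Tesseract's three-letter codes such as `eng` (the code used in this page's service examples), the hypothetical helper below turns the output of `tesseract --list-langs` into a list of codes. On Ubuntu, extra languages come as separate packages, for example `tesseract-ocr-fra` for French.

```php
<?php

/**
 * Extracts language codes from `tesseract --list-langs` output.
 *
 * Hypothetical helper: skips the "List of available languages" header and
 * drops "osd" (orientation and script detection data, not a language).
 */
function parse_tesseract_langs(string $output): array {
  $langs = [];
  foreach (preg_split('/\R/', $output) as $line) {
    $line = trim($line);
    if ($line === '' || strpos($line, 'List of available languages') === 0) {
      continue;
    }
    if ($line !== 'osd') {
      $langs[] = $line;
    }
  }
  return $langs;
}

// Sample text shaped like the output of Tesseract 4 on Ubuntu.
$sample = "List of available languages in \"/usr/share/tesseract-ocr/4.00/tessdata/\" (3):\neng\nosd\nvie\n";
print_r(parse_tesseract_langs($sample));

// On a real system, the output could come from:
// parse_tesseract_langs((string) shell_exec('tesseract --list-langs 2>&1'));
```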

Use with services

This module provides services that your own module can use.
For example, to parse the text of a PDF, Word, Excel or PowerPoint document:


// $file_path: path to the document to parse.
$document_parser_service = \Drupal::service('ocr_image.DocParser');
// Arguments: file path, language code, text limit (0 = full text).
$line_text_array = $document_parser_service->getText($file_path, 'eng', 0);
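Images go to the OCR service and office documents to the parser. A minimal sketch of that routing, assuming it is decided by file extension; the function name and extension lists are illustrative and not taken from the module, and PDF follows the DocParser example on this page.

```php
<?php

/**
 * Returns the service ID suited to a file, or NULL if unsupported.
 *
 * Hypothetical helper: the service IDs come from this page's examples;
 * the extension lists are assumptions based on the formats it mentions.
 */
function ocr_service_id_for(string $file_path): ?string {
  $extension = strtolower(pathinfo($file_path, PATHINFO_EXTENSION));
  $images = ['jpg', 'jpeg', 'png', 'tif', 'tiff'];
  $documents = ['pdf', 'doc', 'docx', 'xls', 'xlsx', 'ppt', 'pptx'];
  if (in_array($extension, $images, TRUE)) {
    return 'ocr_image.OcrImage';
  }
  if (in_array($extension, $documents, TRUE)) {
    return 'ocr_image.DocParser';
  }
  return NULL;
}

// Inside Drupal, the result can be passed to \Drupal::service():
// $service_id = ocr_service_id_for($file_path);
// if ($service_id !== NULL) {
//   $lines = \Drupal::service($service_id)->getText($file_path, 'eng', 0);
// }
var_dump(ocr_service_id_for('report.DOCX'));
```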

For example, to OCR an image:

$file_path = 'https://example.com/photo.jpg';
$ocr_image_service = \Drupal::service('ocr_image.OcrImage');
// Arguments: file path, language code, text limit (here 500).
$image_text_array = $ocr_image_service->getText($file_path, 'eng', 500);

This returns an array with the following keys: full_text (all the text, as it appears on the image), title (only the first line), alt (everything but the first line) and array (one line of text per value).
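To make the shape of that array concrete, here is a sketch that builds the same four keys from raw text. It is not the module's implementation: it assumes the limit counts characters (0 meaning no limit), that blank lines are dropped, and that alt joins the remaining lines with newlines. `build_ocr_result()` is a hypothetical name.

```php
<?php

/**
 * Builds the four keys documented for the OCR image service's result.
 *
 * Sketch only: assumes $limit counts characters (0 = no limit), blank
 * lines are dropped, and 'alt' joins the remaining lines with "\n".
 */
function build_ocr_result(string $text, int $limit = 0): array {
  if ($limit > 0) {
    $text = function_exists('mb_substr')
      ? mb_substr($text, 0, $limit)
      : substr($text, 0, $limit);
  }
  // One trimmed, non-empty line of text per value.
  $lines = array_values(array_filter(
    array_map('trim', preg_split('/\R/', $text)),
    static fn(string $line): bool => $line !== ''
  ));
  return [
    'full_text' => $text,
    'title' => $lines[0] ?? '',
    'alt' => implode("\n", array_slice($lines, 1)),
    'array' => $lines,
  ];
}

$result = build_ocr_result("INVOICE 42\nACME Corp\nTotal: 99 EUR");
// title: "INVOICE 42"; alt: the two remaining lines.
print_r($result);
```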

Update all existing images

This requires the Views Bulk Operations (VBO) module.

Optionally add a text field to the entity that your image field belongs to.
Go to the "Manage Form Display" tab for the entity with the image field.
Change the widget to OCR Image. Configure the widget as desired.
Create a view that lists the entities with the image field, and add the Bulk Operations field.
Save the view.
Use the view to select all your entities and choose the "Update empty image text (Image OCR)" action.
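The action name suggests that only empty alt and title values are filled, so text entered by hand is kept. A hypothetical sketch of that rule, using plain arrays in place of a real image field item and the title/alt keys documented for the OCR image service; `fill_empty_image_text()` is an illustrative name, not the module's code.

```php
<?php

/**
 * Fills an image item's alt and title only when they are empty.
 *
 * Hypothetical sketch: existing, hand-written values are never
 * overwritten. $item mimics an image field item's values; $ocr uses the
 * 'title' and 'alt' keys returned by the OCR image service.
 */
function fill_empty_image_text(array $item, array $ocr): array {
  foreach (['alt', 'title'] as $key) {
    $current = trim((string) ($item[$key] ?? ''));
    if ($current === '' && ($ocr[$key] ?? '') !== '') {
      $item[$key] = $ocr[$key];
    }
  }
  return $item;
}

$item = ['target_id' => 7, 'alt' => 'Team photo', 'title' => ''];
$ocr = ['title' => 'ANNUAL MEETING', 'alt' => 'Hall B, 9:00'];
// alt stays "Team photo"; title becomes "ANNUAL MEETING".
print_r(fill_empty_image_text($item, $ocr));
```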

Do you like this module? Show your appreciation by buying me ☕.

Supporting organizations:

nbao

Project information

Project categories: Media
Ecosystem: Bootstrap 5 admin
31 sites report using this module
Created by lazzyvn on 12 May 2023, updated 24 December 2024
Stable releases for this project are covered by the security advisory policy.

Releases

1.0.2

released 6 August 2024

Works with Drupal: ^9 || ^10 || ^11

Development version: 1.0.x-dev updated 2 Jul 2025 at 03:19 UTC
