Syncfusion.OCRProcessor On Docker

3 Replies
4 Participants

Created by
RM Renan Marcel

Platform
Console App

Control
PDF

Created On
Sep 19, 2022 08:05 PM UTC

Last Activity On
Feb 21, 2025 07:15 AM UTC

Hi.

I'm trying to run the OCR example on a PDF file using the Syncfusion.OCRProcessor library in a Docker container, but I'm getting the error below:

Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation.

at Syncfusion.OCRProcessor.OCRProcessor.ProcessOCR(String[] args, String[] imagePathList)

at Syncfusion.OCRProcessor.OCRProcessor.GetHOCR(String imagePath, String dataPath, Boolean multiPageTiff, String[] imagePathList)

at Syncfusion.OCRProcessor.OCRProcessor.PerformOCR(PdfLoadedDocument lDoc, Int32 startIndex, Int32 endIndex, String dataPath)

Could anyone help me with this?

Attachment: Ticket.App_dab91d37.rar

GK Gowthamraj Kumar Syncfusion Team September 20, 2022 12:57 PM UTC

Hi Renan Marcel,

On further analysis, we found that this exception can be resolved by replacing the existing Tesseract binaries with the new Tesseract binaries in the Linux Docker 6.0 environment. Please find the download location for the new Tesseract binaries.

We have created a runnable sample with the latest Tesseract binaries, along with the output document, for your reference. Please try the sample on your end and let us know the result.

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/DockerSample6.01225307647

Output: https://www.syncfusion.com/downloads/support/directtrac/general/pd/SampleNet62092803318

Please refer to the documentation links below about OCR:

UG: https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#prerequisites-for-linux

Troubleshooting: https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#troubleshooting

Please let us know if you need any further assistance with this.

Regards,

Gowthamraj K

RP Rohit Pitre February 20, 2025 02:59 PM UTC

Hi, we tried this solution, and we can see that the necessary files in Tesseract/binaries and Tessdata are available under app in the Linux container. But we still get the same error.

What else are we missing?

AM Arumugam Muppidathi Syncfusion Team February 21, 2025 07:15 AM UTC

Hi Rohit,

Thank you for getting back to us.

We examined the reported issue on our end and were unable to reproduce the problem. OCR works both locally and within a Linux Docker container. We suspect that the issue may be due to the Tesseract binaries not being structured correctly. To resolve this exception, please ensure that the Tesseract binaries are organized according to the following structure:

The tessdata and tesseract binaries should be placed automatically in the bin folder of the application. The assemblies must follow this structure:

bin\Debug\net8.0\runtimes\linux\native\ (leptonica-1.80.0.dll, libSyncfusionTesseract.dll, libSkiaSharp.dll)
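
One quick way to confirm this layout inside the container is a small script that reports which of the files named above are missing. This is only a minimal sketch, not part of the Syncfusion sample: the helper name `missing_native_files` is hypothetical, it assumes Python is available wherever you run it, and the file names are copied verbatim from the structure above, so adjust them to whatever your package version actually ships.

```python
from pathlib import Path
import sys

# File names copied from the structure described in this reply (assumption:
# your package version ships the same names; edit this tuple if it does not).
EXPECTED_FILES = (
    "leptonica-1.80.0.dll",
    "libSyncfusionTesseract.dll",
    "libSkiaSharp.dll",
)


def missing_native_files(app_dir, relative_dir="runtimes/linux/native",
                         expected=EXPECTED_FILES):
    """Return the expected file names not found under app_dir/relative_dir."""
    native_dir = Path(app_dir) / relative_dir
    # File names are case-sensitive on Linux, so compare them exactly.
    return [name for name in expected if not (native_dir / name).is_file()]


if __name__ == "__main__":
    # Pass the application output folder as the first argument,
    # or run the script from inside that folder.
    app_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_native_files(app_dir)
    if missing:
        print("Missing under runtimes/linux/native:", ", ".join(missing))
    else:
        print("All expected native files are present.")
```

Run it against the application folder in the container (the folder that contains `runtimes`). The script only checks that the files exist; it does not verify that they are the correct build for the container's architecture.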

Additionally, ensure that the VC++ 2015 redistributable files are installed on your machine to prevent any exceptions. You can download and install both files from the official Microsoft Download Center:

Download Visual C++ Redistributable for Visual Studio 2015 from Official Microsoft Download Center

We also recommend reviewing our troubleshooting guide to resolve PDF OCR failures:

Troubleshooting PDF OCR failures | Syncfusion

For your reference, we have attached a sample and the corresponding output document below.

Please try the sample on your end and let us know the results. If you continue to encounter any issues, we kindly request that you provide the modified sample, input document, complete code snippet, Syncfusion package name and version, and the .NET version. This will help us analyze the situation and provide a prompt solution.

Regards,

Arumugam M

Attachment: Docker_81716590.zip
