Syncfusion.OCRProcessor On Docker

Hi.

I'm trying to run the OCR example on a PDF file using the Syncfusion.OCRProcessor library in a Docker container, but I'm getting the error below:

Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation.
   at Syncfusion.OCRProcessor.OCRProcessor.ProcessOCR(String[] args, String[] imagePathList)
   at Syncfusion.OCRProcessor.OCRProcessor.GetHOCR(String imagePath, String dataPath, Boolean multiPageTiff, String[] imagePathList)
   at Syncfusion.OCRProcessor.OCRProcessor.PerformOCR(PdfLoadedDocument lDoc, Int32 startIndex, Int32 endIndex, String dataPath)

Could anyone help me with this?


Attachment: Ticket.App_dab91d37.rar

3 Replies

GK Gowthamraj Kumar Syncfusion Team September 20, 2022 12:57 PM UTC

Hi Renan Marcel,


After further analysis, we found that this exception can be resolved by replacing the Tesseract binaries with the new ones in the .NET 6.0 Linux Docker environment. Please find the download location for the new Tesseract binaries.


We have created a runnable sample with latest tesseract binaries and output document for your reference. Please try the sample on your end and let us know the result.

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/DockerSample6.01225307647

Output: https://www.syncfusion.com/downloads/support/directtrac/general/pd/SampleNet62092803318

Please refer to the documentation links below about OCR:

UG: https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#prerequisites-for-linux   

Troubleshooting:  https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#troubleshooting


Please let us know if you need any further assistance with this.


Regards,

Gowthamraj K



RP Rohit Pitre February 20, 2025 02:59 PM UTC

Hi, we tried this solution and can see that the required files in Tesseract/binaries and Tessdata are present under the app folder in the Linux container, but we still get the same error.


What else are we missing?



AM Arumugam Muppidathi Syncfusion Team February 21, 2025 07:15 AM UTC

Hi Rohit,


Thank you for getting back to us.

 

We have examined the reported issue on our end but were unable to reproduce it: OCR works correctly both locally and in a Linux Docker container. We suspect the issue is caused by the Tesseract binaries not being structured correctly. To resolve this exception, please ensure that the Tesseract binaries are organized according to the following structure:

 

The tessdata folder and the Tesseract binaries should be placed automatically in the bin folder of the application. The native assemblies must follow this structure:

bin\Debug\net8.0\runtimes\linux\native\
    leptonica-1.80.0.dll
    libSyncfusionTesseract.dll
    libSkiaSharp.dll
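If you want to confirm this layout inside the running container (for example, after opening a shell with `docker exec`), the following is a minimal Python sketch that checks for the files listed above. It is not a Syncfusion tool, and it rests on assumptions: the file names are copied verbatim from this reply and may differ for your package version (native libraries on Linux usually use the `.so` extension), the `tessdata` folder is assumed to sit directly in the output folder, and the script name `check_ocr_layout.py` is hypothetical. Many .NET runtime images do not include Python, so you may need to run it against a copy of the output folder instead.

```python
"""Sketch: check the OCR native binaries and tessdata layout described above.

Assumptions (not taken from Syncfusion documentation): the file names are
copied from the support reply and may differ for your package version or OS
(for example, .so files on Linux); the tessdata folder is assumed to sit
directly in the build/publish output folder (bin/Debug/net8.0 locally, or
the app folder inside the container).
"""
import os
import sys

# Native files listed in the reply, relative to the build/publish output folder.
NATIVE_DIR = os.path.join("runtimes", "linux", "native")
EXPECTED_NATIVE_FILES = [
    "leptonica-1.80.0.dll",
    "libSyncfusionTesseract.dll",
    "libSkiaSharp.dll",
]
# Assumed location of the Tesseract language data folder.
TESSDATA_DIR = "tessdata"


def find_missing(base_dir):
    """Return the relative paths that are expected but absent under base_dir."""
    missing = []
    for name in EXPECTED_NATIVE_FILES:
        rel = os.path.join(NATIVE_DIR, name)
        if not os.path.isfile(os.path.join(base_dir, rel)):
            missing.append(rel)
    if not os.path.isdir(os.path.join(base_dir, TESSDATA_DIR)):
        missing.append(TESSDATA_DIR)
    return missing


def report(base_dir):
    """Print which expected entries are missing; return True if all are present."""
    missing = find_missing(base_dir)
    if not missing:
        print("All expected OCR files found under", base_dir)
        return True
    print("Missing under", base_dir + ":")
    for rel in missing:
        print("  " + rel)
    return False


if __name__ == "__main__":
    # Default to the local build output folder mentioned in the reply.
    report(sys.argv[1] if len(sys.argv) > 1 else os.path.join("bin", "Debug", "net8.0"))
```

Run it as `python check_ocr_layout.py /app` (or with no argument to check `bin/Debug/net8.0`). A clean result only shows that the files are present; a native library may still fail to load if one of its own system dependencies is missing from the image.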

 

Additionally, if you are running on Windows, ensure that the Visual C++ 2015 Redistributable files are installed on your machine to prevent exceptions. You can download and install both files from the official Microsoft Download Center, as shown in the screenshot below:

[Screenshot: Visual C++ 2015 Redistributable file]

Download the Visual C++ Redistributable for Visual Studio 2015 from the official Microsoft Download Center.

 

We also recommend reviewing our troubleshooting guide to resolve PDF OCR failures:

Troubleshooting PDF OCR failures | Syncfusion

 

For your reference, we have attached a sample and the corresponding output document below.

 

Please try the sample on your end and let us know the results. If the issue persists, please share the modified sample, the input document, the complete code snippet, the Syncfusion package name and version, and the .NET version. This will help us analyze the issue and provide a prompt solution.


Regards,

Arumugam M


Attachment: Docker_81716590.zip
