Hi.
I'm trying to run the OCR example on a PDF file using the Syncfusion.OCRProcessor library in a Docker container, but I'm getting the error below:
Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation.
at Syncfusion.OCRProcessor.OCRProcessor.ProcessOCR(String[] args, String[] imagePathList)
at Syncfusion.OCRProcessor.OCRProcessor.GetHOCR(String imagePath, String dataPath, Boolean multiPageTiff, String[] imagePathList)
at Syncfusion.OCRProcessor.OCRProcessor.PerformOCR(PdfLoadedDocument lDoc, Int32 startIndex, Int32 endIndex, String dataPath)
Could anyone help me with this?
Hi Renan Marcel,
On further analysis, we found that this exception can be resolved by replacing the tesseract binaries with the new ones in the Linux Docker 6.0 environment. Please find the download location for the new tesseract binaries.
We have created a runnable sample with the latest tesseract binaries, along with its output document, for your reference. Please try the sample on your end and let us know the result.
Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/DockerSample6.01225307647
Output: https://www.syncfusion.com/downloads/support/directtrac/general/pd/SampleNet62092803318
Please refer to the documentation links below for more information about OCR:
UG: https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#prerequisites-for-linux
Troubleshooting: https://help.syncfusion.com/file-formats/pdf/working-with-ocr/dot-net-core#troubleshooting
Please let us know if you need any further assistance with this.
Regards,
Gowthamraj K
Hi, we tried this solution and we could see that the necessary files in Tesseract/binaries and Tessdata are available under app on the Linux container. But we still get the same error.
What else are we missing?
Hi Rohit,
Thank you for getting back to us.
We have thoroughly examined the reported issue on our end, and after further analysis, we were unable to reproduce the problem. The OCR is working fine both locally and within a Linux Docker container. We suspect that the issue may be due to the tesseract binaries not being structured correctly. To resolve this exception, please ensure that the tesseract binaries are organized according to the following structure:
The tessdata and tesseract binaries should be placed automatically in the bin folder of the application. The assemblies must follow this structure:
bin\Debug\net8.0\runtimes\linux\native\leptonica-1.80.0.dll, libSyncfusionTesseract.dll, libSkiaSharp.dll
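As a quick way to confirm this layout inside the container, something like the following minimal Python sketch could be run against the output folder. The helper name `missing_native_files` is hypothetical, the file names are copied from the structure above, and the default path is an assumption that should be adjusted to your own build or publish output.

```python
from pathlib import Path
import sys

# File names as listed in the structure above; adjust them if your
# package version ships different names (this list is an assumption).
EXPECTED_FILES = [
    "leptonica-1.80.0.dll",
    "libSyncfusionTesseract.dll",
    "libSkiaSharp.dll",
]


def missing_native_files(native_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in native_dir."""
    folder = Path(native_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumed default path, taken from the structure above;
    # pass your own folder as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "bin/Debug/net8.0/runtimes/linux/native"
    missing = missing_native_files(target)
    if missing:
        print("Missing from " + target + ": " + ", ".join(missing))
    else:
        print("All expected native files found in " + target)
```

An empty result only shows that the files are where the application expects them. It does not confirm that the binaries themselves load correctly.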
Additionally, ensure that the VC++ 2015 redistributable files are installed on your machine to prevent any exceptions. You can download and install both files from the official Microsoft Download Center:
Visual C++ Redistributable for Visual Studio 2015 (official Microsoft Download Center)
We also recommend reviewing our troubleshooting guide to resolve PDF OCR failures:
Troubleshooting PDF OCR failures | Syncfusion
For your reference, we have attached a sample and the corresponding output document below.
Please try the sample on your end and let us know the results. If you continue to encounter any issues, we kindly request that you provide the modified sample, input document, complete code snippet, Syncfusion package name and version, and the .NET version. This will help us analyze the situation and provide a prompt solution.
Regards,
Arumugam M