Overview

The Syncfusion .NET Optical Character Recognition (OCR) Library extracts text from scanned PDFs and images. With a few lines of C# code, you can convert a scanned PDF document containing raster images into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or a searchable PDF document. The .NET OCR Library is built on the Tesseract engine.


Why Syncfusion’s OCR

Cross-platform support

Designed for C# and VB.NET applications targeting .NET 6, .NET 5, .NET Core, .NET Standard, or .NET Framework.

Cloud platform

Works on cloud platforms such as Azure (Web Apps, Websites, Web Services, and Functions) and AWS (EC2, Lambda).

Customize OCR engine

By default, the OCR library uses the Tesseract OCR engine. Other external OCR services from Microsoft Azure, AWS, Google, and more can also be used.

International languages

The OCR engine supports 120+ languages. Several languages can be enabled at once to read documents that mix text in more than one language.


Create searchable PDF

Perform OCR on the entire scanned PDF document and convert it into a searchable PDF document.

Image to searchable PDF/A

Make images searchable and selectable by converting them to a PDF or PDF/A document using OCR.

Extract text from an image

Extract the text from a single scanned image or from a multi-page TIFF image.

Zonal text extraction

Extract data from PDFs and images by restricting OCR to a particular region in the PDF or image.

OCR on a rotated page

Extract the text from a rotated page of a scanned PDF document and convert the document into a searchable PDF.

Improved accessibility

Automatically convert images into an accessible PDF (PDF/UA) document by applying necessary tags to the hidden text, so that text in the PDF document is machine readable.

Post-processing

After OCR, you can programmatically highlight, underline, and strike through the text of a PDF document. You can also redact, edit, and digitally sign the PDF document.


Convert scanned PDF to a searchable PDF in C#

Convert the scanned PDF document to a searchable PDF document using the Syncfusion OCR Library with just a few lines of C# code as demonstrated below.

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor by providing the path of the Tesseract binaries
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/Windows"))
{
    //Load the existing PDF document (inputStream is a readable Stream over the scanned PDF)
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputStream);
    //Set the OCR language
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document and the Tesseract language data
    processor.PerformOCR(loadedDocument, @"TessData/");
    //Save the OCRed document to a memory stream
    MemoryStream stream = new MemoryStream();
    loadedDocument.Save(stream);
    //Close the PDF document and release its resources
    loadedDocument.Close(true);
}
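
The snippet above leaves open where inputStream comes from and what happens to the resulting stream. A minimal end-to-end sketch, assuming the scanned file is read from disk and the searchable result is written back to disk, might look like the following. The file names Input.pdf and Output.pdf are hypothetical, and the namespaces are assumed from the publicly distributed Syncfusion packages; the OCR calls themselves are the same ones used in the sample above.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;   // assumed namespace for OCRProcessor and Languages
using Syncfusion.Pdf.Parsing;    // assumed namespace for PdfLoadedDocument

class Program
{
    static void Main()
    {
        // Hypothetical input path: a scanned, image-only PDF on disk
        using (FileStream inputStream = File.OpenRead("Input.pdf"))
        // Path to the Tesseract binaries, as in the sample above
        using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/Windows"))
        {
            // Load the scanned PDF from the file stream
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputStream);

            // Recognize English text using the language data in TessData/
            processor.Settings.Language = Languages.English;
            processor.PerformOCR(loadedDocument, @"TessData/");

            // Hypothetical output path: write the searchable PDF to disk
            using (FileStream outputStream = File.Create("Output.pdf"))
            {
                loadedDocument.Save(outputStream);
            }

            // Close the document and release its resources
            loadedDocument.Close(true);
        }
    }
}
```

Writing straight to a FileStream avoids holding the whole output in memory; a MemoryStream, as in the sample above, is the better fit when the result is returned from a web request or passed to another component.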


Awards

Greatness—it’s one thing to say you have it, but it means more when others recognize it. Syncfusion is proud to hold the following industry awards.
