OCR library C# | .NET Core OCR Library


Overview

The Syncfusion .NET Optical Character Recognition (OCR) Library extracts text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing raster images can be converted into a searchable and selectable PDF document. OCR results can be saved as text, structured data, or searchable PDF documents. The .NET OCR Library is built on the Tesseract OCR engine.

Why Syncfusion’s OCR Library

Cross-platform support

Designed for C# and VB.NET applications targeting .NET 6, .NET 5, .NET Core, .NET Standard, or .NET Framework.

Cloud platform

Works on cloud platforms such as Azure (Web Apps, websites, web services, and Azure Functions) and AWS (EC2 and Lambda).

Customize OCR engine

By default, the OCR library uses the Tesseract OCR engine. External OCR services, such as those from Microsoft Azure, AWS, and Google, can be used instead.

International languages

The OCR engine supports more than 120 languages. Several languages can be used at once to read documents that contain text in more than one language.
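
Tesseract itself accepts several languages as a single language string joined with "+" (for example, "eng+deu" on the tesseract -l command line), where each code must match a .traineddata file in the tessdata folder. The sketch below builds such a string in C#. CombineLanguages is a hypothetical helper, not part of the library, and the assumption that the library's Settings.Language property accepts the same "+" syntax should be checked against the API documentation.

```csharp
using System;
using System.Linq;

// Hypothetical helper: joins Tesseract language codes (e.g. "eng", "deu") into the
// "+"-separated form Tesseract uses for multi-language recognition. Each code is
// assumed to have a matching <code>.traineddata file in the tessdata folder.
static string CombineLanguages(params string[] codes)
{
    // Trim each code and drop blanks and duplicates, keeping the original order
    string[] cleaned = codes.Select(c => c.Trim())
                            .Where(c => c.Length > 0)
                            .Distinct()
                            .ToArray();
    if (cleaned.Length == 0)
        throw new ArgumentException("At least one language code is required.", nameof(codes));
    return string.Join("+", cleaned);
}

string language = CombineLanguages("eng", "deu", "fra");
Console.WriteLine(language); // eng+deu+fra
```

Each additional language adds its model to the recognition pass, so listing only the languages a document actually contains keeps OCR faster and usually more accurate.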

Create searchable PDF

Perform OCR on the entire scanned PDF document and convert it into a searchable PDF document.
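
A searchable PDF typically keeps the scanned page image and places the recognized text over it as an invisible text layer, which is what makes the text searchable and selectable. The sketch below shows that idea at the PDF content-stream level: text rendering mode 3 (the "3 Tr" operator in the PDF specification) neither fills nor strokes the glyphs, so the text is not drawn but stays in the page. This is a simplified illustration, not the library's implementation. InvisibleWord is hypothetical, and it assumes plain ASCII text and a font resource named /F1 that already exists on the page.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: builds a PDF content-stream fragment that places one OCR word
// as invisible text at (x, y) in PDF user space. Assumes ASCII text and an existing
// font resource named /F1 on the page.
static string InvisibleWord(string word, double x, double y, double fontSize)
{
    CultureInfo inv = CultureInfo.InvariantCulture; // PDF numbers always use '.'
    // Escape the characters that are special inside PDF literal strings
    string escaped = word.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)");

    var sb = new StringBuilder();
    sb.Append("BT\n");                                        // begin text object
    sb.Append("3 Tr\n");                                      // rendering mode 3: invisible
    sb.Append(string.Format(inv, "/F1 {0} Tf\n", fontSize));  // select font and size
    sb.Append(string.Format(inv, "{0} {1} Td\n", x, y));      // move to the word position
    sb.Append("(" + escaped + ") Tj\n");                      // show the (invisible) text
    sb.Append("ET");                                          // end text object
    return sb.ToString();
}

Console.WriteLine(InvisibleWord("Invoice (2023)", 72, 700, 12));
```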


Image to searchable PDF/A

Make images searchable and selectable by converting them to PDF or PDF/A documents using OCR.


Extract text from an image

Extract text from a single scanned image or from multi-page TIFF images.


Zonal text extraction

Extract data from PDFs and images by restricting OCR to a particular region in the PDF or image.
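
One way to restrict OCR to a region is to crop the rendered page image to that region before recognition. The sketch below shows the unit conversion this involves: a zone given in PDF points (1/72 inch) becomes a pixel rectangle on a page image rendered at a given DPI, clamped to the image bounds. It is an illustration of the idea, not how the library implements zonal OCR. ZoneToPixels is a hypothetical helper, and the top-left origin for the zone is an assumption.

```csharp
using System;

// Hypothetical helper: converts a zone given in PDF points (1/72 inch, origin assumed at
// the top-left corner of the page) to a pixel rectangle on a page image rendered at
// `dpi`, clamped to the image bounds. The result is the area to crop before OCR.
static (int X, int Y, int Width, int Height) ZoneToPixels(
    double x, double y, double width, double height,
    double dpi, int imageWidth, int imageHeight)
{
    if (width < 0 || height < 0)
        throw new ArgumentException("Zone width and height must not be negative.");

    // points -> pixels: multiply by dpi, then divide by 72 points per inch.
    // Floor the start and ceil the end so the crop never cuts into the zone.
    int left = (int)Math.Floor(x * dpi / 72.0);
    int top = (int)Math.Floor(y * dpi / 72.0);
    int right = (int)Math.Ceiling((x + width) * dpi / 72.0);
    int bottom = (int)Math.Ceiling((y + height) * dpi / 72.0);

    // Clamp to the image so the crop never falls outside it
    left = Math.Clamp(left, 0, imageWidth);
    top = Math.Clamp(top, 0, imageHeight);
    right = Math.Clamp(right, 0, imageWidth);
    bottom = Math.Clamp(bottom, 0, imageHeight);

    return (left, top, right - left, bottom - top);
}

// A 2 x 1 inch zone, 1 inch from the top-left corner, on a 300 DPI letter-size scan
var zone = ZoneToPixels(72, 72, 144, 72, 300, 2550, 3300);
Console.WriteLine(zone); // (300, 300, 600, 300)
```

Recognizing only the cropped area is faster than processing the whole page and keeps unrelated text out of the result.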


OCR on a rotated page

Extract text from rotated pages of a scanned PDF document and convert them into a searchable PDF document.
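
When a page is rotated, text is typically recognized on an upright image of the page, so word positions have to be mapped back to the unrotated page before the hidden text can be placed. The sketch below shows that mapping for the clockwise rotations a PDF /Rotate entry allows (multiples of 90 degrees). It is an illustration only, not the library's code. ToUnrotated is a hypothetical helper, and it assumes a top-left origin with y pointing down, whereas PDF user space starts at the bottom-left.

```csharp
using System;

// Hypothetical helper: maps a point found on the upright (displayed) image of a rotated
// page back to the unrotated page. pageWidth and pageHeight are the unrotated page size;
// rotation is the clockwise rotation in degrees (a multiple of 90).
// Assumes a top-left origin with y pointing down.
static (double X, double Y) ToUnrotated(double x, double y, int rotation, double pageWidth, double pageHeight)
{
    // Normalize values such as -270 or 450 into 0, 90, 180, or 270
    int r = ((rotation % 360) + 360) % 360;
    return r switch
    {
        0 => (x, y),
        90 => (y, pageHeight - x),              // inverse of (x, y) -> (H - y, x)
        180 => (pageWidth - x, pageHeight - y), // 180 degrees is its own inverse
        270 => (pageWidth - y, x),              // inverse of (x, y) -> (y, W - x)
        _ => throw new ArgumentException("Rotation must be a multiple of 90 degrees.", nameof(rotation))
    };
}

// A 600 x 800 page rotated 90 degrees clockwise is displayed as an 800 x 600 image;
// its top-right corner is the top-left corner of the unrotated page.
Console.WriteLine(ToUnrotated(800, 0, 90, 600, 800)); // (0, 0)
```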


Improved accessibility

Automatically convert images into accessible PDF (PDF/UA) documents by applying the necessary tags to the hidden OCR text, so that the text in the PDF document is machine readable.

Post-processing

After OCR, you can programmatically highlight, underline, and strike through the text of a PDF document. You can also redact, edit, and digitally sign the PDF document.

Convert scanned PDF to a searchable PDF in C#

Convert the scanned PDF document to a searchable PDF document using the Syncfusion OCR Library with just a few lines of C# code as demonstrated below.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor by providing the path of Tesseract binaries
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/Windows"))
//Open the scanned PDF ("Input.pdf" is a placeholder file name)
using (FileStream inputStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load the existing PDF document
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputStream);
    //Set the OCR language
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document and the path of the Tesseract language data
    processor.PerformOCR(loadedDocument, @"TessData/");
    //Save the OCRed document to a memory stream
    MemoryStream stream = new MemoryStream();
    loadedDocument.Save(stream);
    //Close the PDF document
    loadedDocument.Close(true);
}
```