The Syncfusion .NET Optical Character Recognition (OCR) Library is used to extract text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR Library uses a powerful Tesseract engine.
Designed for C# and VB.NET applications running on .NET 6, .NET 5, .NET Core, .NET Standard, or .NET Framework.
Works on cloud platforms such as Azure (Web Apps, Websites, Web Services, and Functions) and AWS (EC2, Lambda).
By default, the OCR library uses the Tesseract OCR engine. Other external OCR services from Microsoft Azure, AWS, Google, and more can also be used.
The OCR engine supports 120+ languages. It is possible to use more than one language at a time to read documents that contain words in more than one language.
Perform OCR on the entire scanned PDF document and convert it into a searchable PDF document.
Make images searchable and selectable by converting them to PDF or PDF/A documents using OCR.
Extract the text from a single scanned image or a multi-page TIFF image.
Extract data from PDFs and images by restricting OCR to a particular region in the PDF or image.
Extract the text from rotated scanned pages of a PDF document and convert the document to a searchable PDF document.
Automatically convert images into an accessible PDF (PDF/UA) document by applying necessary tags to the hidden text, so that text in the PDF document is machine readable.
After OCR, you can programmatically highlight, underline, and strike through the text of a PDF document. You can also redact, edit, and digitally sign the PDF document.
Convert the scanned PDF document to a searchable PDF document using the Syncfusion OCR Library with just a few lines of C# code as demonstrated below.
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor by providing the path of Tesseract binaries
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/Windows"))
{
    //Load the existing PDF document (inputStream is a readable stream of the scanned PDF)
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputStream);
    //Set the OCR language
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document and Tesseract language data
    processor.PerformOCR(loadedDocument, @"TessData/");
    //Save the OCRed document to a memory stream
    MemoryStream stream = new MemoryStream();
    loadedDocument.Save(stream);
    //Close the PDF document and release its resources
    loadedDocument.Close(true);
}
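The sample above leaves the searchable PDF in a memory stream. As a minimal sketch of one way to persist that result, the helper below copies the stream to a file using only standard .NET stream APIs. The class name `OcrOutputWriter`, the method `WriteToFile`, and the output path are hypothetical names introduced for illustration; they are not part of the Syncfusion library.

```csharp
using System.IO;

// Hypothetical helper, not part of the Syncfusion API.
public static class OcrOutputWriter
{
    // Copies the full contents of an in-memory document to a file on disk.
    public static void WriteToFile(MemoryStream stream, string path)
    {
        // Rewind first: after a save, the position sits at the end of the data,
        // so copying without this would write an empty file.
        stream.Position = 0;

        // File.Create overwrites any existing file at the given path.
        using (FileStream file = File.Create(path))
        {
            stream.CopyTo(file);
        }
    }
}
```

Under these assumptions, calling `OcrOutputWriter.WriteToFile(stream, "Output.pdf");` right after `loadedDocument.Save(stream);` would write the OCRed document to disk.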