How to remove blank image

7 Replies
4 Participants

Created by
Ambrogio Brambilla

Platform
ASP.NET Web Forms

Control
PDF

Created On
Sep 14, 2017 09:39 AM UTC

Last Activity On
Mar 6, 2025 04:41 PM UTC

Hi.

We are working on a project that manages PDF files created by a scanner.

We have documents with empty pages and we want to remove them. Unfortunately, an empty page still 'includes' a blank image.

How can we recognize that a page is empty (so we can remove it)?

We tried your image extraction example, but it does not work because img.length is 1 (the blank page still contains one image).

Chinnu Muniyappan Syncfusion Team September 15, 2017 09:23 AM UTC

Hi Ambrogio,

Thank you for contacting Syncfusion support.

We can identify blank images by using OCRProcessor. We have created a simple sample that exports the images from the PDF document and processes each exported image with OCRProcessor; if OCRProcessor returns null or an empty string, the page is marked as blank. Please refer to the code snippet and sample below for more details.

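// Requires: using System.Drawing; using Syncfusion.Pdf.Parsing;
private bool IsBlankPage(PdfLoadedPage lpage)
{
    // Extract the images on the page
    Image[] images = lpage.ExtractImages();

    // A page without any images is treated as blank
    if (images.Length == 0)
        return true;

    foreach (Image img in images)
    {
        // PerformOCR returns true when no text is recognized in the image
        if (!PerformOCR(img as Bitmap))
            return false; // text found, so the page is not blank
    }

    // No image on the page contains recognizable text
    return true;
}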

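// Requires: using System.Drawing; using Syncfusion.OCRProcessor;
// tesseractBinariesPath and tessdataPath are assumed to be fields holding the Tesseract paths.
private bool PerformOCR(Bitmap img)
{
    bool empty = false;

    // Create a new OCR processor
    using (OCRProcessor processor = new OCRProcessor(tesseractBinariesPath))
    {
        // Set the language
        processor.Settings.Language = Languages.English;

        // Perform OCR on the image
        string text = processor.PerformOCR(img, tessdataPath);

        // No recognized text (or only whitespace) means the image is treated as empty
        if (string.IsNullOrWhiteSpace(text))
            empty = true;
    }

    return empty;
}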
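When several blank pages are removed in one pass, iterating from the last page to the first keeps the indices of the pages that have not been checked yet stable. The sketch below shows that loop on a plain `IList<T>`; the helper name `RemoveBlankPages` and its `isBlank` delegate are hypothetical. For a real document the predicate would be something like `IsBlankPage` above, and the removal call on the Syncfusion page collection (for example `Pages.RemoveAt(i)`) is an assumption to verify against the Syncfusion API reference.

```csharp
using System;
using System.Collections.Generic;

public static class PageCleanup
{
    // Hypothetical helper: removes every item for which isBlank returns true
    // and returns how many items were removed.
    // Iterating backwards means a removal never shifts an index that is still to be visited.
    public static int RemoveBlankPages<T>(IList<T> pages, Func<T, bool> isBlank)
    {
        int removed = 0;
        for (int i = pages.Count - 1; i >= 0; i--)
        {
            if (isBlank(pages[i]))
            {
                pages.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Iterating forwards instead would skip the page that slides into position `i` after each removal, which matters when two blank pages are adjacent.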

Sample Link: http://www.syncfusion.com/downloads/support/forum/132658/ze/WFSample-2081739903

Please let us know if you have any concerns.

Regards,

Chinnu

Ambrogio Brambilla September 15, 2017 09:40 AM UTC

Hi.

Thanks for the answer.

A question about your answer:

If the image contains no text (e.g. a picture), PerformOCR will return an empty value, and we will remove a page that is not actually empty (not only the truly empty ones).

Chinnu Muniyappan Syncfusion Team September 18, 2017 10:32 AM UTC

Hi Ambrogio,

Thank you for your update.

Yes, if the image does not contain any text, PerformOCR returns empty text. At present, we do not have an image manipulation library for processing images, so we suggest identifying empty images by examining each image pixel individually. Please refer to the code snippet below for more details.

1. Process all the pixels of the image.

2. If a pixel has color data (it is neither pure white nor pure black), the image is considered not empty and processing stops.

3. If at least 25% of the pixels are black, the image is also considered not empty.

// Requires: using System.Drawing;
private bool IsEmptyImage(Bitmap image)
{
    bool isEmpty = true;
    int blackPixelCount = 0;

    // If 25% or more of the pixels are black, the image is not considered empty
    int blackPixelRange = (image.Width * image.Height) / 4;

    // The outer condition stops scanning as soon as the image is known to be non-empty
    for (int i = 0; i < image.Width && isEmpty; i++)
    {
        for (int j = 0; j < image.Height; j++)
        {
            Color color = image.GetPixel(i, j);

            if (color.R == 255 && color.G == 255 && color.B == 255)
            {
                // Skip white pixels
                continue;
            }

            if (color.R == 0 && color.G == 0 && color.B == 0)
            {
                // Count black pixels; enough of them means the image has content
                blackPixelCount++;
                if (blackPixelCount >= blackPixelRange)
                {
                    isEmpty = false;
                    break;
                }
            }
            else
            {
                // Any colored pixel means the image is not empty
                isEmpty = false;
                break;
            }
        }
    }

    return isEmpty;
}
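Scanned pages are rarely pure 255/255/255 white: scanner noise and compression tend to leave light grey or slightly tinted pixels, which the exact comparisons above classify as colored and therefore not empty. Conversely, a page of ordinary text usually covers well under 25% of its area with ink, so the 25% rule is worth testing against your own scans. The sketch below is a swapped-in, more tolerant rule, not the approach from this thread: a pixel counts as "ink" when its darkest channel is below `whiteThreshold`, and the image is treated as empty while the ink stays below `maxInkFraction` of all pixels. The class name, parameter names and any values you pass are assumptions to tune, and the pixels are assumed to have been read into an array already (for example via `LockBits`).

```csharp
using System;

public static class BlankCheck
{
    // Hypothetical helper: returns false as soon as the number of ink pixels
    // reaches maxInkFraction of all pixels, otherwise true.
    // A pixel is ink when its darkest channel is below whiteThreshold, so light
    // scanner noise counts as paper while black text and saturated colors count as ink.
    public static bool IsMostlyWhite((byte R, byte G, byte B)[] pixels, byte whiteThreshold, double maxInkFraction)
    {
        if (pixels.Length == 0)
            return true; // nothing to inspect

        // Number of ink pixels at which the image stops counting as empty
        long inkLimit = (long)Math.Ceiling(pixels.Length * maxInkFraction);
        long inkCount = 0;

        foreach (var (r, g, b) in pixels)
        {
            byte darkest = Math.Min(r, Math.Min(g, b));
            if (darkest < whiteThreshold && ++inkCount >= inkLimit)
                return false; // enough ink to treat the image as content
        }
        return true;
    }
}
```

A call such as `BlankCheck.IsMostlyWhite(pixels, 200, 0.01)` would treat a page as blank while less than 1% of its pixels are darker than 200 in every channel; both numbers are illustrative, not tested recommendations.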

Please try the above workaround and let us know the results.

Regards,

Chinnu

Ambrogio Brambilla September 19, 2017 07:35 AM UTC

Hi. Thanks for your help.

Your solution works well, but it is very slow: it takes 1 minute to process a 42-page PDF file.

Chinnu Muniyappan Syncfusion Team September 20, 2017 09:42 AM UTC

Hi Ambrogio,

Yes, processing every pixel with the Bitmap.GetPixel method takes time. We can overcome this by using the Bitmap.LockBits method, which gives direct access to the pixel data so it can be copied into a byte array in one operation. We suggest using Bitmap.LockBits to avoid the performance issue. Please refer to the code snippet below for more details.

// Requires: using System; using System.Drawing; using System.Drawing.Imaging;
//           using System.Runtime.InteropServices;
private bool IsEmpty(Bitmap image)
{
    Rectangle bounds = new Rectangle(0, 0, image.Width, image.Height);

    // Lock as 32bpp ARGB so every pixel is 4 bytes (B, G, R, A) whatever the source format
    BitmapData bmpData = image.LockBits(bounds, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    int stride = Math.Abs(bmpData.Stride);
    byte[] pixelBytes = new byte[stride * image.Height];
    try
    {
        // Copy the pixel data into the array
        Marshal.Copy(bmpData.Scan0, pixelBytes, 0, pixelBytes.Length);
    }
    finally
    {
        // Unlock the bits even if the copy fails
        image.UnlockBits(bmpData);
    }

    // If 25% or more of the pixels are black, the image is not considered empty
    int blackPixelRange = (image.Width * image.Height) / 4;
    int blackPixelCount = 0;

    // Count pixels, not bytes: each row starts at a multiple of stride
    for (int y = 0; y < image.Height; y++)
    {
        for (int x = 0; x < image.Width; x++)
        {
            int offset = y * stride + x * 4;
            byte b = pixelBytes[offset], g = pixelBytes[offset + 1], r = pixelBytes[offset + 2];

            if (r == 255 && g == 255 && b == 255)
                continue; // white pixel

            if (r != 0 || g != 0 || b != 0)
                return false; // colored pixel

            if (++blackPixelCount >= blackPixelRange)
                return false; // at least 25% black pixels
        }
    }

    return true;
}
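With `Bitmap.LockBits`, each row of the copied buffer occupies `Stride` bytes, which for formats such as 24bpp can exceed `Width` times the bytes per pixel, and a 32bpp ARGB pixel is stored as four bytes in B, G, R, A order. Counting individual bytes instead of pixels blurs these cases: a pure red pixel, for instance, contributes one 255 byte and two 0 bytes and so passes as "white or black". The hypothetical helper below walks such a buffer pixel by pixel, so the rule can be tested without GDI+. It assumes the bitmap was locked as `PixelFormat.Format32bppArgb` with a positive stride, and it applies the rule described in this thread (any colored pixel, or roughly 25% black pixels, means not empty).

```csharp
using System;

public static class BgraScan
{
    // Hypothetical helper: classifies a buffer laid out as B, G, R, A per pixel,
    // where row y starts at byte y * stride. Padding bytes at the end of a row are never read.
    public static bool IsEmptyBgra(byte[] buffer, int width, int height, int stride)
    {
        long blackPixelRange = (long)width * height / 4; // about 25% of the pixels
        long blackPixelCount = 0;

        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;
            for (int x = 0; x < width; x++)
            {
                int offset = rowStart + x * 4;
                byte b = buffer[offset], g = buffer[offset + 1], r = buffer[offset + 2];

                if (r == 255 && g == 255 && b == 255)
                    continue; // white pixel
                if (r != 0 || g != 0 || b != 0)
                    return false; // colored pixel
                if (++blackPixelCount >= blackPixelRange)
                    return false; // enough black pixels to count as content
            }
        }
        return true;
    }
}
```

Passing the stride explicitly, rather than assuming `width * 4`, keeps the same helper usable if the buffer comes from a format whose rows are padded.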

Please try the above workaround and let us know the results.

Regards,

Chinnu

Rohit Pitre March 5, 2025 05:29 AM UTC

Hi,

If a page contains only a watermark, a shape, or a text box, the above logic considers it blank because it does not detect any images or text. How do we handle such a scenario so that only truly blank pages are reported as blank?

Attached are sample PDFs in which a page is wrongly detected as blank. Please run them through your code and let us know the best approach.

Regards,

Rohit

Attachment: Blank_sample_ace296f9.zip

Santhiya Narayanan Syncfusion Team March 6, 2025 04:41 PM UTC

Hi Rohit Pitre,

We have modified the sample based on your requirements. Kindly try the attached sample on your end and let us know the result.

Please find the sample below:

Regards,

Santhiya.

Attachment: DetectBlankPage_bf7182a3.zip
