
PdfPageBase.ExtractText: String must be exactly one character long

Unhandled exception. System.FormatException: String must be exactly one character long.

   at System.Convert.ToChar(String value, IFormatProvider provider)

   at Syncfusion.Pdf.PdfPageBase.ExtractText(TextLineCollection& textLineCollection)


What can I specify to work around this? My program works fine on "clean" PDFs, but my clients often send me messed up files. I can share the test file that triggers this if needed. Email me at hcobb at telegenisys dot com


13 Replies

IJ Irfana Jaffer Sadhik Syncfusion Team January 25, 2023 04:54 PM UTC

As you mentioned, the exception occurs with malformed files. Could you provide a sample input document (with any confidential data removed) that triggers the exception, so we can investigate the issue and assist you further?



HE Henry replied to Irfana Jaffer Sadhik January 25, 2023 05:49 PM UTC

I have the PDF that bombs but your forms won't let me upload PDFs (naturally).

Email me and I'll respond with the file.



HE Henry January 25, 2023 08:10 PM UTC

Trying 7z for the bad file


Attachment: TestFile_45d97e84.7z


IJ Irfana Jaffer Sadhik Syncfusion Team January 27, 2023 09:41 AM UTC

Thank you for sharing the details. We are currently validating this and will provide further details shortly.



IJ Irfana Jaffer Sadhik Syncfusion Team January 31, 2023 12:45 PM UTC

On further analysis, we found that some of the text glyphs in the reported document are not added properly, which causes the reported exception on our end. We have confirmed the issue "Format exception occurs while extracting text from the PDF file" as a defect in our product. The fix will be included in our upcoming weekly NuGet release, which is expected in the first week of February.


Please use the feedback link below to track the status of the reported bug.

https://www.syncfusion.com/feedback/40757/format-exception-occurs-while-extracting-text-from-the-pdf-file


Note: Even after the fix, some of the invalid glyphs will not be extracted properly. You can find the expected output text file after the fix here:

https://www.syncfusion.com/downloads/support/directtrac/general/txt/ExtractText-768816761



HE Henry January 31, 2023 03:16 PM UTC

It is acceptable that your code return garbage output from garbage input.

Crashing in the middle of your code with no means of recovery is the problem here.

Just return that a section was invalid so that the user code can mask that out please.



IJ Irfana Jaffer Sadhik Syncfusion Team February 1, 2023 01:37 PM UTC

At present, we cannot change our current architecture. We compared the result with Adobe Acrobat Pro, which also extracts garbage output from this content. So, for now, we will resolve the exception and provide the patch as promised.



IJ Irfana Jaffer Sadhik Syncfusion Team February 3, 2023 04:51 AM UTC

We are glad to announce that our Essential Studio 2022 Volume 4 Service Pack Release V20.4.0.48 has been rolled out and is available for download at the following link.

https://www.syncfusion.com/forums/180292/essential-studio-2022-volume-4-service-pack-release-v20-4-0-48-is-available-for-download


As a result, there is no weekly release this week, and the fix will be included in the upcoming weekly release (February 7th, 2023).

Thank you for your support and for your patience in waiting for this release. Please get in touch with us if you require any further assistance.



IJ Irfana Jaffer Sadhik Syncfusion Team February 8, 2023 05:45 AM UTC

We have included the fix for this issue, "Format exception occurs while extracting text from the PDF file", in our latest weekly release (20.4.0.49).


Please use the link below to download our latest weekly NuGet package:

NuGet Link: NuGet Gallery | Syncfusion.Pdf.Net.Core 20.4.0.49




KB Koste Budinoski April 25, 2023 12:36 PM UTC

This is still happening with a PDF document I have on my side:

System.FormatException: String must be exactly one character long.

Link to the PDF: https://drive.google.com/file/d/1DP6rs_T3gz7O10Z1lJlSKgAPNP-8FU5Y/view?usp=sharing

Using the latest Syncfusion.EJ2.PdfViewer.AspNet.Core.Windows package version:

<PackageReference Include="Syncfusion.EJ2.PdfViewer.AspNet.Core.Windows" Version="21.1.41" />


Example code:

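using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

// Open the file read-only and dispose the stream when done.
using var stream = new FileStream(@"file_path_here", FileMode.Open, FileAccess.Read);
var loadedDocument = new PdfLoadedDocument(stream);
foreach (PdfPageBase page in loadedDocument.Pages)
{
    page.ExtractText(out var lineCollection); // throws System.FormatException
}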
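
Until a fixed build is installed, one possible stopgap is to guard the call per page so that a single malformed page does not abort the whole run. The sketch below assumes the failure surfaces as a catchable System.FormatException, as in the stack trace earlier in this thread. SafeExtraction and ExtractSkippingBadPages are hypothetical names for illustration and are not part of the Syncfusion API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

static class SafeExtraction
{
    // Hypothetical helper (not a Syncfusion API): tries every page and
    // returns the zero-based indexes of the pages whose extraction threw.
    public static List<int> ExtractSkippingBadPages(string path)
    {
        var failedPages = new List<int>();
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
        var loadedDocument = new PdfLoadedDocument(stream);
        try
        {
            for (int i = 0; i < loadedDocument.Pages.Count; i++)
            {
                PdfPageBase page = loadedDocument.Pages[i];
                try
                {
                    page.ExtractText(out var lineCollection);
                    // Process lineCollection for this page here.
                }
                catch (FormatException ex)
                {
                    // Assumption: only the malformed page fails, so record it and move on.
                    failedPages.Add(i);
                    Console.Error.WriteLine($"Page {i}: {ex.Message}");
                }
            }
        }
        finally
        {
            // Release the document's resources even if a different exception escapes.
            loadedDocument.Close(true);
        }
        return failedPages;
    }
}
```

The returned list lets calling code mask out or re-queue the failed pages. This only contains the crash; it does not recover any text from the pages that fail.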


IJ Irfana Jaffer Sadhik Syncfusion Team April 26, 2023 06:19 AM UTC

Our 2023 Volume 1 SP1 release is expected to roll out in the upcoming week, so there will be no weekly release this week. We will include the fix for the reported issue in our upcoming weekly NuGet release (May 9th, 2023), once Volume 1 SP1 has rolled out, which we expect at the start of May.


You can track the status of the issue using the following feedback link.

https://www.syncfusion.com/feedback/40757/format-exception-occurs-while-extracting-text-from-the-pdf-file


Disclaimer: Inclusion of this solution in the weekly release may change due to other factors including, but not limited to, QA checks and work reprioritization.



IJ Irfana Jaffer Sadhik Syncfusion Team May 10, 2023 04:42 AM UTC

We have included the fix for this issue, “System.FormatException occurs while extracting text from the messed PDF file”, in our latest weekly release (21.2.4). Please use the link below to download our latest weekly NuGet package:


NuGet : https://www.nuget.org/packages/Syncfusion.Pdf.Net.Core/21.2.4



HE Henry May 10, 2023 03:55 PM UTC

Thanks, will test.

