Hi:
I am performing a series of tests extracting text from a PDF document. I have found a possible problem in the following lines:
// Create a list to store the text data
List<TextData> textDataCollection = new List<TextData>();
//Extract text and get the text data
string extractedText = loadedPage.ExtractText(out textDataCollection);
The extractedText string contains all the text on the page. However, the text in textDataCollection is incomplete, mainly in the central part of the page. The project and the PDF file are attached.
Additionally, I have observed that the letter "y" is duplicated in the extracted text string (highlighted in bold and underlined in the excerpt below).
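To narrow down which parts are missing, a comparison along the following lines may help. This is only a sketch: `ExtractionCheck` and `MissingWords` are hypothetical names that are not part of the Syncfusion API, and splitting on whitespace is a simplification, since text fragments in a PDF can break in the middle of a word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtractionCheck
{
    // Returns the distinct words of fullText that appear in none of the fragments.
    // Hypothetical helper for diagnosis only; comparison is exact and case-sensitive.
    public static List<string> MissingWords(string fullText, IEnumerable<string> fragments)
    {
        var separators = new[] { ' ', '\t', '\r', '\n' };

        // Collect every whitespace-delimited word found in the fragments.
        var seen = new HashSet<string>(
            fragments.SelectMany(f => f.Split(separators, StringSplitOptions.RemoveEmptyEntries)),
            StringComparer.Ordinal);

        // Keep the words of the full text that were never seen in a fragment.
        return fullText
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !seen.Contains(w))
            .Distinct()
            .ToList();
    }
}
```

Assuming each `TextData` item exposes its string through a `Text` property, it could be called as `ExtractionCheck.MissingWords(extractedText, textDataCollection.Select(t => t.Text))`.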
Thank you
nosolvidaría en muchos años (es posible que nunca), y encima habríamos disfrutadode todo el tiempo empleado en cada una de las visualizaciones. Esta es la razónpor la que este curso tiene un guión muy progresivo, fácil de seguir, yy está
óptimamente distribuido durante días sucesivos. Su contenido creará sólidas
bases de conocimiento y nos permitirá avanzar con rapidez sin tener ninguna
sensación de dificultad. En vez de ver la misma película varias veces seguidas,vamos a ver una serie cuya trama bien entrelazada nos aportará mucha mayor«cultura cinéfila». En el aprendizaje hay que repartir tareas yy saber dejar cosaspara mañana, pero ojo, también hay que ser muyy constantes si queremos tener
éxito.Como en mis libros anteriores, el lector podrá encontrar aquí las tablas devocabulario completamente traducidas y asociadas. yEn ellas se incluyyen lostérminos en español y alemán, la pronunciación figurada de cada palabra
(my apologies for the Spanish text)
Thank you
Regards,
Jesús
Attachment: Extracion_v4SF_552eec5e.rar
Hi Jesus,
We are currently validating the reported behavior on our end using the details you provided, and we will share further details on December 9, 2024.
Regards,
Irfana J.
Hi Jesus,
We have confirmed the issue “Text cut down issue occurs while extracting the text from the PDF document” as a defect in our product, and we will include the fix in our weekly release on December 24, 2024.
Please use the feedback link below to track the status of the reported bug.
Note: If you require a patch for the reported issue in any of our Essential Studio Main or SP release versions, please let us know the version so that we can provide a patch for it based on our SLA policy.
Disclaimer: “Inclusion of this solution in the weekly release may change due to other factors including but not limited to QA checks and work reprioritization.”
Regards,
Rangarajan.
Hi Jesus,
We were unable to include the fix for the issue "Text cut down issue occurs while extracting the text from the PDF document" as promised in this weekly release due to stability concerns. The fix will be included in the upcoming weekly release on December 31, 2024.
If you would like to verify the fix before the next release, we can provide you with a custom patch. Please let us know if you are interested.
We apologize for any inconvenience this may have caused and appreciate your understanding.
Regards,
Irfana J.
Hi Irfana,
Please send us the corresponding patch so that we can continue with development.
Thank you
Regards,
Jesús
Hi Jesus,
We are currently working on resolving the issue. We will provide a custom patch for the latest version on December 27, 2024.
Regards,
Irfana J.
Hi Jesus,
We were unable to include the fix for the issue "Text cut down issue occurs while extracting the text from the PDF document" as promised in this weekly release due to preservation issues and stability concerns. The fix will be included in the upcoming weekly release on January 7, 2025.
We will provide the custom patch for the latest version on December 31, 2024.
We apologize for any inconvenience this may have caused and appreciate your understanding.
Regards,
Rangarajan.
Hi Jesus,
We apologize for the inconvenience caused.
Due to the complexities involved in resolving the text preservation issue, we are unable to provide the custom patch today. However, we assure you that it will be delivered by January 3, 2025.
Regards,
Sameerkhan N
Hi Jesus,
We have resolved the issue and prepared a custom NuGet package, version 28.1.37, which is attached here. Please check it.
Regards,
Rangarajan.
Hi Jesus,
We have included the fix for the reported issue “Text cut down issue occurs while extracting the text from the PDF document” in our weekly release (v28.1.38). Please use the link below to download our latest NuGet package.
https://www.nuget.org/packages/Syncfusion.Pdf.WinForms/28.1.38
Root cause:
The input document contains escape sequences in its content. These escape sequences were not skipped correctly while the content was being processed, which produced an invalid index. As a result, the text was extracted with cut-down content.
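For context on the escape sequences mentioned in the root cause: in a PDF content stream, a literal string may contain backslash escapes such as `\n`, `\(`, `\)`, `\\`, an octal code of one to three digits, or a backslash followed by a line break (a line continuation). The following is a minimal sketch of decoding them with a bounds check before every look-ahead. It is not the actual fix made in the library; `PdfLiteralString.Decode` is a hypothetical name, and mapping each decoded byte directly to a `char` is a simplification, since the real character depends on the font encoding.

```csharp
using System;
using System.Text;

public static class PdfLiteralString
{
    // Decodes the body of a PDF literal string (the text between the outer parentheses).
    // Hypothetical helper for illustration only.
    public static string Decode(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        int i = 0;
        while (i < raw.Length)
        {
            char c = raw[i++];
            if (c != '\\') { sb.Append(c); continue; }
            if (i >= raw.Length) break; // trailing backslash: nothing left to read

            char e = raw[i++];
            switch (e)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case '(': case ')': case '\\': sb.Append(e); break;
                case '\r':
                    // Line continuation: skip the line break, including an optional LF after CR.
                    if (i < raw.Length && raw[i] == '\n') i++;
                    break;
                case '\n':
                    break; // Line continuation with a bare LF.
                default:
                    if (e >= '0' && e <= '7')
                    {
                        // Octal escape: up to three digits in total.
                        int value = e - '0';
                        int digits = 1;
                        while (digits < 3 && i < raw.Length && raw[i] >= '0' && raw[i] <= '7')
                        {
                            value = value * 8 + (raw[i++] - '0');
                            digits++;
                        }
                        sb.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        sb.Append(e); // Unknown escape: the backslash is ignored.
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

The point of the sketch is that the index is advanced past the whole escape sequence and is compared against the string length before each read, so a sequence at the very end of the input cannot cause an out-of-range access.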
Rega