Problems extracting text from PDF

Hi, I'm facing issues when extracting text from PDF files.

The text comes out letter by letter.

Also, Arabic words are sometimes broken into individual letters, and at other times reversed.


Attachment: Desktop_d75e0b1f.rar

2 Replies

IJ Irfana Jaffer Sadhik Syncfusion Team March 18, 2025 07:25 AM UTC

Hi ilyas,


We are currently validating the reported behavior on our end using the details you provided, and we will share further details on March 20th, 2025.



Regards,

Irfana J.




IJ Irfana Jaffer Sadhik Syncfusion Team March 20, 2025 11:17 AM UTC

Hi ilyas,

Thanks for the update.
In our PDF library, text is extracted in the order in which it was inserted into the content records. After analyzing the document, we have verified that the extracted text follows this expected order.
We understand that you expected the text to be extracted from sorted content glyph records. However, our current implementation does not support text extraction or layout based on sorted order.
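
To illustrate the difference between insertion order and sorted order, here is a minimal Python sketch of position-based sorting. It is not Syncfusion's API: the `Glyph` record, the `sort_glyphs_into_lines` function, and the `line_tolerance` and `space_factor` thresholds are all hypothetical names and values chosen for this example. It assumes each glyph comes with a left edge, a baseline, and a width, with y increasing down the page. It orders glyphs visually from left to right only, so right-to-left scripts such as Arabic would need an additional bidirectional reordering step that is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: one character plus its position on the page."""
    text: str
    x: float      # left edge of the glyph
    y: float      # baseline; assumed to increase down the page
    width: float  # advance width of the glyph


def sort_glyphs_into_lines(glyphs, line_tolerance=2.0, space_factor=0.3):
    """Group glyphs into lines by baseline, then order each line left to right.

    line_tolerance and space_factor are illustrative thresholds, not values
    taken from any PDF library.
    """
    lines = []
    # Sorting by baseline first makes glyphs of the same line adjacent.
    for glyph in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if lines and abs(lines[-1][0].y - glyph.y) <= line_tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])

    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            # A gap wider than a fraction of the previous glyph is treated
            # as a word break.
            gap = cur.x - (prev.x + prev.width)
            if gap > space_factor * prev.width:
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result
```

Glyphs passed in content-stream order are regrouped purely by position, so letters that were written one by one, or out of sequence, are joined into words whenever the gaps between them are small.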

[Image: Adobe preview]

[Image: Syncfusion text extraction]


Regards,

Irfana J

