Discussion:
[iText-questions] Wrong extraction of X (disParallelStart/disParallelEnd) coordinate
hkaren
2014-12-30 07:00:46 UTC
Hello everyone.

I am using LocationTextExtractionStrategy to extract text from a PDF.
In most cases it works fine.
However, there are some cases where the iText library returns a wrong X
coordinate (disParallelStart/disParallelEnd) for a word.

For instance, an original line from a PDF file is:

01/01/2008 *44566020 TVA A* RECU FFG FFG3801024 004 FFG LF42032

But when I extract it, I get the following line:

01/01/2008 RECU FFG FFG3801024 004 FFG LF42032 *44566020 TVA A*

As you can see, the *"44566020 TVA A"* part moves to the end of the line
after extraction.

How can I solve this problem?


Thanks
Karen



--
View this message in context: http://itext-general.2136553.n4.nabble.com/Wrong-extraction-of-X-disParallelStart-disParallelEnd-coordinate-tp4660642.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
iText mailing list
2014-12-30 15:01:51 UTC
Post by hkaren
Hello everyone.
I am using LocationTextExtractionStrategy to extract text from a PDF.
In most cases it works fine.
However, there are some cases where the iText library returns a wrong X
coordinate (disParallelStart/disParallelEnd) for a word.
For instance, an original line from a PDF file is:
01/01/2008 *44566020 TVA A* RECU FFG FFG3801024 004 FFG LF42032
But when I extract it, I get the following line:
01/01/2008 RECU FFG FFG3801024 004 FFG LF42032 *44566020 TVA A*
As you can see, the *"44566020 TVA A"* part moves to the end of the line
after extraction.
How can I solve this problem?

1. You posted this question through Nabble, ignoring the warning that you
are not subscribed to the iText mailing list. Somebody had to approve your
question manually. Please avoid this; read http://itextpdf.com/nabble
2. This mailing list is being abandoned in favor of StackOverflow. See
the intro of https://leanpub.com/itext_so
3. a. You are not telling us how you are extracting the text. Maybe you
aren't using the correct extraction strategy.
3. b. If you are using the correct extraction strategy, you should know
that iText orders the extracted text by the vertical position of the
baselines first. You claim that the X value is wrong, but (1.) this is not
true, and (2.) it is irrelevant. Maybe the Y-value of the baseline of
"01/01/2008 RECU FFG FFG3801024 004 FFG LF42032" is higher than the
Y-value of the baseline of "*44566020 TVA A*". In that case you need to use
an adapted strategy that introduces a higher tolerance for the position of
the baseline.
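
To make point 3.b concrete, here is a minimal sketch in plain Java. It does
not use iText at all: the Chunk class, the order method and the tolerance
parameter are hypothetical names invented for this illustration, and the
coordinates used with it are made up. It only shows why a small baseline
offset can push a fragment to the end of a line, and how a tolerance on the
baseline position can keep it in place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BaselineOrdering {

    /** Hypothetical stand-in for an extracted text chunk (not an iText class). */
    public static final class Chunk {
        final String text;
        final float x;          // start of the chunk along the line
        final float baselineY;  // vertical position of the baseline

        public Chunk(String text, float x, float baselineY) {
            this.text = text;
            this.x = x;
            this.baselineY = baselineY;
        }
    }

    /**
     * Orders chunks top to bottom, then left to right. A chunk joins the
     * current line when its baseline differs from that of the line's first
     * chunk by at most the given tolerance.
     */
    public static String order(List<Chunk> chunks, float tolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF Y grows upwards, so a larger baseline Y is higher on the page.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.baselineY));

        StringBuilder out = new StringBuilder();
        List<Chunk> line = new ArrayList<>();
        for (Chunk c : sorted) {
            boolean newLine = !line.isEmpty()
                    && Math.abs(line.get(0).baselineY - c.baselineY) > tolerance;
            if (newLine) {
                flush(line, out);
            }
            line.add(c);
        }
        flush(line, out);
        return out.toString().trim();
    }

    /** Appends the chunks of one line in X order and empties the line. */
    private static void flush(List<Chunk> line, StringBuilder out) {
        line.sort(Comparator.comparingDouble((Chunk c) -> c.x));
        for (Chunk c : line) {
            out.append(c.text).append(' ');
        }
        line.clear();
    }
}
```

Assume the date and the "RECU ..." fragment sit on baseline 700 and
"44566020 TVA A" sits on baseline 699.5, between them in X. Then
order(chunks, 0f) places "44566020 TVA A" last, as in the reported output,
while order(chunks, 1f) keeps all three fragments on one line in X order.
In iText itself the corresponding change would have to be made in an adapted
extraction strategy; how to do that depends on the iText version.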
