2017-06-15 06:12:27 UTC
Suppose that you have a table in a PDF that is not tagged.
* To the human eye, that table consists of columns and rows. Maybe there is
a header row and a footer row, and so on.
* To a machine, that table consists of nothing more than lines and shapes
and snippets of text added at arbitrary positions. A machine doesn't know
which text belongs to which cell. A machine doesn't know if a row is a
header row, a body row, or a footer row.
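To make the machine's point of view concrete, here is a minimal sketch in Python. The `Snippet` type, the `guess_rows` helper, and the sample coordinates are hypothetical illustrations, not part of iText or any PDF library. It models an untagged page as a bag of positioned snippets and shows that the best a program can do is guess rows from coordinates.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Snippet:
    text: str
    x: float  # absolute horizontal position on the page
    y: float  # absolute vertical position (default PDF origin is bottom-left)


# Hypothetical untagged page: snippets listed in the order they were drawn,
# which has nothing to do with the order a human would read them in.
page = [
    Snippet("42", 200, 680),
    Snippet("Name", 72, 700),
    Snippet("Total", 72, 660),
    Snippet("Alice", 72, 680),
    Snippet("Score", 200, 700),
    Snippet("42", 200, 660),
]


def guess_rows(snippets):
    """Group snippets that share a y-coordinate, top to bottom, left to right.

    This is only a heuristic: it recovers a grid, not the meaning of the grid.
    """
    ordered = sorted(snippets, key=lambda s: (-s.y, s.x))
    return [[s.text for s in group]
            for _, group in groupby(ordered, key=lambda s: s.y)]


rows = guess_rows(page)
print(rows)
```

Even when the guess produces a plausible grid, nothing in the data says that the first row is a header or that the last row is a footer. That information exists only in the reader's head.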
In a tagged PDF, you add special marks (we talk about "marked content")
to indicate what is what. When a machine is presented with a tagged PDF containing a
table, it knows which parts of the table are the header, which parts are the
body, which parts are the footer. Usually, the content will also be added in
the logical reading order. That is not the case for a PDF that isn't tagged.
In a PDF that isn't tagged, you can add all the regular text first, then all
the bold text, then all the italic text, and so on. It really doesn't matter where
the text is on the page, since all text is added at absolute positions anyway.
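As a rough illustration of the difference, the Python sketch below builds two small content-stream fragments as plain strings. The helper names `show_text` and `marked`, the font resource name `/F1`, and the sample text are assumptions made for this example. The operators themselves (`BT`/`ET`, `Tf`, `Td`, `Tj`, and `BDC`/`EMC` for marked content) are standard PDF syntax.

```python
def show_text(text, x, y):
    """Operators that draw `text` at absolute position (x, y).

    Assumes a font resource named /F1 and text without parentheses or
    backslashes, which would need escaping in a real content stream.
    """
    return f"BT /F1 12 Tf {x} {y} Td ({text}) Tj ET"


def marked(tag, mcid, body):
    """Wrap operators in a marked-content sequence carrying an MCID."""
    return f"/{tag} <</MCID {mcid}>> BDC {body} EMC"


# Untagged: the body cell is drawn before the header cell, and nothing
# in the stream says which snippet plays which role.
untagged = "\n".join([
    show_text("Alice", 72, 680),
    show_text("Name", 72, 700),
])

# Tagged: each snippet is marked with its role and added in reading order.
tagged = "\n".join([
    marked("TH", 0, show_text("Name", 72, 700)),
    marked("TD", 1, show_text("Alice", 72, 680)),
])

print(untagged)
print(tagged)
```

This only shows the marks inside the page content. A real tagged PDF also needs a structure tree that refers to these MCIDs, which the sketch leaves out. The point is that someone has to decide that "Name" is a `TH` and "Alice" is a `TD`; the marks record that decision, they don't make it.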
If you understand everything I wrote above, you should realize that software
that can turn a PDF without tags into a tagged PDF without human interaction
either doesn't exist, or it fools the customer.
It doesn't exist, because it takes a human to teach the software which parts
of the content are table headers, table cells, titles, paragraphs, and so on. It
takes a human to teach the software what is shown in images so that "Alt
text" can be provided. Tagging a PDF correctly requires human intelligence.
If you do find software that converts a PDF without tags to a PDF that is
tagged from a technical point of view, it fools the customer. It could, for
instance, tag all the content as one big paragraph, and it could add Alt
text for images that says nothing more than "This is an image."
Since I assume that you want to do a good job, I am confident that you are
not asking us to help you fool your customer. We hope that you will tell
your customer in all honesty that they are asking for something that can't be done.
View this message in context: http://itext.2136553.n4.nabble.com/Converting-to-Tagged-PDF-tp4661170p4661171.html