2007-03-20 16:44:26 UTC
plus hidden text, so that you can search/select the text, but only see the
originally scanned image. We then use Adobe FlashPaper 2 to turn it into a
SWF that can be embedded in a web page. However, the hidden text is being
stripped out of the final SWF, so that it is no longer searchable. Adobe
considers this a "limitation" (we consider it a "bug"). Most other OCR
software has the same problem as the platform we chose, but there is one
that seems to convert to SWF just fine. In an attempt to find out what the
difference was between the two files, I tried to use the Tree Viewer from
iText to examine the contents of the files. However, when I select the
Content node of the one that gets the text stripped out, I don't see
anything. If I use the API to try to extract the Stream directly, I get a
So I guess I really have two questions.
1) Is there something wrong with how the PDF is constructed, such that we
cannot examine the text content with iText, or is this a bug in iText?
2) Is there a way we can manipulate the PDF from the OCR software we chose
to make it structurally look like the one that actually keeps the text when
converted to SWF?
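A rough way to compare the two files without relying on any PDF library is to scan them for stream objects, inflate the ones that are Flate-compressed, and count text operators. The sketch below is only an illustration under several assumptions: the function names `content_streams` and `text_summary` are made up for this example, it assumes the files are unencrypted and that page content is not packed in some other filter, and it assumes the OCR layer is marked invisible with text rendering mode 3 (`3 Tr`), which is a common convention for hidden text but not the only one. Binary streams such as images can produce false hits, so the numbers are a hint, not a diagnosis.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def content_streams(pdf_bytes):
    """Yield each stream body, inflated when it is zlib (FlateDecode) data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (uncompressed content, JPEG image, ...): keep as-is.
            yield raw


def text_summary(pdf_bytes):
    """Count text objects, text-showing operators and invisible-text switches."""
    counts = {"BT": 0, "show": 0, "invisible": 0}
    for data in content_streams(pdf_bytes):
        # BT opens a text object.
        counts["BT"] += len(re.findall(rb"(?<![A-Za-z])BT(?![A-Za-z])", data))
        # Tj and TJ draw strings.
        counts["show"] += len(
            re.findall(rb"(?<![A-Za-z])(?:Tj|TJ)(?![A-Za-z])", data)
        )
        # "3 Tr" selects text rendering mode 3 (neither fill nor stroke).
        counts["invisible"] += len(
            re.findall(rb"(?<![\d.])3\s+Tr(?![A-Za-z])", data)
        )
    return counts
```

To use it, read each PDF in binary mode (for example `open("0112_094_no_text_select.pdf", "rb").read()`) and pass the bytes to `text_summary`. If one file reports text operators and the other reports none, the text in the second file is probably stored somewhere this scan does not reach, which would at least narrow down where the two files differ.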
I'm attaching a zip file containing the two files: 0112_094_no_text_select.pdf,
from our selected OCR product, whose text content we cannot view, and
0112_094_text_select.pdf, from the other product, whose text content we CAN
view and which actually keeps the text in the SWF.
OK, it seems I can't attach a file, or the message gets refused. I've
uploaded it to http://www.sharebigfile.com/file/116699/0112-094-zip.html