A Cheung
2010-04-27 07:48:09 UTC
Hello,
I have spent quite a while investigating the automated removal/replacement
of images in a PDF. (Example: a 100-page PDF where one image per page needs
replacing, but each page also has text that I want to keep.)
Strategy 1:
http://www.opensubscriber.com/message/itext-***@lists.sourceforge.net/4608002.html
(from 2006)
Find the XObject, call killIndirect() on it, and remove it from the XObject
dictionary. But when I open the resulting file in Acrobat, it says there are
still references to it.
I don't know how to cleanly remove those references from the content stream
(I didn't add the images myself, so they aren't marked the way that old
message suggests).
Conceptually, the image (Im0 in this case) is drawn somewhere in one of the
arrays of the page's Contents, with something like
"q 612.2400055 0 0 792 0 0 cm /Im0 Do Q". I don't know what is safe to
remove, how to remove it, or what to look for besides the /Im0 in this one
case.
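To make the question concrete, here is a rough sketch (plain Java, no iText calls) of the kind of edit I have in mind on the decoded content-stream text. The class and method names are made up for illustration. It assumes the image is drawn by exactly the "q <six numbers> cm /Name Do Q" pattern shown above, and that the stream has already been decompressed into a string. It would miss an image drawn inside a larger q ... Q block, or a sequence split across several content streams.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: strips one image invocation
// from already-decoded content-stream text.
class ContentStreamSketch {

    // Removes every "q a b c d e f cm /<imageName> Do Q" sequence.
    // Assumption: the image is drawn with exactly this operator sequence.
    static String removeImageDraw(String content, String imageName) {
        String number = "[-+]?(?:\\d+\\.?\\d*|\\.\\d+)";
        Pattern p = Pattern.compile(
                "(?<!\\S)q\\s+"                    // save graphics state
                + "(?:" + number + "\\s+){6}cm\\s+"  // six matrix operands, then cm
                + "/" + Pattern.quote(imageName) + "\\s+Do\\s+" // paint the XObject
                + "Q(?!\\S)\\s*");                  // restore graphics state
        return p.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        String content = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n"
                + "q 612.2400055 0 0 792 0 0 cm /Im0 Do Q\n";
        // The text operators survive; the image invocation is gone.
        System.out.print(removeImageDraw(content, "Im0"));
    }
}
```

A proper tokenizer over the content stream would presumably be more robust than a regular expression; this only shows which bytes I think would have to go.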
Strategy 2: Replace it.
Similar to http://1t3xt.info/examples/browse/?page=example&id=421
But the original image was 1-bit (black/white, CCITTFaxDecode) and the new
image is a 256-color TIFF (FlateDecode), or maybe some other number of
colors or a DCTDecode filter type, and this seems to need a custom
colorspace that I don't know how to create.
PdfContentByte p = stamper.getOverContent(1);
p.addImage(Image.getInstance("my.tiff"), width, 0, 0, height, 0, 0);
This will create the image object with this colorspace, but (a) I'm stuck
with the original, which I can't remove (setting the original's width/height
to 0 and its data to an empty byte[] via setData causes Acrobat to complain
when the file is opened), and (b) my images never seem to be the right size.
I assume this is due to DPI: my images aren't 72 dpi, but even with
img.setDpi(300,300) they don't show up right. I think I can eventually get
this part right.
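For the size problem, the arithmetic I believe applies is that PDF user space is 72 units per inch by default, so the width and height passed to addImage should be pixels * 72 / dpi. A minimal sketch in plain Java (the class name, method name, and the 2550 x 3300 pixel scan are assumptions for illustration, not taken from my actual files):

```java
// Hypothetical helper, not part of iText: converts a pixel dimension
// to PDF points (1/72 inch) for a given scan resolution.
class ImageSizeSketch {

    static float pixelsToPoints(int pixels, float dpi) {
        return pixels * 72f / dpi;
    }

    public static void main(String[] args) {
        // Assumed example: a 300 dpi scan of a US Letter page, 2550 x 3300 pixels.
        float width = pixelsToPoints(2550, 300f);
        float height = pixelsToPoints(3300, 300f);
        // These would be the width and height arguments in
        // p.addImage(img, width, 0, 0, height, 0, 0).
        System.out.println(width + " x " + height + " pt"); // prints "612.0 x 792.0 pt"
    }
}
```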
Is there a way to use this Image.getInstance object (and its colorspace and
other fields) as a template to modify the XObject of the original image
that I want to get rid of?
I hope I'm making sense. So much time on this one problem has given me
tunnel vision, and I might be leaving some things out.
Is there some newer process that makes this easy? Snippets of existing
code?