[iText-questions] Replacing and removing images

Discussion:

A Cheung

2010-04-27 07:48:09 UTC

Hello,
I have spent quite a while investigating the automated removal/replacement
of images in a PDF. (ex: 100 page pdf, 1 image per page needs replacing, but
each page also has text which I want to keep)

Strategy 1:
http://www.opensubscriber.com/message/itext-***@lists.sourceforge.net/4608002.html
(from 2006)

Find the XObject, KillIndirect() it, remove it from the XObject dictionary.
But when you open this file in Acrobat it says there are still references to
it.
I don't know how to cleanly remove them from the content stream (I didn't
add them, so they aren't marked the way that old message suggests).
Conceptually, it (Im0 in this case) is somewhere in one of the arrays of the
page's Contents, like "q 612.2400055 0 0 792 0 0 cm /Im0 Do Q". I don't know
what is safe to remove, how to remove it, or what I'm looking for besides
the /Im0 in this one case.

Strategy 2: Replace it,

Similar to http://1t3xt.info/examples/browse/?page=example&id=421

But the original image was bilevel (black/white, 1 bit per pixel,
CCITTFaxDecode) and the new image is a 256-color TIFF (FlateDecode), or
maybe some other number of colors or a DCTDecode filter type, and this seems
to need a custom ColorSpace that I don't know how to create.

PdfContentByte p = stamper.getOverContent(1);
p.addImage(Image.getInstance("my.tiff"), width, 0, 0, height, 0, 0);

The code above will create the image object with this ColorSpace, but
(a) I'm stuck with the original, which I can't remove (setting the
original's width/height to 0 and its setData to an empty byte[] causes
Acrobat to complain when the file is opened), and (b) my images never seem
to be the right size (due to DPI, I assume: my images aren't 72 dpi, but
even with img.setDpi(300,300) they don't show up right; I think I can
eventually get this part right).

Is there a way to use this Image.getInstance object (and its
ColorSpace and other fields) as a template to modify the XObject of the
original image that I want to get rid of?
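
On the sizing point (b): PDF user space is measured in points, 72 to the inch, so the placed size of an image follows from its pixel dimensions and its resolution. Below is a minimal sketch in plain Java with no iText dependency. The names ImageSizing, pixelsToPoints and placementMatrix are hypothetical, and the six numbers are returned in the same a, b, c, d, e, f order as the arguments passed to addImage above.

```java
final class ImageSizing {
    /** PDF user space uses 72 points per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a pixel count at the given resolution into PDF points. */
    static double pixelsToPoints(int pixels, double dpi) {
        return pixels * POINTS_PER_INCH / dpi;
    }

    /**
     * Returns {a, b, c, d, e, f} for an unrotated image whose lower-left
     * corner sits at (x, y): a is the width and d the height, in points.
     */
    static double[] placementMatrix(int widthPx, int heightPx, double dpi,
                                    double x, double y) {
        return new double[] {
            pixelsToPoints(widthPx, dpi), 0, 0,
            pixelsToPoints(heightPx, dpi), x, y
        };
    }
}
```

For a 2550 x 3300 pixel scan at 300 dpi this gives 612 x 792 points, which is a US Letter page.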

I hope I'm making sense. So much time on this one problem has given me
tunnel vision, and I might be leaving some things out.

Is there some newer process that makes this easy? Snippets of existing
code?

1T3XT info

2010-04-27 15:59:56 UTC

Post by A Cheung
Hello,
I have spent quite a while investigating the
automated removal/replacement of images in a PDF. (ex: 100 page pdf, 1
image per page needs replacing, but each page also has text which I want
to keep)
Find the XObject, KillIndirect() it, remove it from the XObject Dictionary.

Won't work.

You can't remove an image stream if it's referred to, for instance
from a Resources dictionary of a page.

You can't remove an object /Im0 from the Resources dictionary,
if you don't remove the /Im0 Do from the content stream.

This is way too complicated for what you need.
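
To make the dependency concrete: the page content draws the image only through the "/Im0 Do" operator, so that operator and its name operand have to go before the resource entry can be dropped. The sketch below is plain Java working on a decoded content stream held as a String; it is not iText API, and ContentStreamFilter, tokenize and removeImageDraw are hypothetical names. It assumes the stream contains no comments and no inline images (BI ... ID ... EI), and it keeps parenthesised string literals whole so that page text is not altered.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentStreamFilter {

    /** Splits a content stream into tokens, keeping (string literals) whole. */
    static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Copy through the matching ')', honouring nesting and escapes.
                int depth = 0;
                do {
                    char d = content.charAt(i);
                    if (d == '\\') {
                        i++;            // skip the escaped character
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (depth > 0 && i < content.length());
            } else {
                while (i < content.length()
                        && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
            }
            tokens.add(content.substring(start, Math.min(i, content.length())));
        }
        return tokens;
    }

    /** Drops every "/<imageName> Do" pair and rejoins the remaining tokens. */
    static String removeImageDraw(String content, String imageName) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(content)) {
            boolean drawsImage = token.equals("Do") && !out.isEmpty()
                    && out.get(out.size() - 1).equals("/" + imageName);
            if (drawsImage) {
                out.remove(out.size() - 1);   // drop the name operand; "Do" is skipped
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }
}
```

The "q ... cm Q" wrapper that is left behind only saves, changes and restores the graphics state, so it draws nothing on its own.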

Post by A Cheung
Strategy 2: Replace it,
Similar to http://1t3xt.info/examples/browse/?page=example&id=421

This is a better strategy, but the example only works for JPEG images.

Post by A Cheung
Is there some newer process that makes this easy? Snippets of existing
code?

This is an update of the example you refer to:
http://itextpdf.com/examples/index.php?page=example&id=286

Instead of using the old image to "resize" the image, you need
to create the BufferedImage and use drawString() instead of
drawRenderedImage(). You may also want to create another type
of image, for instance a PNG instead of a JPEG, in which case
you'll need to use a different Filter and stuff.

Retrieving the text from the original image isn't possible.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

A Cheung

2010-04-27 16:52:21 UTC

Post by 1T3XT info
Won't work.
You can't remove an image stream if it's referred to, for instance
from a Resources dictionary of a page.
You can't remove an object /Im0 from the Resources dictionary,
if you don't remove the /Im0 Do from the content stream.
This is way too complicated for what you need.

Complicated yes, but is it theoretically possible with iText? Can it modify
these streams and save the modifications?
GUI Editors/Applications that do allow the removal of images must be doing
it somehow.

Post by A Cheung
Strategy 2: Replace it,
Similar to http://1t3xt.info/examples/browse/?page=example&id=421

This is a better strategy, but the example only works for JPEG images.

Post by A Cheung
Is there some newer process that makes this easy? Snippets of existing
code?

Post by 1T3XT info
This is an update of the example you refer to:
http://itextpdf.com/examples/index.php?page=example&id=286
Instead of using the old image to "resize" the image, you need
to create the BufferedImage and use drawString() instead of
drawRenderedImage(). You may also want to create another type
of image, for instance a PNG instead of a JPEG, in which case
you'll need to use a different Filter and stuff.
Retrieving the text from the original image isn't possible.

Thanks, but my question is then about Colorspaces. I am trying to keep the
PDF small so making images of 16M colours makes it too large.

PdfContentByte p = stamper.getOverContent(1);
p.addImage(Image.getInstance("my.tiff"), width,0,0,height,0,0)

The above code adds an image (Xi0) to my file with the Colorspace that is
correct. (It isn't DeviceGray or DeviceRGB). I use the Enfocus Browser to
see the PDF object tree structure and this new "Xi0" image has a ColorSpace
array[4] and item [3] is the stream data for the colorspace of 256 specific
colors in use. What I'd like to do is have iText create this Colorspace
structure for me based on an image file (along with
Filter/Width/BitsPerComponent) so I can replace my original Im0 structure
with that data (as in the sample at your URL), but _without_ having to add
an Xi0 to the PDF to do it since that just leaves me with another image that
I'd want to remove.

Possible?

1T3XT info

2010-04-28 07:03:48 UTC

Post by A Cheung
Complicated yes, but is it theoretically possible with iText? Can it
modify these streams and save the modifications?

Yes, it's possible, but it's a lot of work.

Post by A Cheung
Thanks, but my question is then about Colorspaces. I am trying to keep
the PDF small so making images of 16M colours makes it too large.

I didn't say you had to use an image of 16M colors.
You can easily use a black and white image.
You probably want to use CCITT.

Post by A Cheung
PdfContentByte p = stamper.getOverContent(1);
p.addImage(Image.getInstance("my.tiff"), width,0,0,height,0,0)
The above code adds an image (Xi0) to my file with the Colorspace that
is correct. (It isn't DeviceGray or DeviceRGB). I use the Enfocus
Browser to see the PDF object tree structure and this new "Xi0" image
has a ColorSpace array[4] and item [3] is the stream data for the
colorspace of 256 specific colors in use.

An Indexed colorspace is fine too.
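
For reference, an Indexed color space is a four-element array: the /Indexed name, a base color space, the highest palette index, and the lookup table holding the palette bytes. That matches the array[4] described above, where item [3] is the lookup data. The sketch below writes the PDF syntax for such an array from a list of packed RGB values. It is plain Java rather than iText API, the names IndexedColorSpace and toPdfArray are hypothetical, and it writes the lookup table as a hex string where a real file may use a stream instead.

```java
final class IndexedColorSpace {

    /**
     * Builds "[/Indexed /DeviceRGB hival <lookup>]" from palette entries
     * packed as 0xRRGGBB. Any alpha byte in the top 8 bits is ignored.
     */
    static String toPdfArray(int[] rgbPalette) {
        if (rgbPalette.length == 0 || rgbPalette.length > 256) {
            throw new IllegalArgumentException("palette needs 1 to 256 entries");
        }
        StringBuilder lookup = new StringBuilder();
        for (int rgb : rgbPalette) {
            // Three bytes per entry, in R, G, B order.
            lookup.append(String.format("%06X", rgb & 0xFFFFFF));
        }
        int hival = rgbPalette.length - 1;   // highest valid index, not the count
        return "[/Indexed /DeviceRGB " + hival + " <" + lookup + ">]";
    }
}
```

A two-entry black and white palette, new int[] {0x000000, 0xFFFFFF}, comes out as [/Indexed /DeviceRGB 1 <000000FFFFFF>].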

Post by A Cheung
What I'd like to do is have
iText create this Colorspace structure for me based on an image
file (along with Filter/Width/BitsPerComponent) so I can replace my
original Im0 structure with that data (as in the sample at your URL),
but _without_ having to add an Xi0 to the PDF to do it since that just
leaves me with another image that I'd want to remove.
Possible?

Yes.


distes

2015-08-19 13:54:31 UTC

I know this is an old post but I wanted to post the way I got this to work.
No one posting was very helpful.

This is my code using iTextSharp. I hope it's helpful to someone. I needed
to move an existing barcode on the page. Rather than moving it directly, you
copy the barcode, delete the existing one, and place the copy back on the
page at an absolute position.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using iTextSharp.text.pdf;
using ImgCCITT = iTextSharp.text.ImgCCITT;

public class BarcodeMover
{
    // For every even page: copy the content stream without the barcode's
    // "Do" section, then re-add the barcode image at a fixed position.
    public void MoveDuploBarcode(string inputfile, string outputfile)
    {
        using (FileStream outputstreampdf = new FileStream(outputfile, FileMode.Create))
        {
            using (PdfReader inputstreampdf = new PdfReader(inputfile))
            {
                PdfStamper pdfstamper = new PdfStamper(inputstreampdf, outputstreampdf);

                for (int pagenum = 1; pagenum <= inputstreampdf.NumberOfPages; pagenum++)
                {
                    if (IsEven(pagenum))
                    {
                        using (MemoryStream pagememorystream = new MemoryStream())
                        {
                            PdfDictionary pagexobjects = GetAllXObjectsDictionaryFromPage(inputstreampdf, pagenum);
                            PdfContentParser pagecontentparser = GetContentParserForPage(inputstreampdf, pagenum);

                            PdfName barcodestreamobject = null;

                            // Each section is one operator together with its operands.
                            while (true)
                            {
                                List<PdfObject> currentstreamobjects = GetNextSectionOfContent(pagecontentparser);
                                if (currentstreamobjects.Count == 0)
                                {
                                    break;
                                }

                                bool ismatrixbarcode = DoesStreamContainMatrixBarcode(currentstreamobjects, pagexobjects);

                                if (ismatrixbarcode)
                                {
                                    // Remember the image name and leave its "Do" out of the copy.
                                    barcodestreamobject = (PdfName)currentstreamobjects.First();
                                }
                                else
                                {
                                    WriteToMemoryStream(currentstreamobjects, pagememorystream);
                                }
                            }

                            if (barcodestreamobject != null)
                            {
                                PdfObject barcodeobject = pagexobjects.Get(barcodestreamobject);
                                PdfDictionary xobjectdictionary = (PdfDictionary)PdfReader.GetPdfObject(barcodeobject);

                                // Grab the raw (still encoded) image bytes before killing the object.
                                int xrefIdx = ((PRIndirectReference)barcodeobject).Number;
                                PdfObject pdfObj = inputstreampdf.GetPdfObject(xrefIdx);
                                PdfStream streamobject = (PdfStream)pdfObj;
                                byte[] imagestream = PdfReader.GetStreamBytesRaw((PRStream)streamobject);

                                PdfReader.KillIndirect(barcodeobject);

                                ImgCCITT timg = BuildNewImage(xobjectdictionary, imagestream, inputstreampdf, pagenum);

                                PlaceNewImageOnPage(pdfstamper, pagenum, pagememorystream, timg);

                                barcodestreamobject = null;
                            }
                        }
                    }
                }

                pdfstamper.Close();
            }
        }
    }

    private List<PdfObject> GetNextSectionOfContent(PdfContentParser pagecontentparser)
    {
        return pagecontentparser.Parse(null);
    }

    private bool IsEven(int pagenum)
    {
        return pagenum % 2 == 0;
    }

    private void WriteToMemoryStream(List<PdfObject> pagecontentobjects, MemoryStream memoryStream)
    {
        foreach (PdfObject o in pagecontentobjects)
        {
            o.ToPdf(null, memoryStream);
            memoryStream.WriteByte((byte)'\n');
        }
    }

    private PdfDictionary GetAllXObjectsDictionaryFromPage(PdfReader pdfreader, int pagenum)
    {
        PdfDictionary pagedictionary = pdfreader.GetPageN(pagenum);
        PdfDictionary pageresources = (PdfDictionary)PdfReader.GetPdfObject(pagedictionary.Get(PdfName.RESOURCES));
        return (PdfDictionary)PdfReader.GetPdfObject(pageresources.Get(PdfName.XOBJECT));
    }

    private PdfContentParser GetContentParserForPage(PdfReader pdfReader, int pagenum)
    {
        byte[] pagecontentstream = pdfReader.GetPageContent(pagenum);
        return new PdfContentParser(new PRTokeniser(new RandomAccessFileOrArray(pagecontentstream)));
    }

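    private void PlaceNewImageOnPage(PdfStamper pdfStamper, int i, MemoryStream memoryStream, ImgCCITT timg)
    {
        // ToArray() returns only the bytes written; GetBuffer() would also
        // include the unused tail of the internal buffer.
        pdfStamper.Reader.SetPageContent(i, memoryStream.ToArray());
        pdfStamper.GetOverContent(i).AddImage(timg);
    }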

    private ImgCCITT BuildNewImage(PdfDictionary tg, byte[] bytes, PdfReader pdfReader, int i)
    {
        int width = Convert.ToInt32(tg.Get(PdfName.WIDTH).ToString());
        int height = Convert.ToInt32(tg.Get(PdfName.HEIGHT).ToString());

        // Reuse the raw CCITT bytes; this hard-codes Group 4 encoding.
        ImgCCITT timg = new ImgCCITT(width, height, false, ImgCCITT.CCITTG4, ImgCCITT.CCITT_ENDOFBLOCK, bytes);
        timg.ScaleToFit(24, 24);
        // Fixed target: lower-left corner at x = 0, 140 points below the top of the page.
        timg.SetAbsolutePosition(0, pdfReader.GetPageSize(i).Top - 140);

        return timg;
    }

    private bool DoesStreamContainMatrixBarcode(List<PdfObject> contentobjects, PdfDictionary pagexobjects)
    {
        // A candidate section looks like "/img... Do".
        if ("Do".Equals(contentobjects.Last().ToString()) &&
            contentobjects.First().ToString().Contains("img"))
        {
            PdfObject possibleobject = pagexobjects.Get((PdfName)contentobjects.First());
            // Check for null before dereferencing the lookup result.
            if (possibleobject != null && possibleobject.IsIndirect())
            {
                PdfDictionary xobjectdictionary = (PdfDictionary)PdfReader.GetPdfObject(possibleobject);
                PdfName type = (PdfName)PdfReader.GetPdfObject(xobjectdictionary.Get(PdfName.SUBTYPE));
                if (PdfName.IMAGE.Equals(type))
                {
                    PdfObject filter = xobjectdictionary.Get(PdfName.FILTER);
                    if (filter != null && filter.ToString() == "/CCITTFaxDecode")
                    {
                        return true;
                    }
                }
            }
        }

        return false;
    }
}

