Discussion:
What is required to make a file PDF/A?
David Thielen
2011-11-17 14:20:22 UTC
I thought it was just embedding fonts but when we do that Acrobat says it is not PDF/A.

thanks - dave
TvT
2011-11-17 15:12:19 UTC
1. That depends on which PDF/A you mean:
PDF/A-1a, PDF/A-1b, or PDF/A-2?

2. Even if you take the simplest one, PDF/A-1b, there is a lot to
consider. It is best to read the PDF/A spec (ISO 19005-1:2005 or ISO
19005-2:2011). No JavaScript, no external referencing, colors, etc.

3. What Acrobat is probably looking at is the PDF/A tag in the metadata.
If you set that one, Acrobat will probably say it's PDF/A. A better check
is the PDF/A preflight check that Acrobat Professional offers.
It shows you which parts of the spec you are missing. If all tests pass,
you probably have a 95% compliant PDF/A document.
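
For reference, the "PDF/A tag" mentioned in point 3 is the PDF/A identification entry in the document's XMP metadata (`pdfaid:part` and `pdfaid:conformance`). Below is a minimal sketch in plain Java of what such a packet could look like. The class and method names are made up for illustration and are not part of iText, and setting this entry alone does not make a file conformant; it only declares the claim that a validator then checks.

```java
/** Illustrative helper (hypothetical name): builds a minimal XMP packet with the PDF/A identification entry. */
final class PdfAIdXmp {

    /** Namespace of the PDF/A identification schema (prefix pdfaid). */
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /**
     * @param part        PDF/A part number, e.g. 1 for PDF/A-1
     * @param conformance conformance level; this sketch only accepts the PDF/A-1 levels "A" and "B"
     */
    static String build(int part, String conformance) {
        if (!"A".equals(conformance) && !"B".equals(conformance)) {
            throw new IllegalArgumentException("conformance must be A or B");
        }
        // A real packet would also carry dc:, xmp: and pdf: properties;
        // they are left out here to keep the identification entry visible.
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
             + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
             + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
             + "  <rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">\n"
             + "   <pdfaid:part>" + part + "</pdfaid:part>\n"
             + "   <pdfaid:conformance>" + conformance + "</pdfaid:conformance>\n"
             + "  </rdf:Description>\n"
             + " </rdf:RDF>\n"
             + "</x:xmpmeta>\n"
             + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        // Prints a packet declaring PDF/A-1b.
        System.out.println(build(1, "B"));
    }
}
```

The packet has to be stored as the document-level metadata stream for a validator to see it; how that is done depends on the library and version in use.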

Regards,
ToM
------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a
reference to the iText book: http://www.itextpdf.com/book/
http://itextpdf.com/themes/keywords.php
Leonard Rosenthol
2011-11-17 15:43:54 UTC
Post by TvT
no external referencing
Careful with that phrase as it's led to misunderstanding by non-technical people.

What you really mean to say is "no externally referenced resources/assets".

Leonard

TvT
2011-11-17 16:09:09 UTC
Post by Leonard Rosenthol
What you really mean to say is “no externally referenced resources/assets”.
Yes, exactly :-)

David Thielen
2011-11-17 16:12:38 UTC
I didn't know about pre-flight, that's cool.

Ok, ran my iText generated document through it and got the following:

* Convert to PDF/A-1a (sRGB)

* Convert to PDF/A-1b (sRGB)

I then double clicked on "Verify compliance with PDF/A-1a" and got a lot:

* Author mismatch between Document Info and XMP Metadata

* CIDset in subset font missing (238 matches on 4 pages)

* Creation date mismatch between Document Info and XMP Metadata

* Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)

* Last Modification Date mismatch between Document Info and XMP Metadata

* MarkInfo missing

* Metadata missing (XMP)

* PDF/A entry missing

* Producer mismatch between Document Info and XMP Metadata

* Structured PDF: Structure tree root entry missing

The biggies seem to be the CIDSet for the fonts and getting the colors stored correctly. I'm guessing this is not a simple couple of hours to add.

Is there a setting in iText to set these values or does this require a 3rd party app?

Thanks - dave
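
Several of the preflight items above are "mismatch between Document Info and XMP Metadata" reports. PDF/A-1 expects each entry in the Info dictionary to have a matching XMP property. As a rough sketch of that check in plain Java: the class name is hypothetical, the values are assumed to be already extracted and normalized to comparable strings (real Info dates use the `D:...` form while XMP uses ISO 8601, so a real check would convert them first), and this is not how Acrobat or iText implements it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative checker (hypothetical name) for Info dictionary vs. XMP consistency. */
final class InfoXmpCheck {

    /** Info dictionary key -> corresponding XMP property. */
    static final Map<String, String> INFO_TO_XMP = new LinkedHashMap<>();
    static {
        INFO_TO_XMP.put("Title", "dc:title");
        INFO_TO_XMP.put("Author", "dc:creator");
        INFO_TO_XMP.put("Subject", "dc:description");
        INFO_TO_XMP.put("Keywords", "pdf:Keywords");
        INFO_TO_XMP.put("Creator", "xmp:CreatorTool");
        INFO_TO_XMP.put("Producer", "pdf:Producer");
        INFO_TO_XMP.put("CreationDate", "xmp:CreateDate");
        INFO_TO_XMP.put("ModDate", "xmp:ModifyDate");
    }

    /** Returns one message per Info entry whose XMP counterpart is absent or different. */
    static List<String> mismatches(Map<String, String> info, Map<String, String> xmp) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> pair : INFO_TO_XMP.entrySet()) {
            String infoValue = info.get(pair.getKey());
            if (infoValue == null) {
                continue; // only entries present in the Info dictionary are compared
            }
            String xmpValue = xmp.get(pair.getValue());
            // Simplification: plain string equality on pre-normalized values.
            if (!infoValue.equals(xmpValue)) {
                problems.add(pair.getKey() + " mismatch between Document Info and XMP Metadata");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Author", "dave");
        info.put("Producer", "iText");

        Map<String, String> xmp = new LinkedHashMap<>();
        xmp.put("dc:creator", "dave"); // pdf:Producer deliberately left out

        for (String problem : mismatches(info, xmp)) {
            System.out.println(problem);
        }
    }
}
```

With no XMP packet at all (the "Metadata missing (XMP)" item), every populated Info entry would be reported this way, which would be consistent with the long list above.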



Leonard Rosenthol
2011-11-17 15:42:27 UTC
If only it were that simple...

* Embedded Fonts that meet the requirements (CIDSet, CharSet, Width matching, etc.)
* Calibrated Colors including an OutputIntent
* Limited Actions
* Limited Annots
And the list goes on....

Leonard Rosenthol
2011-11-17 16:20:18 UTC
If you are creating the PDF ENTIRELY with iText – then you can just use the setConformance() API and it will take care of the details for you. If you are starting with an existing PDF – then there aren't any options for iText at this time.

But that will ONLY get you PDF/A-1b.

PDF/A-1a requires that you properly structure & tag your content – something that iText will not (currently) do for you. So you will need to do all that work yourself if you want full conformance.

And then there's PDF/A-2, which iText doesn't currently support either – but most folks aren't there just yet…

Leonard

David Thielen
2011-11-17 17:33:17 UTC
Hi;

Is setConformance() in iText 5 only? We're on iText 2 and I can't find it anywhere. We do use iText to create the PDF so we should be ok on that part.

Thanks - dave


David Thielen
2011-11-18 16:12:16 UTC
I set it using PdfWriter.setPDFXConformance(PdfWriter.PDFA1B);

It's a lot closer now but I still get:

* Author mismatch between Document Info and XMP Metadata

* Creation date mismatch between Document Info and XMP Metadata

* Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)

* Last Modification Date mismatch between Document Info and XMP Metadata

* Metadata missing (XMP)

* PDF/A entry missing

* Producer mismatch between Document Info and XMP Metadata

Are these expected? Or do I need to set something else also?

Thanks - dave



Leonard Rosenthol
2011-11-17 17:52:37 UTC
2.x. Maybe it's setPDFAConformance.

However, as you know, 2.x is no longer supported.

Leonard

Leonard Rosenthol
2011-11-18 16:49:21 UTC
PDFAConformance. PDF/X is a completely separate standard!

From: David Thielen <***@windward.net<mailto:***@windward.net>>
Reply-To: Post here <itext-***@lists.sourceforge.net<mailto:itext-***@lists.sourceforge.net>>
Date: Fri, 18 Nov 2011 08:12:16 -0800
To: Post here <itext-***@lists.sourceforge.net<mailto:itext-***@lists.sourceforge.net>>
Subject: Re: [iText-questions] What is required to make a file PDF/A?

I set it using PdfWriter. setPDFXConformance(PdfWriter.PDFA1B);

It’s a lot closer now but I still get:

· Author mismatch between Document Info and XMP Metadata

· Creation date mismatch between Document Info and XMP Metadata

· Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)

· Last Modification Date mismatch between Document Info and XMP Metadata

· Metadata missing (XMP)

· PDF/A entry missing

· Producer mismatch between Document Info and XMP Metadata

Are these expected? Or do I need to set something else also?

Thanks - dave



David Thielen
2011-11-18 17:48:16 UTC
Permalink
I understand that. However, PdfWriter only has setPDFXConformance(); there is no setConformance() or setPDFAConformance(). And the documentation for setPDFXConformance() says:

Sets the PDF/X conformance level. Allowed values are PDFX1A2001, PDFX32002, PDFA1A and PDFA1B. It must be called before opening the document.

I'm clearly missing something but going through the javadoc I can't find anything else that discusses PDF/A.

Thanks - dave



Leonard Rosenthol
2011-11-18 18:08:14 UTC
Permalink
Then I am guessing whatever version of iText you are using is too old :(.

David Thielen
2011-11-18 22:48:07 UTC
Permalink
The javadoc for 5.1.3 is the same - just setPDFXConformance(). I think that is how it's set. But maybe version 2 didn't fully implement it.

Thanks - dave



TvT
2011-11-21 10:42:41 UTC
Permalink
Hi,

here's an example:
http://itextpdf.com/examples/iia.php?id=226

And you are right, the method to use is called 'setPDFXConformance'
(probably for historical reasons):
writer.setPDFXConformance(PdfWriter.PDFX1A2001);

You should upgrade to the latest version. If you can't do that,
try to use at least 2.1.7.
I checked the history of iText since 2.1.7 and didn't find any (major)
changes in the PDF/A area.
But maybe I overlooked something. Best check for yourself:
http://itextpdf.com/history/

Also try to run your code in the current version and verify the output...
David Thielen
2011-11-21 16:02:41 UTC
Permalink
Thank you - this is getting me a lot further. Now I'm getting an exception saying that the RGB colorspace is not allowed. It looks like we have to convert to CMYK for the colorspace.

So, a question: is it better to always use CMYK, or should we stick with RGB and only use CMYK if we need PDF/A output?

Thanks - dave
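
On the RGB question: as far as ISO 19005-1 goes, PDF/A-1 does not force CMYK. Device RGB is acceptable as long as the file carries a PDF/A output intent with an RGB ICC profile, which is what the preflight finding about a missing OutputIntent reported earlier in the thread is asking for. The "RGB not allowed" exception may instead come from using the PDFX1A2001 constant, since PDF/X-1a does not permit RGB, whereas PDFA1B is the PDF/A-1b value listed in the javadoc. The sketch below only shows one way to get hold of an sRGB profile, using the JDK's own java.awt.color classes. The class name is hypothetical, and whether the JDK's bundled profile satisfies a given validator is an assumption to check with preflight. Handing the bytes to the PDF library's output-intent call is not shown.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches the sRGB ICC profile that ships with the JDK.
final class SrgbProfileSketch {

    // Raw bytes of the JDK's built-in sRGB profile.
    static byte[] srgbProfileBytes() {
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        return profile.getData();
    }

    // The ICC header stores a 4-character colour space signature at offset 16.
    static String colorSpaceSignature(byte[] iccData) {
        return new String(iccData, 16, 4, StandardCharsets.US_ASCII);
    }

    // The ICC header stores the file signature 'acsp' at offset 36.
    static boolean looksLikeIccProfile(byte[] iccData) {
        return iccData.length >= 128
                && "acsp".equals(new String(iccData, 36, 4, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] icc = srgbProfileBytes();
        System.out.println("profile size in bytes: " + icc.length);
        System.out.println("colour space signature: '" + colorSpaceSignature(icc) + "'");
        System.out.println("valid ICC header: " + looksLikeIccProfile(icc));
    }
}
```

Checking the colour space signature before embedding is a cheap guard: an output intent whose profile is CMYK would not cover RGB content, and the reverse holds as well.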


-----Original Message-----
From: TvT [mailto:***@nepatec.de]
Sent: Monday, November 21, 2011 3:43 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

Hi,

here's an example:
http://itextpdf.com/examples/iia.php?id=226

And you are right, the method to use is called 'setPDFXConformance':
(Probably historical reasons?)
writer.setPDFXConformance(PdfWriter.PDFX1A2001);

You should better upgrade to the latest version. If you can't do that try to use at least 2.1.7.
I checked the history of iText since 2.1.7 and didn't find (major) changes in the PDF/A area.
But maybe i overlooked it. Best you check for yourself again:
http://itextpdf.com/history/

Also try to run your code in the current version and verify the output...
Post by David Thielen
The javadoc for 5.1.3 is the same - just setPDFXConformance(). I think that is how it's set. But maybe version 2 didn't fully implement it.
Thanks - dave
Sent: Friday, November 18, 2011 11:08 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Then I am guessing whatever version of iText you are using is too old :(.
Date: Fri, 18 Nov 2011 09:48:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Sets the PDF/X conformance level. Allowed values are PDFX1A2001, PDFX32002, PDFA1A and PDFA1B. It must be called before opening the document.
I'm clearly missing something but going through the javadoc I can't find anything else that discusses PDF/A.
Thanks - dave
Sent: Friday, November 18, 2011 9:49 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
PDFAConformance.  PDF/X is a completely separate standard!
Date: Fri, 18 Nov 2011 08:12:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I set it using PdfWriter. setPDFXConformance(PdfWriter.PDFA1B);
·         Author mismatch between Document Info and XMP Metadata
·         Creation date mismatch between Document Info and XMP Metadata
·         Device process color used but no PDF/A OutputIntent (253
matches on 4 pages)
·         Last Modification Date mismatch between Document Info and XMP Metadata
·         Metadata missing (XMP)
·         PDF/A entry missing
·         Producer mismatch between Document Info and XMP Metadata
Are these expected? Or do I need to set something else also?
Thanks - dave
Sent: Thursday, November 17, 2011 9:20 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
If you are creating the PDF ENTIRELY with iText - then you can just use the setConformance() API and it will take care of the details for you.  If you are starting with an existing PDF - then there aren't any options for iText at this time.
But that will ONLY get you PDF/A-1b.
PDF/A-1a requires that you properly structure & tag your content - something that iText will not (currently) do for you.  So you will need to do all that work yourself if you want full conformance.
And then there's PDF/A-2, which iText doesn't currently support either - but most folks aren't there just yet.
Leonard
Date: Thu, 17 Nov 2011 08:12:38 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I didn't know about pre-flight, that's cool.
· Convert to PDF/A-1a (sRGB)
· Convert to PDF/A-1b (sRGB)
· Author mismatch between Document Info and XMP Metadata
· CIDset in subset font missing (238 matches on 4 pages)
· Creation date mismatch between Document Info and XMP Metadata
· Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)
· Last Modification Date mismatch between Document Info and XMP Metadata
· MarkInfo missing
· Metadata missing (XMP)
· PDF/A entry missing
· Producer mismatch between Document Info and XMP Metadata
· Structured PDF: Structure tree root entry missing
The biggies seem to be the CIDset for the fonts and colors stored correctly. I'm guessing this is not a simple couple of hours to add in.
Is there a setting in iText to set these values or does this require a 3rd party app?
Thanks - dave
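Several of the Preflight messages listed above come in pairs: PDF/A-1 expects each Document Info entry to agree with its XMP counterpart, and expects the XMP packet to carry the PDF/A identification entries (pdfaid:part and pdfaid:conformance). With no XMP packet at all, every pair fails at once, which would explain the cluster of "mismatch" lines. What follows is a minimal plain-Java sketch of that kind of check, not Preflight's or iText's implementation: the class name, the method name and the use of plain string maps are assumptions made for illustration, and date values are assumed to have been normalized to the same text form before comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of iText or Acrobat.
class InfoXmpConsistencySketch {

    // Document Info key -> XMP property that PDF/A-1 expects to carry the same value.
    static final Map<String, String> CROSSWALK = new LinkedHashMap<String, String>();
    static {
        CROSSWALK.put("Author", "dc:creator");
        CROSSWALK.put("CreationDate", "xmp:CreateDate");
        CROSSWALK.put("ModDate", "xmp:ModifyDate");
        CROSSWALK.put("Producer", "pdf:Producer");
    }

    // Returns human-readable problems; an empty list means the two sides agree.
    // Assumption: both maps hold already-normalized strings (dates included).
    static List<String> findProblems(Map<String, String> info, Map<String, String> xmp) {
        List<String> problems = new ArrayList<String>();
        if (xmp.isEmpty()) {
            problems.add("Metadata missing (XMP)");
        }
        for (Map.Entry<String, String> pair : CROSSWALK.entrySet()) {
            String infoValue = info.get(pair.getKey());
            if (infoValue != null && !infoValue.equals(xmp.get(pair.getValue()))) {
                problems.add(pair.getKey() + " differs between Document Info and XMP");
            }
        }
        if (!xmp.containsKey("pdfaid:part") || !xmp.containsKey("pdfaid:conformance")) {
            problems.add("PDF/A entry missing");
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("Author", "dave");
        info.put("Producer", "iText");
        // No XMP packet at all: every Info entry is reported, plus the missing PDF/A entry.
        for (String problem : findProblems(info, new LinkedHashMap<String, String>())) {
            System.out.println(problem);
        }
    }
}
```

Read this way, writing a consistent XMP packet is likely to clear most of the metadata lines in one go, leaving the OutputIntent, CIDset and tagging items as separate work.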
AJ Weber
2012-02-14 15:34:37 UTC
Permalink
I know there are a lot of different Annotation types, and really I'm
interested in stamp/text annots, but in general, is there a way to open
a PDF and output the same PDF without any annotations?

I'm using RUPS to view a PDF that has some kind of text
annotation/stamp, but I can't find the actual object (even though it's a
one-page document and there aren't that many entries in RUPS at all!) to
determine exactly what my target object is, so forgive me for being a
little vague here.

-AJ
Leonard Rosenthol
2012-02-14 15:41:35 UTC
Permalink
Annots array on the Page dictionary. If you remove that object, you remove all annotations.

(For cleanliness, you may also want to remove any AcroForm dictionary from the Catalog object.)

Leonard


------------------------------------------------------------------------------
Keep Your Developer Skills Current with LearnDevNow!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-d2d
_______________________________________________
iText-questions mailing list
iText-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
AJ Weber
2012-03-28 20:51:25 UTC
Permalink
I'm back to this...sorry :)

OK, so I am taking an "original PDF" and setting the Annots array aside
by using:
PdfArray annotInfo = pg1Dict.getAsArray(PdfName.ANNOTS);

But when I go to put it back on a "new PDF" (using reader/stamper), with:
PdfDictionary pg1Dict = reader.getPageN(1);
pg1Dict.put(PdfName.ANNOTS, annotInfo);

I end up with a corrupt /Annots entry. I am obviously missing something
or some steps. Can anyone help with that (admittedly) limited amount of
info???

Thanks,
AJ
AJ Weber
2012-03-29 13:04:05 UTC
Permalink
I am thinking my problem is that I'm copying the existing /Annots array,
but it's not doing a "deep copy" of all the referenced annot
dictionaries? Plus, when I try to put it back, it looks like I'm
putting back an Array, but the original document actually had a
reference to the array (don't know if that matters that I accidentally
changed it from an indirect to direct reference)?

Can anyone point me in the right direction here?

Thanks,
AJ
1T3XT BVBA
2012-03-29 17:34:07 UTC
Permalink
Post by AJ Weber
I am thinking my problem is that I'm copying the existing /Annots array,
but it's not doing a "deep copy" of all the referenced annot
dictionaries?
You're not making a deep copy.
For instance: you're making a copy of an array like this:
[ 10 0 R 11 0 R 12 0 R ]
And somewhere in your document you have objects 10 0 obj, 11 0 obj, 12 0
obj followed by dictionaries that describe the annotation.
Post by AJ Weber
Plus, when I try to put it back, it looks like I'm
putting back an Array, but the original document actually had a
reference to the array (don't know if that matters that I accidentally
changed it from an indirect to direct reference)?
Now you're copying [ 10 0 R 11 0 R 12 0 R ] into another document that
may or may not have objects 10 0 obj, 11 0 obj, 12 0 obj. 10 could be a
stream, 11 a font dictionary, 12 a page dictionary. There is no way such
a PDF will be OK.
Post by AJ Weber
Can anyone point me in the right direction here?
Sorry, no time for that. Too tired from the iText Summit (which was very
interesting by the way).
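
The point about dangling references can be made concrete with a toy model. The sketch below is plain Java and deliberately does not use iText: Doc, Ref and deepCopy are hypothetical names invented for illustration, with a "document" reduced to a table of numbered objects. It shows that an array of references copied as-is resolves against whatever happens to sit at those numbers in the other file, while a deep copy brings the referenced dictionaries along and renumbers the references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not iText types.
class AnnotDeepCopySketch {

    // Stand-in for an indirect reference such as "10 0 R".
    static final class Ref {
        final int number;
        Ref(int number) { this.number = number; }
    }

    // Stand-in for a PDF file: a table mapping object numbers to objects.
    static final class Doc {
        final Map<Integer, Object> objects = new HashMap<Integer, Object>();
        private int nextNumber;
        Doc(int firstNumber) { nextNumber = firstNumber; }
        Ref add(Object value) {
            int number = nextNumber++;
            objects.put(number, value);
            return new Ref(number);
        }
        Object resolve(Ref ref) { return objects.get(ref.number); }
    }

    // Copies a value into target, following references and renumbering them.
    // The 'copied' map remembers old number -> new reference, so shared or
    // cyclic objects are copied only once.
    static Object deepCopy(Object value, Doc source, Doc target, Map<Integer, Ref> copied) {
        if (value instanceof Ref) {
            Ref old = (Ref) value;
            Ref fresh = copied.get(old.number);
            if (fresh == null) {
                fresh = target.add(null);            // reserve a number first
                copied.put(old.number, fresh);
                target.objects.put(fresh.number,
                        deepCopy(source.resolve(old), source, target, copied));
            }
            return fresh;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<Object>();
            for (Object item : (List<?>) value) {
                out.add(deepCopy(item, source, target, copied));
            }
            return out;
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<Object, Object>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(e.getKey(), deepCopy(e.getValue(), source, target, copied));
            }
            return out;
        }
        return value; // names, strings, numbers: nothing to follow
    }

    public static void main(String[] args) {
        Doc source = new Doc(10);
        Map<String, Object> annot = new LinkedHashMap<String, Object>();
        annot.put("Subtype", "/FreeText");
        List<Object> annots = new ArrayList<Object>();
        annots.add(source.add(annot));               // [ 10 0 R ] in the source

        Doc target = new Doc(10);
        target.add("a font dictionary");             // object 10 in the other file

        // Shallow: the same "10 0 R" now points at the font, not the annotation.
        System.out.println(target.resolve((Ref) annots.get(0)));

        // Deep: the annotation dictionary is copied and gets a new number.
        List<?> deep = (List<?>) deepCopy(annots, source, target, new HashMap<Integer, Ref>());
        System.out.println(target.resolve((Ref) deep.get(0)));
    }
}
```

In a real file an annotation usually also points back at its page (the /P entry), so a naive deep copy like this one would drag the source page along; a practical version would need to decide which keys to follow and which to rewrite.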
AJ Weber
2012-03-29 18:22:42 UTC
Permalink
1) Glad the summit was successful!!!

2) Thanks for confirming what I thought. So I have to copy the
individual PdfDictionaries (Annots), then basically add them back to the
new document. Which might make it just easier to copy the content and
create new ones, since I'm only concerned about TextBox/text annots
anyway, it makes the parsing of the data easier...

Thank you for taking the time to reply!
-AJ
------------------------------------------------------------------------------
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
1T3XT BVBA
2012-03-29 18:46:59 UTC
Permalink
Post by AJ Weber
1) Glad the summit was successful!!!
2) Thanks for confirming what I thought. So I have to copy the
individual PdfDictionaries (Annots), then basically add them back to the
new document. Which might make it just easier to copy the content and
create new ones, since I'm only concerned about TextBox/text annots
anyway, it makes the parsing of the data easier...
Another possibility is to take a look at the source code of classes such
as PdfCopyForm(Imp) and PdfCopyFields(Imp).
Although your approach may be easier to implement.
AJ Weber
2012-03-30 15:44:12 UTC
Permalink
Is there a simple way to copy the necessary data for a FreeText annot's
"defaultAppearance"?

Is this the stream in the /AP dictionary entry of an existing FreeText
annot?

Could I possibly copy that stream to a byte array and then put it back?
The only entry inside any that I have explored (with RUPS) that might
cause difficulty is that there seems to be a Font "dictionary" indirect
reference - which surprises me a little, but I guess I could look out
for that.

Thanks again,
AJ
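
For orientation: in the PDF specification the default appearance of a FreeText annotation is its /DA entry, which is a short string of content operators (for example "/Helv 12 Tf 0 g"), while /AP holds the rendered appearance stream together with its own resources. The font name in /DA is only a resource name, so the font dictionary it refers to most likely has to be carried over to the target file as well. The sketch below, in plain Java with hypothetical names, pulls the font name and size out of such a string; it assumes whitespace-separated tokens and is not iText's parser.

```java
// Hypothetical helper for illustration; not an iText class.
class DefaultAppearanceSketch {

    // Returns { fontResourceName, fontSize } taken from the operands of the
    // first Tf operator, or null when the string sets no font.
    static String[] fontAndSize(String da) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf")) {
                return new String[] { tokens[i - 2], tokens[i - 1] };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] font = fontAndSize("/Helv 12 Tf 0 g");
        System.out.println(font[0] + " at size " + font[1]);
    }
}
```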
AJ Weber
2012-02-14 15:54:27 UTC
Permalink
Post by AJ Weber
I'm using RUPS to view a PDF that has some kind of text
annotation/stamp, but I can't find the actual object (even
though it's a one-page document and there aren't that many entries in
RUPS at all!) to determine exactly what my target object is...
OK, I found the "offending text". It's in the Page's /Contents "stream"???

Background: the original PDF is a PDF/Image (actually created using
iText 2.1.4 by a third party, according to the metadata).

So is a user somehow inserting their comment-text as the plain-text
contents of the page (even though the original/image is still there)?
(I'm not entirely clear on what the /Contents stream is supposed to
traditionally be.)

I can't send the document, but I can take a screen-shot of the page's
subtree from RUPS if that helps (and I'm allowed to attach a jpeg or
something to the post).

Thanks again,
AJ
Leonard Rosenthol
2012-02-14 16:02:18 UTC
Permalink
Sure, it's possible that they are using some tool that adds text directly to the content instead of as an annotation. Perfectly valid.

In which case, removal is MUCH harder (but not impossible)

AJ Weber
2012-02-14 16:23:20 UTC
Permalink
Right. How do I determine if the text in the /Contents stream is the
intended contents of the page, or whether someone added it
after-the-fact? Ugh. Users... ;)
Leonard Rosenthol
2012-02-14 16:29:21 UTC
Permalink
You come up with your own heuristics, unfortunately, since there is nothing "specific" in the PDF.

AJ Weber
2012-02-14 16:41:33 UTC
Permalink
Yeah, I think in my case, I might compare the "length" of the Contents
stream with the XObject stream (and see if they are drastically
different -- I know with compression they would never be identical).

In fact, I guess I could also check the /ProcSet for the /ImageB tag,
and it's likely that there will be only one /XObject with a stream in it
as the image of the whole page...
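
Another heuristic along these lines, for pages that are expected to contain nothing but a scanned image, is to look for text operators in the decoded page content: a pure image page normally only positions and paints an XObject, so any BT ... ET block is suspicious. Below is a minimal plain-Java sketch of that idea. The class and method names are hypothetical, the input is assumed to be the already-decompressed content stream decoded as a string (in iText 5, PdfReader.getPageContent(int) should be one way to get those bytes), and inline image data (BI ... ID ... EI) is not handled, so treat it as a starting point rather than a complete parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not an iText class.
class ContentStreamTextHeuristic {

    // Operators that begin a text object or show text.
    private static final Set<String> TEXT_OPERATORS =
            new HashSet<String>(Arrays.asList("BT", "Tj", "TJ", "'", "\""));

    // True for characters that can be part of a keyword or name token.
    private static boolean isRegular(char c) {
        return !Character.isWhitespace(c) && "()<>[]{}/%".indexOf(c) < 0;
    }

    // Extracts operator keywords, skipping strings, names, numbers and comments.
    static List<String> operators(String content) {
        List<String> ops = new ArrayList<String>();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (c == '(') {                          // literal string, may nest
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char s = content.charAt(i);
                    if (s == '\\') {
                        i++;                         // skip the escaped character
                    } else if (s == '(') {
                        depth++;
                    } else if (s == ')') {
                        depth--;
                    }
                    i++;
                }
            } else if (c == '<') {
                if (i + 1 < n && content.charAt(i + 1) == '<') {
                    i += 2;                          // dictionary start "<<"
                } else {
                    while (i < n && content.charAt(i) != '>') {
                        i++;                         // hex string
                    }
                }
            } else if (c == '%') {
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;                             // comment runs to end of line
                }
            } else if (c == '/') {
                i++;
                while (i < n && isRegular(content.charAt(i))) {
                    i++;                             // name such as /Im0
                }
            } else if (Character.isLetter(c) || c == '\'' || c == '"') {
                int start = i;
                while (i < n && isRegular(content.charAt(i))) {
                    i++;
                }
                ops.add(content.substring(start, i));
            } else {
                i++;                                 // digits, whitespace, brackets
            }
        }
        return ops;
    }

    // True when the content contains any text operator.
    static boolean containsTextOperators(String content) {
        for (String op : operators(content)) {
            if (TEXT_OPERATORS.contains(op)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsTextOperators("q 612 0 0 792 0 0 cm /Im0 Do Q"));
        System.out.println(containsTextOperators("BT /F1 12 Tf 72 700 Td (Approved) Tj ET"));
    }
}
```

This only says that text is present, not who put it there; a scan with an OCR text layer would trip it too, so it is best combined with other signals such as the producer metadata.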
Leonard Rosenthol
2012-02-14 16:51:44 UTC
Permalink
ProcSets are optional (and deprecated) - so don't rely on those...

-----Original Message-----
From: AJ Weber [mailto:***@comcast.net]
Sent: Tuesday, February 14, 2012 11:42 AM
To: itext-***@lists.sourceforge.net
Subject: Re: [iText-questions] Strip Annotations?

Yeah, I think in my case, I might compare the "length" of the Contents stream with the XObject stream (and see if they are drastically different -- I know with compression they would never be identical).

In fact, I guess I could also check the /ProcSet for the /ImageB tag, and it's likely that there will be only one /XObject with a stream in it as the image of the whole page...
Post by Leonard Rosenthol
You come up with your own heuristics, unfortunately, since there is nothing "specific" in the PDF.
-----Original Message-----
Sent: Tuesday, February 14, 2012 11:23 AM
Subject: Re: [iText-questions] Strip Annotations?
Right. How do I determine if the text in the /Contents stream is the
intended contents of the page, or whether someone added it
after-the-fact? Ugh. Users... ;)
Post by Leonard Rosenthol
Sure, it's possible that they are using some tool that adds text directly to the content instead of as an annotation. Perfectly valid.
In which case, removal is MUCH harder (but not impossible)
-----Original Message-----
Sent: Tuesday, February 14, 2012 10:54 AM
Subject: Re: [iText-questions] Strip Annotations?
I'm using RUPS to view a PDF that has some kind of text annotation/stamp, but I can't find the actual object (even though it's a one-page document and there aren't that many entries in RUPS at all!) to determine exactly what my target object is...
OK, I found the "offending text". It's in the Page's /Contents "stream"???
Background: the original PDF is a PDF/Image (actually created using iText 2.1.4 by a third party, according to the metadata).
So is a user somehow inserting their comment-text as the plain-text contents of the page (even though the original/image is still there)?
(I'm not entirely clear on what the /Contents stream is supposed to traditionally be.)
I can't send the document, but I can take a screen-shot of the page's subtree from RUPS if that helps (and I'm allowed to attach a jpeg or something to the post).
Thanks again,
AJ
1T3XT BVBA
2012-02-14 16:32:59 UTC
Permalink
Post by AJ Weber
Right. How do I determine if the text in the /Contents stream is the
intended contents of the page, or whether someone added it
after-the-fact? Ugh. Users...
It depends on how the text was added.

For instance: if they used PdfStamper, the page's /Contents will be an
array of several content streams rather than a single stream.
Maybe the content you want to remove sits in one stream of that array,
which you could then remove.

Or maybe it was added as an XObject.
In that case, blanking out the content of that XObject would solve your
problem.

Or maybe it's really inside the main content stream.
In that case, you need to write a PDF syntax parser (I've written
several in the past) using PRTokeniser.
Let the parser copy all the PDF syntax except for the text operators
(and their operands) that draw the text you want to remove.
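
A minimal sketch of that last approach, written in Python rather than with
iText's PRTokeniser, and simplified: it drops every text object (BT ... ET)
instead of picking out one specific piece of text. It assumes an already
decoded content stream with no inline images and no unescaped nested
parentheses in strings. TOKEN, strip_text_objects and the /Im0 and /F1
resource names are illustrative only, not iText API.

```python
import re

# Simplified tokenizer for a decoded content stream. Assumptions: no inline
# images (BI ... ID ... EI) and no unescaped nested parentheses in strings.
TOKEN = re.compile(r"""
    %[^\r\n]*                      # comment, up to end of line
  | \((?:\\[\s\S]|[^\\()])*\)      # literal string, escapes allowed
  | <<|>>                          # dictionary delimiters
  | <[0-9A-Fa-f\s]*>               # hex string
  | [\[\]]                         # array delimiters
  | /[^\s()<>\[\]/%]*              # name, e.g. /F1
  | [^\s()<>\[\]/%]+               # number or operator
""", re.VERBOSE)


def strip_text_objects(content: str) -> str:
    """Copy every token except those between BT and ET (inclusive)."""
    kept = []
    in_text = False
    for token in TOKEN.findall(content):
        if token.startswith("%"):
            continue              # drop comments
        if token == "BT":
            in_text = True        # start of a text object
        elif token == "ET":
            in_text = False       # end of a text object
        elif not in_text:
            kept.append(token)    # everything else is copied through
    return " ".join(kept)


# Hypothetical page content: one full-page image followed by added text.
sample = "q 612 0 0 792 0 0 cm /Im0 Do Q\nBT /F1 12 Tf 72 720 Td (Please ET review) Tj ET\n"
print(strip_text_objects(sample))
```

Because strings are matched as whole tokens, the "ET" inside the text string
is not mistaken for the operator. A real implementation would also have to
cope with inline images, nested parentheses and compressed streams.
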

Have fun!
AJ Weber
2012-02-14 16:47:21 UTC
Permalink
In this case, as I said, I see that the original PDF was created with
iText. There is an XObject Dictionary with one Stream object ("/lm0" --
I'm verifying if that is always the name).

However, the end-user added their comment (and yes, I don't know why
they didn't use a specific comment tool) with some version of Acrobat --
that much I do know. I'm guessing they used the Text Edit tool or
something instead of the Commenting or Stamp options in that app.

Too bad we can't tell when a specific object was added to the file (I
CAN see that the modify date of the file is different from the create date).
Leonard Rosenthol
2012-02-14 16:52:19 UTC
Permalink
You COULD look to see if there are update sections in the PDF - and then examine what's in the update section...

AJ Weber
2012-02-14 17:03:15 UTC
Permalink
Do you know whether that would be indicated in RUPS? I don't see
anything regarding a specific section for updates. I can see that there
are missing XRef entries (numerically), and that the Contents stream is
one of the last entries, whereas the lm0 stream is the first.
Leonard Rosenthol
2012-02-14 17:10:06 UTC
Permalink
Open it up in a text editor and count the number of times you see %%EOF - each incremental update section ends with its own.

But no, RUPS won't help you see what differs...
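
If a text editor is awkward for a binary file, the same count can be scripted.
The Python sketch below is only a rough check under stated assumptions:
eof_offsets and the sample bytes are made up for illustration, and more than
one marker is a hint rather than proof, since the byte sequence can also occur
for other reasons (linearized files, for example, may contain more than one).

```python
from typing import List

EOF_MARKER = b"%%EOF"


def eof_offsets(pdf_bytes: bytes) -> List[int]:
    """Return the byte offset of every %%EOF marker in the file."""
    offsets = []
    pos = pdf_bytes.find(EOF_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = pdf_bytes.find(EOF_MARKER, pos + 1)
    return offsets


# Tiny stand-in for a real file: an original body plus one appended update.
sample = b"%PDF-1.4\n(original objects)\n%%EOF\n(appended objects)\n%%EOF\n"
offsets = eof_offsets(sample)
print(len(offsets))

# Everything after the first marker was appended later.
appended = sample[offsets[0] + len(EOF_MARKER):]
```

For a real document, read the bytes with open(path, "rb").read() and inspect
what follows the first marker to see which objects the update section added.
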

AJ Weber
2012-02-15 14:26:44 UTC
Permalink
Post by Leonard Rosenthol
Sure, it's possible that they are using some tool that adds text directly to the content instead of as an annotation. Perfectly valid.
In which case, removal is MUCH harder (but not impossible)
OK...if I need to remove a page's /Contents object (and thus stream),
can anyone point to a quick method to do that? Do I need to use one of
the "lower level" methods, or which class/method would be recommended?

Thanks again,
AJ
Leonard Rosenthol
2012-02-15 14:32:53 UTC
Permalink
You can't remove the entire stream - that would give you a blank page!

As Bruno said, you need to parse/analyze the page content and determine what is "good" and what is "bad".

Leonard


AJ Weber
2012-02-15 15:01:29 UTC
Permalink
It should not give me a blank page. The page is actually a scanned
image -- or somehow the entire mediabox is filled with an image of the
document (I say somehow, because the Producer info says iText 2.1.4, but
there's no actual Contents in the original document's page).

The "/Contents" of the page is actually what a user entered using the
Acrobat Std/Pro "Touch-Up Text Tool". That's all. The actual
text-content of the document isn't text at all; like I said, it's a
single image that fills the entire mediabox. Basically, they use that
tool instead of a more appropriate Annotation mechanism such as a
"stamp" or "text box".

Thus the /Contents and the actual document's content are entirely different.

Since the actual content of the document is an image, we are sending it
to an OCR step. If there is something in the /Contents, the OCR engine
assumes there is no need to OCR and the result is virtually the same
output PDF. I need to remove that /Contents object so the OCR engine
detects that it needs to OCR the underlying image rather than rely upon
the existing text.

So I would expect that if I CAN remove the /Contents object from a page,
and there is still an image filling the mediabox for that page, we would
still have that displayed (and OCR'ed correctly).
Leonard Rosenthol
2012-02-15 15:47:37 UTC
Permalink
AJ - you seem to be mixing human terminology and PDF terminology. A quick read of the relevant sections of the PDF standard will probably help.



Here is something I wrote for my upcoming book on PDF that might be helpful to you:

As described in the previous chapter, a PDF file is composed of one or more pages (of a fixed size), and the visible elements on each page come from either the page content or a series of annotations that sit on top (visibly) of the content. This chapter discusses the page content.



Page content is described using a special text-based syntax (related to, but different from, the PDF file syntax that you learned about in an earlier chapter), which is stored in the PDF inside a special type of stream object called a "Content Stream". The content syntax is derived from Adobe's PostScript language and consists of a series of operators and their operands, where each operand can be expressed as a standard PDF object.



Given the above, you have a PDF page that consists of a Content Stream which has operators that tell the PDF reader to draw an image and also draw some text. Think of it as something like the following (which doesn't represent reality, but should help you):

SaveState
SetImageLocation
DrawImage
RestoreState

SaveState
SetTextLocation
SetTextFontAndSize
DrawText
RestoreState



If you removed that entire Content Stream (the value of the /Contents on the Page object), you'd lose BOTH the image and the text. Since you only want to lose the text, you would need to scan/parse/analyze the Content Stream, find the "text parts" and remove them.
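
For a more concrete picture, the illustrative names above correspond roughly
to real content stream operators (q, cm, Do, Q, Td, Tf, Tj). The Python sketch
below maps a made-up content stream back to those names. It assumes a decoded
stream with simple strings; the sample stream, the /Im0 and /F1 resource names
and the BeginText/EndText labels are assumptions added for illustration.

```python
import re
from typing import List

# Real PDF operators behind the illustrative names used above.
# BeginText/EndText are added here; the outline above leaves them implicit.
OPERATOR_NAMES = {
    "q": "SaveState",
    "Q": "RestoreState",
    "cm": "SetImageLocation",    # concatenates a matrix that places the image
    "Do": "DrawImage",           # paints a named XObject
    "BT": "BeginText",
    "ET": "EndText",
    "Td": "SetTextLocation",
    "Tf": "SetTextFontAndSize",
    "Tj": "DrawText",
}

# Literal strings are matched whole so their contents are never read as operators.
TOKEN = re.compile(r"\((?:\\[\s\S]|[^\\()])*\)|[^\s()<>\[\]%]+")


def describe(content: str) -> List[str]:
    """List the illustrative name of each known operator, in stream order."""
    return [OPERATOR_NAMES[t] for t in TOKEN.findall(content) if t in OPERATOR_NAMES]


# Hypothetical page: a full-page image, then a short piece of added text.
sample = (
    "q 612 0 0 792 0 0 cm /Im0 Do Q "
    "q BT 72 720 Td /F1 12 Tf (Reviewed by AJ) Tj ET Q"
)
for name in describe(sample):
    print(name)
```

The point of the exercise: the operator that paints the image (Do) lives in
the same Content Stream as the text operators, which is why deleting the
whole stream would take the image with it.
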



Does that make more sense??



Leonard



-----Original Message-----
From: AJ Weber [mailto:***@comcast.net]
Sent: Wednesday, February 15, 2012 10:01 AM
To: itext-***@lists.sourceforge.net
Subject: Re: [iText-questions] Strip Annotations?



It should not give me a blank page. The page is actually a scanned image -- or somehow the entire mediabox is filled with an image of the document (I say somehow, because the Producer info says iText 2.1.4, but there's no actual Contents in the original document's page).



The "/Contents" of the page is actually what a user entered using the Acrobat Std/Pro "Touch-Up Text Tool". That's all. The actual text-content of the document isn't text at all; like I said, it's a single image that fills the entire mediabox. Basically, they use that tool instead of a more appropriate Annotation mechanism such as a "stamp" or "text box".



Thus the /Contents and the actual document's content are entirely different.



Since the actual content of the document is an image, we are sending it to an OCR step. If there is something in the /Contents, the OCR engine assumes there is no need to OCR and the result is virtually the same output PDF. I need to remove that /Contents object so the OCR engine detects that it needs to OCR the underlying image rather than rely on the existing text.



So I would expect that if I CAN remove the /Contents object from a page, and there is still an image filling the mediabox for that page, we would still have that displayed (and OCR'ed correctly).
Post by Leonard Rosenthol
You can't remove the entire stream - that would give you a blank page!
As Bruno said, you need to parse/analyze the page content and determine what is "good" and what is "bad".
Leonard
-----Original Message-----
Sent: Wednesday, February 15, 2012 9:27 AM
Subject: Re: [iText-questions] Strip Annotations?
Post by Leonard Rosenthol
Sure, it's possible that they are using some tool that adds text directly to the content instead of as an annotation. Perfectly valid.
In which case, removal is MUCH harder (but not impossible)
OK...if I need to remove a page's /Contents object (and thus stream), can anyone point to a quick method to do that? Do I need to use one of the "lower level" methods, or which class/method would be recommended?
Thanks again,
AJ
Bruno Lowagie
2012-02-15 16:01:19 UTC
Permalink
Post by Leonard Rosenthol
Here is something I wrote for my upcoming book on PDF
Aha, that's interesting news! I'll be watching the list for news about this!
AJ Weber
2012-02-15 20:06:12 UTC
Permalink
OK, I'm a little further down the road with this. I've used the text
extraction methods to check that I'm in the right place on the page, and
separately used PRTokeniser to check the content of the page.

However, if I can piece together which tokens of the content I want to
remove, how do I go about that? I don't see a way to basically read the
tokens and then filter-out and write only the ones I want to (a la a
PdfStamper type class).

Bonus question: If I want to "set aside" the Strings that I'm not
writing to the filtered-output-PDF, is there a way to determine more
information about the String tokens (page location, font info, etc.)?

Thanks again,
AJ
AJ Weber
2012-02-16 16:22:39 UTC
Permalink
So, now I'm trying to get the Contents of a specific page (of an
existing PDF) so that I can possibly modify it during copy (with either
PdfCopy or PdfStamper hopefully).

I am missing a step, because I can't get the contents of the actual
/Contents stream. I've tried various scenarios similar to the following:
    PdfDictionary dict = reader.getPageN(1);
    PdfIndirectReference str =
        dict.getAsIndirectObject(PdfName.CONTENTS);
    String testcontent = str.toString();

But these typically just get the /Contents map (like the properties of
length and filter -- this particular one just returns the dictionary
entry number). Once I have the /Contents entry, how do I get its stream?

I also tried the simpler reader.getPageContents(), but that gets the raw
bytes, and I'm not sure how to parse/read them from there. Is there a
way to apply the designated filter to "decode" the bytes to something
more understandable, maybe? (I have only seen FlateDecode used so far
for filter.)

Thanks for any help pointing me in the right direction.

-AJ
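
On the "decode the bytes" part of the question: FlateDecode data is zlib-compressed, so the JDK's own java.util.zip.Inflater can expand it. The sketch below is plain Java under that assumption and ignores any /DecodeParms predictor; the class name FlateDemo and the deflate() helper (used only to fabricate sample input) are made up for illustration. As far as I know, iText's PdfReader.getStreamBytes(PRStream) already applies the stream's filters for you, while getStreamBytesRaw() does not, so with iText you should rarely need to do this by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    /** Expands zlib-wrapped data, the format assumed here for /FlateDecode. */
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Only used to build sample input; a real stream comes out of the PDF. */
    public static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = deflate("BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(inflate(raw), StandardCharsets.US_ASCII));
    }
}
```

Once decoded, the bytes are ordinary content-stream syntax that a tokenizer can walk through.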
AJ Weber
2012-02-16 17:19:08 UTC
Permalink
Is there a way, maybe to use FilteredRenderListener class to insert into
the PdfCopy/Stamper such that as content is read from the reader, it is
filtered prior to copying to the output pdf???

That would be very cool!

-AJ
iText Info
2012-02-16 17:28:32 UTC
Permalink
Post by AJ Weber
Is there a way, maybe to use FilteredRenderListener class to insert into
the PdfCopy/Stamper such that as content is read from the reader, it is
filtered prior to copying to the output pdf???
That would be very cool!
That's exactly what I've tried to explain in an earlier mail!
I've written many different parsers that did exactly that!
BUT: they are custom parsers for custom purposes, and that's why you will
need to write a custom parser for your custom need.
This is an example of such code:
http://itext.svn.sourceforge.net/viewvc/itext/trunk/xtra/src/main/java/com/itextpdf/text/pdf/ocg/
AJ Weber
2012-02-16 17:57:23 UTC
Permalink
I am sorry. With Leonard's help yesterday, I started to realize that
your suggestion was exactly what I should look to do, but I was (am
still) struggling with how to implement it given the iText framework.
In other words, how to use a parser/tokeniser to manipulate an existing
PDF [Page] and output the filtered result.

Your example is excellent (as usual), thank you.

Sorry for being so obtuse, but can I ask about where this would be
inserted into the overall effort? Would I:
1) Open existing PDF in PdfReader;
2) Use similar classes to OCGParser/Remover to operate on the reader;
3) Create a PdfCopy or PdfStamper to then simply copy the reader to a
new output PDF (and I assume the changes made to the reader-interface
would be applied to the output)?

I was thinking that writing a "filter" class (extending RenderListener
or similar) and maybe a way to insert that as a sort-of callback for the
PdfCopy to use while copying the PDF (filtering dynamically as pages are
output) was what you had envisioned when naming the class ...Listener.

Thanks again,
AJ
iText Info
2012-02-16 18:40:20 UTC
Permalink
Post by AJ Weber
1) Open existing PDF in PdfReader;
Yes.
Post by AJ Weber
2) Use similar classes to OCGParser/Remover to operate on the reader;
Yes.
Post by AJ Weber
3) Create a PdfCopy or PdfStamper to then simply copy the reader to a
new output PDF (and I assume the changes made to the reader-interface
would be applied to the output)?
PdfStamper, NOT PdfCopy.

For instance:

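    import java.io.FileOutputStream;
    import java.io.IOException;
    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PRStream;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;

    public static void main(String[] args) throws IOException, DocumentException {
        PdfReader reader = new PdfReader("original.pdf");
        // MyParser is your own class, modeled on the OCG parser
        MyParser parser = new MyParser(args[0]);
        PdfDictionary page;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            page = reader.getPageN(i);
            // Assumes /Contents is a single stream, not an array
            PRStream stream = (PRStream) page.getAsStream(PdfName.CONTENTS);
            PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
            // The parser is expected to replace the stream data with the filtered content
            parser.parse(stream, resources);
        }
        reader.removeUnusedObjects();
        // The stamper writes out whatever the reader now holds
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("fixed.pdf"));
        stamper.close();
    }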

MyParser parses and copies all PDF syntax, except for some particular
part defined by args[0].
Note that this code snippet assumes that the /Contents element in the
page dictionary is a stream. In your case it could be an array in which
case the code will throw a ClassCastException.

I suggest that you start by removing everything from the OCG parser
keeping the dummy operator.
This should result in code that makes a copy of the original file.
Then you'd need to find out which operator is used to add the part you
want to remove.
Create a custom operator class that does exactly the same thing as the
dummy operator, except when the part you don't want to copy is encountered.
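
The "dummy operator plus one custom operator" design described above can be sketched without iText as follows. This is an illustration of the pattern only, not the actual OCGParser code: the names FilteringParser, Operator and COPY are made up, tokens are assumed to be whitespace-separated, and literal strings are assumed to be single words. Every operator is looked up in a table; anything without a custom entry falls through to the copy operator, so an empty table yields a plain copy of the input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilteringParser {

    /** Handles one operator together with the operands that preceded it. */
    public interface Operator {
        void process(String operator, List<String> operands, StringBuilder out);
    }

    /** The "dummy" operator: writes operands and operator through unchanged. */
    static final Operator COPY = (op, operands, out) -> {
        for (String operand : operands) {
            out.append(operand).append(' ');
        }
        out.append(op).append('\n');
    };

    private final Map<String, Operator> operators = new HashMap<>();

    public FilteringParser(String unwantedText) {
        // Custom operator: same as COPY unless the shown string is the unwanted one.
        operators.put("Tj", (op, operands, out) -> {
            boolean unwanted = !operands.isEmpty()
                    && operands.get(0).equals("(" + unwantedText + ")");
            if (!unwanted) {
                COPY.process(op, operands, out);
            }
        });
    }

    public String parse(String content) {
        StringBuilder out = new StringBuilder();
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (isOperand(token)) {
                operands.add(token);
            } else {
                operators.getOrDefault(token, COPY).process(token, operands, out);
                operands.clear();
            }
        }
        return out.toString();
    }

    /** Simplified: names, numbers and one-word literal strings count as operands. */
    private static boolean isOperand(String token) {
        char c = token.charAt(0);
        return c == '/' || c == '(' || c == '-' || c == '.' || Character.isDigit(c);
    }

    public static void main(String[] args) {
        FilteringParser parser = new FilteringParser("DRAFT");
        System.out.print(parser.parse("BT /F1 12 Tf (DRAFT) Tj (Invoice) Tj ET"));
    }
}
```

In the real thing the output of parse() would be written back into the page's content stream before the PdfStamper is closed.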
AJ Weber
2012-02-16 19:00:23 UTC
Permalink
Absolutely fantastic! Thank you very much!

BTW: Is there a specific section of the PDF Reference (or some online
resource) that describes what each of those "operators" is? I would
love to refer to that as well.

Thanks again!!!

-AJ
David Thielen
2011-11-21 18:40:15 UTC
Permalink
I just want to add thank you - have it all working great now.

Thanks - dave


-----Original Message-----
From: TvT [mailto:***@nepatec.de]
Sent: Monday, November 21, 2011 3:43 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

Hi,

here's an example:
http://itextpdf.com/examples/iia.php?id=226

And you are right, the method to use is called 'setPDFXConformance':
(Probably historical reasons?)
writer.setPDFXConformance(PdfWriter.PDFX1A2001);

You should upgrade to the latest version. If you can't do that,
try to use at least 2.1.7.
I checked the history of iText since 2.1.7 and didn't find any (major)
changes in the PDF/A area.
But maybe I overlooked something. Best you check for yourself again:
http://itextpdf.com/history/

Also try to run your code in the current version and verify the output...
Post by David Thielen
The javadoc for 5.1.3 is the same - just setPDFXConformance(). I think that is how it's set. But maybe version 2 didn't fully implement it.
Thanks - dave
Sent: Friday, November 18, 2011 11:08 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Then I am guessing whatever version of iText you are using is too old :(.
Date: Fri, 18 Nov 2011 09:48:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
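I understand that. However PdfWriter only has setPDFXConformance(), no setConformance() or setPDFAConformance(). And the documentation for it says:
Sets the PDF/X conformance level. Allowed values are PDFX1A2001, PDFX32002, PDFA1A and PDFA1B. It must be called before opening the document.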
I'm clearly missing something but going through the javadoc I can't find anything else that discusses PDF/A.
Thanks - dave
Sent: Friday, November 18, 2011 9:49 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
PDFAConformance.  PDF/X is a completely separate standard!
Date: Fri, 18 Nov 2011 08:12:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I set it using PdfWriter.setPDFXConformance(PdfWriter.PDFA1B);
· Author mismatch between Document Info and XMP Metadata
· Creation date mismatch between Document Info and XMP Metadata
· Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)
· Last Modification Date mismatch between Document Info and XMP Metadata
· Metadata missing (XMP)
· PDF/A entry missing
· Producer mismatch between Document Info and XMP Metadata
Are these expected? Or do I need to set something else also?
Thanks - dave
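
The metadata items in a list like the one above follow a pattern: PDF/A expects an XMP packet that carries the pdfaid:part and pdfaid:conformance properties, and expects Document Info entries to agree with their XMP counterparts (Author with dc:creator, Producer with pdf:Producer, CreationDate with xmp:CreateDate, ModDate with xmp:ModifyDate). A minimal sketch of that check in plain Java follows. It is not how Preflight or iText implement it; the class name MetadataCheck is made up, both sides are assumed to be already extracted into maps, and dates are assumed to be normalised to one format beforehand (the Info dictionary and XMP write dates differently).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataCheck {

    /** Info dictionary key -> XMP property expected to carry the same value. */
    static final Map<String, String> INFO_TO_XMP = new LinkedHashMap<>();
    static {
        INFO_TO_XMP.put("Author", "dc:creator");
        INFO_TO_XMP.put("Producer", "pdf:Producer");
        INFO_TO_XMP.put("CreationDate", "xmp:CreateDate");
        INFO_TO_XMP.put("ModDate", "xmp:ModifyDate");
    }

    /** Returns human-readable problems; pass null for xmp when there is no packet. */
    public static List<String> findProblems(Map<String, String> info, Map<String, String> xmp) {
        List<String> problems = new ArrayList<>();
        if (xmp == null) {
            problems.add("Metadata missing (XMP)");
            return problems;
        }
        if (!xmp.containsKey("pdfaid:part") || !xmp.containsKey("pdfaid:conformance")) {
            problems.add("PDF/A entry missing");
        }
        for (Map.Entry<String, String> pair : INFO_TO_XMP.entrySet()) {
            String infoValue = info.get(pair.getKey());
            // Only entries present in Document Info need a matching XMP value.
            if (infoValue != null && !infoValue.equals(xmp.get(pair.getValue()))) {
                problems.add(pair.getKey() + " mismatch between Document Info and XMP Metadata");
            }
        }
        return problems;
    }
}
```

For PDF/A-1b the expected identification values are part 1 and conformance B.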
Sent: Thursday, November 17, 2011 9:20 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
If you are creating the PDF ENTIRELY with iText - then you can just use the setConformance() API and it will take care of the details for you.  If you are starting with an existing PDF - then there aren't any options for iText at this time.
But that will ONLY get you PDF/A-1b.
PDF/A-1a requires that you properly structure & tag your content - something that iText will not (currently) do for you.  So you will need to do all that work yourself if you want full conformance.
And then there's PDF/A-2, which iText doesn't currently support either - but most folks aren't there just yet.
Leonard
Date: Thu, 17 Nov 2011 08:12:38 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I didn't know about pre-flight, that's cool.
· Convert to PDF/A-1a (sRGB)
· Convert to PDF/A-1b (sRGB)
I then double clicked on "Verify compliance with PDF/A-1a" and got:
· Author mismatch between Document Info and XMP Metadata
· CIDset in subset font missing (238 matches on 4 pages)
· Creation date mismatch between Document Info and XMP Metadata
· Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)
· Last Modification Date mismatch between Document Info and XMP Metadata
· MarkInfo missing
· Metadata missing (XMP)
· PDF/A entry missing
· Producer mismatch between Document Info and XMP Metadata
· Structured PDF: Structure tree root entry missing
The biggies seem to be the CIDset for the fonts and colors stored correctly. I'm guessing this is not a simple couple of hours to add in.
Is there a setting in iText to set these values or does this require a 3rd party app?
Thanks - dave
Sent: Thursday, November 17, 2011 8:44 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Post by TvT
no external referencing
Careful with that phrase as it's led to misunderstanding by non-technical people.
What you really mean to say is "no externally referenced resources/assets".
Leonard
Sent: Thursday, November 17, 2011 7:12 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
PDF/A-1a or PDF/A-1b or PDF/A-2?
2. Even if you take the simplest PDF/A-1b there is lots of stuff to consider. Best you read the PDF/A spec. (ISO 19005-1:2005 or ISO 19005-2:2011) No javascript, no external referencing, colors etc etc.
3. What probably acrobat is looking at is the PDF/A tag in the meta information. Probably if you set that one acrobat will say its PDF/A. A better check is the PDF/A preflight check acrobat professional is offering. It shows you which part of the spec you are missing. If all tests pass then you probably have a 95% compliant PDF/A document.
Regards,
ToM
I thought it was just embedding fonts but when we do that Acrobat says it is not PDF/A.
thanks - dave
Leonard Rosenthol
2011-11-21 19:09:45 UTC
Permalink
Depends on what ICC profile you specify for the output intent. If you use
a CMYK profile, then you can't use RGB data. If you use an RGB profile,
then you can't have CMYK data. Your choice.

Leonard
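
The rule above can be checked mechanically, because an ICC profile states its own colour space: bytes 16 to 19 of the profile header hold a four-character signature such as "RGB " or "CMYK". A minimal sketch in plain Java, covering only the two device colour spaces discussed in this thread; the class and method names are made up for illustration, and the header built in main() is synthetic, not a usable profile.

```java
import java.nio.charset.StandardCharsets;

public class OutputIntentCheck {

    /** Reads the 4-byte data colour space signature at offset 16 of an ICC header. */
    public static String profileColorSpace(byte[] iccProfile) {
        if (iccProfile.length < 20) {
            throw new IllegalArgumentException("too short to be an ICC profile");
        }
        return new String(iccProfile, 16, 4, StandardCharsets.US_ASCII).trim();
    }

    /** Device colour is acceptable only if it matches the output intent's profile. */
    public static boolean deviceColorAllowed(String deviceSpace, byte[] iccProfile) {
        String profile = profileColorSpace(iccProfile);
        switch (deviceSpace) {
            case "DeviceRGB":
                return profile.equals("RGB");
            case "DeviceCMYK":
                return profile.equals("CMYK");
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + deviceSpace);
        }
    }

    public static void main(String[] args) {
        byte[] fakeRgbHeader = new byte[128]; // synthetic header for demonstration only
        System.arraycopy("RGB ".getBytes(StandardCharsets.US_ASCII), 0, fakeRgbHeader, 16, 4);
        System.out.println(deviceColorAllowed("DeviceRGB", fakeRgbHeader));
        System.out.println(deviceColorAllowed("DeviceCMYK", fakeRgbHeader));
    }
}
```

So an "RGB colorspace is not allowed" exception would be consistent with a CMYK profile having been set as the output intent; switching to an RGB profile such as sRGB is the alternative to converting the content.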
Post by David Thielen
Thank you - this is getting me a lot further. Now I'm getting an
exception saying that the RGB colorspace is not allowed. It looks like we
have to convert to CMYK for the colorspace.
So question, is it better to always use CMYK or should we stick with RGB
and only use CMYK if we need PDF/A output?
Thanks - dave
David Thielen
2011-11-21 22:14:37 UTC
Permalink
Ok - thanks. Got it working with RGB so I'll stick with that.



-----Original Message-----
From: Leonard Rosenthol [mailto:***@adobe.com]
Sent: Monday, November 21, 2011 12:10 PM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?

Depends on what ICC profile you specify for the output intent. If you use
a CMYK profile, then you can't use RGB data. If you use an RGB profile,
then you can't have CMYK data. Your choice.

Leonard
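One way to see which case applies before choosing an output intent is to look at the ICC profile itself. The sketch below is not iText code. It only reads the "colour space of data" signature that the ICC specification stores at byte offset 16 of the profile header, and then applies the rule described in the message above. The class and method names are made up for illustration, and only the RGB and CMYK cases from the message are covered.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of iText: all names here are hypothetical.
final class OutputIntentCheck {

    // The ICC header stores the data colour space as a 4-byte ASCII
    // signature at offset 16, for example "RGB " or "CMYK".
    static String dataColorSpace(byte[] iccProfile) {
        if (iccProfile.length < 20) {
            throw new IllegalArgumentException("ICC header too short");
        }
        return new String(iccProfile, 16, 4, StandardCharsets.US_ASCII);
    }

    // Rule from the message above: device colour used in the page content
    // has to match the colour space of the output intent profile.
    static boolean deviceSpaceAllowed(String deviceSpace, byte[] iccProfile) {
        String signature = dataColorSpace(iccProfile);
        if ("DeviceRGB".equals(deviceSpace)) {
            return "RGB ".equals(signature);
        }
        if ("DeviceCMYK".equals(deviceSpace)) {
            return "CMYK".equals(signature);
        }
        return false; // other colour spaces are outside this sketch
    }

    public static void main(String[] args) {
        // A fake 128-byte header that only carries the RGB signature.
        byte[] fakeRgbHeader = new byte[128];
        byte[] sig = "RGB ".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(sig, 0, fakeRgbHeader, 16, 4);

        System.out.println(deviceSpaceAllowed("DeviceRGB", fakeRgbHeader));
        System.out.println(deviceSpaceAllowed("DeviceCMYK", fakeRgbHeader));
    }
}
```

In practice the byte array would come from the .icc file that is handed to the writer as the output intent, so the same check tells you whether the document's existing RGB content can stay as it is.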
Post by David Thielen
Thank you - this is getting me a lot further. Now I'm getting an
exception saying that the RGB colorspace is not allowed. It looks like we
have to convert to CMYK for the colorspace.
So question, is it better to always use CMYK or should we stick with RGB
and only use CMYK if we need PDF/A output?
Thanks - dave
-----Original Message-----
Sent: Monday, November 21, 2011 3:43 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Hi,
http://itextpdf.com/examples/iia.php?id=226
(Probably historical reasons?)
writer.setPDFXConformance(PdfWriter.PDFX1A2001);
You should upgrade to the latest version. If you can't do that, try
to use at least 2.1.7.
I checked the history of iText since 2.1.7 and didn't find (major) changes in the PDF/A area.
http://itextpdf.com/history/
Also try to run your code in the current version and verify the output...
Post by David Thielen
The javadoc for 5.1.3 is the same - just setPDFXConformance(). I think
that is how it's set. But maybe version 2 didn't fully implement it.
Thanks - dave
Sent: Friday, November 18, 2011 11:08 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Then I am guessing whatever version of iText you are using is too old :(.
Date: Fri, 18 Nov 2011 09:48:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I understand that. However PdfWriter only has setPDFXConformance(), no
setConformance() or setPDFAConformance(). And the documentation for
Sets the PDF/X conformance level. Allowed values are PDFX1A2001,
PDFX32002, PDFA1A and PDFA1B. It must be called before opening the
document.
I'm clearly missing something but going through the javadoc I can't
find anything else that discusses PDF/A.
Thanks - dave
Sent: Friday, November 18, 2011 9:49 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
PDFAConformance. PDF/X is a completely separate standard!
Date: Fri, 18 Nov 2011 08:12:16 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I set it using PdfWriter.setPDFXConformance(PdfWriter.PDFA1B);
* Author mismatch between Document Info and XMP Metadata
* Creation date mismatch between Document Info and XMP Metadata
* Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)
* Last Modification Date mismatch between Document Info and XMP Metadata
* Metadata missing (XMP)
* PDF/A entry missing
* Producer mismatch between Document Info and XMP Metadata
Are these expected? Or do I need to set something else also?
Thanks - dave
Sent: Thursday, November 17, 2011 9:20 AM
To: Post here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
If you are creating the PDF ENTIRELY with iText - then you can just use
the setConformance() API and it will take care of the details for you.
If you are starting with an existing PDF - then there aren't any options
for iText at this time.
But that will ONLY get you PDF/A-1b.
PDF/A-1a requires that you properly structure & tag your content -
something that iText will not (currently) do for you. So you will need
to do all that work yourself if you want full conformance.
And then there's PDF/A-2, which iText doesn't currently support either
- but most folks aren't there just yet.
Leonard
Date: Thu, 17 Nov 2011 08:12:38 -0800
Subject: Re: [iText-questions] What is required to make a file PDF/A?
I didn't know about pre-flight, that's cool.
* Convert to PDF/A-1a (sRGB)
* Convert to PDF/A-1b (sRGB)
* Author mismatch between Document Info and XMP Metadata
* CIDset in subset font missing (238 matches on 4 pages)
* Creation date mismatch between Document Info and XMP Metadata
* Device process color used but no PDF/A OutputIntent (253 matches on 4 pages)
* Last Modification Date mismatch between Document Info and XMP Metadata
* MarkInfo missing
* Metadata missing (XMP)
* PDF/A entry missing
* Producer mismatch between Document Info and XMP Metadata
* Structured PDF: Structure tree root entry missing
The biggies seem to be the CIDset for the fonts and colors stored
correctly. I'm guessing this is not a simple couple of hours to add in.
Is there a setting in iText to set these values or does this require a 3rd party app?
Thanks - dave
Sent: Thursday, November 17, 2011 8:44 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
Post by TvT
no external referencing
Careful with that phrase as it's led to misunderstanding by
non-technical people.
What you really mean to say is "no externally referenced
resources/assets".
Leonard
Sent: Thursday, November 17, 2011 7:12 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] What is required to make a file PDF/A?
PDF/A-1a or PDF/A-1b or PDF/A-2?
2. Even if you take the simplest PDF/A-1b there is lots of stuff to
consider. Best you read the PDF/A spec. (ISO 19005-1:2005 or ISO
19005-2:2011) No javascript, no external referencing, colors etc etc.
3. What probably acrobat is looking at is the PDF/A tag in the meta
information. Probably if you set that one acrobat will say its PDF/A. A
better check is the PDF/A preflight check acrobat professional is
offering. It shows you which part of the spec you are missing. If all
tests pass then you probably have a 95% compliant PDF/A document.
Regards,
ToM
I thought it was just embedding fonts but when we do that Acrobat says it is not PDF/A.
thanks - dave
Leonard Rosenthol
2012-02-16 21:01:44 UTC
Permalink
There is only one definitive standard - ISO 32000-1:2008. You can get it
from Adobe's website.

Leonard
Post by AJ Weber
Absolutely fantastic! Thank you very much!
BTW: Is there a specific section of the PDF Reference (or some online
resource) that describes what each of those "operators" is? I would
love to refer to that as well.
Thanks again!!!
-AJ
Post by iText Info
Post by AJ Weber
1) Open existing PDF in PdfReader;
Yes.
Post by AJ Weber
2) Use similar classes to OCGParser/Remover to operate on the reader;
Yes.
Post by AJ Weber
3) Create a PdfCopy or PdfStamper to then simply copy the reader to a
new output PDF (and I assume the changes made to the reader-interface
would be applied to the output)?
PdfStamper, NOT PdfCopy.
    public static void main(String[] args) throws IOException, DocumentException {
        PdfReader reader = new PdfReader("original.pdf");
        MyParser parser = new MyParser(args[0]);
        PdfDictionary page;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            page = reader.getPageN(i);
            PRStream stream = (PRStream) page.getAsStream(PdfName.CONTENTS);
            PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
            parser.parse(stream, resources);
        }
        reader.removeUnusedObjects();
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("fixed.pdf"));
        stamper.close();
    }
MyParser parses and copies all PDF syntax, except for some particular
part defined by args[0].
Note that this code snippet assumes that the /Contents element in the
page dictionary is a stream. In your case it could be an array in which
case the code will throw a ClassCastException.
I suggest that you start by removing everything from the OCG parser
keeping the dummy operator.
This should result in code that makes a copy of the original file.
Then you'd need to find out which operator is used to add the part you
want to remove.
Create a custom operator class that does exactly the same thing as the
dummy operator, except when the part you don't want to copy is
encountered.
AJ Weber
2012-02-21 17:14:16 UTC
Permalink
I'm apparently doing something wrong. I'm trying to append some content
directly to the page's existing content-stream (it's already formatted
with the correct operators & operands, because I removed it previously).

Not only is the updated content not being written, but my simple removal
of one docInfo property is also ineffective. However, a PDF [copy] is
being written, and no exceptions are being thrown.

Here's a snippet of the java method:

    if (docinfo.containsKey("IP_Text")) {
        String ipText = docinfo.get("IP_Text");
        docinfo.remove("IP_Text");

        PdfDictionary page = reader.getPageN(1);
        PRStream stream = (PRStream) page.getAsStream(PdfName.CONTENTS);
        byte[] contentBytes = PdfReader.getStreamBytes(stream);
        baos = new ByteArrayOutputStream(contentBytes.length + ipText.length());
        baos.write(contentBytes);
        baos.write(ipText.getBytes());
        baos.flush();
        baos.close(); //unnecessary?
        stream.setData(baos.toByteArray());

        copy = new PdfStamper(reader, new FileOutputStream(copyFilename));
        copy.setMoreInfo(docinfo);
        ...
        copy.close();
1T3XT BVBA
2012-02-22 07:08:09 UTC
Permalink
Post by AJ Weber
I'm apparently doing something wrong.
Your code is very strange. An outsider seeing the code out of context
has no idea what you're trying to achieve.
Post by AJ Weber
I'm trying to append some content
directly to the page's existing content-stream (it's already formatted
with the correct operators & operands, because I removed it previously).
What makes you think this will result in valid PDF syntax?
Post by AJ Weber
Not only is the updated content not being written,
Maybe it is written, but not visible when you open the file in Adobe Reader.
Post by AJ Weber
but my simple removal
of one docInfo property is also ineffective.
What is the docInfo property?
Post by AJ Weber
However, a PDF [copy] is
being written, and no exceptions are being thrown.
if (docinfo.containsKey("IP_Text")) {
String ipText = docinfo.get("IP_Text");
docinfo.remove("IP_Text");
What is docinfo? Is it a dictionary?
What is "IP_Text"?
Post by AJ Weber
PdfDictionary page = reader.getPageN(1);
PRStream stream =
(PRStream)page.getAsStream(PdfName.CONTENTS);
byte[] contentBytes = PdfReader.getStreamBytes(stream);
baos = new
ByteArrayOutputStream(contentBytes.length + ipText.length());
baos.write(contentBytes);
baos.write(ipText.getBytes());
baos.flush();
baos.close(); //unnecessary?
stream.setData(baos.toByteArray());
What is ipText? Is it valid PDF syntax?
Isn't it extremely dangerous to append a snippet of PDF syntax like this?
What are you trying to achieve?
Post by AJ Weber
copy = new PdfStamper(reader, new
FileOutputStream(copyFilename));
copy.setMoreInfo(docinfo);
...
copy.close();
Aha, docinfo is a dictionary. More specifically: metadata.
You're using a method to add more metadata in an attempt to remove
metadata...
Also: you are adding some existing metadata to the content stream of a page.

I have no clue about what you're trying to achieve.
I only have an opinion: what you're doing is dangerous. I wouldn't do it.
AJ Weber
2012-02-22 14:12:14 UTC
Permalink
I'm trying to append some content
directly to the page's existing content-stream (it's already formatted
with the correct operators & operands, because I removed it previously).
Post by 1T3XT BVBA
What makes you think this will result in valid PDF syntax?
Because I removed the exact block of PDF syntax from the stream in a
previous step (and although it could be different, in this particular
case, it was the last part of the Contents stream, so it's putting it
back exactly where it was).
Post by 1T3XT BVBA
Post by AJ Weber
Not only is the updated content not being written,
Maybe it is written, but not visible when you open the file in Adobe Reader.
You are right here. I checked the document with RUPS, and the stream
has the additional block restored to the end of it, but it is not being
displayed. I'm unclear why that is, since, like I said, it's identical
to what I originally removed, and in this case, it's in the exact same
location of the stream.
Post by 1T3XT BVBA
What is ipText? Is it valid PDF syntax? Isn't it extremely dangerous
to append a snippet of PDF syntax like this? What are you trying to
achieve?
It's the block of text extracted from the Contents stream in a previous
step, exactly as it was in the Contents stream (it's a text block
starting with BT and ending with ET). I needed a temporary place to
store it with the document; I considered creating an XObject and putting
it there, but it would not be referenced in the temporary PDF file, and
I was theorizing it could be stripped-out during an optimization, plus I
didn't see a way to add the raw PDF syntax with the iText methods.
Post by 1T3XT BVBA
Aha, docinfo is a dictionary. More specifically: metadata.
You're using a method to add more metadata in an attempt to remove
metadata...
Also: you are adding some existing metadata to the content stream of a page.
Is there a method to remove metadata from a document (instead of
setMoreInfo)?
Yes, the metadata "value" is, as I mentioned, already valid PDF syntax,
because I stored it there in a previous step.
Post by 1T3XT BVBA
I have no clue about what you're trying to achieve.
I only have an opinion: what you're doing is dangerous. I wouldn't do it.
I appreciate your advice (very much). My only other option would be to
insert the text back on the document using getOverContent (I guess), but
I would have to fully parse the text block (BT...ET) so I could
understand setting the font, location, text, etc. I thought extracting
the block and re-inserting it exactly as it was would save me that trouble.
AJ Weber
2012-02-22 14:44:30 UTC
Permalink
OK, although I would NOT dispute that what I'm doing is still
"dangerous", and am still open to alternative-suggestions, I found my
issue for this particular test:

When using the tokeniser, PdfContentParser and PdfLiteral methods to
parse the original stream (akin to the sample OCGParser.java), the
parentheses were removed from around the text literal string. Until I
flipped back and forth in RUPS between the stream of the original and
the "edited" version, I completely missed that the parentheses were
missing, so the PDF content was there, but the text was effectively
not. I added the parentheses back to the "Tj" operators' operand, and
it all came back fine.
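For anyone hitting the same problem, the fix described above can be sketched without iText at all. In PDF syntax a literal string is written between parentheses, and a backslash or parenthesis inside it is escaped with a backslash. The helper below is a hypothetical illustration (the class and method names are not iText API) and it assumes the text is already in the single-byte encoding the font expects. Hex strings and TJ arrays are not handled.

```java
// Illustrative helper, not part of iText: all names here are hypothetical.
final class PdfLiteralString {

    // Wraps raw text in parentheses and escapes the characters that would
    // otherwise end or corrupt a PDF literal string: '\', '(' and ')'.
    static String toLiteral(String raw) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Builds a "show text" instruction: the literal-string operand,
    // followed by the Tj operator.
    static String showText(String raw) {
        return toLiteral(raw) + " Tj";
    }

    public static void main(String[] args) {
        // prints: (Hello \(world\)) Tj
        System.out.println(showText("Hello (world)"));
    }
}
```

Escaping every parenthesis is more than the syntax strictly requires, since balanced pairs may appear unescaped, but it keeps the helper simple and the result is still valid.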

I still don't know how to remove metadata (versus add it with
setMoreInfo), and I agree that this is not the best way to do this
re-add of content. I think adding it to the "over-content" is probably
the best idea, but I have to figure out how to parse the font and
location info for the text block, and am starting to peruse the PDF
Reference Spec now.

Thanks again,
AJ
AJ Weber
2012-02-22 19:38:13 UTC
Permalink
This question stems from my previous question...

I'm trying to remove document metadata with code very similar to that in
iText In Action (2nd ed), Listing 12.2. That is, I'm retrieving the map
from the reader, and setting it on the stamper with .setMoreInfo().

The paragraph in the book immediately following the listing (Section
12.1.1, page 382 in my copy) says that I should be able to "...add,
remove or replace entries in the HashMap, and put the altered metadata
in the PDF using setMoreInfo()."

This doesn't seem to be working, and one of the comments from either
Bruno or Paulo (I'm not 100% sure who responds with the "1T3XT BVBA"
email address) seems to indicate that it isn't correct.

Which is wrong? Is there a bug in setMoreInfo, or is the book
incorrect? Does anyone know how I can update/remove entries in the
info-dictionary using stamper?

I'm using iText 5.1.1 with Java5.

Thanks,
AJ
AJ Weber
2012-02-22 19:48:06 UTC
Permalink
I answered my question...both are right!

To remove an entry, as per the javadocs, you need to update the map with
a null value, not use the map.remove method.

So the book is right: you can remove entries with setMoreInfo. But since
it's a little counterintuitive as to how to do that, I would recommend
that the 3rd Edition have a quick example or sentence identifying how. :)
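A small stand-alone model may make the difference clearer. It is not iText source code, and the class and method names are invented. It only assumes what is described above: the map passed to setMoreInfo() is merged over the existing info entries, and a null value means "delete this key".

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the behaviour described above; not iText source code.
final class MoreInfoModel {

    // Merges 'moreInfo' over the existing info entries. A null value means
    // "delete this key"; a key that is simply absent leaves the entry alone.
    static Map<String, String> merge(Map<String, String> existing,
                                     Map<String, String> moreInfo) {
        Map<String, String> result = new HashMap<String, String>(existing);
        for (Map.Entry<String, String> e : moreInfo.entrySet()) {
            if (e.getValue() == null) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<String, String>();
        existing.put("Title", "Report");
        existing.put("IP_Text", "(stored text block)");

        // map.remove(): the key is absent from the update, so it survives.
        Map<String, String> removed = new HashMap<String, String>(existing);
        removed.remove("IP_Text");
        System.out.println(merge(existing, removed).containsKey("IP_Text")); // true

        // null value: the key is explicitly deleted.
        Map<String, String> nulled = new HashMap<String, String>(existing);
        nulled.put("IP_Text", null);
        System.out.println(merge(existing, nulled).containsKey("IP_Text")); // false
    }
}
```

Applied to the earlier snippet, that would mean replacing docinfo.remove("IP_Text") with docinfo.put("IP_Text", null) before calling setMoreInfo(docinfo).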

Sorry to clog-up the list.

-AJ