Discussion:
[iText-questions] Mission-critical pdf with thousands of statement of accounts / Design considerations to be verified by the experts
m***@mw-informatik.ch
2009-11-10 13:25:52 UTC
Permalink
Hi Bruno, Paulo and all the other iText cracks,
I have to implement, for a big insurance company, a batch job that produces a PDF containing thousands of statements of account. This mission-critical application runs on IBM z/OS as a Java batch job (WebSphere XD). I would like to share my design considerations and issues with you and would appreciate your comments and recommendations. Many thanks in advance for taking the time to support me. I like the iText book and the comprehensive samples.

1. Requirements
The PDF includes up to 3'500 statements of account. A statement can have up to 50 pages; we do not expect the PDF to have more than 20'000 pages. A typical batch run creates a PDF with 3'500 statements with a maximum of 3 pages per statement.
- Batch job parameters: range of account numbers, period, 2 - 4 variable table columns (the fixed columns are debit, credit, balance).
- First page of a statement: page header with the address of the account owner, then a table with 5 to 7 columns, a table header and 30 body rows.
- Page two and upward: no page header; same table layout as page one (with table header), but 50 body rows.
- End of statement: an empty row followed by a total row for debit, credit and balance.
- Footer: each statement has its own page numbering (page x/y). The footer also has a fixed text (same content and position on all pages, independent of the statement).
- Table background: each odd table body row is shaded gray, including empty rows at the end of the table.
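
The page arithmetic implied by these requirements can be sketched in plain Java, independent of iText. This is only an illustration: the class and method names (StatementPaging, pageCount, pageLabels) are made up, and it assumes that the row count passed in already includes the empty row and the total row at the end of the statement. If the number of rows is known before rendering, the "y" of "page x/y" could be computed up front in this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Page arithmetic for one statement; the capacities come from the requirements above. */
final class StatementPaging {
    static final int ROWS_FIRST_PAGE = 30;   // first page also carries the address header
    static final int ROWS_OTHER_PAGES = 50;  // page two and upward

    /** Number of pages needed for the given number of table rows (always at least one). */
    static int pageCount(int rows) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must be >= 0");
        }
        if (rows <= ROWS_FIRST_PAGE) {
            return 1;
        }
        int remaining = rows - ROWS_FIRST_PAGE;
        // Ceiling division for the rows that spill over to the following pages.
        return 1 + (remaining + ROWS_OTHER_PAGES - 1) / ROWS_OTHER_PAGES;
    }

    /** Footer labels in the "page x/y" form, one per page of the statement. */
    static List<String> pageLabels(int rows) {
        int total = pageCount(rows);
        List<String> labels = new ArrayList<>();
        for (int page = 1; page <= total; page++) {
            labels.add("page " + page + "/" + total);
        }
        return labels;
    }
}
```

For example, a statement with 81 rows needs 30 + 50 + 1 rows on three pages, so StatementPaging.pageLabels(81) ends with "page 3/3".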

2. Design

Applying PdfStamper and PdfCopy
I intend to create the statement_of_accounts.pdf in several steps:
1. For each statement: prepare a statement document (statement_without_pageNumber.pdf). Add the "tableBackground_first_page" PdfTemplate to the first page and the "tableBackground_further_pages" PdfTemplate to the further pages via PdfPTableEvent handling.
2. For each statement: stamp the page number with PdfStamper (statement_with_pageNumber.pdf).
3. Add statement_with_pageNumber.pdf to statement_of_accounts_without_footer.pdf with PdfCopy.
4. Stamp the footer (PdfTemplate) with PdfStamper (statement_of_accounts.pdf = final PDF).
The first two documents (statement_without_pageNumber.pdf / statement_with_pageNumber.pdf) are temporary files that are reused for each statement.

3. My concerns

3.1. "table background" templates (first page / further pages)
Each statement_without_pageNumber.pdf (created in step 1) includes one template (statement with one page) or two templates (statement with more than one page). A typical batch run creates 3'500 statements, which are concatenated in step 3. Is it possible to share a template across the whole document to reduce the PDF size? Are there other (better?) solutions to implement the "table background"? Please be aware that the "table background" has to be calculated per batch job because of the variable columns.
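
Since the shading covers every odd body row, including the empty ones, the stripe geometry depends only on the batch parameters and not on the data of a single statement. A minimal sketch of that calculation in plain Java follows. The names TableBackground and Stripe, the fixed row height and all coordinates are assumptions for illustration, and "odd" is taken to mean rows 1, 3, 5, ... counted from the top.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes the gray stripes of a table background once per batch job. */
final class TableBackground {

    /** One gray rectangle: lower-left corner plus size, in points. */
    static final class Stripe {
        final float x;
        final float y;
        final float width;
        final float height;

        Stripe(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Returns one stripe per odd body row (1-based), counted from the top of the table.
     * PDF coordinates grow upward, so lower rows have smaller y values.
     */
    static List<Stripe> stripes(int bodyRows, float left, float top,
                                float tableWidth, float rowHeight) {
        List<Stripe> result = new ArrayList<>();
        for (int row = 1; row <= bodyRows; row += 2) {
            float bottomOfRow = top - row * rowHeight;
            result.add(new Stripe(left, bottomOfRow, tableWidth, rowHeight));
        }
        return result;
    }
}
```

With 30 body rows on the first page and 50 on the following pages, this yields 15 and 25 stripes respectively, so only two distinct backgrounds exist per batch run.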

3.2. Is step 4 (stamping a fixed page footer text via PdfTemplate) overkill?
Rationale: the template with the footer text appears only once in the final PDF (statement_of_accounts.pdf).

4. Do you have other concerns / issues with the design ?

Again, many thanks for your help.
Kind regards,
Martin
Mike Marchywka
2009-11-10 13:47:19 UTC
Permalink
I am hoping to elicit input from Leonard, but I have been fighting a bunch of bloated web sites that impede automation lately, and this request sounds like it is headed down that path. Since I have accused Adobe of being a big offender in this regard, let me supply some additional thoughts for comment or rebuttal.

> 1. Requirements
> PDF includes up to 3'500 statement of accounts, a statement of account can have up to 50 pages, we do not expect that the PDF has more than 20'000 pages. A typical batch-run creates a PDF with 3'500 statement of accounts with a maximum of 3 pages per statement of account. The parameters for the batch job are: range

So, what HUMAN will actually read this? Wouldn't you be better off publishing something machine readable, maybe some summary statistics and a zip file containing the database to let an interested reader run his own queries and analyses on the DB? When is the last time someone you knew read and understood 20,000 pages of accounting information? Have you ever tried to open a 20,000 page PDF file on your home PC, or even download one? You could probably find a free DB to give to users and write a front end rather than making a dead-end pdf file too big for anyone to read. You can convert the account info to text or xml for that matter and combine a db with a report generator. If you must generate a 20,000 page PDF, do you plan on adding information to allow users to extract the numbers in a useful format?
Leonard Rosenthol
2009-11-10 14:01:17 UTC
Permalink
I think you clearly have asked the right question, Mike - what is the purpose of the document?

If it is to be printed (either now or some point in the future), then there is no question that PDF is the correct format. If the format of the content is mandated by law (ala government forms), then there is no question that PDF is the correct format. If the content is to be viewed by some unknown set of users on an unknown set of computers, then again, PDF is the correct format.

BUT if the data is for computer consumption at some point, then PDF may not be proper - or at least PDF w/o the proper structure/tagging.

So yes, it's clearly worth asking the question and would love to know more.

FYI - on a related note, Adobe sponsored a conference last week in DC about "open government" as part of its work with the US Gov't on such initiatives. You can read more at <http://www.adobe.com/opengov/>. A big part of this is working with folks on the proper production of PDFs containing rich, extractable content.

Leonard

_______________________________________________
iText-questions mailing list
iText-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
Mike Marchywka
2009-11-10 14:29:17 UTC
Permalink
Post by Leonard Rosenthol
> I think you clearly have asked the right question, Mike - what is the purpose of the document?
> If it is to be printed (either now or some point in the future), then there is no question that PDF is the correct format. If the format of the content is mandated by law (ala government forms), then there is no question that PDF is the correct format.

If you don't mind, where is it mandated and are there any requirements on how it is used? As you have pointed out, it can be quite eh, er well, "versatile" (meaning "PDF" could just include a collection of glorified and reformatted TIFF images or the underlying data, hard to know).

> If the content is to be viewed by some unknown set of users on an unknown set of computers, then again, PDF is the correct format.

Well, I would take exception to this as a blanket statement. There are many very good alternatives that use fewer resources and are more versatile, text and HTML being two examples. For straight text data, I'm still not sure what PDF brings to the table here, and it can, if not used properly, really make resource requirements explode. Sure, if appearance is more important than information you may be able to make a case for PDF over HTML, but I wouldn't let this go without a little thought.

> BUT if the data is for computer consumption at some point, then PDF may not be proper - or at least PDF w/o the proper structure/tagging.

I would have been in the former camp, but you have convinced me that PDF is salvageable for automated data processing (what computers are supposed to do LOL).

> So yes, it's clearly worth asking the question and would love to know more.
> FYI - on a related note, Adobe sponsored a conference last week in DC about "open government" as part of its work with the US Gov't on such initiatives. You can read more at <http://www.adobe.com/opengov/>. A big part of this is working with folks on the proper production of PDFs containing rich, extractable content.

Thanks, that may be of interest!
Leonard Rosenthol
2009-11-10 14:48:36 UTC
Permalink
The main reason that, for this specific case, I would pick PDF over HTML or TEXT is simple - pagination!

As described by the user who posted the question, the information is clearly a collection of SEPARATE statements that are combined into a single document. As such, there needs to be a clear delineation between statements, such as a page break. Neither HTML nor TEXT has that concept.

Leonard

Martin Weiss
2009-11-10 17:34:43 UTC
Permalink
Hi all iText cracks,
many thanks for your valuable comments. The size of the PDF with the statement(s) of
account varies depending on the job input definition (a single account or a
range of accounts): 1 page to 3000 pages. If the PDF is small, it can be
downloaded from the IBM host to the workstation and attached to a mail, etc. A
very big PDF should only be printed. That said, I am unsure how to start a
print job without downloading the file to the workstation... any ideas and
further comments are much appreciated.
Thanks again for taking the time to support me.
Martin
Leonard Rosenthol
2009-11-10 17:52:39 UTC
Permalink
PDF doesn't support a "streaming model", so you will need to produce the entire PDF before you can start printing it. However, if you know in advance that the document is going to be large, you could produce it in "chunks" (smaller pieces) and then just print each of those in turn.
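
The chunking idea could be sketched as follows. This is only an illustration in plain Java, independent of iText: PrintChunker, chunk and the page limit are made-up names and values, and the sketch assumes that a statement must never be split across two files, so a statement larger than the limit gets a chunk of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups statements into print chunks without splitting any statement. */
final class PrintChunker {

    /**
     * @param pagesPerStatement page count of each statement, in output order
     * @param maxPagesPerChunk  page budget of one chunk (hypothetical tuning value)
     * @return lists of statement indexes, one list per chunk
     */
    static List<List<Integer>> chunk(int[] pagesPerStatement, int maxPagesPerChunk) {
        if (maxPagesPerChunk < 1) {
            throw new IllegalArgumentException("maxPagesPerChunk must be >= 1");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int pagesInCurrent = 0;
        for (int i = 0; i < pagesPerStatement.length; i++) {
            int pages = pagesPerStatement[i];
            // Close the current chunk if this statement would exceed the budget.
            if (!current.isEmpty() && pagesInCurrent + pages > maxPagesPerChunk) {
                chunks.add(current);
                current = new ArrayList<>();
                pagesInCurrent = 0;
            }
            current.add(i);
            pagesInCurrent += pages;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Each chunk would then be written as its own PDF and sent to the printer in turn, while the statement order is preserved across chunks.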

Leonard

Martin Weiss
2009-11-10 18:13:46 UTC
Permalink
Hi Leonard,
thanks for your answer. We need the pdf as a whole. However, we know in advance
if the document is going to be large. How can we start the print job without
downloading the whole pdf to a workstation ?
Kind regards,
Martin
Leonard Rosenthol
2009-11-10 18:41:30 UTC
Permalink
Depends where you are printing - and how. What type of printer? Will it take PDF directly or do you need some form of "digital front end" (DFE) or even application (aka Acrobat/Reader)?

Leonard

1T3XT info
2009-11-10 14:06:12 UTC
Permalink
Post by m***@mw-informatik.ch
1. For each statement: prepare a statement document (statement_without_pageNumber.pdf).
Are there any interactive features involved in
statement_without_pageNumber.pdf?

I mean: does the document have more features when viewed digitally than
when printed on paper?

Using PdfCopy will result in a huge file size.
PdfSmartCopy will make sure that the background is reused
(but it will use more memory and CPU).
But if the answer to the above questions is "no",
I'd advise changing the order of the different operations (can you
postpone adding the background to statement_without_pageNumber.pdf?) and
using PdfWriter and PdfImportedPage to create the final document.

Of course: if you can't postpone adding the background, you'd need
to find a way to make sure it's added only once instead of X times.
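
The general idea of "added only once instead of X times" can be illustrated with content-based deduplication: identical byte content is stored once and referenced by id afterwards, at the cost of hashing every candidate. The following plain Java sketch is conceptual only and is not iText code; ResourcePool and intern are hypothetical names, and SHA-256 is just one possible choice of digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Stores each distinct byte content once and hands out a stable id for it. */
final class ResourcePool {
    private final Map<String, Integer> idByDigest = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of an identical, already stored resource, or registers a new one. */
    int intern(byte[] content) {
        String key = sha256Hex(content);
        Integer existing = idByDigest.get(key);
        if (existing != null) {
            return existing;           // duplicate: reuse the earlier object
        }
        int id = nextId++;
        idByDigest.put(key, id);
        return id;
    }

    /** Number of distinct resources stored so far. */
    int size() {
        return idByDigest.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Interning the same background bytes for 3'500 statements leaves a single entry in the pool, which is the effect wanted for the final document; the extra hashing work is in line with the memory and CPU cost mentioned above.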
Martin Weiss
2009-11-10 17:58:10 UTC
Permalink
Hi all iText cracks,
many thanks for your valuable comments and advice. Based on your comments I
have redesigned the application (see below): there is only one FileOutputStream (step 3,
for the final document); steps 1 and 2 use ByteArrayOutputStream. Step 3 uses
PdfWriter and PdfImportedPage instead of PdfCopy (performance enhancement).
The following fact influences my design: (I guess...) I have to stamp per
statement of account (page x/y; the first page of a statement has a different table
background template than the following pages within the statement). My concern:
how can I prevent that the templates for the first resp. further pages are only
included ONCE in the final document ? Any help would be very appreciated.

Thanks a lot for your support.
Martin

Step 1 (per statement: unstamped)

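/* iText 2.x: the classes live in com.lowagie.text and com.lowagie.text.pdf */
Document document = new Document();
ByteArrayOutputStream baosUnstamped = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, baosUnstamped);
document.open();
/* the direct content belongs to the writer, not to the document */
PdfContentByte pdfContentByte = writer.getDirectContent();
int numberOfColumns = ...;
PdfPTable table = new PdfPTable(numberOfColumns);
/* ... set the total width and add the header and body cells ... */
table.writeSelectedRows(0, -1, ... , ... , pdfContentByte);
document.close();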

Step 2 (per statement: stamped)

PdfReader unstampedReader = new PdfReader(baosUnstamped.toByteArray());
ByteArrayOutputStream baosStamped = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(unstampedReader, baosStamped);
int pageNum = ...;
PdfContentByte underContent = stamper.getUnderContent(pageNum);
PdfTemplate templateFirstPage = ...;
PdfTemplate templateSecondPage = ...;
/* for first page within statement of account */
underContent.addTemplate(templateFirstPage, ... , ...);
/* for page two and upward within statement of account */
underContent.addTemplate(templateSecondPage, ... , ...);
PdfContentByte overContent = stamper.getOverContent(pageNum);
overContent.beginText();
overContent.setFontAndSize(... , ...); /* a font must be set before showText */
overContent.setTextMatrix(... , ...);
String pageText = ...; /* page x/y */
overContent.showText(pageText);
overContent.endText();
stamper.close(); /* flushes the stamped PDF into baosStamped */

Step 3

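FileOutputStream fos = new FileOutputStream(...);
Document finalDocument = new Document();
PdfWriter finalWriter = PdfWriter.getInstance(finalDocument, fos);
finalDocument.open();
PdfContentByte finalContent = finalWriter.getDirectContent();

/* per stamped statement */
PdfReader stampedReader = new PdfReader(baosStamped.toByteArray());
int pageNum = ...;
/* for each stamped page */
finalDocument.newPage();
PdfImportedPage importedPage = finalWriter.getImportedPage(stampedReader, pageNum);
finalContent.addTemplate(importedPage, 0, 0);

/* after the last statement */
finalDocument.close();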
Martin Weiss
2009-11-10 18:05:52 UTC
Permalink
Sorry for the typo:

My concern:
how can I ensure that the templates for the first resp. further pages are
included only ONCE in the final document (no duplications) ? Any help would be
very appreciated.
Thanks a lot for your support.
Martin
Kevin Brown
2009-11-10 20:24:05 UTC
Permalink
I would examine the fact that you intend on using PDF in a mission critical
application dealing with print. In my experience in the high-end print
market, with significant experience in insurance and statements, I have
never seen a customer desire PDF as the print format output for such an
application. Typically, large batch print runs are more suited for
Postscript, AFP or PPML where print vendors have optimized the
interpretation of the content of such files. These printers stage reusable
content in special ways, ripping them to memory and reusing these assets
while outputting pages. The key in such print applications is to drive the
printer at its highest speed and if the print stream is not optimized, it
will not happen. While some printers may also attempt to optimize, or have optimized,
such things for PDF, far fewer have done so than for these other, more
widely used print formats.

Within the insurance industry, it would be much more common to have a
dual-batch processing model where statements are rendered simultaneously to
PDF for emailing/archiving and to Postscript or AFP for batch print.

Kevin Brown
Leonard Rosenthol
2009-11-10 21:00:09 UTC
Permalink
What you describe is indeed "old school" high volume printing - but that's the way of the past, and you can see that in how the industry (or at least those working with anything other than Black & White text-only content) is moving.

Certainly when the PDF needs to be converted to some other PDL (such as Postscript), the performance suffers - but when used with native PDF printers and RIPs the performance will exceed that of other formats and give you greater graphical richness.

In fact, I am currently doing a final review on ISO 16612-2 - known as PDF/VT (PDF for Variable and Transactional Printing) - which is specifically targeted at this market segment.

Leonard

Kevin Brown
2009-11-10 21:22:06 UTC
Permalink
Leonard:

I do agree with you and cannot argue as the industry should and will move.
The real question for the OP is/was ... does the customer he is working with
have such hardware/RIP/equipment? Is that what this is targeted for?

It is a *very* new, moving set of standards and I have yet to see *any* real
implementation in most all major insurance/financial industry customers. And
given the state of their economies, I would doubt they are tooling up to
spend revenues in an area that already works for them.

Kevin Brown

-----Original Message-----
From: Leonard Rosenthol [mailto:***@adobe.com]
Sent: Tuesday, November 10, 2009 1:00 PM
To: '***@xportability.com'; itext-***@lists.sourceforge.net
Subject: RE: [iText-questions] Mission-critical pdf with thousands of
statement of accounts / Design considerations to be verified by the experts

What you describe is indeed "old school" high volume printing - but that's
the way of the past and you can see this as the industry (or at least those
working with anything other than Black & White text-only content) is moving.

Certainly when the PDF needs to be converted to some other PDL (such as
Postscript), the performance suffers - but when used with native PDF
printers and RIPs the performance will exceed that of other formats and give
you greater graphical richness.

IN fact, I am currently doing a final review on ISO 16612-2 - known as
PDF/VT (PDF for Variable and Transactional Printing) which is specifically
targeted at this market segment.

Leonard

-----Original Message-----
From: Kevin Brown [mailto:***@xportability.com]
Sent: Tuesday, November 10, 2009 3:24 PM
To: itext-***@lists.sourceforge.net
Subject: Re: [iText-questions] Mission-critical pdf with thousands of
statement of accounts / Design considerations to be verified by the experts

I would examine the fact that you intend on using PDF in a mission critical
application dealing with print. In my experience in the high-end print
market, with significant experience in insurance and statements, I have
never seen a customer desire PDF as the print format output for such an
application. Typically, large batch print runs are more suited for
Postscript, AFP or PPML where print vendors have optimized the
interpretation of the content of such files. These printers stage reusable
content in special ways, ripping them to memory and reusing these assets
while outputting pages. The key in such print applications is to drive the
printer at its highest speed and if the print stream is not optimized, it
will not happen. While some printers may also attempt to or have optimized
such things for PDF, it is much fewer than have done for these other more
widely used print formats.

Within the insurance industry, it would be much more common to have a
dual-batch processing model where statements are rendered simultaneously to
PDF for emailing/archiving and to Postscript or AFP for batch print.

Kevin Brown

wasegraves
2009-11-10 21:40:53 UTC
Permalink
I could be mistaken; but I didn't see anything in Martin's OP suggesting the application was a "Print" application.

That said, I offer the following as evidence the insurance industry is still moving very slowly:
Mike Marchywka
2009-11-10 21:30:57 UTC
Permalink
[ sorry if this format is bad, hotmail has been a mess lately ... ]






----------------------------------------
Date: Tue, 10 Nov 2009 12:24:05 -0800
Subject: Re: [iText-questions] Mission-critical pdf with thousands of statement of accounts / Design considerations to be verified by the experts
> I would examine the fact that you intend on using PDF in a mission critical
> application dealing with print. In my experience in the high-end print

Well, at least feeding a printer suggests human readability as a goal. LOL.
Often it seems people generate huge PDF files thinking they have used some
modern standard and are therefore doing good, but they end up taking out
computer-readable information and even impeding human readability in many
cases due to size and bandwidth needs, when formatting may be extraneous and
just an expensive nicety.

> Within the insurance industry, it would be much more common to have a
> dual-batch processing model where statements are rendered simultaneously to
> PDF for emailing/archiving and to Postscript or AFP for batch print.

At least in this usage, making a single PDF where it is possible for
the printer to re-use things that occur a zillion times makes sense
instead of sending the same logo in thousands of smaller files for
each statement. I don't understand the paper handling mechanics or
any of the dead tree stuff. My concern is creating a single huge PDF
that has less information than the source data and is difficult to use
for online viewing.

> Kevin Brown
Kevin Brown
2009-11-10 21:49:52 UTC
Permalink
Maybe this has digressed and should be moved offline, as I am sure the
majority of iText users are not necessarily interested. If the OP and Mike
and Leonard wish to do so, I am all for it. Direct email to me is fine and
I'm always happy to impart my 2 cents (for what it's worth).

PS: Bill, wow. And that was to my point. If it isn't broke, they are very
unlikely to fix it.

Kevin Brown
Kevin Brown
1970-01-01 00:00:00 UTC
Permalink
13 0 obj
<</Subject ()
/CreationDate (D:20091110150330-06'00')
/Title (Generated PDF Document)
/Author (CheckFree i-Solutions)
/Producer (iText by lowagie.com \(r1.00 - ps122\))
/Creator ( 5.4.0)
/ModDate (D:20091110150330-06'00')
>>
endobj

Cheers,
Bill Segraves


P.S. I shudder to think of this as a "print" application. Can you imagine a 10,000-page printed report, i.e., two cases of paper? It seems the distribution of printed reports might be the part that MAKES this mission-critical. ;-)

....
Martin Weiss
2009-11-11 08:33:57 UTC
Permalink
Hi all iText cracks,

It seems that I have started a strategic discussion about the usage of PDFs,
and I appreciate the various comments. The fact is, the insurance company decided
to create a PDF with statements of account, in Java with iText on IBM z/OS
(WebSphere XD). See the requirements at the end of this mail. I have to implement
the application and intend to apply best iText practice. Please let me know if
you doubt that we will succeed with the chosen approach (PDF file size, iText, ...).

Based on your comments I have redesigned the application (see below): there is
only one FileOutputStream (step 3, for the final document); steps 1 and 2 use
ByteArrayOutputStream. Step 3 uses PdfWriter and PdfImportedPage instead of
PdfCopy (performance enhancement). The following fact influences my design: (I
guess...) I have to stamp per statement of account (page x/y; the first page of a
statement has a different table background template than the following pages
within the statement). My concern: how can I ensure that the templates for the
first and the further pages are included only ONCE in the final document? Any
help would be much appreciated.

Thanks a lot for your support.
Martin

Step 1 (per statement: unstamped)

Document document = new Document();
ByteArrayOutputStream baosUnstamped = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, baosUnstamped);
document.open();
/* the direct content is obtained from the writer, not from the document */
PdfContentByte pdfContentByte = writer.getDirectContent();
int numberOfColumns = ...;
PdfPTable table = new PdfPTable(numberOfColumns);
/* writeSelectedRows needs an absolute table width */
table.setTotalWidth(...);
table.writeSelectedRows(0, -1, ... , ... , pdfContentByte);
document.close();

Step 2 (per statement: stamped)

PdfReader unstampedReader = new PdfReader(baosUnstamped.toByteArray());
ByteArrayOutputStream baosStamped = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(unstampedReader, baosStamped);
int pageNum = ...;
PdfContentByte underContent = stamper.getUnderContent(pageNum);
PdfTemplate templateFirstPage = ...;
PdfTemplate templateSecondPage = ...;
/* for first page within statement of account */
underContent.addTemplate(templateFirstPage, ... , ...);
/* for page two and upward within statement of account */
underContent.addTemplate(templateSecondPage, ... , ...);
PdfContentByte overContent = stamper.getOverContent(pageNum);
overContent.beginText();
/* a font and size must be set before any text is shown */
overContent.setFontAndSize(... , ...);
overContent.setTextMatrix(... , ...);
String pageText = ...; /* page x/y */
overContent.showText(pageText);
overContent.endText();
/* closing the stamper writes the stamped PDF into baosStamped */
stamper.close();

Step 3

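FileOutputStream fos = new FileOutputStream(...);
Document finalDocument = new Document();
PdfWriter finalWriter = PdfWriter.getInstance(finalDocument, fos);
finalDocument.open();
PdfContentByte finalContent = finalWriter.getDirectContent();

/* per stamped statement */
PdfReader stampedReader = new PdfReader(baosStamped.toByteArray());
int pageNum = ...;
/* for each stamped page: import it, then place it on a new page */
PdfImportedPage page = finalWriter.getImportedPage(stampedReader, pageNum);
finalDocument.newPage();
finalContent.addTemplate(page, 0, 0);

/* after the last statement */
finalDocument.close();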
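
The per-statement bookkeeping behind steps 2 and 3 (how many pages a statement has, which background goes on which page, and what the "page x/y" footer says) can be sketched without iText. This is only an illustration: the class and method names are hypothetical, and it assumes that the empty row and the total row at the end of a statement occupy ordinary body-row slots.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans the pages of one statement of account. */
final class StatementPagePlan {
    static final int ROWS_FIRST_PAGE = 30;    // body rows on page one (from the requirements)
    static final int ROWS_FURTHER_PAGE = 50;  // body rows on page two and upward (from the requirements)
    static final int CLOSING_ROWS = 2;        // empty row + total row; ASSUMED to use body-row slots

    /** Number of pages needed for the given number of booking rows. */
    static int pageCount(int bookingRows) {
        if (bookingRows < 0) {
            throw new IllegalArgumentException("bookingRows must be >= 0");
        }
        int rows = bookingRows + CLOSING_ROWS;
        if (rows <= ROWS_FIRST_PAGE) {
            return 1;
        }
        int remaining = rows - ROWS_FIRST_PAGE;
        // integer ceiling of remaining / ROWS_FURTHER_PAGE
        return 1 + (remaining + ROWS_FURTHER_PAGE - 1) / ROWS_FURTHER_PAGE;
    }

    /** One entry per page: the footer text and the background template name. */
    static List<String> plan(int bookingRows) {
        int total = pageCount(bookingRows);
        List<String> pages = new ArrayList<String>();
        for (int page = 1; page <= total; page++) {
            String background = (page == 1)
                    ? "tableBackground_first_page"
                    : "tableBackground_further_pages";
            pages.add("page " + page + "/" + total + " -> " + background);
        }
        return pages;
    }

    public static void main(String[] args) {
        for (String line : plan(100)) {
            System.out.println(line);
        }
    }
}
```

Under these assumptions a statement with 100 booking rows needs three pages. If the row count is known before rendering, y in "page x/y" is known up front as well.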


Requirements
The PDF includes up to 3'500 statements of account; a statement of account can
have up to 50 pages, and we do not expect the PDF to have more than 20'000 pages.
A typical batch run creates a PDF with 3'500 statements of account with a maximum
of 3 pages per statement of account. The parameters for the batch job are:
range of account numbers, period, and 2 - 4 variable table columns (the fixed
columns are debit, credit, balance). The first page of a statement includes a
page header with the address of the account owner and a table with 5 to 7
columns, with a table header and 30 body rows. Pages two and upward of the
statement have no page header; the layout of the table is the same as on page
one (table header), with 50 body rows. End of statement: an empty row followed
by a total row for debit, credit and balance. Each statement has its own page
numbering in the footer (page x/y). The footer has a fixed text (same content
and position on all pages, independent of the statement). Each odd table body
row has a gray background ("table background"), including empty rows at the end
of the table.
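
As a rough illustration of what the two "table background" templates have to contain, the geometry of the gray stripes can be computed independently of iText. The sketch below is hypothetical throughout: the class name, the fixed row height and the coordinates in the usage line are assumptions, not part of the requirements. It only shows that the first-page background (30 rows) and the further-page background (50 rows) differ in the number of stripes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes the gray stripes behind the odd body rows. */
final class StripeLayout {

    /**
     * Returns one {lowerY, height} pair per odd body row (rows 1, 3, 5, ...).
     * Row 1 is the topmost body row; y grows upward, as in PDF user space.
     * ASSUMES every body row has the same height.
     */
    static List<float[]> oddRowStripes(float tableTop, float rowHeight, int rowsOnPage) {
        List<float[]> stripes = new ArrayList<float[]>();
        for (int row = 1; row <= rowsOnPage; row += 2) {
            float lowerY = tableTop - row * rowHeight; // bottom edge of this row
            stripes.add(new float[] { lowerY, rowHeight });
        }
        return stripes;
    }

    public static void main(String[] args) {
        // Illustrative values only: table body starts at y = 700, rows are 12 units high.
        System.out.println("first page stripes:   " + oddRowStripes(700f, 12f, 30).size());
        System.out.println("further page stripes: " + oddRowStripes(700f, 12f, 50).size());
    }
}
```

Because the stripes depend only on the row count per page and not on the statement data, each of the two backgrounds can be described once and reused for every statement.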
