365 lines
11 KiB
Plaintext
365 lines
11 KiB
Plaintext
|
==================================
|
||
|
Design document for PDF generation
|
||
|
==================================
|
||
|
|
||
|
:Author: kn
|
||
|
|
||
|
This is design document for the PDF generation in the eZ Components document
|
||
|
component. PDF documents should be created from Docbook documents, which can
|
||
|
be generated from each markup available in the document component.
|
||
|
|
||
|
The requirements which should be designed in this document are specified in
|
||
|
the pdf_requirements.txt document.
|
||
|
|
||
|
Layout directives
|
||
|
=================
|
||
|
|
||
|
The PDF document will be created from the Docbook document object. It will
|
||
|
already contain basic formatting rules for meaningful default document layout.
|
||
|
Additional rules may be passed to the document in various ways.
|
||
|
|
||
|
Generally a CSS like approach will be used to encode layout information. This
|
||
|
allows both, the easily readable addressing of nodes in an XML tree, like
|
||
|
already known from CSS, and humanly readable formatting options.
|
||
|
|
||
|
A limited subset of CSS will be used for now for addressing elements inside the
|
||
|
Docbook XML tree. The grammar for those rules will be::
|
||
|
|
||
|
Address ::= Element ( Rule )*
|
||
|
Rule ::= '>'? Element
|
||
|
Element ::= ElementName ( '.' ClassName | '#' ElementId )
|
||
|
|
||
|
ClassName ::= [A-Za-z_-]+
|
||
|
ElementName ::= XMLName* | '*'
|
||
|
ElementId ::= XMLName
|
||
|
|
||
|
* XMLName references to http://www.w3.org/TR/REC-xml/#NT-Name
|
||
|
|
||
|
The semantics of this simple subset of addressing directives are the same as in
|
||
|
CSS. A second level title could for example then be addressed by::
|
||
|
|
||
|
section title
|
||
|
|
||
|
The formatting options are also mostly the same as in CSS, but again only
|
||
|
using a subset of the definitions available in CSS and with some additional
|
||
|
formatting options, relevant especially for PDF rendering. The used formatting
|
||
|
options depend on the renderer - unknown formatting options may issue errors
|
||
|
or warnings.
|
||
|
|
||
|
The PDF document wrapper class will implement Iterator and ArrayAccess to
|
||
|
access the layout directives, like the following example shows::
|
||
|
|
||
|
$pdf = new ezcDocumentPdf();
|
||
|
$pdf->createFromDocbook( $docbook );
|
||
|
|
||
|
$pdf->styles['article > section title']['font-size'] = '1.6em';
|
||
|
|
||
|
Directives which are specified later will always overwrite earlier directives,
|
||
|
for each each formatting option specified in the later directive. The
|
||
|
overwriting of formatting options will NOT depend on the complexity of the
|
||
|
node addressing like in CSS.
|
||
|
|
||
|
Importing and exporting layout directives
|
||
|
-----------------------------------------
|
||
|
|
||
|
The layout directives can be exported and imported to and from files, so that
|
||
|
users of the component may store a custom PDF layout. The storage format will
|
||
|
again very much look like a simplified variant of CSS::
|
||
|
|
||
|
File ::= Directive+
|
||
|
Directive ::= Address '{' Formatting* '}'
|
||
|
Formatting ::= Name ':' '"' Value '"' ';'
|
||
|
Name ::= [A-Za-z-]+
|
||
|
Value ::= [^"]+
|
||
|
|
||
|
C-style comments are allowed anywhere in the definition file, like ```/* ..
|
||
|
*/``` and ```// ...```.
|
||
|
|
||
|
Importing and exporting styles may be accomblished by::
|
||
|
|
||
|
$pdf->styles->load( 'styles.pcss' );
|
||
|
|
||
|
List of formatting options
|
||
|
--------------------------
|
||
|
|
||
|
There will be formatting options just processed, like they are defined in CSS,
|
||
|
and some custom options. The options just reused from CSS are:
|
||
|
|
||
|
- background-color
|
||
|
- background-image
|
||
|
- background-position
|
||
|
- background-repeat
|
||
|
- border-color
|
||
|
- border-width
|
||
|
- border-bottom-color
|
||
|
- border-bottom-width
|
||
|
- border-left-color
|
||
|
- border-left-width
|
||
|
- border-right-color
|
||
|
- border-right-width
|
||
|
- border-top-color
|
||
|
- border-top-width
|
||
|
- color
|
||
|
- direction
|
||
|
- font-family
|
||
|
- font-size
|
||
|
- font-style
|
||
|
- font-variant
|
||
|
- font-weight
|
||
|
- line-height
|
||
|
- list-style
|
||
|
- list-style-position
|
||
|
- list-style-type
|
||
|
- margin
|
||
|
- margin-bottom
|
||
|
- margin-left
|
||
|
- margin-right
|
||
|
- margin-top
|
||
|
- orphans
|
||
|
- padding
|
||
|
- padding-bottom
|
||
|
- padding-left
|
||
|
- padding-right
|
||
|
- padding-top
|
||
|
- page-break-after
|
||
|
- page-break-before
|
||
|
- text-align
|
||
|
- text-decoration
|
||
|
- text-indent
|
||
|
- white-space
|
||
|
- widows
|
||
|
- word-spacing
|
||
|
|
||
|
Custom properties are:
|
||
|
|
||
|
text-columns
|
||
|
Number of text text columns in one section.
|
||
|
text-column-spacing
|
||
|
The margin between multiple text comlumns on one page
|
||
|
page-size
|
||
|
Size of pages
|
||
|
page-orientation
|
||
|
Orientation of pages
|
||
|
|
||
|
Not all options can be applied to each element. The renderer might complain on
|
||
|
invalid options, depending on the configured error level.
|
||
|
|
||
|
Special layout elements
|
||
|
=======================
|
||
|
|
||
|
Footers & Headers
|
||
|
-----------------
|
||
|
|
||
|
Footnotes and Headers are special layout elements, which can be rendered
|
||
|
manually by the user of the component. They can be considered as small
|
||
|
sub-documents, but their renderer receives additional information about the
|
||
|
current page they are rendered on.
|
||
|
|
||
|
They can be set like::
|
||
|
|
||
|
$pdf = new ezcDocumentPdf();
|
||
|
$pdf->createFromDocbook( $docbook );
|
||
|
|
||
|
$pdf->footer = new myDocumentPdfPart();
|
||
|
|
||
|
Each of those parts can render itself and calculate the appropriate bounding.
|
||
|
There might be extensions from the basic PDFPart class, which again render small
|
||
|
Docbook sub documents into one header, or just take a string, replacing
|
||
|
placeholders with page dependent contents.
|
||
|
|
||
|
Possible implementations would be:
|
||
|
|
||
|
ezcDocumentPdfDocbookPart
|
||
|
Receives a docbook document and renders it using a a defined style at the
|
||
|
header or footer of the current page. Placeholders in the text,
|
||
|
represented by, for example, entities might be replaced.
|
||
|
ezcDocumentPdfStringPart
|
||
|
Receives a simple string, in which simple placeholders are replaced.
|
||
|
|
||
|
Other elements
|
||
|
--------------
|
||
|
|
||
|
There are various possible full site elements, which might be rendered before or
|
||
|
after the actual contents. Those are for example:
|
||
|
|
||
|
- Cover page
|
||
|
- Bibliography
|
||
|
- Back page
|
||
|
|
||
|
To add those to on PDF document you can create a pdf set, which is then rendered
|
||
|
into one file::
|
||
|
|
||
|
$pdf = new ezcDocumentPdf();
|
||
|
$pdf->createFromDocbook( $docbook );
|
||
|
|
||
|
$set = new ezcDocumentPdfSet();
|
||
|
$set->parts = array(
|
||
|
new ezcDocumentPdfPdfPart( 'title.pdf' ),
|
||
|
$customTableOfContents,
|
||
|
$pdf,
|
||
|
$bibliography,
|
||
|
);
|
||
|
$set->render( 'my.pdf' );
|
||
|
|
||
|
Some of the documents aggregated in one set can of course again be documents
|
||
|
created from Docbook documents. Each element in the set may contain custom
|
||
|
layout directives.
|
||
|
|
||
|
For the inclusion of other document parts into a PdfSet you are expected to
|
||
|
extend from the PDF base class and implement you custom functionality there.
|
||
|
This could mean generating idexes, or a bibliography from the content.
|
||
|
|
||
|
Drivers
|
||
|
=======
|
||
|
|
||
|
The actual PDF renderer calls methods on the driver, which abstract the quirks
|
||
|
of the respective implementations. There will be drivers for at least:
|
||
|
|
||
|
- pecl/libharu
|
||
|
- TCPDF
|
||
|
|
||
|
Renderer
|
||
|
========
|
||
|
|
||
|
The renderer will be responsible for the actual typesetting. It will receive a
|
||
|
Docbook document, apply the given layout directives and calculate the
|
||
|
appropriate calls to the driver from those.
|
||
|
|
||
|
The renderer optionally receives a set of helper objects, which perform relevant
|
||
|
parts of the typesetting, like:
|
||
|
|
||
|
Hyphenator
|
||
|
Class implementing hyphenation for a specific language. We might provide a
|
||
|
default implementation, which reads standard hyphenation files.
|
||
|
|
||
|
The renderer state will be shared using an object representing the page
|
||
|
currently processed, which contains information about the already covered
|
||
|
areas and therefore the still available space.
|
||
|
|
||
|
Using such a state object, the state can easily be shared between different
|
||
|
renderers for different aspects of the rendering process. This should allow us
|
||
|
to write simpler rendering classes, which should be better maintainable then
|
||
|
one big renderer class, which methods would take care of all aspects.
|
||
|
|
||
|
This page state object, knowing about free space on the current page, for
|
||
|
example allows to float text around images spanning multiple paragraphs,
|
||
|
because the already covered space is encoded. This allows all renderers for
|
||
|
the different aspects to reuse this knowledge and depend their rendering on
|
||
|
this. The space already covered on a page will most probably be represented by
|
||
|
a list of bounding boxes.
|
||
|
|
||
|
Which renderer classes can be separated, will show up during implementation,
|
||
|
but those for example could be:
|
||
|
|
||
|
ezcDocumentPdfParagraphRenderer
|
||
|
Takes care of rendering the Docbook inline markup inside one paragraph.
|
||
|
Respects orphans and widows and might be required to split paragraphs.
|
||
|
|
||
|
ezcDocumentPdfTableRenderer
|
||
|
Renders tables. It might be useful to even split this up more into a table
|
||
|
row and cell renderer.
|
||
|
|
||
|
Additional renderer features
|
||
|
----------------------------
|
||
|
|
||
|
If the used driver class implements the respective interfaces the renderer will
|
||
|
also offer to sign PDF documents, or add write protection (or similar) to the
|
||
|
PDF document.
|
||
|
|
||
|
Example
|
||
|
=======
|
||
|
|
||
|
A full example for the creation of a PDF document from a HTML page could look
|
||
|
like::
|
||
|
|
||
|
<?php
|
||
|
$html = new ezcDocumentXhtml();
|
||
|
$html->loadFile( 'http://ezcomponents.org/introduction' );
|
||
|
|
||
|
$pdf = new ezcDocumentPdf();
|
||
|
$pdf->createFromDocbook( $html->getAsDocbook() );
|
||
|
|
||
|
// Load some custom layout directives
|
||
|
$pdf->style->load( 'my_styles.pcss' );
|
||
|
$pdf->style['article']['text-columns'] = 3;
|
||
|
|
||
|
// Set a custom header
|
||
|
$pdf->header = new ezcDocumentPdfStringPart(
|
||
|
'%title by %author - %pageNum / %pageCount'
|
||
|
);
|
||
|
|
||
|
// Set a custom paragraph renderer
|
||
|
$pdf->renderer->paragraph = new myPdfParagraphRenderer();
|
||
|
|
||
|
// Use the hyphenator with a german dictionary
|
||
|
$pdf->renderer->hyphenator = new myDictionaryHyphenator(
|
||
|
'/path/to/german.dict'
|
||
|
);
|
||
|
|
||
|
// Store the generated PDF
|
||
|
file_put_contents( 'my.pdf', $pdf );
|
||
|
?>
|
||
|
|
||
|
A file containing the layout directives could look like::
|
||
|
|
||
|
article {
|
||
|
page-size: "A4";
|
||
|
}
|
||
|
|
||
|
paragraph {
|
||
|
font-family: "Bitstream Vera Sans";
|
||
|
font-size: "1em";
|
||
|
}
|
||
|
|
||
|
article > title {
|
||
|
font-weight: "bold";
|
||
|
}
|
||
|
|
||
|
section title {
|
||
|
font-weight: "normal";
|
||
|
}
|
||
|
|
||
|
Classes
|
||
|
=======
|
||
|
|
||
|
The classes implemented for the PDF generation are:
|
||
|
|
||
|
ezcDocumentPdf
|
||
|
Base class, representing the PDF generation. Aggregates the style
|
||
|
information, the docbook source document, renderer and page parts like
|
||
|
footer and header.
|
||
|
|
||
|
ezcDocumentPdfSet
|
||
|
Class aggregating multiple ezcDocumentPdf objects, to create one single
|
||
|
PDF document from multiple parts, like a cover page, the actual content, a
|
||
|
bibliography, etc.
|
||
|
|
||
|
ezcDocumentPdfStyles
|
||
|
Class containing the PDF layout directives, also implements loading and
|
||
|
storing of those layout directives.
|
||
|
|
||
|
ezcDocumentPdfPart
|
||
|
Abstract base class for page parts, like headers and footers. Renders the
|
||
|
respective part and will be extended by multiple concrete
|
||
|
implementations, which offer convient rendering methods.
|
||
|
|
||
|
ezcDocumentPdfRenderer
|
||
|
Basic renderer class, which aggregates renderers for distinct page
|
||
|
elements, like paragraphs and tables, and dispatches the rendering to
|
||
|
them. Also maintains the ezcDocumentPdfPage state object, which contains
|
||
|
information of already covered parts of the pages.
|
||
|
|
||
|
ezcDocumentPdfParagraphRenderer
|
||
|
Example for the concrete aspect specific renderer classes, which only
|
||
|
implement the rendering of small parts of a document, like single
|
||
|
paragraphs, tables, or table cell contents.
|
||
|
|
||
|
ezcDocumentPdfPage
|
||
|
State object describing the current state of a single page in the PDF
|
||
|
document, like still available space.
|
||
|
|
||
|
ezcDocumentPdfHyphenator
|
||
|
Abstract base class for hyphenation implementations for more accurate word
|
||
|
wrapping.
|
||
|
|