==============================================
Design document for ODT parsing and generation
==============================================

:Author: ts

The scope of this document is to define the design for a first implementation
of ODT (Open Document Text) support in the eZ Document component. The parts of
the Document component designed in this document do not affect other Open
Document formats like spreadsheets or graphics. The goal is to define the
infrastructure for reading and writing ODT documents, i.e. to convert existing
ODT documents into the internal representation of the Document component
(DocBook XML) and to generate new ODT documents from the internal
representation.

------------
Requirements
------------

The following sections describe the requirements for the ODT handling in the
Document component. The first section defines requirements for reading ODT, the
second for writing ODT and the third section defines requirements for later
enhancements to be kept in mind during the initial implementation.

Import
======

The Document component should be able to parse existing ODT documents and to
convert them to the internal format used by the Document component (DocBook
XML). Requirements for the import process are:

- Read plain XML ODT files
- Parse all necessary structural ODT elements
- Convert ODT elements properly into equivalent or similar DocBook
  representations
- Maintain the content semantics provided by the ODT as well as possible
- Maintain meta information provided by the ODT as well as possible
- Develop a first heuristic approach to using ODT styling information to
  determine the semantics of an element.

Export
======

The Document component should be able to generate new ODT documents from an
existing internal representation (DocBook XML). Requirements for this process
are:

- Write plain XML ODT files
- Convert DocBook representation elements to their corresponding ODT
  representations
- Maintain the document structure
- Maintain content and metadata semantics as well as possible
- Apply styling to ODT elements.

Later enhancements
==================

In the first step of ODT integration only rudimentary features for import and
export should be realized. The following ideas must be kept in mind during the
design and implementation, to ensure future extensibility.

- Reading / writing of ODT package files (ZIP)

  - ODF can be presented either as a single XML file or as a ZIP package
    containing multiple XML files and other related files (e.g. images).
  - Reading and writing this format is not necessary from the start, but since
    it is the default way for users to store ODT, it should be supported later
    on.
  - The handling of ZIP files requires a tie-in with the Archive component or
    similar.

------
Design
------

In the first development cycle, only the structural conversion between ODT and
DocBook XML will be considered. In addition, rudimentary styling information
will be taken into account. The reading and writing of ODF packages is not
considered in this design.

Import
======

Three different steps are necessary to import an ODT document and convert it
into DocBook XML:

1. Read the XML data
2. Pre-process the ODT representation
3. Convert to the DocBook XML representation

Step 1 will be performed through the DOM extension in PHP; the internal
representation of an ODT document will be a DOM tree. The second step performs
pre-processing on this DOM tree. Pre-processing is needed, for example, to
assign additional semantics to the ODT elements to achieve a better rendering.
Finally, the pre-processed DOM tree will be visited to create the actual
DocBook XML representation.

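The three import steps can be sketched roughly as follows. The sketch is
written in Python with ``xml.dom.minidom`` purely for illustration (the
component itself relies on the PHP DOM extension). The function names, the
``ezdoc-type`` annotation attribute and the tiny element mapping are
assumptions made for this example and are not part of the design.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical attribute that carries the DocBook type from step 2 to step 3.
TYPE_ATTR = "ezdoc-type"

SAMPLE = (
    f'<office:document xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    '<text:h text:outline-level="1">Title</text:h>'
    "<text:p>First paragraph.</text:p>"
    "</office:text></office:body>"
    "</office:document>"
)


def read_odt(xml_string):
    """Step 1: parse the flat ODT XML into a DOM tree."""
    return minidom.parseString(xml_string)


def preprocess(dom):
    """Step 2: annotate ODT elements with the DocBook element they map to."""
    mapping = {"h": "title", "p": "para"}  # assumed, deliberately tiny
    for local_name, docbook_name in mapping.items():
        for element in dom.getElementsByTagNameNS(TEXT_NS, local_name):
            element.setAttribute(TYPE_ATTR, docbook_name)
    return dom


def convert(dom):
    """Step 3: visit the annotated tree and emit DocBook XML as a string."""
    parts = []

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE and node.hasAttribute(TYPE_ATTR):
            name = node.getAttribute(TYPE_ATTR)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            parts.append(f"<{name}>{escape(text)}</{name}>")
            return
        # Unannotated nodes are skipped, but their children are still visited.
        for child in node.childNodes:
            visit(child)

    visit(dom.documentElement)
    return "<article>" + "".join(parts) + "</article>"


docbook = convert(preprocess(read_odt(SAMPLE)))
print(docbook)
```

Keeping the annotation on the DOM tree itself means step 3 needs no knowledge
of ODT element names; it only reads what step 2 attached.
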
Pre-processing
--------------

The step of pre-processing the ODT representation is necessary to assign
DocBook semantics to the ODT elements. ODT and DocBook XML have some
similarities, but also differ widely in some parts. The pre-processing step
performs manipulations on the ODT representation and potentially adds
information which is utilized by the later conversion step to create a
semantically correct representation.

This process works similarly to filters in the XHTML document import. The class
level design of this feature is inspired by the XHTML handling: Filters can be
registered which pre-process the incoming ODT in the given order.

A filter may perform the following steps on a DOMElement:

- Add type information to an XML element to determine into which DocBook XML
  element the element will be converted
- Add attribute information to determine the attributes in the DocBook XML
  representation
- Add additional elements or element hierarchies

The resulting DOM tree does not necessarily have to be valid ODT anymore, so
that it can reflect the later DocBook structure in a better way.

The first implemented filter will only perform rudimentary operations on the
DOM to assign basic semantic information to the elements. A second
implementation will be an additional filter which takes some styling
information into account to enhance this information. Further filters can be
implemented by third parties to extend or replace these mechanisms.

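A minimal sketch of such a filter chain, again in Python for illustration
only. The class names, the ``filter()`` method, the ``ezdoc-type`` and
``ezdoc-role`` annotation attributes and the "bold span becomes emphasis"
heuristic are all assumptions; the actual class design is meant to follow the
XHTML filters of the component.

```python
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical annotation attributes read by the later conversion step.
TYPE_ATTR = "ezdoc-type"
ROLE_ATTR = "ezdoc-role"


class ElementFilter:
    """Rudimentary filter: derives the DocBook type from the element name."""

    MAPPING = {"h": "title", "p": "para"}

    def filter(self, dom):
        for local_name, docbook_name in self.MAPPING.items():
            for element in dom.getElementsByTagNameNS(TEXT_NS, local_name):
                element.setAttribute(TYPE_ATTR, docbook_name)


class BoldStyleFilter:
    """Style-based filter: spans using a bold text style become emphasis."""

    def filter(self, dom):
        bold_styles = set()
        for style in dom.getElementsByTagNameNS(STYLE_NS, "style"):
            for props in style.getElementsByTagNameNS(STYLE_NS, "text-properties"):
                if props.getAttributeNS(FO_NS, "font-weight") == "bold":
                    bold_styles.add(style.getAttributeNS(STYLE_NS, "name"))
        for span in dom.getElementsByTagNameNS(TEXT_NS, "span"):
            if span.getAttributeNS(TEXT_NS, "style-name") in bold_styles:
                span.setAttribute(TYPE_ATTR, "emphasis")  # type information
                span.setAttribute(ROLE_ATTR, "strong")    # attribute information


class FilterChain:
    """Runs the registered filters in registration order."""

    def __init__(self):
        self._filters = []

    def register(self, odt_filter):
        self._filters.append(odt_filter)

    def run(self, dom):
        for odt_filter in self._filters:
            odt_filter.filter(dom)
        return dom


SAMPLE = (
    f'<doc xmlns:text="{TEXT_NS}" xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}">'
    '<style:style style:name="T1" style:family="text">'
    '<style:text-properties fo:font-weight="bold"/>'
    "</style:style>"
    '<text:p>Plain and <text:span text:style-name="T1">bold</text:span></text:p>'
    "</doc>"
)

chain = FilterChain()
chain.register(ElementFilter())
chain.register(BoldStyleFilter())
dom = chain.run(minidom.parseString(SAMPLE))
```

Because filters run in registration order, a later filter can refine or
overwrite annotations made by an earlier one, which is what allows third-party
filters to extend or replace the default behaviour.
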
Conversion
----------

The conversion process itself will mostly visit the DOM tree and utilize the
information attached to the elements in the pre-processing step to generate a
DocBook XML document with the corresponding content. The filter pre-processing
step is responsible for annotating all significant elements properly so that
the conversion can use them.

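The visiting step might look roughly like the following Python sketch. It
assumes that pre-processing has already attached a hypothetical ``ezdoc-type``
attribute to every significant element; elements without the attribute are
skipped while their children are still visited. The ``section`` root element
and all names are illustrative choices, not decisions of this design.

```python
from xml.dom import minidom

# Hypothetical annotation attribute written by the pre-processing filters.
TYPE_ATTR = "ezdoc-type"

# Stand-in for an already pre-processed ODT tree (namespaces left out).
ANNOTATED = (
    "<body>"
    '<h ezdoc-type="title">Intro</h>'
    '<p ezdoc-type="para">Some <span ezdoc-type="emphasis">bold</span> text.</p>'
    "<soft-page-break/>"
    "</body>"
)


def convert(source):
    """Visit the annotated tree and build a DocBook DOM document."""
    target = minidom.getDOMImplementation().createDocument(None, "section", None)
    root = target.documentElement

    def visit(node, parent):
        if node.nodeType == node.TEXT_NODE:
            # Text is only kept inside an element that was converted.
            if parent is not root:
                parent.appendChild(target.createTextNode(node.data))
            return
        if node.nodeType != node.ELEMENT_NODE:
            return
        if node.hasAttribute(TYPE_ATTR):
            new_parent = target.createElement(node.getAttribute(TYPE_ATTR))
            parent.appendChild(new_parent)
        else:
            # Not annotated: drop the element itself, keep descending.
            new_parent = parent
        for child in node.childNodes:
            visit(child, new_parent)

    visit(source.documentElement, root)
    return target


result = convert(minidom.parseString(ANNOTATED)).documentElement.toxml()
print(result)
```
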
Flat ODT documents (consisting of only one XML file), which are the only kind
handled in the first version of ODT support, may contain embedded image
content. To extract it, the user may specify a target directory; otherwise the
system temp dir will be used as the default. The content will then be
referenced in DocBook from this location.

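Extracting embedded images could be sketched like this in Python. In flat ODT
files, image data is stored base64 encoded in ``office:binary-data`` elements
inside ``draw:image``. The function name, the file name prefix and the return
value (a list of written paths that the conversion could reference) are
assumptions for this example.

```python
import base64
import os
import tempfile
from xml.dom import minidom

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def extract_images(dom, target_dir=None):
    """Write every embedded image to target_dir and return the file paths."""
    if target_dir is None:
        # Default described in the text: the system temp dir.
        target_dir = tempfile.gettempdir()
    paths = []
    for node in dom.getElementsByTagNameNS(OFFICE_NS, "binary-data"):
        parent = node.parentNode
        if parent.namespaceURI != DRAW_NS or parent.localName != "image":
            continue  # binary data that does not belong to an image
        encoded = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        # mkstemp creates a uniquely named file, avoiding name clashes.
        handle, path = tempfile.mkstemp(prefix="odt-image-", dir=target_dir)
        with os.fdopen(handle, "wb") as image_file:
            image_file.write(base64.b64decode(encoded))
        paths.append(path)
    return paths


PAYLOAD = b"not really an image"
SAMPLE = (
    f'<doc xmlns:office="{OFFICE_NS}" xmlns:draw="{DRAW_NS}">'
    "<draw:frame><draw:image><office:binary-data>"
    + base64.b64encode(PAYLOAD).decode("ascii")
    + "</office:binary-data></draw:image></draw:frame>"
    "</doc>"
)

target = tempfile.mkdtemp()
paths = extract_images(minidom.parseString(SAMPLE), target)
```
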
.. note:: We should check if it is possible to define and handle data URLs in
   docbook. May be problematic with other formats though. (kn)

Export
======

.. note:: First sentence a bit unclear ;) (kn)

The export process for ODT works similarly to PDF rendering, except that it is
a little less strict. The internal DocBook representation is converted to the
desired ODT representation according to its semantics.

Based on the DocBook XML elements, the user can define styles using a
simplified CSS syntax (see PDF). Each of the style definitions is converted to
an automatic style in the resulting ODT document. ODT elements affected by a
certain style get this style applied.

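As an illustration of the idea, the following Python sketch turns already
parsed style rules into an ``office:automatic-styles`` element. The rule
format (a dictionary per DocBook element), the ``ezdoc-`` style name prefix,
the family mapping and the assumption that every declaration has a direct
``fo:`` counterpart are all hypothetical simplifications.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

for prefix, uri in (("office", OFFICE_NS), ("style", STYLE_NS), ("fo", FO_NS)):
    ET.register_namespace(prefix, uri)

# Assumed mapping from DocBook elements to ODF style families.
FAMILIES = {"para": "paragraph", "title": "paragraph", "emphasis": "text"}
# Assumed split: these go to paragraph-properties, the rest to text-properties.
PARAGRAPH_PROPERTIES = {"text-align", "margin-top", "margin-bottom"}


def build_automatic_styles(rules):
    """Create one automatic style per rule (DocBook element -> declarations)."""
    container = ET.Element(f"{{{OFFICE_NS}}}automatic-styles")
    for docbook_name, declarations in rules.items():
        style = ET.SubElement(container, f"{{{STYLE_NS}}}style", {
            f"{{{STYLE_NS}}}name": f"ezdoc-{docbook_name}",
            f"{{{STYLE_NS}}}family": FAMILIES[docbook_name],
        })
        paragraph = {k: v for k, v in declarations.items()
                     if k in PARAGRAPH_PROPERTIES}
        text = {k: v for k, v in declarations.items()
                if k not in PARAGRAPH_PROPERTIES}
        # Paragraph properties are emitted before text properties.
        for kind, properties in (("paragraph-properties", paragraph),
                                 ("text-properties", text)):
            if properties:
                element = ET.SubElement(style, f"{{{STYLE_NS}}}{kind}")
                for name, value in properties.items():
                    element.set(f"{{{FO_NS}}}{name}", value)
    return container


rules = {
    "title": {"font-weight": "bold", "text-align": "center"},
    "emphasis": {"font-style": "italic"},
}
styles = build_automatic_styles(rules)
print(ET.tostring(styles, encoding="unicode"))
```

An exported paragraph would then reference its style by name, for example via
``text:style-name="ezdoc-title"``.
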
Styles
======

A style is defined for each piece of styling information. Layout elements are
never assigned styling information directly; there is always a style in
between. The <style/> element has the following properties:

name
    The internal name of the style. Must be unique among all styles when
    combined with the style:family.
display-name
    Name of the style to display in GUIs. If left out, the name is used.
family
    Family collection of the style. One of (in context of text documents):

    text
        Style that might be applied to any piece of text.
    paragraph
        Style for complete paragraphs and headings.
    section
        Style to be applied to sections of text in text documents (@TODO: Not
        handled yet!).
    ruby
        Not handled, yet.

    - table
    - table-column
    - table-row
    - table-cell
    - table-page
    - chart
    - default
    - graphic

parent-style-name
    Identifies a parent style. Style properties of the parent are inherited and
    may be overwritten. If no parent style is specified, the default style for
    the style's family will be the base for inheritance.
next-style-name
    Next paragraph style. If a new paragraph is started after the element this
    style is applied to, this paragraph will have the style named in this
    attribute. Only sensible for editing in a GUI.
list-style-name
    Style used in headings and paragraphs of lists contained in the styled
    element, only if the lists have no list-style applied themselves.
master-page-name
    Styles with a master page applied will force a page break before the
    element and then load the styles from the master page.
data-style-name
    Styling of table cells (e.g. formulas, currencies, ...).
class
    Information for GUIs, to sort styles into categories.
default-outline-level
    "Transforms" a paragraph into some kind of heading, without making it a
    heading itself. Senseless.

Style mappings (replacing a style conditionally with another style) will not be
taken into account, yet.

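The inheritance rule described for parent-style-name can be illustrated with a
small Python sketch. Styles are keyed by family and name, since the name only
has to be unique in combination with the family. The dictionary layout and the
function name are assumptions made for this example.

```python
def resolve_style(styles, defaults, family, name):
    """Return the effective properties of a style, including inherited ones."""
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic parent-style-name chain at {current!r}")
        seen.add(current)
        style = styles[(family, current)]
        chain.append(style)
        current = style.get("parent-style-name")

    # The default style of the family is always the base of inheritance.
    resolved = dict(defaults.get(family, {}))
    # Apply the most distant ancestor first, so that children overwrite it.
    for style in reversed(chain):
        resolved.update(style.get("properties", {}))
    return resolved


DEFAULTS = {"paragraph": {"font-size": "12pt", "color": "#000000"}}
STYLES = {
    ("paragraph", "Standard"): {"properties": {"font-family": "serif"}},
    ("paragraph", "Heading"): {
        "parent-style-name": "Standard",
        "properties": {"font-size": "16pt", "font-weight": "bold"},
    },
}

heading = resolve_style(STYLES, DEFAULTS, "paragraph", "Heading")
print(heading)
```
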
Types of styles
---------------

default-style
    Default styles must be defined for each used style family. The default
    style is always the base of inheritance for the style family.
page-layout
    Definition of the global page properties, such as the page format.
header-style / footer-style
    Styling of the header and footer area.
master-page
    Definition of a master page. Defines header / footer, forms, styles for the
    page and more.

Table templates
---------------

Not yet handled.

Font face declaration
---------------------

Corresponds to the @font-face declaration of CSS2.

Data styles
-----------

Not yet handled.

List styles
-----------

Define the properties of a list (not its content!). A list style contains a
style for each list level. If no style exists for a specific level, the next
lower level style is used. If none is defined, a default style is used. The
name and display-name properties work as usual. A list style can have the
consecutive-numbering attribute defined, to specify whether different list
levels restart numbering or not.

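The fallback between list levels can be expressed in a few lines. This Python
sketch assumes that the level styles of one list style are held in a
dictionary keyed by level number; the function name and the data layout are
illustrative only.

```python
def style_for_level(level_styles, level, default_style):
    """Pick the style for a list level, falling back to lower levels."""
    # Try the requested level first, then each lower level down to level 1.
    for candidate in range(level, 0, -1):
        if candidate in level_styles:
            return level_styles[candidate]
    # No level style defined at or below the requested level.
    return default_style


level_styles = {1: "bullet", 3: "number"}

print(style_for_level(level_styles, 2, "default"))  # falls back to level 1
print(style_for_level(level_styles, 3, "default"))  # exact match
print(style_for_level({}, 2, "default"))            # nothing defined
```
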
List-level styles
^^^^^^^^^^^^^^^^^

A list-level style commonly has a level attribute, defining to which list
level the style is applied. All other attributes depend on the type of list. A
list may contain different kinds of lists, depending on the depth of the level.

Number level styles
~~~~~~~~~~~~~~~~~~~

An enumeration list level is defined using a list-level-style-number element.
It has the following attributes:

style-name
    Defines the text style for list item numbers.
num-format
    Defines the formatting of the list item numbers.
display-levels
    Defines how many level numberings to display (e.g. 1.2.3 or just 1.2).
start-value
    Defines the first number to be used by the very first element of the
    defined level.

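How display-levels and start-value could affect the rendered item label is
shown in the following Python sketch. It assumes that the displayed numbers
are those of the current level and its nearest ancestor levels, joined by
dots, and it ignores num-format entirely. Both function names are
hypothetical.

```python
def item_number(position, start_value=1):
    """Number of the item at a zero-based position within its level."""
    return start_value + position


def format_label(numbers, display_levels=1):
    """Join the numbers of the innermost display_levels levels with dots.

    numbers holds the current item number of every level from the outermost
    list down to the current one.
    """
    shown = numbers[-display_levels:]
    return ".".join(str(number) for number in shown)


numbers = [1, 2, 3]  # third-level item inside item 2 of item 1

print(format_label(numbers, display_levels=3))
print(format_label(numbers, display_levels=2))
print(format_label(numbers))
```
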
Bullet level style
~~~~~~~~~~~~~~~~~~

Attributes defining a list level to be an item list.

text-style
    Style for the bullet character.
bullet-character
    A Unicode character to be used as the bullet.
num-format-prefix / num-format-suffix
    Prefix and suffix to be placed before / after a bullet.
bullet-relative-size
    Relative size (percentage, integer) of the bullet with respect to the item
    content.

Image level style
~~~~~~~~~~~~~~~~~

Creates items preceded by images. The image to be used is either referenced or
stored using base64 encoded binary data.

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79