Using the Doxia Sink API

Transforming documents

Doxia can be used to transform an arbitrary input document to any supported output format. The following snippet shows how to use a Doxia Parser to transform an apt file to html:

  File userDir = new File( System.getProperty ( "user.dir" ) );
  File inputFile = new File( userDir, "test.apt" );
  File outputFile = new File( userDir, "test.html" );

  SinkFactory sinkFactory = (SinkFactory) lookup( SinkFactory.ROLE, "html" ); // Plexus lookup

  Sink sink = sinkFactory.createSink( outputFile.getParentFile(), outputFile.getName() ) );

  Parser parser = (AptParser) lookup( Parser.ROLE, "apt" ); // Plexus lookup

  Reader reader = ReaderFactory.newReader( inputFile, "UTF-8" );

  parser.parse( reader, sink );

It is recommended that you use Plexus to look up the parser. In principle you could instantiate the parser directly ( Parser parser = new AptParser(); ) but then some special features like macros will not be available.

You could also use the Doxia Converter Tool to parse a given file/dir to another file/dir.

Generating documents

The snippet below gives a simple example of how to generate a document using the Doxia Sink API.

    /**
     * Generate a simple document and emit it
     * into the specified sink. The sink is flushed but not closed.
     *
     * @param sink The sink to receive the events.
     */
    public static void generate( Sink sink )
    {
        sink.head();

        sink.title();
        sink.text( "Title" );
        sink.title_();

        sink.author();
        sink.text( "Author" );
        sink.author_();

        sink.date();
        sink.text( "Date" );
        sink.date_();

        sink.head_();


        sink.body();

        sink.paragraph();
        sink.text( "A paragraph of text." );
        sink.paragraph_();

        sink.section1();
        sink.sectionTitle1();
        sink.text( "Section title" );
        sink.sectionTitle1_();

        sink.paragraph();
        sink.text( "Paragraph in section." );
        sink.paragraph_();

        sink.section1_();

        sink.body_();

        sink.flush();
    }

A more complete example that also shows the 'canonical' order of events to use when generating a document, can be found in the Doxia SinkTestDocument class.

Passing attributes to Sink events

A number of methods in the Sink API enable passing a set of attributes to many sink events. A typical use case is:

SinkEventAttributeSet atts = new SinkEventAttributeSet();
atts.addAttribute( SinkEventAttributes.ALIGN, "center" );

sink.paragraph( atts );

The kind of attributes supported depends on the event and the sink implementation. The sink API specifies a list of suggested attribute names that sinks are expected to recognize and parsers are expected to use when emitting events.

Avoid sink.rawText!

In Doxia 1.0 it was a common practice to use sink.rawText() to generate elements that were not supported by the Sink API. For example, the following snippet could be used to generate a styled HTML <div> block:

sink.RawText( "<div style=\"cool\">" );
sink.text( "A text with a cool style." );
sink.rawText( "</div>" );

This has a major drawback however: it only works if the receiving Sink is a HTML Sink. In other words, the above method will not work for target documents in any other format than HTML (think of the FO Sink to generate a pdf, or a LaTeX sink,...).

In Doxia 1.1 a new method unknown() was added to the Sink API that can be used to emit an arbitrary event without making special assumptions about the receiving Sink. Depending on the parameters, a Sink may decide whether or not to process the event, emit it as raw text, as a comment, log it, etc.

The correct way to generate the above <div> block is now:

SinkEventAttributeSet atts = new SinkEventAttributeSet();
atts.addAttribute( SinkEventAttributes.STYLE, "cool" );

sink.unknown( "div", new Object[]{new Integer( HtmlMarkup.TAG_TYPE_START )}, atts );
sink.text( "A text with a cool style." );
sink.unknown( "div", new Object[]{new Integer( HtmlMarkup.TAG_TYPE_END )}, null );

Read the javadocs of the unknown() method in the Sink interface and the XhtmlbaseSink for information on the method parameters. Note that an arbitrary sink may be expected to ignore the unknown event completely!

In general, the rawText method should be avoided alltogether when emitting events into an arbitrary Sink.

How to inject javascript code into HTML

Related to the above, here is the correct way of injecting a javascript snippet into a Sink:

// the javascript code is emitted within a commented CDATA section
// so we have to start with a newline and comment the CDATA closing in the end
// note that the sink will replace the newline by the system EOL
String javascriptCode = "\n function javascriptFunction() {...} \n //";

SinkEventAttributeSet atts = new SinkEventAttributeSet();
atts.addAttribute( SinkEventAttributes.TYPE, "text/javascript" );

sink.unknown( "script", new Object[]{new Integer( HtmlMarkup.TAG_TYPE_START )}, atts );
sink.unknown( "cdata", new Object[]{new Integer( HtmlMarkup.CDATA_TYPE ), javascriptCode }, null );
sink.unknown( "script", new Object[]{new Integer( HtmlMarkup.TAG_TYPE_END )}, null );

References