XhtmlBaseParser (Doxia :: Core 1.5 API)

java.lang.Object
- org.apache.maven.doxia.parser.AbstractParser
  - org.apache.maven.doxia.parser.AbstractXmlParser
    - org.apache.maven.doxia.parser.XhtmlBaseParser

All Implemented Interfaces:

LogEnabled, HtmlMarkup, Markup, XmlMarkup, Parser
```
public class XhtmlBaseParser
extends AbstractXmlParser
implements HtmlMarkup
```
Common base parser for xhtml events.

Since:

1.1

Version:

$Id: XhtmlBaseParser.java 1465336 2013-04-07 07:39:00Z hboutemy $

Author:

Jason van Zyl, ltheussl

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.maven.doxia.parser.AbstractXmlParser
  AbstractXmlParser.CachedFileEntityResolver

Field Summary
- Fields inherited from interface org.apache.maven.doxia.markup.HtmlMarkup
  A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, B, BASE, BASEFONT, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CAPTION, CDATA_TYPE, CENTER, CITE, CODE, COL, COLGROUP, DD, DEL, DFN, DIR, DIV, DL, DT, EM, ENTITY_TYPE, FIELDSET, FONT, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, LABEL, LEGEND, LI, LINK, MAP, MENU, META, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PARAM, PRE, Q, S, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRIKE, STRONG, STYLE, SUB, SUP, TABLE, TAG_TYPE_END, TAG_TYPE_SIMPLE, TAG_TYPE_START, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, U, UL, VAR
- Fields inherited from interface org.apache.maven.doxia.markup.XmlMarkup
  BANG, CDATA, DOCTYPE_START, ENTITY_START, XML_NAMESPACE
- Fields inherited from interface org.apache.maven.doxia.markup.Markup
  COLON, EOL, EQUAL, GREATER_THAN, LEFT_CURLY_BRACKET, LEFT_SQUARE_BRACKET, LESS_THAN, MINUS, PLUS, QUOTE, RIGHT_CURLY_BRACKET, RIGHT_SQUARE_BRACKET, SEMICOLON, SLASH, SPACE, STAR
- Fields inherited from interface org.apache.maven.doxia.parser.Parser
  ROLE, TXT_TYPE, UNKNOWN_TYPE, XML_TYPE

Constructor Summary

Constructors
Constructor and Description

XhtmlBaseParser()

Method Summary

Methods
Modifier and Type	Method and Description
`protected boolean`	`baseEndTag(XmlPullParser parser, Sink sink)` Goes through a common list of possible html end tags.
`protected boolean`	`baseStartTag(XmlPullParser parser, Sink sink)` Goes through a common list of possible html start tags.
`protected void`	`consecutiveSections(int newLevel, Sink sink)` Make sure sections are nested consecutively.
`protected int`	`getSectionLevel()` Return the current section level.
`protected void`	`handleCdsect(XmlPullParser parser, Sink sink)` Handles CDATA sections.
`protected void`	`handleComment(XmlPullParser parser, Sink sink)` Handles comments.
`protected void`	`handleEndTag(XmlPullParser parser, Sink sink)` Goes through the possible end tags.
`protected void`	`handleStartTag(XmlPullParser parser, Sink sink)` Goes through the possible start tags.
`protected void`	`handleText(XmlPullParser parser, Sink sink)` Handles text events.
`protected void`	`init()` Initialize the parser.
`protected void`	`initXmlParser(XmlPullParser parser)` Initializes the parser with custom entities or other options.
`protected boolean`	`isScriptBlock()` Checks if we are currently inside a <script> tag.
`protected boolean`	`isVerbatim()` Checks if we are currently inside a <pre> tag.
`void`	`parse(Reader source, Sink sink)` Parses the given source model and emits Doxia events into the given sink.
`protected void`	`setSectionLevel(int newLevel)` Set the current section level.
`protected String`	`validAnchor(String id)` Checks if the given id is a valid Doxia id and if not, returns a transformed one.
`protected void`	`verbatim_()` Stop verbatim mode.
`protected void`	`verbatim()` Start verbatim mode.

Methods inherited from class org.apache.maven.doxia.parser.AbstractXmlParser
getAttributesFromParser, getLocalEntities, getText, getType, handleEntity, handleUnknown, isCollapsibleWhitespace, isIgnorableWhitespace, isTrimmableWhitespace, isValidate, parse, setCollapsibleWhitespace, setIgnorableWhitespace, setTrimmableWhitespace, setValidate

Methods inherited from class org.apache.maven.doxia.parser.AbstractParser
doxiaVersion, enableLogging, executeMacro, getBasedir, getLog, getMacroManager, isSecondParsing, setSecondParsing

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - XhtmlBaseParser
```
public XhtmlBaseParser()
```
- Method Detail
  - parse
```
public void parse(Reader source,
         Sink sink)
           throws ParseException
```
    Parses the given source model and emits Doxia events into the given sink.
    
    Specified by:
    
    parse in interface Parser
    
    Overrides:
    
    parse in class AbstractXmlParser
    
    Parameters:
    source - a non-null reader that provides the source document. You could use the newReader methods from ReaderFactory.
    sink - A sink that consumes the Doxia events.
    
    Throws:
    
    ParseException - if the model could not be parsed.
  - initXmlParser
```
protected void initXmlParser(XmlPullParser parser)
                      throws XmlPullParserException
```
    Initializes the parser with custom entities or other options. Adds all XHTML (HTML 4.0) entities to the parser so that they can be recognized and resolved without an additional DTD.
    
    Overrides:
    
    initXmlParser in class AbstractXmlParser
    
    Parameters:
    parser - A parser, not null.
    
    Throws:
    
    XmlPullParserException - if there's a problem initializing the parser
  - baseStartTag
```
protected boolean baseStartTag(XmlPullParser parser,
                   Sink sink)
```
    Goes through a common list of possible html start tags. These include only tags that can go into the body of an xhtml document and so should be re-usable by different xhtml-based parsers.
    
    The currently handled tags are:
    
    <h2>, <h3>, <h4>, <h5>, <h6>, <p>, <pre>, <ul>, <ol>, <li>, <dl>, <dt>, <dd>, <b>, <strong>, <i>, <em>, <code>, <samp>, <tt>, <a>, <table>, <tr>, <th>, <td>, <caption>, <br/>, <hr/>, <img/>.
    
    Parameters:
    parser - A parser.
    sink - the sink to receive the events.
    
    Returns:
    True if the event has been handled by this method, i.e. the tag was recognized, false otherwise.
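
    The boolean return value lets a caller fall back to its own handling when a tag is not in the common list. The following is a minimal, stand-alone sketch of that pattern. It does not use the Doxia API: TagDispatchDemo, BaseTagHandler, CustomTagHandler, the "macro" tag, the string events and the shortened tag list are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the "try the base tags first" pattern; not Doxia code. */
class TagDispatchDemo {

    static class BaseTagHandler {
        // A few of the body-level tags listed above; the list is shortened here.
        private static final Set<String> BASE_TAGS =
                new HashSet<>(Arrays.asList("p", "pre", "ul", "ol", "li", "b", "i", "a"));

        protected final List<String> events = new ArrayList<>();

        /** Returns true if the tag was recognized and an event was recorded. */
        protected boolean baseStartTag(String tagName) {
            if (BASE_TAGS.contains(tagName)) {
                events.add("start:" + tagName);
                return true;
            }
            return false;
        }

        /** Default behaviour: only the common tags are handled. */
        public void handleStartTag(String tagName) {
            baseStartTag(tagName);
        }

        public List<String> events() {
            return events;
        }
    }

    /** Hypothetical subclass that adds one tag of its own. */
    static class CustomTagHandler extends BaseTagHandler {
        @Override
        public void handleStartTag(String tagName) {
            // Try the common tags first, then fall back to format-specific ones.
            if (!baseStartTag(tagName)) {
                if ("macro".equals(tagName)) {
                    events.add("custom:" + tagName);
                } else {
                    events.add("unknown:" + tagName);
                }
            }
        }
    }

    public static void main(String[] args) {
        CustomTagHandler handler = new CustomTagHandler();
        handler.handleStartTag("p");
        handler.handleStartTag("macro");
        handler.handleStartTag("blink");
        System.out.println(handler.events());
    }
}
```

    In this sketch the subclass never re-implements the common tags. It only reacts when the base method reports that the tag was not handled.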
  - baseEndTag
```
protected boolean baseEndTag(XmlPullParser parser,
                 Sink sink)
```
    Goes through a common list of possible html end tags. These should be re-usable by different xhtml-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).
    
    Parameters:
    parser - A parser.
    sink - the sink to receive the events.
    
    Returns:
    True if the event has been handled by this method, false otherwise.
  - handleStartTag
    `protected void handleStartTag(XmlPullParser parser, Sink sink) throws XmlPullParserException, MacroExecutionException`

    Goes through the possible start tags. Just calls baseStartTag(XmlPullParser,Sink); this should be overridden by implementing parsers to include additional tags.

    Specified by:

    handleStartTag in class AbstractXmlParser

    Parameters:
    parser - A parser, not null.
    sink - the sink to receive the events.

    Throws:

    XmlPullParserException - if there's a problem parsing the model
    MacroExecutionException - if there's a problem executing a macro
  - handleEndTag
    `protected void handleEndTag(XmlPullParser parser, Sink sink) throws XmlPullParserException, MacroExecutionException`

    Goes through the possible end tags. Just calls baseEndTag(XmlPullParser,Sink); this should be overridden by implementing parsers to include additional tags.

    Specified by:

    handleEndTag in class AbstractXmlParser

    Parameters:
    parser - A parser, not null.
    sink - the sink to receive the events.

    Throws:

    XmlPullParserException - if there's a problem parsing the model
    MacroExecutionException - if there's a problem executing a macro
  - handleText
    `protected void handleText(XmlPullParser parser, Sink sink) throws XmlPullParserException`

    Handles text events. This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.

    Overrides:

    handleText in class AbstractXmlParser

    Parameters:
    parser - A parser, not null.
    sink - the sink to receive the events. Not null.

    Throws:

    XmlPullParserException - if there's a problem parsing the model
  - handleComment
    `protected void handleComment(XmlPullParser parser, Sink sink) throws XmlPullParserException`

    Handles comments. This is a default implementation: all data are emitted as comment events into the specified sink.

    Overrides:

    handleComment in class AbstractXmlParser

    Parameters:
    parser - A parser, not null.
    sink - the sink to receive the events. Not null.

    Throws:

    XmlPullParserException - if there's a problem parsing the model
  - handleCdsect
    `protected void handleCdsect(XmlPullParser parser, Sink sink) throws XmlPullParserException`

    Handles CDATA sections. This is a default implementation: all data are emitted as text events into the specified sink.

    Overrides:

    handleCdsect in class AbstractXmlParser

    Parameters:
    parser - A parser, not null.
    sink - the sink to receive the events. Not null.

    Throws:

    XmlPullParserException - if there's a problem parsing the model
  - consecutiveSections
    `protected void consecutiveSections(int newLevel, Sink sink)`

    Make sure sections are nested consecutively.

    HTML doesn't have any sections, only section titles (<h2> etc.), which means we have to open or close any sections that are missing in between.

    For instance, if the following sequence is parsed:

    <h3></h3> <h6></h6>

    we have to insert two section starts before we open the <h6>. In the following sequence

    <h6></h6> <h3></h3>

    we have to close two sections before we open the <h3>.

    The current level is set to newLevel afterwards.

    Parameters:
    newLevel - the new section level, all upper levels have to be closed.
    sink - the sink to receive the events.
  - getSectionLevel
    `protected int getSectionLevel()`

    Return the current section level.

    Returns:
    the current section level.
  - setSectionLevel
    `protected void setSectionLevel(int newLevel)`

    Set the current section level.

    Parameters:
    newLevel - the new section level.
  - verbatim_
    `protected void verbatim_()`

    Stop verbatim mode.
  - verbatim
    `protected void verbatim()`

    Start verbatim mode.
  - isVerbatim
    `protected boolean isVerbatim()`

    Checks if we are currently inside a <pre> tag.

    Returns:
    true if we are currently in verbatim mode.
  - isScriptBlock
    `protected boolean isScriptBlock()`

    Checks if we are currently inside a <script> tag.

    Returns:
    true if we are currently inside <script> tags.

    Since:

    1.1.1.
  - validAnchor
    `protected String validAnchor(String id)`

    Checks if the given id is a valid Doxia id and if not, returns a transformed one.

    Parameters:
    id - The id to validate.

    Returns:
    A transformed id or the original id if it was already valid.

    See Also:

    DoxiaUtils.encodeId(String)
  - init
    `protected void init()`

    Initialize the parser. This is called first by Parser.parse(java.io.Reader, org.apache.maven.doxia.sink.Sink) and can be used to set the parser into a clear state so it can be re-used.

    Overrides:

    init in class AbstractParser
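
    The nesting rule described for consecutiveSections can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the Doxia implementation: SectionNesting and its string events are hypothetical, section levels are plain integers starting at 1 (so <h3> would be level 2 and <h6> level 5 if <h2> is level 1), and the sketch assumes that every open section at the new level or deeper is closed and that the section at the new level itself is opened by the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone model of consecutive section nesting; not Doxia code. */
class SectionNesting {
    private int sectionLevel = 0; // 0 means that no section is open
    private final List<String> events = new ArrayList<>();

    /** Closes and opens sections so that a section at newLevel can be opened next. */
    void consecutiveSections(int newLevel) {
        if (newLevel < 1) {
            throw new IllegalArgumentException("newLevel must be at least 1");
        }
        // Assumption: close every open section at newLevel or deeper.
        while (sectionLevel >= newLevel) {
            events.add("section" + sectionLevel + "_");
            sectionLevel--;
        }
        // Open the levels that were skipped on the way down to newLevel.
        while (sectionLevel < newLevel - 1) {
            sectionLevel++;
            events.add("section" + sectionLevel);
        }
        // The current level is set to newLevel afterwards.
        sectionLevel = newLevel;
    }

    /** Simulates a heading: fix the nesting first, then open the section itself. */
    void heading(int level) {
        consecutiveSections(level);
        events.add("section" + level);
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        SectionNesting nesting = new SectionNesting();
        nesting.heading(2); // a level-2 heading
        nesting.heading(5); // jumps three levels deeper
        nesting.heading(2); // jumps back up
        System.out.println(nesting.events());
    }
}
```

    Going from level 2 to level 5, the model inserts the two missing section starts (levels 3 and 4) before the caller opens level 5. On the way back up it also closes the deepest section and the earlier section at the target level, so it records more close events than the number of skipped levels. That follows from the closing assumption made in this sketch.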
