public class XhtmlBaseParser extends AbstractXmlParser implements HtmlMarkup
AbstractXmlParser.CachedFileEntityResolver
A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, B, BASE, BASEFONT, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CAPTION, CDATA_TYPE, CENTER, CITE, CODE, COL, COLGROUP, DD, DEL, DFN, DIR, DIV, DL, DT, EM, ENTITY_TYPE, FIELDSET, FONT, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, LABEL, LEGEND, LI, LINK, MAP, MENU, META, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PARAM, PRE, Q, S, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRIKE, STRONG, STYLE, SUB, SUP, TABLE, TAG_TYPE_END, TAG_TYPE_SIMPLE, TAG_TYPE_START, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, U, UL, VAR
BANG, CDATA, DOCTYPE_START, ENTITY_START, XML_NAMESPACE
COLON, EOL, EQUAL, GREATER_THAN, LEFT_CURLY_BRACKET, LEFT_SQUARE_BRACKET, LESS_THAN, MINUS, PLUS, QUOTE, RIGHT_CURLY_BRACKET, RIGHT_SQUARE_BRACKET, SEMICOLON, SLASH, SPACE, STAR
ROLE, TXT_TYPE, UNKNOWN_TYPE, XML_TYPE
Constructor and Description |
---|
XhtmlBaseParser() |
Modifier and Type | Method and Description |
---|---|
protected boolean | baseEndTag(XmlPullParser parser, Sink sink) Goes through a common list of possible html end tags. |
protected boolean | baseStartTag(XmlPullParser parser, Sink sink) Goes through a common list of possible html start tags. |
protected void | consecutiveSections(int newLevel, Sink sink) Make sure sections are nested consecutively. |
protected int | getSectionLevel() Return the current section level. |
protected void | handleCdsect(XmlPullParser parser, Sink sink) Handles CDATA sections. |
protected void | handleComment(XmlPullParser parser, Sink sink) Handles comments. |
protected void | handleEndTag(XmlPullParser parser, Sink sink) Goes through the possible end tags. |
protected void | handleStartTag(XmlPullParser parser, Sink sink) Goes through the possible start tags. |
protected void | handleText(XmlPullParser parser, Sink sink) Handles text events. |
protected void | init() Initialize the parser. |
protected void | initXmlParser(XmlPullParser parser) Initializes the parser with custom entities or other options. |
protected boolean | isScriptBlock() Checks if we are currently inside a <script> tag. |
protected boolean | isVerbatim() Checks if we are currently inside a <pre> tag. |
void | parse(Reader source, Sink sink) Parses the given source model and emits Doxia events into the given sink. |
protected void | setSectionLevel(int newLevel) Set the current section level. |
protected String | validAnchor(String id) Checks if the given id is a valid Doxia id and if not, returns a transformed one. |
protected void | verbatim_() Stop verbatim mode. |
protected void | verbatim() Start verbatim mode. |
getAttributesFromParser, getLocalEntities, getText, getType, handleEntity, handleUnknown, isCollapsibleWhitespace, isIgnorableWhitespace, isTrimmableWhitespace, isValidate, parse, setCollapsibleWhitespace, setIgnorableWhitespace, setTrimmableWhitespace, setValidate
doxiaVersion, enableLogging, executeMacro, getBasedir, getLog, getMacroManager, isSecondParsing, setSecondParsing
public void parse(Reader source, Sink sink) throws ParseException
parse in interface Parser
parse in class AbstractXmlParser
Parameters:
source - not null reader that provides the source document. You could use newReader methods from ReaderFactory.
sink - A sink that consumes the Doxia events.
Throws:
ParseException - if the model could not be parsed.

protected void initXmlParser(XmlPullParser parser) throws XmlPullParserException
initXmlParser in class AbstractXmlParser
Parameters:
parser - A parser, not null.
Throws:
XmlPullParserException - if there's a problem initializing the parser

protected boolean baseStartTag(XmlPullParser parser, Sink sink)
Goes through a common list of possible HTML start tags. These include only tags that can go into the body of an XHTML document and so should be re-usable by different XHTML-based parsers.
The currently handled tags are:
<h2>, <h3>, <h4>, <h5>, <h6>, <p>, <pre>, <ul>, <ol>, <li>, <dl>, <dt>, <dd>, <b>, <strong>, <i>, <em>, <code>, <samp>, <tt>, <a>, <table>, <tr>, <th>, <td>, <caption>, <br/>, <hr/>, <img/>.
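The reuse described above can be pictured with a small, self-contained sketch. It does not use the Doxia classes: `BaseTagHandlerSketch` and `ExtendedTagHandlerSketch` are hypothetical names, tags are plain strings, and the list of events stands in for a Sink. The sketch also assumes that the boolean result means "the tag was recognized and handled", which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical base handler: knows a shared set of body-level tags. */
class BaseTagHandlerSketch {
    // A shortened, illustrative subset of the tags listed above.
    private static final Set<String> COMMON_TAGS = Set.of("h2", "p", "pre", "ul", "li", "a", "table");

    // Stands in for a Sink: records what would have been emitted.
    protected final List<String> events = new ArrayList<>();

    /** Assumed contract: returns true if the tag was recognized and handled here. */
    protected boolean baseStartTag(String tag) {
        if (COMMON_TAGS.contains(tag)) {
            events.add("start:" + tag);
            return true;
        }
        return false;
    }

    /** Default behaviour: only the common tags are handled. */
    protected void handleStartTag(String tag) {
        if (!baseStartTag(tag)) {
            events.add("unknown:" + tag);
        }
    }
}

/** Hypothetical format-specific parser that adds one tag of its own. */
class ExtendedTagHandlerSketch extends BaseTagHandlerSketch {
    @Override
    protected void handleStartTag(String tag) {
        if ("title".equals(tag)) {
            // Tag known only to this parser.
            events.add("start:title");
        } else {
            // Everything else falls back to the shared handling.
            super.handleStartTag(tag);
        }
    }

    public static void main(String[] args) {
        ExtendedTagHandlerSketch handler = new ExtendedTagHandlerSketch();
        handler.handleStartTag("title");
        handler.handleStartTag("p");
        handler.handleStartTag("blink");
        System.out.println(handler.events);
    }
}
```

The subclass only spells out what is specific to its format; the shared tag handling stays in one place.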
Parameters:
parser - A parser.
sink - the sink to receive the events.

protected boolean baseEndTag(XmlPullParser parser, Sink sink)
Goes through a common list of possible html end tags. These should be re-usable by different xhtml-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).
Parameters:
parser - A parser.
sink - the sink to receive the events.

protected void handleStartTag(XmlPullParser parser, Sink sink) throws XmlPullParserException, MacroExecutionException
Goes through the possible start tags. Calls baseStartTag(XmlPullParser,Sink); implementing parsers should override this to include additional tags.
handleStartTag in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
Throws:
XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro

protected void handleEndTag(XmlPullParser parser, Sink sink) throws XmlPullParserException, MacroExecutionException
Goes through the possible end tags. Calls baseEndTag(XmlPullParser,Sink); implementing parsers should override this to include additional tags.
handleEndTag in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
Throws:
XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro

protected void handleText(XmlPullParser parser, Sink sink) throws XmlPullParserException
This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.
handleText in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
XmlPullParserException - if there's a problem parsing the model

protected void handleComment(XmlPullParser parser, Sink sink) throws XmlPullParserException
This is a default implementation: all data are emitted as comment events into the specified sink.
handleComment in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
XmlPullParserException - if there's a problem parsing the model

protected void handleCdsect(XmlPullParser parser, Sink sink) throws XmlPullParserException
This is a default implementation: all data are emitted as text events into the specified sink.
handleCdsect in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
XmlPullParserException - if there's a problem parsing the model

protected void consecutiveSections(int newLevel, Sink sink)
HTML doesn't have any sections, only sectionTitles (<h2> etc.), which means we have to open or close any sections that are missing in between.
For instance, if the following sequence is parsed:
<h3></h3> <h6></h6>
we have to insert two section starts before we open the <h6>.
In the following sequence
<h6></h6> <h3></h3>
we have to close two sections before we open the <h3>.
The current level is set to newLevel afterwards.
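The nesting rule above can be sketched without any Doxia classes. `SectionNestingSketch` is a hypothetical name and the events are plain strings rather than Sink calls. The sketch assumes, for illustration only, that <h3> maps to section level 2 and <h6> to level 5, and that every open section at newLevel or deeper is closed before the new one is opened. That set includes the intermediate levels mentioned above as well as the sections opened by the headings themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser; records section events instead of calling a Sink. */
class SectionNestingSketch {
    private int sectionLevel = 0;
    private final List<String> events = new ArrayList<>();

    /** Closes and opens sections so that a section at newLevel can be opened next. */
    void consecutiveSections(int newLevel) {
        // Close every open section at newLevel or deeper (assumed behaviour).
        while (sectionLevel >= newLevel && sectionLevel > 0) {
            events.add("close section" + sectionLevel);
            sectionLevel--;
        }
        // Open the levels that were skipped between the current level and newLevel.
        while (sectionLevel < newLevel - 1) {
            sectionLevel++;
            events.add("open section" + sectionLevel);
        }
        // The current level is set to newLevel afterwards.
        sectionLevel = newLevel;
    }

    /** What a heading handler might do: fix the nesting, then open the section itself. */
    void heading(int level) {
        consecutiveSections(level);
        events.add("open section" + level);
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        SectionNestingSketch sketch = new SectionNestingSketch();
        sketch.heading(2); // like <h3> under the assumed mapping
        sketch.heading(5); // like <h6>: levels 3 and 4 are opened first
        sketch.heading(2); // like <h3> again: deeper sections are closed first
        sketch.events().forEach(System.out::println);
    }
}
```

Going from level 2 to level 5 inserts the two missing section starts (levels 3 and 4) before level 5 itself is opened.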
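Parameters:
newLevel - the new section level, all upper levels have to be closed.
sink - the sink to receive the events.

protected int getSectionLevel()
Return the current section level.

protected void setSectionLevel(int newLevel)
Set the current section level.
Parameters:
newLevel - the new section level.

protected void verbatim_()
Stop verbatim mode.

protected void verbatim()
Start verbatim mode.

protected boolean isVerbatim()
Checks if we are currently inside a <pre> tag.

protected boolean isScriptBlock()
Checks if we are currently inside a <script> tag.

protected String validAnchor(String id)
Checks if the given id is a valid Doxia id and if not, returns a transformed one.
Parameters:
id - The id to validate.
See Also:
DoxiaUtils.encodeId(String)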
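The "validate, otherwise transform" contract can be illustrated with a self-contained sketch. It does not call DoxiaUtils, and the validity rule it uses (a letter first, then letters, digits, '-', '_', '.' or ':') is an assumption made for this example, not the rule Doxia actually applies. `AnchorIdSketch` and its methods are hypothetical names.

```java
/** Hypothetical id helper; the rule below is an assumption, not the DoxiaUtils rule. */
class AnchorIdSketch {

    /** Assumed rule: an ASCII letter first, then letters, digits, '-', '_', '.' or ':'. */
    static boolean isValidId(String id) {
        return id != null && id.matches("[A-Za-z][A-Za-z0-9_.:-]*");
    }

    /** Returns the id unchanged if it is valid, otherwise a transformed id that satisfies the rule. */
    static String validAnchor(String id) {
        if (isValidId(id)) {
            return id;
        }
        String trimmed = (id == null) ? "" : id.trim();
        StringBuilder sb = new StringBuilder();
        for (char c : trimmed.toCharArray()) {
            if (c == ' ') {
                sb.append('_'); // spaces become underscores
            } else if (isAsciiLetter(c) || (c >= '0' && c <= '9') || "-_.:".indexOf(c) >= 0) {
                sb.append(c);   // allowed characters are kept
            }
            // any other character is dropped in this sketch
        }
        if (sb.length() == 0 || !isAsciiLetter(sb.charAt(0))) {
            sb.insert(0, 'a');  // make sure the id starts with a letter
        }
        return sb.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        System.out.println(validAnchor("intro"));
        System.out.println(validAnchor("my id"));
        System.out.println(validAnchor("1st"));
    }
}
```

A valid id passes through untouched, so calling the method on an id that has already been transformed is harmless.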
protected void init()
Initialize the parser. This is called by Parser.parse(java.io.Reader, org.apache.maven.doxia.sink.Sink) and can be used to set the parser into a clear state so it can be re-used.
init in class AbstractParser
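Why a reset hook matters for re-use can be shown with a small sketch. `ReusableParserSketch` is a hypothetical class that mimics only the idea: it keeps per-document state (here a verbatim flag) and clears it in init() at the start of each parse. Input is a list of tag names rather than a Reader, and none of this is the actual Doxia implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser with per-document state that init() resets. */
class ReusableParserSketch {
    private boolean verbatim;
    private final List<String> events = new ArrayList<>();

    /** Puts the parser back into a clear state. */
    protected void init() {
        verbatim = false;
        events.clear();
    }

    /** Parses a list of tag names; init() runs first so earlier documents cannot leak state. */
    List<String> parse(List<String> tags) {
        init();
        for (String tag : tags) {
            if ("pre".equals(tag)) {
                verbatim = true;
            } else if ("/pre".equals(tag)) {
                verbatim = false;
            }
            events.add(verbatim ? tag + " [verbatim]" : tag);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        ReusableParserSketch parser = new ReusableParserSketch();
        // The first document never closes <pre>, so verbatim is still on at the end.
        System.out.println(parser.parse(List.of("pre", "text")));
        // The second document starts clean because parse() calls init() first.
        System.out.println(parser.parse(List.of("text")));
    }
}
```

Without the call to init(), the second document would start in verbatim mode left over from the first.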
Copyright © 2005-2013 The Apache Software Foundation. All Rights Reserved.