public class Xhtml5BaseParser extends AbstractXmlParser implements HtmlMarkup
Nested classes/interfaces inherited from class AbstractXmlParser:
AbstractXmlParser.CachedFileEntityResolver
Fields inherited from interface HtmlMarkup:
A, ABBR, ADDRESS, AREA, ARTICLE, ASIDE, AUDIO, B, BASE, BDI, BDO, BLOCKQUOTE, BODY, BR, BUTTON, CANVAS, CAPTION, CDATA_TYPE, CITE, CODE, COL, COLGROUP, COMMAND, DATA, DATALIST, DD, DEL, DETAILS, DFN, DIALOG, DIV, DL, DT, EM, EMBED, ENTITY_TYPE, FIELDSET, FIGCAPTION, FIGURE, FOOTER, FORM, H1, H2, H3, H4, H5, H6, HEAD, HEADER, HGROUP, HR, HTML, I, IFRAME, IMG, INPUT, INS, KBD, KEYGEN, LABEL, LEGEND, LI, LINK, MAIN, MAP, MARK, MENU, MENUITEM, META, METER, NAV, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, OUTPUT, P, PARAM, PICTURE, PRE, PROGRESS, Q, RB, RP, RT, RTC, RUBY, S, SAMP, SCRIPT, SECTION, SELECT, SMALL, SOURCE, SPAN, STRONG, STYLE, SUB, SUMMARY, SUP, SVG, TABLE, TAG_TYPE_END, TAG_TYPE_SIMPLE, TAG_TYPE_START, TBODY, TD, TEMPLATE, TEXTAREA, TFOOT, TH, THEAD, TIME, TITLE, TR, TRACK, U, UL, VAR, VIDEO, WBR
Fields inherited from interface XmlMarkup:
BANG, CDATA, DOCTYPE_START, ENTITY_START, XML_NAMESPACE
Fields inherited from interface Markup:
COLON, EOL, EQUAL, GREATER_THAN, LEFT_CURLY_BRACKET, LEFT_SQUARE_BRACKET, LESS_THAN, MINUS, PLUS, QUOTE, RIGHT_CURLY_BRACKET, RIGHT_SQUARE_BRACKET, SEMICOLON, SLASH, SPACE, STAR
Fields inherited from interface Parser:
TXT_TYPE, UNKNOWN_TYPE, XML_TYPE
Constructor and Description |
---|
Xhtml5BaseParser() |
Modifier and Type | Method and Description |
---|---|
protected boolean | baseEndTag(String elementName, SinkEventAttributeSet attribs, Sink sink) |
protected boolean | baseEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Goes through a common list of possible HTML end tags. |
protected boolean | baseStartTag(String elementName, SinkEventAttributeSet attribs, Sink sink) |
protected boolean | baseStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Goes through a common list of possible HTML5 start tags. |
protected void | consecutiveSections(int newLevel, Sink sink, SinkEventAttributeSet attribs): Deprecated. Use emitHeadingSections(int, Sink, boolean) instead. |
protected void | emitHeadingSections(int newLevel, Sink sink, boolean enforceNewSection): Make sure sections are nested consecutively and correctly inserted for the given heading level. |
protected int | getSectionLevel(): Return the current section level. |
protected void | handleCdsect(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Handles CDATA sections. |
protected void | handleComment(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Handles comments. |
protected void | handleEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Goes through the possible end tags. |
protected void | handleStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Goes through the possible start tags. |
protected void | handleText(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink): Handles text events. |
protected void | init(): Initialize the parser. |
protected void | initXmlParser(org.codehaus.plexus.util.xml.pull.XmlPullParser parser): Initializes the parser with custom entities or other options. |
protected boolean | isScriptBlock(): Checks if we are currently inside a <script> tag. |
protected boolean | isVerbatim(): Checks if we are currently inside a <pre> tag. |
void | parse(Reader source, Sink sink, String reference): Parses the given source model and emits Doxia events into the given sink. |
protected void | setSectionLevel(int newLevel): Set the current section level. |
protected String | validAnchor(String id): Checks if the given id is a valid Doxia id and, if not, returns a transformed one. |
protected void | verbatim_(): Stop verbatim mode. |
protected void | verbatim(): Start verbatim mode. |
Methods inherited from class AbstractXmlParser:
getAddDefaultEntities, getAttributesFromParser, getLocalEntities, getText, getType, handleEntity, handleUnknown, handleUnknown, isCollapsibleWhitespace, isIgnorableWhitespace, isTrimmableWhitespace, isValidate, setAddDefaultEntities, setCollapsibleWhitespace, setIgnorableWhitespace, setTrimmableWhitespace, setValidate
Methods inherited from class AbstractParser:
addSinkWrapperFactory, doxiaVersion, executeMacro, getBasedir, getMacroManager, getSinkWrapperFactories, getWrappedSink, isEmitAnchorsForIndexableEntries, isEmitComments, isSecondParsing, parse, parse, parse, setEmitAnchorsForIndexableEntries, setEmitComments, setSecondParsing
public Xhtml5BaseParser()
public void parse(Reader source, Sink sink, String reference) throws ParseException
Parses the given source model and emits Doxia events into the given sink.
Specified by: parse in interface Parser
Overrides: parse in class AbstractXmlParser
Parameters:
source - not null reader that provides the source document. You could use newReader methods from ReaderFactory.
sink - A sink that consumes the Doxia events.
reference - a string identifying the source (for file-based documents, the source file path).
Throws:
ParseException - if the model could not be parsed.
protected void initXmlParser(org.codehaus.plexus.util.xml.pull.XmlPullParser parser) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
Initializes the parser with custom entities or other options.
Overrides: initXmlParser in class AbstractXmlParser
Parameters:
parser - A parser, not null.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem initializing the parser.
protected boolean baseStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink)
Goes through a common list of possible HTML5 start tags. These include only tags that can go into the body of an XHTML5 document, so they should be reusable by different XHTML-based parsers.
The currently handled tags are:
<article>, <nav>, <aside>, <section>, <h1>, <h2>, <h3>,
<h4>, <h5>, <header>, <main>, <footer>, <em>, <strong>,
<small>, <s>, <cite>, <q>, <dfn>, <abbr>, <i>,
<b>, <code>, <samp>, <kbd>, <sub>, <sup>, <u>,
<mark>, <ruby>, <rb>, <rt>, <rtc>, <rp>, <bdi>,
<bdo>, <span>, <ins>, <del>, <p>, <pre>, <ul>,
<ol>, <li>, <dl>, <dt>, <dd>, <a>, <table>,
<tr>, <th>, <td>, <caption>, <br/>, <wbr/>, <hr/>,
<img/>.
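To illustrate how the shared start-tag list can be reused, here is a minimal, self-contained sketch of the pattern in plain Java. TagDispatchSketch, BaseParser, CustomParser, the string event log and the extra details tag are all hypothetical; the real baseStartTag takes an XmlPullParser and a Sink. The sketch assumes that the boolean result signals whether the tag was recognized, so a derived parser can try the shared list first and only then handle tags of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the baseStartTag/handleStartTag split; not Doxia's API.
// Events are recorded as strings instead of being sent to a real Sink.
public class TagDispatchSketch {

    public static class BaseParser {
        // Illustrative subset of the shared body tags listed above.
        private static final Set<String> BASE_TAGS = Set.of("p", "em", "strong", "ul", "li", "pre");
        final List<String> events = new ArrayList<>();

        // Returns true if the tag belongs to the shared list and an event was recorded.
        protected boolean baseStartTag(String name) {
            if (BASE_TAGS.contains(name)) {
                events.add("start:" + name);
                return true;
            }
            return false;
        }

        // Default behaviour: only the shared tags are handled.
        protected void handleStartTag(String name) {
            if (!baseStartTag(name)) {
                events.add("unknown:" + name);
            }
        }
    }

    // A derived parser that reuses the shared list and adds one extra tag.
    public static class CustomParser extends BaseParser {
        @Override
        protected void handleStartTag(String name) {
            if (baseStartTag(name)) {
                return; // shared tag, already handled by the base list
            }
            if ("details".equals(name)) { // hypothetical extra tag
                events.add("start:details(custom)");
            } else {
                events.add("unknown:" + name);
            }
        }
    }

    public static void main(String[] args) {
        CustomParser parser = new CustomParser();
        for (String tag : List.of("p", "details", "blink")) {
            parser.handleStartTag(tag);
        }
        System.out.println(parser.events); // [start:p, start:details(custom), unknown:blink]
    }
}
```

Trying the shared list first keeps the common body tags in one place, so each derived parser only has to deal with the tags that are specific to it.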
Parameters:
parser - A parser.
sink - the sink to receive the events.
protected boolean baseStartTag(String elementName, SinkEventAttributeSet attribs, Sink sink)
protected boolean baseEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink)
Goes through a common list of possible HTML end tags. These should be reusable by different XHTML-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).
Parameters:
parser - A parser.
sink - the sink to receive the events.
protected boolean baseEndTag(String elementName, SinkEventAttributeSet attribs, Sink sink)
protected void handleStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
Goes through the possible start tags. Just calls baseStartTag(XmlPullParser,Sink); this should be overridden by implementing parsers to include additional tags.
Specified by: handleStartTag in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro
protected void handleEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
Goes through the possible end tags. Just calls baseEndTag(XmlPullParser,Sink); this should be overridden by implementing parsers to include additional tags.
Specified by: handleEndTag in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro
protected void handleText(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.
Overrides: handleText in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
protected void handleComment(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
This is a default implementation: all data are emitted as comment events into the specified sink.
Overrides: handleComment in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
protected void handleCdsect(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
This is a default implementation: all data are emitted as text events into the specified sink.
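The three default implementations described above (text only when non-empty, comments as comment events, CDATA as plain text events) can be sketched as follows. RecordingSink is a hypothetical stand-in that just records strings; it is not Doxia's Sink interface, and the real handlers read their data from an XmlPullParser rather than from a String argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the default text, comment and CDATA handling; not Doxia code.
public class DefaultHandlersSketch {

    // Minimal stand-in for a sink that records the events it receives.
    public static class RecordingSink {
        final List<String> events = new ArrayList<>();
        void text(String t) { events.add("text:" + t); }
        void comment(String c) { events.add("comment:" + c); }
    }

    // Text: emitted only when there is non-empty text.
    static void handleText(String text, RecordingSink sink) {
        if (text != null && !text.isEmpty()) {
            sink.text(text);
        }
    }

    // Comments: all data are emitted as a comment event.
    static void handleComment(String data, RecordingSink sink) {
        sink.comment(data);
    }

    // CDATA sections: all data are emitted as a plain text event.
    static void handleCdsect(String data, RecordingSink sink) {
        sink.text(data);
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        handleText("", sink);          // ignored because it is empty
        handleText("Hello", sink);
        handleComment(" note ", sink);
        handleCdsect("a < b", sink);   // CDATA content ends up as ordinary text
        System.out.println(sink.events); // [text:Hello, comment: note , text:a < b]
    }
}
```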
Overrides: handleCdsect in class AbstractXmlParser
Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
@Deprecated protected void consecutiveSections(int newLevel, Sink sink, SinkEventAttributeSet attribs)
Deprecated. Use emitHeadingSections(int, Sink, boolean) instead.
Shortcut for emitHeadingSections(int, Sink, boolean) with the last argument being true.
Parameters:
newLevel - the new section level.
sink - the sink to receive the events.
attribs -
protected void emitHeadingSections(int newLevel, Sink sink, boolean enforceNewSection)
HTML5 heading tags H1 to H5 imply same-level sections in the Sink API (compare with Sink.sectionTitle(int, SinkEventAttributes)). However, (X)HTML5 allows headings without explicit surrounding section elements and is also less strict about non-consecutive heading levels. This method closes open sections which have been added for previous headings and/or opens the sections necessary for the new heading level. Due to Sink API restrictions, at least one section needs to be opened directly prior to the heading.
For instance, if the following sequence is parsed:
<h2></h2> <h5></h5>
we have to insert two section starts before we open the <h5>.
In the following sequence
<h5></h5> <h2></h2>
we have to close two sections before we open the <h2>.
The current heading level is set to newLevel afterwards.
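The following is a simplified, hypothetical model of the bookkeeping described above; it is not Doxia's emitHeadingSections. Open sections are tracked as a single integer depth, events are plain strings such as open:3 and close:3 instead of Sink calls, and enforceNewSection is assumed to mean that an already open section at the heading's own level is closed and replaced by a sibling rather than reused.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of heading-driven section nesting; not Doxia code.
public class HeadingSectionsSketch {
    private int sectionLevel = 0;                 // depth of currently open sections
    final List<String> events = new ArrayList<>();

    // Adjusts open sections for a heading of newLevel, then records the title.
    void heading(int newLevel, boolean enforceNewSection) {
        // With enforceNewSection, a section already open at newLevel is closed too,
        // so the heading starts a sibling section instead of reusing it.
        int keep = enforceNewSection ? newLevel - 1 : newLevel;
        while (sectionLevel > keep) {
            events.add("close:" + sectionLevel);
            sectionLevel--;
        }
        // Open intermediate sections plus the one directly enclosing the heading.
        while (sectionLevel < newLevel) {
            sectionLevel++;
            events.add("open:" + sectionLevel);
        }
        events.add("title:" + newLevel);
    }

    // Closes everything still open, e.g. at the end of the body.
    void closeAll() {
        while (sectionLevel > 0) {
            events.add("close:" + sectionLevel);
            sectionLevel--;
        }
    }

    public static void main(String[] args) {
        HeadingSectionsSketch s = new HeadingSectionsSketch();
        s.heading(2, true);   // <h2>
        s.heading(5, true);   // <h5>: opens 3 and 4, then 5
        s.heading(2, true);   // <h2>: closes 5, 4, 3 and the old 2, then opens a new 2
        s.closeAll();
        System.out.println(s.events);
    }
}
```

For the h2, h5, h2 sequence the model opens the intermediate sections 3 and 4 before the section that directly encloses the h5, and later closes 5, 4, 3 and the old level-2 section before starting a new one. Because the model starts at depth 0, it also opens a level-1 section before the first h2.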
Parameters:
newLevel - the new section level; all upper levels have to be closed.
sink - the sink to receive the events.
enforceNewSection - whether to enforce a new section or not.
protected int getSectionLevel()
Return the current section level.
protected void setSectionLevel(int newLevel)
Set the current section level.
Parameters:
newLevel - the new section level.
protected void verbatim_()
Stop verbatim mode.
protected void verbatim()
Start verbatim mode.
protected boolean isVerbatim()
Checks if we are currently inside a <pre> tag.
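A hypothetical sketch of how a verbatim flag like the one behind verbatim(), verbatim_() and isVerbatim() might be consulted when handling text. VerbatimSketch and its normalize method are illustrative only; in particular, collapsing whitespace outside verbatim mode is an assumption made for this example, not documented behavior of Xhtml5BaseParser.

```java
// Hypothetical sketch of a verbatim flag toggled by pre start/end tags; not Doxia code.
public class VerbatimSketch {
    private boolean inVerbatim;

    void verbatim()  { inVerbatim = true; }    // start verbatim mode, e.g. on a pre start tag
    void verbatim_() { inVerbatim = false; }   // stop verbatim mode, e.g. on a pre end tag
    boolean isVerbatim() { return inVerbatim; }

    // Assumption for illustration: keep text as-is in verbatim mode,
    // otherwise collapse every run of whitespace into a single space.
    String normalize(String text) {
        return isVerbatim() ? text : text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        VerbatimSketch s = new VerbatimSketch();
        System.out.println("[" + s.normalize("a   b\n c") + "]"); // [a b c]
        s.verbatim();
        System.out.println("[" + s.normalize("a   b\n c") + "]"); // original spacing kept
        s.verbatim_();
    }
}
```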
protected boolean isScriptBlock()
Checks if we are currently inside a <script> tag.
Returns:
true if we are currently inside <script> tags.
protected String validAnchor(String id)
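Checks if the given id is a valid Doxia id and, if not, returns a transformed one.
Parameters:
id - The id to validate.
See Also:
DoxiaUtils.encodeId(String)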
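The sketch below shows the overall shape of validAnchor: return the id unchanged when it is already valid, otherwise return a transformed one. AnchorSketch and its helpers are hypothetical, and the validity rule is borrowed from the HTML 4.01 ID token syntax (a letter followed by letters, digits, '-', '_', ':' or '.'); the rules actually applied by DoxiaUtils.encodeId(String) may differ.

```java
// Hypothetical, simplified id check and transformation; NOT the rules of DoxiaUtils.encodeId.
public class AnchorSketch {

    // HTML 4.01 ID token syntax: a letter, then letters, digits, '-', '_', ':' or '.'.
    static boolean isValidId(String id) {
        return id != null && id.matches("[A-Za-z][A-Za-z0-9_:.\\-]*");
    }

    // Returns the id unchanged if valid, otherwise a transformed, valid id.
    static String validAnchor(String id) {
        if (isValidId(id)) {
            return id;
        }
        String s = (id == null ? "" : id.trim())
                .replaceAll("\\s+", "_")                // whitespace runs become '_'
                .replaceAll("[^A-Za-z0-9_:.\\-]", "_"); // other disallowed chars become '_'
        if (!s.matches("[A-Za-z].*")) {
            s = "a" + s;                                // ids must start with a letter
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(validAnchor("intro"));        // intro
        System.out.println(validAnchor("1 Intro"));      // a1_Intro
        System.out.println(validAnchor("Hello World!")); // Hello_World_
    }
}
```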
protected void init()
Initialize the parser. This is called first by AbstractParser.parse(java.io.Reader, org.apache.maven.doxia.sink.Sink) and can be used to set the parser into a clear state so it can be re-used.
Overrides: init in class AbstractParser
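Since init() is invoked from parse() and can put the parser into a clear state, per-document fields can be reset there so that a single parser instance can be reused. The following self-contained sketch shows that template-method arrangement; ReusableParserSketch and its crude line-based heading count are hypothetical and not part of Doxia.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical template-method sketch, not Doxia's AbstractParser: parse() calls
// init() first, so state left over from a previous document is cleared.
public class ReusableParserSketch {
    private int headingCount; // per-document state that must not leak between runs

    // Resets per-document state; invoked at the start of every parse.
    protected void init() {
        headingCount = 0;
    }

    // Counts lines starting with "<h" as a crude stand-in for real parsing.
    public int parse(Reader source) {
        init();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("<h")) {
                    headingCount++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headingCount;
    }

    public static void main(String[] args) {
        ReusableParserSketch parser = new ReusableParserSketch();
        System.out.println(parser.parse(new StringReader("<h1>A</h1>\n<h2>B</h2>"))); // 2
        System.out.println(parser.parse(new StringReader("<h1>C</h1>")));             // 1, not 3
    }
}
```

Without the init() call at the top of parse(), the second run would report 3, because the count from the first document would still be present.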
Copyright © 2005–2024 The Apache Software Foundation. All rights reserved.