Html2XdocBean (Maven Html2XDoc Plugin 1.5-SNAPSHOT API)

org.apache.maven.html2xdoc
Class Html2XdocBean

java.lang.Object
  org.apache.maven.html2xdoc.Html2XdocBean

public class Html2XdocBean
extends Object

A simple bean for converting an HTML document into an XDoc-compliant XML document. This could be done via XSLT, but it is a little more complex than it might first appear, so it's done via Java code instead.

Author:: James Strachan

Constructor Summary
`Html2XdocBean()`

Method Summary
`protected void`	`addSections(org.dom4j.Element output, org.dom4j.Element body)` Iterates through the given body looking for h1, h2, h3 nodes and creating the associated section elements.
`protected org.dom4j.Node`	`cloneNode(org.dom4j.Node node)` Normalizes the whitespace of any Elements
`org.dom4j.Document`	`convert(org.dom4j.Document html)` Converts the given HTML document into the corresponding XDoc format of XML
`protected int`	`determineHeadingLevel(org.dom4j.Node node)` Determines the heading level of the node.
`protected List`	`getBodyContent(List content)` Returns a copy of the body content, removing any whitespace from the beginning and end.
`protected String`	`getSectionText(org.dom4j.Node node)`
`protected boolean`	`isCharacterData(org.dom4j.Node node)` Specifies whether the node is character data and should be passed as straight text to the resultant html.
`protected boolean`	`isHeading(org.dom4j.Node node)` Specifies whether the node is a heading node.
`protected boolean`	`isPre(org.dom4j.Node node)`
`protected boolean`	`isTextFormatting(org.dom4j.Node node)` Specifies whether the node is a text modifying construct that should be passed as is to the resultant html.
`protected boolean`	`isWhitespace(org.dom4j.Node node)`
`protected void`	`makeSection(org.dom4j.Element output, org.dom4j.Node node)` Creates a section or subsection as necessary based on the node for the output document.
`protected boolean`	`needsNewSection(org.dom4j.Node node)` Determines whether a new section is needed, based on whether the node is a heading whose level is equal to or less than the current section's heading level.
`protected boolean`	`shouldBreakPara(org.dom4j.Node node)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

Html2XdocBean

public Html2XdocBean()

Method Detail

convert

public org.dom4j.Document convert(org.dom4j.Document html)

Converts the given HTML document into the corresponding XDoc format of XML

Parameters:: html - the input html document
Returns:: Document

addSections

protected void addSections(org.dom4j.Element output,
                           org.dom4j.Element body)

Iterates through the given body looking for h1, h2, h3 nodes and creating the associated section elements. Any text nodes contained inside the body are wrapped in a <p> element.

Parameters:: output - the output destination; body - the block of HTML markup to convert
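The grouping described above can be illustrated with a minimal sketch. It is not the plugin's implementation: it works on a hypothetical `Item` stand-in instead of dom4j nodes, emits a string rather than building an element tree, opens a flat `<section>` for every h1 to h3 heading (nesting into subsections is deliberately left out), and does no XML escaping. The class and method names are assumptions made for illustration only.

```java
import java.util.List;

final class AddSectionsSketch {

    /** Hypothetical stand-in for a body child: tag is "h1".."h3" for headings, null for plain text. */
    static final class Item {
        final String tag;
        final String text;

        Item(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    /** Wraps text in <p> and starts a new flat section at every heading (assumed behaviour). */
    static String toXdocBody(List<Item> body) {
        StringBuilder out = new StringBuilder("<body>");
        boolean sectionOpen = false;
        for (Item item : body) {
            if (item.tag != null && item.tag.matches("h[1-3]")) {
                // A heading closes the previous section, if any, and opens a new one.
                if (sectionOpen) {
                    out.append("</section>");
                }
                out.append("<section name=\"").append(item.text).append("\">");
                sectionOpen = true;
            } else {
                // Plain text is wrapped in a paragraph element.
                out.append("<p>").append(item.text).append("</p>");
            }
        }
        if (sectionOpen) {
            out.append("</section>");
        }
        return out.append("</body>").toString();
    }
}
```

Text that appears before the first heading ends up as a paragraph directly under the body in this sketch; whether the bean does the same is not stated in this page.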

isTextFormatting

protected boolean isTextFormatting(org.dom4j.Node node)

Specifies whether the node is a text-modifying construct that should be passed as-is to the resultant html, such as an anchor '<a>'.

Parameters:: node - the node to check
Returns:: true if the node is used to modify the formatting of the text; otherwise, false

isCharacterData

protected boolean isCharacterData(org.dom4j.Node node)

Specifies whether the node is character data and should be passed as straight text to the resultant html.

Parameters:: node - the node to check
Returns:: true if the node is a text node; otherwise, false.

isHeading

protected boolean isHeading(org.dom4j.Node node)

Specifies whether the node is a heading node.

Parameters:: node - the node to check
Returns:: true if the given node is a heading element (h1, h2, h3 etc); otherwise, false

determineHeadingLevel

protected int determineHeadingLevel(org.dom4j.Node node)

Determines the heading level of the node.

Parameters:: node - the node to check
Returns:: the integer level of the heading
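As a rough illustration of the heading checks, the sketch below derives a level from an element name. It is an assumption-laden stand-in, not the bean's code: it takes a plain `String` name instead of a dom4j `Node`, accepts h1 through h6, and returns 0 for anything that is not a heading. None of these choices are confirmed by this page.

```java
final class HeadingSketch {

    /** Returns 1..6 for "h1".."h6" (case-insensitive), or 0 when the name is not a heading (assumed convention). */
    static int headingLevel(String elementName) {
        if (elementName == null || elementName.length() != 2) {
            return 0;
        }
        char first = elementName.charAt(0);
        char digit = elementName.charAt(1);
        if ((first == 'h' || first == 'H') && digit >= '1' && digit <= '6') {
            return digit - '0';
        }
        return 0;
    }

    /** A name is a heading exactly when it has a non-zero level. */
    static boolean isHeading(String elementName) {
        return headingLevel(elementName) > 0;
    }
}
```

Checking the second character explicitly keeps names such as `hr` from being mistaken for headings.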

makeSection

protected void makeSection(org.dom4j.Element output,
                           org.dom4j.Node node)

Creates a section or subsection as necessary based on the node for the output document.

Parameters:: output - the output document to attach the section; node - the node to base making a section on

getSectionText

protected String getSectionText(org.dom4j.Node node)

Returns:: the section text for the given node. If the node contains an embedded element (such as an <a> element) then return its text

needsNewSection

protected boolean needsNewSection(org.dom4j.Node node)

Determines whether a new section is needed, based on whether the node is a heading whose level is equal to or less than the current section's heading level.

Parameters:: node - the node to check
Returns:: true if the current node calls for a new section; otherwise, false
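The rule can be written as a single comparison. The sketch below is hypothetical: the real method receives only a `Node`, so the current section level is presumably held elsewhere, whereas here both levels are passed in as integers, with 0 assumed to mean "not a heading".

```java
final class NewSectionSketch {

    /**
     * True when the node is a heading (level > 0, an assumed convention) and its level
     * is equal to or less than the level of the section currently being built.
     */
    static boolean needsNewSection(int nodeHeadingLevel, int currentSectionLevel) {
        return nodeHeadingLevel > 0 && nodeHeadingLevel <= currentSectionLevel;
    }
}
```

With the current section at level 2, an h1 or h2 would start a new section, while an h3 would stay inside the current one.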

shouldBreakPara

protected boolean shouldBreakPara(org.dom4j.Node node)

Returns:: true if the paragraph should be split, such as for a br or p tag

getBodyContent

protected List getBodyContent(List content)

Returns a copy of the body content, removing any whitespace from the beginning and end.

Parameters:: content - the content node list to obtain body content from
Returns:: List
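A minimal sketch of this kind of trimming is shown below. It uses a list of strings as a stand-in for dom4j nodes and treats an entry as whitespace when it is empty after `trim()`. Both choices, and the class and method names, are assumptions rather than the bean's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

final class BodyContentSketch {

    /** Assumed whitespace test: the text is empty once leading and trailing whitespace is removed. */
    static boolean isWhitespace(String text) {
        return text.trim().isEmpty();
    }

    /** Returns a copy without leading and trailing whitespace entries; interior entries are kept. */
    static List<String> trimmedCopy(List<String> content) {
        int start = 0;
        int end = content.size();
        while (start < end && isWhitespace(content.get(start))) {
            start++;
        }
        while (end > start && isWhitespace(content.get(end - 1))) {
            end--;
        }
        // Copy the remaining range so the caller's list is left untouched.
        return new ArrayList<>(content.subList(start, end));
    }
}
```

Whitespace between two non-whitespace entries is preserved, since only the beginning and end are trimmed.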

isPre

protected boolean isPre(org.dom4j.Node node)

Parameters:: node - the node to check
Returns:: true if the node is a pre tag; otherwise false.

isWhitespace

protected boolean isWhitespace(org.dom4j.Node node)

Parameters:: node - the node to check
Returns:: true if the given node is a whitespace text node

cloneNode

protected org.dom4j.Node cloneNode(org.dom4j.Node node)

Normalizes the whitespace of any Elements

Parameters:: node - the node to clone
Returns:: Node the cloned node