org.apache.maven.html2xdoc
Class Html2XdocBean

java.lang.Object
  extended by org.apache.maven.html2xdoc.Html2XdocBean

public class Html2XdocBean
extends Object

A simple bean for converting an HTML document into an XDoc-compliant XML document. This could be done via XSLT, but it is a little more complex than it might first appear, so it is done in Java code instead.

Author:
James Strachan

Constructor Summary
Html2XdocBean()
           
 
Method Summary
protected  void addSections(org.dom4j.Element output, org.dom4j.Element body)
          Iterates through the given body looking for h1, h2, h3 nodes and creating the associated section elements.
protected  org.dom4j.Node cloneNode(org.dom4j.Node node)
          Clones the given node, normalizing the whitespace of any elements.
 org.dom4j.Document convert(org.dom4j.Document html)
          Converts the given HTML document into the corresponding XDoc format of XML
protected  int determineHeadingLevel(org.dom4j.Node node)
          Determines the heading level of the node.
protected  List getBodyContent(List content)
          Returns a copy of the body content, removing any whitespace from the beginning and end.
protected  String getSectionText(org.dom4j.Node node)
           
protected  boolean isCharacterData(org.dom4j.Node node)
          Specifies whether the node is character data and should be passed as straight text to the resultant html.
protected  boolean isHeading(org.dom4j.Node node)
          Specifies whether the node is a heading node.
protected  boolean isPre(org.dom4j.Node node)
           
protected  boolean isTextFormatting(org.dom4j.Node node)
          Specifies whether the node is a text modifying construct that should be passed as is to the resultant html.
protected  boolean isWhitespace(org.dom4j.Node node)
           
protected  void makeSection(org.dom4j.Element output, org.dom4j.Node node)
          Creates a section or subsection as necessary based on the node for the output document.
protected  boolean needsNewSection(org.dom4j.Node node)
          Determines whether a new section is needed, based on whether the node is a heading whose level is equal to or less than the current section's heading level.
protected  boolean shouldBreakPara(org.dom4j.Node node)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Html2XdocBean

public Html2XdocBean()
Method Detail

convert

public org.dom4j.Document convert(org.dom4j.Document html)
Converts the given HTML document into the corresponding XDoc format of XML

Parameters:
html - the input html document
Returns:
the converted XDoc document

addSections

protected void addSections(org.dom4j.Element output,
                           org.dom4j.Element body)
Iterates through the given body looking for h1, h2, h3 nodes and creating the associated section elements. Any text nodes contained inside the body are wrapped in a <p> element.

Parameters:
output - the output destination
body - the block of HTML markup to convert

isTextFormatting

protected boolean isTextFormatting(org.dom4j.Node node)
Specifies whether the node is a text modifying construct, such as an anchor '<a>', that should be passed as is to the resultant html.

Parameters:
node - the node to check
Returns:
true if the node is used to modify the formatting of the text; otherwise, false

isCharacterData

protected boolean isCharacterData(org.dom4j.Node node)
Specifies whether the node is character data and should be passed as straight text to the resultant html.

Parameters:
node - the node to check
Returns:
true if the node is a text node; otherwise, false.

isHeading

protected boolean isHeading(org.dom4j.Node node)
Specifies whether the node is a heading node.

Parameters:
node - the node to check
Returns:
true if the given node is a heading element (h1, h2, h3 etc); otherwise, false

determineHeadingLevel

protected int determineHeadingLevel(org.dom4j.Node node)
Determines the heading level of the node.

Parameters:
node - the node to check
Returns:
the integer level of the heading
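
The heading checks described for isHeading and determineHeadingLevel can be illustrated with a minimal sketch. It is not the class's actual code: it works on plain tag names instead of org.dom4j.Node, the class name HeadingLevelSketch is hypothetical, and the -1 sentinel for non-headings and the h1 to h6 range are assumptions, since the documentation does not say what is returned for a non-heading node.

```java
import java.util.Locale;

// Hypothetical stand-in: operates on tag names rather than org.dom4j.Node.
final class HeadingLevelSketch {

    // Value returned for non-heading names; this sentinel is an assumption of the sketch.
    static final int NOT_A_HEADING = -1;

    // True for h1..h6, ignoring case.
    static boolean isHeading(String tagName) {
        return headingLevel(tagName) != NOT_A_HEADING;
    }

    // Maps "h1".."h6" to 1..6; anything else yields NOT_A_HEADING.
    static int headingLevel(String tagName) {
        if (tagName == null || tagName.length() != 2) {
            return NOT_A_HEADING;
        }
        String name = tagName.toLowerCase(Locale.ROOT);
        char digit = name.charAt(1);
        if (name.charAt(0) == 'h' && digit >= '1' && digit <= '6') {
            return digit - '0';
        }
        return NOT_A_HEADING;
    }

    public static void main(String[] args) {
        System.out.println(headingLevel("h2")); // prints 2
        System.out.println(isHeading("hr"));    // prints false
    }
}
```

Checking the length first keeps names such as hr or html from being mistaken for headings.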

makeSection

protected void makeSection(org.dom4j.Element output,
                           org.dom4j.Node node)
Creates a section or subsection as necessary based on the node for the output document.

Parameters:
output - the output document to attach the section
node - the node to base making a section on

getSectionText

protected String getSectionText(org.dom4j.Node node)
Returns:
the section text for the given node. If the node contains an embedded element (such as an <a> element), that element's text is returned.

needsNewSection

protected boolean needsNewSection(org.dom4j.Node node)
Determines whether a new section is needed, based on whether the node is a heading whose level is equal to or less than the current section's heading level.

Parameters:
node - the node to check
Returns:
true if the current node calls for a new section; otherwise, false
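
The rule stated above can be sketched as a simple comparison of heading levels. This is only an illustration under stated assumptions: the real method takes an org.dom4j.Node and tracks the current section internally, whereas the hypothetical NewSectionSketch below takes both levels as plain integers and treats any level below 1 as "not a heading".

```java
// Hypothetical stand-in: compares integer heading levels instead of inspecting a dom4j node.
final class NewSectionSketch {

    // nodeLevel: heading level of the node being visited (below 1 means "not a heading",
    //            an assumption of this sketch).
    // currentSectionLevel: heading level that opened the section currently being filled.
    static boolean needsNewSection(int nodeLevel, int currentSectionLevel) {
        boolean isHeading = nodeLevel >= 1;
        return isHeading && nodeLevel <= currentSectionLevel;
    }

    public static void main(String[] args) {
        // Inside a section opened by an h2:
        System.out.println(needsNewSection(3, 2));  // prints false (h3 nests deeper)
        System.out.println(needsNewSection(2, 2));  // prints true  (sibling h2)
        System.out.println(needsNewSection(1, 2));  // prints true  (higher-level h1)
        System.out.println(needsNewSection(-1, 2)); // prints false (not a heading)
    }
}
```

A deeper heading stays inside the current section, while a heading at the same or a higher level closes it.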

shouldBreakPara

protected boolean shouldBreakPara(org.dom4j.Node node)
Returns:
true if the paragraph should be split, such as for a br or p tag

getBodyContent

protected List getBodyContent(List content)
Returns a copy of the body content, removing any whitespace from the beginning and end.

Parameters:
content - the content node list to obtain body content from
Returns:
a copy of the content list with leading and trailing whitespace removed
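
The trimming behaviour described for getBodyContent can be sketched as follows. This is an assumption-laden illustration, not the actual implementation: strings stand in for dom4j nodes, the names BodyContentSketch and trimmedCopy are hypothetical, and "whitespace" is taken to mean an entry that is empty once trimmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: each String plays the role of one content node.
final class BodyContentSketch {

    // Treats an entry as whitespace if nothing remains after trimming (an assumption).
    static boolean isWhitespace(String text) {
        return text.trim().isEmpty();
    }

    // Returns a new list without leading and trailing whitespace entries.
    // Whitespace between non-whitespace entries is kept, and the input is not modified.
    static List<String> trimmedCopy(List<String> content) {
        int start = 0;
        int end = content.size();
        while (start < end && isWhitespace(content.get(start))) {
            start++;
        }
        while (end > start && isWhitespace(content.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(content.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("  ", "<h1>", "\n", "<p>", "\t");
        System.out.println(trimmedCopy(content).size()); // prints 3
    }
}
```

Copying the sublist into a new ArrayList keeps the result independent of the caller's list.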

isPre

protected boolean isPre(org.dom4j.Node node)
Parameters:
node - the node to check
Returns:
true if the node is a pre tag; otherwise false.

isWhitespace

protected boolean isWhitespace(org.dom4j.Node node)
Parameters:
node - the node to check
Returns:
true if the given node is a whitespace text node

cloneNode

protected org.dom4j.Node cloneNode(org.dom4j.Node node)
Clones the given node, normalizing the whitespace of any elements.

Parameters:
node - the node to clone
Returns:
the cloned node


Copyright © 2001-2006 Apache Software Foundation. All Rights Reserved.