treewalkers Package¶

A collection of modules for iterating through different kinds of tree, generating tokens identical to those produced by the tokenizer module.

To support a new type of tree, implement a tree walker object (called TreeWalker by convention) with a ‘serialize’ method that takes a tree as its sole argument and returns an iterator that generates tokens.
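As a rough illustration of the contract described above, the following sketch walks a made-up tree format and yields token dicts. It does not import html5lib. The nested-tuple tree shape and the `namespace`, `name` and `data` field names are assumptions chosen for the example, and the exact entry point expected by an installed html5lib version should be checked against its base class.

```python
class TreeWalker:
    """Stand-in walker for a hypothetical tree of (name, attrs, children) tuples.

    Plain strings in a children list are treated as text nodes.
    """

    def serialize(self, tree):
        # Text node: emit one Characters token (no whitespace splitting here).
        if isinstance(tree, str):
            yield {"type": "Characters", "data": tree}
            return
        name, attrs, children = tree
        # Element: start tag, then children in document order, then end tag.
        yield {"type": "StartTag", "namespace": None, "name": name, "data": attrs}
        for child in children:
            yield from self.serialize(child)
        yield {"type": "EndTag", "namespace": None, "name": name}


tree = ("p", {"class": "note"}, ["Hello ", ("b", {}, ["world"])])
tokens = list(TreeWalker().serialize(tree))
for token in tokens:
    print(token["type"], token.get("name", token.get("data")))
```

The walker is a generator, so tokens are produced lazily and a consumer can stop early without visiting the whole tree.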

html5lib.treewalkers.getTreeWalker(treeType, implementation=None, **kwargs)[source]¶

Gets a TreeWalker class for one of the tree types with built-in support.

Parameters:

treeType (str) –
the name of the tree type required (case-insensitive). Supported values are:
- “dom”: the xml.dom.minidom DOM implementation
- “etree”: a generic walker for tree implementations exposing an ElementTree-like interface (known to work with ElementTree, cElementTree and lxml.etree)
- “lxml”: an optimized walker for lxml.etree
- “genshi”: a Genshi stream
implementation – a module implementing the tree type, e.g. xml.etree.ElementTree or cElementTree (currently applies to the “etree” tree type only)
kwargs – keyword arguments passed to the etree walker; for other walkers, this has no effect

Returns:

a TreeWalker class

html5lib.treewalkers.pprint(walker)[source]¶

Pretty printer for tree walkers

Takes a TreeWalker instance and pretty prints the output of walking the tree.

Parameters:	walker – a TreeWalker instance

`base` Module¶

class html5lib.treewalkers.base.TreeWalker(tree)[source]¶

Bases: object

Walks a tree yielding tokens

Tokens are dicts that all have a type field specifying the type of the token.
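Because every token carries a type field, a consumer can dispatch on it. The sketch below uses a hand-written token list in place of real walker output; the `name` and `data` fields are assumptions modeled on the method parameters documented in this module.

```python
from collections import Counter

# Hand-written tokens standing in for what a walker might yield.
tokens = [
    {"type": "StartTag", "name": "p", "data": {}},
    {"type": "Characters", "data": "abc"},
    {"type": "SpaceCharacters", "data": " "},
    {"type": "EndTag", "name": "p"},
]


def visible_text(tokens):
    """Concatenate the data of text-bearing tokens, skipping everything else."""
    text_types = ("Characters", "SpaceCharacters")
    return "".join(t["data"] for t in tokens if t["type"] in text_types)


counts = Counter(t["type"] for t in tokens)
text_out = visible_text(tokens)
```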

__init__(tree)[source]¶

Creates a TreeWalker

Parameters:	tree – the tree to walk

comment(data)[source]¶

Generates a Comment token

Parameters:	data – the comment
Returns:	Comment token

doctype(name, publicId=None, systemId=None)[source]¶

Generates a Doctype token

Parameters:

name – the name of the doctype
publicId – the public identifier, if any
systemId – the system identifier, if any

Returns:	the Doctype token

emptyTag(namespace, name, attrs, hasChildren=False)[source]¶

Generates an EmptyTag token

Parameters:

namespace – the namespace of the token; can be `None`
name – the name of the element
attrs – the attributes of the element as a dict
hasChildren – whether the element has children; if true, a SerializeError token is also yielded because this tag shouldn’t have children

Returns:	EmptyTag token
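The effect of hasChildren can be pictured with a standalone sketch. This is not the library's code: the function name, the token field names and the error message text are assumptions for illustration, while the EmptyTag and SerializeError type names come from this page.

```python
def empty_tag(namespace, name, attrs, has_children=False):
    """Yield an EmptyTag token, plus an error token if children were found."""
    yield {"type": "EmptyTag", "namespace": namespace, "name": name, "data": attrs}
    if has_children:
        # Hypothetical message text; only the token type is taken from the docs.
        yield {"type": "SerializeError", "data": "Void element has children"}


ok = list(empty_tag(None, "br", {}))
bad = list(empty_tag(None, "br", {}, has_children=True))
```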

endTag(namespace, name)[source]¶

Generates an EndTag token

Parameters:

namespace – the namespace of the token; can be `None`
name – the name of the element

Returns:	EndTag token

entity(name)[source]¶

Generates an Entity token

Parameters:	name – the entity name
Returns:	an Entity token

error(msg)[source]¶

Generates an error token with the given message

Parameters:	msg – the error message
Returns:	SerializeError token

startTag(namespace, name, attrs)[source]¶

Generates a StartTag token

Parameters:

namespace – the namespace of the token; can be `None`
name – the name of the element
attrs – the attributes of the element as a dict

Returns:	StartTag token

text(data)[source]¶

Generates SpaceCharacters and Characters tokens

Depending on what’s in the data, this generates one or more SpaceCharacters and Characters tokens.

For example:

>>> from html5lib.treewalkers.base import TreeWalker
>>> # Give it an empty tree just so it instantiates
>>> walker = TreeWalker([])
>>> list(walker.text(''))
[]
>>> list(walker.text('  '))
[{u'data': '  ', u'type': u'SpaceCharacters'}]
>>> list(walker.text(' abc '))  # doctest: +NORMALIZE_WHITESPACE
[{u'data': ' ', u'type': u'SpaceCharacters'},
{u'data': u'abc', u'type': u'Characters'},
{u'data': u' ', u'type': u'SpaceCharacters'}]

Parameters:	data – the text data
Returns:	one or more `SpaceCharacters` and `Characters` tokens
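One way to get the behavior shown in the example above is to peel leading and trailing whitespace off the data and emit what remains as a single Characters token. The sketch below is a standalone approximation, not the library's implementation. The function name and the exact set of whitespace characters are assumptions.

```python
# Assumed whitespace set: space, tab, line feed, carriage return, form feed.
SPACE_CHARACTERS = " \t\n\r\f"


def split_text(data):
    """Yield SpaceCharacters / Characters tokens for a text string."""
    middle = data.lstrip(SPACE_CHARACTERS)
    left = data[:len(data) - len(middle)]
    if left:
        yield {"type": "SpaceCharacters", "data": left}
    core = middle.rstrip(SPACE_CHARACTERS)
    right = middle[len(core):]
    if core:
        yield {"type": "Characters", "data": core}
    if right:
        yield {"type": "SpaceCharacters", "data": right}


for sample in ("", "  ", " abc ", "a b"):
    print(repr(sample), list(split_text(sample)))
```

In this sketch, whitespace inside the text (as in `'a b'`) stays within the single Characters token; only leading and trailing runs are split out.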

unknown(nodeType)[source]¶

Handles unknown node types

class html5lib.treewalkers.base.NonRecursiveTreeWalker(tree)[source]¶

Bases: html5lib.treewalkers.base.TreeWalker

`dom` Module¶

class html5lib.treewalkers.dom.TreeWalker(tree)[source]¶

Bases: html5lib.treewalkers.base.NonRecursiveTreeWalker

`etree` Module¶

`etree_lxml` Module¶

class html5lib.treewalkers.etree_lxml.TreeWalker(tree)[source]¶

Bases: html5lib.treewalkers.base.NonRecursiveTreeWalker

__init__(tree)[source]¶

Creates a TreeWalker

Parameters:	tree – the tree to walk

`genshi` Module¶

class html5lib.treewalkers.genshi.TreeWalker(tree)[source]¶

Bases: html5lib.treewalkers.base.TreeWalker
