Usage¶
inxs is designed to allow Pythonistas and other cultivated folks to write sparse and readable
transformations that take delb objects as input. Most likely they will return the same, but
there’s no restriction on what the data can be mangled into.
It does so by providing a framework that traverses an XML tree, tests tag nodes, pulls and
manipulates data in a series of steps. It supports the combination of reusable and generalized
logical units of expressions and actions. Therefore there’s also a library with functions to deploy
and a module with contributed transformations.
Though inxs should be usable for any problem that XSLT could solve, it is not designed to
give XSLT users a quick grip on it. Anyone who enjoys XSLT should continue to do so.
So far the framework has performed with acceptable speed on text documents from the
humanities.
Let’s break its usage down with the second example from the README:

    1 | transformation = Transformation(
    2 |     generate_skeleton,
    3 |     Rule('person', extract_person),
    4 |     lib.sort('persons', itemgetter(1)),
    5 |     list_persons,
    6 |     result_object='context.html', context={'persons': []})

A transformation is set up by instantiating an inxs.Transformation (line 1) with a series
of transformation steps (lines 2-5) passed as positional arguments and two
configuration values (line 6) provided as keyword arguments.
The first step (line 2) is a function that creates a skeleton for the resulting HTML markup and
stores it in the context namespace:

    def generate_skeleton(context):
        context.html = new_tag_node(
            "html", namespace='http://www.w3.org/1999/xhtml',
            children=(
                tag("head",
                    tag("title", "Testing XML Example")),
                tag("body", (
                    tag("h1", "Persons"),
                    tag("ul")
                )),
            )
        )

When a transformation calls a handler function it does so by applying dependency injection as may
be known from pytest’s fixtures. The passed arguments are resolved from
inxs.Transformation._available_symbols where any object that has previously been added to
the context namespace is available as well as the context itself.
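
The name-based resolution described here can be illustrated with a small sketch that only uses the standard library. It is not how inxs implements it; call_with_injection, remember_person and the symbols dictionary are hypothetical stand-ins for a transformation and its available symbols.

```python
import inspect
from types import SimpleNamespace


def call_with_injection(handler, symbols):
    """Call ``handler`` with arguments looked up by parameter name."""
    kwargs = {}
    for name in inspect.signature(handler).parameters:
        if name not in symbols:
            raise LookupError(f"no symbol named {name!r} is available")
        kwargs[name] = symbols[name]
    return handler(**kwargs)


def remember_person(persons, context):
    # both arguments are resolved purely by their names
    persons.append(("Ada", "Lovelace"))
    return len(context.persons)


# hypothetical symbol table: the context itself plus everything stored in it
context = SimpleNamespace(persons=[])
symbols = {"context": context, **vars(context)}

count = call_with_injection(remember_person, symbols)  # -> 1
```

A handler that asks for a name that was never added to the symbol table fails early with a LookupError in this sketch, which mirrors why argument names matter so much in handler signatures.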
Line 3 defines something that is used more often in real-world uses than here: an inxs.Rule
that tests the transformation root and its descendants for defined properties. In the
example all nodes with a person tag will be passed to the associated handler function:

    def extract_person(node: TagNode, persons):
        persons.append(
            (first(node.css_select("name")).full_text,
             first(node.css_select("family-name")).full_text)
        )

delb’s API is used to fetch child nodes of the matching nodes and to extract their text, which is
then appended as a tuple to a list that was defined in the context argument of the
configuration values (line 6).
Rules can also test anything outside the scope of a node; the utilized functions however aren’t
‘dependency injected’ to avoid overhead. They are called with node and transformation as
arguments and take it from there. See inxs.If() for an example.
The last two steps (lines 4 and 5) eventually sort (inxs.lib.sort() with
operator.itemgetter()) and append the data to the HTML tree that was prepared by the step in
line 2:

    def list_persons(previous_result, html: TagNode):
        first(html.css_select("html|body html|ul")).append_child(
            *(html.new_tag_node("li", children=[f'{x[1]}, {x[0]}'])
              for x in previous_result)
        )

The argument previous_result is resolved to the object that the previous function returned;
again the delb API and Python’s f-strings are used to generate the result.
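
What the sort key and the f-string do to the collected tuples can be tried without inxs or delb. The following sketch uses made-up sample data and plain sorted() as a stand-in for the sorting step; it only shows the effect of itemgetter(1) and of the format used in list_persons.

```python
from operator import itemgetter

# (given name, family name) tuples as extract_person would collect them;
# the sample data is made up for illustration
persons = [
    ("Grace", "Hopper"),
    ("Ada", "Lovelace"),
    ("Alan", "Turing"),
    ("Emmy", "Noether"),
]

# itemgetter(1) picks the second tuple item, the family name, as sort key
by_family_name = sorted(persons, key=itemgetter(1))

# the same f-string as in list_persons: "family name, given name"
labels = [f"{x[1]}, {x[0]}" for x in by_family_name]
# -> ['Hopper, Grace', 'Lovelace, Ada', 'Noether, Emmy', 'Turing, Alan']
```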
As the transformation was configured with context.html as result object, the transformation
returns the object referenced as html (see handler function in line 2) from the context. If the
transformation hasn’t explicitly configured a result object, (per default a copy of) the
transformation root is returned. Any other data is discarded.
The initialized transformation can now be called with a delb.Document or
delb.TagNode instance as transformation root:

    >>> result = transformation(document)  # doctest: +SKIP

A transformation root can be any node within a document, leaving siblings and ancestors
untouched. A transformation works on a copy of the document’s tree unless the configuration
contains a key copy set to False or the transformation is called with such a keyword
argument.
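
The difference between working on a copy and working in place can be sketched with the standard library’s ElementTree instead of delb. The function uppercase_tags and its copy_tree parameter are hypothetical and only mimic the idea of the copy setting; they are not part of inxs.

```python
import copy
import xml.etree.ElementTree as ET


def uppercase_tags(root, copy_tree=True):
    """Upper-case all tag names, on a deep copy unless told otherwise."""
    if copy_tree:
        root = copy.deepcopy(root)
    for element in root.iter():
        element.tag = element.tag.upper()
    return root


source = ET.fromstring("<persons><person/></persons>")

result = uppercase_tags(source)          # works on a copy
untouched = source.tag                   # still 'persons'

in_place = uppercase_tags(source, copy_tree=False)  # mutates the input
```

After the second call source itself carries the upper-cased names, which is the behaviour to expect whenever copying is switched off.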
Transformations can also be used as simple steps - then invoked with the
transformation root - or as rule handlers - then invoked with each matching node.
Per default these do not operate on copies; to do so, inxs.lib.f() can be employed:

    # as a simple step
    f(sub_transformation, 'root', copy=True)
    # as a rule handler
    f(sub_transformation, 'node', copy=True)

Any transformation step, condition or handler can be grouped into sequences to encourage code recycling - but don’t take that as permission to barbarously patch together fragments of existing solutions that you might feel are similar to your problem. If groups are mutable types, it’s taken care that their items are retained as they were when the transformation was initialized.
Now that the authoritarian part is reached, be advised that using expressive and unambiguous names is essential when designing transformations and their components. As a rule of thumb, a simple transformation step should fit into one line, rules into two, maybe up to four. If it gets confusing to read, use variables, grouping (more reusability) or dedicated functions (more performance) - again, mind the names! Reciting the Zen of Python on a daily basis makes you a beautiful person. Yes, even more.
To get a grip on implementing your own condition test functions and handler functions, it’s
advised to study the inxs.lib module.
And now, space for some spots-on-.. sections.
Traversal strategies¶
When a rule is evaluated, the document (sub-)tree is traversed in a specified order. There are three aspects that must be combined to define that order and are available as constants that are to be or’ed bitwise:
- inxs.TRAVERSE_DEPTH_FIRST / inxs.TRAVERSE_WIDTH_FIRST
- inxs.TRAVERSE_LEFT_TO_RIGHT / inxs.TRAVERSE_RIGHT_TO_LEFT
- inxs.TRAVERSE_TOP_TO_BOTTOM / inxs.TRAVERSE_BOTTOM_TO_TOP

Rules can be initialized with such a value as traversal_order argument and then override the
transformation’s one (that one defaults to …_DEPTH_FIRST | …_LEFT_TO_RIGHT | …_TOP_TO_BOTTOM).

Not all strategies are implemented yet.
inxs.TRAVERSE_ROOT_ONLY sets a strategy that only considers the transformation root. It
is also set implicitly for rules that contain a '/' as condition (see
Rule condition shortcuts).
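
To get a feeling for what the traversal aspects mean, here is a sketch that walks a tiny tree with the standard library’s ElementTree. It does not use inxs or its constants; the generator functions are hypothetical and only reproduce the orders that the names of the constants suggest.

```python
from collections import deque
import xml.etree.ElementTree as ET

tree = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")


def depth_first(root, right_to_left=False):
    """Top to bottom, depth first; children optionally in reverse order."""
    yield root
    children = reversed(list(root)) if right_to_left else list(root)
    for child in children:
        yield from depth_first(child, right_to_left)


def width_first(root):
    """Top to bottom, width first, left to right."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node)  # iterating an element yields its children


depth_ltr = [n.tag for n in depth_first(tree)]
# -> ['a', 'b', 'd', 'e', 'c', 'f']
depth_rtl = [n.tag for n in depth_first(tree, right_to_left=True)]
# -> ['a', 'c', 'f', 'b', 'e', 'd']
width_ltr = [n.tag for n in width_first(tree)]
# -> ['a', 'b', 'c', 'd', 'e', 'f']
```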
Rule condition shortcuts¶
Strings can be used to specify certain rule conditions:
- / selects only the transformation root
- * selects all nodes - should only be used if there are no other conditions
- any string that contains :// selects nodes with a namespace that matches the string
- strings that contain only letters select nodes whose local name matches the string
- if a string can be translated to an XPath expression with cssselect and thus can be considered a valid css selector, the result is used like the following; mind that you can use namespace prefixes if you know the prefixes, otherwise this is not an option to match a node from a namespace that’s not the transformation root’s default
- all other strings will select all nodes that an XPath evaluation of that string on the transformation root returns
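
The list above can be read as a decision cascade. The following sketch is a simplified, hypothetical classifier and not the code inxs uses: it assumes that the checks are made in the listed order, and it does not try the CSS selector translation, so the last two cases are not told apart.

```python
def classify_condition(condition: str) -> str:
    """Name the kind of test a shortcut string stands for (simplified)."""
    if condition == "/":
        return "transformation root only"
    if condition == "*":
        return "any node"
    if "://" in condition:
        return "namespace"
    if condition.isalpha():
        return "local name"
    # a CSS selector translation would be tried first, with XPath as the
    # fallback; this sketch does not distinguish the two
    return "css selector or xpath"


kinds = {
    condition: classify_condition(condition)
    for condition in ("/", "*", "http://www.w3.org/1999/xhtml", "person", "ul > li")
}
```

Note that a string like family-name is not made of letters only and therefore ends up in the last branch of this sketch.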
Another shortcut is to pass a dictionary to test a node’s attributes, see
inxs.MatchesAttributes() for details.

Speaking of conditions, see inxs.Any(), inxs.OneOf() and inxs.Not() to overcome
the logical and evaluation of all tests.
Global configuration¶
inxs caches and reuses evaluator and handler functions with identical arguments where possible.
By default these caches are not limited in size, and in big, long-running applications that
create a lot of short-lived transformations they might eventually consume more memory than
they save. To limit the size of each of these least-recently-used caches, the environment
variable HANDLER_CACHES_SIZE can be set. The value should be a power of two.
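
How a size limit from the environment might end up in a least-recently-used cache can be sketched with functools.lru_cache. This is an assumption made for illustration, not a description of inxs’ internals; make_handler is a hypothetical factory, and the variable is set within the script only to keep the example self-contained.

```python
import os
from functools import lru_cache

# normally exported in the process environment before the application starts
os.environ["HANDLER_CACHES_SIZE"] = "64"

# assumption: an unset variable means "unlimited", which lru_cache spells None
raw_size = os.environ.get("HANDLER_CACHES_SIZE")
cache_size = int(raw_size) if raw_size is not None else None


@lru_cache(maxsize=cache_size)
def make_handler(tag_name):
    """Hypothetical factory whose results are reused for equal arguments."""
    def handler(node):
        return node == tag_name
    return handler


first_handler = make_handler("person")
second_handler = make_handler("person")  # served from the cache

info = make_handler.cache_info()  # hits=1, misses=1, maxsize=64, currsize=1
```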
Caveats¶
Modifications during iteration¶
Similar to iteration over mutable types in Python, adding, moving or deleting nodes in the
tree breaks the iteration of a rule over nodes. Thus such modifications must be applied in a
simple transformation step; e.g. to remove all <br> nodes from a document:

    def collect_trash(node, trashbin):
        trashbin.append(node)

    transformation = Transformation(
        Rule('br', collect_trash),
        lib.remove_nodes('trashbin'),
        context={'trashbin': []})

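
The underlying Python behaviour is easy to reproduce with a plain list. The sketch below uses strings as stand-ins for nodes: it first shows the anti-pattern, then the collect-first approach that the trashbin pattern follows.

```python
tags = ["p", "br", "br", "p", "br"]

# anti-pattern: removing while iterating shifts the remaining items,
# so the loop skips the item that follows each removal
broken = list(tags)
for tag in broken:
    if tag == "br":
        broken.remove(tag)
# broken is now ['p', 'p', 'br'], one 'br' survived

# collect first, modify afterwards
cleaned = list(tags)
trashbin = [index for index, tag in enumerate(cleaned) if tag == "br"]
for index in reversed(trashbin):  # delete from the end to keep indexes valid
    del cleaned[index]
# cleaned is now ['p', 'p']
```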
Debugging / Logging¶
There are functions in the inxs.lib module to log information about a transformation’s state
at info level. There’s a logger object in that module too that needs to be set up with a
handler and a log level in order to get the output (see logging). inxs itself produces
very noisy messages at debug level.

inxs.lib.debug_dump_document(), inxs.lib.debug_message() and
inxs.lib.debug_symbols() can be used as handler functions.

inxs.lib.dbg() and inxs.lib.nfo() can be used within test and handler functions.
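
Setting up a logger with a handler and a level looks like this with the standard logging module. The logger created here is a stand-in with a made-up name; with inxs the same two calls (addHandler and setLevel) would be applied to the logger object from inxs.lib instead.

```python
import io
import logging

# stand-in for the logger object that lives in inxs.lib; the name is made up
logger = logging.getLogger("usage_example")

stream = io.StringIO()  # use sys.stderr or a file in a real application
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger.addHandler(handler)
logger.setLevel(logging.INFO)  # DEBUG would also let the noisy messages through

logger.info("collected %d persons", 2)
logger.debug("not shown at INFO level")

output = stream.getvalue()  # 'INFO collected 2 persons\n'
```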
Due to inxs’ rather sparse and dynamic design, the exception tracebacks that are produced aren’t very helpful as they contain no information about the context of an exception. To tackle one of those, it is best to debug it with a minimal non-working example.
Glossary¶
- configuration
- The configuration of a transformation is a types.SimpleNamespace object that is bound as its config property and is populated by passing keyword arguments to its initialization. It is intended to be an immutable container for key-value pairs that persist through a transformation’s executions. Mind that its immutability isn’t completely enforced; manipulating it or its members might result in unexpected behaviour. It can be referred to in handler functions’ signatures as config, the same is true for its members unless overridden in inxs.Transformation._available_symbols. See inxs.Transformation for details on reserved names in the configuration namespace.
- context
- The context of a transformation is a types.SimpleNamespace instance and intended to hold any mutable values during a transformation. It is initialized from the values stored in the configuration’s context value and the overriding keywords provided when calling an inxs.Transformation instance.
- handler function
- Handler functions can be employed as simple transformation steps or as conditionally executed handlers of an inxs.Rule. Any of their signature’s arguments must be available in inxs.Transformation._available_symbols by the time the function gets called.
- transformation root
- This is the node that a transformation instance is called with. Any traverser will return neither its ancestors nor its siblings.
- transformation steps
- Transformation steps are handler functions or inxs.Rule instances that define the actions taken when a transformation is processed. The steps are stored as a linear graph; rudimentary branching can be achieved by using rules that call other transformations.