Why ranges, not spans
Why IDML models styled text as nested paragraph and character ranges rather than the inline spans of HTML, and what that means for walking the tree.
IDML models styled text as nested ranges, not inline spans, because formatting in a paged layout follows paragraph and run boundaries.
In short: HTML models styled text as inline spans wrapping characters, free to
start and end anywhere. IDML instead nests ranges: a ParagraphStyleRange
containing CharacterStyleRange blocks containing Content. Because each paragraph
owns one paragraph style and each run owns one set of character formatting, the range
shape encodes those boundaries directly, which fits a paginated, style-driven layout
better than overlapping spans. This page explains why and what it implies for anyone
walking the tree.
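To make the nesting concrete, here is a minimal sketch that parses a small story fragment and prints its shape. The fragment is hand-written and simplified for illustration, the style names Body and Emphasis are hypothetical, and the outline helper is not part of any parser API. It uses only Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment (assumption: real story files carry
# more attributes and elements than shown here).
STORY_XML = """
<Story>
  <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Plain">
      <Content>Ranges nest, </Content>
    </CharacterStyleRange>
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Emphasis">
      <Content>spans wrap.</Content>
    </CharacterStyleRange>
  </ParagraphStyleRange>
</Story>
"""


def outline(element, depth=0):
    """Return one indented line per element, showing how the ranges nest."""
    label = element.tag
    if element.tag == "Content":
        label += f": {element.text!r}"
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(STORY_XML))
print("\n".join(tree_lines))
```

The printed outline has three levels under Story: one ParagraphStyleRange, two sibling CharacterStyleRange blocks inside it, and one Content leaf in each run.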
🚧 Being written.
Frequently asked questions
What is the difference between a range and a span?
A span (as in HTML) wraps an arbitrary stretch of characters and can start or end
anywhere. A range in IDML is tied to a structural boundary: a ParagraphStyleRange
is exactly one paragraph and a CharacterStyleRange is exactly one run of uniform
character formatting.
Why does IDML nest ranges instead of wrapping spans?
Because formatting in a page layout follows paragraph and run boundaries: each
paragraph has one paragraph style and each run has one set of character formatting.
Nesting CharacterStyleRange blocks inside a ParagraphStyleRange encodes those
boundaries directly, with no overlapping or ambiguous nesting to resolve.
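One way to see why runs never overlap is to start from per-character formatting and group it. The sketch below is only an illustration of that idea, not a description of how InDesign or any parser builds ranges. The to_runs helper and the style labels "plain" and "bold" are hypothetical.

```python
from itertools import groupby


def to_runs(chars):
    """Group (character, style) pairs into consecutive runs of uniform style."""
    return [
        (style, "".join(ch for ch, _ in group))
        for style, group in groupby(chars, key=lambda pair: pair[1])
    ]


# Hypothetical paragraph in which only the word "bold" is emphasised.
text = "a bold word"
styles = ["plain"] * 2 + ["bold"] * 4 + ["plain"] * 5

runs = to_runs(zip(text, styles))
print(runs)
```

Every character lands in exactly one run, so a paragraph is a flat sequence of sibling runs. A formatting change in the middle of a paragraph ends one run and starts the next instead of opening a nested or overlapping wrapper.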
What does the range model mean for walking the tree?
Each level maps cleanly to a unit you iterate over: paragraphs are
ParagraphStyleRange blocks, runs are CharacterStyleRange blocks, and text is in
Content. Counting paragraphs or runs is just counting those blocks — see
story structure.
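The walk described above can be sketched with Python's standard xml.etree.ElementTree. This is a minimal example under stated assumptions: the fragment is hand-written and simplified, the style names are hypothetical, walk_story is an illustrative helper rather than a parser API, and the sketch ignores any other elements a real story file may contain.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment with two paragraphs (assumption).
STORY_XML = """
<Story>
  <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Heading">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Plain">
      <Content>Why ranges</Content>
    </CharacterStyleRange>
  </ParagraphStyleRange>
  <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Plain">
      <Content>Ranges nest, </Content>
    </CharacterStyleRange>
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Emphasis">
      <Content>spans wrap.</Content>
    </CharacterStyleRange>
  </ParagraphStyleRange>
</Story>
"""


def walk_story(story):
    """Return [(paragraph_style, [(character_style, text), ...]), ...]."""
    paragraphs = []
    # Paragraph level: direct ParagraphStyleRange children of the story.
    for para in story.findall("ParagraphStyleRange"):
        runs = []
        # Run level: direct CharacterStyleRange children of the paragraph.
        for run in para.findall("CharacterStyleRange"):
            # Text level: concatenate the Content children of the run.
            text = "".join(c.text or "" for c in run.findall("Content"))
            runs.append((run.get("AppliedCharacterStyle"), text))
        paragraphs.append((para.get("AppliedParagraphStyle"), runs))
    return paragraphs


paragraphs = walk_story(ET.fromstring(STORY_XML))
paragraph_count = len(paragraphs)
run_count = sum(len(runs) for _, runs in paragraphs)
plain_text = "\n".join(
    "".join(text for _, text in runs) for _, runs in paragraphs
)
print(paragraph_count, run_count)
print(plain_text)
```

For this fragment the counts are 2 paragraphs and 3 runs. Using findall on direct children keeps each loop tied to a single level of the tree, so counting paragraphs or runs is a matter of taking the length of the corresponding list.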