Tagged XML inside IDML
Tagged XML is IDML's second, structured-content view — a user-defined XML tree drawn over the same text and page items as the layout. This chapter documents it as a format reference; the Paged renderer does not read it yet.
In short: An IDML document carries two parallel views of the same content. The layout view places frames on spreads and flows text through stories; the tagged-XML view is a separate tree of XML tags an author drew over those same words and page items so the document also reads as structured data. This chapter documents the tagged-XML layer as a format reference — its parts, element names, and how it points back at placed content. It is a true part of every IDML package, but the Paged renderer does not read it today; the pages here describe the format, not current renderer behaviour.
This chapter documents that second view: the tagged-XML layer. It is a true part of the format, present in every package, and worth knowing if you generate or post-process IDML. But it is the one place in this reference where we have to be plain up front:
Not yet parsed: The Paged renderer reads the layout view, not the tagged-XML tree.
What the layer is
In InDesign you can tag content with your own XML structure: wrap a heading and
its body in a chapter element, mark a price as a price element, and so on.
That structure is independent of how the page looks. The same heading can be an
h1 in your XML and a "Heading 1" paragraph in the layout at the same time —
they are two labels on one piece of content.
When the document is written out as IDML, that structure does not vanish into the stories. It is recorded explicitly:
- The tags themselves, meaning the vocabulary of element names you may apply, live in XML/Tags.xml.
- The structure (which content sits inside which tagged element, and in what order) lives in XML/BackingStory.xml, as a tree of <XMLElement> nodes.
- The mapping between a tag and a paragraph or character style lives in XML/Mapping.xml.
A tagged element does not duplicate the content it covers. It points at it: text that has been placed in the layout is referenced by id, so the XML tree and the layout stories describe the same characters from two directions. Text that has been tagged but not yet placed on a page is held in the backing story itself, waiting.
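As a minimal sketch of that pointing relationship, the Python below walks a heavily simplified stand-in for a backing story and reports, for each tagged element, whether its text is referenced by id or held inline. Only <XMLStory> and <XMLElement> are named in this chapter; the attribute names MarkupTag and XMLContent, the bare <Content> child, the ids, and the helper names are assumptions made for illustration, and a real backing story carries more wrapping than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for XML/BackingStory.xml.
# Attribute names (Self, MarkupTag, XMLContent) and all ids are assumptions.
SAMPLE = """\
<XMLStory Self="backing">
  <XMLElement Self="e1" MarkupTag="XMLTag/chapter">
    <XMLElement Self="e2" MarkupTag="XMLTag/h1" XMLContent="u10d"/>
    <XMLElement Self="e3" MarkupTag="XMLTag/price">
      <Content>9.99</Content>
    </XMLElement>
  </XMLElement>
</XMLStory>
"""


def describe(element, depth=0):
    """Return one line per XMLElement: tag name plus where its text lives."""
    tag = element.get("MarkupTag", "").split("/")[-1]
    ref = element.get("XMLContent")
    if ref is not None:
        # Placed text: the element only points at a layout story by id.
        where = f"placed, see story {ref}"
    else:
        # Unplaced text (if any) sits directly inside the element.
        inline = "".join(c.text or "" for c in element.findall("Content"))
        where = f"unplaced, held inline: {inline!r}" if inline else "container"
    lines = ["  " * depth + f"{tag}: {where}"]
    for child in element.findall("XMLElement"):
        lines.extend(describe(child, depth + 1))
    return lines


def describe_backing_story(xml_text):
    """Describe every top-level XMLElement under the XMLStory root."""
    root = ET.fromstring(xml_text)
    out = []
    for top in root.findall("XMLElement"):
        out.extend(describe(top))
    return out


if __name__ == "__main__":
    print("\n".join(describe_backing_story(SAMPLE)))
```

In this sample the h1 element carries no text of its own and only names a story id, while the price element holds its characters inline because, in this made-up document, it has not been placed on a page.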
Why a reader cares
You will meet this layer for one of two reasons.
If you author or generate IDML — say, to feed a publishing pipeline — the tagged-XML layer is how downstream tools find content by meaning rather than by position. Knowing the element names lets you read or write that structure deliberately.
If you render or inspect IDML with Paged, you mostly care about the opposite: knowing that this layer exists, that it is safe to ignore for layout, and that its parts in the package are not a sign of content you are missing. The text you render comes from the stories, not from the backing story.
The honest part
Paged does not parse the tagged-XML tree. The parser opens the package, reads the
design map, and follows it to the spreads, stories, and resources it needs to lay
out pages. XML/BackingStory.xml, XML/Tags.xml, and XML/Mapping.xml are
present in the archive — the parser even unzips them into memory — but nothing
walks them, and the design map's reference to the backing story is dropped on the
floor. A document renders identically whether or not it carries any XML tags.
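The behaviour described above can be sketched in Python as a layout-only reader that follows the design map to spreads and stories and sets aside anything under XML/. This is not Paged's code. The designmap.xml file name, the idPkg element names, the namespace URI, the part paths, and the function names are assumptions for illustration, and resources are left out to keep the sketch short.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed for idPkg-prefixed elements in the design map.
IDPKG = "http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging"
FOLLOWED_TAGS = (f"{{{IDPKG}}}Spread", f"{{{IDPKG}}}Story")


def layout_parts(package):
    """Return (followed, skipped) part paths for a layout-only reader."""
    with zipfile.ZipFile(package) as zf:
        designmap = ET.fromstring(zf.read("designmap.xml"))
    followed, skipped = [], []
    for child in designmap:
        src = child.get("src")
        if src is None:
            continue
        if child.tag in FOLLOWED_TAGS:
            followed.append(src)   # needed to lay out pages
        elif src.startswith("XML/"):
            skipped.append(src)    # tagged-XML layer: present, never walked
    return followed, skipped


def build_sample_package():
    """Build a tiny in-memory stand-in for a package (not a valid document)."""
    designmap = f"""<Document xmlns:idPkg="{IDPKG}">
  <idPkg:Tags src="XML/Tags.xml"/>
  <idPkg:Mapping src="XML/Mapping.xml"/>
  <idPkg:Spread src="Spreads/Spread_ub6.xml"/>
  <idPkg:BackingStory src="XML/BackingStory.xml"/>
  <idPkg:Story src="Stories/Story_u10d.xml"/>
</Document>"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("designmap.xml", designmap)
        for name in ("XML/Tags.xml", "XML/Mapping.xml", "XML/BackingStory.xml",
                     "Spreads/Spread_ub6.xml", "Stories/Story_u10d.xml"):
            zf.writestr(name, "<placeholder/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    followed, skipped = layout_parts(build_sample_package())
    print("followed:", followed)
    print("skipped:", skipped)
```

The skipped list is computed only so it can be displayed. A reader that never looks at those entries at all would produce the same pages.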
That is a deliberate scoping decision, not a bug, and this chapter treats it as a first-class fact rather than a footnote. The two pages that follow split the work cleanly:
- The structured-content layer documents the element vocabulary (<XMLElement>, <XMLAttribute>, <XMLComment>, <XMLInstruction>, the <XMLStory> in BackingStory.xml, the <XMLTag> entries in Tags.xml, and the maps in Mapping.xml) as format facts.
- Why it's not parsed yet explains the parser's actual behaviour, points at exactly where in the engine the layer is skipped, and sketches what supporting it would take.
Throughout, every construct carries a Not yet parsed badge. These pages describe the IDML format; they do not describe anything the Paged renderer acts on today.
Frequently asked questions
What is the tagged-XML layer in IDML?
It is IDML's structured-content view: a tree of user-defined XML tags that an
author draws over a document's text and page items, recorded in XML/Tags.xml,
XML/BackingStory.xml, and XML/Mapping.xml. It runs parallel to the layout
view, describing the same content by meaning rather than by position.
Does the Paged renderer read the tagged-XML layer?
No. Paged reads the layout view (spreads, stories, and styles) and produces identical pages whether or not a document carries any XML tags. This chapter documents the layer as a format reference, not as current renderer behaviour.
If Paged ignores it, can I delete the XML/ parts from a package?
This reference does not recommend removing those parts. The point for a Paged user
is the opposite reassurance: those parts being present is not a sign of content
you are missing, since the text you render comes from the stories, not from the
backing story.
Where should I look for the element names and the parser details?
The structured-content layer documents the element vocabulary as format facts, and Why it's not parsed yet points at the exact places in the engine where the layer is skipped.