
Validation and recovery

The two hard checks every IDML must pass to open, the full ParseError and OpenError model, and Paged's forgiving recovery rules — missing becomes None, unknown becomes None, and cycles are capped so a malformed document still renders.

Two conditions stop a file from opening: it isn't a readable ZIP, or it isn't IDML. Almost everything else is recovered from rather than rejected.

In short: Paged's parser is strict at exactly two gates and forgiving everywhere else. To open at all, a file must be a readable ZIP archive and must carry the IDML mimetype; a missing mimetype or designmap.xml is also fatal. Past those gates the parser is deliberately tolerant: a missing attribute becomes None and inherits, an unrecognised enum value becomes None, a referenced resource that isn't there falls back to a default, and reference chains that could loop are capped at a fixed depth so a malformed document can never hang. This page is the reference for that model — the error variants and what triggers each, then the recovery rules that govern everything non-fatal.

The hard checks

Opening a container, in Container::open, applies two validations before it will return a document.

It must be a readable ZIP. The bytes are opened as a ZIP archive. If they are not a valid archive — truncated, corrupt, not a ZIP at all — the ZIP layer raises an error and the open stops immediately.

It must be IDML, by mimetype. IDML packages carry a mimetype part whose contents name the format. The parser looks up that part, decodes it as UTF-8, trims it, and compares it against Adobe's constant:

application/vnd.adobe.indesign-idml-package

A mimetype part that is absent, isn't valid UTF-8, or holds any other string stops the open. This is the check that cleanly separates an IDML package from an ordinary ZIP that merely happens to contain XML — the mimetype is the package's declaration of what it is, and the parser takes it at its word.

One more part is mandatory: designmap.xml, the root manifest. A package that passes the mimetype check but has no designmap is missing the table of contents the whole document is walked from, so its absence is fatal too.
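The checks that follow the ZIP gate can be sketched as below. This is a minimal illustration, not Paged's actual code: the `Archive` map stands in for an already-opened ZIP, and `CheckError` and `check_idml` are hypothetical names. Only the mimetype constant, the UTF-8 decode, the trim, and the two mandatory part names come from the description above.

```rust
use std::collections::HashMap;

/// Adobe's IDML mimetype constant, as quoted above.
const IDML_MIMETYPE: &str = "application/vnd.adobe.indesign-idml-package";

/// Hypothetical stand-in for an opened ZIP archive: entry name -> raw bytes.
type Archive = HashMap<String, Vec<u8>>;

/// Hypothetical error type covering only the two validation failures.
#[derive(Debug, PartialEq)]
enum CheckError {
    /// A mandatory part ("mimetype" or "designmap.xml") is absent.
    MissingEntry(&'static str),
    /// The mimetype part is present but is not the IDML constant.
    NotIdml(String),
}

/// Checks the mimetype part first, then the presence of the designmap.
fn check_idml(archive: &Archive) -> Result<(), CheckError> {
    let raw = archive
        .get("mimetype")
        .ok_or(CheckError::MissingEntry("mimetype"))?;
    // Decode as UTF-8, trim, and compare against the constant.
    let text = std::str::from_utf8(raw)
        .map_err(|_| CheckError::NotIdml("mimetype is not valid UTF-8".to_string()))?;
    let trimmed = text.trim();
    if trimmed != IDML_MIMETYPE {
        return Err(CheckError::NotIdml(trimmed.to_string()));
    }
    // The root manifest is the only other mandatory part.
    if !archive.contains_key("designmap.xml") {
        return Err(CheckError::MissingEntry("designmap.xml"));
    }
    Ok(())
}
```

Because the value is trimmed before the comparison, surrounding whitespace such as a trailing newline does not fail the check in this sketch.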

Everything in this section is a fatal condition — it produces an error and no document. Note what is not here: a missing color, an unknown justification value, a style that names a parent that doesn't exist. None of those stop the open. They are governed by the recovery model below.

The error model

Errors surface as two enums, one per layer. The parser layer raises ParseError; the scene layer above it raises OpenError, which wraps ParseError and adds one variant of its own.

ParseError — the parser layer

| ParseError variant | Type / values | Notes |
| --- | --- | --- |
| NotIdml | String (reason) | The mimetype part is present but isn't the IDML constant, or isn't valid UTF-8. The string carries the offending value. |
| MissingEntry | static name | A mandatory archive part is absent: "mimetype" or "designmap.xml". |
| Io | std::io::Error | An underlying I/O failure while reading the bytes (e.g. decompressing a ZIP entry). |
| Zip | zip error | The bytes are not a readable ZIP archive: corrupt, truncated, or not a ZIP. |
| Xml | quick-xml error | A mandatory part that must be parsed up front (the designmap) is not well-formed XML. |

The first two variants are Paged's own validation; the last three are pass-throughs from the underlying ZIP, I/O, and XML machinery, lifted into ParseError so a caller handles one error type rather than three.
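One way to picture the "one error type rather than three" design is an enum with `From` conversions, so that `?` lifts library errors automatically. The sketch below is an assumption about the shape, not the real definition: `ZipError` and `XmlError` are hypothetical stand-ins for the ZIP and XML libraries' error types, and `read_part` is an illustrative helper.

```rust
use std::fmt;
use std::io::{self, Read};

/// Hypothetical stand-ins for the ZIP and XML libraries' error types.
#[derive(Debug)]
struct ZipError(String);
#[derive(Debug)]
struct XmlError(String);

/// The five variants listed in the table above.
#[derive(Debug)]
enum ParseError {
    NotIdml(String),
    MissingEntry(&'static str),
    Io(io::Error),
    Zip(ZipError),
    Xml(XmlError),
}

// Pass-through conversions so `?` lifts library errors into ParseError.
impl From<io::Error> for ParseError {
    fn from(e: io::Error) -> Self { ParseError::Io(e) }
}
impl From<ZipError> for ParseError {
    fn from(e: ZipError) -> Self { ParseError::Zip(e) }
}
impl From<XmlError> for ParseError {
    fn from(e: XmlError) -> Self { ParseError::Xml(e) }
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::NotIdml(reason) => write!(f, "not an IDML package: {}", reason),
            ParseError::MissingEntry(name) => write!(f, "missing mandatory part: {}", name),
            ParseError::Io(e) => write!(f, "I/O error: {}", e),
            ParseError::Zip(e) => write!(f, "ZIP error: {}", e.0),
            ParseError::Xml(e) => write!(f, "XML error: {}", e.0),
        }
    }
}

/// Illustrative helper: an io::Error raised here becomes ParseError::Io via `?`.
fn read_part(mut bytes: &[u8]) -> Result<String, ParseError> {
    let mut text = String::new();
    bytes.read_to_string(&mut text)?;
    Ok(text)
}
```

A caller then matches on a single enum and can still tell Paged's own validation failures (NotIdml, MissingEntry) apart from the pass-through variants.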

OpenError — the scene layer

When paged-scene builds a full Document, it walks the designmap and parses every part the manifest references. That introduces one new failure the container open can't have: the manifest names a part that the archive doesn't actually contain.

| OpenError variant | Type / values | Notes |
| --- | --- | --- |
| MissingEntry | String (manifest path) | The designmap lists a spread or story, but the archive has no entry at that path. The string is the missing path. |
| Parse | ParseError | Any error from the parser layer, wrapped through, including all the ParseError variants above. |

There is a meaningful distinction between the two MissingEntry variants. At the parser layer, ParseError::MissingEntry covers only the two mandatory parts, "mimetype" and "designmap.xml"; for any other entry it can't find, the parser is tolerant and skips. The scene layer deliberately lifts that to a structured error for the parts the manifest promises: if the designmap says a story exists and it doesn't, that is a broken package and the caller should know, so Document::open turns the missing entry into OpenError::MissingEntry rather than silently dropping the story.
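The layering can be sketched as follows. Everything here is a simplified assumption: the archive is modelled as a map of path to text, the manifest as a list of paths, and `open_container` and `open_document` are hypothetical stand-ins for the two layers' entry points. The story path in the usage note is illustrative only.

```rust
use std::collections::HashMap;

/// Parser-layer error, trimmed to the one variant this sketch needs.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingEntry(&'static str),
}

/// Scene-layer error: its own MissingEntry plus a wrapper for ParseError.
#[derive(Debug, PartialEq)]
enum OpenError {
    MissingEntry(String),
    Parse(ParseError),
}

impl From<ParseError> for OpenError {
    fn from(e: ParseError) -> Self { OpenError::Parse(e) }
}

/// Hypothetical archive model: entry path -> part contents.
type Archive = HashMap<String, String>;

/// Parser layer (stand-in): only the mandatory designmap is checked here.
fn open_container(archive: &Archive) -> Result<(), ParseError> {
    if !archive.contains_key("designmap.xml") {
        return Err(ParseError::MissingEntry("designmap.xml"));
    }
    Ok(())
}

/// Scene layer (stand-in): every path the manifest names must exist.
fn open_document(archive: &Archive, manifest: &[&str]) -> Result<Vec<String>, OpenError> {
    open_container(archive)?; // a ParseError is wrapped as OpenError::Parse
    let mut parts = Vec::new();
    for path in manifest {
        let body = archive
            .get(*path)
            .ok_or_else(|| OpenError::MissingEntry((*path).to_string()))?;
        parts.push(body.clone());
    }
    Ok(parts)
}
```

With this shape, a manifest that names `Stories/Story_u1.xml` against an archive without that entry yields `OpenError::MissingEntry` carrying the path, while an archive with no designmap yields `OpenError::Parse(ParseError::MissingEntry("designmap.xml"))`.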

The recovery model

Past the hard checks, the parser's governing principle is that a malformed or unfamiliar document should still render as much as it faithfully can, rather than fail. Recovery falls into three rules.

Missing → None, then inherit

When an attribute the parser reads isn't on the element, the field becomes None. The shared attr helper that every parser uses returns None when an attribute is "absent or non-UTF-8" — there is no error path for a missing attribute, by design.

A None is not "zero" or "empty" — it means unset, and unset is exactly what the style cascade needs. A paragraph that doesn't carry its own PointSize leaves that field None, and at resolve time the value is inherited from the applied style, and from that style's parent, and so on. Missing-becomes-None is what makes inheritance work: the parser records the absence of an override, and the cascade fills it in.
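A minimal sketch of how missing-becomes-None feeds the cascade, assuming attributes arrive as raw bytes keyed by name. The `attr` signature, the `TextAttrs` struct, and `inherit` are hypothetical; only the behaviour (None when absent or non-UTF-8, child value wins, otherwise inherit) follows the text above.

```rust
use std::collections::HashMap;

/// Hypothetical attribute lookup: None when the attribute is absent or not valid UTF-8.
fn attr<'a>(attrs: &'a HashMap<String, Vec<u8>>, name: &str) -> Option<&'a str> {
    attrs.get(name).and_then(|raw| std::str::from_utf8(raw).ok())
}

/// Hypothetical, heavily trimmed attribute set: one optional field.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct TextAttrs {
    point_size: Option<f32>,
}

/// Reads PointSize; anything missing or unparseable is left as None (unset).
fn parse_attrs(attrs: &HashMap<String, Vec<u8>>) -> TextAttrs {
    TextAttrs {
        point_size: attr(attrs, "PointSize").and_then(|s| s.parse().ok()),
    }
}

/// One cascade step: the child's own value wins, otherwise the parent's is used.
fn inherit(child: TextAttrs, parent: TextAttrs) -> TextAttrs {
    TextAttrs {
        point_size: child.point_size.or(parent.point_size),
    }
}
```

`Option::or` is what makes None mean "unset" rather than zero: a child with `point_size: None` picks up the parent's `Some(12.0)`, while a child with `Some(9.5)` keeps its own value.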

Unknown → None

When an attribute is present but its value isn't one the parser recognises, the result is also None. Enum-valued attributes are parsed by a from_idml-style matcher that maps each known string to a variant and ends with a catch-all:

// Justification::from_idml (story.rs)
match s {
    "LeftAlign"      => Some(Self::LeftAlign),
    "CenterAlign"    => Some(Self::CenterAlign),
    "RightAlign"     => Some(Self::RightAlign),
    // ... the rest of the known values ...
    _ => None,
}

An unrecognised value — a typo, a newer InDesign enum the parser doesn't know yet — falls into that _ => None arm and is treated as unset, which then inherits like any other missing value. The same shape governs corner options, color models, and the other enum-valued attributes across the parser.

This rule is also a deliberate safety valve in the numeric parsers. A tint percentage outside the valid 0..=100 range returns None rather than being clamped or passed through, "so a malformed document can't silently distort the renderer's output," and a non-finite float (a NaN or infinity that slipped into a coordinate) is likewise rejected to None. Out-of-range is treated the same as unknown: discard the bad value, fall back to inheritance or default.
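The numeric safety valve might look like the sketch below. The function names `parse_tint` and `parse_finite` are assumptions for illustration; the behaviour (out-of-range and non-finite values become None, with no clamping) is taken from the paragraph above.

```rust
/// Hypothetical tint parser: values outside 0..=100 are discarded, not clamped.
fn parse_tint(s: &str) -> Option<f64> {
    let v: f64 = s.trim().parse().ok()?;
    // NaN fails the range test too, because every comparison with NaN is false.
    (0.0..=100.0).contains(&v).then_some(v)
}

/// Hypothetical coordinate parser: NaN and infinities are rejected to None.
fn parse_finite(s: &str) -> Option<f64> {
    let v: f64 = s.trim().parse().ok()?;
    v.is_finite().then_some(v)
}
```

Rust's `f64` parser accepts strings such as "NaN" and "inf", so the explicit `is_finite` check is what keeps those values out; a plain `parse().ok()` would let them through.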

Parsed, not yet rendered: An unknown attribute value is silently dropped to None. The parser reads the value but does not warn, log, or surface that it didn't recognise it. A typo'd Justification looks identical to an omitted one, so a mis-typed enum quietly inherits instead of erroring.

Missing resource → default

When a parse references a resource that isn't present, the result is a sensible default rather than a failure — within the parser layer. If a package has no Resources/Graphic.xml, the palette becomes an empty Graphic::default(); with no Resources/Styles.xml, the stylesheet becomes an empty StyleSheet::default(). A page item that names a color or style which the (possibly empty) table doesn't define resolves to the default for that property.

This is the rule the scene layer tightens for manifest-named parts: a missing optional resource defaults, but a missing part the designmap explicitly lists is an OpenError. The line is between "the document didn't supply this" (default) and "the document promised this and lied" (error).
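The parser-layer half of that line, "not supplied" defaults, can be sketched with `Option::unwrap_or_default`. The `StyleSheet` fields, `parse_styles`, and `load_styles` are hypothetical simplifications; the stand-in parse treats each line as a style name purely so the example is self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily trimmed stylesheet: just the style names it defines.
#[derive(Debug, Default, PartialEq)]
struct StyleSheet {
    paragraph_styles: Vec<String>,
}

/// Stand-in for the real XML parse: each non-empty line is one style name.
fn parse_styles(body: &str) -> StyleSheet {
    StyleSheet {
        paragraph_styles: body
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect(),
    }
}

/// Optional resource: parse it when present, otherwise use the empty default.
fn load_styles(archive: &HashMap<String, String>) -> StyleSheet {
    archive
        .get("Resources/Styles.xml")
        .map(|body| parse_styles(body))
        .unwrap_or_default()
}
```

There is no error path in `load_styles` at all, which mirrors the rule: an absent optional resource is indistinguishable from an empty one.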

Cycle caps

Two kinds of reference in IDML can, in a malformed file, form a loop — and a naive walk of either would never terminate. The parser bounds both with a fixed iteration cap.

Style BasedOn chains are walked at resolve time, filling each attribute the child leaves unset from its parent, then from that parent's parent, and so on. IDML does not forbid a style from being based on itself (directly or through a cycle), so the resolver caps the walk at MAX_BASED_ON_DEPTH = 16 hops, far beyond the 1–3 hops real documents use, and short-circuits once it hits that depth. The same cap protects the character, paragraph, object, cell, and table cascades.
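A capped BasedOn walk can be sketched as below, resolving a single attribute for brevity. The `Style` struct and `resolve_point_size` are hypothetical; the constant name and value come from the text, and counting the cap in parent hops is this sketch's reading of "16 hops".

```rust
use std::collections::HashMap;

/// Cap on parent hops, as named above.
const MAX_BASED_ON_DEPTH: usize = 16;

/// Hypothetical, heavily trimmed style: a parent link and one attribute.
#[derive(Debug, Clone, Default)]
struct Style {
    based_on: Option<String>,
    point_size: Option<f32>,
}

/// Returns the first PointSize found on the style or its ancestors,
/// following at most MAX_BASED_ON_DEPTH BasedOn links.
fn resolve_point_size(styles: &HashMap<String, Style>, name: &str) -> Option<f32> {
    let mut current = styles.get(name)?;
    let mut hops = 0;
    loop {
        if let Some(v) = current.point_size {
            return Some(v);
        }
        if hops == MAX_BASED_ON_DEPTH {
            return None; // cap reached: stop silently, as described
        }
        // A missing BasedOn, or a parent that is not defined, also ends the walk.
        current = styles.get(current.based_on.as_deref()?)?;
        hops += 1;
    }
}
```

A style based on itself terminates after 16 hops with None instead of looping forever, and a chain that needs a 17th hop is cut off in the same way.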

Frame chains — the NextTextFrame links that thread a story across multiple frames — are walked the same way in paged-scene, capped at MAX_FRAME_CHAIN = 256 frames, "so a malformed document can't hang." The walk also tracks the frames it has already seen and stops if a link points back to one, so a cycle terminates at the loop point rather than running to the full cap.
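The frame-chain walk combines both guards. In this sketch the links are modelled as a map from a frame id to its NextTextFrame id, and `walk_chain` is a hypothetical name; the cap constant and the seen-set behaviour follow the description above.

```rust
use std::collections::{HashMap, HashSet};

/// Cap on frames in one chain, as named above.
const MAX_FRAME_CHAIN: usize = 256;

/// Follows next-frame links from `start`, stopping at the end of the chain,
/// at the first frame already visited, or at the cap, whichever comes first.
fn walk_chain(next: &HashMap<String, String>, start: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = Some(start);
    while let Some(id) = current {
        // `insert` returns false when the id was already seen, i.e. a loop.
        if chain.len() >= MAX_FRAME_CHAIN || !seen.insert(id) {
            break;
        }
        chain.push(id.to_string());
        current = next.get(id).map(String::as_str);
    }
    chain
}
```

For links a → b → c → a the walk returns the three frames once and stops at the loop point; the cap only matters for a very long chain that never repeats.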

Parsed, not yet rendered: Both cycle caps protect against hangs but are silent: hitting MAX_BASED_ON_DEPTH (16) or MAX_FRAME_CHAIN (256) simply stops the walk with no warning. A pathological document with a genuine cycle renders a truncated result rather than reporting that its references loop.

What the parser deliberately skips

Recovery is also about what the parser chooses not to carry into the AST at all. Some IDML constructs are read and then intentionally dropped, because they are metadata or annotations rather than body content that reaches the page. The story parser suppresses a few subtrees outright:

  • <Note> — sticky-note annotations. The marker and its content are dropped; the surrounding text flow is uninterrupted (the suppressed wrapper inserts no character into the run). The parser carries a dedicated test, track5c_note_skipped, pinning this behaviour.
  • <HiddenText> — text that is authored but not flowed. Suppressed for the same reason: it isn't part of the visible copy.
  • <Index> / <IndexEntry> — index markers. The marker is a zero-width metadata point and the entry text is metadata, not body copy, so both are suppressed from the flow.
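Subtree suppression of this kind is commonly implemented with a depth counter over the XML event stream. The sketch below is an assumption about the technique rather than the story parser's actual code: `Event` is a hypothetical simplification of an XML reader's events, and `visible_text` is an illustrative name. The four element names are the ones listed above.

```rust
/// Hypothetical, simplified XML events.
enum Event<'a> {
    Start(&'a str),
    End,
    Text(&'a str),
}

/// Collects text content, dropping everything inside a suppressed subtree.
fn visible_text(events: &[Event<'_>]) -> String {
    let mut out = String::new();
    // Zero means "not suppressed"; otherwise, how deep we are inside a suppressed subtree.
    let mut skip_depth = 0usize;
    for ev in events {
        match ev {
            Event::Start(name) => {
                let suppressed = matches!(*name, "Note" | "HiddenText" | "Index" | "IndexEntry");
                if skip_depth > 0 || suppressed {
                    skip_depth += 1;
                }
            }
            Event::End => {
                skip_depth = skip_depth.saturating_sub(1);
            }
            Event::Text(t) => {
                // The suppressed wrapper contributes no characters to the run.
                if skip_depth == 0 {
                    out.push_str(t);
                }
            }
        }
    }
    out
}
```

Counting depth rather than toggling a flag keeps nested elements inside a Note suppressed until the Note's own closing tag is reached.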

The standalone Tagged XML structure (IDML's XMLElement / XMLAttribute document-structure tree) is likewise outside what the parser maps into the rendered AST — it is structural metadata layered over the content rather than geometry or styling the renderer acts on. Anchored objects nested inside a <Group> are another deliberate omission at the spread layer: the count of frames dropped this way is even surfaced as skipped_nested_frames so a caller can flag a lossy parse without reading logs.

Parsed, not yet rendered: Skipped constructs (Note, HiddenText, Index markers, the tagged-XML structure tree, and group-nested frames) are read past but dropped from the rendered AST. This is correct for annotations and metadata, but it means a document relying on those constructs for visible output will render without them.

For the document-level patterns these rules produce — the malformed, partial, and unusual files and exactly how each one comes out — see Edge cases, which catalogues the behaviour from the reader's side.

Frequently asked questions

What actually makes a file fail to open? Only a few things: the bytes aren't a readable ZIP, the mimetype part is missing or isn't Adobe's IDML constant, the mandatory designmap.xml is absent, or — at the scene layer — the designmap names a spread or story the archive doesn't contain. Everything else (missing attributes, unknown values, absent optional resources, looping references) is recovered from, not rejected.

If I typo a style or attribute value, will the parser tell me? No. An unrecognised enum value is silently treated as unset (None) and inherits like a missing value would. There is no warning or log, so a typo'd Justification is indistinguishable from an omitted one. If a value you set seems to have no effect, suspect a value the parser doesn't recognise.

Why cap reference chains instead of detecting cycles exactly? A fixed cap is simpler and strictly bounds the work, which is the goal: a malformed document can't make the parser hang. The frame-chain walk does also track seen frames and stop early on a real loop; the BasedOn cap of 16 is set far above the 1–3 hops real styles use, so it only ever bites on a pathological chain.

Is a missing Resources/Styles.xml an error? Not at the parser layer — it defaults to an empty stylesheet, and a document with no custom styles opens fine. The scene layer only raises an error for parts the designmap explicitly lists and the archive then fails to provide. "Not supplied" defaults; "promised but absent" errors.
