Validation and recovery
The two hard checks every IDML must pass to open, the full ParseError and OpenError model, and Paged's forgiving recovery rules — missing becomes None, unknown becomes None, and cycles are capped so a malformed document still renders.
Two things make a file fatal — it isn't a readable ZIP, or it isn't IDML — and almost everything else is recovered from rather than rejected.
In short: Paged's parser is strict at exactly two gates and forgiving
everywhere else. To open at all, a file must be a readable ZIP archive and must
carry the IDML mimetype; a missing mimetype or designmap.xml is also fatal. Past
those gates the parser is deliberately tolerant: a missing attribute becomes None
and inherits, an unrecognised enum value becomes None, a referenced resource that
isn't there falls back to a default, and reference chains that could loop are capped
at a fixed depth so a malformed document can never hang. This page is the reference
for that model — the error variants and what triggers each, then the recovery rules
that govern everything non-fatal.
The hard checks
Opening a container, in Container::open, applies two validations before it will
return a document.
It must be a readable ZIP. The bytes are opened as a ZIP archive. If they are not a valid archive — truncated, corrupt, not a ZIP at all — the ZIP layer raises an error and the open stops immediately.
It must be IDML, by mimetype. IDML packages carry a mimetype part whose contents
name the format. The parser looks up that part, decodes it as UTF-8, trims it, and
compares it against Adobe's constant:
application/vnd.adobe.indesign-idml-package
A mimetype part that is absent, isn't valid UTF-8, or holds any other string stops
the open. This is the check that cleanly separates an IDML package from an ordinary
ZIP that merely happens to contain XML — the mimetype is the package's declaration of
what it is, and the parser takes it at its word.
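The check described above can be sketched as follows. This is a minimal illustration, assuming the raw bytes of the mimetype part have already been read from the archive; the function name `check_mimetype` and the two-variant `ParseError` are hypothetical stand-ins, not Paged's actual signatures.

```rust
/// Adobe's IDML mimetype constant, as quoted in the text.
const IDML_MIMETYPE: &str = "application/vnd.adobe.indesign-idml-package";

/// Illustrative subset of the parser-layer error type.
#[derive(Debug, PartialEq)]
enum ParseError {
    NotIdml(String),
    MissingEntry(&'static str),
}

/// Hypothetical helper: `part` is the raw `mimetype` entry, or `None` if the
/// archive has no such entry.
fn check_mimetype(part: Option<&[u8]>) -> Result<(), ParseError> {
    // An absent part is reported as a missing mandatory entry.
    let bytes = part.ok_or(ParseError::MissingEntry("mimetype"))?;
    // A part that is not valid UTF-8 cannot name the format.
    let text = std::str::from_utf8(bytes)
        .map_err(|_| ParseError::NotIdml("mimetype is not valid UTF-8".to_string()))?;
    // Trim, then compare against the constant.
    let trimmed = text.trim();
    if trimmed == IDML_MIMETYPE {
        Ok(())
    } else {
        // Carry the offending value back to the caller.
        Err(ParseError::NotIdml(trimmed.to_string()))
    }
}
```

Trimming before the comparison means a trailing newline in the part does not cause a rejection, while any other string does.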
One more part is mandatory: designmap.xml, the root manifest. A package that passes
the mimetype check but has no designmap is missing the table of contents the whole
document is walked from, so its absence is fatal too.
Everything in this section is a fatal condition — it produces an error and no document. Note what is not here: a missing color, an unknown justification value, a style that names a parent that doesn't exist. None of those stop the open. They are governed by the recovery model below.
The error model
Errors surface as two enums, one per layer. The parser layer raises ParseError; the
scene layer above it raises OpenError, which wraps ParseError and adds one variant
of its own.
ParseError — the parser layer
| Attribute · ParseError | Type / values | Support | Notes |
|---|---|---|---|
| NotIdml | String (reason) | — | The mimetype part is present but isn’t the IDML constant, or isn’t valid UTF-8. The string carries the offending value. |
| MissingEntry | static name | — | A mandatory archive part is absent — "mimetype" or "designmap.xml". |
| Io | std::io::Error | — | An underlying I/O failure while reading the bytes (e.g. decompressing a ZIP entry). |
| Zip | zip error | — | The bytes are not a readable ZIP archive — corrupt, truncated, or not a ZIP. |
| Xml | quick-xml error | — | A mandatory part that must be parsed up front (the designmap) is not well-formed XML. |
The first two variants are Paged's own validation; the last three are pass-throughs
from the underlying ZIP, I/O, and XML machinery, lifted into ParseError so a caller
handles one error type rather than three.
OpenError — the scene layer
When paged-scene builds a full Document, it walks the designmap and parses every
part the manifest references. That introduces one new failure the container open
can't have: the manifest names a part that the archive doesn't actually contain.
| Attribute · OpenError | Type / values | Support | Notes |
|---|---|---|---|
| MissingEntry | String (manifest path) | — | The designmap lists a spread or story, but the archive has no entry at that path. The string is the missing path. |
| Parse | ParseError | — | Any error from the parser layer, wrapped through — including all the ParseError variants above. |
The two MissingEntry variants differ in scope. ParseError::MissingEntry covers only
the two mandatory parts, "mimetype" and "designmap.xml"; for any other entry it can't
find, the parser layer is tolerant and skips. The scene layer deliberately lifts that
to a structured error for the parts the manifest promises: if the designmap says a
story exists and it doesn't, that is a broken package and the caller should know, so
Document::open turns the missing entry into OpenError::MissingEntry rather than
silently dropping the story.
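The shape of the two enums, and how a caller might triage them, can be sketched like this. The variant names follow the tables above; the `String` payloads on `Zip` and `Xml` are placeholders for the underlying library error types (used here only to keep the sketch dependency-free), and `describe` is a hypothetical caller-side helper.

```rust
/// Parser-layer errors. `Zip` and `Xml` hold a `String` as a stand-in for the
/// ZIP and XML library error types.
#[derive(Debug)]
enum ParseError {
    NotIdml(String),
    MissingEntry(&'static str),
    Io(std::io::Error),
    Zip(String),
    Xml(String),
}

/// Scene-layer errors: one variant of its own, plus the wrapped parser error.
#[derive(Debug)]
enum OpenError {
    MissingEntry(String),
    Parse(ParseError),
}

/// Lets `?` lift a parser error into the scene-layer type.
impl From<ParseError> for OpenError {
    fn from(e: ParseError) -> Self {
        OpenError::Parse(e)
    }
}

/// Hypothetical triage: turn an error into a short human-readable message.
fn describe(err: &OpenError) -> String {
    match err {
        OpenError::MissingEntry(path) => format!("manifest names missing part: {path}"),
        OpenError::Parse(ParseError::NotIdml(why)) => format!("not an IDML package: {why}"),
        OpenError::Parse(ParseError::MissingEntry(name)) => {
            format!("mandatory part absent: {name}")
        }
        // Io, Zip and Xml are pass-throughs from the underlying machinery.
        OpenError::Parse(other) => format!("unreadable container: {other:?}"),
    }
}
```

Because `OpenError` wraps `ParseError`, a caller that only opens documents through the scene layer needs to match on one type.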
The recovery model
Past the hard checks, the parser's governing principle is that a malformed or unfamiliar document should still render as much as it faithfully can, rather than fail. Recovery falls into three rules.
Missing → None, then inherit
When an attribute the parser reads isn't on the element, the field becomes None. The
shared attr helper that every parser uses returns None when an attribute is "absent
or non-UTF-8" — there is no error path for a missing attribute, by design.
A None is not "zero" or "empty" — it means unset, and unset is exactly what the
style cascade needs. A paragraph that doesn't carry its own
PointSize leaves that field None, and at resolve time the value is inherited from
the applied style, and from that style's parent, and so on. Missing-becomes-None is
what makes inheritance work: the parser records the absence of an override, and the
cascade fills it in.
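As a rough sketch of how an unset value flows into the cascade: the `attr` helper below imitates the behaviour described above (absent or non-UTF-8 gives `None`), but its signature, the `HashMap` attribute store, and the `fallback` parameter are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical attribute helper: `None` when the attribute is absent or is
/// not valid UTF-8. There is no error path.
fn attr<'a>(attrs: &'a HashMap<String, Vec<u8>>, name: &str) -> Option<&'a str> {
    attrs.get(name).and_then(|raw| std::str::from_utf8(raw).ok())
}

/// Reads `PointSize` as an override. `None` means "unset", not zero.
fn point_size_override(attrs: &HashMap<String, Vec<u8>>) -> Option<f64> {
    attr(attrs, "PointSize")?.parse().ok()
}

/// Cascade: the nearest `Some` wins; `fallback` applies only when every
/// level left the value unset.
fn cascade_point_size(
    local: Option<f64>,
    style: Option<f64>,
    parent_style: Option<f64>,
    fallback: f64,
) -> f64 {
    local.or(style).or(parent_style).unwrap_or(fallback)
}
```

Representing "unset" as `None` rather than `0.0` is what lets `Option::or` express inheritance directly: a zero would be a real override, while `None` defers to the next level.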
Unknown → None
When an attribute is present but its value isn't one the parser recognises, the
result is also None. Enum-valued attributes are parsed by a from_idml-style
matcher that maps each known string to a variant and ends with a catch-all:
// Justification::from_idml (story.rs)
match s {
"LeftAlign" => Some(Self::LeftAlign),
"CenterAlign" => Some(Self::CenterAlign),
"RightAlign" => Some(Self::RightAlign),
// ... the rest of the known values ...
_ => None,
}
An unrecognised value (a typo, or a newer InDesign enum the parser doesn't know yet)
falls into that _ => None arm and is treated as unset, which then inherits like any
other missing value. The same shape governs corner options, color models, and the
other enum-valued attributes across the parser.
This rule is also a deliberate safety valve in the numeric parsers. A tint percentage
outside the valid 0..=100 range returns None rather than being clamped or passed
through, "so a malformed document can't silently distort the renderer's output," and a
non-finite float (a NaN or infinity that slipped into a coordinate) is likewise
rejected to None. Out-of-range is treated the same as unknown: discard the bad value,
fall back to inheritance or default.
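A numeric parser following this rule might look like the sketch below. The function name `parse_tint` is hypothetical; the range and the treatment of non-finite values are taken from the description above.

```rust
/// Hypothetical tint parser: the value is kept only if it parses, is finite,
/// and lies within 0..=100. Anything else is discarded as `None`.
fn parse_tint(raw: &str) -> Option<f64> {
    // Unparseable text is treated like an unknown value.
    let value: f64 = raw.parse().ok()?;
    // Rust parses "NaN" and "inf" successfully, so finiteness is checked explicitly.
    if value.is_finite() && (0.0..=100.0).contains(&value) {
        Some(value)
    } else {
        // Out of range: discard rather than clamp.
        None
    }
}
```

Returning `None` instead of clamping keeps a bad value from masquerading as a deliberate 0 or 100.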
Missing resource → default
When a parse references a resource that isn't present, the result is a sensible
default rather than a failure — within the parser layer. If a package has no
Resources/Graphic.xml, the palette becomes an empty Graphic::default(); with no
Resources/Styles.xml, the stylesheet becomes an empty StyleSheet::default(). A
page item that names a color or style which the (possibly empty) table doesn't define
resolves to the default for that property.
This is the rule the scene layer tightens for manifest-named parts: a missing
optional resource defaults, but a missing part the designmap explicitly lists is an
OpenError. The line is between "the document didn't supply this" (default) and "the
document promised this and lied" (error).
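The contrast between the two layers can be sketched as below. The in-memory `Archive` map, the `byte_len` field, and both function names are illustrative assumptions; only the part path `Resources/Styles.xml` and the `OpenError::MissingEntry` variant come from the text.

```rust
use std::collections::HashMap;

/// Stand-in for an opened package: entry path mapped to its bytes.
type Archive = HashMap<String, Vec<u8>>;

/// Placeholder stylesheet; a real one would hold parsed styles.
#[derive(Debug, Default, PartialEq)]
struct StyleSheet {
    byte_len: usize,
}

#[derive(Debug, PartialEq)]
enum OpenError {
    MissingEntry(String),
}

/// Optional resource: absence yields the empty default, never an error.
fn load_stylesheet(archive: &Archive) -> StyleSheet {
    archive
        .get("Resources/Styles.xml")
        .map(|bytes| StyleSheet { byte_len: bytes.len() })
        .unwrap_or_default()
}

/// Manifest-named part: absence is an error carrying the missing path.
fn load_manifest_part<'a>(archive: &'a Archive, path: &str) -> Result<&'a [u8], OpenError> {
    archive
        .get(path)
        .map(Vec::as_slice)
        .ok_or_else(|| OpenError::MissingEntry(path.to_string()))
}
```

The same lookup miss produces two different outcomes; what differs is whether the designmap promised the part.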
Cycle caps
Two kinds of reference in IDML can, in a malformed file, form a loop — and a naive walk of either would never terminate. The parser bounds both with a fixed iteration cap.
Style BasedOn chains are walked at resolve time, folding each parent's unset
attributes into the child. IDML does not forbid a style from being based on itself
(directly or through a cycle), so the resolver caps the walk at
MAX_BASED_ON_DEPTH = 16 hops — far beyond the 1–3 hops real documents use — and
short-circuits once it hits that depth. The same cap protects the character,
paragraph, object, cell, and table cascades.
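A depth-capped BasedOn walk can be sketched for a single property as follows. The `Style` struct, the map of styles keyed by name, and `resolve_based_on` are hypothetical; only the constant and its value come from the text.

```rust
use std::collections::HashMap;

const MAX_BASED_ON_DEPTH: usize = 16;

/// Illustrative style record: an optional parent and one optional property.
#[derive(Debug)]
struct Style {
    based_on: Option<String>,
    point_size: Option<f64>,
}

/// Walks the BasedOn chain from `start`, returning the first point size that
/// is set. Follows at most MAX_BASED_ON_DEPTH parent links, so a style based
/// on itself (directly or through a cycle) cannot loop forever.
fn resolve_based_on(styles: &HashMap<String, Style>, start: &str) -> Option<f64> {
    let mut current = styles.get(start)?;
    let mut hops = 0;
    loop {
        if let Some(size) = current.point_size {
            return Some(size);
        }
        if hops == MAX_BASED_ON_DEPTH {
            // Depth cap reached: treat the value as unset.
            return None;
        }
        // A missing or dangling parent also ends the walk.
        current = styles.get(current.based_on.as_deref()?)?;
        hops += 1;
    }
}
```

No visited set is needed here: the counter alone bounds the work, at the cost of walking a cycle until the cap is hit.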
Frame chains — the NextTextFrame links that thread a story across multiple
frames — are walked the same way in paged-scene, capped at MAX_FRAME_CHAIN = 256
frames, "so a malformed document can't hang." The walk also tracks the frames it has
already seen and stops if a link points back to one, so a cycle terminates at the
loop point rather than running to the full cap.
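The frame-chain walk combines both guards, and can be sketched like this. Representing the NextTextFrame links as a map from frame id to next frame id is an assumption for illustration, as is the function name `walk_frame_chain`.

```rust
use std::collections::{HashMap, HashSet};

const MAX_FRAME_CHAIN: usize = 256;

/// Follows NextTextFrame links starting at `first`. `next` maps a frame id to
/// the id of the frame it threads into (hypothetical representation).
fn walk_frame_chain(first: &str, next: &HashMap<String, String>) -> Vec<String> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = Some(first);
    while let Some(id) = current {
        // Stop at the cap, or as soon as a link points back to a visited frame.
        if chain.len() >= MAX_FRAME_CHAIN || !seen.insert(id) {
            break;
        }
        chain.push(id.to_string());
        current = next.get(id).map(String::as_str);
    }
    chain
}
```

With links a to b, b to c, and c back to b, the walk returns a, b, c and stops at the loop point instead of running on to 256 entries.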
What the parser deliberately skips
Recovery is also about what the parser chooses not to carry into the AST at all. Some IDML constructs are read and then intentionally dropped, because they are metadata or annotations rather than body content that reaches the page. The story parser suppresses a few subtrees outright:
<Note>: sticky-note annotations. The marker and its content are dropped; the surrounding text flow is uninterrupted (the suppressed wrapper inserts no character into the run). The parser carries a dedicated test, track5c_note_skipped, pinning this behaviour.
<HiddenText>: text that is authored but not flowed. Suppressed for the same reason: it isn't part of the visible copy.
<Index> / <IndexEntry>: index markers. The marker is a zero-width metadata point and the entry text is metadata, not body copy, so both are suppressed from the flow.
The standalone Tagged XML structure (IDML's XMLElement /
XMLAttribute document-structure tree) is likewise outside what the parser maps into
the rendered AST — it is structural metadata layered over the content rather than
geometry or styling the renderer acts on. Anchored objects nested inside a <Group>
are another deliberate omission at the spread layer: the count of frames dropped this
way is even surfaced as skipped_nested_frames so a caller can flag a lossy parse
without reading logs.
For the document-level patterns these rules produce — the malformed, partial, and unusual files and exactly how each one comes out — see Edge cases, which catalogues the behaviour from the reader's side.
Frequently asked questions
What actually makes a file fail to open?
Only a few things: the bytes aren't a readable ZIP, the mimetype part is missing or
isn't Adobe's IDML constant, the mandatory designmap.xml is absent, or — at the
scene layer — the designmap names a spread or story the archive doesn't contain.
Everything else (missing attributes, unknown values, absent optional resources,
looping references) is recovered from, not rejected.
If I typo a style or attribute value, will the parser tell me?
No. An unrecognised enum value is silently treated as unset (None) and inherits like
a missing value would. There is no warning or log, so a typo'd Justification is
indistinguishable from an omitted one. If a value you set seems to have no effect,
suspect a value the parser doesn't recognise.
Why cap reference chains instead of detecting cycles exactly?
A fixed cap is simpler and strictly bounds the work, which is the goal: a malformed
document can't make the parser hang. The frame-chain walk does also track seen frames
and stop early on a real loop; the BasedOn cap of 16 is set far above the 1–3 hops
real styles use, so it only ever bites on a pathological chain.
Is a missing Resources/Styles.xml an error?
Not at the parser layer — it defaults to an empty stylesheet, and a document with no
custom styles opens fine. The scene layer only raises an error for parts the designmap
explicitly lists and the archive then fails to provide. "Not supplied" defaults;
"promised but absent" errors.