The reader

How Paged opens an IDML container — a single ZIP read into zero-copy byte slices, a mimetype check, the root designmap, and per-resource parsers that run only when a story or spread is actually needed.

The reader opens the ZIP once, keeps every part as a cheap byte slice, and parses each XML resource only when something asks for it.

In short: Opening an IDML file in Paged happens in two distinct steps. First the container is opened: the ZIP is decompressed once into a map of named byte slices, the mimetype part is checked, and the root designmap.xml is parsed into a manifest of what the package contains. Nothing else is parsed yet. Then the layer above walks that manifest and parses individual stories, spreads, styles, and resources on demand. Each XML part is read in a single forward pass by the event-driven quick-xml reader, and only if the manifest references it. This page traces that open path and explains the two ideas that make it fast: zero-copy byte slices and parse-on-demand.

Opening the container

The entry point is Container::open, which takes the raw bytes of an .idml file. The IDML package is a ZIP archive, so the first thing Container::open does is open that archive and walk every entry once:

for each entry in the ZIP archive:
    skip directories
    decompress the entry into a buffer
    store it as Bytes, keyed by its path inside the archive

The result is a single map — entries: BTreeMap<String, Bytes> — from archive path ("Stories/Story_u1f8.xml", "Resources/Styles.xml", "designmap.xml") to the decompressed contents of that part. This is the whole of the up-front work: every part of the package is decompressed and held in memory, but none of it is parsed into typed structs yet. At this stage the engine knows what bytes each part contains, not what those bytes mean.

With the entries in hand, the open path does exactly three things before returning:

Confirms the mimetype. It looks up the mimetype part and checks its trimmed contents against Adobe's IDML constant, application/vnd.adobe.indesign-idml-package. A missing or wrong mimetype stops the open here. (The full check, and what it rejects, is the subject of Validation and recovery.)
Reads the designmap. It pulls designmap.xml out of the entry map and parses it into a DesignMap — the manifest that lists which spreads, stories, masters, and resource parts the package references. This is the table of contents the rest of the document is walked from.
Returns the Container. The returned struct holds the mimetype string, the raw designmap bytes, the parsed DesignMap, and the full entry map. From here, any part can be fetched by path with container.entry("Stories/Story_u1f8.xml").

So a freshly opened container has parsed exactly two things: the mimetype string and the designmap. Everything else is still sitting in the entry map as undecoded bytes, waiting to be asked for.
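The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: Arc<[u8]> stands in for Bytes, ZIP decoding and designmap parsing are left out, and the OpenError variants, the field names, and the entries_from helper are hypothetical. Treating a missing designmap.xml as an error is also an assumption.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Stand-in for `bytes::Bytes`: a cheaply clonable, shared byte buffer.
type Part = Arc<[u8]>;

const IDML_MIMETYPE: &str = "application/vnd.adobe.indesign-idml-package";

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum OpenError {
    MissingMimetype,
    WrongMimetype(String),
    MissingDesignMap, // assumption: the source does not say how this case is handled
}

/// Hypothetical container; the real struct also holds a parsed DesignMap.
#[derive(Debug)]
struct Container {
    mimetype: String,
    designmap_raw: Part,
    entries: BTreeMap<String, Part>,
}

impl Container {
    /// Takes already-decompressed parts; ZIP decoding is out of scope here.
    fn open(entries: BTreeMap<String, Part>) -> Result<Self, OpenError> {
        // Step 1: confirm the mimetype, comparing its trimmed contents.
        let raw = entries.get("mimetype").ok_or(OpenError::MissingMimetype)?;
        let mimetype = String::from_utf8_lossy(raw).trim().to_string();
        if mimetype != IDML_MIMETYPE {
            return Err(OpenError::WrongMimetype(mimetype));
        }
        // Step 2: pull out designmap.xml (parsing it is omitted in this sketch).
        let designmap_raw = entries
            .get("designmap.xml")
            .cloned()
            .ok_or(OpenError::MissingDesignMap)?;
        // Step 3: return the container with the full entry map intact.
        Ok(Container { mimetype, designmap_raw, entries })
    }

    /// Fetch any part by its archive path; cloning only bumps a refcount.
    fn entry(&self, path: &str) -> Option<Part> {
        self.entries.get(path).cloned()
    }
}

/// Hypothetical helper: build an entry map from literal (path, contents) pairs.
fn entries_from(parts: &[(&str, &str)]) -> BTreeMap<String, Part> {
    parts
        .iter()
        .map(|(path, body)| (path.to_string(), Part::from(body.as_bytes())))
        .collect()
}
```

Note that every part other than mimetype and designmap.xml passes through open untouched; it is only stored.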

Zero-copy byte slices

The entry map stores each part as Bytes, not as a Vec<u8> or a String. That choice is deliberate, and it is what lets the rest of the parse stay cheap.

Bytes is a reference-counted, shareable view over a buffer. Handing a sub-resource to a parser, or slicing one part out to parse separately, is a pointer-and-counter operation: no bytes are copied. The crate documents the reason directly: the entry map "keeps Bytes so downstream crates can slice sub-resources (individual Stories/Story_*.xml etc.) without copying." A 400-page book with hundreds of stories is decompressed once; after that, every story handed to a parser is a cheap clone of the buffer that entry was decompressed into, not a fresh allocation.

This is why the open step can afford to decompress everything eagerly. Decompression is unavoidable, since the ZIP entries have to be inflated before they can be read at all, but holding them as shared slices means the cost of keeping them around is just the decompressed size, paid once, with no per-reader duplication.
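To make the "pointer-and-counter" idea concrete, here is a small standard-library model of a shared slice. It is not the Bytes type itself: SharedSlice and its methods are hypothetical names, and the real type has a richer API. The sketch only shows that cloning and sub-slicing share one allocation instead of copying it.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a shared buffer plus a window into it.
#[derive(Clone)]
struct SharedSlice {
    buf: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl SharedSlice {
    /// One copy happens here, moving the Vec's contents into the shared
    /// allocation. Every later clone or slice copies nothing.
    fn new(data: Vec<u8>) -> Self {
        let end = data.len();
        SharedSlice { buf: Arc::from(data), start: 0, end }
    }

    /// Narrow the window without copying; offsets are relative to this slice.
    fn slice(&self, from: usize, to: usize) -> Self {
        assert!(from <= to && self.start + to <= self.end, "range out of bounds");
        SharedSlice {
            buf: Arc::clone(&self.buf), // bumps the reference count only
            start: self.start + from,
            end: self.start + to,
        }
    }

    /// View the bytes inside the current window.
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }

    /// Number of handles currently sharing the underlying buffer.
    fn handles(&self) -> usize {
        Arc::strong_count(&self.buf)
    }
}
```

A sub-slice taken this way points into the same memory as its parent, so the only cost of handing it to a parser is one reference-count increment.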

Single-pass XML reading

Every XML part — the designmap, each story, each spread, the styles and graphics resources — is read the same way: with quick-xml, an event-driven streaming reader.

An event-driven reader does not build a DOM. It walks the XML forward from start to finish, emitting a small event for each thing it encounters — a start tag, an end tag, a run of text, the end of the file — and the parser reacts to each event as it arrives. A typical resource parser is a single loop:

loop:
    read the next event
    match it:
        Start(tag)  → note we entered an element; pull the attributes we care about
        Text(bytes) → append to the current run
        End(tag)    → close the element; attach it to its parent
        Eof         → stop
    clear the buffer and continue

This means each part is read in one forward pass, with no backtracking and no intermediate tree. The parser pulls the handful of attributes it consumes off each start tag as it goes (using the shared attr helper, covered in Validation and recovery) and assembles the typed struct as the events flow by. When the Eof event arrives, the struct is complete. There is no second walk to resolve or normalise anything inside the part.
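The loop above can be sketched in Rust as follows. To keep the example dependency-free, a hand-written Event enum stands in for the events quick-xml emits, and the events arrive from an iterator rather than from a reader over bytes. The Story struct, its fields, and parse_story are hypothetical and far simpler than a real story parser; the point is only the shape: one forward pass, a typed struct assembled as events arrive, nothing revisited.

```rust
/// Hypothetical events, standing in for the ones a streaming XML reader emits.
enum Event<'a> {
    Start { name: &'a str, attrs: Vec<(&'a str, &'a str)> },
    Text(&'a str),
    End(&'a str),
    Eof,
}

/// Hypothetical typed result: a story id and its text runs.
#[derive(Debug, PartialEq, Default)]
struct Story {
    id: String,
    runs: Vec<String>,
}

fn parse_story<'a>(events: impl IntoIterator<Item = Event<'a>>) -> Story {
    let mut story = Story::default();
    let mut current: Option<String> = None; // the run being assembled, if any
    for event in events {
        match event {
            Event::Start { name: "Story", attrs } => {
                // Pull only the attribute this parser consumes.
                if let Some((_, value)) = attrs.iter().find(|(key, _)| *key == "Self") {
                    story.id = value.to_string();
                }
            }
            // Entering a Content element starts a new run.
            Event::Start { name: "Content", .. } => current = Some(String::new()),
            Event::Text(text) => {
                // Text outside a Content element is ignored.
                if let Some(run) = current.as_mut() {
                    run.push_str(text);
                }
            }
            Event::End("Content") => {
                // Close the run and attach it to its parent.
                if let Some(run) = current.take() {
                    story.runs.push(run);
                }
            }
            Event::Eof => break,
            _ => {} // elements this parser does not consume
        }
    }
    story
}
```

Feeding it the events for a story with one Content element holding two text chunks yields a Story with a single joined run, built without any intermediate tree.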

Parsing on demand

The container open parses only the mimetype and the designmap. Everything else is parsed lazily by the layer above, paged-scene's Document::open, and only for the parts the manifest actually points at.

When a document is opened for rendering, Document::open walks the designmap and parses each referenced part by fetching its bytes from the container and running the matching parser:

a Resources/Graphic.xml entry, if present, through the graphic parser (the color and gradient palette);
a Resources/Styles.xml entry, if present, through the style parser;
each master spread, then each body spread, through Spread::parse;
each story the manifest lists, through Story::parse.

Each parser runs against a Bytes slice fetched by key. A story is parsed when the document is built; a part the manifest never references is never parsed at all, even though its bytes sit in the entry map. The parse work is keyed to what the document declares it uses, not to what happens to be in the ZIP.

Two consequences are worth holding onto. First, the resource parts (graphics and styles) are optional at this layer: if the package omits them, the parser substitutes an empty default rather than failing, so a document with no custom styles still opens cleanly. Second, the graphics and styles parts parse before the spreads and stories that reference them, so by the time a page item names a color or a style, the table that defines it is already in typed form.
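A rough model of that manifest-driven walk, again using only the standard library. Everything here is a simplification under stated assumptions: Manifest, Styles, Document, and open_document are hypothetical names, only the styles resource and stories are modeled, the "parsers" are placeholders that decode text instead of reading XML, and silently skipping a referenced story that is missing from the entry map is an assumption the source does not confirm.

```rust
use std::collections::BTreeMap;

/// Hypothetical manifest: just the story paths the designmap references.
struct Manifest {
    stories: Vec<String>,
}

/// Hypothetical style table; `Default` gives the empty fallback.
#[derive(Debug, Default, PartialEq)]
struct Styles {
    names: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct Document {
    styles: Styles,
    stories: Vec<String>,
}

/// Placeholder parser: treats each line of the part as a style name.
fn parse_styles(raw: &[u8]) -> Styles {
    Styles {
        names: String::from_utf8_lossy(raw).lines().map(str::to_string).collect(),
    }
}

/// Placeholder parser: decodes the part as text.
fn parse_story(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).into_owned()
}

fn open_document(manifest: &Manifest, entries: &BTreeMap<String, Vec<u8>>) -> Document {
    // Resources first, so later parts can refer to an already-typed table.
    // A missing styles part falls back to the empty default instead of failing.
    let styles = entries
        .get("Resources/Styles.xml")
        .map(|raw| parse_styles(raw))
        .unwrap_or_default();
    // Only parts the manifest lists are parsed; other entries are never touched.
    // Assumption: a listed story that is absent from the map is skipped.
    let stories = manifest
        .stories
        .iter()
        .filter_map(|path| entries.get(path))
        .map(|raw| parse_story(raw))
        .collect();
    Document { styles, stories }
}
```

With an entry map that contains one referenced story and one orphan, the resulting document holds only the referenced story and an empty style table.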

How the pieces fit

Putting the path end to end:

Container::open(bytes)
  ├─ decompress every ZIP entry  → entries: { path → Bytes }   (zero-copy slices)
  ├─ check "mimetype"            → reject if not the IDML constant
  └─ parse "designmap.xml"       → DesignMap (the manifest)

Document::open(bytes)            (one layer up, in paged-scene)
  ├─ Container::open(...)        (the above)
  ├─ parse Resources/Graphic.xml (if present)   ┐
  ├─ parse Resources/Styles.xml  (if present)   │  each: fetch Bytes by key,
  ├─ parse each master + spread                 │  read in one quick-xml pass
  └─ parse each story                           ┘

The reader's whole personality is in that shape: open the archive once, keep the parts as cheap slices, check just enough to know it is IDML, and defer the real parsing until a typed document is actually being assembled. The cost model that falls out of these choices — and the trade-offs they imply — is the subject of Performance and memory.

Frequently asked questions

Does opening a container parse the whole document? No. Container::open decompresses every ZIP entry into memory but only parses two things: the mimetype part (to validate it) and designmap.xml (the manifest). Stories, spreads, styles, and graphics stay as undecoded byte slices until the layer above parses them on demand — and a part the manifest never references is never parsed at all.

What does "zero-copy" actually save here? The bytes still have to be decompressed once, but they are stored as shareable Bytes slices rather than owned buffers. Handing a story or spread to its parser is then a cheap reference-counted slice, not a fresh copy of the data. On a large document with many stories, that turns hundreds of potential copies into hundreds of reference-count bumps on buffers that already exist.

Why use a streaming XML reader instead of building a DOM? Because the parser only needs a single forward pass to pull the attributes and text it consumes and build its typed struct. A DOM would allocate a full intermediate tree that is then walked a second time and thrown away. The event-driven quick-xml reader does it in one pass with no intermediate tree — faster, and far lighter on memory for big stories.