renderpx

System Design

Notion-style document editor

Block-based content model, contenteditable pitfalls, slash commands, drag-to-reorder, autosave, and optional real-time collaboration with Yjs CRDT.

What we're building

[Diagram: editor mock-up at notion.so / workspace. A PageNavigator sidebar lists pages (Project Overview, API Design, Components, Data Model, State Management), with page IDs in a flat Map. BlockList → BlockRenderer renders the "Project Overview" page as blocks h1, p1, h2, li1, li2, li3 (switch on block.type, autosave on change). A Zustand Store panel shows blocks: { h1, p1, h2, li1, li2, li3 ← cursor } and rootBlockIds: ['h1','p1'... as a flat Map<id, Block>.]

A PageNavigator sidebar, a BlockList that renders each block by type via BlockRenderer, and a Zustand store holding a flat Map<id, Block>, not a nested tree.

The Challenge

A document editor like Notion is one of the hardest frontend problems because every obvious approach has a hidden trap. Raw contenteditable conflicts with React's virtual DOM. Nested tree state makes every block update a recursive clone. Cursor management across block boundaries requires ProseMirror-level knowledge. And then there's real-time collaboration, which invalidates most naive assumptions about state ownership.

The scope here is a collaborative document editor with: a block-based content model, multiple block types (text, heading, list, code, image, embed), slash commands to insert blocks, drag-to-reorder, autosave, and optional live collaboration with presence indicators.

Each section covers the key decision, the naive approach, and what you'd actually build.

Data Model

The most important decision is the shape of a block. The instinct is a nested tree: each block has a children array. That works until you need to update a deeply nested block - you have to clone every ancestor to maintain immutability.

Use a flat map instead. Each block stores an array of child IDs, not child objects. The document has a blocks: Record<string, Block> map and a rootBlockIds: string[]. Any block can be looked up in O(1), updated without recursive cloning, and the structure maps naturally to how CRDT libraries like Yjs store data.
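
A minimal sketch of that shape, assuming a `Block` with `content` and `childIds` fields and a hypothetical `updateBlock` helper (these names are illustrative, not taken from the original code sample):

```typescript
// Block types follow the scope listed earlier; extend as needed.
type BlockType = "text" | "heading" | "list" | "code" | "image" | "embed";

interface Block {
  id: string;
  type: BlockType;
  content: string;
  childIds: string[]; // IDs only, never nested Block objects
}

interface DocumentState {
  blocks: Record<string, Block>;
  rootBlockIds: string[];
}

// Update one block by ID. Only the map and the target block are shallow-copied;
// no ancestor is touched, however deeply the block is nested.
function updateBlock(
  doc: DocumentState,
  id: string,
  patch: Partial<Omit<Block, "id">>
): DocumentState {
  const existing = doc.blocks[id];
  if (!existing) return doc;
  return { ...doc, blocks: { ...doc.blocks, [id]: { ...existing, ...patch } } };
}

const doc: DocumentState = {
  blocks: {
    h1: { id: "h1", type: "heading", content: "Project Overview", childIds: [] },
    l1: { id: "l1", type: "list", content: "Key decisions", childIds: ["l2"] },
    l2: { id: "l2", type: "list", content: "Nested tree", childIds: [] },
  },
  rootBlockIds: ["h1", "l1"],
};

// Editing the nested item does not clone its parent "l1".
const next = updateBlock(doc, "l2", { content: "Flat map, not a nested tree" });
```

Because the parent keeps the same object identity, memoized components rendering `l1` have no reason to re-render when only `l2` changes.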

Fractional indexing for order

If block order is stored in a database as an integer column, inserting between two blocks requires renumbering siblings. Use fractional indexing instead - order values are strings that sort lexicographically, so any insert is a single-row write. The fractional-indexing npm package handles this.
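
To show why an insert touches only one row, here is a simplified, hand-rolled midpoint function over base-36 strings. It is an illustration of the idea and not the fractional-indexing package's implementation (the package's `generateKeyBetween` covers the same use case with a more careful key format). The names `keyBetween` and `midpoint` are hypothetical, and the sketch assumes keys use only the characters 0-9 and a-z and never end in "0".

```typescript
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

// Returns a string strictly between a and b. a === "" means "no lower bound",
// b === null means "no upper bound". Assumes a < b and no trailing "0".
function midpoint(a: string, b: string | null): string {
  if (b !== null) {
    // Skip the common prefix (a is treated as padded with "0").
    let n = 0;
    while (n < b.length && (a.charAt(n) || "0") === b.charAt(n)) n++;
    if (n > 0) return b.slice(0, n) + midpoint(a.slice(n), b.slice(n));
  }
  const digitA = a ? DIGITS.indexOf(a.charAt(0)) : 0;
  const digitB = b !== null ? DIGITS.indexOf(b.charAt(0)) : DIGITS.length;
  if (digitB - digitA > 1) {
    // Room for a digit in between: pick the middle one.
    return DIGITS.charAt(Math.round((digitA + digitB) / 2));
  }
  // First digits are adjacent: either shorten b or extend a.
  if (b !== null && b.length > 1) return b.slice(0, 1);
  return DIGITS.charAt(digitA) + midpoint(a.slice(1), null);
}

// Order key for a block placed between two siblings (null = list edge).
function keyBetween(before: string | null, after: string | null): string {
  const a = before ?? "";
  if (a.endsWith("0") || (after !== null && (after === "" || after.endsWith("0")))) {
    throw new Error("keys must be non-empty and must not end in '0'");
  }
  if (after !== null && a >= after) {
    throw new Error("before must sort before after");
  }
  return midpoint(a, after);
}

const first = keyBetween(null, null);      // first block in an empty list
const second = keyBetween(first, null);    // appended after it
const inserted = keyBetween(first, second); // dropped between the two
```

Only the inserted block gets a new `order` value; `first` and `second` keep theirs, so the database write is a single row. A production version also needs to handle key growth and concurrent inserts at the same position, which is one reason to prefer the library.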

Architecture Map

| Surface | Pattern / approach | Why |
| --- | --- | --- |
| Block rendering | Block registry + recursive render | Extensible without a giant switch; new types = new entry |
| Text editing | TipTap / ProseMirror per block | Raw contenteditable + React fight the DOM; ProseMirror handles cursor, marks, composition |
| Block mutations | Optimistic updates + debounced sync | Every keystroke can't wait for a server round-trip; buffer 500ms then flush |
| Slash commands | Local UI state + floating menu | Detect "/", position using getBoundingClientRect, filter + keyboard nav |
| Block reorder | dnd-kit + optimistic reorder | Mouse + touch + keyboard; DragOverlay avoids clipping; rollback on error |
| Page tree (sidebar) | React Query (separate query) | Different staleTime from doc content; page tree changes less often |
| Collaboration | Yjs CRDT + WebSocket provider | Concurrent edits merged conflict-free; awareness API handles presence |
| Autosave | Debounced mutation + save indicator | 500ms debounce; show "Saving..." / "Saved"; flush on blur |
| Long documents | TanStack Virtual | 1000+ blocks - render only what's in the viewport |
| Heavy block types | React.lazy per block type | Embed, syntax-highlighted code - don't ship their bundles unless the block type is used |

Block Rendering

Each block type is a separate React component. The naive approach is a giant switch or if-else chain in a single render function - it becomes unmaintainable as you add types. Use a block registry instead: a plain object mapping BlockType to a component. Adding a new type is one object entry.

Since blocks nest (toggle blocks contain children, list items contain sub-items), each block component receives the full BlockMap and renders its child IDs recursively. This avoids threading child nodes through props - each component looks up what it needs by ID.
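
The registry and the recursive lookup can be sketched without any framework. In the sketch below each registry entry returns an HTML string as a stand-in for a React component; in the real editor each entry would be a component that receives the block and the BlockMap. The names `registry` and `renderBlock` and the block fields are assumptions for illustration.

```typescript
type BlockType = "heading" | "text" | "list" | "toggle";

interface Block {
  id: string;
  type: BlockType;
  content: string;
  childIds: string[];
}

type BlockMap = Record<string, Block>;

// A renderer gets the block plus its already-rendered children.
type BlockRenderer = (block: Block, children: string) => string;

// The registry: one entry per block type. Adding a type means adding an entry.
// Note: content is not escaped here; React would handle that in a real component.
const registry: Record<BlockType, BlockRenderer> = {
  heading: (b) => `<h1>${b.content}</h1>`,
  text: (b) => `<p>${b.content}</p>`,
  list: (b, children) => `<li>${b.content}${children}</li>`,
  toggle: (b, children) =>
    `<details><summary>${b.content}</summary>${children}</details>`,
};

// Look up the block by ID, render its children by ID, then dispatch on type.
function renderBlock(id: string, blocks: BlockMap): string {
  const block = blocks[id];
  if (!block) return ""; // dangling ID: render nothing rather than crash
  const children = block.childIds
    .map((childId) => renderBlock(childId, blocks))
    .join("");
  return registry[block.type](block, children);
}

const blocks: BlockMap = {
  t1: { id: "t1", type: "toggle", content: "More", childIds: ["p1"] },
  p1: { id: "p1", type: "text", content: "Hidden", childIds: [] },
};

const html = renderBlock("t1", blocks);
```

Typing the registry as `Record<BlockType, BlockRenderer>` means the compiler flags any block type that has no renderer, which a hand-written switch does not guarantee unless it is checked for exhaustiveness.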

The Editor Core

The most common mistake in document editor implementations is using contenteditable with dangerouslySetInnerHTML and letting React manage the DOM. When each keystroke updates state and React writes the new HTML back into the element, the browser replaces the text nodes and the caret jumps back to position 0. The user's typing position is lost on every keystroke.

The manual fix suppresses React's reconciliation for the editable node and syncs content from state into the DOM only when the block is not focused. This works for simple plaintext blocks. For rich text (bold, links, marks), you need ProseMirror or TipTap - the browser's selection model across marked text is too complex to manage manually.
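
The core of that manual fix is a single guard, sketched here as a framework-free function. `EditableLike` and `syncEditable` are hypothetical names; in a React component the function would run inside an effect with the element from a ref and `document.activeElement` as the third argument.

```typescript
// Structural stand-in for the contenteditable element (an HTMLElement fits this shape).
interface EditableLike {
  textContent: string | null;
}

// Push state into the DOM only when it is safe to do so.
// Returns true if the DOM was written, false if it was left alone.
function syncEditable(
  el: EditableLike,
  value: string,
  activeElement: unknown
): boolean {
  // The user is typing in this block: writing now would reset the caret.
  if (activeElement === el) return false;
  // Already in sync: skip the write so the browser keeps its text nodes.
  if (el.textContent === value) return false;
  el.textContent = value;
  return true;
}

// On input events the flow goes the other way: read the DOM, update state.
function readEditable(el: EditableLike): string {
  return el.textContent ?? "";
}
```

The focused block is treated as the source of truth while the user types, and state flows back into the DOM only for blocks changed from elsewhere (undo, a remote edit, a slash command).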

Don't build your own rich text editor

ProseMirror took years to develop and handles hundreds of edge cases: IME composition (Chinese, Japanese, Korean), bidirectional text, cross-browser selection, Safari's unique input events, screen reader compatibility. TipTap wraps ProseMirror in a friendlier, extension-based API with React bindings and offers a Yjs-based extension for collaboration. Use it. Notion built their own - you shouldn't need to.

Slash Commands

Slash commands are the primary block insertion UX. When the user types / at the start of an empty block (or anywhere in text), a command palette appears at the cursor position. As the user types more characters, the list filters. Pressing Enter or clicking inserts the selected block type and replaces the /query text.

The tricky part is positioning the menu at the cursor - window.getSelection().getRangeAt(0).getBoundingClientRect() gives you the cursor's viewport coordinates. The menu is rendered with position: fixed at those coordinates. If you render it in the React tree near the block, it may clip inside an overflow: hidden container. A portal at the document body is safer.
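
The detection, filtering, and positioning logic can be kept as pure functions, separate from the menu component. The sketch below assumes a hypothetical command list and helper names (`matchSlashQuery`, `filterCommands`, `menuPosition`); in the browser the caret rectangle passed to `menuPosition` would come from `window.getSelection().getRangeAt(0).getBoundingClientRect()`.

```typescript
interface SlashCommand {
  id: string;
  label: string;
}

// Hypothetical command list, one entry per block type in scope.
const COMMANDS: SlashCommand[] = [
  { id: "text", label: "Text" },
  { id: "heading", label: "Heading" },
  { id: "list", label: "List" },
  { id: "code", label: "Code" },
  { id: "image", label: "Image" },
  { id: "embed", label: "Embed" },
];

// Returns the query typed after "/" if the caret sits right after a slash
// command, otherwise null. The slash must start the text or follow whitespace,
// so URLs and fractions like "a/b" do not open the menu.
function matchSlashQuery(textBeforeCaret: string): string | null {
  const m = /(?:^|\s)\/([a-z0-9]*)$/i.exec(textBeforeCaret);
  return m ? m[1] ?? "" : null;
}

function filterCommands(query: string): SlashCommand[] {
  const q = query.toLowerCase();
  return COMMANDS.filter((c) => c.label.toLowerCase().includes(q));
}

// Strip the trailing "/query" once a command is chosen.
function removeSlashQuery(textBeforeCaret: string): string {
  return textBeforeCaret.replace(/\/[a-z0-9]*$/i, "");
}

interface Rect { left: number; top: number; bottom: number }
interface Size { width: number; height: number }

// Place a position: fixed menu just below the caret, clamped to the viewport;
// flip above the caret if it would overflow the bottom edge.
function menuPosition(caret: Rect, menu: Size, viewport: Size) {
  const margin = 8;
  const gap = 4;
  const left = Math.max(margin, Math.min(caret.left, viewport.width - menu.width - margin));
  let top = caret.bottom + gap;
  if (top + menu.height > viewport.height - margin) {
    top = caret.top - gap - menu.height;
  }
  return { left, top };
}
```

Some browsers return an empty rectangle for a collapsed range in an empty block, so a production version may need to fall back to the block element's own rectangle.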

Drag-and-Drop Block Reorder

Blocks need to be reorderable via a drag handle. The most common mistake: attaching drag listeners to the entire block. This means clicking anywhere on the block (including inside the text editor) triggers a drag, swallowing click events.

Use a dedicated grip icon that appears on hover. Attach drag listeners only to that icon. The block content - the editor, buttons, links - remains fully interactive. The full dnd-kit setup with DragOverlay, PointerSensor activation constraint, and optimistic reorder with rollback is covered in the Drag-and-Drop pattern.
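
The optimistic reorder with rollback is independent of the drag library and can be sketched on its own. `moveItem` and `reorderOptimistically` are hypothetical helpers (dnd-kit's sortable package ships a comparable `arrayMove` helper), and `persist` stands in for whatever mutation saves the new order.

```typescript
// Return a copy of items with the element at `from` moved to index `to`.
function moveItem<T>(items: readonly T[], from: number, to: number): T[] {
  const next = items.slice();
  const inRange = (i: number) => i >= 0 && i < next.length;
  if (!inRange(from) || !inRange(to)) return next;
  // Remove the element, then insert it at the target index.
  next.splice(to, 0, ...next.splice(from, 1));
  return next;
}

// Apply the new order immediately, then persist. If persisting fails,
// restore the previous order so the UI matches the server again.
async function reorderOptimistically(
  getIds: () => string[],
  setIds: (ids: string[]) => void,
  from: number,
  to: number,
  persist: (ids: string[]) => Promise<void>
): Promise<boolean> {
  const previous = getIds();
  const next = moveItem(previous, from, to);
  setIds(next); // optimistic: the user sees the result right away
  try {
    await persist(next);
    return true;
  } catch {
    setIds(previous); // rollback on error
    return false;
  }
}
```

In a drag-end handler, `from` and `to` would be the indices of the dragged block and the block it was dropped over within `rootBlockIds`.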

Nested block drag

The setup above handles top-level block reorder only. For nested blocks (inside a toggle or list), you need one SortableContext per parent block, each with its own items array. Cross-list dragging (moving a block out of a toggle into the root level) requires DragOverlay and custom collision detection.

Real-Time Collaboration

Yjs is the production choice for collaborative document editing. It implements a CRDT (conflict-free replicated data type): every client holds a full copy of the document, edits are represented as operations that can be applied in any order, and concurrent edits from two users always merge to the same result with no server-side conflict resolution logic.

The WebSocket provider syncs operations in real time; when a user is offline, changes queue locally and sync on reconnect. The awareness API tracks who is in the document and where their cursor is, which powers the collaborator avatars and cursor overlays.
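
To make "applied in any order, always the same result" concrete, here is a toy last-writer-wins map. This is deliberately not Yjs and not its algorithm (Yjs uses a more sophisticated sequence CRDT for text); it is a much simpler CRDT that shows the convergence property. All names (`Entry`, `LwwMap`, `merge`) are illustrative.

```typescript
// Each value carries a logical clock and the ID of the client that wrote it.
interface Entry {
  value: string;
  clock: number;
  clientId: string;
}

type LwwMap = Record<string, Entry>;

// Deterministic winner: higher clock wins; ties are broken by client ID.
// Every replica applies the same rule, so no server arbitration is needed.
function wins(a: Entry, b: Entry): boolean {
  return a.clock > b.clock || (a.clock === b.clock && a.clientId > b.clientId);
}

// Merge remote state into local state. The merge is commutative, associative,
// and idempotent, which is what lets replicas sync in any order.
function merge(local: LwwMap, remote: LwwMap): LwwMap {
  const out: LwwMap = { ...local };
  for (const [key, entry] of Object.entries(remote)) {
    const current = out[key];
    if (!current || wins(entry, current)) out[key] = entry;
  }
  return out;
}

// Two users edit the same block title concurrently while offline.
const alice: LwwMap = {
  title: { value: "Plan", clock: 2, clientId: "alice" },
};
const bob: LwwMap = {
  title: { value: "Roadmap", clock: 2, clientId: "bob" },
  body: { value: "Draft", clock: 1, clientId: "bob" },
};

const onAlice = merge(alice, bob); // Alice receives Bob's state
const onBob = merge(bob, alice);   // Bob receives Alice's state
```

Both replicas end with the same title and body regardless of who synced first. Last-writer-wins discards one of the concurrent edits wholesale, which is acceptable for a field like a block type but not for text; merging character-level edits without loss is the part Yjs provides.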

Ship without Yjs first

Yjs adds complexity: a new mental model for state, a separate WebSocket server, and a migration from your existing mutation-based writes. Build the single-user version first with React Query and optimistic updates. Add Yjs when users actually need live collaboration - the TipTap Yjs extension makes it a near drop-in if you planned block IDs as UUIDs from the start.

State Architecture

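This is the surface where wrong state placement causes the most pain. The rule: server state in React Query, ephemeral UI state local to the component that owns it, and global client state (auth, theme, UI prefs) in Zustand. The working copy of the document's blocks also lives in a Zustand store so edits apply locally first, with React Query handling persistence.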

Don't put selection in global state

Cursor position and text selection are ephemeral, device-local, and change thousands of times per minute during typing. Putting them in a global store causes every subscribed component to re-render on every keystroke. Keep selection in the editor component's local state.

Performance

Four performance concerns are specific to document editors.

  • Virtualized lists: Render only the blocks currently in the viewport. TanStack Virtual handles variable-height items. See the tradeoffs below before adding this.
  • Lazy-load heavy block types: Embed blocks (Figma, YouTube) and syntax-highlighted code pull in large dependencies. React.lazy splits them out of the initial bundle.
  • Debounce writes: Buffer keystrokes for 500ms then flush to the server. Also flush on blur when the user leaves a block. Prevents a round-trip on every character.
  • Memoize block components: When one block changes, only that block should re-render. Use React.memo with a custom comparator on blockId + updatedAt so siblings don't re-render unnecessarily.
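
The debounce-with-flush and the memo comparator from the list above can be sketched as follows. `createAutosaver` and `areBlockPropsEqual` are hypothetical names; the comparator is the function you would pass as the second argument to React.memo, assuming block components receive `blockId` and `updatedAt` props.

```typescript
// Buffers the latest value and saves it after delayMs of inactivity.
// flush() saves immediately, for blur or page-hide handlers.
function createAutosaver<T>(save: (value: T) => void, delayMs = 500) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: { value: T } | undefined;

  function flush(): void {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending) {
      const { value } = pending;
      pending = undefined;
      save(value);
    }
  }

  function schedule(value: T): void {
    pending = { value }; // only the most recent value is kept
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  }

  return { schedule, flush };
}

interface BlockProps {
  blockId: string;
  updatedAt: number;
}

// Comparator for React.memo: returning true skips the re-render.
function areBlockPropsEqual(prev: BlockProps, next: BlockProps): boolean {
  return prev.blockId === next.blockId && prev.updatedAt === next.updatedAt;
}
```

Typing "a" then "ab" within the debounce window calls `schedule` twice but `save` once, with "ab". The comparator only works if every mutation bumps `updatedAt`; a block edited without a new timestamp would stay stale on screen.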

Measure before virtualizing

Virtualization adds complexity: the virtualizer needs block heights before rendering, variable-height items require measurement, and dnd-kit has known issues with TanStack Virtual during drag (items scroll out of view and are unmounted). For most documents (under 200 blocks), React handles it fine without virtualization. Add it only when you have real user complaints about scroll performance.

What I'd Do Differently

Use TipTap from day one, not raw contenteditable

The browser's editing model is decades of accumulated quirks. IME composition events (Chinese, Japanese input), Safari's non-standard input event behavior, bidirectional text, and screen reader compatibility will consume weeks if you handle them manually. TipTap gives you a schema-validated, extension-based editor that already handles all of this. The cost is bundle size (~50kB gzipped) and a new mental model; both are worth it.

Design the data model for CRDT before you need it

Adding Yjs to an existing data model means rewriting every mutation. The changes are not large individually, but they touch every place that writes state. If you use UUID block IDs from the start (instead of sequential integers), use a flat map (not nested tree), and separate block content from block metadata, the migration is mostly mechanical. If you used integers and nested state, the migration requires a data transformation and a period where the old and new models coexist.

Don't virtualize until you have to

Virtualization complicates drag-and-drop (items scroll out of view and unmount), breaks Cmd+F in-page search (hidden items aren't in the DOM), and makes block height estimation fragile. Most documents stay under 200 blocks, and React renders that many without trouble. Add virtualization only when you have measured scroll jank on real devices with real content. Premature optimization here causes bugs that are genuinely hard to debug.

The block-as-flat-map pattern pays off across the whole stack

The flat map is slightly awkward to assemble for rendering, but it eliminates an entire class of bugs: no recursive state updates, no deeply nested immer patches, no "why did this ancestor re-render" mystery. When you add Yjs, you find that its shared types fit this shape naturally: a Y.Map keyed by block ID mirrors the flat map almost one to one. Getting there from a nested tree is the migration you don't want to do under deadline pressure.