System Design
Notion-style document editor
Block-based content model, contenteditable pitfalls, slash commands, drag-to-reorder, autosave, and optional real-time collaboration with Yjs CRDT.
What we're building
A PageNavigator sidebar, a BlockList that renders each block by type via BlockRenderer, and a Zustand store holding a flat Map<id, Block>, not a nested tree.
The Challenge
A document editor like Notion is one of the hardest frontend problems because every obvious approach has a hidden trap. Raw contenteditable conflicts with React's virtual DOM. Nested tree state makes every block update a recursive clone. Cursor management across block boundaries requires ProseMirror-level knowledge. And then there's real-time collaboration, which invalidates most naive assumptions about state ownership.
The scope here is a collaborative document editor with: a block-based content model, multiple block types (text, heading, list, code, image, embed), slash commands to insert blocks, drag-to-reorder, autosave, and optional live collaboration with presence indicators.
Each section covers the key decision, the naive approach, and what you'd actually build.
Data Model
The most important decision is the shape of a block. The instinct is a nested tree: each block has a children array. That works until you need to update a deeply nested block - you have to clone every ancestor to maintain immutability.
Use a flat map instead. Each block stores an array of child IDs, not child objects. The document has a blocks: Record<string, Block> map and a rootBlockIds: string[]. Any block can be looked up in O(1), updated without recursive cloning, and the structure maps naturally to how CRDT libraries like Yjs store data.
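A minimal sketch of that shape, assuming hypothetical field names (`content`, `childIds`) that the original does not specify. It shows that editing a nested block copies the top-level map shallowly and the one changed block, while ancestors keep their identity.

```typescript
// Hypothetical block shape for illustration; field names are assumptions.
type BlockType = "text" | "heading" | "list" | "code" | "image" | "embed";

interface Block {
  id: string;
  type: BlockType;
  content: string;
  childIds: string[]; // IDs only, never nested Block objects
}

interface Doc {
  blocks: Record<string, Block>;
  rootBlockIds: string[];
}

// Immutable update: shallow-copies the map and the single block that changed.
// No ancestor is cloned, however deep the block sits in the hierarchy.
function updateBlockContent(doc: Doc, id: string, content: string): Doc {
  const block = doc.blocks[id];
  if (!block) return doc; // unknown ID: return the same document untouched
  return {
    ...doc,
    blocks: { ...doc.blocks, [id]: { ...block, content } },
  };
}

const doc: Doc = {
  rootBlockIds: ["a"],
  blocks: {
    a: { id: "a", type: "list", content: "Parent", childIds: ["b"] },
    b: { id: "b", type: "text", content: "Child", childIds: [] },
  },
};

// Edit the nested block "b"; the parent "a" is reused by reference.
const next = updateBlockContent(doc, "b", "Edited child");
```

Because the parent object is the same reference before and after, a memoized parent component has no reason to re-render when only a child's text changes.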
Fractional indexing for order
fractional-indexing npm package handles this.
Architecture Map
| Surface | Pattern / approach | Why |
|---|---|---|
| Block rendering | Block registry + recursive render | Extensible without a giant switch; new types = new entry |
| Text editing | TipTap / ProseMirror per block | Raw contenteditable + React fight the DOM; ProseMirror handles cursor, marks, composition |
| Block mutations | Optimistic updates + debounced sync | Every keystroke can't wait for a server round-trip; buffer 500ms then flush |
| Slash commands | Local UI state + floating menu | Detect "/", position using getBoundingClientRect, filter + keyboard nav |
| Block reorder | dnd-kit + optimistic reorder | Mouse + touch + keyboard; DragOverlay avoids clipping; rollback on error |
| Page tree (sidebar) | React Query (separate query) | Different staleTime from doc content; page tree changes less often |
| Collaboration | Yjs CRDT + WebSocket provider | Concurrent edits merged conflict-free; awareness API handles presence |
| Autosave | Debounced mutation + save indicator | 500ms debounce; show "Saving..." / "Saved"; flush on blur |
| Long documents | TanStack Virtual | 1000+ blocks - render only what's in the viewport |
| Heavy block types | React.lazy per block type | Embed, syntax-highlighted code - don't ship their bundles unless the block type is used |
Block Rendering
Each block type is a separate React component. The naive approach is a giant switch or if-else chain in a single render function - it becomes unmaintainable as you add types. Use a block registry instead: a plain object mapping BlockType to a component. Adding a new type is one object entry.
Since blocks nest (toggle blocks contain children, list items contain sub-items), each block component receives the full BlockMap and renders its child IDs recursively. This avoids threading child nodes through props - each component looks up what it needs by ID.
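The registry and the recursive lookup can be sketched without React. In this illustration plain functions that return HTML strings stand in for components, and the `Block` fields and the three block types are assumptions made for the example, not the article's actual component code.

```typescript
type BlockType = "text" | "heading" | "toggle";

interface Block {
  id: string;
  type: BlockType;
  content: string;
  childIds: string[];
}

type BlockMap = Record<string, Block>;

// Stand-in for a React component: takes the block and the full map.
type Renderer = (block: Block, blocks: BlockMap) => string;

// The registry: adding a block type means adding one entry here.
// Note: content is not HTML-escaped; this is a sketch, not production output.
const registry: Record<BlockType, Renderer> = {
  text: (b) => `<p>${b.content}</p>`,
  heading: (b) => `<h2>${b.content}</h2>`,
  toggle: (b, blocks) =>
    `<details><summary>${b.content}</summary>` +
    b.childIds.map((childId) => renderBlock(childId, blocks)).join("") +
    `</details>`,
};

// Looks the block up by ID and dispatches through the registry.
function renderBlock(id: string, blocks: BlockMap): string {
  const block = blocks[id];
  if (!block) return ""; // dangling child ID: render nothing
  return registry[block.type](block, blocks);
}

const sampleBlocks: BlockMap = {
  t: { id: "t", type: "toggle", content: "More", childIds: ["p"] },
  p: { id: "p", type: "text", content: "Hidden", childIds: [] },
};

const html = renderBlock("t", sampleBlocks);
```

Typing the registry as `Record<BlockType, Renderer>` means the compiler flags any block type that has no renderer, which a switch statement only does if it is written exhaustively.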
The Editor Core
The most common mistake in document editor implementations: using contenteditable with dangerouslySetInnerHTML and letting React manage the DOM. React reconciles on every state change and resets the cursor to position 0. The user's typing position is lost on every keystroke.
The manual fix (shown below) suppresses React's reconciliation for the editable node and syncs content only when the block is not focused. This works for simple plaintext blocks. For rich text (bold, links, marks), you need ProseMirror or TipTap - the browser's selection model across marked text is too complex to manage manually.
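The core of the manual fix is a small guard. The sketch below is an assumption about how that guard could look, written against a minimal `EditableLike` interface (hypothetical, introduced so the example runs without a DOM; a real `HTMLElement` satisfies it).

```typescript
// Minimal structural type; an HTMLElement has a compatible textContent property.
interface EditableLike {
  textContent: string | null;
}

// Copies store text into the editable node only when that is safe.
// Returns true if the node was written to.
function syncEditable(
  el: EditableLike,
  storeText: string,
  isFocused: boolean
): boolean {
  // While the user is typing, the DOM is the source of truth. Writing to it
  // here is what would move the caret.
  if (isFocused) return false;
  // Already in sync: skip the write entirely.
  if (el.textContent === storeText) return false;
  el.textContent = storeText;
  return true;
}
```

In a React component this would typically be called from an effect with the node from a ref and `document.activeElement === node` as the focus check, while the element itself is rendered without children so React has nothing to reconcile inside it.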
Don't build your own rich text editor
Slash Commands
Slash commands are the primary block insertion UX. When the user types / at the start of an empty block (or anywhere in text), a command palette appears at the cursor position. As the user types more characters, the list filters. Pressing Enter or clicking inserts the selected block type and replaces the /query text.
The tricky part is positioning the menu at the cursor - window.getSelection().getRangeAt(0).getBoundingClientRect() gives you the cursor's viewport coordinates. The menu is rendered with position: fixed at those coordinates. If you render it in the React tree near the block, it may clip inside an overflow: hidden container. A portal at the document body is safer.
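The non-DOM parts of a slash menu are small pure functions. This is a sketch under assumptions: the function names, the `SlashCommand` shape, and the rule that whitespace closes the menu are all illustrative choices, not taken from the original.

```typescript
interface SlashCommand {
  id: string;
  label: string;
}

// Returns the text typed after the last "/" before the caret,
// or null when no menu should be shown.
function getSlashQuery(text: string, caret: number): string | null {
  const before = text.slice(0, caret);
  const slash = before.lastIndexOf("/");
  if (slash === -1) return null;
  const query = before.slice(slash + 1);
  // Assumption: any whitespace after the slash closes the menu.
  return /\s/.test(query) ? null : query;
}

// Case-insensitive substring filter over the command labels.
function filterCommands(commands: SlashCommand[], query: string): SlashCommand[] {
  const q = query.toLowerCase();
  return commands.filter((c) => c.label.toLowerCase().includes(q));
}

// ArrowUp / ArrowDown handling: wraps around at both ends of the list.
function moveSelection(index: number, delta: 1 | -1, count: number): number {
  if (count === 0) return 0;
  return (index + delta + count) % count;
}

// Converts the caret rectangle into position: fixed coordinates for the menu.
// RectLike matches the fields of a DOMRect that are needed here.
interface RectLike {
  left: number;
  bottom: number;
}

function menuPosition(rect: RectLike, gap = 4): { left: number; top: number } {
  return { left: rect.left, top: rect.bottom + gap };
}

const commands: SlashCommand[] = [
  { id: "heading", label: "Heading" },
  { id: "code", label: "Code" },
  { id: "text", label: "Text" },
];
```

In the browser, the argument to `menuPosition` would be the rectangle returned by the `getBoundingClientRect()` call described above. A production version would also need to avoid opening the menu for slashes inside words such as "and/or".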
Drag-and-Drop Block Reorder
Blocks need to be reorderable via a drag handle. The most common mistake: attaching drag listeners to the entire block. This means clicking anywhere on the block (including inside the text editor) triggers a drag, swallowing click events.
Use a dedicated grip icon that appears on hover. Attach drag listeners only to that icon. The block content - the editor, buttons, links - remains fully interactive. The full dnd-kit setup with DragOverlay, PointerSensor activation constraint, and optimistic reorder with rollback is covered in the Drag-and-Drop pattern.
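Independent of dnd-kit, the optimistic reorder with rollback comes down to a few lines. The sketch below assumes hypothetical `getIds`, `setIds`, and `persist` callbacks standing in for the store and the server call.

```typescript
// Returns a new array with the item at `from` moved to index `to`.
function moveItem<T>(items: readonly T[], from: number, to: number): T[] {
  const next = items.slice();
  const [moved] = next.splice(from, 1);
  if (moved === undefined) return next; // `from` was out of range
  next.splice(to, 0, moved);
  return next;
}

// Applies the new order immediately, then restores the old order if the
// server call rejects. Resolves to true when the reorder was persisted.
async function reorderWithRollback(
  getIds: () => string[],
  setIds: (ids: string[]) => void,
  from: number,
  to: number,
  persist: (ids: string[]) => Promise<void>
): Promise<boolean> {
  const previous = getIds();
  const next = moveItem(previous, from, to);
  setIds(next); // optimistic: the UI updates before the network responds
  try {
    await persist(next);
    return true;
  } catch {
    setIds(previous); // rollback to the order captured before the drag
    return false;
  }
}
```

This simple rollback restores a snapshot, so it would overwrite any other reorder that happened while the request was in flight. A real implementation needs a policy for that case.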
Nested block drag
SortableContext per parent block, each with its own items array. Cross-list dragging (moving a block out of a toggle into the root level) requires DragOverlay and custom collision detection.
Real-Time Collaboration
Yjs is the production choice for collaborative document editing. It implements a CRDT (conflict-free replicated data type): every client holds a full copy of the document, edits are represented as operations that can be applied in any order, and concurrent edits from two users always merge to the same result with no server-side conflict resolution logic.
The WebSocket provider syncs operations in real time; when a user is offline, changes queue locally and sync on reconnect. The awareness API tracks who is in the document and where their cursor is, which powers the collaborator avatars and cursor overlays.
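To make the convergence property concrete, here is a toy last-writer-wins map. It is deliberately not Yjs and not Yjs's algorithm (Yjs merges at a much finer granularity); it only illustrates that a merge can give every client the same result regardless of the order in which updates arrive. All names are hypothetical.

```typescript
// One replicated value, tagged with a logical clock and the writer's ID.
interface Entry {
  value: string;
  clock: number;
  clientId: string;
}

type LwwMap = Record<string, Entry>;

// Deterministic winner: higher clock first, client ID breaks ties.
function wins(a: Entry, b: Entry): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.clientId > b.clientId;
}

// Merge is commutative: merge(a, b) and merge(b, a) hold the same entries.
function merge(a: LwwMap, b: LwwMap): LwwMap {
  const out: LwwMap = { ...a };
  for (const key of Object.keys(b)) {
    const incoming = b[key];
    if (!incoming) continue;
    const current = out[key];
    if (!current || wins(incoming, current)) out[key] = incoming;
  }
  return out;
}

// Two clients edit the same key concurrently (same clock value).
const alice: LwwMap = {
  title: { value: "Draft A", clock: 2, clientId: "alice" },
};
const bob: LwwMap = {
  title: { value: "Draft B", clock: 2, clientId: "bob" },
  body: { value: "Hello", clock: 1, clientId: "bob" },
};

const seenByAlice = merge(alice, bob); // Alice receives Bob's state
const seenByBob = merge(bob, alice); // Bob receives Alice's state
```

Both clients end up with the same title without asking a server to arbitrate. The cost of last-writer-wins is that one concurrent edit is discarded, which is why a text-aware CRDT such as Yjs is the better fit for document content.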
Ship without Yjs first
State Architecture
This is the surface where wrong state placement causes the most pain. The rule: server state in React Query, ephemeral UI state local to the component that owns it, and global client state (auth, theme, UI prefs) in Zustand.
Don't put selection in global state
Performance
Four performance concerns are specific to document editors.
- Virtualized lists: Render only the blocks currently in the viewport. TanStack Virtual handles variable-height items. See the tradeoffs below before adding this.
- Lazy-load heavy block types: Embed blocks (Figma, YouTube) and syntax-highlighted code pull in large dependencies. React.lazy splits them out of the initial bundle.
- Debounce writes: Buffer keystrokes for 500ms then flush to the server. Also flush on blur when the user leaves a block. Prevents a round-trip on every character.
- Memoize block components: When one block changes, only that block should re-render. Use React.memo with a custom comparator on blockId + updatedAt so siblings don't re-render unnecessarily.
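The debounce-with-flush and the memo comparator from the list above can be sketched as follows. The names `createDebouncedSaver` and `areBlockPropsEqual` and the `BlockProps` shape are assumptions for illustration.

```typescript
interface DebouncedSaver<T> {
  schedule(value: T): void; // call on every keystroke
  flush(): void; // call on blur to save immediately
}

// Keeps only the latest value and saves it once the user pauses.
function createDebouncedSaver<T>(
  save: (value: T) => void,
  delayMs = 500
): DebouncedSaver<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: { value: T } | undefined;

  const flush = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending) {
      const { value } = pending;
      pending = undefined; // clear first so a second flush is a no-op
      save(value);
    }
  };

  return {
    schedule(value: T): void {
      pending = { value };
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(flush, delayMs); // restart the quiet period
    },
    flush,
  };
}

// Comparator in the shape React.memo expects: return true to skip a re-render.
interface BlockProps {
  blockId: string;
  updatedAt: number;
}

function areBlockPropsEqual(prev: BlockProps, next: BlockProps): boolean {
  return prev.blockId === next.blockId && prev.updatedAt === next.updatedAt;
}
```

Wiring it up would mean calling `schedule` from the editor's change handler, `flush` from its blur handler, and passing `areBlockPropsEqual` as the second argument to `React.memo`. The comparator is only correct if every write to a block also bumps its `updatedAt`.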
Measure before virtualizing
What I'd Do Differently
Use TipTap from day one, not raw contenteditable
The browser's editing model carries decades of accumulated quirks. IME composition events (Chinese, Japanese input), Safari's non-standard input event behavior, bidirectional text, and screen reader compatibility will consume weeks if you handle them manually. TipTap gives you a schema-validated, extension-based editor that already handles these cases. The cost is bundle size (~50kB gzipped) and a new mental model; both are worth it.
Design the data model for CRDT before you need it
Adding Yjs to an existing data model means rewriting every mutation. The changes are not large individually, but they touch every place that writes state. If you use UUID block IDs from the start (instead of sequential integers), use a flat map (not nested tree), and separate block content from block metadata, the migration is mostly mechanical. If you used integers and nested state, the migration requires a data transformation and a period where the old and new models coexist.
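The data transformation mentioned above might look like the sketch below: it walks a nested tree with integer IDs and emits a flat map with fresh string IDs. The type names and the injected `newId` generator are assumptions; in an application `newId` could be `() => crypto.randomUUID()`.

```typescript
// Old model: integer IDs, children nested inline.
interface NestedBlock {
  id: number;
  content: string;
  children: NestedBlock[];
}

// New model: string IDs, children referenced by ID.
interface FlatBlock {
  id: string;
  content: string;
  childIds: string[];
}

interface FlatDoc {
  blocks: Record<string, FlatBlock>;
  rootBlockIds: string[];
}

// Depth-first conversion. The ID generator is injected so the result is
// deterministic in tests; old integer IDs are dropped, not reused.
function flattenTree(roots: NestedBlock[], newId: () => string): FlatDoc {
  const blocks: Record<string, FlatBlock> = {};

  const visit = (node: NestedBlock): string => {
    const id = newId();
    const childIds = node.children.map((child) => visit(child));
    blocks[id] = { id, content: node.content, childIds };
    return id;
  };

  const rootBlockIds = roots.map((root) => visit(root));
  return { blocks, rootBlockIds };
}

// Deterministic ID source for the example only.
let counter = 0;
const flat = flattenTree(
  [{ id: 1, content: "A", children: [{ id: 2, content: "B", children: [] }] }],
  () => `id-${++counter}`
);
```

If other records reference the old integer IDs, the migration also needs to keep an old-to-new ID mapping, which this sketch omits.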
Don't virtualize until you have to
Virtualization complicates drag-and-drop (items scroll out of view and unmount), breaks Cmd+F in-page search (hidden items aren't in the DOM), and makes block height estimation fragile. A document of around 200 blocks renders fast without it, and most documents are no larger than that. Add virtualization only when you have measured scroll jank on real devices with real content. Premature optimization here causes bugs that are genuinely hard to debug.
The block-as-flat-map pattern pays off across the whole stack
The flat map is slightly awkward to assemble for rendering, but it eliminates an entire class of bugs: no recursive state updates, no deeply nested immer patches, no "why did this ancestor re-render" mystery. When you add Yjs, you realize Yjs internally stores everything as a flat map anyway. Getting there from a nested tree is the migration you don't want to do under deadline pressure.