System Design

Notion-style document editor

Block-based content model, contenteditable pitfalls, slash commands, drag-to-reorder, autosave, and optional real-time collaboration with Yjs CRDT.

What we're building

[Figure: editor mockup showing a PageNavigator sidebar, the document rendered by BlockList → BlockRenderer, a Zustand store panel (blocks, rootBlockIds), and an autosave indicator]

A PageNavigator sidebar, a BlockList that renders each block by type via BlockRenderer, and a Zustand store holding a flat Map<id, Block>, not a nested tree.

The Challenge

A document editor like Notion is one of the hardest frontend problems because every obvious approach has a hidden trap. Raw contenteditable conflicts with React's virtual DOM. Nested tree state makes every block update a recursive clone. Cursor management across block boundaries requires ProseMirror-level knowledge. And then there's real-time collaboration, which invalidates most naive assumptions about state ownership.

The scope here is a collaborative document editor with: a block-based content model, multiple block types (text, heading, list, code, image, embed), slash commands to insert blocks, drag-to-reorder, autosave, and optional live collaboration with presence indicators.

Each section covers the key decision, the naive approach, and what you'd actually build.

Data Model

The most important decision is the shape of a block. The instinct is a nested tree: each block has a children array. That works until you need to update a deeply nested block - you have to clone every ancestor to maintain immutability.

Use a flat map instead. Each block stores an array of child IDs, not child objects. The document has a blocks: Record<string, Block> map and a rootBlockIds: string[]. Any block can be looked up in O(1), updated without recursive cloning, and the structure maps naturally to how CRDT libraries like Yjs store data.
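Here is a minimal sketch of that shape in TypeScript. The field names (`parentId`, `childIds`) and the `updateBlockText` helper are illustrative assumptions, not a prescribed schema. The point is that updating a nested block copies only the map and that one block:

```typescript
// Illustrative block shape; field names are assumptions, not a fixed schema.
type BlockType = "heading1" | "paragraph" | "toggle" | "bulletListItem";

interface Block {
  id: string;              // a UUID in production
  type: BlockType;
  text: string;
  parentId: string | null; // null for top-level blocks
  childIds: string[];      // children referenced by ID, never embedded
}

interface DocumentState {
  blocks: Record<string, Block>;
  rootBlockIds: string[];
}

// Immutable update: shallow-copies the blocks map and the one changed block.
// Ancestors and siblings keep their object identity, however deep the block sits.
function updateBlockText(doc: DocumentState, id: string, text: string): DocumentState {
  const block = doc.blocks[id];
  if (block === undefined) throw new Error(`Unknown block: ${id}`);
  return { ...doc, blocks: { ...doc.blocks, [id]: { ...block, text } } };
}

// Example: a toggle at the root with one nested paragraph.
const doc: DocumentState = {
  rootBlockIds: ["t1"],
  blocks: {
    t1: { id: "t1", type: "toggle", text: "Details", parentId: null, childIds: ["p1"] },
    p1: { id: "p1", type: "paragraph", text: "Draft", parentId: "t1", childIds: [] },
  },
};
const next = updateBlockText(doc, "p1", "Final");
```

Because `t1` and `rootBlockIds` keep their identity, nothing outside the edited block looks changed to a reference-equality check.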

Fractional indexing for order

If block order is stored in a database as an integer column, inserting between two blocks requires renumbering siblings. Use fractional indexing instead - order values are strings that sort lexicographically, so any insert is a single-row write. The fractional-indexing npm package handles this.
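The fractional-indexing package computes keys for you, but the core idea fits in a few lines. The sketch below is a simplified midpoint generator in the same spirit. It is not that package's API: `keyBetween` is a hypothetical name, and keys here contain only fractional digits, so repeated appends at the end make keys grow. The package also encodes an integer part to keep them short.

```typescript
// Digits in ascending ASCII order, so plain string comparison matches numeric order.
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

// Returns a key strictly between `a` and `b`, reading each key as the digits after
// a radix point ("" acts as 0, b === null acts as 1). Generated keys never end in "0".
function midpoint(a: string, b: string | null): string {
  if (b !== null && a >= b) throw new Error(`${a} >= ${b}`);
  if (b !== null) {
    // Skip the shared prefix (padding `a` with zeros) and recurse on the rest.
    let n = 0;
    while ((n < a.length ? a[n] : "0") === b[n]) n++;
    if (n > 0) return b.slice(0, n) + midpoint(a.slice(n), b.slice(n));
  }
  const digitA = a.length > 0 ? DIGITS.indexOf(a[0]) : 0;
  const digitB = b !== null ? DIGITS.indexOf(b[0]) : DIGITS.length;
  if (digitB - digitA > 1) {
    // Room for a single digit in between.
    return DIGITS[Math.round((digitA + digitB) / 2)];
  }
  // First digits are adjacent: a one-digit prefix of `b` may already fit...
  if (b !== null && b.length > 1) return b.slice(0, 1);
  // ...otherwise keep a's first digit and go one level deeper.
  return DIGITS[digitA] + midpoint(a.slice(1), null);
}

// null means "no neighbour on that side".
function keyBetween(before: string | null, after: string | null): string {
  return midpoint(before ?? "", after);
}

const first = keyBetween(null, null);      // "i"
const second = keyBetween(first, null);    // "r"
const middle = keyBetween(first, second);  // "n"
const zeroth = keyBetween(null, first);    // "9"
```

Inserting between two neighbours only ever computes one new key, so the database write touches a single row.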

Architecture Map


Surface	Pattern / approach	Why
Block rendering	Block registry + recursive render	Extensible without a giant switch; new types = new entry
Text editing	TipTap / ProseMirror per block	Raw contenteditable and React both try to own the DOM; ProseMirror handles cursor, marks, composition
Block mutations	Optimistic updates + debounced sync	Keystrokes can't wait for a server round-trip; buffer 500ms then flush
Slash commands	Local UI state + floating menu	Detect "/", position using `getBoundingClientRect`, filter + keyboard nav
Block reorder	dnd-kit + optimistic reorder	Mouse + touch + keyboard; `DragOverlay` avoids clipping; rollback on error
Page tree (sidebar)	React Query (separate query)	Different staleTime from doc content; page tree changes less often
Collaboration	Yjs CRDT + WebSocket provider	Concurrent edits merged conflict-free; awareness API handles presence
Autosave	Debounced mutation + save indicator	500ms debounce; show "Saving..." / "Saved"; flush on blur
Long documents	TanStack Virtual	1000+ blocks - render only what's in the viewport
Heavy block types	React.lazy per block type	Embed, syntax-highlighted code - don't ship their bundles unless the block type is used

Block Rendering

Each block type is a separate React component. The naive approach is a giant switch or if-else chain in a single render function - it becomes unmaintainable as you add types. Use a block registry instead: a plain object mapping BlockType to a component. Adding a new type is one object entry.

Since blocks nest (toggle blocks contain children, list items contain sub-items), each block component receives the full BlockMap and renders its child IDs recursively. This avoids threading child nodes through props - each component looks up what it needs by ID.
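Here is a framework-free sketch of the registry plus recursive render. To keep it runnable anywhere, renderers return strings instead of React elements, and the three block types are illustrative. Typing the registry as `Record<BlockType, Renderer>` means that adding a type to the union without a renderer is a compile error:

```typescript
type BlockType = "heading1" | "paragraph" | "toggle";

interface Block {
  id: string;
  type: BlockType;
  text: string;
  childIds: string[];
}

type BlockMap = Record<string, Block>;

// Each renderer gets its block, the whole map, and its nesting depth.
// In React these would be components; strings keep the sketch runnable.
type Renderer = (block: Block, blocks: BlockMap, depth: number) => string;

const indent = (depth: number): string => "  ".repeat(depth);

// Children are looked up by ID and rendered through the same entry point.
function renderChildren(block: Block, blocks: BlockMap, depth: number): string {
  return block.childIds.map((id) => renderBlock(id, blocks, depth + 1)).join("");
}

// The registry: supporting a new block type means adding one entry here.
const registry: Record<BlockType, Renderer> = {
  heading1: (b, _blocks, depth) => `${indent(depth)}# ${b.text}\n`,
  paragraph: (b, _blocks, depth) => `${indent(depth)}${b.text}\n`,
  toggle: (b, blocks, depth) => `${indent(depth)}> ${b.text}\n` + renderChildren(b, blocks, depth),
};

function renderBlock(id: string, blocks: BlockMap, depth = 0): string {
  const block = blocks[id];
  if (block === undefined) return ""; // dangling ID: render nothing
  return registry[block.type](block, blocks, depth);
}

// Example: a heading, then a toggle containing one paragraph.
const blocks: BlockMap = {
  h1: { id: "h1", type: "heading1", text: "Title", childIds: [] },
  t1: { id: "t1", type: "toggle", text: "Details", childIds: ["p1"] },
  p1: { id: "p1", type: "paragraph", text: "Hidden text", childIds: [] },
};
const output = ["h1", "t1"].map((id) => renderBlock(id, blocks)).join("");
// output === "# Title\n> Details\n  Hidden text\n"
```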


The Editor Core

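The most common mistake in document editor implementations is rendering a contenteditable element with dangerouslySetInnerHTML and letting React manage its DOM. Each keystroke updates state, React re-renders, and because the HTML string has changed, React rewrites the element's innerHTML. That discards the text nodes the selection was in, so the cursor jumps back to position 0 and the user loses their typing position on every keystroke.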

The manual fix suppresses React's reconciliation for the editable node and syncs content only when the block is not focused. This works for simple plaintext blocks. For rich text (bold, links, marks), you need ProseMirror or TipTap - the browser's selection model across marked text is too complex to manage manually.
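Here is a sketch of that manual fix for plaintext blocks. The DOM is reduced to a minimal `EditableNode` interface (a hypothetical name) so the logic can run outside a browser. In React, you would render the contenteditable element with no children and `suppressContentEditableWarning`, hold it in a ref, and call `syncEditable` from `useLayoutEffect`, passing `document.activeElement === ref.current` as the focus flag:

```typescript
// Minimal view of the element held in a ref (hypothetical interface);
// an HTMLElement satisfies it.
interface EditableNode {
  textContent: string | null;
}

// Store -> DOM. Run after every render.
// While the block has focus, the DOM is the source of truth, so it is left alone
// and the caret stays where the user put it.
function syncEditable(node: EditableNode, isFocused: boolean, storedText: string): boolean {
  if (isFocused) return false;
  if (node.textContent === storedText) return false; // already in sync
  node.textContent = storedText; // safe: the user is not typing in this block
  return true;
}

// DOM -> store. Call from the element's input handler.
function readEditable(node: EditableNode): string {
  return node.textContent ?? "";
}
```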


Don't build your own rich text editor

ProseMirror took years to develop and handles hundreds of edge cases: IME composition (Chinese, Japanese, Korean), bidirectional text, cross-browser selection, Safari's unique input events, screen reader compatibility. TipTap wraps ProseMirror in a React-friendly API and adds a Yjs extension for collaboration. Use it. Notion built their own - you shouldn't need to.

Slash Commands

Slash commands are the primary block insertion UX. When the user types / at the start of an empty block (or anywhere in text), a command palette appears at the cursor position. As the user types more characters, the list filters. Pressing Enter or clicking inserts the selected block type and replaces the /query text.
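The detection, filtering, and keyboard-navigation logic can live in plain functions that the menu component calls. This sketch assumes the editor can hand over the text before the caret and that a slash opens the menu only at the start of a block or after whitespace, so `and/or` does not trigger it. The command list and helper names are hypothetical:

```typescript
interface SlashCommand {
  type: string;       // block type to insert
  label: string;
  keywords: string[];
}

// Hypothetical command list; a real editor might derive it from the block registry.
const COMMANDS: SlashCommand[] = [
  { type: "heading1", label: "Heading 1", keywords: ["h1", "title"] },
  { type: "heading2", label: "Heading 2", keywords: ["h2", "subtitle"] },
  { type: "bulletListItem", label: "Bulleted list", keywords: ["ul", "list"] },
  { type: "code", label: "Code", keywords: ["snippet"] },
];

// Returns the text typed after "/" if the caret sits in a slash token, else null.
// The "/" must start the block or follow whitespace.
function getSlashQuery(textBeforeCaret: string): string | null {
  const match = /(?:^|\s)\/([\w-]*)$/.exec(textBeforeCaret);
  return match ? match[1] : null;
}

// Case-insensitive match on the label, or a keyword prefix match.
function filterCommands(query: string, commands: SlashCommand[]): SlashCommand[] {
  const q = query.toLowerCase();
  return commands.filter(
    (c) => c.label.toLowerCase().includes(q) || c.keywords.some((k) => k.startsWith(q)),
  );
}

// ArrowUp/ArrowDown move the highlight and wrap around the filtered list.
function moveHighlight(index: number, key: "ArrowUp" | "ArrowDown", count: number): number {
  if (count === 0) return 0;
  const step = key === "ArrowDown" ? 1 : -1;
  return (index + step + count) % count;
}

// On Enter or click: drop the "/query" token before converting the block.
function stripSlashToken(textBeforeCaret: string): string {
  return textBeforeCaret.replace(/\/[\w-]*$/, "");
}
```

The menu calls `getSlashQuery` on each input event. A non-null result opens or updates the menu, `filterCommands` supplies the list, and Enter converts the block to the highlighted type after `stripSlashToken` removes the `/query` text.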

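The tricky part is positioning the menu at the cursor - window.getSelection().getRangeAt(0).getBoundingClientRect() gives you the cursor's viewport coordinates. The menu is rendered with position: fixed at those coordinates. If you render it in the React tree near the block, it can still be clipped. An ancestor with a CSS transform or filter becomes the containing block for position: fixed elements, so that ancestor's overflow: hidden applies, and a parent's stacking context can hide the menu behind other UI. A portal at the document body avoids both.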
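The placement math is easy to get wrong at the bottom and right edges of the viewport. This sketch assumes the caret rectangle comes from `getBoundingClientRect()` as described and the menu size is known. `placeMenu` and the flip-above behavior are illustrative choices, not part of any library:

```typescript
// The fields needed from a DOMRect (the caret) and from the menu/viewport sizes.
interface CaretRect { top: number; bottom: number; left: number; }
interface Size { width: number; height: number; }

// Computes position: fixed coordinates for the menu: below the caret by default,
// flipped above it when there is not enough room, and clamped horizontally.
function placeMenu(caret: CaretRect, menu: Size, viewport: Size, gap = 4): { top: number; left: number } {
  const fitsBelow = caret.bottom + gap + menu.height <= viewport.height;
  const top = fitsBelow ? caret.bottom + gap : Math.max(0, caret.top - gap - menu.height);
  const maxLeft = Math.max(0, viewport.width - menu.width);
  const left = Math.min(Math.max(0, caret.left), maxLeft);
  return { top, left };
}
```

In the browser, `caret` can be the DOMRect from `window.getSelection()?.getRangeAt(0).getBoundingClientRect()`. Check `rangeCount > 0` first, since `getRangeAt` throws otherwise. `viewport` is `{ width: window.innerWidth, height: window.innerHeight }`, and the menu is rendered through `createPortal(menu, document.body)` from react-dom.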


Drag-and-Drop Block Reorder

Blocks need to be reorderable via a drag handle. The most common mistake is attaching drag listeners to the entire block. Then pressing anywhere on the block, including inside the text editor, starts a drag, which swallows clicks and interferes with text selection.

Use a dedicated grip icon that appears on hover. Attach drag listeners only to that icon. The block content - the editor, buttons, links - remains fully interactive. The full dnd-kit setup with DragOverlay, PointerSensor activation constraint, and optimistic reorder with rollback is covered in the Drag-and-Drop pattern.
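The optimistic reorder with rollback reduces to a small function. This sketch assumes the block order is an array of IDs and that `persist` is a hypothetical API call. `arrayMove` mirrors the helper of the same name that `@dnd-kit/sortable` exports, but it is defined locally so the example is self-contained:

```typescript
// Local copy of the helper that @dnd-kit/sortable also exports as arrayMove.
function arrayMove<T>(items: readonly T[], from: number, to: number): T[] {
  const next = items.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// Optimistic reorder: update the local order first, persist, restore on failure.
// `persist` is a hypothetical API call that saves the new order.
async function reorderOptimistically(
  getOrder: () => string[],
  setOrder: (ids: string[]) => void,
  activeId: string,
  overId: string,
  persist: (ids: string[]) => Promise<void>,
): Promise<boolean> {
  const previous = getOrder();
  const from = previous.indexOf(activeId);
  const to = previous.indexOf(overId);
  if (from === -1 || to === -1 || from === to) return true; // nothing to move

  const next = arrayMove(previous, from, to);
  setOrder(next); // the UI moves immediately, before the server answers
  try {
    await persist(next);
    return true;
  } catch {
    setOrder(previous); // roll back to the order before the drag
    return false;
  }
}
```

With dnd-kit, you call this from the `DndContext` `onDragEnd` handler, passing `String(active.id)` and `String(over.id)`, and skipping the call when `over` is null. A production version would also skip a stale rollback if the user has reordered again in the meantime.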

Nested block drag

The approach above handles top-level block reorder. For nested blocks (inside a toggle or list), you need one SortableContext per parent block, each with its own items array. Cross-list dragging (moving a block out of a toggle into the root level) requires DragOverlay and custom collision detection.
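Whatever the drag library reports, a cross-parent move ends as one update to the flat map. You remove the ID from the old parent's list, insert it into the new one, and update `parentId`. Here is a sketch with hypothetical names, including a guard against dropping a block into its own subtree:

```typescript
interface Block {
  id: string;
  parentId: string | null; // null = top level
  childIds: string[];
}

interface Doc {
  blocks: Record<string, Block>;
  rootBlockIds: string[];
}

// Moves `id` under `newParentId` (null = root) at `index`, where `index` is
// counted after the block has been removed from its old position.
function moveBlock(doc: Doc, id: string, newParentId: string | null, index: number): Doc {
  const block = doc.blocks[id];
  if (block === undefined) throw new Error(`Unknown block: ${id}`);

  // Refuse to drop a block into itself or one of its descendants.
  let cursor: string | null = newParentId;
  while (cursor !== null) {
    if (cursor === id) throw new Error("Cannot move a block into its own subtree");
    cursor = doc.blocks[cursor].parentId;
  }

  const blocks: Record<string, Block> = { ...doc.blocks };
  let rootBlockIds = doc.rootBlockIds;
  const getList = (parentId: string | null): string[] =>
    parentId === null ? rootBlockIds : blocks[parentId].childIds;
  const setList = (parentId: string | null, ids: string[]): void => {
    if (parentId === null) rootBlockIds = ids;
    else blocks[parentId] = { ...blocks[parentId], childIds: ids };
  };

  // 1. Remove from the old parent's list (copy-on-write).
  setList(block.parentId, getList(block.parentId).filter((childId) => childId !== id));
  // 2. Insert into the new parent's list.
  const target = getList(newParentId).slice();
  target.splice(Math.min(index, target.length), 0, id);
  setList(newParentId, target);
  // 3. Point the block at its new parent.
  blocks[id] = { ...block, parentId: newParentId };
  return { blocks, rootBlockIds };
}
```

The drag layer only has to translate a drop into `(id, newParentId, index)`. Everything else is a plain state update, so it can be applied optimistically and rolled back like a top-level reorder.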

Real-Time Collaboration

Yjs is the production choice for collaborative document editing. It implements a CRDT (conflict-free replicated data type): every client holds a full copy of the document, edits are represented as operations that can be applied in any order, and concurrent edits from two users always merge to the same result with no server-side conflict resolution logic.

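The WebSocket provider syncs operations in real time. When a user goes offline, their edits stay in the local Y.Doc and are exchanged with the server on reconnect. Keeping them across a page reload requires a local persistence provider such as y-indexeddb. The awareness API tracks who is in the document and where their cursor is, which powers the collaborator avatars and cursor overlays.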
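To build intuition for the "any order, same result" property, here is a toy state-based CRDT: a last-writer-wins map keyed by block ID. It is deliberately not how Yjs works internally. Yjs uses a sequence CRDT that merges concurrent insertions into the same text so neither is lost. The class and function names are hypothetical:

```typescript
// One register per key: the value plus the metadata used to pick a winner.
interface Entry {
  value: string;
  clock: number;    // Lamport clock at the time of the write
  clientId: string; // tie-breaker for concurrent writes
}
type LwwMap = Record<string, Entry>;

// Total order on writes: higher clock wins, then higher clientId.
function newer(a: Entry, b: Entry): boolean {
  return a.clock > b.clock || (a.clock === b.clock && a.clientId > b.clientId);
}

// Commutative, associative, and idempotent, so replicas that have seen the
// same writes end up with the same entries regardless of delivery order.
function merge(local: LwwMap, remote: LwwMap): LwwMap {
  const out: LwwMap = { ...local };
  for (const [key, entry] of Object.entries(remote)) {
    const current = out[key];
    if (current === undefined || newer(entry, current)) out[key] = entry;
  }
  return out;
}

class Replica {
  readonly clientId: string;
  state: LwwMap = {};
  private clock = 0;

  constructor(clientId: string) {
    this.clientId = clientId;
  }

  set(key: string, value: string): void {
    this.clock += 1;
    this.state = { ...this.state, [key]: { value, clock: this.clock, clientId: this.clientId } };
  }

  receive(remote: LwwMap): void {
    this.state = merge(this.state, remote);
    // Lamport rule: never issue a clock lower than one already seen.
    for (const entry of Object.values(remote)) this.clock = Math.max(this.clock, entry.clock);
  }
}

// Two replicas edit concurrently, then exchange states.
const alice = new Replica("alice");
const bob = new Replica("bob");
alice.set("p1", "Hello");
bob.set("p1", "Hi there"); // concurrent write to the same block
bob.set("p2", "Second block");
const fromAlice = alice.state;
const fromBob = bob.state;
alice.receive(fromBob);
bob.receive(fromAlice);
// Both replicas now hold p1 = "Hi there" and p2 = "Second block".
```

The example also shows the limitation: whole-block last-writer-wins silently drops Alice's concurrent edit, which is why Yjs models text as a shared sequence instead. With Yjs itself, replicas exchange binary updates via `Y.encodeStateAsUpdate(doc)` and `Y.applyUpdate(doc, update)`, and the WebSocket provider does that exchange for you.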


Ship without Yjs first

Yjs adds complexity: a new mental model for state, a separate WebSocket server, and a migration from your existing mutation-based writes. Build the single-user version first with React Query and optimistic updates. Add Yjs when users actually need live collaboration - the TipTap Yjs extension makes it a near drop-in if you planned block IDs as UUIDs from the start.

State Architecture

This is the surface where wrong state placement causes the most pain. The rule: server state in React Query, ephemeral UI state local to the component that owns it, and global client state (auth, theme, UI prefs) in Zustand.
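For the server-state half of that rule, a common React Query convention is a query-key factory plus option objects in the shape `useQuery` accepts (`queryKey`, `queryFn`, `staleTime`). In this sketch, the key layout, the fetch functions, and the staleTime values are assumptions. They are chosen only to show the page tree being treated as longer-lived than document content:

```typescript
// Hypothetical server shapes.
interface PageSummary { id: string; title: string; parentId: string | null; }
interface PageContent { id: string; rootBlockIds: string[]; }

// Query-key factory: every cache key for document data is built here.
const docKeys = {
  all: ["docs"] as const,
  page: (pageId: string) => ["docs", "page", pageId] as const,
  pageTree: (workspaceId: string) => ["docs", "pageTree", workspaceId] as const,
};

// Option objects in the { queryKey, queryFn, staleTime } shape that useQuery accepts.
function pageQuery(pageId: string, fetchPage: (id: string) => Promise<PageContent>) {
  return {
    queryKey: docKeys.page(pageId),
    queryFn: () => fetchPage(pageId),
    staleTime: 0, // React Query's default: content counts as stale as soon as it arrives
  };
}

function pageTreeQuery(workspaceId: string, fetchTree: (id: string) => Promise<PageSummary[]>) {
  return {
    queryKey: docKeys.pageTree(workspaceId),
    queryFn: () => fetchTree(workspaceId),
    staleTime: 5 * 60 * 1000, // assumed: the sidebar tree stays fresh for 5 minutes
  };
}
```

A component would call `useQuery(pageQuery(pageId, fetchPage))`. Because React Query matches query keys by prefix, `queryClient.invalidateQueries({ queryKey: docKeys.all })` refreshes both. Selection, slash-menu, and hover state never appear here; they stay in component state.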


Don't put selection in global state

Cursor position and text selection are ephemeral, device-local, and change with every keystroke, click, and arrow-key press. Putting them in a global store makes every component that reads them (and, with React Context, every consumer of that context) re-render each time. Keep selection in the editor component's local state.

Performance

Four performance concerns are specific to document editors.

Virtualized lists: Render only the blocks currently in the viewport. TanStack Virtual handles variable-height items. See the tradeoffs below before adding this.
Lazy-load heavy block types: Embed blocks (Figma, YouTube) and syntax-highlighted code pull in large dependencies. React.lazy splits them out of the initial bundle.
Debounce writes: Buffer keystrokes for 500ms then flush to the server. Also flush on blur when the user leaves a block. Prevents a round-trip on every character.
Memoize block components: When one block changes, only that block should re-render. Use React.memo with a custom comparator on blockId + updatedAt so siblings don't re-render unnecessarily.
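The debounce-and-flush behavior from the list above, plus the "Saving..." / "Saved" indicator, fits in one small class. This sketch makes several assumptions. `DebouncedSaver` and its `save` callback are hypothetical, and the callback stands in for a React Query mutation or fetch. Changes are batched as block ID → text, and writes are chained so an older batch can never land after a newer one:

```typescript
type SaveStatus = "idle" | "saving" | "saved" | "error";
// Hypothetical write function: receives block ID -> latest text.
type SaveFn = (changes: Map<string, string>) => Promise<void>;

class DebouncedSaver {
  status: SaveStatus = "idle";
  private pending = new Map<string, string>();
  private timer: ReturnType<typeof setTimeout> | null = null;
  private queue: Promise<void> = Promise.resolve();
  private inFlight = 0;
  private readonly save: SaveFn;
  private readonly delayMs: number;

  constructor(save: SaveFn, delayMs = 500) {
    this.save = save;
    this.delayMs = delayMs;
  }

  // Call on every change: keeps only the latest text per block and restarts the timer.
  update(blockId: string, text: string): void {
    this.pending.set(blockId, text);
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => void this.flush(), this.delayMs);
  }

  // Call on blur: writes pending changes now instead of waiting for the timer.
  flush(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return this.queue;
    const batch = this.pending;
    this.pending = new Map();
    this.inFlight += 1;
    this.status = "saving";
    // Chain writes so batches reach the server in the order they were flushed.
    this.queue = this.queue.then(async () => {
      let ok = true;
      try {
        await this.save(batch);
      } catch {
        ok = false;
        // Keep failed changes for the next flush unless newer text has replaced them.
        batch.forEach((text, id) => {
          if (!this.pending.has(id)) this.pending.set(id, text);
        });
      }
      this.inFlight -= 1;
      if (!ok) this.status = "error";
      else if (this.inFlight === 0 && this.pending.size === 0) this.status = "saved";
    });
    return this.queue;
  }
}
```

Call `update` from each block's change handler and `flush` from its blur handler, and render the indicator from `status`. If a save fails, its changes are kept and go out with the next flush.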

Measure before virtualizing

Virtualization adds complexity: the virtualizer needs block heights before rendering, variable-height items require measurement, and dnd-kit has known issues with TanStack Virtual during drag (items scroll out of view and are unmounted). For most documents (under 200 blocks), React handles it fine without virtualization. Add it only when you have real user complaints about scroll performance.

Building Blocks

The patterns and frameworks that this system design applies - each covers one piece of the architecture above.

Patterns used

Optimistic Updates

Block edits and reorder reflect immediately; rollback on sync failure.

Autosave / Draft

Debounced write to server on every block change; save indicator.

Debouncing & Throttling

500ms debounce on keystroke sync; prevents per-character round-trips.

Drag-and-Drop

Block reorder with dnd-kit; optimistic reorder + rollback.

Virtualized Lists

Render only visible blocks for documents with 1000+ entries.

Polling vs WebSockets

WebSockets for Yjs sync; polling as a fallback for presence.

Code Splitting & Lazy Loading

Embed and syntax-highlighted code blocks load on demand.

Loading States

Skeleton layout during initial document fetch.

Frameworks applied

State Architecture

Where block state, selection, presence, and UI prefs live.

Data Fetching & Sync

React Query for document and page tree; Yjs for collaborative state.

Rendering Strategy

SSR for initial page load; client-side for editor interactions.

Performance Architecture

Virtualization, memoization, lazy block types, debounced writes.

What I'd Do Differently

Use TipTap from day one, not raw contenteditable

The browser's editing model is a decade of accumulated quirks. IME composition events (Chinese, Japanese input), Safari's non-standard input event behavior, bidirectional text, and screen reader compatibility will consume weeks if you handle them manually. TipTap gives you a schema-validated, extension-based editor that already handles all of this. The cost is bundle size (~50kB gzipped) and a new mental model; both are worth it.

Design the data model for CRDT before you need it

Adding Yjs to an existing data model means rewriting every mutation. The changes are not large individually, but they touch every place that writes state. If you use UUID block IDs from the start (instead of sequential integers), use a flat map (not nested tree), and separate block content from block metadata, the migration is mostly mechanical. If you used integers and nested state, the migration requires a data transformation and a period where the old and new models coexist.

Don't virtualize until you have to

Virtualization complicates drag-and-drop (items scroll out of view and unmount), breaks Cmd+F in-page search (hidden items aren't in the DOM), and makes block height estimation fragile. For most documents, rendering 200 blocks without virtualization is fast enough. Add virtualization only when you have measured scroll jank on real devices with real content. Premature optimization here causes bugs that are genuinely hard to debug.

The block-as-flat-map pattern pays off across the whole stack

The flat map is slightly awkward to assemble for rendering, but it eliminates an entire class of bugs: no recursive state updates, no deeply nested immer patches, no "why did this ancestor re-render" mystery. When you add Yjs, you realize Yjs internally stores everything as a flat map anyway. Getting there from a nested tree is the migration you don't want to do under deadline pressure.