Intermediate

Redis Architecture Grand Tour

How Redis processes commands, stores data, and persists to disk

25 min 8 stops
What you will learn
  • How Redis bootstraps from a single main() call into a fully wired server with signal handlers, databases, and a network listener
  • Why Redis can handle tens of thousands of clients with a single thread: the ae event loop and its epoll/kqueue abstraction
  • How the RESP protocol is parsed incrementally from a raw socket read into a command ready for execution
  • How every keyspace write is dispatched through a central call() function that handles propagation to AOF and replicas in one place
  • How Redis avoids O(n) rehashing pauses by spreading hash table migration across regular read and write operations
Prerequisites
  • Comfortable reading C (you don't need to write it, just follow the logic)
  • Basic understanding of key-value stores: what GET/SET do
  • Familiarity with the concept of an event loop is helpful but not required
src/server.c:8088
[Code listing: the tail of main(), in order: version log line, initServer(), redisAsciiArt(), checkTcpBacklogSettings(), cluster initialization (clusterCommonInit(), clusterInit()), module loading (moduleInitModulesSystemLast(), moduleLoadInternalModules(), moduleLoadFromQueue()), ACLLoadUsersAtStartup(), initListeners(), loadDataFromDisk(), aeMain(), aeDeleteEventLoop(), return.]
1 of 8

Server Lifecycle

main() in server.c is 355 lines of orchestration. By the time you reach line 8103, the config has been parsed, the locale is set, and the entropy pool is seeded. Most of the wiring happens in initServer(): signal handlers, database allocation, and the event loop all come up inside that one call, and initListeners() sets up the listening sockets shortly afterwards. main() then loads persisted data from disk (RDB or AOF) and hands control to aeMain(), a while loop that does not return until the event loop is stopped. The return 0 at line 8176 is practically unreachable in production; it runs only if aeMain() returns. Notice that loadDataFromDisk() runs after the listeners are configured but before the event loop starts, so normal command processing never begins against a half-loaded dataset.
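The ordering constraint can be sketched as a tiny state machine. This is only an illustration: every toy_ name below is hypothetical and nothing here is taken from server.c. Each step refuses to run unless the previous phase has completed, and commands are answered only once the loop phase is reached.

```c
#include <assert.h>

/* Hypothetical phases mirroring: configure, wire, load data, run the loop. */
typedef enum { TOY_CONFIGURED, TOY_WIRED, TOY_LOADED, TOY_RUNNING } toy_phase;

typedef struct {
    toy_phase phase;
    int keys; /* stands in for the dataset */
} toy_server;

/* Move to the next phase only if the server is in the expected one. */
static int toy_advance(toy_server *s, toy_phase from, toy_phase to) {
    if (s->phase != from) return -1;
    s->phase = to;
    return 0;
}

static int toy_init_server(toy_server *s) {
    return toy_advance(s, TOY_CONFIGURED, TOY_WIRED);
}

static int toy_load_data(toy_server *s, int keys_on_disk) {
    if (toy_advance(s, TOY_WIRED, TOY_LOADED) != 0) return -1;
    s->keys = keys_on_disk;
    return 0;
}

static int toy_enter_loop(toy_server *s) {
    return toy_advance(s, TOY_LOADED, TOY_RUNNING);
}

/* Commands are served only from inside the loop, so they see the full dataset. */
static int toy_serve_dbsize(const toy_server *s) {
    return s->phase == TOY_RUNNING ? s->keys : -1;
}
```

Calling toy_load_data() before toy_init_server() fails, and toy_serve_dbsize() returns -1 until toy_enter_loop() has succeeded, which is the property the real startup order provides.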

Key takeaway

Redis startup is a strict ordered sequence: configure, wire, load data, then enter the event loop -- and that order is not accidental.

src/ae.c:360
[Code listing: aeProcessEvents() and aeMain(). aeProcessEvents() returns early when there is nothing to do, computes the poll timeout from the earliest timer, calls the multiplexing API, runs the after-sleep callback, dispatches each ready descriptor to its read or write handler, and then processes time events. aeMain() clears eventLoop->stop and loops over aeProcessEvents() with AE_CALL_BEFORE_SLEEP and AE_CALL_AFTER_SLEEP set.]
2 of 8

The Event Loop

The core of the Redis concurrency model lives in these 140 lines. aeMain() is a while loop that calls aeProcessEvents() on every iteration. Inside aeProcessEvents(), the key move is aeApiPoll(), a thin abstraction over epoll on Linux or kqueue on macOS and the BSDs. The poll call blocks for at most the time remaining until the next scheduled timer event (the value returned by usUntilEarliestTimer()). When it returns, Redis has a list of file descriptors that are ready for I/O, and each one is dispatched to its registered read or write handler. Command execution needs no worker threads and no callbacks in the Node.js sense -- just one tight loop. The beforesleep and aftersleep hooks let other subsystems (replication, AOF flushing) piggyback on the loop without breaking the single-threaded model.
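A minimal sketch of one loop iteration, assuming POSIX poll() as a stand-in for the epoll/kqueue backends behind aeApiPoll(). All toy_ names are hypothetical and the structure is simplified (read events only, no before/after-sleep hooks); it is meant to show the shape of "compute timeout, wait for readiness, dispatch", not to reproduce ae.c.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define TOY_MAX_EVENTS 16

typedef void toy_handler(int fd, void *data);

typedef struct {
    int fd;
    toy_handler *on_readable;
    void *data;
} toy_file_event;

/* Poll timeout in ms: -1 blocks indefinitely (no timer scheduled),
 * 0 returns immediately (a timer is already due). */
static int toy_poll_timeout_ms(int64_t now_us, int64_t next_timer_us) {
    if (next_timer_us < 0) return -1;
    if (next_timer_us <= now_us) return 0;
    int64_t ms = (next_timer_us - now_us + 999) / 1000; /* round up */
    return ms > 1000000 ? 1000000 : (int)ms;
}

/* Example handler: read what is available and add the byte count to *data. */
static void toy_drain(int fd, void *data) {
    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) *(long *)data += n;
}

/* One iteration: wait for readiness, then dispatch to the registered handlers.
 * Returns the number of handlers run, 0 on timeout, -1 on error. */
static int toy_process_events(toy_file_event *events, size_t n, int timeout_ms) {
    struct pollfd pfds[TOY_MAX_EVENTS];
    if (n > TOY_MAX_EVENTS) return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = events[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0) return ready;
    int fired = 0;
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            events[i].on_readable(events[i].fd, events[i].data);
            fired++;
        }
    }
    return fired;
}
```

A full loop would call toy_process_events() repeatedly with the timeout from toy_poll_timeout_ms(); the point is that a single thread sleeps in the kernel until some descriptor is ready or a timer is due.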

Key takeaway

Redis's legendary throughput comes from a seven-line while loop backed by the OS's own I/O readiness notification -- the event loop is not Redis magic, it is Unix done right.

src/networking.c:3718
[Code listing: readQueryFromClient(). It sizes the read (exactly to the argument boundary for a large multi-bulk argument, otherwise using a thread-local reusable query buffer to avoid allocation), reads from the connection, frees the client asynchronously on a read error or closed connection, extends the buffer length with sdsIncrLen(), and then parses the buffer and executes a command if a complete one is present.]
3 of 8

RESP Protocol Parsing

Every client connection in Redis has a client struct that carries its query buffer, parse state, and I/O flags. When the event loop fires a read event on a socket, readQueryFromClient() is the registered handler. The function reads raw bytes into c->querybuf using connRead() -- a thin wrapper that abstracts plain TCP from TLS. The read length is usually PROTO_IOBUF_LEN (16 KB), but for large RESP bulk arguments the function calculates the exact remaining bytes to avoid unnecessary copying. After the read, processInputBuffer() walks the buffer looking for complete RESP frames. If it finds one, it parses the command name and arguments and queues execution; if the frame is still incomplete, the handler simply returns and picks up again on the next read event, so a slow client never blocks the event loop.
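To make "complete frame or wait for more bytes" concrete, here is a minimal sketch of a multi-bulk frame detector. It is a simplification and an assumption-laden one: it re-scans the buffer from the start on every call instead of keeping per-client parse state as Redis does, it handles only the `*<n>` / `$<len>` form (no inline commands), and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ARGS 8

typedef struct {
    int argc;
    const char *argv[TOY_MAX_ARGS]; /* pointers into the caller's buffer */
    size_t arglen[TOY_MAX_ARGS];
} toy_cmd;

/* Parse "<prefix><digits>\r\n" at buf[*pos]. Returns 1 on success,
 * 0 if more bytes are needed, -1 on a protocol error. */
static int toy_read_len(const char *buf, size_t len, size_t *pos,
                        char prefix, size_t *out) {
    size_t p = *pos, v = 0, digits = 0;
    if (p >= len) return 0;
    if (buf[p] != prefix) return -1;
    p++;
    while (p < len && buf[p] >= '0' && buf[p] <= '9') {
        if (++digits > 9) return -1; /* keep the value well inside size_t */
        v = v * 10 + (size_t)(buf[p] - '0');
        p++;
    }
    if (p == len) return 0;
    if (digits == 0 || buf[p] != '\r') return -1;
    if (p + 1 == len) return 0;
    if (buf[p + 1] != '\n') return -1;
    *pos = p + 2;
    *out = v;
    return 1;
}

/* Returns bytes consumed by one complete command, 0 if the buffer does not
 * yet hold a full command, -1 on a protocol error. */
static long toy_parse_command(const char *buf, size_t len, toy_cmd *cmd) {
    size_t pos = 0, count = 0;
    int rc = toy_read_len(buf, len, &pos, '*', &count);
    if (rc <= 0) return rc;
    if (count == 0 || count > TOY_MAX_ARGS) return -1;
    for (size_t i = 0; i < count; i++) {
        size_t blen = 0;
        rc = toy_read_len(buf, len, &pos, '$', &blen);
        if (rc <= 0) return rc;
        if (len - pos < blen + 2) return 0; /* payload + CRLF not here yet */
        if (buf[pos + blen] != '\r' || buf[pos + blen + 1] != '\n') return -1;
        cmd->argv[i] = buf + pos;
        cmd->arglen[i] = blen;
        pos += blen + 2;
    }
    cmd->argc = (int)count;
    return (long)pos;
}
```

Feeding the parser any strict prefix of `*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n` yields 0, meaning "wait for the next read"; only the full 22 bytes produce a command. Redis avoids the re-scan by remembering how many arguments and bytes are still outstanding in the client struct.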

Key takeaway

RESP parsing in Redis is incremental and stateful: each call to readQueryFromClient() may process a partial command, and the client struct retains position across reads.

src/server.c:3878
[Code listing: call(). It clears the propagation flags, snapshots the dirty counter, records a monotonic start time (periodically syncing the cached time to avoid repeated syscalls), enters the execution unit, dispatches through the c->cmd function pointer, exits the execution unit, and then computes the command duration and the dirty delta.]
4 of 8

Command Dispatch

call() is the single chokepoint through which every Redis command passes. The actual execution is one line: c->cmd->proc(c) -- a function pointer in the command table calling, for example, setCommand or getCommand. Everything else around that line is instrumentation and propagation setup. Before the call, Redis snapshots server.dirty (a counter of keyspace changes). After the call, it computes the delta: if dirty increased, this command mutated data and needs to be written to the AOF buffer and propagated to replicas. The CLIENT_FORCE_AOF and CLIENT_FORCE_REPL flags let individual commands override the default propagation rules. The slowlog and latency monitoring hooks also fire here, making call() the natural place to measure command latency.
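The dirty-counter pattern can be sketched in a few lines. This is a toy model, not the Redis command table: the struct layout, the one-key "keyspace", and every toy_ name are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    long long dirty; /* bumped by handlers that change the keyspace */
    int propagated;  /* commands handed on to the AOF/replica path */
    int value;       /* a one-key "keyspace" */
    int last_reply;
} toy_server;

typedef void toy_proc(toy_server *s, int arg);

typedef struct {
    const char *name;
    toy_proc *proc;
} toy_command;

static void toy_set(toy_server *s, int arg) { s->value = arg; s->dirty++; }
static void toy_get(toy_server *s, int arg) { (void)arg; s->last_reply = s->value; }

static const toy_command toy_table[] = {
    { "set", toy_set },
    { "get", toy_get },
};

/* Linear lookup is enough for a two-entry table. */
static const toy_command *toy_lookup(const char *name) {
    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
        if (strcmp(toy_table[i].name, name) == 0) return &toy_table[i];
    return NULL;
}

/* The chokepoint: snapshot dirty, dispatch through the function pointer,
 * then propagate only if the handler reported a change. */
static void toy_call(toy_server *s, const toy_command *cmd, int arg) {
    long long before = s->dirty;
    cmd->proc(s, arg);
    if (s->dirty - before > 0) s->propagated++;
}
```

The handlers never decide about propagation themselves; they only bump the counter, and the single caller draws the conclusion. That is why adding a new write command does not require touching the persistence or replication code in this model.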

Key takeaway

One function pointer dispatch followed by dirty-counter arithmetic is how Redis decides whether to write to the AOF and propagate to replicas -- simplicity that scales.

src/dict.c:405
[Code listing: dictRehash(), _dictRehashStep(), and dictAdd(). dictRehash() returns early when resizing is not allowed, skips empty buckets while advancing d->rehashidx, moves all keys in the current bucket from the old to the new hash table via rehashEntriesInBucketAtIndex(), and increments d->rehashidx. _dictRehashStep() is called by common lookup and update operations so that the table migrates from H1 to H2 while it is in use. dictAdd() adds an element to the target hash table.]
5 of 8

Incremental Rehashing

Redis's dict is a chained hash table with two internal tables: ht_table[0] holds the current data and ht_table[1] is the resized target. When the load factor crosses a threshold, Redis does not stop and rehash everything -- it sets rehashidx to 0 and starts migrating one bucket per normal operation. Every call to dictAdd(), dictFind(), or dictDelete() routes through _dictRehashStep(), which calls dictRehash(d, 1) to move one non-empty bucket. The empty_visits cap (ten times the requested steps) bounds how many empty slots a single call will skip, so the function cannot stall on a sparse table with many empty slots. The DICT_RESIZE_AVOID flag lets Redis hold off rehashing during BGSAVE or BGREWRITEAOF unless the size ratio between the two tables is large: fork-based persistence works best when pages are not mutated, because every page touched after the fork has to be copied.
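A compact sketch of the two-table scheme, under several assumptions: unsigned integer keys used directly as hashes, growth triggered at a load factor of 1, no duplicate checking, no deletion, and no allocation-failure handling. All toy_ names are hypothetical; only the rehashidx and empty_visits ideas come from the text above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_entry {
    unsigned key;
    int val;
    struct toy_entry *next;
} toy_entry;

typedef struct {
    toy_entry **table[2];
    size_t size[2]; /* bucket counts, powers of two; size[1] == 0 when idle */
    size_t used[2];
    long rehashidx; /* -1 when idle, else next bucket of table[0] to move */
} toy_dict;

static void toy_init(toy_dict *d, size_t size) {
    d->table[0] = calloc(size, sizeof(toy_entry *));
    d->table[1] = NULL;
    d->size[0] = size;
    d->size[1] = 0;
    d->used[0] = d->used[1] = 0;
    d->rehashidx = -1;
}

/* Move up to n non-empty buckets, giving up after n*10 empty ones.
 * Returns 1 if more work remains, 0 if rehashing is finished or idle. */
static int toy_rehash(toy_dict *d, int n) {
    int empty_visits = n * 10;
    if (d->rehashidx < 0) return 0;
    while (n-- && d->used[0] != 0) {
        while (d->table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        toy_entry *e = d->table[0][d->rehashidx];
        while (e) { /* relink the whole chain into the target table */
            toy_entry *next = e->next;
            size_t idx = e->key & (d->size[1] - 1);
            e->next = d->table[1][idx];
            d->table[1][idx] = e;
            d->used[0]--;
            d->used[1]++;
            e = next;
        }
        d->table[0][d->rehashidx++] = NULL;
    }
    if (d->used[0] == 0) { /* done: promote the target table */
        free(d->table[0]);
        d->table[0] = d->table[1];
        d->size[0] = d->size[1];
        d->used[0] = d->used[1];
        d->table[1] = NULL;
        d->size[1] = 0;
        d->used[1] = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

static void toy_add(toy_dict *d, unsigned key, int val) {
    if (d->rehashidx >= 0) {
        toy_rehash(d, 1);
    } else if (d->used[0] >= d->size[0]) { /* start growing */
        d->size[1] = d->size[0] * 2;
        d->table[1] = calloc(d->size[1], sizeof(toy_entry *));
        d->rehashidx = 0;
    }
    int t = d->rehashidx >= 0 ? 1 : 0; /* new keys go to the target table */
    size_t idx = key & (d->size[t] - 1);
    toy_entry *e = malloc(sizeof(*e));
    e->key = key;
    e->val = val;
    e->next = d->table[t][idx];
    d->table[t][idx] = e;
    d->used[t]++;
}

static toy_entry *toy_find(toy_dict *d, unsigned key) {
    if (d->rehashidx >= 0) toy_rehash(d, 1);
    for (int t = 0; t <= 1; t++) {
        if (d->size[t] == 0) continue;
        for (toy_entry *e = d->table[t][key & (d->size[t] - 1)]; e; e = e->next)
            if (e->key == key) return e;
        if (d->rehashidx < 0) break; /* not rehashing: only table[0] is live */
    }
    return NULL;
}
```

With a 4-bucket table, the fifth insert starts the migration and each later lookup moves one bucket, so the resize finishes after a handful of ordinary operations rather than in one pause. Lookups must check both tables while the migration is in flight.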

Key takeaway

Redis rehashing is amortized across every subsequent operation: there is no rehash pause, only a steady trickle of bucket migrations piggybacked on normal reads and writes.

src/t_zset.c:254
[Code listing: zslRandomLevel() and zslInsertNode(). zslRandomLevel() returns a level between 1 and ZSKIPLIST_MAXLEVEL with a power-law distribution where higher levels are less likely. zslInsertNode() walks down from the top level, tracking in update[] the last node at each level that is still less than the insertion point, splices the new node into every level up to its randomly chosen height, adjusts span counts so rank queries remain accurate, updates the backward pointer for reverse iteration, and increments zsl->length.]
6 of 8

Sorted Set Internals

Redis sorted sets (ZSET) use a dual index: a skip list for ordered range queries and a hash table for O(1) score lookups by member. The skip list is the interesting half. zslRandomLevel() decides the height of each new node by flipping a biased coin (ZSKIPLIST_P is 0.25, meaning each level has a 25% chance of growing to the next). In zslInsertNode(), a top-down traversal finds the insert position at each level, storing the predecessor nodes in update[]. The rank[] array tracks cumulative span counts so that after insertion, each node's span -- the number of elements it skips over at a given level -- is updated in O(log N) time. This span count is what makes ZRANK an O(log N) operation rather than O(N).
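The biased coin flip is easy to sketch. The version below is an approximation written for illustration: it uses rand() from the C standard library, the TOY_ constants simply mirror the values quoted above, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAXLEVEL 32
#define TOY_P 0.25 /* chance of promoting a node one more level */

/* Returns a height in [1, TOY_MAXLEVEL]; each extra level is reached with
 * probability TOY_P, so height h occurs with probability of roughly
 * 0.75 * 0.25^(h-1). */
static int toy_random_level(void) {
    const int threshold = (int)(TOY_P * RAND_MAX);
    int level = 1;
    while (level < TOY_MAXLEVEL && rand() < threshold) level++;
    return level;
}
```

About three quarters of nodes stay at level 1 and about one in sixteen reach level 3 or higher, which is what keeps the upper levels sparse enough to act as express lanes while the expected number of pointers per node stays small (1 / (1 - 0.25), roughly 1.33).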

Key takeaway

The span counters in each skip list node are what turn ZRANK from an O(N) scan into an O(log N) traversal, and let ZRANGE by rank seek to its starting element in O(log N) before walking the M elements it returns -- insertion is slightly more expensive to keep rank queries cheap.

src/rdb.c:2404
[Code listing: the RDB save routine and rdbSaveBackground(). The save routine writes the dataset to a temporary file and then performs an atomic rename so that a partial RDB file is never visible, unlinking the temp file on failure. rdbSaveBackground() records save statistics and forks: the child sets its process title and CPU affinity, serializes the dataset, reports copy-on-write info, and exits; the parent logs the outcome, records the child type and start time, and returns immediately.]
7 of 8

Fork-Based Snapshotting

BGSAVE is three ideas working together: fork(), copy-on-write (COW), and atomic rename. rdbSaveBackground() calls redisFork(), which on success returns 0 in the child and the child PID in the parent. The parent returns C_OK immediately and keeps serving clients; apart from the fork() call itself, it never blocks on the snapshot. The child process has an exact copy of the parent's virtual address space courtesy of the OS's COW semantics: pages are shared until one side modifies them. The child then serializes the entire dataset to a temp file (temp-<pid>.rdb) using rdbSaveInternal(). When done, it calls rename() -- an atomic operation on POSIX systems -- to replace the live RDB file. Anything reading the RDB file sees either the complete old snapshot or the complete new one, never a partial file.
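The three ideas fit in a short POSIX sketch. It makes simplifying assumptions: the "dataset" is a byte buffer, there is no fsync before the rename (a durable implementation would want one), and the toy_ function names and temp-file naming are hypothetical rather than taken from rdb.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: write everything to a temp file, then rename over the target
 * so readers only ever see a complete file. */
static int toy_save(const char *path, const char *data, size_t len) {
    char tmp[256];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp-%ld", path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;
    FILE *fp = fopen(tmp, "wb");
    if (!fp) return -1;
    int ok = fwrite(data, 1, len, fp) == len;
    if (fclose(fp) != 0) ok = 0; /* fclose flushes; failure means lost data */
    if (ok && rename(tmp, path) == 0) return 0;
    unlink(tmp);
    return -1;
}

/* Fork a child that snapshots `data` as it was at the moment of the fork.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t toy_bgsave(const char *path, const char *data, size_t len) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: works on its own copy-on-write view of memory. */
        _exit(toy_save(path, data, len) == 0 ? 0 : 1);
    }
    return pid;
}

/* Reap the child; 0 means the snapshot was written and renamed. */
static int toy_wait_child(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the parent overwrites the buffer right after toy_bgsave() returns, the file still contains the pre-fork contents, because the child serializes its own view of memory. That is the consistency guarantee described above, obtained without any lock in the parent.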

Key takeaway

Redis achieves non-blocking snapshots by delegating the entire serialization to a forked child; the OS's copy-on-write mechanism means the snapshot is consistent without any locking on the parent.

src/aof.c:2748
[Code listing: feedAppendOnlyFile(). It optionally prepends a timestamp annotation when aof-timestamp-enabled is set, emits a SELECT if the target DB changed since the last appended command, serializes the command in RESP format, appends the result to the in-memory buffer server.aof_buf when AOF is on (or a rewrite is pending with an AOF child running), and frees the temporary buffer.]
8 of 8

AOF Persistence

When AOF is enabled, every write command that passes through call() is also handed to feedAppendOnlyFile(). The function does not write to disk -- it appends to server.aof_buf, an in-memory SDS string. The actual write() syscall happens in flushAppendOnlyFile(), which runs in the beforesleep hook of the event loop, just before Redis blocks on aeApiPoll(). The write is therefore issued after the reply has been queued in the client's output buffer but before Redis sleeps, so the AOF lags command execution by at most one event loop iteration; when the data is actually fsynced to disk depends on the appendfsync policy. The SELECT injection is a subtle correctness detail: the AOF must be self-contained so that replaying it applies each command to the right database, whatever database was selected before. The AOF_WAIT_REWRITE state covers the case where an AOF rewrite child is still running: new commands keep accumulating in the buffer so they are not lost while the rewrite is in progress.
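The buffering and SELECT injection can be sketched as follows. This is an illustration under stated assumptions: a fixed-size char array stands in for the SDS buffer, arguments are NUL-terminated strings (so the sketch is not binary-safe, unlike RESP itself), and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char buf[1024];  /* stands in for the in-memory AOF buffer */
    size_t len;
    int selected_db; /* DB of the last appended command, -1 if none yet */
} toy_aof;

/* Append one command in RESP multi-bulk form. Returns 0, or -1 if full. */
static int toy_append_cmd(toy_aof *a, int argc, const char **argv) {
    size_t pos = a->len;
    int n = snprintf(a->buf + pos, sizeof(a->buf) - pos, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= sizeof(a->buf) - pos) return -1;
    pos += (size_t)n;
    for (int i = 0; i < argc; i++) {
        n = snprintf(a->buf + pos, sizeof(a->buf) - pos, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= sizeof(a->buf) - pos) return -1;
        pos += (size_t)n;
    }
    a->len = pos; /* commit only once the whole command fits */
    return 0;
}

/* Buffer a write command, emitting SELECT first if the target DB changed. */
static int toy_feed_aof(toy_aof *a, int dbid, int argc, const char **argv) {
    if (dbid != a->selected_db) {
        char num[16];
        snprintf(num, sizeof(num), "%d", dbid);
        const char *sel[] = { "SELECT", num };
        if (toy_append_cmd(a, 2, sel) != 0) return -1;
        a->selected_db = dbid;
    }
    return toy_append_cmd(a, argc, argv);
}
```

Nothing here touches the disk: the buffer only grows. A separate flush step, run once per loop iteration, would write a->buf out and reset a->len, which is the decoupling between command latency and disk I/O that the two-stage design is after.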

Key takeaway

AOF durability is a two-stage pipeline: commands land in an in-memory buffer inside call(), then flush to disk in the event loop's pre-sleep hook -- decoupling command latency from disk I/O.


Last verified against Redis@a0bad9a on

Tags: architecture, event-loop, persistence, data-structures
