Redis Architecture Grand Tour
How Redis processes commands, stores data, and persists to disk
What you will learn
- How Redis bootstraps from a single main() call into a fully wired server with signal handlers, databases, and a network listener
- Why Redis can handle tens of thousands of clients with a single thread: the ae event loop and its epoll/kqueue abstraction
- How the RESP protocol is parsed incrementally from a raw socket read into a command ready for execution
- How every keyspace write is dispatched through a central call() function that handles propagation to AOF and replicas in one place
- How Redis avoids O(n) rehashing pauses by spreading hash table migration across regular read and write operations
Prerequisites
- Comfortable reading C (you don't need to write it, just follow the logic)
- Basic understanding of key-value stores: what GET/SET do
- Familiarity with the concept of an event loop is helpful but not required
src/server.c:8088
[Listing excerpt: the tail of main(). Only fragments of the code survive; the calls appear in this order: serverLog (version banner using REDIS_VERSION and redisGitSHA1), initServer, redisAsciiArt, checkTcpBacklogSettings, clusterCommonInit, clusterInit, moduleInitModulesSystemLast, moduleLoadInternalModules, moduleLoadFromQueue, ACLLoadUsersAtStartup, initListeners, loadDataFromDisk, aeMain, aeDeleteEventLoop.]
Server Lifecycle
main() in server.c is 355 lines of orchestration. By the time you reach line 8103, the config has been parsed, the locale is set, and the entropy pool is seeded. The real wiring happens in initServer(): signal handlers, database allocation, and the event loop all come up inside that one call, and initListeners() binds the listening sockets shortly afterwards. After that, main() loads persisted data from disk (RDB or AOF), then hands control to aeMain() -- a while loop that does not return until the event loop is told to stop. The return 0 at line 8176 is practically unreachable in production; it runs only if aeMain() itself returns. Notice how loadDataFromDisk() happens after the listeners are configured but before aeMain() starts -- no client can run ordinary commands against a half-loaded dataset.
Redis startup is a strict ordered sequence: configure, wire, load data, then enter the event loop -- and that order is not accidental.
src/ae.c:360
[Listing excerpt: aeProcessEvents() followed by aeMain(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Nothing to do? return ASAP */
/* Call the multiplexing API, will return only on timeout or when
 * some event fires. */
/* After sleep callback. */
/* dispatch to read or write handler */
/* ... time event processing ... */
aeMain(): eventLoop->stop, while, aeProcessEvents, AE_CALL_BEFORE_SLEEP, AE_CALL_AFTER_SLEEP
The Event Loop
The core of the Redis concurrency model lives in these 140 lines. aeMain() is a while loop that calls aeProcessEvents() on every iteration. Inside aeProcessEvents(), the key move is aeApiPoll() -- a thin abstraction over the platform's readiness API, such as epoll on Linux or kqueue on macOS and the BSDs. The poll call blocks for at most the time remaining until the next scheduled timer event (usUntilEarliestTimer()). When it returns, Redis has a list of file descriptors that are ready for I/O, and each one is dispatched to its registered read or write handler. Command execution involves no worker threads and no callbacks in the Node.js sense -- just one tight loop. The beforesleep and aftersleep hooks let other subsystems (replication, AOF flushing) piggyback on the loop without breaking the single-threaded model.
Redis's legendary throughput comes from a seven-line while loop backed by the OS's own I/O readiness notification -- the event loop is not Redis magic, it is Unix done right.
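The loop described above can be sketched in a few dozen lines. This is a minimal illustration, not the ae.c API: every toy_* name is hypothetical, and portable poll() stands in for aeApiPoll() and its epoll/kqueue backends. It shows the three moves the text names: a pre-sleep hook, one blocking readiness call with a timeout, and dispatch to a registered handler.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define TOY_MAX_EVENTS 16

/* Hypothetical names throughout; this is not the ae.c API. */
typedef void toy_file_proc(int fd, void *client_data);

typedef struct toy_event_loop {
    struct pollfd fds[TOY_MAX_EVENTS];
    toy_file_proc *procs[TOY_MAX_EVENTS];
    void *client_data[TOY_MAX_EVENTS];
    int count;
    int stop;
    void (*before_sleep)(struct toy_event_loop *loop);
} toy_event_loop;

void toy_init(toy_event_loop *loop) {
    loop->count = 0;
    loop->stop = 0;
    loop->before_sleep = NULL;
}

/* Register a read handler for fd. Returns 0 on success, -1 if full. */
int toy_add_reader(toy_event_loop *loop, int fd, toy_file_proc *proc,
                   void *data) {
    if (loop->count == TOY_MAX_EVENTS) return -1;
    loop->fds[loop->count].fd = fd;
    loop->fds[loop->count].events = POLLIN;
    loop->fds[loop->count].revents = 0;
    loop->procs[loop->count] = proc;
    loop->client_data[loop->count] = data;
    loop->count++;
    return 0;
}

/* One iteration: run the pre-sleep hook, block until a descriptor is
 * readable or the timeout expires, then dispatch each ready descriptor.
 * Returns the number of handlers that ran. */
int toy_process_events(toy_event_loop *loop, int timeout_ms) {
    if (loop->before_sleep) loop->before_sleep(loop);
    int ready = poll(loop->fds, (nfds_t)loop->count, timeout_ms);
    if (ready <= 0) return 0;
    int processed = 0;
    for (int i = 0; i < loop->count; i++) {
        if (loop->fds[i].revents & POLLIN) {
            loop->procs[i](loop->fds[i].fd, loop->client_data[i]);
            processed++;
        }
    }
    return processed;
}

/* The main loop: keep processing events until a handler sets stop. */
void toy_main(toy_event_loop *loop) {
    while (!loop->stop) toy_process_events(loop, 100);
}

/* Example handler: consume one byte and count the read. */
void toy_count_reads(int fd, void *client_data) {
    char byte;
    if (read(fd, &byte, 1) == 1) (*(int *)client_data)++;
}
```

Registering the read end of a pipe with toy_add_reader() and writing one byte to the other end makes the next toy_process_events() call run toy_count_reads() exactly once. The fixed 100 ms timeout in toy_main() is a simplification; the text describes Redis deriving the timeout from the next timer event instead.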
src/networking.c:3718
[Listing excerpt: readQueryFromClient(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* If this is a multi bulk request and we are processing a large argument,
 * try to size the read exactly to the argument boundary to avoid copying. */
/* Use thread-local reusable query buffer to avoid allocation. */
/* Parse the buffer and execute if a complete command is present. */
Identifiers: atomicSetWithSync, readlen, big_arg, thread_reusable_qb, sdsclear, c->querybuf, c->io_flags, thread_reusable_qb_used, qblen, nread, c->read_error, freeClientAsync, sdsIncrLen
RESP Protocol Parsing
Every client connection in Redis has a client struct that carries its query buffer, parse state, and I/O flags. When the event loop fires a read event on a socket, readQueryFromClient() is the registered handler. The function reads raw bytes into c->querybuf using connRead() -- a thin wrapper that hides whether the connection is plain TCP or TLS. The read length is usually PROTO_IOBUF_LEN (16KB), but while a large RESP bulk argument is in flight the function sizes the read to the bytes still missing from that argument, which avoids an extra copy. After the read, processInputBuffer() walks the buffer looking for complete RESP frames. When it finds one, it parses the command name and arguments and hands the command off for execution; when the frame is incomplete, it returns and waits for the next read event, so a slow client never blocks the event loop.
RESP parsing in Redis is incremental and stateful: a single call to readQueryFromClient() may deliver only part of a command, and the client struct retains the parse position across reads.
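To make "incremental and stateful" concrete, here is a minimal sketch of a multi-bulk RESP parser that can be called after every read. All toy_* names and the fixed-size buffer are assumptions for illustration; the real code uses SDS strings and handles inline commands, size limits, and much more. The fields pos, multibulklen, and bulklen play the role of the parse state the client struct keeps between reads.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ARGS 8
#define TOY_BUF_SIZE 256

typedef struct toy_client {
    char buf[TOY_BUF_SIZE];   /* accumulated bytes, like c->querybuf */
    size_t len;               /* bytes currently in buf */
    size_t pos;               /* parse position, retained across reads */
    long multibulklen;        /* arguments still to read; 0 = need header */
    long bulklen;             /* length of the pending argument, or -1 */
    int argc;
    const char *argv[TOY_MAX_ARGS];
    size_t argvlen[TOY_MAX_ARGS];
} toy_client;

void toy_client_init(toy_client *c) {
    memset(c, 0, sizeof(*c));
    c->bulklen = -1;
}

/* Append freshly read bytes. Returns 0 on success, -1 if they do not fit. */
int toy_feed(toy_client *c, const char *data, size_t n) {
    if (n > TOY_BUF_SIZE - c->len) return -1;
    memcpy(c->buf + c->len, data, n);
    c->len += n;
    return 0;
}

/* Parse "<prefix><digits>\r\n" at c->pos. Returns 1 and advances pos when
 * the whole line is present, 0 when more bytes are needed, -1 on error. */
static int toy_parse_len(toy_client *c, char prefix, long *out) {
    if (c->pos >= c->len) return 0;
    if (c->buf[c->pos] != prefix) return -1;
    char *start = c->buf + c->pos + 1;
    char *cr = memchr(start, '\r', c->len - c->pos - 1);
    if (cr == NULL || cr + 1 >= c->buf + c->len) return 0;
    if (cr[1] != '\n') return -1;
    if (cr == start || cr - start > 9) return -1;
    long v = 0;
    for (char *p = start; p < cr; p++) {
        if (*p < '0' || *p > '9') return -1;
        v = v * 10 + (*p - '0');
    }
    *out = v;
    c->pos = (size_t)(cr - c->buf) + 2;
    return 1;
}

/* Returns 1 when a complete command is available in argv, 0 when more
 * bytes are needed, -1 on a protocol error. Safe to call after every read
 * because all progress is stored in the client. */
int toy_parse(toy_client *c) {
    if (c->multibulklen == 0) {
        long n;
        int r = toy_parse_len(c, '*', &n);
        if (r <= 0) return r;
        if (n == 0 || n > TOY_MAX_ARGS) return -1;
        c->multibulklen = n;
        c->argc = 0;
    }
    while (c->multibulklen > 0) {
        if (c->bulklen == -1) {
            long n;
            int r = toy_parse_len(c, '$', &n);
            if (r <= 0) return r;
            c->bulklen = n;
        }
        size_t need = (size_t)c->bulklen;
        /* The payload must be followed by its trailing CRLF. */
        if (c->len - c->pos < need + 2) return 0;
        if (c->buf[c->pos + need] != '\r' ||
            c->buf[c->pos + need + 1] != '\n') return -1;
        c->argv[c->argc] = c->buf + c->pos;
        c->argvlen[c->argc] = need;
        c->argc++;
        c->pos += need + 2;
        c->bulklen = -1;
        c->multibulklen--;
    }
    return 1;
}
```

Feeding the bytes of a GET command in two pieces, split in the middle of the command name, makes the first toy_parse() call return 0 with the position and pending bulk length saved; the second call resumes from that state and returns 1 with argc set to 2.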
src/server.c:3878
[Listing excerpt: call(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Clear propagation flags before execution. */
/* Snapshot dirty counter to detect mutations. */
/* Sync cached time periodically to avoid repeated syscalls. */
/* THE call: one function pointer dispatch */
Identifiers: server.executing_client, c->flags, dirty, incrCommandStatsOnError, monotonic_start, server.accum_call_count_since_ustime, updateCachedTime, server.monotonic_us_when_ustime, enterExecutionUnit, c->cmd->, exitExecutionUnit, duration, c->duration
Command Dispatch
call() is the single chokepoint through which every Redis command passes. The actual execution is one line: c->cmd->proc(c) -- a function pointer in the command table calling, for example, setCommand or getCommand. Everything else around that line is instrumentation and propagation setup. Before the call, Redis snapshots server.dirty (a counter of keyspace changes). After the call, it computes the delta: if dirty increased, this command mutated data and needs to be written to the AOF buffer and propagated to replicas. The CLIENT_FORCE_AOF and CLIENT_FORCE_REPL flags let individual commands override the default propagation rules. The slowlog and latency monitoring hooks also fire here, making call() the natural place to measure command latency.
One function pointer dispatch followed by dirty-counter arithmetic is how Redis decides whether to write to the AOF and propagate to replicas -- simplicity that scales.
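The dirty-counter pattern is small enough to show whole. The sketch below is a toy with a one-key keyspace and hypothetical toy_* names; the force_propagate field stands in for the CLIENT_FORCE_AOF and CLIENT_FORCE_REPL flags mentioned above, and a counter stands in for the real AOF and replica propagation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_server {
    long long dirty;      /* bumped by commands that change the keyspace */
    int propagated;       /* commands handed on to "AOF and replicas" */
    char value[32];       /* a one-key keyspace, enough for the demo */
} toy_server;

typedef struct toy_ctx {
    toy_server *server;
    const char *arg;      /* single argument for the toy commands */
    int force_propagate;  /* stand-in for the CLIENT_FORCE_* flags */
    const char *reply;
} toy_ctx;

typedef void toy_command_proc(toy_ctx *c);

typedef struct toy_command {
    const char *name;
    toy_command_proc *proc;
} toy_command;

/* A write command: mutates state and bumps the dirty counter. */
void toy_set(toy_ctx *c) {
    strncpy(c->server->value, c->arg, sizeof(c->server->value) - 1);
    c->server->value[sizeof(c->server->value) - 1] = '\0';
    c->server->dirty++;
    c->reply = "OK";
}

/* A read command: leaves the dirty counter alone. */
void toy_get(toy_ctx *c) {
    c->reply = c->server->value;
}

static const toy_command toy_table[] = {
    {"set", toy_set},
    {"get", toy_get},
};

const toy_command *toy_lookup(const char *name) {
    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
        if (strcmp(toy_table[i].name, name) == 0) return &toy_table[i];
    return NULL;
}

/* Dispatch through the function pointer, then decide on propagation from
 * the dirty delta. Returns 1 if the command was propagated, 0 otherwise. */
int toy_call(toy_ctx *c, const toy_command *cmd) {
    c->force_propagate = 0;                 /* clear flags before running */
    long long dirty_before = c->server->dirty;
    cmd->proc(c);                           /* the one-line dispatch */
    long long delta = c->server->dirty - dirty_before;
    if (delta > 0 || c->force_propagate) {
        c->server->propagated++;
        return 1;
    }
    return 0;
}
```

Running "set" through toy_call() returns 1 because the handler bumped dirty, while "get" returns 0. Neither handler knows anything about propagation, which is the point of deciding it in one place.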
src/dict.c:405
[Listing excerpt: dictRehash(), _dictRehashStep(), and dictAdd(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Move all the keys in this bucket from the old to the new hash table. */
/* This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
/* Add an element to the target hash table */
Identifiers: s1, assert, d->rehashidx, rehashEntriesInBucketAtIndex, dictEntry
Incremental Rehashing
Redis's dict is a chained hash table with two internal tables: ht_table[0] holds the current data and ht_table[1] is the resized target. When the load factor crosses a threshold, Redis does not stop and rehash everything -- it sets rehashidx to 0 and starts migrating one bucket per normal operation. Calls to dictAdd(), dictFind(), and dictDelete() route through _dictRehashStep(), which calls dictRehash(d, 1) to move one non-empty bucket. The empty_visits cap (ten times the requested steps) bounds how many empty buckets a single step may skip, so the function cannot stall on a sparse table. The DICT_RESIZE_AVOID setting lets Redis hold back rehashing while a BGSAVE or BGREWRITEAOF child is running -- moving entries dirties memory pages, and every dirtied page has to be copied under copy-on-write, so rehashing is deferred.
Redis rehashing is amortized across every subsequent operation: there is no rehash pause, only a steady trickle of bucket migrations piggybacked on normal reads and writes.
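The two-table scheme can be sketched with integer keys. Everything here is a simplification under stated assumptions: the toy_* names are hypothetical, the hash is the key itself masked to the table size, growth starts at a load factor of 1, and memory is never released. What it does reproduce is the mechanism described above: a rehashidx cursor, one bucket moved per lookup or insert, and an empty-bucket cap of ten times the step count.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_entry {
    unsigned long key;
    int val;
    struct toy_entry *next;
} toy_entry;

typedef struct toy_dict {
    toy_entry **table[2];
    unsigned long size[2];   /* bucket counts, always powers of two */
    unsigned long used[2];
    long rehashidx;          /* -1 when no rehash is in progress */
} toy_dict;

static void *toy_xcalloc(size_t n, size_t sz) {
    void *p = calloc(n, sz);
    if (p == NULL) abort();
    return p;
}

toy_dict *toy_dict_create(void) {
    toy_dict *d = toy_xcalloc(1, sizeof(*d));
    d->size[0] = 4;
    d->table[0] = toy_xcalloc(d->size[0], sizeof(toy_entry *));
    d->rehashidx = -1;
    return d;
}

/* Move up to n non-empty buckets from table 0 to table 1, giving up after
 * n*10 empty buckets. Returns 1 if work remains, 0 when rehashing is done. */
int toy_rehash(toy_dict *d, int n) {
    int empty_visits = n * 10;
    if (d->rehashidx == -1) return 0;
    while (n-- && d->used[0] != 0) {
        while (d->table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        toy_entry *e = d->table[0][d->rehashidx];
        while (e) {                       /* relink the whole chain */
            toy_entry *next = e->next;
            unsigned long idx = e->key & (d->size[1] - 1);
            e->next = d->table[1][idx];
            d->table[1][idx] = e;
            d->used[0]--;
            d->used[1]++;
            e = next;
        }
        d->table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->used[0] == 0) {                /* promote table 1 to table 0 */
        free(d->table[0]);
        d->table[0] = d->table[1];
        d->size[0] = d->size[1];
        d->used[0] = d->used[1];
        d->table[1] = NULL;
        d->size[1] = 0;
        d->used[1] = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

/* Lookup performs one rehash step, then searches both tables if needed. */
toy_entry *toy_find(toy_dict *d, unsigned long key) {
    if (d->rehashidx != -1) toy_rehash(d, 1);
    for (int t = 0; t <= 1; t++) {
        toy_entry *e = d->table[t][key & (d->size[t] - 1)];
        for (; e; e = e->next)
            if (e->key == key) return e;
        if (d->rehashidx == -1) break;    /* table 1 is not in use */
    }
    return NULL;
}

/* Insert a new key. Returns 0 on success, -1 if the key already exists. */
int toy_add(toy_dict *d, unsigned long key, int val) {
    if (toy_find(d, key)) return -1;      /* also advances the rehash */
    if (d->rehashidx == -1 && d->used[0] >= d->size[0]) {
        d->size[1] = d->size[0] * 2;
        d->table[1] = toy_xcalloc(d->size[1], sizeof(toy_entry *));
        d->rehashidx = 0;                 /* start migrating lazily */
    }
    int t = (d->rehashidx != -1) ? 1 : 0; /* new keys go to the new table */
    toy_entry *e = toy_xcalloc(1, sizeof(*e));
    e->key = key;
    e->val = val;
    unsigned long idx = key & (d->size[t] - 1);
    e->next = d->table[t][idx];
    d->table[t][idx] = e;
    d->used[t]++;
    return 0;
}
```

Inserting a fifth key into the initial four-bucket table starts a rehash but moves nothing yet; each later toy_find() or toy_add() migrates one bucket until table 1 is promoted. No single call ever touches more than one non-empty bucket plus a bounded run of empty ones.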
src/t_zset.c:254
[Listing excerpt: zslRandomLevel() and zslInsertNode(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Returns a level between 1 and ZSKIPLIST_MAXLEVEL with a powerlaw
 * distribution where higher levels are less likely. */
/* Insert an already-created node into the skiplist at the correct position. */
/* Walk down from the top level, tracking the last node at each level
 * that is still less than the insertion point. */
/* Splice the new node into all levels up to its randomly chosen height. */
/* Adjust span counts so rank queries remain accurate. */
/* Update backward pointer for reverse iteration. */
Identifiers: level, zskiplistNode, sds ele, rank, update, node->level[i].forward, zslSetNodeSpanAtLevel, zslGetNodeSpanAtLevel, node->backward, zsl->tail, zsl->length
Sorted Set Internals
Redis sorted sets (ZSET) use a dual index: a skip list for ordered range queries and a hash table for O(1) score lookups by member. The skip list is the interesting half. zslRandomLevel() decides the height of each new node by flipping a biased coin (ZSKIPLIST_P is 0.25, meaning each level has a 25% chance of growing to the next). In zslInsertNode(), a top-down traversal finds the insert position at each level, storing the predecessor nodes in update[]. The rank[] array tracks cumulative span counts so that after insertion, each node's span -- the number of elements it skips over at a given level -- is updated in O(log N) time. This span count is what makes ZRANK an O(log N) operation rather than O(N).
The span counters in each skip list node are what let ZRANK run as an O(log N) traversal instead of an O(N) scan, and let ZRANGE locate its starting rank in O(log N) before walking the M elements it returns -- insertion is slightly more expensive to keep rank queries cheap.
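A compact skip list shows how update[], rank[], and span fit together. This is a sketch under assumptions, not the t_zset.c code: toy_* names are hypothetical, scores are unique integers with no member strings, every node reserves the maximum number of levels, and there are no backward pointers. The insert and rank logic follows the approach described above.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAXLEVEL 8   /* assumption: small cap to keep nodes tiny */
#define TOY_P 0.25       /* the promotion probability cited in the text */

typedef struct toy_node {
    int score;
    struct {
        struct toy_node *forward;
        unsigned long span;   /* elements skipped by following forward */
    } level[TOY_MAXLEVEL];
} toy_node;

typedef struct toy_skiplist {
    toy_node *header;
    int level;                /* highest level currently in use */
    unsigned long length;
} toy_skiplist;

static toy_node *toy_node_new(int score) {
    toy_node *n = calloc(1, sizeof(*n));
    if (n == NULL) abort();
    n->score = score;
    return n;
}

toy_skiplist *toy_create(void) {
    toy_skiplist *zsl = calloc(1, sizeof(*zsl));
    if (zsl == NULL) abort();
    zsl->header = toy_node_new(0);
    zsl->level = 1;
    return zsl;
}

/* Each extra level is granted with probability TOY_P. */
int toy_random_level(void) {
    const int threshold = (int)(TOY_P * RAND_MAX);
    int level = 1;
    while (level < TOY_MAXLEVEL && rand() < threshold) level++;
    return level;
}

void toy_insert(toy_skiplist *zsl, int score) {
    toy_node *update[TOY_MAXLEVEL];
    unsigned long rank[TOY_MAXLEVEL];
    toy_node *x = zsl->header;
    /* Top-down walk: remember the predecessor and its rank per level. */
    for (int i = zsl->level - 1; i >= 0; i--) {
        rank[i] = (i == zsl->level - 1) ? 0 : rank[i + 1];
        while (x->level[i].forward && x->level[i].forward->score < score) {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    int level = toy_random_level();
    if (level > zsl->level) {
        for (int i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    x = toy_node_new(score);
    /* Splice in and split each predecessor's span around the new node. */
    for (int i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }
    /* Levels above the new node now skip over one more element. */
    for (int i = level; i < zsl->level; i++) update[i]->level[i].span++;
    zsl->length++;
}

/* 1-based rank of score, or 0 if absent; sums spans along the path. */
unsigned long toy_rank(const toy_skiplist *zsl, int score) {
    const toy_node *x = zsl->header;
    unsigned long rank = 0;
    for (int i = zsl->level - 1; i >= 0; i--) {
        while (x->level[i].forward && x->level[i].forward->score <= score) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }
        if (x != zsl->header && x->score == score) return rank;
    }
    return 0;
}
```

After inserting the scores 50, 10, 40, 20, 30 in that order, toy_rank() reports 1 for 10, 3 for 30, and 5 for 50 regardless of which heights the random generator picked, because the rank is the sum of spans on the search path rather than a count of visited nodes.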
src/rdb.c:2404
[Listing excerpt: the RDB save path and rdbSaveBackground(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Atomic rename -- clients never see a partial RDB file. */
/* ... */
/* Child process: serialize the dataset and exit. */
/* Parent process: record child PID and return immediately. */
Identifiers: startSaving, snprintf, stopSaving, serverLog, unlink, server.stat_rdb_saves, server.dirty_before_bgsave, server.lastbgsave_try, redisSetProcTitle, redisSetCpuAffinity, sendChildCowInfo, exitFromChild, server.lastbgsave_status, server.rdb_save_time_start, server.rdb_child_type
Fork-Based Snapshotting
BGSAVE is three ideas working together: fork(), copy-on-write (COW), and atomic rename. rdbSaveBackground() calls redisFork(), which on success returns 0 in the child and the child PID in the parent. The parent returns C_OK right after the fork and keeps serving clients -- the only time it spends blocked is inside fork() itself. The child process sees a copy of the parent's virtual address space as it was at the moment of the fork, courtesy of the OS's COW semantics: pages are shared until one side modifies them. The child then serializes the entire dataset to a temp file (temp-<pid>.rdb) using rdbSaveInternal(). When done, it calls rename() -- an atomic operation on POSIX systems -- to replace the live RDB file. Any process that already has the old RDB file open keeps reading a complete old snapshot; any process that opens the path afterwards gets the complete new one.
Redis achieves non-blocking snapshots by delegating the entire serialization to a forked child; the OS's copy-on-write mechanism means the snapshot is consistent without any locking on the parent.
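The fork, write, rename sequence can be tried in isolation with plain POSIX calls. The sketch below uses hypothetical toy_* names, a string in place of a dataset, and a temp file named after the target path (so both sit in the same directory, which rename() needs in order to stay atomic); it does not reproduce redisFork() or any RDB encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes `data` to a temp file and renames it over
 * `path`. Returns the child PID to the parent, or -1 if fork() fails. */
pid_t toy_bgsave(const char *path, const char *data) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: `data` is the parent's memory as it was at the fork. */
        char tmp[512];
        snprintf(tmp, sizeof(tmp), "%s.temp-%d", path, (int)getpid());
        FILE *fp = fopen(tmp, "w");
        if (fp == NULL) _exit(1);
        if (fputs(data, fp) == EOF) {
            fclose(fp);
            unlink(tmp);
            _exit(1);
        }
        if (fclose(fp) == EOF) {
            unlink(tmp);
            _exit(1);
        }
        /* Readers see either the old file or the complete new one. */
        if (rename(tmp, path) == -1) {
            unlink(tmp);
            _exit(1);
        }
        _exit(0);
    }
    /* Parent: return at once with the child PID (or -1 on failure). */
    return pid;
}

/* Block until the child exits. Returns 0 if it reported success. */
int toy_wait_bgsave(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the parent overwrites the buffer immediately after toy_bgsave() returns, the file still contains the value from the moment of the fork: the child works on its own copy-on-write view. A real server would reap the child asynchronously instead of calling a blocking wait.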
src/aof.c:2748
[Listing excerpt: feedAppendOnlyFile(). Only fragments of the code survive; the comments and identifiers that remain are listed in order.]
/* Prepend a timestamp annotation if aof-timestamp-enabled is set. */
/* If the target DB changed since the last appended command,
 * emit a SELECT so the AOF is self-contained. */
/* Serialize the command in RESP format and append to the buffer. */
/* Append to the in-memory AOF buffer. It will be flushed to disk just
 * before the next event loop iteration, after the client gets its reply. */
Identifiers: sds buf, serverAssert, sds ts, sdsfree, snprintf, server.aof_selected_db, server.aof_state, server.child_type, server.aof_buf
AOF Persistence
When AOF is enabled, every write command that passes through call() is also handed to feedAppendOnlyFile(). The function does not write to disk -- it appends to server.aof_buf, an in-memory SDS string. The actual write() syscall happens in flushAppendOnlyFile(), which runs in the beforesleep hook of the event loop, just before Redis blocks on aeApiPoll(). This means the write is issued after the client reply has been queued but before Redis sleeps, within the same event loop iteration that executed the command; when the data is fsynced to stable storage depends on the appendfsync policy. The SELECT injection is a subtle correctness detail: the AOF file must be self-contained so that replaying it (or checking it with redis-check-aof) does not depend on which database happened to be selected beforehand. The AOF_WAIT_REWRITE check handles the case where an AOF rewrite is in progress and new commands must still be buffered while the rewrite child runs.
AOF durability is a two-stage pipeline: commands land in an in-memory buffer inside call(), then flush to disk in the event loop's pre-sleep hook -- decoupling command latency from disk I/O.
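The buffer-then-flush pipeline, including the SELECT injection, can be sketched as follows. The toy_* names, the fixed 512-byte buffer, and the use of C strings (so arguments cannot contain NUL bytes) are all assumptions for illustration; the real code works on SDS strings and robj arguments. A failed append may leave a partial command in the buffer, which a real implementation would have to avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct toy_aof {
    char buf[512];     /* stand-in for server.aof_buf */
    size_t len;
    int selected_db;   /* -1 forces a SELECT before the first command */
} toy_aof;

void toy_aof_init(toy_aof *a) {
    a->len = 0;
    a->buf[0] = '\0';
    a->selected_db = -1;
}

/* Append n bytes, keeping buf NUL-terminated. Returns -1 if they don't fit. */
static int toy_append(toy_aof *a, const char *s, size_t n) {
    if (n >= sizeof(a->buf) - a->len) return -1;
    memcpy(a->buf + a->len, s, n);
    a->len += n;
    a->buf[a->len] = '\0';
    return 0;
}

/* Serialize argv as a RESP array of bulk strings. */
int toy_cat_command(toy_aof *a, int argc, const char **argv) {
    char hdr[32];
    int n = snprintf(hdr, sizeof(hdr), "*%d\r\n", argc);
    if (toy_append(a, hdr, (size_t)n) != 0) return -1;
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        n = snprintf(hdr, sizeof(hdr), "$%zu\r\n", len);
        if (toy_append(a, hdr, (size_t)n) != 0 ||
            toy_append(a, argv[i], len) != 0 ||
            toy_append(a, "\r\n", 2) != 0) return -1;
    }
    return 0;
}

/* Buffer one command, emitting SELECT first if the target DB changed. */
int toy_feed_aof(toy_aof *a, int dbid, int argc, const char **argv) {
    if (dbid != a->selected_db) {
        char db[16];
        snprintf(db, sizeof(db), "%d", dbid);
        const char *sel[] = {"SELECT", db};
        if (toy_cat_command(a, 2, sel) != 0) return -1;
        a->selected_db = dbid;
    }
    return toy_cat_command(a, argc, argv);
}

/* Stand-in for the pre-sleep flush: write what is buffered and keep
 * whatever could not be written for the next attempt. */
size_t toy_flush(toy_aof *a, FILE *out) {
    size_t written = fwrite(a->buf, 1, a->len, out);
    memmove(a->buf, a->buf + written, a->len - written);
    a->len -= written;
    a->buf[a->len] = '\0';
    return written;
}
```

Two consecutive writes to database 0 produce a single SELECT 0 followed by both commands; a later write to database 1 gets its own SELECT 1 in front. Nothing reaches a file until toy_flush() is called, which mirrors the split between feeding the buffer during command execution and flushing it before the loop sleeps.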