Interview prep · Junior → Senior

Redis Interview Questions

The questions a Redis interviewer actually asks — the data model and the right data type for a job, keys, expiration and eviction, the single-threaded execution model and where its speed comes from, persistence and the honest limits of durability (RDB vs AOF), transactions and Lua scripting, replication and Sentinel, Redis Cluster and the 16384 hash slots, caching and messaging patterns (pub/sub vs Streams), and the design cluster — rate limiters, leaderboards, session stores, and the much-argued distributed lock — each answered with a redis-cli sequence, a config directive, a short Lua snippet, or a worked example, and a link to the source. This page covers Redis the open-source in-memory data-structure server; for what changed in each release — when Streams landed, when Redis went source-available and then open source again — see the Redis version reference.

Difficulty

Junior — the core vocabulary: what Redis is, the data types, keys and TTLs, GET/SET.

Mid — Redis in practice: picking a structure, eviction, pipelining, MULTI/Lua, pub/sub vs Streams.

Senior — the hard parts: persistence vs durability, Cluster cross-slot rules, Sentinel failover, and the design tradeoffs (Redlock).

1 · Core concepts & the data model

What is Redis? Junior

Redis is an in-memory data-structure store. The common one-liner “a key-value cache” undersells it: the values aren't just strings, they're server-side data structures — hashes, lists, sets, sorted sets, streams, and more — each with its own commands that run atomically on the server. Because everything lives in RAM, operations are typically microsecond-fast; because it optionally persists to disk and replicates, it can be more than a throwaway cache. That combination is why the same engine is used three different ways: as a cache in front of a slower store, as a primary datastore for data that fits in memory, and as a message broker (pub/sub and Streams).

# values are structures, not just opaque blobs:
127.0.0.1:6379> SET user:42:name "Ada"          # a string
127.0.0.1:6379> ZADD leaderboard 100 ada        # a sorted set — server keeps it ordered
127.0.0.1:6379> HSET session:abc uid 42 csrf x9  # a hash — one key, many fields

Why it's asked / follow-up: it sets the frame — if you think of Redis as “just a cache,” the data-type, persistence, and scaling answers all come out shallow. Follow-up: “is it a database or a cache?” — it can be either; see using Redis as a primary store.

Source: Redis docs — Get started / Introduction. Release history and the licensing story: the Redis version reference.

Is Redis only a cache, or can it be a primary database? Junior

It can be either — the honest answer is “it depends on the data and your durability tolerance.” As a cache, you put a TTL on keys and treat a loss as a miss that re-populates from the source of truth. As a primary store, you turn on persistence (RDB and/or AOF) and replication and rely on Redis to hold the data. The two hard constraints on using it as a primary store are (1) the working set must fit in memory (Redis is RAM-first, not disk-first), and (2) its default durability can lose the last writes on a crash — so it's a poor fit for “must never lose a single committed write” data unless you tune persistence and accept the throughput cost.

# cache mode: every key has a TTL, a miss is survivable
SET page:home "<html>..." EX 60
# primary-store mode: persistence + replication on, working set fits in RAM
# (durability is still bounded — see the persistence section)

Why it's asked / follow-up: it checks that you know Redis's limits, not just its speed. Follow-up: “when would you not use it?” — that's its own scenario question (dataset > RAM, strict durability/consistency, complex queries).

Source: Redis docs — Persistence.

How does Redis differ from a relational database? Mid

An RDBMS is disk-first, stores rows in tables, and lets you ask ad-hoc questions in SQL (joins, aggregations, filters) that the engine plans and optimizes at query time. Redis is memory-first and has no query planner, no SQL, and no joins: you don't ask Redis to find data, you reach for a key you already know and operate on the structure under it. That means you design the access pattern up front: you store data in the shape you'll read it (a sorted set for a ranking, a hash for an object) rather than normalizing and querying. The upside is predictable, sub-millisecond operations; the tradeoff is that a question you didn't design a key for is expensive or impossible (there's no WHERE age > 30 across all users).

// RDBMS:  SELECT * FROM users WHERE city = 'Rome';   ← engine finds rows
// Redis:  SMEMBERS users:by_city:Rome                 ← YOU maintained that set
// design the key for the read; there is no query planner to fall back on.
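
The "YOU maintained that set" point can be sketched with a toy in-process model. This is not a Redis client: plain Python dicts and sets stand in for hashes and sets, and save_user, users_in_city, and the key layout are hypothetical names chosen for illustration.

```python
# Toy model: Python dicts/sets stand in for Redis hashes and sets.
# Key names and helper functions are illustrative assumptions.
users = {}      # "user:<id>" -> dict of fields (plays the role of a hash)
by_city = {}    # "users:by_city:<city>" -> set of ids (plays the role of a set)

def save_user(uid, name, city):
    """Write the object AND keep the secondary index in sync."""
    key = f"user:{uid}"
    old = users.get(key)
    if old is not None and old["city"] != city:
        # The user moved: drop the stale index entry (SREM in Redis terms).
        by_city[f"users:by_city:{old['city']}"].discard(uid)
    users[key] = {"name": name, "city": city}
    by_city.setdefault(f"users:by_city:{city}", set()).add(uid)

def users_in_city(city):
    """The read is one lookup on a key you already know, like SMEMBERS."""
    return sorted(by_city.get(f"users:by_city:{city}", set()))

save_user(1, "Ada", "Rome")
save_user(2, "Bob", "Rome")
save_user(2, "Bob", "Oslo")   # an update has to fix the index too
rome = users_in_city("Rome")
oslo = users_in_city("Oslo")
```

Here rome ends up as [1] and oslo as [2]: the index is only as correct as the write path that maintains it, which is the cost of having no query planner.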

Why it's asked / follow-up: it tests whether you understand access-pattern-first modeling, the core mental shift when moving from SQL. Follow-up: “how do you do a secondary lookup?” — you maintain your own index (an extra set or sorted set) at write time.

Source: Redis docs — Understand Redis data types.

How is the keyspace organized, and what's the namespace:key convention? Junior

Redis has one flat global keyspace — keys are binary-safe strings with no built-in hierarchy. There are numbered logical databases (0–15 by default, selected with SELECT), but they share one process and one memory pool and are discouraged in modern practice (and Cluster mode only allows db 0). Because there's no real namespacing, the community convention is to fake it with a colon-delimited prefix: user:42:sessions, cart:99, rate:ip:1.2.3.4. It's purely a naming discipline — Redis treats the colon as an ordinary byte — but it keeps keys grouped, greppable, and easy to reason about (and it's what SCAN MATCH user:* leans on).

# convention only — the colon has no special meaning to Redis:
SET user:42:name "Ada"
SET user:42:email "ada@x.io"
SCAN 0 MATCH user:42:* COUNT 100   # group by prefix

Why it's asked / follow-up: it checks basic hygiene — interviewers have seen keyspaces with no convention become unmanageable. Follow-up: “why not use the numbered databases to separate concerns?” — they don't isolate memory or CPU, break under Cluster, and a key prefix or a separate instance is the cleaner boundary.

Source: Redis docs — Keyspace.

2 · Data types & when to use each

What data types does Redis provide, and how do you pick one? Junior

The core types are strings (bytes / numbers, plus bitmaps and bitfields), hashes (field→value maps, i.e. objects), lists (ordered, push/pop from either end), sets (unordered unique members), sorted sets / ZSETs (unique members ordered by a score), and streams (an append-only log with consumer groups). Alongside them are HyperLogLog (cardinality estimate in tiny space) and geospatial indexes. Picking the structure is the modeling step: a leaderboard is a sorted set, a session or object is a hash, a job queue is a list or a stream, a set of tags or unique visitors is a set (or HLL if you only need the count), a rate counter is a string with INCR.

# the structure IS the design decision:
ZADD  leaderboard 4200 ada       # ranking      → sorted set
HSET  session:abc uid 42          # object       → hash
LPUSH jobs "{...}"                # queue        → list (or a stream)
SADD  tags:post:9 redis nosql       # unique set   → set
INCR  page:home:views              # counter      → string

Why it's asked / follow-up: choosing the wrong structure is the most common Redis design mistake (e.g. scanning a list where a sorted set would be O(log N)). Follow-up: “which are ordered?” — lists (insertion order) and sorted sets (by score); sets and hashes are not.

Source: Redis docs — Data types.

What is a Sorted Set (ZSET), and why is it the leaderboard structure? Mid

A sorted set holds unique members each with a floating-point score, and Redis keeps them ordered by that score at all times. That's exactly a leaderboard: the member is the player, the score is the points, and the ranking is maintained for you. Adds and updates (ZADD, ZINCRBY) are O(log N), and reading a range or a rank (ZREVRANGE, ZRANK) is cheap, so “top 10” and “what rank is this player” are both fast even with millions of members. The same structure powers priority queues, time-ordered indexes (score = timestamp), and sliding-window rate limiters.

ZADD leaderboard 100 ada 250 bob 175 cy
ZINCRBY leaderboard 50 ada          # ada now 150 — re-sorted automatically
ZREVRANGE leaderboard 0 2 WITHSCORES  # top 3: bob 250, cy 175, ada 150
ZREVRANK leaderboard bob             # → 0  (bob is rank #1)
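
To make the "ordering is maintained for you" idea concrete, here is a minimal stand-in for a sorted set. It is a sketch, not Redis's implementation: ToyZSet and its method names are invented for this example, and it keeps a plain sorted list, so its inserts are O(N) rather than the O(log N) the real structure provides.

```python
import bisect

class ToyZSet:
    """Tiny stand-in for a sorted set: unique members ordered by score.
    Only non-negative range indexes are supported in this sketch."""
    def __init__(self):
        self.scores = {}   # member -> score
        self.order = []    # sorted list of (score, member)

    def zadd(self, member, score):
        if member in self.scores:
            # Re-scoring a member: remove its old position first.
            self.order.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.order, (score, member))

    def zincrby(self, member, delta):
        self.zadd(member, self.scores.get(member, 0) + delta)
        return self.scores[member]

    def zrevrange(self, start, stop):
        ranked = self.order[::-1]        # highest score first
        return ranked[start:stop + 1]    # stop is inclusive, as in Redis

    def zrevrank(self, member):
        if member not in self.scores:
            return None
        return self.order[::-1].index((self.scores[member], member))

board = ToyZSet()
for name, pts in [("ada", 100), ("bob", 250), ("cy", 175)]:
    board.zadd(name, pts)
board.zincrby("ada", 50)
top3 = board.zrevrange(0, 2)
bob_rank = board.zrevrank("bob")
```

The toy reproduces the sequence above: top3 is bob 250, cy 175, ada 150, and bob_rank is 0.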

Why it's asked / follow-up: “design a leaderboard” is a near-guaranteed Redis question and it hinges on knowing this type exists. Follow-up: “how do you do a sliding-window rate limiter with it?” — store request timestamps as scores and ZREMRANGEBYSCORE the old ones (see the rate-limiter scenario).

Source: Redis docs — Sorted sets.

When do you use a Hash instead of many string keys? Junior

A hash maps fields to values under a single key; it's the natural fit for an object (a user, a session, a config record). Instead of user:42:name, user:42:email, user:42:age as three separate keys, you keep one user:42 hash with three fields. That's tidier, lets you read or write the whole object atomically (HGETALL / HSET with many pairs), and, for small hashes, is stored in a compact encoding (a listpack) that uses far less memory than the equivalent standalone keys. The tradeoff: a hash has no per-field expiration in classic Redis (whole-key TTL only), though Redis 7.4 added hash-field TTLs (HEXPIRE and related commands).

HSET user:42 name "Ada" email "ada@x.io" age 36
HGET user:42 email                # → "ada@x.io"  (read one field)
HGETALL user:42                    # → the whole object, one round-trip
# small hashes use a compact listpack encoding — cheaper than 3 separate keys

Why it's asked / follow-up: it tests memory-aware modeling — hashes are the everyday object store and the memory win is real at scale. Follow-up: “can I expire one field?” — classic Redis expires whole keys; per-field TTL (HEXPIRE) is a later addition, so the common pattern is a separate key when you need field-level expiry.

Source: Redis docs — Hashes.

What did Redis 8.0 fold into core from Redis Stack (JSON, probabilistic types, vector sets)? Mid

Historically the “extra” structures shipped as separate modules bundled as Redis Stack (RedisJSON, RediSearch, RedisTimeSeries, RedisBloom). Redis 8.0 (May 2025) folded those data structures into the core server, so a stock Redis now includes JSON (documents with path access), time series, the probabilistic family (Bloom filter, Cuckoo filter, Count-Min sketch, Top-K, t-digest — all trading exactness for tiny, fixed memory), and a new vector set type (beta) for similarity search over embeddings. For an interview you don't need module internals — you need to know these exist in modern Redis and what each is for (e.g. a Bloom filter answers “have I probably seen this before?” without storing every value).

# examples of the now-in-core structures (Redis 8.0+):
JSON.SET u:42 $ '{"name":"Ada","tags":["redis"]}'   # JSON document
BF.ADD   seen:emails ada@x.io                     # Bloom filter (probabilistic membership)
VADD    embeddings VALUES 4 0.1 0.2 0.9 0.4 item:1 # vector set (beta) for similarity search
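
The "have I probably seen this before?" behavior of a Bloom filter is easy to see in a toy version. The sketch below is an assumption-laden illustration: ToyBloom, its bit-array size, and its SHA-256-based hashing are invented for this example and are not how the Redis probabilistic types are built.

```python
import hashlib

class ToyBloom:
    """Minimal Bloom filter: k hash positions in an m-bit array.
    Sizes and the hashing scheme are illustrative, not what Redis uses."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0    # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely never added"; True means "probably added".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

seen = ToyBloom()
seen.add("ada@x.io")
known = seen.might_contain("ada@x.io")
empty_answer = ToyBloom().might_contain("anyone@x.io")
```

An added item always answers True (no false negatives), and the filter never stores the value itself, only a fixed number of bits. The price is that an item that was never added can occasionally answer True.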

Why it's asked / follow-up: it separates candidates who know current Redis from those frozen at “strings, lists, sets.” Follow-up: “why did they move into core?” — part of the 8.0 release that also re-opened the license; see the arrival on the version reference (8.0).

Source: Redis 8 GA announcement; the 8.0 row on the version reference.

How do you reason about the Big-O cost of a Redis command? Mid

Every command's page in the docs lists its time complexity, and on Redis it matters more than on most systems because a slow command blocks every other client (single-threaded execution). Rough map: GET/SET/HGET/INCR are O(1); sorted-set position ops (ZADD, ZRANK) are O(log N); and the dangerous ones are O(N) in the size of a value or the keyspace — LRANGE key 0 -1, SMEMBERS, HGETALL on a huge object, and above all KEYS *. The senior instinct is: prefer O(1)/O(log N) access, avoid returning giant values, and never run an O(N)-over-the-keyspace command in production.

# O(1)      → GET, SET, HGET, INCR, LPUSH, SADD, EXPIRE
# O(log N)  → ZADD, ZRANK; ZRANGEBYSCORE is O(log N + M) for M returned elements
# O(N)      → LRANGE 0 -1, SMEMBERS, HGETALL (big), SORT — size of the value
# O(N) over the whole keyspace → KEYS  ← blocks the server; use SCAN

Why it's asked / follow-up: it ties data-type choice to the single-threaded model — the two most important ideas on the page. Follow-up: “how do you iterate all keys safely?” — SCAN, not KEYS (see SCAN vs KEYS).

Source: Redis command reference (each command lists its time complexity).

3 · Keys, expiration & eviction

How does key expiration actually work? Mid

You attach a TTL with EXPIRE, PEXPIRE, SET ... EX/PX, or SETEX; TTL/PTTL read it, PERSIST removes it. The subtlety interviewers probe is when the key is actually deleted. Redis uses two mechanisms together: lazy (passive) expiration — when a client touches a key, Redis checks if it's expired and drops it then — and active expiration — a background cycle that samples a batch of keys with TTLs several times a second and deletes the expired ones, repeating if it finds many. So an expired key that nobody reads is reclaimed by the sampler, not instantly at its expiry instant — which means a small amount of already-expired data can briefly sit in memory.

SET otp:42 "913820" EX 120   # expires in 120s
TTL otp:42                    # → 118 (seconds left)
PERSIST otp:42                # remove the TTL — key is now permanent
# deletion = lazy (on access) + active (background sampling), not a per-key timer
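
The two deletion paths can be modeled with an explicit clock. This is a toy, not the server's algorithm: ToyExpiringStore, the sample size, and the single-pass active cycle are simplifying assumptions.

```python
import random

class ToyExpiringStore:
    """Toy model of lazy + active expiration driven by a caller-supplied clock."""
    def __init__(self):
        self.data = {}      # key -> value
        self.expires = {}   # key -> absolute expiry time, in seconds

    def set(self, key, value, now, ex=None):
        self.data[key] = value
        if ex is None:
            self.expires.pop(key, None)   # a plain SET clears any TTL
        else:
            self.expires[key] = now + ex

    def _expired(self, key, now):
        return key in self.expires and self.expires[key] <= now

    def get(self, key, now):
        # Lazy (passive) path: check on access, delete if past expiry.
        if self._expired(key, now):
            del self.data[key], self.expires[key]
            return None
        return self.data.get(key)

    def active_cycle(self, now, sample_size=20, rng=random):
        # Active path: sample keys that carry a TTL, delete the expired ones.
        keys = list(self.expires)
        for key in rng.sample(keys, min(sample_size, len(keys))):
            if self._expired(key, now):
                del self.data[key], self.expires[key]

store = ToyExpiringStore()
store.set("otp:42", "913820", now=0, ex=120)
store.set("otp:43", "555000", now=0, ex=120)
# At t=121 both keys are logically expired, but nobody has touched them yet.
resident_before = len(store.data)
read_result = store.get("otp:42", now=121)   # lazy path removes otp:42
store.active_cycle(now=121)                  # sampler removes otp:43
resident_after = len(store.data)
```

Both keys still occupy memory at t=121 (resident_before is 2) until either a read or the sampler reaches them, which is the "briefly sits in memory" effect described above.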

Why it's asked / follow-up: the “is it deleted exactly at expiry?” nuance separates people who've read the docs from those guessing. Follow-up: “does a replica expire keys on its own?” — no; replicas wait for the primary's DEL, so a read on a replica can briefly return a logically-expired key (older versions) — the primary is the clock.

Source: Redis docs — EXPIRE (“How Redis expires keys”).

What is maxmemory, and what are the eviction policies? Mid

maxmemory caps how much RAM Redis uses; maxmemory-policy decides what happens when that cap is hit. The default is noeviction — Redis stops accepting writes and returns an error, keeping all data (right for a datastore, wrong for a cache). The eviction policies split on two axes: which keys are candidates — allkeys-* (any key) vs volatile-* (only keys with a TTL) — and how a victim is chosen — lru (least recently used), lfu (least frequently used), random, or ttl (soonest to expire). For a cache, allkeys-lru is the usual pick (or allkeys-lfu when popularity, not recency, should decide). Eviction is approximate — Redis samples a handful of keys rather than scanning all of them, so it's LRU/LFU-ish, not exact, which is a deliberate speed tradeoff.

maxmemory 2gb
maxmemory-policy allkeys-lru   # evict least-recently-used across ALL keys
# noeviction (default) → writes error out when full, nothing is dropped
# volatile-*           → only evict keys that have a TTL
maxmemory-samples 5          # approximate LRU/LFU — samples, doesn't scan
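
What "samples, doesn't scan" means can be sketched in a few lines. The function below is a simplification under stated assumptions: evict_one and its arguments are hypothetical, and the real server does more bookkeeping than picking the oldest key in a single sample.

```python
import random

def evict_one(last_used, samples=5, rng=random):
    """Approximate LRU: look at a small random sample of keys and evict the
    least recently used key within that sample only.
    `last_used` maps key -> last access time and is mutated in place."""
    candidates = rng.sample(sorted(last_used), min(samples, len(last_used)))
    victim = min(candidates, key=lambda k: last_used[k])
    del last_used[victim]
    return victim

rng = random.Random(7)                        # fixed seed for a repeatable demo
access = {f"k{i}": i for i in range(100)}     # k0 is the oldest, k99 the newest
approx_victim = evict_one(dict(access), samples=5, rng=rng)
exact_victim = evict_one(dict(access), samples=100, rng=rng)
```

With samples equal to the keyspace size the victim is always the true oldest key (k0 here); with 5 samples it is only the oldest of the five that happened to be drawn. A larger maxmemory-samples moves the result closer to exact LRU at the cost of more work per eviction.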

Why it's asked / follow-up: misconfigured eviction is a classic incident (a “cache” left on noeviction that starts erroring, or a datastore on allkeys-lru silently dropping data). Follow-up: “LRU vs LFU?” — LRU favors recently-touched keys; LFU favors frequently-touched ones, so LFU resists a one-off scan evicting your hot set.

Source: Redis docs — Key eviction.

What's the difference between expiration and eviction? Mid

They're often conflated but they're different mechanisms. Expiration is your intent: you gave a key a TTL and it should die when the clock runs out, regardless of memory pressure. Eviction is the server's reaction to being full: when maxmemory is reached, Redis removes keys per maxmemory-policy to make room — possibly deleting keys that had plenty of TTL left (or no TTL at all, under an allkeys-* policy). So a key can vanish two ways: its TTL elapsed (expiration) or Redis needed the memory (eviction). If your policy is volatile-*, only keys with a TTL are eviction candidates — which means a keyspace full of TTL-less keys can hit “out of memory” with nothing eligible to evict.

# expiration: I set a TTL; the key dies on schedule
SET cache:x "..." EX 300
# eviction: server is full → policy removes SOME key to free memory
#   allkeys-lru  → may drop cache:x even with TTL remaining
#   volatile-lru → only TTL-bearing keys are candidates (gotcha if few have TTLs)
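
The candidate rule behind the volatile-* gotcha fits in one small function. This is a toy of the selection rule only; eviction_candidates and the sample keys are hypothetical, and the sketch does not choose a victim.

```python
def eviction_candidates(policy, ttl_by_key):
    """Return the keys a policy is allowed to evict.
    `ttl_by_key` maps key -> TTL in seconds, or None for a key with no TTL."""
    if policy == "noeviction":
        return []
    if policy.startswith("allkeys-"):
        return sorted(ttl_by_key)
    if policy.startswith("volatile-"):
        return sorted(k for k, ttl in ttl_by_key.items() if ttl is not None)
    raise ValueError(f"unknown policy: {policy}")

keys = {"cache:x": 300, "user:42": None, "user:43": None}
under_allkeys = eviction_candidates("allkeys-lru", keys)
under_volatile = eviction_candidates("volatile-lru", keys)
no_ttls = eviction_candidates("volatile-lru", {"user:42": None})
```

Under volatile-lru only cache:x is eligible, and when no key carries a TTL the candidate list is empty, so a full server has nothing to evict and writes fail just as they would under noeviction.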

Why it's asked / follow-up: it's a clean check of vocabulary that trips people up under pressure. Follow-up: “you chose volatile-lru but keys still fail to write — why?” — too few keys have TTLs, so there's nothing to evict; it behaves like noeviction.

Source: Redis docs — Key eviction.

SCAN vs KEYS — why is KEYS dangerous in production? Mid

KEYS pattern walks the entire keyspace in a single blocking call, O(N) in the number of keys, and because Redis executes commands on one thread, that call stalls every other client until it finishes. On a keyspace with millions of keys that can be a multi-second freeze: a self-inflicted outage. SCAN solves it with a cursor: each call returns a small batch plus a cursor to resume from, so iteration is spread across many short calls that never block the server for long. SCAN guarantees that every key present for the whole iteration is returned at least once; keys added or deleted mid-scan may or may not appear, and a key can be returned more than once. The rule is blunt: never KEYS in production; use SCAN (and its cousins HSCAN/SSCAN/ZSCAN).

# DON'T: one blocking O(N) sweep — freezes the server
KEYS user:*
# DO: cursor-based, non-blocking, resumable
SCAN 0 MATCH user:* COUNT 200    # → cursor + batch; repeat until cursor 0
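
The client side of "repeat until cursor 0" looks roughly like the loop below. It runs against a stand-in, not a server: toy_scan is a hypothetical helper, and its cursor is a plain list offset, whereas real SCAN cursors are opaque values and COUNT is only a hint. The prefix filter is also done client-side here for brevity.

```python
def toy_scan(keys, cursor, count):
    """Stand-in for SCAN: returns (next_cursor, batch).
    A next_cursor of 0 means the iteration is complete."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(keys) else next_cursor), batch

def scan_all(keys, prefix, count=2):
    seen = set()          # dedupe: SCAN may return the same key twice
    cursor = 0
    while True:
        cursor, batch = toy_scan(keys, cursor, count)
        seen.update(k for k in batch if k.startswith(prefix))
        if cursor == 0:   # stop only when the returned cursor is 0
            break
    return sorted(seen)

# "user:1" appears twice to imitate a duplicate returned mid-iteration.
keyspace = ["user:1", "cart:9", "user:2", "user:1", "rate:ip:1.2.3.4"]
matched = scan_all(keyspace, "user:")
```

Two details carry over to real clients: an empty batch does not mean the scan is over (only cursor 0 does), and results go into a set so duplicates are harmless.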

Why it's asked / follow-up: it's the canonical “have you run Redis in prod?” tell, and it reinforces the single-threaded model. Follow-up: “can SCAN miss keys or repeat them?” — it won't miss keys that stay in place for the full scan, but it can return duplicates, so dedupe on the client.

Source: Redis docs — SCAN.

4 · The single-threaded model & performance

Is Redis single-threaded, and why is that fast? Mid

Yes for the part that matters: commands execute on a single thread, one at a time, in the order they're received. Counterintuitively that's a feature, not a bottleneck. Because everything is in memory, a command is dominated by the network round-trip, not CPU; a single thread over an epoll event loop can service hundreds of thousands of ops/second without the cost and complexity of locks. And single-threaded execution gives you a free correctness property: every command is atomic — no other command can interleave with it — which is why INCR, LPUSH, and even multi-step Lua scripts don't need explicit locking. (Modern Redis does use extra threads for a few things — background disk I/O, lazy freeing, and optional threaded network I/O — but the command that mutates your data still runs on the one main thread.)

// two clients, same instant — they can't interleave:
INCR counter   // client A → 1
INCR counter   // client B → 2   (serialized on the single thread; no lost update)
// atomicity is a byproduct of single-threaded execution, not extra locking.
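
The "no lost update without locks" property can be imitated with one worker thread draining a queue. This is an analogy, not Redis's architecture (Redis uses an event loop over sockets, not a Python queue); main_thread, client, and the counts are assumptions for the demo.

```python
import queue
import threading

commands = queue.Queue()
store = {"counter": 0}

def main_thread():
    """The only thread that touches `store`: commands run one at a time."""
    while True:
        cmd = commands.get()
        if cmd is None:          # sentinel: shut down
            break
        store["counter"] += 1    # the INCR itself; no lock is needed

def client(n):
    for _ in range(n):
        commands.put("INCR")     # clients only enqueue, they never mutate

worker = threading.Thread(target=main_thread)
worker.start()
clients = [threading.Thread(target=client, args=(1000,)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
commands.put(None)
worker.join()
final = store["counter"]
```

Four concurrent clients send 1,000 increments each and final is exactly 4000. The increments are safe because only one thread ever executes them, not because of any locking around the counter.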

Why it's asked / follow-up: it's the identity question of Redis, and the naive answer (“single-threaded, so it must be slow / can't use my cores”) is wrong. Follow-up: “so how does it use a multi-core box?” — threaded I/O for the network layer, and multiple Redis processes / a Cluster for CPU scale (see threaded I/O and Cluster).

Source: Redis docs — Redis latency & the single-threaded model; antirez on the design rationale: antirez.com.

What does threaded I/O (Redis 6.0) actually parallelize? Senior

This is the trap in “is Redis multi-threaded now?” Redis 6.0 added threaded I/O (the io-threads option), but it only parallelizes reading requests off sockets and writing responses back — the network layer. Command execution is still single-threaded. The main thread still runs your GET/SET/ZADD one at a time; the I/O threads just take the syscall-heavy socket work off its plate so it isn't the bottleneck when you're serving many connections or large values. So the atomicity guarantee is unchanged, and the answer to “does 6.0 make Redis multi-threaded?” is “only for network I/O, not for the data operations.”

io-threads 4              # 1 main thread + 3 I/O threads for socket read/write
io-threads-do-reads yes   # also thread the read path (writes are threaded by default)
# commands STILL execute on the single main thread — only I/O is parallel.

Why it's asked / follow-up: it distinguishes a headline-reader from someone who understands where the parallelism actually is. Follow-up: “when does turning it on help?” When the box is network-bound (many clients / big payloads), not when it's CPU-bound on the commands themselves. For when the feature arrived, see the 6.0 row on the version reference.

Source: Redis — Diving into Redis 6 (threaded I/O); the 6.0 row on the version reference.

How does one slow command hurt everyone, and how do you find it? Senior

Because commands run on one thread, a single O(N) command holds the thread for its whole duration and every other client waits behind it — latency for unrelated fast commands spikes. The usual culprits: KEYS *, a big HGETALL/SMEMBERS/LRANGE 0 -1 on a huge collection, a large SORT, or deleting a multi-million-element key with DEL (use UNLINK for async free). The tool for catching them is the SLOWLOG, which records commands that exceeded a microsecond threshold, plus redis-cli --latency and the LATENCY commands for measuring the event-loop stalls themselves.

CONFIG SET slowlog-log-slower-than 10000   # log commands over 10ms (10000µs)
SLOWLOG GET 10                           # the 10 most recent slow commands
$ redis-cli --latency                    # live latency sampling against the server
UNLINK huge:set                          # free a big key on a background thread, not inline
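
A little arithmetic shows why one slow command shows up as latency on unrelated fast ones. The service times below are assumed numbers for illustration, and completion_times is a hypothetical helper.

```python
def completion_times(service_times_us):
    """Commands run back to back on one thread, so each one finishes at the
    running sum of everything queued ahead of it plus its own cost."""
    done, clock = [], 0
    for cost in service_times_us:
        clock += cost
        done.append(clock)
    return done

fast = [5, 5, 5, 5]                  # four commands at 5 us each (assumed)
with_slow = [5, 2_000_000, 5, 5]     # a 2-second O(N) command is second in line
baseline = completion_times(fast)
stalled = completion_times(with_slow)
```

In the baseline the last command completes after 20 us. With the slow command in the queue, the two 5 us commands behind it complete after roughly 2 seconds, even though their own cost has not changed.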

Why it's asked / follow-up: production Redis incidents are usually “one bad command” rather than load, and this checks you know how to hunt it. Follow-up: “DEL vs UNLINK?” — DEL frees inline (can block on a giant key); UNLINK unlinks immediately and reclaims memory on a background thread.

Source: Redis docs — SLOWLOG.

What is pipelining, and when does it help? Mid

Most Redis latency is the network round-trip, not the command. If you send 1,000 commands one at a time, you pay 1,000 round-trips. Pipelining lets the client send many commands without waiting for each reply, then read all the replies at once — collapsing 1,000 round-trips into (roughly) one and often improving throughput by an order of magnitude. It changes nothing about ordering or atomicity: the server still runs the commands one at a time in order; you've just stopped paying the latency tax per command. It is not a transaction — other clients' commands can still interleave between your pipelined ones (see pipelining vs transactions).

# without pipelining: N commands = N round-trips
# with pipelining: send all N, then read all N replies (~1 round-trip)
$ redis-cli --pipe < commands.txt      # bulk-load via the pipe protocol
# client libs expose pipeline()/multi-send; replies come back in order.
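
The round-trip saving is easy to estimate with a rough cost model. The numbers are assumptions (0.5 ms round-trip, 0.01 ms of server time per command), and total_time_ms ignores bandwidth and buffering limits.

```python
def total_time_ms(n_commands, rtt_ms, exec_ms, pipelined):
    """Every command costs exec_ms on the server. An unpipelined client also
    waits one round-trip per command; a pipelined client pays roughly one
    round-trip for the whole batch."""
    round_trips = 1 if pipelined else n_commands
    return round_trips * rtt_ms + n_commands * exec_ms

one_by_one = total_time_ms(1000, 0.5, 0.01, pipelined=False)
batched = total_time_ms(1000, 0.5, 0.01, pipelined=True)
speedup = one_by_one / batched
```

Under these assumptions the unpipelined run takes about 510 ms and the pipelined one about 10.5 ms. The server did the same work in both cases; only the waiting went away.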

Why it's asked / follow-up: it's the first-reach throughput lever and it's frequently confused with transactions. Follow-up: “does pipelining guarantee the commands run together?” — no; use MULTI/EXEC or a Lua script if you need no interleaving.

Source: Redis docs — Pipelining.

5 · Persistence & durability

RDB vs AOF — how does each persistence mode work? Mid

Redis has two persistence mechanisms. RDB is a point-in-time snapshot: at configured intervals (the save points) or on BGSAVE, Redis fork()s and the child writes the whole dataset to a compact .rdb file using copy-on-write, while the parent keeps serving. It's fast to load and great for backups, but on a crash you lose every write made since the last snapshot. AOF (append-only file) is a log of write commands: every write is appended to a file that's replayed on restart, and it's periodically rewritten/compacted so it doesn't grow forever. (It is not a write-ahead log in the strict sense: Redis appends a command after executing it.) AOF loses far less on a crash, but the file is bigger and slower to load. Many production setups run both: AOF for point-of-failure recovery, RDB for fast restarts and backups.

# RDB: periodic snapshot via fork + copy-on-write
save 900 1   300 10   60 10000   # snapshot if N changes in T seconds
# AOF: append every write command to a replayable log
appendonly yes
# running both is common: AOF for recovery, RDB for fast reload + backups
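
Snapshot versus log can be shown with a toy recovery routine. Nothing here touches real files: recover, the rdb dict, and the aof list are stand-ins invented for this sketch, and only SET-style writes are modeled.

```python
def recover(snapshot, log):
    """Rebuild state: start from a snapshot copy, then replay logged writes."""
    state = dict(snapshot)
    for key, value in log:
        state[key] = value
    return state

live, aof, rdb = {}, [], {}
for i, (key, value) in enumerate([("a", 1), ("b", 2), ("a", 3), ("c", 4)]):
    live[key] = value
    aof.append((key, value))   # AOF: every write is appended to the log
    if i == 1:
        rdb = dict(live)       # RDB: one point-in-time copy, taken after 2 writes

from_rdb_only = recover(rdb, [])   # crash now: the later writes are missing
from_aof = recover({}, aof)        # replaying the log rebuilds everything
```

Recovering from the snapshot alone yields a=1, b=2 and silently loses the last two writes. Replaying the log yields the same state the live dataset had.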

Why it's asked / follow-up: it's the setup for the real question — durability — and it checks you know snapshot vs log. Follow-up: “which loads on restart if both exist?” — Redis prefers the AOF (it's the more complete record) when appendonly is on.

Source: Redis docs — Persistence (RDB & AOF).

What does appendfsync control, and what's the tradeoff? Senior

Appending to the AOF writes to the OS buffer; fsync is what actually flushes it to disk, and appendfsync sets how often that happens — the direct durability↔throughput dial. always fsyncs on every write: strongest durability (you lose essentially nothing) but the slowest, because you pay a disk sync per command. everysec (the default) fsyncs once a second: near-full throughput, and the worst case is losing up to one second of writes on a crash. no lets the OS decide when to flush: fastest, but the loss window is however long the OS buffers (can be tens of seconds). The honest interview point is that even “durable” Redis has a loss window unless you run always, and everysec is the pragmatic default almost everyone uses.

appendfsync everysec   # default — lose ≤ ~1s of writes on crash; near-full throughput
# appendfsync always → fsync per write: strongest, slowest
# appendfsync no     → OS flushes when it likes: fastest, largest loss window
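
The loss window can be made concrete with a toy timeline. The model is deliberately crude: lost_writes is a hypothetical helper, everysec is treated as an fsync at each whole multiple of the interval (real timing is less regular), and the "no" policy is left out because its window depends on the OS.

```python
def lost_writes(write_times, crash_time, policy, fsync_interval=1.0):
    """Which writes had not been fsynced when the server went down at
    crash_time? All times are in seconds."""
    if policy == "always":
        durable_up_to = crash_time    # synced on every write
    elif policy == "everysec":
        durable_up_to = (crash_time // fsync_interval) * fsync_interval
    else:
        raise ValueError("this sketch models 'always' and 'everysec' only")
    return [t for t in write_times if durable_up_to < t <= crash_time]

writes = [0.2, 0.9, 1.1, 1.6, 1.9]
lost_everysec = lost_writes(writes, crash_time=1.95, policy="everysec")
lost_always = lost_writes(writes, crash_time=1.95, policy="always")
```

With a crash at t=1.95 the everysec model loses the three writes made after the last sync at t=1.0, while always loses none. That difference is what the per-write disk sync buys.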

Why it's asked / follow-up: it's where “is Redis durable?” gets concrete — the answer is a knob, not a yes/no. Follow-up: “why not always use always?” — a synchronous disk write per command destroys the microsecond latency that's the whole point of Redis.

Source: Redis docs — Persistence (AOF fsync policy).

Is Redis durable? Can it lose data? Senior

The honest answer: Redis can persist, but by default it is not as durable as a WAL'd relational database, and yes, it can lose the most recent writes. Three loss windows stack up: (1) with appendfsync everysec you can lose up to a second of writes if the process dies before the fsync; (2) replication is asynchronous, so a primary can acknowledge a write to the client and crash before the replica receives it (see replication); and (3) with RDB-only persistence you lose everything since the last snapshot. You can tighten each — appendfsync always, WAIT for replica acks, both AOF and RDB — but you can't make Redis a zero-loss transactional store without giving up much of its speed. So the right framing in an interview is “durability is tunable, with a real tradeoff against throughput,” not “Redis persists, therefore it's safe.”

// the three loss windows to name:
//   1. fsync policy   → ≤ 1s with everysec (0 with always, slow)
//   2. async replication → primary can ACK then die before replica sees it
//   3. RDB-only       → everything since the last snapshot
// tunable, but zero-loss costs you the latency Redis exists to provide.

Why it's asked / follow-up: it's a maturity check — the wrong answer (“it has AOF so it's durable”) is exactly what interviewers listen for. Follow-up: “how do you make a single write as safe as possible?” — appendfsync always plus WAIT numreplicas timeout, accepting the latency hit.

Source: Redis docs — How durable is the append-only file?

How does a restart reload data, and can you run RDB and AOF together? Mid

On startup Redis rebuilds the dataset from disk before accepting clients. If AOF is enabled (appendonly yes), it replays the AOF, because that's the most up-to-date record; otherwise it loads the RDB snapshot. Running both is supported and common — you get RDB's fast, compact backups and AOF's smaller loss window, and modern AOF rewrite can even embed an RDB preamble (an RDB-format base plus an incremental command tail) so reload is fast and recent. If neither is enabled, a restart comes back empty — which is fine for a pure cache but a data-loss incident for a datastore, so “did you actually turn persistence on?” is a real question.

# restart precedence: AOF (if enabled) wins over RDB — it's more complete
appendonly yes
save 900 1 300 10 60 10000
# both on → fast reload + small loss window; neither on → restart is EMPTY

Why it's asked / follow-up: it catches the “restarted Redis and lost everything” footgun and confirms you know the reload order. Follow-up: “is there any downside to both?” — a little extra disk and CPU for two persistence paths; usually worth it for a datastore.

Source: Redis docs — Persistence (RDB + AOF, reload).

Why can a snapshot spike memory (the fork / copy-on-write gotcha)? Senior

RDB snapshots (and AOF rewrites) work by fork()ing a child process that writes the dataset while the parent keeps serving. The child shares the parent's memory pages via copy-on-write: unchanged pages cost nothing, but every page the parent modifies while the child is writing must be copied. On a write-heavy dataset during a long save, that copying can balloon memory — in the pathological case toward a second copy of the working set — which is the classic “Redis OOM'd during BGSAVE” incident. Mitigations: size the box with headroom (don't run at 90% of RAM if you snapshot), set vm.overcommit_memory=1 so the fork doesn't fail, disable Transparent Huge Pages (they make COW copies huge), and schedule saves for quieter windows.

# fork + copy-on-write: writes DURING the save copy pages → memory spike
# guardrails:
vm.overcommit_memory = 1          # (sysctl) let the fork succeed under pressure
# disable Transparent Huge Pages — THP makes COW copies balloon
# keep RAM headroom; a write-heavy BGSAVE can approach 2× the working set
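
Why Transparent Huge Pages make the copy-on-write cost balloon comes down to page granularity. The workload below is an assumption (1,000 small writes spread 4 MiB apart during one save), and cow_copied_bytes is a hypothetical helper.

```python
def cow_copied_bytes(write_offsets, page_size):
    """Copy-on-write copies whole pages: count the distinct pages touched by
    writes during the save, then multiply by the page size."""
    pages = {offset // page_size for offset in write_offsets}
    return len(pages) * page_size

KIB, MIB = 1024, 1024 * 1024
offsets = [i * 4 * MIB for i in range(1000)]   # scattered small writes
regular = cow_copied_bytes(offsets, 4 * KIB)   # 4 KiB pages
huge = cow_copied_bytes(offsets, 2 * MIB)      # 2 MiB huge pages
```

The same 1,000 writes copy about 4 MB with 4 KiB pages and about 2 GB with 2 MiB pages, a 512x difference caused only by the page size.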

Why it's asked / follow-up: it's a real senior/ops incident and it rewards understanding why the snapshot is cheap-until-it-isn't. Follow-up: “how do you avoid the fork stall entirely?” — snapshot on a replica instead of the primary, so the primary never forks under load.

Source: Redis docs — Persistence (fork & copy-on-write).

6 · Transactions, scripting & atomicity

What is a Redis transaction (MULTI/EXEC), and why is there no rollback? Mid

A Redis “transaction” is not what SQL means by the word. MULTI starts queuing commands; they're not executed, just buffered; EXEC then runs the whole queue atomically and in order, with no other client's command interleaving (DISCARD throws the queue away). What it does not give you is rollback: if one command fails at execution time (say you ran a list op on a key holding a string), the other commands still run and the failure is reported per-command — there's no “undo the batch.” Redis's stance is that such failures are programming errors that should be caught in development, and skipping rollback keeps the engine simple and fast. So it gives you atomicity and isolation-by-serialization, but not the “all-or-nothing on error” semantics of an ACID transaction. (Single commands are, of course, already atomic on their own.)

MULTI
DECRBY account:from 100       # queued (not run yet)
INCRBY account:to 100         # queued
EXEC                          # both run atomically, in order — no interleaving
# if one command errors at EXEC time, the others STILL run; there is no rollback.
# (a command rejected while queuing, e.g. a syntax error, makes EXEC abort the whole batch.)

Why it's asked / follow-up: the “no rollback” fact surprises people from a SQL background and is a favorite gotcha. Follow-up: “so how do I do a conditional/atomic read-modify-write?” — WATCH for optimistic locking, or a Lua script for true server-side atomic logic.

Source: Redis docs — Transactions.

What is WATCH, and how does optimistic locking work? Senior

WATCH turns a MULTI/EXEC into a compare-and-set. You WATCH one or more keys, read them, decide what to write, then MULTI…EXEC. If any watched key was modified by another client between the WATCH and the EXEC, the EXEC aborts and returns nil — your queued commands don't run — and you retry the whole read-decide-write loop. This is optimistic concurrency control: no locks are held, you just detect a conflicting change and redo. It's the right tool for “read a value, compute a new one based on it, write it back, but only if nobody else changed it,” when a plain atomic command like INCR isn't expressive enough.

WATCH stock:42                 # optimistic lock on this key
# ... read it, decide there's enough stock ...
MULTI
DECR stock:42
EXEC                          # → nil if stock:42 changed since WATCH → retry the whole thing
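
The retry loop is the part people forget, so here is a minimal in-memory model of it in Python. This is not a Redis client: `ModelStore`, `reserve_one`, and the `interfere` test hook are hypothetical names, and the per-key version counter is only a stand-in for how a watched key gets flagged when another client writes it.

```python
# In-memory model of WATCH / MULTI / EXEC (a sketch, not a Redis client).
# Every write bumps a per-key version; exec_if_unchanged() applies the queued
# write only if the version observed at watch time is still current.

class ModelStore:
    def __init__(self):
        self.data = {}
        self.version = {}

    def set(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def watch(self, key):
        """Return the value and the version observed at WATCH time."""
        return self.data.get(key), self.version.get(key, 0)

    def exec_if_unchanged(self, key, seen_version, new_value):
        """Model of EXEC: returns False (Redis returns nil) on a conflict."""
        if self.version.get(key, 0) != seen_version:
            return False
        self.set(key, new_value)
        return True


def reserve_one(store, key, max_retries=5, interfere=None):
    """Read-decide-write loop: decrement stock only if nobody raced us."""
    for attempt in range(max_retries):
        stock, seen = store.watch(key)
        if stock is None or stock <= 0:
            return False                      # nothing left to reserve
        if interfere is not None:
            interfere(attempt)                # test hook: simulate another client
        if store.exec_if_unchanged(key, seen, stock - 1):
            return True
    return False                              # gave up after repeated conflicts
```

If a rival write lands between the watch and the exec on the first attempt, that attempt is discarded and the second attempt re-reads the new value before writing. Capping the retries is a design choice: under heavy contention an unbounded loop can spin, which is one reason a Lua script is often the simpler tool.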

Why it's asked / follow-up: it's how you do safe read-modify-write on Redis, and it tests that you know Redis prefers optimistic (CAS) over pessimistic locking. Follow-up: “WATCH or Lua for this?” — Lua is often simpler because the whole read-decide-write runs atomically server-side with no retry loop (next question).

Source: Redis docs — Optimistic locking with WATCH.

When would you use a Lua script (EVAL)? Senior

A Lua script sent with EVAL (or cached and called by hash with EVALSHA) runs atomically on the server: while it executes, nothing else runs (single-threaded), so a multi-step read-decide-write is a single indivisible operation with no WATCH retry loop. Use it when the logic can't be expressed in one command, for example “get this value, and only if it equals X, delete it,” or “increment a counter and set its TTL only on first creation.” It also cuts round-trips by moving the logic next to the data. The discipline: keep scripts short, because a long script blocks the whole server; keep them deterministic on versions before 7.0 that can still replicate the script text itself (there, non-determinism can make replicas diverge, whereas since 5.0 the default is to replicate the script's effects); and pass all keys through KEYS[] so the script is Cluster-friendly.

-- atomic compare-and-delete (the safe distributed-lock release):
EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then
        return redis.call('DEL', KEYS[1])
      else return 0 end" 1 lock:job my-token
-- runs atomically server-side: no other command interleaves.

Why it's asked / follow-up: the compare-and-delete script is the correct distributed-lock release, so this feeds straight into the Redlock scenario. Follow-up: “why keep scripts fast?” — a script holds the single thread for its whole duration, so a slow one is a server-wide stall.

Source: Redis docs — Scripting with Lua (EVAL).

What are Redis Functions (7.0), and how do they differ from EVAL scripts? Mid

Redis Functions, added in 7.0, are the “grown-up” version of Lua scripting. An EVAL script is ephemeral — the client has to ship it (or its SHA) each time, and it isn't part of the dataset. A function is registered once as part of a named library, persisted in the RDB/AOF, and replicated to replicas, so it becomes a first-class, callable part of the server (invoke with FCALL). That makes server-side logic something you deploy and version like application code rather than a string every client carries. For an interview: same atomic, single-threaded execution model as Lua scripts — the difference is management (registered, named, persisted, replicated) rather than a new execution semantics.

FUNCTION LOAD "#!lua name=mylib
      redis.register_function('myfunc', function(keys, args)
        return redis.call('INCR', keys[1]) end)"
FCALL myfunc 1 counter        # call the registered function by name
# persisted + replicated, unlike an EVAL script the client ships each time.

Why it's asked / follow-up: it checks awareness of modern Redis and the scripting-vs-functions distinction. Follow-up: “when do Functions matter?” When server-side logic is shared, versioned, and deployed centrally rather than embedded in each client. Their arrival is listed on the version reference (7.0).

Source: Redis docs — Redis Functions; the 7.0 row on the version reference.

Pipelining vs a transaction — what's the difference? Mid

They solve different problems and are constantly confused. Pipelining is a network optimization: send many commands without waiting for each reply to save round-trips. It gives no atomicity — other clients' commands can interleave with your pipelined ones. A transaction (MULTI/EXEC) is an atomicity guarantee: the queued commands run together with nothing interleaving, but it doesn't by itself reduce round-trips. In practice you often combine them — pipeline a MULTI…EXEC block to get both isolation and fewer round-trips — but conceptually pipelining is about latency and MULTI is about isolation.

// pipelining   → fewer round-trips, NO atomicity (others can interleave)
// MULTI/EXEC   → atomic block, NOT primarily a round-trip optimization
// combine them → pipeline a MULTI…EXEC to get isolation AND fewer trips
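
The latency argument for pipelining is simple arithmetic, sketched below in Python. It assumes network round-trip time dominates and ignores server execution time and bandwidth. The function names and the 0.5 ms round-trip figure are illustrative assumptions.

```python
import math


def sequential_ms(commands: int, rtt_ms: float) -> float:
    """One round-trip per command: the client waits for each reply."""
    return commands * rtt_ms


def pipelined_ms(commands: int, rtt_ms: float, batch_size: int) -> float:
    """One round-trip per batch: replies are read after the whole batch is sent."""
    return math.ceil(commands / batch_size) * rtt_ms


if __name__ == "__main__":
    print(sequential_ms(1000, 0.5))        # 1000 round-trips
    print(pipelined_ms(1000, 0.5, 100))    # 10 round-trips
```

Under these assumptions, 1,000 commands at a 0.5 ms round-trip cost about 500 ms sequentially and about 5 ms in batches of 100. Nothing in that arithmetic provides isolation; the saving is purely in waiting time.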

Why it's asked / follow-up: mixing these up leads to real bugs (assuming pipelined commands are isolated). Follow-up: “which do you reach for to avoid a race between two commands?” — MULTI/EXEC (or Lua), never pipelining alone.

Source: Redis docs — Pipelining and Transactions.

7 · Replication, HA & Sentinel

How does Redis replication work, and is it synchronous? Mid

Redis uses leader–follower (primary–replica) replication: a replica connects to a primary, gets a snapshot to seed itself, then receives a continuous stream of the primary's write commands. It is asynchronous by default — the primary does not wait for replicas before acknowledging a write to the client. That's what keeps writes fast, but it's also the durability caveat: if the primary acknowledges a write and then crashes before the replica received it, that write is lost even though the client thought it succeeded. Replicas are read-only by default and are used for read scaling and failover, not for absorbing writes. Replication is also the mechanism under Sentinel and Cluster failover.

# a replica follows a primary; the stream is asynchronous
replicaof 10.0.0.1 6379       # this node replicates from that primary
INFO replication              # role:master/slave, connected_slaves, offsets
# async: primary ACKs the client BEFORE replicas confirm → possible loss on failover

Why it's asked / follow-up: the async default is the root of Redis's “lost my write on failover” stories, so interviewers want you to name it. Follow-up: “how do you reduce that risk?” — WAIT for replica acks and min-replicas-to-write (next questions), accepting the latency cost.

Source: Redis docs — Replication.

What does WAIT guarantee — and what doesn't it? Senior

WAIT numreplicas timeout blocks until your preceding writes have been acknowledged by at least numreplicas replicas (or the timeout elapses), and returns how many acked. It converts the fire-and-forget async model into something closer to synchronous replication for the writes you care about, shrinking the “primary acked then died” loss window. But it is not full durability or consensus: it confirms replicas received the data in memory, not that they fsynced it to disk; it doesn't prevent a split-brain by itself; and if it times out you have to decide what to do. It's a strong tool for “don't ack this critical write until a replica has it,” not a substitute for a real consensus store when you need strict correctness.

SET order:9 "paid"
WAIT 1 200     # block until ≥1 replica has it, or 200ms — returns # of acks
# confirms in-memory receipt, NOT fsync-to-disk and NOT consensus.

Why it's asked / follow-up: it's a nuanced tool people over-trust; naming its limits is the senior signal. Follow-up: “does WAIT 1 mean the write can never be lost?” — no; the replica could crash before persisting, and a partition can still cause edge-case loss.

Source: Redis docs — WAIT.

What is Redis Sentinel, and how does failover work? Senior

Sentinel is Redis's high-availability system for a non-clustered primary–replica setup. You run several Sentinel processes that monitor the primary and replicas; when enough Sentinels agree the primary is unreachable (a quorum declares it objectively down), they elect a leader Sentinel, promote a replica to primary, reconfigure the other replicas to follow it, and update clients — clients ask a Sentinel “who's the primary right now?” rather than hardcoding an address. The quorum-of-Sentinels design is what avoids a single monitor being fooled by a network blip. It gives you automatic failover, monitoring, and service discovery without the resharding machinery of Cluster.

# sentinel watches a primary named "mymaster"; 2 sentinels must agree it's down
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
# on failover: quorum → elect leader sentinel → promote a replica → repoint clients

Why it's asked / follow-up: HA is a standard senior topic and Sentinel is the answer for a single-shard deployment. Follow-up: “Sentinel or Cluster?” — Sentinel = HA for one dataset that fits on one node; Cluster = HA plus horizontal sharding across many nodes (see Cluster vs Sentinel).

Source: Redis docs — High availability with Redis Sentinel.

Can you scale reads with replicas, and what are the caveats? Mid

Yes — replicas are read-only copies, so pointing read traffic at them scales read throughput and offloads the primary (this is also where you run backups/snapshots to avoid forking the primary). Two caveats to name. First, staleness: because replication is asynchronous, a replica can lag the primary, so a read-your-own-write can return a value that's a beat behind — fine for a cache, wrong for “did my just-placed order show up.” Second, replication is not a backup: a bad DEL or FLUSHALL replicates instantly to every replica, so replicas protect against node failure, not against a mistake or corruption — you still need real backups (RDB files, off-box). For write scaling, replicas don't help at all; that's what Cluster is for. And min-replicas-to-write lets the primary refuse writes when too few replicas are connected, trading availability for a smaller loss window.

# reads scale on replicas; writes do NOT (that's Cluster's job)
min-replicas-to-write 1       # primary rejects writes if <1 replica is connected
min-replicas-max-lag  10      # ...and only counts replicas within 10s of lag
# replicas ≠ backups: a bad FLUSHALL replicates everywhere instantly.

Why it's asked / follow-up: “just add replicas” is a common half-right answer; the staleness and not-a-backup caveats are the depth. Follow-up: “how do you get a consistent read?” — read from the primary, or accept eventual consistency on replicas.

Source: Redis docs — Replication (read replicas & min-replicas-to-write).

8 · Redis Cluster & scaling

How does Redis Cluster shard data across nodes? Senior

Redis Cluster (production-ready since 3.0) shards the keyspace across nodes using 16384 hash slots. Every key is mapped to a slot by CRC16(key) mod 16384, and each primary node owns a contiguous-ish range of slots; add or remove nodes and you move slots (and their keys) between them. There is no central proxy — the cluster is peer-to-peer, nodes gossip topology, and clients learn which node owns which slot. If a client asks a node for a key it doesn't own, the node replies with a MOVED redirect to the right node, and a good client caches the slot map so it goes direct next time. Each slot range also has replicas, so Cluster gives you sharding and HA together.

# 16384 fixed slots; key → slot → owning node
slot = CRC16(key) % 16384
CLUSTER KEYSLOT user:42          # which slot this key hashes to
# a keyed command sent to a node that doesn't own the slot is redirected:
(error) MOVED 3999 10.0.0.2:6379   # "not my slot, ask that node"
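
The key-to-slot mapping can be reproduced outside Redis. The sketch below is a plain-Python CRC16 in the XMODEM variant (polynomial 0x1021, initial value 0), which is the variant the Cluster specification names. The function names are my own, and this version deliberately hashes the whole key and ignores hash tags.

```python
SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot for a key, hashing the whole key (hash tags not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % SLOTS


if __name__ == "__main__":
    print(key_slot("foo"))
```

For keys without braces, the result can be checked against `CLUSTER KEYSLOT` on a real server. Because 16384 is a power of two, the modulo simply keeps the low 14 bits of the checksum.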

Why it's asked / follow-up: the “16384 slots / CRC16” mechanism is a signature Redis fact and often garbled on the open web. Follow-up: “why 16384 and not a consistent-hash ring?” — a fixed slot count makes the slot→node map tiny to gossip and slot migration explicit and simple.

Source: Redis docs — Cluster specification; Cluster's arrival on the version reference (3.0).

What are hash tags, and why do multi-key ops fail across slots? Senior

The catch with slot-based sharding: a multi-key command (MGET, SINTER, a MULTI touching several keys, a Lua script with several KEYS) only works if all its keys live in the same slot — a node can't atomically operate on keys it doesn't own. Cross-slot multi-key calls return a CROSSSLOT error. The fix is hash tags: if a key contains a substring in curly braces, only that substring is hashed. So {user:42}:cart and {user:42}:orders both hash on user:42 and land in the same slot, letting you run multi-key ops on a logical entity's keys. The design lesson: co-locate keys you need to touch together by design, with a hash tag — don't assume cross-key atomicity in Cluster.

# different slots → cross-slot multi-key op FAILS:
MGET user:42:cart user:42:orders   (error) CROSSSLOT
# hash tag {…} forces the same slot (only the text inside the braces is hashed):
MGET {user:42}:cart {user:42}:orders   # OK — same slot
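
The hash-tag rule has a few edge cases worth seeing in code. This Python sketch follows the rule as the Cluster specification states it: use the text between the first `{` and the first `}` after it, but only if that text is non-empty. The function name `hashed_part` is hypothetical.

```python
def hashed_part(key: str) -> str:
    """Return the portion of the key that is fed to the slot hash."""
    start = key.find("{")
    if start == -1:
        return key                      # no opening brace: hash the whole key
    end = key.find("}", start + 1)
    if end == -1 or end == start + 1:
        return key                      # no closing brace, or empty "{}": whole key
    return key[start + 1:end]


if __name__ == "__main__":
    for k in ("{user:42}:cart", "{user:42}:orders", "foo{}{bar}", "foo{{bar}}zap"):
        print(k, "->", hashed_part(k))
```

Only the first brace pair counts, so `foo{bar}{zap}` hashes on `bar`, and an empty first pair disables the tag entirely. That makes it easy to apply a tag by accident when keys contain user-supplied braces.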

Why it's asked / follow-up: the cross-slot limitation surprises people migrating a single-node app to Cluster, and hash tags are the non-obvious fix. Follow-up: “downside of over-using one hash tag?” — you pin many keys to one slot/node, creating a hot slot and uneven load.

Source: Redis docs — Cluster spec: hash tags.

How does resharding work, and what are MOVED vs ASK? Senior

Scaling a cluster means moving slots (and the keys in them) from one node to another — done live, slot by slot, while the cluster keeps serving. The two redirects encode the difference between a settled and an in-flight slot. MOVED means “this slot permanently belongs to node X” — the client should update its slot map and always go there. ASK means “this slot is migrating; for this one request try node X (with an ASKING prelude), but don't change your map” — it's a temporary redirect during migration so a key that's already moved is still reachable. A cluster-aware client handles both transparently; the interview point is that resharding is online and the two redirects mean permanent-vs-transient.

# MOVED → permanent: update the slot map, always go to that node
(error) MOVED 3999 10.0.0.2:6379
# ASK → transient (slot mid-migration): ASKING + retry there, keep your map
(error) ASK 3999 10.0.0.3:6379
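
What a cluster-aware client does with the two redirects can be sketched in a few lines of Python. This is a simplified model, not how any particular client library is written: `handle_redirect` is a hypothetical name, the slot map is a plain dict, and real clients often refresh the whole map after a MOVED rather than patching one slot.

```python
def handle_redirect(slot_map: dict, error: str):
    """Return (target_node, send_asking_first) for a MOVED/ASK error string."""
    parts = error.split()
    if len(parts) != 3:
        raise ValueError(f"not a redirect: {error!r}")
    kind, slot, node = parts
    if kind == "MOVED":
        slot_map[int(slot)] = node      # permanent: remember the new owner
        return node, False
    if kind == "ASK":
        return node, True               # transient: retry there, map untouched
    raise ValueError(f"not a redirect: {error!r}")


if __name__ == "__main__":
    slots = {3999: "10.0.0.1:6379"}
    print(handle_redirect(slots, "MOVED 3999 10.0.0.2:6379"), slots)
    print(handle_redirect(slots, "ASK 3999 10.0.0.3:6379"), slots)
```

The asymmetry is the point: MOVED mutates the client's view of the cluster, while ASK only affects the single retried request, which must be preceded by ASKING.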

Why it's asked / follow-up: MOVED vs ASK is a classic “do you really understand Cluster” discriminator. Follow-up: “does resharding cause downtime?” — no, it's online per-slot; individual keys mid-move are briefly served via ASK.

Source: Redis docs — Cluster spec: redirection & resharding.

When do you actually need Cluster, vs a single primary with replicas (or Sentinel)? Senior

Don't reach for Cluster by default — it adds real constraints (cross-slot limits, hash-tag planning, a cluster-aware client). You need Cluster when a single node can't hold the working set in RAM, or when write throughput exceeds what one primary's single thread can do — i.e. you need to shard. If your data fits on one node and you only need high availability + read scaling, a single primary with replicas fronted by Sentinel is simpler and keeps full multi-key/transaction/Lua semantics. Rule of thumb: replicas scale reads and give HA; Cluster scales writes and memory by sharding. Reach for the least machinery that solves your actual bottleneck.

// fits in RAM + need HA / read scale   → primary + replicas + Sentinel
// data > one node's RAM, or write-bound → Redis Cluster (sharded)
// replicas scale READS; Cluster scales WRITES + MEMORY.

Why it's asked / follow-up: over-reaching for Cluster is a common architecture mistake; the discriminator is “do you actually need to shard?” Follow-up: “what do you give up in Cluster?” — effortless multi-key ops — you must co-locate related keys with hash tags (see hash tags).

Source: Redis docs — Scale with Redis Cluster.

9 · Caching patterns & messaging

Cache-aside vs read-through vs write-through vs write-behind — what's the difference? Mid

Cache-aside (lazy loading) is the default and the one you'll usually name: the application checks Redis first, and on a miss reads the database, writes the value into Redis with a TTL, and returns it. Read-through is the same flow but the cache library (not your code) does the DB fetch on a miss. On writes: write-through writes to the cache and the database synchronously (consistent, but every write pays both); write-behind (write-back) writes to the cache and flushes to the database asynchronously (fast writes, but a crash can lose not-yet-flushed data). Cache-aside plus write-through is the common, safe pairing; write-behind trades durability for write speed and is used deliberately.

// cache-aside read:
v = GET key
if v == nil:               // miss
    v = db.query(...)
    SET key v EX 300       // populate with a TTL, then return
// write-through: SET key + db.write() synchronously (consistent, 2× write cost)
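
For reference, the cache-aside read path can be sketched as runnable Python. The cache here is a dict-backed stand-in for `GET` and `SET ... EX`, not a Redis client, and `TTLCache`, `cache_aside_get`, and `load_from_db` are hypothetical names. One simplification to be aware of: a `None` from the database is treated as a miss and re-queried every time.

```python
import time


class TTLCache:
    """Dict-backed stand-in for Redis GET / SET ... EX (a model, not a client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}                  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]          # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ex):
        self._items[key] = (value, self._clock() + ex)


def cache_aside_get(cache, key, load_from_db, ttl=300):
    """Check the cache first; on a miss, load, populate with a TTL, return."""
    value = cache.get(key)
    if value is None:                     # miss: fall through to the source of truth
        value = load_from_db(key)
        cache.set(key, value, ex=ttl)
    return value
```

The application owns the whole flow here, which is what distinguishes cache-aside from read-through, where a library would hide the `load_from_db` call.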

Why it's asked / follow-up: caching strategy is a bread-and-butter system-design question and these are the standard vocabulary. Follow-up: “which does Redis do natively?” — Redis is just the store; the pattern lives in your application (or a caching library), not in Redis itself.

Source: Redis docs — Client-side patterns.

How do you handle TTLs and cache invalidation? Mid

The two hard parts of caching are getting stale data out and not letting everything expire at once. Invalidation has two styles: TTL-based (let the value expire and re-populate — simple, but you serve stale data up to the TTL) and explicit (delete or overwrite the key when the source changes — fresher, but you have to know every write path). Most systems use both: a TTL as a safety net plus explicit deletes on known updates. A key discipline is to give cache entries a TTL always — a cache full of TTL-less keys is a memory leak — and to jitter the TTLs (add a random spread) so a batch of keys created together doesn't all expire in the same second and stampede the database.

# always TTL a cache entry; add jitter so keys don't expire in lockstep
SET product:9 "..." EX 3600          # base TTL...
# ...+ random 0–300s in app code → 3600–3900s, spreads expiry
# explicit invalidation on write: DEL product:9 (or overwrite it)
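
The jitter itself lives in application code, so a minimal Python sketch may help. The function name and the 3600 s base with a 300 s spread mirror the example above and are otherwise arbitrary. A uniform spread is one reasonable choice, not the only one.

```python
import random


def jittered_ttl(base_seconds: int, spread_seconds: int, rng=random) -> int:
    """Base TTL plus a uniform random offset in [0, spread_seconds]."""
    return base_seconds + rng.randint(0, spread_seconds)


if __name__ == "__main__":
    # Ten keys created together now expire across a 5-minute band.
    print(sorted(jittered_ttl(3600, 300) for _ in range(10)))
```

The value returned here is what you would pass as the `EX` argument when populating the cache entry.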

Why it's asked / follow-up: “there are only two hard things… cache invalidation” is a genuine source of bugs, and TTL-hygiene is a maturity tell. Follow-up: “why jitter?” — to prevent synchronized expiry → a thundering herd hitting the DB at once (next question).

Source: Redis docs — Keyspace & key expiration.

What is a cache stampede (thundering herd), and how do you prevent it? Senior

A stampede happens when a hot key expires (or the cache is cold) and, in the gap before it's repopulated, many concurrent requests all miss and all hit the database at once — potentially overwhelming it exactly when the cache was supposed to protect it. Mitigations, usually combined: (1) a mutex / lock so only one request recomputes the value while the others wait or serve stale (a short SET lock NX is the gate); (2) TTL jitter so keys don't expire in lockstep; (3) early / probabilistic recomputation — refresh a hot key before it expires so it's never actually absent; and (4) serving slightly stale data while a single background refresh runs. The idea is to ensure exactly one recompute per key, not thousands.

# only ONE request recomputes; others wait / serve stale
SET lock:recompute:product:9 <token> NX EX 10   # winner gets the lock
#   winner  → query DB, repopulate the cache, release the lock only if it still holds <token>
#   losers  → brief wait + reread, or serve the last-known value
# + jitter TTLs and refresh hot keys early so they never fully expire.

Why it's asked / follow-up: it's a real outage pattern and a favorite senior scenario. Follow-up: “what if the recompute itself is slow?” — serve stale-while-revalidate so users never wait on the miss, and cap lock duration so a dead worker doesn't wedge the key.

Source: Redis docs — Distributed locks (single-request recompute).

Pub/Sub vs Streams vs a list as a queue — when do you use each? Senior

These are Redis's three messaging shapes and mixing them up is a classic error. Pub/Sub is fire-and-forget: PUBLISH pushes to whoever is subscribed right now; there's no persistence and no replay, so a subscriber that's offline or slow simply misses messages (at-most-once). Great for live fan-out (presence, notifications) where a dropped message is fine. Streams (added in 5.0) are a persistent append-only log with IDs and consumer groups: messages are retained, consumers track their position, work is split across a group, and unacknowledged messages can be reclaimed — so you get durable, at-least-once processing and replay. A plain list with LPUSH/BRPOP is the simplest job queue when you don't need groups or replay. Rule: pub/sub for ephemeral broadcast, Streams for durable/queued work, lists for a basic queue.

# pub/sub — no persistence, only live subscribers get it (at-most-once):
PUBLISH chan:alerts "deploy done"
# stream — persistent log + consumer groups (durable, at-least-once):
XADD events * type signup uid 42
XREADGROUP GROUP workers c1 COUNT 10 STREAMS events >   # then XACK
# list — simplest queue:  LPUSH jobs "…"  /  BRPOP jobs 0

Why it's asked / follow-up: reaching for pub/sub where you needed durability is a real production bug, so this tests that you know the persistence difference. Follow-up: “Streams vs Kafka?” — conceptually similar (a persistent log with consumer groups), but Kafka is a distributed, partitioned, disk-first streaming platform built for high-volume pipelines; see the Kafka interview questions. Streams landed on the version reference (5.0).

Source: Redis docs — Streams and Pub/Sub.

10 · Design & scenario questions

How would you build a rate limiter with Redis? Senior

The simplest is a fixed-window counter: a key per (user, window) that you INCR, setting a TTL equal to the window on first hit; if the count exceeds the limit, reject. It's O(1) and trivial, but it allows a burst at the window boundary (up to 2× the limit across two adjacent windows). A sliding-window log fixes that with a sorted set per user: add each request's timestamp as a score, drop timestamps older than the window with ZREMRANGEBYSCORE, then count what remains — smooth but more memory. For correctness under concurrency, do the check-and-increment atomically (a tiny Lua script), so two requests can't both read “under the limit” and both pass. Token-bucket variants (a count + a refill timestamp) are the other common answer.

# fixed window: N requests per 60s per user
INCR rate:u42:<window>
# if the return value == 1 (first hit): EXPIRE rate:u42:<window> 60
# if value > limit → reject.  Do the INCR+EXPIRE+check atomically in Lua.
# sliding window: ZADD reqs:u42 <ts> <unique-request-id> ; ZREMRANGEBYSCORE reqs:u42 0 <ts-60s> ; ZCARD reqs:u42
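
The boundary-burst difference between the two designs is easy to demonstrate with an in-memory model. The Python sketch below is not Redis code: the class names are hypothetical, a dict and a deque stand in for the counter key and the sorted set, timestamps are assumed to arrive in non-decreasing order, and the sliding-window variant records only admitted requests (a choice, not the only option).

```python
from collections import defaultdict, deque


class FixedWindowLimiter:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.counts = defaultdict(int)           # (user, window index) -> hits

    def allow(self, user, now):
        bucket = (user, int(now // self.window))
        self.counts[bucket] += 1                 # models INCR (TTL cleanup omitted)
        return self.counts[bucket] <= self.limit


class SlidingWindowLimiter:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)           # user -> timestamps, oldest first

    def allow(self, user, now):
        q = self.hits[user]
        while q and q[0] <= now - self.window:   # models ZREMRANGEBYSCORE
            q.popleft()
        if len(q) >= self.limit:                 # models ZCARD + limit check
            return False
        q.append(now)                            # models ZADD
        return True
```

With a limit of 5 per 60 s, five requests at t=59 followed by five at t=60 are all admitted by the fixed-window model, because they fall into two different buckets. The sliding-window model admits only the first five. In Redis, each `allow` body would need to run atomically, for example inside one Lua script.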

Why it's asked / follow-up: rate limiting is one of the most common “design with Redis” prompts and it exercises TTLs, sorted sets, and atomicity at once. Follow-up: “fixed vs sliding window tradeoff?” — fixed is O(1) and cheap but allows edge bursts; sliding is smooth but stores per-request timestamps.

Source: Redis docs — Keyspace / counters; sorted sets: ZSET reference.

How would you design a session store with Redis? Mid

Store each session as a hash keyed by the session id (session:<id>) with fields for user id, roles, CSRF token, etc., and give the key a TTL equal to the session lifetime. On each request you read the hash by id (O(1)) and, for a rolling/idle timeout, refresh the TTL with EXPIRE so active sessions stay alive and idle ones expire on their own — expiration is your session-cleanup mechanism, no cron needed. Redis is a natural fit here: sessions are short-lived, fit easily in memory, tolerate the small durability window (a lost session just means re-login), and benefit from being shared across all app servers (any node reads the same session).

HSET session:abc uid 42 role admin csrf x9f2
EXPIRE session:abc 1800          # 30-min TTL
# each request: HGETALL session:abc, then EXPIRE to slide the idle timeout
# TTL expiry = automatic session cleanup; no sweeper job needed

Why it's asked / follow-up: sessions are the textbook “why Redis” use case — it ties hashes, TTLs, and the shared-across-servers benefit together. Follow-up: “what if you must never lose a session?” — then it's not really session data; back it with a durable store, or accept re-login as the failure mode.

Source: Redis docs — Hashes (session pattern).

How do you build a distributed lock — and is Redlock safe? Senior

The single-instance lock is SET lock:job <random-token> NX PX 30000 — NX so only one client can acquire it, PX so it auto-expires if the holder dies. To release, you must check the token before deleting (via a Lua compare-and-delete), so you never release a lock that already expired and got re-acquired by someone else. That covers most needs. Redlock extends this across N independent Redis nodes: acquire the lock on a majority (e.g. 3 of 5) within the validity time, and you hold it. Whether Redlock is safe is a genuine, still-debated dispute: Martin Kleppmann argues it's unsafe for correctness because it assumes bounded clocks and pauses — a GC pause or clock skew can let two clients believe they hold the lock — and that for correctness you need a fencing token checked by the protected resource. Salvatore Sanfilippo (antirez) responds that Redlock's assumptions are reasonable for its intended use and that it targets efficiency and practical mutual exclusion, not a formal consensus guarantee. Both largely agree on the bottom line: if you need correctness under all failures, use a real consensus system (ZooKeeper/etcd) and fencing tokens; if you're avoiding duplicate work and can tolerate rare double-execution, a Redis lock is pragmatic. Do not present Redlock as a guaranteed-safe default.

# single-instance lock: unique token + auto-expiry
SET lock:job <token> NX PX 30000
# release ONLY if it's still your token (atomic, in Lua):
EVAL "if redis.call('GET',KEYS[1])==ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end" 1 lock:job <token>
# Redlock = majority of N nodes. Contested for correctness — use fencing
# tokens + a consensus store when correctness must hold under GC/clock faults.
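
The fencing-token idea from the critique can be modeled in a few lines of Python. Everything here is hypothetical: `LockService` and `FencedResource` are illustrative names, the token source is a simple in-process counter, and the sketch deliberately says nothing about how a real lock service would issue such tokens.

```python
class LockService:
    """Hands out a monotonically increasing token with each lock grant."""

    def __init__(self):
        self._next = 0

    def acquire(self) -> int:
        self._next += 1
        return self._next


class FencedResource:
    """Protected resource that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token: int, value) -> bool:
        if token < self.highest_token:
            return False              # an older lock holder woke up late
        self.highest_token = token
        self.value = value
        return True


if __name__ == "__main__":
    locks, resource = LockService(), FencedResource()
    t1 = locks.acquire()              # client A, then A pauses past its lease
    t2 = locks.acquire()              # client B acquires after the expiry
    print(resource.write(t2, "from B"), resource.write(t1, "from A"))
```

The check lives in the resource, not in the lock, so it holds even when the paused client still believes it owns the lock. Equal tokens are accepted so that one holder can issue several writes.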

Why it's asked / follow-up: it's the hardest common Redis question and the one most often answered badly (selling Redlock as bulletproof). Presenting both sides is the senior signal. Follow-up: “what's a fencing token?” — a monotonically increasing number handed out with the lock; the protected resource rejects any write carrying a stale token, which closes the “paused holder” hole regardless of the lock service.

Sources (both sides): Redis docs — Distributed locks (the Redlock spec) and Martin Kleppmann — “How to do distributed locking” (the critique).

When would you not use Redis? Senior

The mature answer names Redis's limits rather than its strengths. Skip it (or don't make it the primary store) when: (1) the dataset far exceeds RAM — Redis is memory-first, and paging a huge dataset through it is the wrong tool (reach for a disk-based database); (2) you need strong durability or strict consistency — the default loss window and async replication make it a poor fit for “a committed write must never vanish” (a WAL'd RDBMS or a consensus store is right); (3) you need rich ad-hoc queries, joins, or complex transactions — Redis has no query planner, so anything beyond known-key access gets awkward (use a relational or document database); or (4) you need long-term analytical storage over huge history (a warehouse/OLAP system). Redis shines as a fast, in-memory layer for known access patterns — cache, session store, rate limiter, leaderboard, queue — not as a general-purpose system of record.

// reach for something else when:
//   data ≫ RAM                        → disk-first database
//   must-never-lose-a-write / strict  → WAL'd RDBMS / consensus store
//   ad-hoc queries, joins, reporting  → relational / document DB / warehouse
// Redis = fast layer for KNOWN access patterns, not a general system of record.

Why it's asked / follow-up: knowing when not to reach for your favorite tool is a seniority marker, and Redis fans over-apply it. Follow-up: “can't persistence make it a system of record?” — it can get closer, but at a latency cost that erodes the reason you chose Redis, and RAM still caps the dataset.

Source: Redis docs — Get started (use cases & limits).

Redis vs Memcached — when would you pick each? Mid

Both are in-memory key-value caches, so the honest answer is “they overlap, but Redis does more.” Memcached is deliberately minimal: opaque string/blob values, no persistence, no replication, and a multi-threaded design that can be simpler to saturate on a big multi-core box for a pure get/set cache. Redis adds rich data structures (lists, sets, sorted sets, hashes, streams), optional persistence, replication and Cluster, pub/sub, transactions, and Lua — so it can be a cache and a datastore, message broker, leaderboard, or rate limiter. Rule of thumb: if you want the simplest possible volatile cache of opaque blobs, Memcached is fine; if you want data structures, persistence, or anything beyond get/set, Redis. In practice most teams standardize on Redis because it subsumes the Memcached use case.

// Memcached → minimal, multi-threaded, opaque values, no persistence
// Redis     → data structures + persistence + replication/Cluster + pub/sub + Lua
// pure blob cache → either; anything richer → Redis

Why it's asked / follow-up: it's a standard “compare the tools” question and it checks you can articulate Redis's superset advantage without dismissing Memcached. Follow-up: “isn't multi-threaded Memcached faster?” — for a saturated pure-blob cache on many cores it can be simpler to scale, but Redis's threaded I/O and the option to run multiple instances / a cluster close most of the gap.

Source: Redis docs — FAQ (Redis vs other stores).

Is Redis open source? What is Valkey? Mid

The answer is version-dependent, which is exactly why it comes up. Redis was BSD-licensed (open source) for its first fifteen years; in March 2024 it relicensed to a dual source-available model (RSALv2 / SSPLv1) that the OSI does not recognize as open source; then in May 2025, Redis 8.0 added the OSI-approved AGPLv3 as a third option, so current Redis is open source again. The 2024 relicensing prompted a community BSD fork, Valkey (Linux Foundation), which several distributions and cloud providers adopted. This is a live, still-settling licensing story, so the interview-safe move is to state the version-by-version facts neutrally rather than take a side — the full timeline, license texts, and the two boundary dates are laid out on the Redis version reference's license callout.

// license by era (see the versions page for the full timeline):
//   ≤ 7.2   BSD              → open source
//   7.4     RSALv2 / SSPLv1  → source-available (not OSI open source)
//   8.0+    + AGPLv3         → open source again (tri-licensed)
// Valkey = the 2024 community BSD fork of Redis.

Why it's asked / follow-up: it's genuinely confusing and a current-events check; the trap is answering with one license as if it were timeless. Follow-up: “Redis or Valkey?” — that's a licensing/governance and ecosystem call, not a technical one; keep it neutral and point to the facts on the versions page.

Source: the Redis version reference — “Is Redis open source?” and redis.io/legal.

What are the common Redis production pitfalls? Senior

A grab-bag interviewers use to see how much prod experience you have. The usual suspects: keys with no TTL in a cache (a slow memory leak until you OOM or evict); KEYS in production (an O(N) blocking sweep — use SCAN); big keys (a multi-million-element collection makes every op on it slow and its deletion a stall — use UNLINK and consider splitting it); hot keys (one key taking a disproportionate share of traffic, which even Cluster can't spread since it lives in one slot); using pub/sub where you needed durability (offline subscribers silently miss messages — use Streams); and unbounded data growth (lists/sets/streams that only ever grow — trim them). Under it all: forgetting the single-threaded model, so any O(N) command or giant value stalls everyone.

// the checklist:
//   • every cache key has a TTL         • never KEYS in prod (use SCAN)
//   • watch for big keys / hot keys     • UNLINK (not DEL) huge keys
//   • pub/sub ≠ durable (use Streams)   • trim ever-growing lists/streams
//   • remember: one slow command stalls the whole server
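The hot-key point can be made concrete with the slot calculation from the Cluster specification: a key's slot is the CRC16 of the key (or of its hash tag, if it has one) modulo 16384, so a single key always maps to exactly one slot and therefore one node. This is a minimal Python sketch. It assumes that binascii.crc_hqx with an initial value of 0 is the same CRC16 variant (XMODEM) the specification names. key_slot and the leaderboard key names are illustrative, not part of Redis or any client library.

```python
import binascii

HASH_SLOTS = 16384  # fixed slot count in Redis Cluster


def key_slot(key: str) -> int:
    """Return the Cluster hash slot for a key (illustrative helper)."""
    data = key.encode("utf-8")
    # Hash tag rule: if the key has a non-empty {...} section, only the
    # text between the first '{' and the next '}' is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # Assumption: crc_hqx with initial value 0 is the CRC16 (XMODEM)
    # variant named in the Cluster specification.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


if __name__ == "__main__":
    # One hot key means one slot: every request for it hits the same node.
    print("leaderboard:global ->", key_slot("leaderboard:global"))
    # Suffixed copies are separate keys, so they can land on other slots.
    for shard in range(4):
        name = f"leaderboard:global:{shard}"
        print(name, "->", key_slot(name))
```

One common mitigation that follows from this is to keep several suffixed copies of a hot read-mostly value and have readers pick one at random. Whether that fits depends on how stale a copy is allowed to be.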

Why it's asked / follow-up: it separates “I've read about Redis” from “I've been paged for Redis.” Follow-up: “how do you find a big or hot key?” Use redis-cli --bigkeys and --hotkeys (the latter only works when maxmemory-policy is an LFU policy), MEMORY USAGE for a specific key, and SLOWLOG for the slow operations those keys cause.
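Two of the checks above (keys with no TTL, and big keys) can also be scripted. This is a minimal Python sketch, not a production tool. audit_keys is a hypothetical helper, and it assumes the client exposes scan_iter, ttl, and memory_usage with the same shapes as the redis-py client, including ttl returning -1 for a key that exists but has no expiry.

```python
from typing import Any, List, Tuple


def audit_keys(client: Any, pattern: str = "*", big_bytes: int = 1_000_000,
               batch: int = 500) -> Tuple[List[Any], List[Tuple[Any, int]]]:
    """Walk the keyspace with SCAN (never KEYS) and flag two pitfalls.

    Returns (keys_without_ttl, big_keys), where big_keys is a list of
    (key, bytes) pairs sorted largest first.
    """
    seen = set()   # SCAN may return the same key more than once
    no_ttl = []
    big = []
    for key in client.scan_iter(match=pattern, count=batch):
        if key in seen:
            continue
        seen.add(key)
        # Assumption: -1 means "exists, no expiry" (-2 means "gone").
        if client.ttl(key) == -1:
            no_ttl.append(key)
        # MEMORY USAGE yields nothing if the key vanished mid-scan.
        size = client.memory_usage(key)
        if size is not None and size >= big_bytes:
            big.append((key, size))
    big.sort(key=lambda item: item[1], reverse=True)
    return no_ttl, big
```

With redis-py installed, a call might look like audit_keys(redis.Redis(decode_responses=True), pattern="cache:*"). The sketch makes two round trips per key, so on a large keyspace it would be worth batching the calls in a pipeline and pointing it at a replica rather than the primary.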

Source: Redis docs — Latency troubleshooting and SCAN.

Every answer links its primary source inline — the Redis documentation (the data-types reference, persistence, replication, the Cluster specification, scripting, and the patterns docs) and the command reference (per-command semantics and complexity). The single-threaded-design rationale is attributed to antirez (Salvatore Sanfilippo); the distributed-lock answer cites both the Redis Redlock spec and Martin Kleppmann's critique so the debate is presented two-sided, not settled. The questions are a curated set of the topics a Redis interviewer commonly covers — the data model, data types, keys and eviction, the single-threaded model, persistence, transactions and scripting, replication and Sentinel, Cluster, caching and messaging patterns, and design scenarios — not a copy of any question bank. This page covers Redis the open-source in-memory data-structure server; feature-arrival and licensing details cross-link to the Redis version reference. Last updated July 2026.

Mungomash LLC · More on Redis