January 3, 2025

Redis Production Safety Checklist for Teams

Corbin

Everyone has a Redis horror story. The intern who ran FLUSHALL on production. The senior dev who deleted the wrong keys with a wildcard pattern. The automated script that wiped session data during peak hours.

Redis doesn't ask "are you sure?" It just does what you tell it. Fast.

Here's how to not become the next cautionary tale.

The Danger Zone

Redis commands that can ruin your day:

| Command | Damage Potential |
|---|---|
| FLUSHALL | Wipes every database. Everything. Gone. |
| FLUSHDB | Wipes the current database |
| KEYS * | Blocks the server on large datasets |
| DEL fed by a wildcard match | Deletes more than you intended |
| CONFIG SET | Can break replication, persistence |
| DEBUG SEGFAULT | Crashes the server (yes, this exists) |

The scary part? These commands execute instantly. No transaction log. No recycle bin. No "undo" button.

Checklist: Before You Connect

1. Know Your Environment

Before running any command, verify:

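# Which server am I connected to?
redis-cli INFO server | grep -E "redis_version|tcp_port|config_file"

# Is it a primary or a replica?
redis-cli INFO replication | grep role

# How much data is here?
redis-cli DBSIZE

# Is this production? (a hint only: compare with the limit you expect for prod)
redis-cli CONFIG GET maxmemory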

If you're using a GUI, check the connection name. "prod-redis-main" should trigger different behavior than "local-dev".

2. Use Separate Connections for Prod

Never have production and development in the same connection list without clear visual distinction.

Naming convention:

  • [PROD] Main Redis
  • [STAGING] App Cache
  • dev-local

Some teams use color coding. Others require VPN for production access. Whatever works - just make it obvious.
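If you script against several environments, the same naming convention can drive a guard. This is a minimal sketch, assuming the bracketed "[PROD]" and "[STAGING]" prefixes from the list above; environment_of and requires_confirmation are hypothetical helper names, and the set of destructive commands is an illustrative choice.

```python
import re

# Assumed convention: names start with "[PROD]" or "[STAGING]"; anything else is dev.
_TAG = re.compile(r"^\[(PROD|STAGING)\]", re.IGNORECASE)

DESTRUCTIVE = {"FLUSHALL", "FLUSHDB", "DEL", "UNLINK", "CONFIG", "DEBUG"}


def environment_of(connection_name: str) -> str:
    """Return 'prod', 'staging', or 'dev' based on the name's bracketed tag."""
    match = _TAG.match(connection_name.strip())
    return match.group(1).lower() if match else "dev"


def requires_confirmation(connection_name: str, command: str) -> bool:
    """True when a destructive command targets a production-tagged connection."""
    words = command.split()
    first_word = words[0].upper() if words else ""
    return environment_of(connection_name) == "prod" and first_word in DESTRUCTIVE
```

A wrapper script could call requires_confirmation before sending anything and prompt for the connection name to be typed back when it returns True.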

3. Disable Dangerous Commands

In redis.conf:

rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command DEBUG ""
rename-command CONFIG ""

Or rename them to something obscure:

rename-command FLUSHALL "FLUSHALL_CONFIRM_a8f3b2"

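This doesn't stop a determined admin, but it stops accidents. Two caveats: renaming a command that is written to the AOF or sent to replicas can cause problems, so use the same rename-command lines on every node; and disabling CONFIG also blocks the CONFIG GET and CONFIG SET calls used elsewhere in this checklist, so consider renaming it rather than removing it.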
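To confirm that a given redis.conf actually contains these lines, a small audit helps. A minimal sketch, assuming you pass in the config file's text; unprotected_commands is a hypothetical helper, and it only looks at rename-command lines, not ACL rules.

```python
import shlex

DANGEROUS = ("FLUSHALL", "FLUSHDB", "DEBUG", "CONFIG")


def unprotected_commands(conf_text: str, dangerous=DANGEROUS) -> list[str]:
    """Return the dangerous commands that have no rename-command line."""
    renamed = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        try:
            parts = shlex.split(line)  # handles the quoted "" argument
        except ValueError:
            continue  # unbalanced quotes: not a line we can interpret
        if len(parts) == 3 and parts[0].lower() == "rename-command":
            renamed.add(parts[1].upper())
    return [cmd for cmd in dangerous if cmd not in renamed]
```

An empty result means every command on the list is either disabled or renamed in that file.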

Checklist: Querying Safely

4. Never Use KEYS in Production

KEYS * scans every key in the database. Synchronously. Blocking everything else.

On a database with millions of keys, this can hang your server for seconds. During which no other commands run. Your app times out. Users complain.

Instead, use SCAN:

# Bad
KEYS user:*

# Good
SCAN 0 MATCH user:* COUNT 100

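SCAN is cursor-based: each call does a small amount of work and returns a new cursor along with a batch of keys, so other commands can run between calls. Keep calling SCAN with the returned cursor until it comes back as 0. COUNT is only a hint, and a key can be returned more than once, so deduplicate if that matters.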
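In application code the cursor has to be followed until it returns to 0. A minimal sketch in Python, assuming a client object whose scan(cursor=..., match=..., count=...) method returns (next_cursor, keys) the way redis-py's Redis.scan does; scan_all is a hypothetical helper name.

```python
def scan_all(client, pattern: str, count: int = 100) -> set:
    """Collect every key matching pattern by following the SCAN cursor to 0."""
    found = set()  # a set, because SCAN may return the same key more than once
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        found.update(keys)
        if cursor == 0:  # cursor 0 marks the end of a full iteration
            break
    return found
```

For very large keyspaces, process each batch inside the loop instead of collecting everything in memory.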

5. Test Patterns Before Bulk Operations

Want to delete all cache keys?

Step 1: Count first

# How many keys match?
redis-cli --scan --pattern "cache:*" | wc -l

Step 2: Review a sample

redis-cli --scan --pattern "cache:*" | head -20

Step 3: Only then delete

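# Delete in batches of 100; key names containing spaces or quotes need extra care with xargs
redis-cli --scan --pattern "cache:*" | xargs -n 100 redis-cli DEL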

Or better yet, use a tool with preview and undo.
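The count, sample, delete sequence can also be scripted so the preview is never skipped. A minimal sketch, assuming a redis-py-style client with scan_iter(match=..., count=...) and delete(*names); preview and delete_matching are hypothetical names, and dry_run defaults to True so nothing is removed until you opt in.

```python
from itertools import islice


def preview(client, pattern: str, sample_size: int = 20):
    """Return (match_count, first few keys) without modifying anything."""
    count, sample = 0, []
    for key in client.scan_iter(match=pattern, count=1000):
        if count < sample_size:
            sample.append(key)
        count += 1
    return count, sample


def delete_matching(client, pattern: str, batch_size: int = 500,
                    dry_run: bool = True) -> int:
    """Delete matching keys in batches; with dry_run=True only count them."""
    keys = client.scan_iter(match=pattern, count=1000)
    total = 0
    while True:
        batch = list(islice(keys, batch_size))  # next batch from the iterator
        if not batch:
            break
        # delete() returns how many keys were actually removed
        total += len(batch) if dry_run else client.delete(*batch)
    return total
```

Run preview first, compare its count with the dry-run total, and only then call delete_matching with dry_run=False.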

6. Set Appropriate TTLs

Every cached value should have an expiration. No exceptions.

# Expires after 1 hour
SET cache:user:1001 "{...}" EX 3600

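Keys without a TTL accumulate until memory fills up. What happens next depends on maxmemory-policy. With the default noeviction, writes start failing. With an allkeys policy, Redis evicts random or least-recently-used keys, so you get cache misses on important data. With a volatile policy, only keys that have a TTL are evicted, so the stale garbage without one sticks around.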
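To find the keys that slipped through without an expiration, the TTL command can be combined with a scan. A minimal sketch, assuming a redis-py-style client with scan_iter and ttl; keys_without_ttl is a hypothetical helper, and limit caps the work done on a large keyspace.

```python
def keys_without_ttl(client, pattern: str, limit: int = 100) -> list:
    """Return up to `limit` keys matching pattern that have no expiration."""
    missing = []
    for key in client.scan_iter(match=pattern, count=1000):
        # TTL returns -1 for a key with no expiry and -2 for a key that is gone.
        if client.ttl(key) == -1:
            missing.append(key)
            if len(missing) >= limit:
                break
    return missing
```

Running this against "cache:*" on a schedule gives an early warning before memory pressure does.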

Checklist: Team Practices

7. Read-Only Access for Most People

Not everyone needs write access to production Redis.

Access levels:

  • Read-only: Developers debugging issues
  • Limited write: Deploy scripts, specific maintenance
  • Full access: Senior ops, emergencies only

Redis 6+ ACLs support this:

ACL SETUSER readonly on >password ~* +@read -@write
ACL SETUSER deployer on >password ~app:* +SET +DEL +EXPIRE
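If access levels are managed in code, the same rules can be kept in one table and rendered into ACL SETUSER commands. A minimal sketch that only builds the command strings shown above; ROLES and acl_command are hypothetical names, and the password handling is deliberately simplistic.

```python
# Role definitions mirroring the two ACL SETUSER examples above.
ROLES = {
    "readonly": {"keys": ["*"], "rules": ["+@read", "-@write"]},
    "deployer": {"keys": ["app:*"], "rules": ["+SET", "+DEL", "+EXPIRE"]},
}


def acl_command(role: str, password: str) -> str:
    """Render the ACL SETUSER command for a named role."""
    spec = ROLES[role]  # raises KeyError for an unknown role
    keys = " ".join(f"~{pattern}" for pattern in spec["keys"])
    rules = " ".join(spec["rules"])
    return f"ACL SETUSER {role} on >{password} {keys} {rules}"
```

Keeping the table in version control makes changes to production access reviewable like any other change.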

8. Log Dangerous Operations

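Redis has no built-in audit log, but the slow log can stand in for one during an investigation:

CONFIG SET slowlog-log-slower-than 0
CONFIG SET slowlog-max-len 10000

With a threshold of 0, every command is recorded, but only the most recent slowlog-max-len entries are kept, and only in memory, so this is a short rolling window rather than a durable audit trail. For normal production use, set a threshold (e.g., 10000 microseconds) to catch slow queries, and read the entries with SLOWLOG GET.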

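For a complete command stream, a logging service can attach with MONITOR, but be careful: MONITOR sends every command to the listening client and noticeably reduces throughput on busy servers, so use it for short captures rather than leaving it on.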
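Whatever the source of the entries, the part worth automating is filtering for destructive commands. A minimal sketch, assuming each entry is a dict with a "command" field holding the full command line as str or bytes (the shape I would expect from redis-py's slowlog_get(), treated here as an assumption); destructive_entries and the watch list are hypothetical.

```python
DESTRUCTIVE = {"FLUSHALL", "FLUSHDB", "DEL", "UNLINK", "CONFIG", "DEBUG"}


def destructive_entries(entries, watch=DESTRUCTIVE):
    """Keep only the log entries whose command name is on the watch list."""
    hits = []
    for entry in entries:
        command = entry.get("command", "")
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        words = command.split()
        if words and words[0].upper() in watch:
            hits.append(entry)
    return hits
```

Shipping the filtered entries to your normal log pipeline gives them the durability the slow log itself lacks.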

9. Backup Strategy

Redis persistence options:

| Method | Pros | Cons |
|---|---|---|
| RDB snapshots | Fast recovery, compact | Data loss since last snapshot |
| AOF | Logs every write; loses about one second at most with appendfsync everysec | Larger files, slower restart |
| Both | Best of both worlds | More disk usage |

For production: Enable both. Test recovery regularly.

# In redis.conf
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

10. Have a Rollback Plan

Before any bulk operation:

  1. Snapshot: BGSAVE or RDB backup
  2. Document: What keys you're modifying
  3. Test: Run on staging first
  4. Execute: With monitoring active
  5. Verify: Check application behavior
  6. Cleanup: Remove temporary backups
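The snapshot step is easy to get wrong because BGSAVE returns before the file is written. A minimal sketch that waits for LASTSAVE to advance, assuming a redis-py-style client with bgsave() and lastsave(); snapshot_and_wait is a hypothetical helper, and the timeout values are illustrative.

```python
import time


def snapshot_and_wait(client, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Start BGSAVE and return True once LASTSAVE advances, False on timeout."""
    before = client.lastsave()
    client.bgsave()  # returns immediately; the save runs in the background
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # LASTSAVE has one-second resolution, so a save that completes within
        # the same second as the previous one would not be detected here.
        if client.lastsave() != before:
            return True
        time.sleep(poll)
    return False
```

Only start the bulk operation when this returns True, and treat False as a reason to stop and investigate.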

Tool-Assisted Safety

Manual discipline works until it doesn't. 3 AM, tired, production is down - that's when mistakes happen.

What helps:

  1. Visual environment indicators: Red banner for production connections
  2. Confirmation dialogs: For destructive operations
  3. Undo support: Delete something? Get it back within a time window
  4. Pattern preview: See what matches before you act
  5. Safe Mode toggle: Disable writes entirely for browsing

Redimo builds these in because we've seen the disasters. Production connections show warning indicators. Bulk deletes require confirmation. Every deletion is undoable.

When Disaster Strikes

Already deleted something important? Act fast:

Immediate Steps

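  1. Stop the bleeding: Prevent further writes if possible
  2. Check replicas: A lagging or disconnected replica may still hold the data; replicas that are in sync will already have replayed the delete
  3. Check AOF: If append-only is enabled, every write is logged, including the destructive command. If the file has not been rewritten since, you can stop the server, remove that final command from the AOF, and restart to replay everything before it
  4. Check RDB: Restore from latest snapshot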

Recovery Commands

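# Stop writes: the primary refuses writes while fewer than 99 replicas are connected
CONFIG SET min-replicas-to-write 99

# Check when the last successful save happened (Unix timestamp)
LASTSAVE

# Restore from RDB backup
# First copy the existing dump.rdb somewhere safe so a later save cannot overwrite it.
# Do not use DEBUG RELOAD here: it saves the current (damaged) dataset before reloading.
redis-cli SHUTDOWN NOSAVE
# then replace dump.rdb with the backup and restart
# With appendonly yes, Redis loads the AOF at startup instead of dump.rdb,
# so start with appendonly no, then re-enable it with CONFIG SET appendonly yes.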
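Before touching anything during recovery, it helps to take a timestamped copy of the snapshot file so that no later save can overwrite it. A minimal sketch using only the Python standard library; preserve_snapshot is a hypothetical helper, and both paths are whatever your deployment uses.

```python
import shutil
import time
from pathlib import Path


def preserve_snapshot(rdb_path: str, backup_dir: str) -> Path:
    """Copy the RDB file into backup_dir under a timestamped name."""
    source = Path(rdb_path)
    if not source.is_file():
        raise FileNotFoundError(f"no snapshot at {source}")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target
```

The returned path is the file to restore from if the recovery attempt itself goes wrong.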

Post-Mortem

Don't just fix it and move on. Document:

  • What happened
  • Why it happened
  • How to prevent it next time

Then implement the prevention.

Summary

| Category | Action |
|---|---|
| Environment | Clear naming, visual distinction, VPN for prod |
| Commands | Disable/rename dangerous ones, use SCAN not KEYS |
| Patterns | Count and preview before bulk operations |
| Access | Read-only default, limited write access |
| Backup | RDB + AOF, test recovery regularly |
| Tooling | Safe Mode, confirmations, undo support |

Redis gives you speed. It's your job to add the guardrails.


Using Redimo? Enable Safe Mode in Settings for production connections. It's one toggle that could save your week.
