January 8, 2025

Celery Task Debugging Made Easy

Corbin

Celery with Redis as a broker is a popular combo. Reliable, fast, battle-tested.

Until you have 50,000 tasks stuck in an unknown state, workers consuming 100% CPU doing nothing useful, and Flower showing you everything is fine.

Time to look at what's actually in Redis.

How Celery Uses Redis

Celery stores tasks in Redis using this structure:

celery                          # Default queue (List)
celery-task-meta-<task-id>      # Task result/state (String)
_kombu.binding.celery           # Queue bindings (Set)
unacked                         # Unacknowledged tasks (Hash)
unacked_index                   # Index for unacked tasks (Sorted Set)
unacked_mutex                   # Lock for unacked restore operations

If you're using multiple queues:

celery                    # Default queue
high-priority             # Custom queue
notifications             # Custom queue

Each queue is a Redis List. Tasks are JSON blobs pushed to the list.

Task Anatomy

A task in the queue looks like:

{
  "body": "base64-encoded-payload",
  "content-encoding": "utf-8",
  "content-type": "application/json",
  "headers": {
    "id": "abc123-task-id",
    "task": "myapp.tasks.process_order",
    "lang": "py",
    "root_id": "abc123-task-id",
    "parent_id": null,
    "argsrepr": "(123,)",
    "kwargsrepr": "{}",
    "eta": null,
    "expires": null,
    "retries": 0
  },
  "properties": {
    "correlation_id": "abc123-task-id",
    "reply_to": "result-queue-id",
    "delivery_mode": 2,
    "delivery_tag": "uuid"
  }
}

The actual arguments are in the base64-encoded body. Yes, it's verbose. That's Celery's AMQP heritage showing.
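When you need the real arguments during an incident, decoding the body by hand is straightforward. A sketch, assuming the default JSON serializer and task protocol 2, where the body is a JSON array of [args, kwargs, embed]; the message below is a trimmed stand-in for what you'd copy out of the queue:

```python
import base64
import json

# A trimmed task message as it sits in the Redis list.
# With protocol 2 + JSON, the body decodes to [args, kwargs, embed].
message = {
    "body": base64.b64encode(
        json.dumps([[123], {}, {"callbacks": None, "errbacks": None,
                                "chain": None, "chord": None}]).encode()
    ).decode(),
    "content-type": "application/json",
    "headers": {"task": "myapp.tasks.process_order"},
}

args, kwargs, embed = json.loads(base64.b64decode(message["body"]))
print(message["headers"]["task"], args, kwargs)
# myapp.tasks.process_order [123] {}
```

Paste a real body from LRANGE output into the same two-line decode and you get the arguments back.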

Common Debugging Scenarios

Scenario 1: Tasks Not Processing

Tasks are sent but nothing happens. Workers are running. What's going on?

Check 1: Are tasks in the queue?

redis-cli LLEN celery

Returns 50000? Tasks are there but not being consumed.

Check 2: Are workers listening to the right queue?

Your task might be routed to a different queue:

@app.task(queue='high-priority')
def important_task():
    pass

But workers are started with:

celery -A proj worker -Q celery

They're listening to celery, not high-priority. Tasks pile up.
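A common fix is to make the routing explicit in config so producers and workers can't silently disagree. A sketch with illustrative task and queue names; workers then need to subscribe to every routed queue (`celery -A proj worker -Q celery,high-priority`):

```python
# Route tasks to queues explicitly so the producer side and the
# worker side agree on where each task goes.
task_routes = {
    "myapp.tasks.important_task": {"queue": "high-priority"},
    # everything unrouted falls through to task_default_queue
}
task_default_queue = "celery"
```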

In Redimo:

  • Monitor pattern celery* or your queue names
  • See queue lengths at a glance
  • Click into queues to see actual task payloads

Scenario 2: Tasks Disappearing

Tasks are sent. Queue length stays at 0. No results. No errors.

Possible causes:

  1. Task expires before processing
@app.task(expires=60)  # Expires in 60 seconds
def my_task():
    pass

If workers are slow, tasks expire and vanish. Check headers.expires in task payloads.

  2. Acks late + worker crash

With acks_late=True, tasks stay in Redis until the worker finishes. If workers crash, tasks go to unacked. Eventually they're requeued or lost (depending on visibility timeout).

Check:

redis-cli HLEN unacked

High number? Workers are crashing mid-task.

  3. Wrong serializer

Producer serializes with pickle, consumer only accepts JSON (or vice versa). The worker rejects the message, and from the producer's side the task simply vanishes. Check worker logs, and pin the serializers:

# In celery config
task_serializer = 'json'
result_serializer = 'json'
accept_content = ['json']

Scenario 3: Memory Growing

Redis memory keeps increasing. Celery tasks seem normal.

Likely culprit: Task results not cleaned up

By default, Celery expires results after one day:

# Default: results expire after 1 day
result_expires = 86400

If someone set result_expires = None, or results were written by a misconfigured app, every celery-task-meta-* key sticks around forever.

Check in Redimo:

  • Monitor celery-task-meta-*
  • See how many result keys exist
  • Check if they have TTL set
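That TTL check also scripts easily. A sketch assuming a redis-py-style client (with `scan_iter` and `ttl` methods); Redis reports a TTL of -1 for keys that will never expire:

```python
def results_without_ttl(r, pattern="celery-task-meta-*"):
    """Return result keys with no expiry set.

    `r` is any client exposing redis-py's scan_iter()/ttl() interface.
    A TTL of -1 means the key persists forever.
    """
    stale = []
    for key in r.scan_iter(match=pattern):
        if r.ttl(key) == -1:
            stale.append(key)
    return stale
```

Run it against the broker and a non-empty return value is your memory leak.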

Fix:

# In celery config
result_expires = 3600  # 1 hour
# Or disable result storage entirely
task_ignore_result = True

Scenario 4: Finding Failed Tasks

Where do failed tasks go?

With the Redis broker, a raised exception doesn't requeue anything: the worker still acknowledges the task and records the failure in the result backend. Tasks only linger when a worker dies mid-task with task_acks_late=True:

  1. Task stays in unacked
  2. Requeued once the visibility timeout expires (Redis has no dead letter queue)

Check result state:

redis-cli GET celery-task-meta-<task-id>

State will be FAILURE with exception info.

In Redimo:

  1. Monitor celery-task-meta-*
  2. Filter results by state field
  3. See full exception traceback in the value

Deep Dive: The Unacked Problem

Celery's visibility timeout mechanism:

  1. Worker pops task from queue
  2. Task payload goes to the unacked hash; its delivery tag is added to the unacked_index sorted set (score = timestamp)
  3. Worker processes task
  4. Worker acknowledges, task removed from unacked

If worker dies between 2 and 4, task stays in unacked. A background process checks periodically and requeues expired unacked tasks.

Problems arise when:

  • visibility_timeout too short → tasks requeued while still processing → duplicates
  • visibility_timeout too long → crashed worker leaves tasks stuck
  • Many workers dying → unacked grows huge

Check:

redis-cli HLEN unacked
redis-cli ZRANGE unacked_index 0 10 WITHSCORES

Scores are timestamps. Old timestamps = stuck tasks.
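Turning those scores into ages takes a few lines. A sketch over the (member, score) pairs that ZRANGE ... WITHSCORES returns, flagging anything reserved longer ago than a threshold:

```python
import time

def stale_unacked(entries, max_age_seconds, now=None):
    """Return delivery tags reserved more than max_age_seconds ago.

    `entries` is a list of (member, score) pairs from
    ZRANGE ... WITHSCORES; scores are Unix timestamps of when
    the task was reserved by a worker.
    """
    now = time.time() if now is None else now
    return [tag for tag, ts in entries if now - ts > max_age_seconds]
```

Anything this returns has been sitting unacknowledged past your threshold: either a long-running task or a dead worker.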

Task Result Storage

Celery result keys look like:

celery-task-meta-3f2b3c4d-5678-90ab-cdef-1234567890ab

Value is JSON:

{
  "status": "SUCCESS",
  "result": {"order_id": 123, "total": 99.99},
  "traceback": null,
  "children": [],
  "date_done": "2025-01-15T10:30:45.123456",
  "task_id": "3f2b3c4d-..."
}

Or on failure:

{
  "status": "FAILURE",
  "result": {
    "exc_type": "ValueError",
    "exc_message": ["Invalid input"],
    "exc_module": "builtins"
  },
  "traceback": "Traceback (most recent call last):\n..."
}
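Because the values are plain JSON, triaging failures in bulk is a parse-and-filter. A sketch over raw celery-task-meta-* values, using the fields shown above:

```python
import json

def failures(result_values):
    """Pick FAILURE results out of raw celery-task-meta-* values.

    Returns (task_id, exc_type) pairs; task_id may be None if the
    backend omitted it.
    """
    out = []
    for raw in result_values:
        meta = json.loads(raw)
        if meta.get("status") == "FAILURE":
            exc_type = (meta.get("result") or {}).get("exc_type")
            out.append((meta.get("task_id"), exc_type))
    return out
```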

Flower vs Redis Direct

Flower is excellent for:

  • Real-time worker monitoring
  • Task success/failure rates
  • Easy task inspection

Flower is limited for:

  • Bulk operations on Redis data
  • Finding tasks by custom criteria
  • Investigating queue-level issues
  • Debugging serialization problems

When Flower doesn't give you answers, go to Redis.

Practical Investigation Workflow

1. Overview

Pattern: *celery* OR specific queue names

Get counts for:

  • Main queue (celery)
  • Custom queues
  • Result keys (celery-task-meta-*)
  • Unacked tasks

2. Queue Health

For each queue:

  • Length (LLEN)
  • Sample tasks (LRANGE 0 10)
  • Oldest task (LINDEX -1)
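The three checks above combine into a small script. A sketch assuming a redis-py-style client (`llen`/`lrange`) and JSON-serialized tasks; index 0 is the newest entry, since Celery LPUSHes onto the list:

```python
import json

def queue_health(r, queue_names, sample=3):
    """Summarize each queue: its length plus the task names of the
    newest few entries. `r` is any client with redis-py's
    llen()/lrange() interface."""
    report = {}
    for q in queue_names:
        raw = r.lrange(q, 0, sample - 1)
        tasks = [json.loads(m)["headers"]["task"] for m in raw]
        report[q] = {"length": r.llen(q), "head": tasks}
    return report
```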

3. Result Storage

  • Total result keys
  • Keys without TTL
  • Failed task results

4. Unacked Analysis

  • Total unacked count
  • Age of oldest unacked task
  • Pattern in unacked task types

Configuration Tips

Set Result Expiry

result_expires = 3600  # 1 hour

This is a global setting; the Redis backend applies it as a TTL on every result key it writes. There is no reliable per-task override for the Redis backend.

Tune Visibility Timeout

broker_transport_options = {
    'visibility_timeout': 43200,  # 12 hours for long tasks
}

Consider Result Backend Alternatives

If you don't need task results:

task_ignore_result = True

If you need results but not in Redis:

result_backend = 'db+postgresql://user:pass@host/db'

Keep your broker Redis clean for actual task queuing.

Use Priority Queues

from kombu import Queue

task_queues = (
    Queue('high', routing_key='high'),
    Queue('default', routing_key='default'),
    Queue('low', routing_key='low'),
)
task_default_queue = 'default'

Monitor each queue separately. Know which is backing up.

Quick Reference

Celery Concept    Redis Key Pattern        Type
Default queue     celery                   List
Custom queue      <queue-name>             List
Task result       celery-task-meta-<id>    String
Unacked tasks     unacked                  Hash
Unacked index     unacked_index            Sorted Set
Queue bindings    _kombu.binding.*         Set

Celery abstracts Redis well. Until it doesn't. Knowing what's actually in your broker lets you debug issues that no amount of --loglevel=DEBUG will reveal.

Download Redimo and see your Celery tasks at the Redis level.
