Job Dispatch
When a spider job is created in estela, it enters a queue and is dispatched to the cluster only when sufficient resources are available. This prevents overloading the cluster and ensures jobs run reliably even under heavy load.
How Dispatch Works
Jobs are dispatched in periodic cycles (every 30 seconds by default). In each cycle, estela:
- Picks up queued jobs in the order they were created (first in, first out).
- Checks available CPU and memory on the cluster.
- Dispatches each job only if the cluster has enough capacity without exceeding the configured utilization threshold.
- Leaves jobs in the queue if capacity is not available; they are retried in the next cycle.
Job status transitions: IN_QUEUE → WAITING → RUNNING → COMPLETED / ERROR
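The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not estela's actual implementation: the `Job` class, the `dispatch_cycle` function, and its parameters are hypothetical names. It also assumes that capacity is measured against resource requests, that a job moves to WAITING at the moment it is dispatched, and that a job which does not fit is skipped while later jobs are still checked. The source does not specify any of these details.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_m: int    # CPU request in millicores
    mem_mi: int   # memory request in Mi
    status: str = "IN_QUEUE"


def dispatch_cycle(queue, used_cpu_m, used_mem_mi,
                   total_cpu_m, total_mem_mi, threshold=0.95):
    """Run one cycle over the queue in FIFO order.

    Returns (dispatched, remaining). A job is dispatched only if adding
    its requests keeps both CPU and memory at or below the threshold.
    """
    dispatched = []
    remaining = deque()
    for job in queue:
        fits_cpu = used_cpu_m + job.cpu_m <= total_cpu_m * threshold
        fits_mem = used_mem_mi + job.mem_mi <= total_mem_mi * threshold
        if fits_cpu and fits_mem:
            # Reserve the capacity so later jobs see the updated usage.
            used_cpu_m += job.cpu_m
            used_mem_mi += job.mem_mi
            job.status = "WAITING"  # assumption: set at dispatch time
            dispatched.append(job)
        else:
            # Stays IN_QUEUE and is retried in the next cycle.
            remaining.append(job)
    return dispatched, remaining
```

For example, on a hypothetical cluster with 2000m CPU and 4096Mi memory, four queued LARGE jobs (512m, 1152Mi each) would see three dispatched and one held back, because the fourth would push CPU past 95% of capacity.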
Resource Tiers
Each job is assigned a resource tier that determines how much CPU and memory its container receives. The default tier is LARGE.
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| TINY | 128m | 256m | 96Mi | 128Mi |
| XSMALL | 192m | 384m | 192Mi | 256Mi |
| SMALL | 256m | 512m | 384Mi | 512Mi |
| MEDIUM | 256m | 512m | 768Mi | 1Gi |
| LARGE | 512m | 1024m | 1152Mi | 1536Mi |
| XLARGE | 512m | 1024m | 1536Mi | 2Gi |
| HUGE | 1024m | 2048m | 3072Mi | 4Gi |
| XHUGE | 2048m | 4096m | 6144Mi | 8Gi |
Each job also receives a memory usage limit (~85% of the memory limit) that allows spiders to shut down gracefully before being forcefully terminated by the cluster.
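As a rough illustration of how the memory usage limit relates to a tier's memory limit, the sketch below converts a limit from the table to bytes and applies a ratio of 0.85. The function names are hypothetical, the exact ratio and rounding estela uses are not stated in the source (it only says "~85%"), and expressing the result in bytes is an assumption.

```python
# Binary suffixes used in the tier table.
UNITS = {"Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity):
    """Convert a quantity such as '1536Mi' or '2Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def memory_usage_limit(memory_limit, ratio=0.85):
    """Return the soft limit in bytes (assumed ratio, truncated to an integer)."""
    return int(parse_memory(memory_limit) * ratio)


# LARGE tier: a 1536Mi hard limit gives a soft limit of about 1305Mi.
large_soft_limit_mi = memory_usage_limit("1536Mi") // 1024 ** 2
```

The gap between the soft limit and the hard limit (roughly 230Mi for LARGE under these assumptions) is the headroom a spider has to shut down cleanly.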
Configuration
The dispatch behavior can be tuned with the following environment variables:
| Variable | Default | Description |
|---|---|---|
| DISPATCH_RETRY_DELAY | 30 | Seconds between dispatch cycles |
| WORKERS_CAPACITY_THRESHOLD | 0.95 | Maximum cluster utilization (0–1) before new jobs are held back |
| SPIDER_NODE_ROLE | bitmaker-worker | Label used to identify worker nodes |
| DEDICATED_SPIDER_NODES | True | Whether spider jobs run on dedicated labeled nodes |
DISPATCH_RETRY_DELAY is persisted in the database once initialized. If you change it in your settings, you must also update or delete the corresponding database entry via the Django admin for the change to take effect.
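The following sketch shows one way these variables could be read and validated, using the defaults from the table. It is an assumption about parsing, not estela's settings code: `load_dispatch_settings` is a hypothetical helper, and comparing DEDICATED_SPIDER_NODES against the string "True" is a guess based on how the value is written in this document.

```python
import os


def load_dispatch_settings(env=None):
    """Read dispatch settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "DISPATCH_RETRY_DELAY": int(env.get("DISPATCH_RETRY_DELAY", "30")),
        "WORKERS_CAPACITY_THRESHOLD": float(env.get("WORKERS_CAPACITY_THRESHOLD", "0.95")),
        "SPIDER_NODE_ROLE": env.get("SPIDER_NODE_ROLE", "bitmaker-worker"),
        # Assumption: the flag is compared as the literal string "True".
        "DEDICATED_SPIDER_NODES": env.get("DEDICATED_SPIDER_NODES", "True") == "True",
    }
    if not 0 <= settings["WORKERS_CAPACITY_THRESHOLD"] <= 1:
        raise ValueError("WORKERS_CAPACITY_THRESHOLD must be between 0 and 1")
    return settings
```

Passing a plain dictionary instead of the process environment, for example `load_dispatch_settings({"WORKERS_CAPACITY_THRESHOLD": "0.8"})`, makes it easy to check a configuration before deploying it.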
Deployment Requirements
- Dedicated spider nodes: When DEDICATED_SPIDER_NODES is "True", spider jobs are scheduled on dedicated nodes identified by the SPIDER_NODE_ROLE label, and the capacity check only considers those nodes. This is recommended for larger environments where you want to isolate spider workloads. For smaller setups, set it to "False" to allow jobs to run on any available node.
- Cluster permissions: The estela API service account must have permission to read nodes and pods in the cluster. Without these permissions, the capacity check fails and no jobs are dispatched.
- Worker concurrency: The background worker should have enough concurrency (at least 4–8) to handle job dispatch alongside other periodic tasks. This can be configured in the Helm chart.
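To make the effect of DEDICATED_SPIDER_NODES on the capacity check concrete, here is a small sketch that selects which nodes are counted. Nodes are modeled as plain dictionaries rather than cluster API objects, `nodes_for_capacity_check` is a hypothetical helper, and the label key `"role"` is an assumption: the source names the label value (SPIDER_NODE_ROLE) but not the key it is stored under.

```python
def nodes_for_capacity_check(nodes, dedicated=True,
                             role="bitmaker-worker", label_key="role"):
    """Return the nodes whose resources count toward spider capacity.

    With dedicated=True only nodes carrying the role label are kept;
    otherwise every node is considered.
    """
    if not dedicated:
        return list(nodes)
    return [n for n in nodes if n.get("labels", {}).get(label_key) == role]


# Illustrative cluster: two labeled workers and one unlabeled node.
cluster = [
    {"name": "node-1", "labels": {"role": "bitmaker-worker"}, "cpu_m": 4000},
    {"name": "node-2", "labels": {"role": "bitmaker-worker"}, "cpu_m": 4000},
    {"name": "node-3", "labels": {}, "cpu_m": 8000},
]
dedicated_cpu = sum(n["cpu_m"] for n in nodes_for_capacity_check(cluster))
shared_cpu = sum(n["cpu_m"] for n in nodes_for_capacity_check(cluster, dedicated=False))
```

In this example the dedicated setup sees 8000m of CPU while the shared setup sees 16000m, which is why a mislabeled or unlabeled worker can leave jobs waiting in the queue even though the cluster looks idle.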