estela Helm Chart variables Appendix
Chart variables
These variables define general aspects of the deployment; they do not alter the behavior of estela.
- local (Required): Set this variable to true if estela is being deployed on local resources. Otherwise, set it to false.
- hostIp (Required/Optional): Required only if the variable local has been set to true. This address refers to the host machine as seen from minikube. Find it by running:
  $ minikube ssh 'grep host.minikube.internal /etc/hosts | cut -f1'
- registryHost (Required): The registry host where the images of the estela modules are located. If a local registry is being used, this host is equal to the variable hostIp followed by the registry port: <hostIp>:5001.
- nodeSelector (Optional): The name of the node on which estela will be installed in case the Kubernetes cluster has multiple nodes. Use the format { roles: NODE_ROLE_NAME }.
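The dependency between local, hostIp, and registryHost can be checked before installing the chart. The following is a minimal sketch in Python, assuming the chart values have already been loaded into a dictionary. The function name check_chart_values, the error messages, and the example address are hypothetical and are not part of estela.

```python
def check_chart_values(values):
    """Return a list of problems found in the general chart variables.

    Hypothetical helper: it only encodes the rules described above.
    """
    problems = []

    # local and registryHost are documented as required.
    if "local" not in values:
        problems.append("local is required")
    if not values.get("registryHost"):
        problems.append("registryHost is required")

    # hostIp is required only when local is true.
    if values.get("local") is True and not values.get("hostIp"):
        problems.append("hostIp is required when local is true")

    return problems


# Example values for a local deployment; 192.168.49.1 is a placeholder address.
local_values = {
    "local": True,
    "hostIp": "192.168.49.1",
    "registryHost": "192.168.49.1:5001",
}
print(check_chart_values(local_values))
```

An empty list means that none of the rules above were violated; it does not guarantee that the registry is reachable.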
Cloud provider variables
These variables allow the clean deployment of estela and its resources using cloud providers.
AWS
If you are not using AWS, skip this section.
- <AWS_ACCESS_KEY_ID> (Required): Your AWS access key ID.
- <AWS_SECRET_ACCESS_KEY> (Required): Your AWS secret access key.
- <AWS_DEFAULT_REGION> (Required): Default region of your AWS account.
- awsRegistry (Optional): Set this variable to true if you are using ECR to store the estela images.
- imagePullSecrets (Optional): Fill this variable only if the variable awsRegistry has been set to true. Use the value [ name: regcred ].
estela module variables
These variables define the estela behavior.
The variables that already have an assigned value should not be modified, unless you have a deep understanding of estela.
Global variables
Database
- <SPIDERDATA_DB_ENGINE> (Required): Document-oriented database where the data produced by the spiders is stored. Currently, estela supports the mongodb engine.
  For development, a free MongoDB Atlas deployment can be used to set up a database, as mentioned in the Estela Resources Guide. Alternatively, MongoDB can be set up on a local cluster from a Docker image.
- <SPIDERDATA_DB_CONNECTION> (Required): The connection URL to your database instance.
- <SPIDERDATA_DB_CERTIFICATE_PATH> (Required): Path where the database certificate is located. This value is taken into account if your connection requires a certificate.
Queue Platform
All the queue platform variables should be written as children of the <QUEUE_PARAMETERS> object.
- <QUEUE_PLATFORM> (Required): The queue platform used by estela.
- <QUEUE_PLATFORM_LISTENERS> (Required): List of the queuing advertised hosts in a comma-separated style.
- <QUEUE_PLATFORM_PORT> (Required): The port number of the aforementioned listeners.
Refer to the estela Queue Adapter documentation to fill in any additional variables needed for the selected queue platform.
Redis Stats
- <REDIS_URL> (Required): The connection URL to the Redis instance.
- <REDIS_STATS_INTERVAL> (Required): The interval, in seconds, of how often the job stats should be updated.
estela API variables
Database
- <DB_HOST> (Required): Host of the SQL relational database.
- <DB_PORT> (Required): Port of the SQL relational database.
- <DB_NAME> (Required): Database name used by the API module.
- <DB_USER> (Required): User name of the SQL relational database.
- <DB_PASSWORD> (Required): Password of the above user. To avoid reading conflicts, enclose the value in quotes.
Registry
- <REGISTRY_HOST> (Required): Address of the registry used to store the estela projects. This value can be equal to the variable registryHost.
- <REGISTRY_ID> (Optional): Fill in this value if your registry has an associated ID.
- <RESPOSITORY_NAME> (Required): Name of the registry repository used to store the project images.
- <BUCKET_NAME_PROJECTS> (Required): Name of the bucket used to store the project files.
Settings
- <SECRET_KEY> (Required): The Django secret key. You can generate one here. To avoid reading conflicts, enclose the value in quotes.
- <DJANGO_SETTING_MODULE> (Required): Path of the settings file to use; it can be one of these files.
- <ENGINE> (Required): The engine used to run the spider jobs.
- <CREDENTIALS> (Required): The credentials used by the API.
- <CORS_ORIGIN_WHITELIST> (Required): List of origins authorized to make requests to the API. If estela web will be running locally, set this value to http://localhost:3000.
- <DJANGO_API_HOST>: The endpoint of the Django API. This value will be filled in after the application is installed; do not change it yet.
- <DJANGO_EXTERNAL_APPS>: List of Django external apps that will be installed and added to INSTALLED_APPS. To install them, you must create a file similar to estela/api/requirements/externalApps.txt.example and add the repositories of the applications that will be installed via pip.
- <EXTERNAL_APP_KEYS>: List of keys to use inside Django external apps.
- <EXTERNAL_MIDDLEWARES>: List of middlewares that are generally found in Django external apps.
Celery
- <CELERY_BROKER_URL> (Required): URL of the Celery broker.
- <CELERY_RESULT_BACKEND> (Required): URL to send the results from the API module tasks.
- <CELERY_EXTERNAL_IMPORTS> (Optional): List of apps that contain Celery apps with their own configurations. The beat schedules from these apps will be imported to estela's main Celery app. E.g., you may set app1,app2 as a value for this variable. Then, estela will look for Celery apps named app inside app1.celery and app2.celery.
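The naming convention described for CELERY_EXTERNAL_IMPORTS can be illustrated with a short Python sketch. It only shows how a comma-separated value maps to the module paths mentioned above; the function external_celery_modules is hypothetical, and this is not estela's actual loading code.

```python
def external_celery_modules(celery_external_imports):
    """Map a CELERY_EXTERNAL_IMPORTS value to the module paths to inspect.

    Hypothetical helper: each listed app is expected to expose a Celery
    app named `app` inside its `<name>.celery` module.
    """
    names = [name.strip() for name in celery_external_imports.split(",")]
    # Skip empty entries so a blank value yields no modules.
    return [f"{name}.celery" for name in names if name]


print(external_celery_modules("app1,app2"))  # ['app1.celery', 'app2.celery']
```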
Mailing
- <EMAIL_HOST> (Required): Host of the SMTP email server.
- <EMAIL_PORT> (Required): Port of the SMTP email server.
- <EMAIL_HOST_USER> (Required): The user using the SMTP email service.
- <EMAIL_HOST_PASSWORD> (Required): Password of the above user. To avoid reading conflicts, enclose the value in quotes.
- <EMAILS_TO_ALERT> (Required): Email address that will receive a notification when a new user is created.
- <VERIFICATION_EMAIL> (Required): Email address that will send the verification emails.
- <REGISTER> (Required): Set this value to "False" to disable user registration.
The mailing configuration is used to send emails regarding user creation in the estela system.
Data Downloads
- <MAX_CLI_DOWNLOAD_CHUNK_MB> (Required): This is the maximum size of the chunks when downloading data via the estela-cli. E.g., if this is set to a value of 2 and you download 1GB of data, 500 chunks would be downloaded.
- <MAX_WEB_DOWNLOAD_SIZE_MB> (Required): This is the maximum download size via estela's web interface. We recommend not setting this value higher than 2GB, and you should update the timeout value for your API according to the value you set here. E.g., if you use gunicorn, you would add the timeout flag: gunicorn config.wsgi --bind=0.0.0.0:8000 --timeout=600. We encourage you to use the estela-cli for bigger downloads.
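The chunk arithmetic behind MAX_CLI_DOWNLOAD_CHUNK_MB can be worked through in Python. This is only an illustration of the calculation, assuming 1 GB is counted as 1000 MB as in the example above; the function chunk_count is hypothetical and is not part of the estela-cli.

```python
import math


def chunk_count(total_mb, max_chunk_mb):
    """Number of chunks needed to download total_mb in chunks of at most max_chunk_mb."""
    return math.ceil(total_mb / max_chunk_mb)


print(chunk_count(1000, 2))  # 500, counting 1 GB as 1000 MB
print(chunk_count(1024, 2))  # 512, counting 1 GB as 1024 MB
```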
Proxies
- <PROXY_PROVIDERS_TO_TRACK> (Optional): In estela, you can add custom proxy providers that can be configured and reused in your projects, spiders, jobs, and cronjobs. In this variable, set the names of the proxy providers you want to track. E.g., my_custom_proxy,my_other_custom_proxy.
estela queueing variables
- <CONSUMER_PRODUCTION> (Required): Set this value to "False" if the database used by the consumers does not require a certificate for the connection. Otherwise, set it to "True".
- <WORKER_POOL> (Optional): Number of worker threads per consumer; it must be an integer. If the variable is left blank, the default value is 10.
- <HEARTBEAT_TICK> (Optional): Number of seconds between heartbeat inspections; it must be an integer. If the variable is left blank, the default value is 300.
- <QUEUE_BASE_TIMEOUT> (Optional): Minimum number of seconds a worker thread can wait for an item to be available in the internal item queue; it must be an integer. If the variable is left blank, the default value is 5.
- <QUEUE_MAX_TIMEOUT> (Optional): Maximum number of seconds a worker thread can wait for an item to be available in the internal item queue; it must be an integer. If the variable is left blank, the default value is 300.
- <BATCH_SIZE_THRESHOLD> (Optional): Size threshold in bytes of the data batch to be inserted; it must be an integer. If the variable is left blank, the default value is 4096.
- <INSERT_TIME_THRESHOLD> (Optional): Time threshold in seconds of the insertion of consecutive items belonging to the same batch of data; it must be an integer. If the variable is left blank, the default value is 5.
- <ACTIVITY_TIME_THRESHOLD> (Optional): Time threshold in seconds of the activity time of an Inserter object before being cleaned up; it must be an integer. If the variable is left blank, the default value is 600.
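The "blank means default" rule for the optional queueing variables can be sketched in Python as follows. The defaults are copied from the list above; the names QUEUE_DEFAULTS and resolve_queue_settings are hypothetical, and the sketch assumes the variables arrive as strings in a mapping such as os.environ. It is not the code used by the estela consumers.

```python
# Defaults as listed above for the optional queueing variables.
QUEUE_DEFAULTS = {
    "WORKER_POOL": 10,
    "HEARTBEAT_TICK": 300,
    "QUEUE_BASE_TIMEOUT": 5,
    "QUEUE_MAX_TIMEOUT": 300,
    "BATCH_SIZE_THRESHOLD": 4096,
    "INSERT_TIME_THRESHOLD": 5,
    "ACTIVITY_TIME_THRESHOLD": 600,
}


def resolve_queue_settings(env):
    """Return integer settings, using the default when a variable is blank or missing."""
    settings = {}
    for name, default in QUEUE_DEFAULTS.items():
        raw = (env.get(name) or "").strip()
        # int() raises ValueError for non-integer input, matching the
        # "must be an integer" requirement.
        settings[name] = int(raw) if raw else default
    return settings


example = resolve_queue_settings({"WORKER_POOL": "20", "HEARTBEAT_TICK": ""})
print(example["WORKER_POOL"], example["HEARTBEAT_TICK"])  # 20 300
```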