This is part two of a five-part series addressing Airflow at an enterprise scale. I will update these with links as they are published.
- Airflow: Planning a Deployment
- Airflow + Helm: Simple Airflow Deployment
Previously, we formulated a plan to provision Airflow in a Kubernetes cluster using Helm and then build up the supporting services and configurations we will need to make our cluster production ready. This post focuses on getting the Helm chart deployed to our Kubernetes service. Even this most basic configuration requires a database, and we have chosen PostgreSQL in this case.
Code samples can be found here.
Preparing the Database
Here we will assume that you have:
- A PostgreSQL database server
- Credentials with administrative access
- The psql CLI accessible
I will be using the Azure PostgreSQL service, but any compatible version will do. First, log into the database server using the psql command:
psql "host=************.postgres.database.azure.com port=5432 dbname=postgres user=**************** password=********* sslmode=require"
Next, referring to the Airflow documentation, we can execute the following commands:
CREATE DATABASE airflow_db;
CREATE USER airflow WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;
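One caveat worth noting: on PostgreSQL 15 and newer, ordinary users no longer receive CREATE on the public schema by default, so the airflow user may need an explicit grant inside the new database. A minimal sketch, run while connected to airflow_db:
-- PostgreSQL 15+ only: allow the airflow user to create tables in the public schema
GRANT ALL ON SCHEMA public TO airflow;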
Pulling the Chart and Values File
After the database is set up, we can move on to preparing the chart and our values file. Using Helm, add the Airflow chart repository:
helm repo add apache-airflow https://airflow.apache.org
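If you have added the repository before, refresh the local chart index so the latest release is available:
helm repo update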
For the values file, retrieve the defaults from the chart repository:
curl https://raw.githubusercontent.com/apache/airflow/main/chart/values.yaml > values.yaml
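Note that values.yaml on the main branch can drift ahead of the released chart. If you would rather have the defaults matching the chart version in your local repo index, Helm can print them directly:
helm show values apache-airflow/airflow > values.yaml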
Set Airflow to use the KubernetesExecutor:
executor: "KubernetesExecutor"
Make sure we have some example DAGs to play with:
env:
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: "True"
Turn off the chart's bundled PostgreSQL resources, since we are bringing our own database:
postgresql:
  enabled: false
Enter the credentials and database information for the metadata connection:
data:
  metadataConnection:
    user: airflow@some-host
    pass: your-password
    protocol: postgresql
    host: some-host.postgres.database.azure.com
    port: 5432
    db: airflow_db
    sslmode: require
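Keeping the password in plain text in values.yaml is fine for a demo, but the chart can also read the connection string from a pre-created Kubernetes secret via data.metadataSecretName, which expects the value under a connection key. A sketch using the same details as above, to be run once the airflow namespace exists (note the @ in the Azure username is URL-encoded as %40):
# Hypothetical secret name; the key must be "connection"
kubectl create secret generic airflow-metadata-secret \
  --namespace airflow \
  --from-literal=connection="postgresql://airflow%40some-host:your-password@some-host.postgres.database.azure.com:5432/airflow_db?sslmode=require"
Then set data.metadataSecretName: airflow-metadata-secret in values.yaml in place of the metadataConnection block.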
Deploying the Chart
Now that we have our values file set up for our database, we can deploy the chart. Authenticate with the cluster:
az aks get-credentials --name airflow-demo --resource-group airflow-demo
Create a namespace:
kubectl create ns airflow
The Airflow chart has a tendency toward long install times, so increase the timeout as you install the chart:
helm upgrade \
--install \
-f values.yaml \
--namespace airflow \
--timeout 30m0s \
--wait=false \
airflow \
apache-airflow/airflow
After Helm exits, we can navigate to our Kubernetes Dashboard and see the replica sets, pods, etc., that have been provisioned.
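If you prefer the command line to the dashboard, the same information is available with kubectl:
kubectl get all --namespace airflow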
Now we can log in using the credentials provided in the Helm output. As we didn’t enable the ingress feature of the chart, access to the Airflow web UI requires port forwarding:
kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
Navigating to http://localhost:8080 will bring up the login screen. After entering the credentials from the Helm output, you’ll see a table of DAGs.
To test our installation, unpause a DAG using the toggle on the left side of the screen and execute it. We expect a number of pods to be created as the tasks execute.
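To watch the executor at work, follow the pods from a second terminal while the DAG runs; with the KubernetesExecutor, each task should appear briefly as its own pod:
kubectl get pods --namespace airflow --watch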
And that’s it: we have an Airflow cluster up and running. Now we can work on tuning the cluster to better fit our needs. The next installment in this five-part series will cover logging in Apache Airflow!