Setting Up Apache Airflow: Local & Production Deployment

Introduction

Apache Airflow has become a widely used workflow orchestration tool for businesses that need to manage data pipelines and complex workflows. It lets engineers define and schedule workflows with straightforward Python code. Before you can work with DAGs (Directed Acyclic Graphs) and manage workflows, you need a properly configured Airflow installation.

This blog walks through setting up Apache Airflow for both local development and production environments.

What is Apache Airflow?

Apache Airflow is an open-source platform for authoring, scheduling, and monitoring workflows defined as code. Its architecture scales from a single machine to a distributed cluster, which makes it a preferred tool for data engineering, ETL processes, and machine learning pipelines.

Setting Up Airflow Locally

Setting up Airflow locally is great for development, testing, and learning. Here's a step-by-step guide:

1. Prerequisites

  • Python 3.8 or later (Airflow 2.9 requires at least 3.8)
  • pip installed
  • Virtual environment (venv) recommended
  • Docker (optional but useful)
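
A quick way to confirm the basics are in place before installing anything (a small sanity check; the exact versions on your machine will differ):

python3 --version    # should report 3.8 or newer
pip --version
docker --version     # only needed if you plan to use Docker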

2. Installation

First, create a virtual environment:

python3 -m venv airflow_venv
source airflow_venv/bin/activate

Now, install Apache Airflow using pip:

pip install apache-airflow

Alternatively, and recommended, install with the official constraints file so dependency versions stay consistent (set PYTHON_VERSION to match the Python minor version in your virtual environment):

AIRFLOW_VERSION=2.9.0
PYTHON_VERSION=3.8

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
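
To confirm the installation worked, check the version from the activated virtual environment:

airflow version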

3. Initialize Airflow Database

Airflow uses a metadata database to keep track of task statuses, DAG runs, and other system information. By default this is a SQLite database created under AIRFLOW_HOME (~/airflow unless you override it). On Airflow 2.7 and later, airflow db migrate is the preferred command; airflow db init still works but is deprecated.

airflow db init
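
If you want those files somewhere other than ~/airflow, set AIRFLOW_HOME before running the command above, and in every shell where you later run Airflow commands. A minimal sketch with an illustrative path:

# Optional: keep Airflow's config, logs, and SQLite database in a custom location
export AIRFLOW_HOME=~/my-airflow    # illustrative path; the default is ~/airflow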

4. Create a User

Create an admin user for the Airflow UI (you will be prompted to set a password):

airflow users create \
   --username admin \
   --firstname Admin \
   --lastname User \
   --role Admin \
   --email admin@example.com
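
You can confirm that the account was created:

airflow users list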

5. Start Airflow

Start the Airflow web server:

airflow webserver --port 8080

In another terminal, start the Airflow scheduler:

airflow scheduler

Now visit http://localhost:8080 to access the Airflow UI!
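
Once both processes are running, you can also exercise the installation from the command line. A small sketch using the bundled example DAGs (this assumes load_examples is left at its default of True):

# List the DAGs the scheduler can see
airflow dags list

# Run one task from an example DAG locally; tasks test does not record state in the metadata DB
airflow tasks test example_bash_operator runme_0 2024-01-01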

Setting Up Airflow for Production

Running Airflow in production requires a more robust setup than local development. You want scalability, fault tolerance, monitoring, and security.

Here’s how a typical production deployment looks:

1. Choose an Executor

Airflow provides several executors:

  • LocalExecutor: Runs tasks in parallel on a single machine (good for small setups).
  • CeleryExecutor: Distributes tasks across multiple worker nodes (popular in production).
  • KubernetesExecutor: Runs each task in its own Kubernetes pod (best for cloud-native setups).

For production, either CeleryExecutor or KubernetesExecutor is preferred.
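
The executor is set in the [core] section of airflow.cfg or, more commonly in containerized deployments, through an environment variable. A minimal sketch for the CeleryExecutor:

# Equivalent to setting executor = CeleryExecutor under [core] in airflow.cfg
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor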

2. Setup Production Components

You will need the following components (a sample configuration sketch follows the list):

  • Web Server: Serves the UI.
  • Scheduler: Parses DAGs, schedules runs, and queues tasks for execution.
  • Workers: Execute the queued tasks.
  • Database: Stores metadata (PostgreSQL or MySQL preferred).
  • Broker: Queues task messages between the scheduler and the workers (CeleryExecutor only), e.g. RabbitMQ or Redis.
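
One common way to wire these components together is through environment variables; the sketch below assumes PostgreSQL and Redis and uses illustrative hostnames and credentials:

# Metadata database (PostgreSQL in this example)
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

# Celery broker and result backend (Redis and PostgreSQL in this example)
export AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres:5432/airflow

# On each worker node, start a Celery worker process
airflow celery worker

Note that the CeleryExecutor requires the celery extra (apache-airflow[celery]) to be installed on the scheduler and the workers.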

3. Deploying with Docker (Recommended)

Using Docker Compose makes it easy to deploy all Airflow components.

Official reference: the Apache Airflow "Running Airflow in Docker" guide, which publishes a ready-made docker-compose.yaml for each release.

Sample steps:

# Download the official docker-compose.yaml for your Airflow version
curl -LfO "https://airflow.apache.org/docs/apache-airflow/2.9.0/docker-compose.yaml"

# Create the folders the compose file expects and set the host user ID in .env
mkdir -p ./dags ./logs ./plugins ./config
echo "AIRFLOW_UID=$(id -u)" > .env

# Initialize Airflow (creates the metadata database and a default user)
docker-compose up airflow-init

# Start services
docker-compose up

Services Running:

  • airflow-webserver
  • airflow-scheduler
  • airflow-worker
  • airflow-triggerer
  • postgres
  • redis

Then visit http://localhost:8080 and log in (the official compose file creates a default airflow / airflow account during initialization).
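
Your DAG files go into the ./dags folder created earlier, which the compose file mounts into the Airflow containers. If you need more task throughput, Compose can also run several workers; for example:

# Run three worker containers instead of one (illustrative number)
docker-compose up -d --scale airflow-worker=3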

4. Best Practices for Production

  • Use PostgreSQL or MySQL as the metadata database, not SQLite.
  • Enable Airflow authentication (e.g., LDAP, OAuth).
  • Monitor with Prometheus and Grafana.
  • Back up the metadata database regularly (see the sketch after this list).
  • Separate your environments: dev, staging, production.
  • Use GitOps for DAG deployments.
  • Use the KubernetesExecutor if you are cloud native (AWS EKS, GCP GKE).
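
For the backup point above, a minimal sketch assuming a PostgreSQL metadata database named airflow (host, user, and file name are illustrative):

# Dump the metadata database to a dated SQL file
pg_dump -h postgres -U airflow airflow > airflow_metadata_$(date +%F).sql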

Conclusion

Deployed correctly, Apache Airflow is an exceptionally powerful tool. For local development, a quick installation with the LocalExecutor is all you need. For production-grade orchestration, Docker Compose or Kubernetes deployments with the CeleryExecutor or KubernetesExecutor offer robustness and scalability.

Configure Airflow to match your environment and it will automate your workflows and keep your data engineering pipelines running reliably.


Apache Airflow Training:

Master Apache Airflow with AccentFuture's expert-led online training! Our comprehensive Apache Airflow online course covers real-time workflow orchestration and production-grade deployment. Enroll now to boost your data engineering skills with hands-on Airflow training.



🚀Enroll Now: https://www.accentfuture.com/enquiry-form/

📞Call Us: +91-9640001789

📧Email Us: contact@accentfuture.com

🌍Visit Us: AccentFuture
