Setting Up Apache Airflow: Local & Production Deployment
Introduction
Apache Airflow is a widely used workflow orchestration tool for businesses that need to manage data pipelines and complex workflows. It lets engineers define and schedule workflows in straightforward Python code. Before you can work with DAGs (Directed Acyclic Graphs) and manage workflows, you need a proper Airflow setup.
This blog walks through setting up Apache Airflow for both local development and production environments.
What is Apache Airflow?
Apache Airflow is an open-source platform for developing, scheduling, and monitoring workflows defined as code. Its architecture is easy to scale, which makes it a preferred tool for data engineering, ETL processes, and machine learning pipelines.
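To make "workflows as code" concrete, here is a minimal sketch of a DAG with a single Python task (the DAG id, schedule, and function are placeholders, not part of any official example):
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def say_hello():
    # The task simply prints a message; the output ends up in the task logs.
    print("Hello from Airflow!")

# A daily DAG with one task, defined entirely in Python.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)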
Setting Up Airflow Locally
Setting up Airflow locally is great for development, testing, and learning. Here's a step-by-step guide:
1. Prerequisites
- Python 3.8 or later
- pip installed
- Virtual environment (venv) recommended
- Docker (optional but useful)
2. Installation
First, create a virtual environment:
python3 -m venv airflow_venv
source airflow_venv/bin/activate
Now, install Apache Airflow using pip:
pip install apache-airflow
Alternatively, if you want to use the official constraints (recommended):
AIRFLOW_VERSION=2.9.0
PYTHON_VERSION=3.8
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "<https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt>"
3. Initialize Airflow Database
Airflow uses a metadata database to keep track of task statuses, DAG runs, and other system information.
airflow db init
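On recent Airflow releases (2.7 and later), airflow db init is deprecated in favor of the equivalent:
airflow db migrate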
4. Create a User
Create an admin user for the Airflow UI:
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com
You will be prompted to set a password.
5. Start Airflow
Start the Airflow web server:
airflow webserver --port 8080
In another terminal, start the Airflow scheduler:
airflow scheduler
Now visit http://localhost:8080 to access the Airflow UI!
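Once the UI is up, you can also sanity-check the installation from the command line. Assuming the bundled example DAGs are enabled (they are by default), something like this should work:
# List the DAGs Airflow has discovered
airflow dags list

# Run a single task of the bundled example_bash_operator DAG, bypassing the scheduler
airflow tasks test example_bash_operator runme_0 2024-01-01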
Setting Up Airflow for Production
Running Airflow in production requires a more robust setup than local development. You want scalability, fault tolerance, monitoring, and security.
Here’s how a typical production deployment looks:
1. Choose an Executor
Airflow provides several executors:
- LocalExecutor: Runs tasks in parallel on a single machine (good for small teams).
- CeleryExecutor: Distributes tasks across multiple worker nodes (popular in production).
- KubernetesExecutor: Runs each task in its own Kubernetes pod (best for cloud-native setups).
For production, CeleryExecutor or KubernetesExecutor are preferred.
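The executor is chosen in the [core] section of airflow.cfg or via an environment variable. As a sketch, switching to the CeleryExecutor looks like this:
# Environment-variable override of the [core] executor setting
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor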
2. Setup Production Components
You will need the following components:
- Web Server: Serves the UI.
- Scheduler: Parses DAGs and schedules tasks.
- Workers: Execute tasks.
- Database: Metadata storage (PostgreSQL or MySQL preferred).
- Broker: Message queue such as RabbitMQ or Redis (only needed for the CeleryExecutor).
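As a sketch of how these components are wired together, the metadata database and Celery broker are typically configured through settings like the following (host names and credentials here are placeholders to adjust to your environment):
# Metadata database connection (PostgreSQL in this example)
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres-host:5432/airflow
# Celery broker and result backend (Redis and PostgreSQL in this example)
export AIRFLOW__CELERY__BROKER_URL=redis://redis-host:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres-host:5432/airflow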
3. Deploying with Docker (Recommended)
Using Docker Compose makes it easy to deploy all Airflow components.
The official docker-compose.yaml is published with the Airflow documentation on airflow.apache.org.
Sample steps:
# Download the official docker-compose.yaml for your Airflow version
curl -LfO "https://airflow.apache.org/docs/apache-airflow/2.9.0/docker-compose.yaml"
# Create the folders Airflow expects and set the host user ID in .env
mkdir -p ./dags ./logs ./plugins ./config
echo "AIRFLOW_UID=$(id -u)" > .env
# Initialize the metadata database and create the default user
docker-compose up airflow-init
# Start services
docker-compose up
Services running:
- airflow-webserver
- airflow-scheduler
- airflow-worker
- airflow-triggerer
- postgres
- redis
Then visit http://localhost:8080 and log in (the default credentials in the official compose file are airflow / airflow).
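Because the official compose file uses the CeleryExecutor, you can also scale out workers once everything is running, for example:
# Run three worker containers instead of one (airflow-worker is the service name in the official compose file)
docker-compose up -d --scale airflow-worker=3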
4. Best Practices for Production
- Use PostgreSQL or MySQL, not SQLite.
- Enable Airflow authentication (e.g., LDAP, OAuth).
- Monitor with Prometheus and Grafana.
- Back up the metadata database regularly (see the sketch after this list).
- Separate your environments: dev, staging, production.
- Use GitOps for DAG deployments.
- Use the KubernetesExecutor if you are cloud native (AWS EKS, GCP GKE).
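As a minimal sketch of the backup point above, assuming a PostgreSQL metadata database named airflow (host, user, and schedule are placeholders), a nightly dump could be as simple as:
# Logical backup of the Airflow metadata database (placeholder host and credentials)
pg_dump -h postgres-host -U airflow -d airflow -F c -f "airflow_metadata_$(date +%F).dump"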
Conclusion
Set up correctly, Apache Airflow is exceptionally powerful. For development, a quick and easy local installation with the LocalExecutor is enough. For production-grade orchestration, Docker Compose or Kubernetes deployments with the CeleryExecutor or KubernetesExecutor offer robustness and scalability.
Configuring Airflow to match your environment lets you automate workflows and run data engineering pipelines reliably.
Apache Airflow Training:
Master Apache Airflow with AccentFuture's expert-led online training! Our comprehensive Apache Airflow online course covers real-time workflow orchestration and production-grade deployment. Enroll now to boost your data engineering skills with hands-on Airflow training.
🚀Enroll Now: https://www.accentfuture.com/enquiry-form/
📞Call Us: +91-9640001789
📧Email Us: contact@accentfuture.com
🌍Visit Us: AccentFuture