Airflow CLI: Overview and Usage
Apache Airflow is a versatile open-source platform for orchestrating workflows, and its command-line interface (CLI) is a key tool that gives you direct control over every aspect of your data pipelines. With the Airflow CLI, you can start services, test tasks, and manage workflows—all from your terminal, no web browser needed. This guide, hosted on SparkCodeHub, takes a deep dive into the Airflow CLI, covering its commands, how to use them step-by-step where applicable, and why they’re essential for mastering Airflow. We’ll explore the basics, key commands, and practical examples to get you up to speed. New to Airflow? Start with Airflow Fundamentals, and complement this with Airflow Architecture (Scheduler, Webserver, Executor) for a fuller understanding.
What is the Airflow CLI?
The Airflow CLI is your command-line gateway to managing Airflow—a powerful, scriptable interface that lets you interact with the platform directly. When you install Airflow—detailed in Installing Airflow (Local, Docker, Cloud)—you gain access to the airflow command, which unlocks a wide range of subcommands. These let you initialize the database, start the Scheduler or Webserver, test Directed Acyclic Graphs (DAGs), and more, all with precise terminal inputs. It’s designed for speed and automation, offering a hands-on alternative to the web UI (covered in Airflow Web UI Overview). Whether you’re running a simple script with BashOperator or scaling up with Airflow with Apache Spark, the CLI is an indispensable tool for taking charge of your workflows.
Getting Started with the CLI
Before you can start using the CLI, you need to set up Airflow and ensure it’s ready to respond to your commands. Here’s how to get going, broken down into clear steps where actions are required.
Prerequisites for Using the CLI
To use the Airflow CLI, you need Airflow installed and a working environment. After following the installation process in Installing Airflow (Local, Docker, Cloud), you’ll have a virtual environment—say, airflow_env—with Airflow inside it. You also need the metadata database initialized to store workflow data. The CLI won’t work without these basics, so let’s ensure they’re set up.
Steps to Prepare Your Environment
- Open Your Terminal: On Windows, press the Windows key, type “cmd” into the search bar, and press Enter to open Command Prompt. On Mac, click the magnifying glass in the top-right corner, type “Terminal,” and hit Enter. On Linux, press Ctrl+Alt+T or search for “Terminal” in your applications menu.
- Navigate to Your Home Directory: Type cd ~ and press Enter on Mac/Linux to go to your home folder (e.g., /home/username or /Users/username). On Windows, type cd %userprofile% and press Enter to reach C:\Users\YourUsername.
- Activate Your Virtual Environment: If your environment is named airflow_env, type source airflow_env/bin/activate on Mac/Linux or airflow_env\Scripts\activate on Windows, then press Enter. You’ll see (airflow_env) in your prompt—like (airflow_env) username@machine:~$ or (airflow_env) C:\Users\YourUsername>—confirming it’s active. If you get “command not found,” ensure the path matches your environment’s location (e.g., ~/airflow_env).
- Initialize the Database: Type airflow db init and press Enter. This creates a SQLite database at ~/airflow/airflow.db (e.g., /home/username/airflow/airflow.db), along with the airflow folder and airflow.cfg. You’ll see output about setting up tables—it takes a few seconds. If it’s already initialized, you’ll get a message saying so, and that’s fine.
- Verify the CLI Works: Type airflow --help and press Enter. You’ll see a list of subcommands like db, scheduler, and dags—if this appears, the CLI is ready. If you get an error, recheck your environment activation or installation.
These steps ensure Airflow is installed, your environment is active, and the database is set up—now you can use the CLI with confidence. Learn more about the database in Airflow Metadata Database Setup.
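For quick reference, here is the whole preparation sequence as one Mac/Linux terminal session—a minimal sketch that assumes your virtual environment folder is ~/airflow_env:

# Activate the virtual environment that contains Airflow
cd ~
source airflow_env/bin/activate

# Create the metadata database, ~/airflow folder, and airflow.cfg (first run only)
airflow db init

# Confirm the CLI responds with its list of subcommands
airflow --help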
Basic CLI Syntax
The CLI follows a straightforward structure: airflow [subcommand] [options]. The subcommand tells Airflow what to do—airflow db init sets up the database, airflow scheduler starts the Scheduler. Options tweak how it runs—add -h (e.g., airflow scheduler -h) to see help for that subcommand, showing flags like --dag-id or --port. You type these commands in your terminal after activating your environment, making it a fast, scriptable way to control Airflow.
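A few concrete instances of that airflow [subcommand] [options] pattern, all safe to run in an activated environment:

airflow --help              # top-level help: lists every subcommand group
airflow scheduler -h        # help for one subcommand, including its flags
airflow webserver -p 8081   # an option (-p) changing the port the Webserver binds to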
Why Use the CLI?
The CLI is all about efficiency and flexibility. It’s faster than navigating the web UI in Monitoring Task Status in UI—type airflow tasks list my_dag to see tasks instantly instead of clicking around. It’s also perfect for automation—put airflow scheduler in a script to run unattended, unlike the UI’s manual triggers. From starting services to debugging DAGs with Defining DAGs in Python, it gives you precise control over Airflow’s workings.
Key CLI Commands
Airflow’s CLI offers a rich set of commands—let’s explore the ones you’ll use most, with steps where they involve a process.
Database Commands
The db subcommand manages the metadata database, which tracks your workflows’ states and history. It’s the backbone of Airflow’s memory, and these commands keep it in shape.
- Initialize the Database: Type airflow db init and press Enter to create the database—SQLite by default at ~/airflow/airflow.db. This sets up tables for DAGs, tasks, and runs, essential for Airflow to function—run it once after installation.
- Reset the Database: Type airflow db reset and press Enter—it asks for confirmation; add --yes (airflow db reset --yes) to skip it. This wipes all data (runs, states) and reinitializes, useful if things get messy but destructive, so back up airflow.db first if needed.
- Upgrade the Database: Type airflow db upgrade and press Enter to update the schema after an Airflow version bump—keeps it compatible, as noted in Airflow Version Upgrades.
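Put together, a typical database-maintenance session looks like this—all three commands come straight from the list above, and db reset is the only destructive one:

airflow db init         # first-time setup: creates the tables in ~/airflow/airflow.db
airflow db upgrade      # bring the schema up to date after upgrading Airflow itself
airflow db reset --yes  # wipe runs and states, then reinitialize (destructive!)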
Scheduler Commands
The Scheduler is what runs your tasks on schedule—use the CLI to start it.
- Start the Scheduler: Type airflow scheduler and press Enter—you’ll see logs as it scans your dags folder (set in DAG File Structure Best Practices) and queues tasks based on their schedule_interval, detailed in Introduction to Airflow Scheduling. Run it in its own terminal and keep it active.
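If you'd rather not keep a terminal occupied, the scheduler subcommand also accepts -D (--daemon) to detach into the background—a sketch of both styles:

# Foreground (recommended while learning): logs stream to this terminal
airflow scheduler

# Background daemon: -D detaches the process; stop it later via its PID
airflow scheduler -D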
Webserver Commands
The Webserver powers the UI—launch it via the CLI.
- Start the Webserver: Type airflow webserver -p 8080 and press Enter—it starts a server at localhost:8080. Open your browser, type that address, and hit Enter to see the UI with DAGs and statuses. The -p 8080 sets the port—use -p 8081 if 8080’s busy. Customize it with Customizing Airflow Web UI.
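The webserver subcommand supports the same -D daemon flag, alongside the port option shown above:

airflow webserver -p 8080      # default port; UI at http://localhost:8080
airflow webserver -p 8081 -D   # alternate port, detached into the background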
DAG Commands
The dags subcommand controls your workflows.
- List All DAGs: Type airflow dags list and press Enter—it shows every DAG in your dags folder, like my_dag, along with details such as its file path, owner, and paused status.
- Trigger a DAG: Type airflow dags trigger -e 2025-04-07 my_dag and press Enter—it runs my_dag for April 7, 2025, mimicking Triggering DAGs via UI.
- Pause/Unpause a DAG: Type airflow dags pause my_dag and press Enter to stop scheduling—it keeps past runs. Reverse it with airflow dags unpause my_dag—see Pause and Resume DAGs.
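A typical dags session, with my_dag standing in for whatever DAG id you actually have:

airflow dags list                           # every DAG Airflow has parsed
airflow dags trigger -e 2025-04-07 my_dag   # queue a run for a specific execution date
airflow dags pause my_dag                   # stop scheduling new runs (history is kept)
airflow dags unpause my_dag                 # resume scheduling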
Task Commands
The tasks subcommand focuses on individual tasks within DAGs.
- List Tasks: Type airflow tasks list my_dag and press Enter—it lists task IDs like task1, task2.
- Test a Task: Type airflow tasks test my_dag task1 2025-04-07 and press Enter—it runs task1 for that date without database impact, ideal for debugging with DAG Testing with Python.
- Run a Task: Type airflow tasks run my_dag task1 2025-04-07 and press Enter—it executes the task for real, recording its state and logs in the database.
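The matching tasks workflow, again with my_dag and task1 as placeholder names:

airflow tasks list my_dag                    # show the task ids defined in the DAG
airflow tasks test my_dag task1 2025-04-07   # run one task in isolation, nothing written to the database
airflow tasks run my_dag task1 2025-04-07    # a real run: state and logs are recorded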
Configuration Commands
Peek into settings with these commands.
- List All Configs: Type airflow config list and press Enter—it shows all settings from airflow.cfg, detailed in Airflow Configuration Options.
- Get a Specific Config: Type airflow config get-value core executor and press Enter—it returns the current Executor, like “SequentialExecutor.”
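Both commands are read-only, so they're safe to run at any time:

airflow config list                      # dump every section and setting from airflow.cfg
airflow config get-value core executor   # print a single value, e.g. SequentialExecutor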
Practical Examples
Let’s see the CLI in action with real-world tasks, including steps where processes are involved.
Starting Airflow Services
To run Airflow locally, you need the Webserver and Scheduler up.
Steps to Start Airflow Services
- Open Your First Terminal: Launch Command Prompt (Windows) or Terminal (Mac/Linux) as in Step 1 of Prerequisites.
- Navigate to Home Directory: Type cd ~ (Mac/Linux) or cd %userprofile% (Windows) and press Enter.
- Activate Environment: Type source airflow_env/bin/activate (Mac/Linux) or airflow_env\Scripts\activate (Windows) and press Enter—see (airflow_env) in your prompt.
- Start the Webserver: Type airflow webserver -p 8080 and press Enter—open your browser to localhost:8080 to see the UI. Keep this terminal running.
- Open a Second Terminal: Launch another Command Prompt or Terminal.
- Navigate to Home Directory: Repeat Step 2—type cd ~ or cd %userprofile% and press Enter.
- Activate Environment: Repeat Step 3—type source airflow_env/bin/activate or airflow_env\Scripts\activate and press Enter.
- Start the Scheduler: Type airflow scheduler and press Enter—it starts scanning ~/airflow/dags. Keep this terminal running.
- Add a Test DAG: In a text editor (e.g., Notepad, VS Code), paste:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="cli_example",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
) as dag:
    task = BashOperator(
        task_id="echo_task",
        bash_command="echo 'Hello from CLI!'",
    )
- Save the DAG: Save it as cli_example.py in ~/airflow/dags (e.g., /home/username/airflow/dags/cli_example.py). Wait 10-20 seconds, refresh localhost:8080, and see “cli_example” listed.
Testing a DAG
Test your DAG without affecting the database.
- Run the Test: Type airflow dags test cli_example 2025-04-07 and press Enter—it simulates the run for April 7, 2025, printing “Hello from CLI!” to your terminal. For just the task, use airflow tasks test cli_example echo_task 2025-04-07—same output, focused on echo_task.
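As a quick sketch, the two test commands side by side—the echoed greeting should appear in your terminal either way:

# Full DAG, one simulated run, nothing persisted to the database
airflow dags test cli_example 2025-04-07

# Single task from the same DAG
airflow tasks test cli_example echo_task 2025-04-07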
Triggering a DAG Run
Manually start a DAG run.
- Trigger the DAG: Type airflow dags trigger -e 2025-04-07 cli_example and press Enter—it queues a run for that date. Check localhost:8080 or logs in ~/airflow/logs (via Task Logging and Monitoring) to see “Hello from CLI!” executed.
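To trigger and then confirm the run entirely from the terminal, pair the trigger with dags list-runs, which prints each run's id and state:

airflow dags trigger -e 2025-04-07 cli_example   # queue the run
airflow dags list-runs -d cli_example            # check its state (queued, running, success, failed)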
FAQ: Common Questions About Airflow CLI
Here are some frequently asked questions about the Airflow CLI, with detailed answers based on online discussions.
1. How do I confirm the Airflow CLI is working after installation?
After installing Airflow with pip install apache-airflow (from Installing Airflow (Local, Docker, Cloud)), activate your environment—type source airflow_env/bin/activate (Mac/Linux) or airflow_env\Scripts\activate (Windows) and press Enter. Then type airflow version and press Enter—you’ll see a version like “2.4.3” if it’s working. If you get “command not found,” ensure the environment’s active and Airflow’s installed—type pip show apache-airflow to check details like location and version.
2. What’s the difference between airflow dags test and airflow tasks test?
The airflow dags test command runs a whole DAG for a specific date—type airflow dags test my_dag 2025-04-07 and press Enter to simulate all tasks without saving to the database, showing output in your terminal. In contrast, airflow tasks test targets one task—type airflow tasks test my_dag task1 2025-04-07 to run just task1 for that date, also without database changes. Use dags test to check the full workflow, tasks test to debug a single piece—both are safe for testing, unlike dags trigger.
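Side by side, using placeholder names:

airflow dags test my_dag 2025-04-07          # whole DAG, simulated, no database writes
airflow tasks test my_dag task1 2025-04-07   # one task, simulated, no database writes
airflow dags trigger -e 2025-04-07 my_dag    # real run: recorded in the database and shown in the UI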
3. How do I stop the Scheduler or Webserver after starting them with the CLI?
If you ran airflow scheduler or airflow webserver -p 8080, go to their terminal and press Ctrl+C—it stops them cleanly, showing a shutdown message. If you backgrounded them (e.g., airflow scheduler &), find their process ID—type ps aux | grep airflow (Mac/Linux) or tasklist | findstr airflow (Windows) and press Enter, then note the PID (e.g., 1234). Kill it with kill 1234 (Mac/Linux) or taskkill /PID 1234 (Windows). Restart with the original commands if needed.
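On Mac/Linux, the find-and-stop sequence for a backgrounded process looks like this (1234 is only an example PID):

ps aux | grep airflow   # find the scheduler or webserver process and note its PID
kill 1234               # replace 1234 with the PID you found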
4. Can I use the CLI to figure out why a task failed?
Yes—first, list tasks with airflow tasks list my_dag to get IDs like failed_task. Then type airflow tasks test my_dag failed_task 2025-04-07 and press Enter to rerun it in isolation—output prints to your terminal and nothing is written to the database, so it's safe to repeat while you debug. For the logs of the original failed run, type cat ~/airflow/logs/my_dag/failed_task/2025-04-07T00:00:00+00:00/1.log (Mac/Linux) or type %userprofile%\airflow\logs\my_dag\failed_task\2025-04-07T00:00:00+00:00\1.log (Windows) and press Enter—check Task Logging and Monitoring for more.
5. How do I trigger a DAG for a past date using the CLI?
Use the trigger command—type airflow dags trigger -e 2025-03-01 my_dag and press Enter to start my_dag for March 1, 2025. The -e flag sets the execution date—verify it in the UI or logs. It’s a manual way to backfill, similar to Catchup and Backfill Scheduling.
6. What does airflow db reset do, and when should I use it?
The airflow db reset command erases your metadata database—DAG runs, task states, everything—then rebuilds it. Type airflow db reset --yes and press Enter to skip confirmation—it’s fast but wipes all history. Use it if the database is broken (e.g., Scheduler errors) or you want a clean slate after testing—copy ~/airflow/airflow.db elsewhere first if you need a backup.
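A cautious reset, assuming the default SQLite location:

cp ~/airflow/airflow.db ~/airflow/airflow_backup.db   # keep a copy in case you need the history back
airflow db reset --yes                                # wipe and rebuild the metadata database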
7. How do I view my current Airflow configuration using the CLI?
Type airflow config list and press Enter—it lists all settings from airflow.cfg, like executor and dags_folder (see Airflow Configuration Options). For one setting, type airflow config get-value core executor and press Enter—it might show “SequentialExecutor.” It’s a quick way to check your setup without editing anything.
Conclusion
The Airflow CLI puts workflow management at your fingertips—start services, test DAGs, and troubleshoot with precision. From airflow db init to airflow dags trigger, it’s your toolkit for automation and control. Get Airflow running with Installing Airflow (Local, Docker, Cloud), craft DAGs in Defining DAGs in Python, and monitor them with Monitoring Task Status in UI. Dive deeper with Airflow Concepts: DAGs, Tasks, and Workflows!