dbt run Command: Usage & Examples
Introduction
Welcome to this comprehensive tutorial on the dbt run
command. dbt is a transformative tool for data professionals who need to manage and transform data in SQL. The dbt run
command is a core function of dbt, enabling the execution of compiled SQL model files against a specified target database. This is particularly beneficial in business scenarios where raw data needs to be transformed into a more digestible format for analysis and decision-making.
Understanding the dbt run Command
The dbt run
command is designed to execute your compiled SQL model files against the current target database. It connects to your database and executes the necessary SQL to materialize all data models using the strategies you’ve outlined.
Let’s consider a business scenario where you have a raw sales data table and you want to create a model that aggregates this data by product category. We'll assume you already created a staging model for that table, so we can reference the staging model here. Your SQL model file might look like this:
-- sales_by_category.sql
{{
config(
materialized='table'
)
}}
select
product_category,
sum(sales_amount) as total_sales
from {{ref('stg_sales_data')}}
group by product_category
To run this model, you would use the dbt run
command in your terminal:
dbt run --model sales_by_category
The command will execute the SQL in the model file against your target database, creating a new table with the aggregated sales data.
dbt run Model Deployment
When deploying new models, dbt run
handles the process smoothly. It first builds each model with a temporary name, then drops the existing model, and finally renames the new model to its correct name. This process is handled within a single database transaction for database adapters that support transactions, ensuring data integrity.
For instance, if you update the sales_by_category.sql
model to also include the average sales amount per category, dbt run
will create a new version of the sales_by_category
table with the additional data, and then replace the old table with the new one.
Using the full-refresh Flag
The --full-refresh
flag is a powerful feature of dbt run
. When used, dbt treats incremental models as table models. This is especially useful when the schema of an incremental model changes and you need to recreate it, or when you want to reprocess the entirety of the incremental model due to new logic in the model code.
For example, if we had a daily version of our sales_by_category
model from earlier and it only added new rows each day, it'd look like this:
--daily_sales_by_category.sql
select
sales_date,
product_category,
sum(sales_amount) as total_sales,
avg(sales_amount) as average_sales,
count(*) as number_of_transactions
from {{ref('stg_sales_data')}}
{% if is_incremental() %}
where sales_date > (select max(sales_date) from {{ this }})
{% endif %}
group by sales_date, product_category
In this case if you need to run the model with all your raw data, and not just the days not yet added to the model, you can use the --full-refresh
flag to recreate the table with the new data.
dbt run --model sales_by_category --full-refresh
This will ignore the incremental logic in the where
statement of the model and run for all days in the source data set.
Selecting Specific Models to Materialize
dbt run
also allows you to select which specific models you’d like to materialize. This can be useful in scenarios where you may prefer running a different set of models at various intervals, or when you want to limit the tables materialized while you develop and test new models.
For example, if you have a large number of models but you’re currently only working on a few of them, you can use the m
option with dbt run
to only run those models. This can save time and resources during the development and testing process.
dbt run -m sales_by_category,daily_sales_by_category
Conclusion
We’ve covered a lot in this tutorial, from understanding the dbt run
command and its intelligent multi-threading, to deploying dbt run
models and using the --full-refresh
flag. Remember, the best way to get comfortable with these concepts is to practice. So, go ahead and start using the dbt run
command in your data transformation projects.
Previous
dbt ls