dbt build Command: Usage & Examples

Introduction

dbt is a tool that data engineers and analysts use to transform data in their warehouses. One of its essential commands is dbt build, which runs models, tests, snapshots, and seeds in your dbt project with a single invocation. This tutorial explains how the dbt build command works and how to use it effectively.

Understanding the dbt build Command

The dbt build command operates in Directed Acyclic Graph (DAG) order. This means it processes each model, test, seed, and snapshot only after the resources it depends on have been processed. When you run dbt build, it creates two important artifacts: a single manifest and a single run results artifact. The manifest is a file that contains representations of all the resources in your dbt project, while the run results artifact contains detailed information about the outcome of the dbt build command, including the status of each executed model and test, how long each one took to run, test failures, and more.
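To make the run results artifact more concrete, here is a minimal Python sketch that tallies result statuses and sums execution times. It assumes the artifact has a top-level results list whose entries carry unique_id, status, and execution_time fields; the sample payload and the summarize helper are illustrative and are not output from a real project.

```python
import json
from collections import Counter

# Illustrative, hand-written payload shaped like a run results artifact.
# A real file holds many more fields; only the ones used below are shown.
SAMPLE = """
{
  "results": [
    {"unique_id": "model.my_project.customers", "status": "success", "execution_time": 0.12},
    {"unique_id": "model.my_project.orders", "status": "success", "execution_time": 0.15}
  ]
}
"""


def summarize(run_results: dict) -> dict:
    """Count results per status and sum their execution times (hypothetical helper)."""
    results = run_results.get("results", [])
    return {
        "by_status": dict(Counter(r["status"] for r in results)),
        "total_seconds": round(sum(r["execution_time"] for r in results), 2),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

For a real run, the same function could be fed the parsed contents of target/run_results.json, assuming the project uses the default target directory.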

Setting Up dbt build

To use dbt build, you first need to have a dbt project set up. If you don't have one yet, you can create one using the dbt init command. Once your project is set up, you can use dbt build with various flags and options to control its behavior. For example, the --select option specifies which resources to build, and the --exclude option removes resources from that selection. The --resource-type flag restricts dbt build to a particular kind of resource, such as models or tests.
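When dbt build is launched from a script, these options can be assembled programmatically. The following is a small Python sketch under stated assumptions: build_command is a hypothetical helper, not part of dbt, and it only builds the argument list without running anything. Several selectors are passed as separate arguments, which dbt combines as a union.

```python
import shlex


def build_command(select=(), exclude=(), resource_type=None):
    """Assemble an argv list for `dbt build` (hypothetical helper, not a dbt API)."""
    argv = ["dbt", "build"]
    if select:
        argv += ["--select", *select]      # separate arguments are unioned
    if exclude:
        argv += ["--exclude", *exclude]    # removed from the selection
    if resource_type:
        argv += ["--resource-type", resource_type]
    return argv


cmd = build_command(select=["orders", "customers"], resource_type="model")
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) from the project directory, assuming dbt is installed and on the PATH.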

Running dbt build

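Let's say you have a dbt project with two models: orders and customers. The orders model depends on the customers model. Here's how you can run dbt build to build these models. Note that the selectors are separated by a space, which dbt treats as a union; a comma between them would request the intersection of the two selectors, which matches nothing here.

dbt build --select orders customers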

When you run this command, dbt will first build the customers model because the orders model depends on it. If the build for customers succeeds, dbt will then build the orders model. If the build for customers fails, dbt will skip the orders model.
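The ordering and skip behavior just described can be sketched in a few lines of Python. This is a toy model of the idea, not dbt's actual scheduler: simulate_build and the failing parameter are hypothetical names, and the dependency map simply encodes the orders and customers example.

```python
from graphlib import TopologicalSorter

# Map each model to the models it depends on (the example above).
DEPENDS_ON = {"customers": set(), "orders": {"customers"}}


def simulate_build(depends_on, failing=frozenset()):
    """Return {model: status}, visiting models in dependency order."""
    status = {}
    for node in TopologicalSorter(depends_on).static_order():
        parents = depends_on.get(node, set())
        if any(status[parent] != "success" for parent in parents):
            status[node] = "skipped"  # an upstream model did not succeed
        elif node in failing:
            status[node] = "error"    # this model's own build failed
        else:
            status[node] = "success"
    return status


print(simulate_build(DEPENDS_ON))
print(simulate_build(DEPENDS_ON, failing={"customers"}))
```

In the second call, customers is marked as failing, so orders is never attempted and ends up skipped.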

Understanding the Output of dbt build

The output of dbt build provides detailed information about the build process. For each model, test, seed, or snapshot that dbt build runs, it will show whether the operation was successful or not, and how long it took. If there are any errors, dbt build will also provide error messages to help you troubleshoot the issues.

For example, if you run dbt build on the orders and customers models, you might see output like this:

$ dbt build --select orders customers

Running with dbt=0.21.0
Found 2 models, 0 tests, 0 snapshots, 0 analyses, 341 macros, 0 operations, 0 seed files, 0 sources

14:49:43 | Concurrency: 1 threads (target='dev')
14:49:43 |
14:49:43 | 1 of 2 START table model my_project.customers..................... [RUN]
14:49:43 | 1 of 2 OK created table model my_project.customers................ [SELECT 100 in 0.12s]
14:49:43 | 2 of 2 START table model my_project.orders........................ [RUN]
14:49:43 | 2 of 2 OK created table model my_project.orders................... [SELECT 500 in 0.15s]
14:49:43 |
14:49:43 | Finished running 2 table models in 0.45s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

This output tells you that dbt build successfully created the customers and orders models, selecting 100 and 500 rows respectively.
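If the final summary line needs to be checked in a script, it can be parsed with a short regular expression. The sketch below is a hedged illustration: parse_summary is a hypothetical helper, and the log text is not a stable interface across dbt versions, so the run results artifact is usually the safer source for automated checks.

```python
import re


def parse_summary(line):
    """Extract KEY=number pairs from a dbt summary line into a dict."""
    pairs = re.findall(r"([A-Z][A-Z-]*)=(\d+)", line)
    return {key.lower(): int(value) for key, value in pairs}


counts = parse_summary("Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2")
print(counts)
```

A wrapper script could then fail the job whenever counts["error"] or counts["skip"] is greater than zero.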

Comparing dbt build vs run

While dbt build and dbt run might seem similar, they differ significantly in what they execute. dbt run only builds models and does not run tests; to test them you need a separate dbt test invocation, which starts after all selected models have already been built. dbt build interleaves the two: it runs the tests on each resource right after that resource is built, and if a test fails it skips the downstream resources. This means that dbt build prevents failing data from propagating downstream, while dbt run can lead to data quality issues if not used carefully.

For example, if you have a unique test on a column in the customers model and you use dbt run to build the models, dbt run will not check the unique test before building the orders model. This could lead to data quality issues in the orders model if there are duplicate values in the column in the customers model.
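The difference can be illustrated with a toy Python simulation. It is a sketch under stated assumptions, not dbt's implementation: the test name unique_customers_customer_id and the simulate function are hypothetical, and the test is assumed to fail with error severity.

```python
# Steps in dependency order: a model, a test on that model, a downstream model.
STEPS = [
    {"name": "customers", "kind": "model", "depends_on": []},
    {"name": "unique_customers_customer_id", "kind": "test", "depends_on": ["customers"]},
    {"name": "orders", "kind": "model", "depends_on": ["customers"]},
]


def simulate(command, failing_tests=frozenset()):
    """Return {step: status} for a toy `dbt run` or `dbt build`."""
    status = {}
    blocked = set()  # models that have at least one failed test
    for step in STEPS:
        if step["kind"] == "test":
            if command == "run":
                continue  # `dbt run` does not execute tests
            failed = step["name"] in failing_tests
            status[step["name"]] = "fail" if failed else "pass"
            if failed:
                blocked.update(step["depends_on"])
        elif any(parent in blocked for parent in step["depends_on"]):
            status[step["name"]] = "skipped"  # upstream test failed
        else:
            status[step["name"]] = "success"
    return status


bad = {"unique_customers_customer_id"}
print(simulate("run", bad))
print(simulate("build", bad))
```

With the failing unique test, the simulated dbt run still builds orders, while the simulated dbt build records the test failure and skips orders.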

Best Practices for Using dbt build

When using dbt build, it's important to follow best practices to ensure that your data transformations are reliable and efficient. Here are some tips:

dbt build is the preferred way to run models in production environments. Because each model is tested before its downstream models are built, a failing test stops bad data from propagating further.
Use the --full-refresh flag when the schema of an incremental model has changed, or when changed model logic needs to be applied to rows that were already loaded.
Use graph operators (+ before a model name to include its ancestors, + after a model name to include its descendants) together with --select and --exclude to control which models, tests, seeds, and snapshots dbt build operates on. This can be useful for testing changes to specific parts of your dbt project.
Regularly check the output of dbt build for errors and warnings. This can help you catch and fix issues early.

Conclusion

The dbt build command is a powerful tool for data transformation in dbt. By understanding how it works and following best practices, you can use dbt build to create reliable and efficient data transformations in your dbt projects. Whether you're a data engineer or a data analyst, mastering dbt build is a valuable skill that can help you get the most out of dbt.
