dbt clean & clean-targets Commands: Usage & Examples

Introduction

dbt lets data teams transform, test, and document data as part of an analytics engineering workflow. Like any project, a dbt project accumulates generated artifacts over time, such as compiled SQL, logs, and installed packages. This tutorial covers the dbt clean command and the clean-targets setting that controls what it deletes.

What is the dbt clean Command?

The dbt clean command is a command-line utility for managing your dbt project's environment. It deletes the folders listed in the project's clean-targets configuration, which are typically generated artifacts, so the project stays organized and free of stale files.

The Role of clean-targets in dbt_project.yml

The clean-targets configuration in the dbt_project.yml file lists the directories and files that the dbt clean command should remove. If clean-targets is not set, dbt clean removes the project's target-path directory by default.

Setting Up clean-targets

Syntax and Structure

To specify which files or directories the dbt clean command should target, you'll use the clean-targets configuration. For instance:

clean-targets:
  - "target/*"
  - "dbt_modules/*"  # default package directory before dbt 1.0; later versions use dbt_packages
  - "sales_data_temp.log"

Placement in the Project

Ensure that the clean-targets configuration is placed within the dbt_project.yml file, which should be located in your dbt project's root directory.
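
As a quick check of the placement described above, here is a minimal Python sketch. It is not part of dbt: the helper name has_clean_targets is hypothetical, and the check is a plain text scan rather than a full YAML parse, so treat it as a rough sanity test only.

```python
from pathlib import Path


def has_clean_targets(project_root):
    """Return True if dbt_project.yml exists in project_root and declares
    clean-targets as a top-level key (hypothetical helper, not a dbt API)."""
    config = Path(project_root) / "dbt_project.yml"
    if not config.is_file():
        return False
    # A top-level YAML key starts at column 0, so an indented match does not count.
    return any(
        line.startswith("clean-targets:")
        for line in config.read_text().splitlines()
    )


if __name__ == "__main__":
    # Run from the directory you believe is the project root.
    print(has_clean_targets("."))
```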

Using Wildcards

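Wildcards, represented by the * character, can be useful. For example, target/* is meant to match all files and directories directly within the target directory. The official dbt documentation describes clean-targets as a list of directories, so confirm that your dbt version expands wildcard patterns before relying on them.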
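
To see what a pattern such as target/* covers under ordinary shell-style matching, the following Python sketch builds a throwaway directory and lists the matches. It uses Python's own glob module, which is an assumption made for illustration and not necessarily how dbt resolves clean-targets; the function names and the sample layout are hypothetical.

```python
import glob
import os
import tempfile


def match_pattern(project_root, pattern):
    """Return the paths under project_root matched by a shell-style pattern,
    relative to project_root and in sorted order."""
    matches = glob.glob(os.path.join(project_root, pattern))
    return sorted(
        os.path.relpath(m, project_root).replace(os.sep, "/") for m in matches
    )


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout resembling a dbt target directory.
        os.makedirs(os.path.join(root, "target", "compiled"))
        open(os.path.join(root, "target", "manifest.json"), "w").close()
        open(os.path.join(root, "target", "compiled", "model.sql"), "w").close()
        return match_pattern(root, "target/*")


if __name__ == "__main__":
    print(demo())  # ['target/compiled', 'target/manifest.json']
```

The pattern matches the direct children of target, both files and directories. The nested model.sql is not listed separately because it is covered by its parent directory, and Python's glob skips names that begin with a dot.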

Types of Files and Directories to Clean

In a business context, you might have directories related to specific departments or projects, such as:

  • finance_reports/
  • marketing_data/
  • temp_files/

You might also have log files or temporary data files from various business operations:

  • Q1_sales_temp.csv
  • employee_data_log.log

To ensure these are cleaned, you'd add them to your clean-targets configuration.

Dependencies and the dbt clean Command

When specifying directories or files in clean-targets, be aware of dependencies. If a file or directory is crucial for another part of your dbt project, removing it might cause issues. Always review dependencies before running the dbt clean command.
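
One way to review dependencies before cleaning is to compare each clean-targets entry against the directories your project relies on. The Python sketch below is an illustration only: PROTECTED_DIRS is an assumed list based on common dbt directory names, and overlaps and risky_targets are hypothetical helpers, not dbt functions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed directories a dbt project commonly depends on; adjust to your project.
PROTECTED_DIRS = ("models", "seeds", "macros", "snapshots", "tests", "analyses")


def overlaps(target, protected_dir):
    """Return True if a clean target could delete, or sit inside, protected_dir."""
    t_parts = PurePosixPath(target).parts
    p_parts = PurePosixPath(protected_dir).parts
    # zip stops at the shorter path, so only the shared leading components
    # are compared; each target component is treated as a wildcard pattern.
    return all(fnmatchcase(p, t) for t, p in zip(t_parts, p_parts))


def risky_targets(clean_targets, protected_dirs=PROTECTED_DIRS):
    """Return the clean targets that overlap with any protected directory."""
    return [
        t for t in clean_targets
        if any(overlaps(t, p) for p in protected_dirs)
    ]


if __name__ == "__main__":
    print(risky_targets(["target/*", "models/staging", "*"]))
    # ['models/staging', '*']
```

An empty result does not prove the configuration is safe. It only means that none of the entries touch the directories you listed as protected.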

Safety Precautions

Before executing the dbt clean command:

  • Backup: Ensure you have backups of essential data or files.
  • Review: Double-check the clean-targets configuration to avoid unintentional deletions.
  • Test: Consider running the command in a test environment first.
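
The backup step above could be sketched in Python as follows. This is a minimal example under stated assumptions: backup_targets is a hypothetical helper, patterns are expanded with Python's glob module and not by dbt, and backup_dir is expected to live outside the project so the copies are not cleaned as well.

```python
import glob
import os
import shutil


def backup_targets(project_root, clean_targets, backup_dir):
    """Copy everything the clean-targets patterns match into backup_dir.

    Returns the copied paths relative to project_root. Nothing is deleted.
    """
    copied = []
    for pattern in clean_targets:
        for src in sorted(glob.glob(os.path.join(project_root, pattern))):
            rel = os.path.relpath(src, project_root)
            dest = os.path.join(backup_dir, rel)
            # Recreate the parent folders so the backup mirrors the project layout.
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            if os.path.isdir(src):
                shutil.copytree(src, dest, dirs_exist_ok=True)
            else:
                shutil.copy2(src, dest)
            copied.append(rel.replace(os.sep, "/"))
    return copied
```

Calling backup_targets(".", ["target/*", "sales_data_temp.log"], "../dbt_backup") before dbt clean would leave a copy of each matched path in ../dbt_backup. The returned list doubles as a review of what the patterns match.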

Limitations of the dbt clean Command

The dbt clean command doesn't work when interfacing with the RPC server powering the dbt Cloud IDE. In such cases, use the dbt deps command, which cleans before installing packages. If you're using dbt Cloud, you can manually delete the target folder from the sidebar file tree.

Practical Exercise

Let's simulate a business scenario:

Imagine you're handling data for a retail company. Over time, you've accumulated temporary sales data, logs from marketing campaigns, and old finance reports.

  1. Setup: Create a sample dbt project with directories: sales_data/, marketing_logs/, and finance_reports/.
  2. Populate: Add some dummy data and logs to these directories.
  3. Configure clean-targets:
clean-targets:
  - "sales_data/temp_sales*"
  - "marketing_logs/campaign_2022.log"
  - "finance_reports/Q1_2022_report_temp.csv"
  4. Run: Execute the dbt clean command and confirm that the specified files have been removed.
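
Steps 1 and 2 of the exercise could be scripted with a short Python sketch like the one below. The three targeted file names come from the clean-targets entries above. The files marked "should survive" are hypothetical additions that make it easy to see what dbt clean leaves behind. The script only creates folders and dummy files: it does not create dbt_project.yml, so run it inside an existing dbt project.

```python
from pathlib import Path

SAMPLE_FILES = [
    "sales_data/temp_sales_jan.csv",            # hypothetical name matching temp_sales*
    "sales_data/final_sales.csv",               # hypothetical, should survive
    "marketing_logs/campaign_2022.log",         # listed in clean-targets
    "marketing_logs/campaign_2023.log",         # hypothetical, should survive
    "finance_reports/Q1_2022_report_temp.csv",  # listed in clean-targets
]


def scaffold(project_root):
    """Create the exercise directories and dummy files under project_root."""
    root = Path(project_root)
    created = []
    for rel in SAMPLE_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("dummy data\n")
        created.append(rel)
    return created


if __name__ == "__main__":
    for name in scaffold("."):
        print("created", name)
```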

Conclusion

The dbt clean command removes the directories and files named in clean-targets. With a carefully reviewed configuration and the precautions described above, it keeps a dbt project free of stale artifacts without touching the files you still need.
