DAG | dbt Developer Hub (2024)

A DAG is a Directed Acyclic Graph, a type of graph whose nodes are directionally related to each other and don’t form a directional closed loop. In the practice of analytics engineering, DAGs are often used to visually represent the relationships between your data models.
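To make the definition concrete, here is a minimal Python sketch. It assumes a graph is stored as a mapping from each node to the set of nodes it depends on. The node names `a`, `b`, and `c` and the helper `is_acyclic` are hypothetical, and `graphlib` ships with the Python standard library from version 3.9.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(parents: dict[str, set[str]]) -> bool:
    """Return True if the dependency mapping contains no directed cycle."""
    try:
        # prepare() raises CycleError when it finds a directed closed loop.
        TopologicalSorter(parents).prepare()
        return True
    except CycleError:
        return False


# Hypothetical graphs: each key depends on the nodes in its value set.
valid_dag = {"b": {"a"}, "c": {"a", "b"}}
closed_loop = {"a": {"c"}, "b": {"a"}, "c": {"b"}}

print(is_acyclic(valid_dag))
print(is_acyclic(closed_loop))
```

The first graph only ever points "forward", so it qualifies as a DAG; the second loops from `a` back to `a` and does not.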

While the concept of a DAG originated in mathematics and gained popularity in computational work, DAGs have found a home in the modern data world. They offer a great way to visualize data pipelines and lineage, and they offer an easy way to understand dependencies between data models.

DAGs are an effective tool to help you understand relationships between your data models and areas of improvement for your overall data transformations.

Unpacking relationships and data lineage

Can you look at one of your data models today and quickly identify all the upstream and downstream models? If you can’t, that’s probably a good sign to start building or looking at your existing DAG.

Upstream or downstream?

How do you know if a model is upstream or downstream from the model you’re currently looking at? Upstream models are models that must be built before the current model. In simple terms, the current model depends on its upstream models in order to exist. Downstream models are the models that consume the output of your current model. In a visual DAG, such as the dbt Lineage Graph, upstream models are to the left of your selected model and downstream models are to the right of it. Ever confused? Follow the arrows that give a DAG its directedness to understand the direction of movement.

One of the great things about DAGs is that they are visual. You can clearly identify the nodes that connect to each other and follow the lines of directions. When looking at a DAG, you should be able to identify where your data sources are going and where that data is potentially being referenced.

Take this mini-DAG for an example:

A miniature DAG

What can you learn from this DAG? Immediately, you may notice a handful of things:

  • stg_users and stg_user_groups models are the parent models for int_users
  • A join is happening between stg_users and stg_user_groups to form the int_users model
  • stg_orgs and int_users are the parent models for dim_users
  • dim_users is at the end of the DAG and is therefore downstream from a total of four different models
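The observations above can be checked with a small Python sketch. It assumes the mini-DAG is written down by hand as a mapping from each model to its direct parents; the `PARENTS` mapping and the `upstream` helper are illustrative and are not part of dbt.

```python
# Mini-DAG from the example: each model maps to its direct parents.
PARENTS = {
    "stg_users": set(),
    "stg_user_groups": set(),
    "stg_orgs": set(),
    "int_users": {"stg_users", "stg_user_groups"},
    "dim_users": {"stg_orgs", "int_users"},
}


def upstream(model: str, parents: dict[str, set[str]]) -> set[str]:
    """Collect every model reachable by following parent edges."""
    seen: set[str] = set()
    stack = list(parents.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


print(sorted(PARENTS["int_users"]))
# ['stg_user_groups', 'stg_users']
print(sorted(upstream("dim_users", PARENTS)))
# ['int_users', 'stg_orgs', 'stg_user_groups', 'stg_users']
```

The second result lists the four models that dim_users is downstream from.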

Within 10 seconds of looking at this DAG, you can quickly unpack some of the most important elements about a project: dependencies and data lineage. Obviously, this is a simplified version of DAGs you may see in real life, but the practice of identifying relationships and data flows remains very much the same, regardless of the size of the DAG.

What happens if stg_user_groups just up and disappears one day? How would you know which models are potentially impacted by this change? Look at your DAG and understand model dependencies to mitigate downstream impacts.
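As a rough illustration of that impact analysis, the sketch below walks the same mini-DAG in the downstream direction. The hand-written `PARENTS` mapping and the `downstream` helper are assumptions made for this example, not dbt functionality.

```python
# Mini-DAG from the example: each model maps to its direct parents.
PARENTS = {
    "stg_users": set(),
    "stg_user_groups": set(),
    "stg_orgs": set(),
    "int_users": {"stg_users", "stg_user_groups"},
    "dim_users": {"stg_orgs", "int_users"},
}


def downstream(model: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every model that directly or indirectly depends on `model`."""
    # Invert the parent mapping so edges can be followed parent -> child.
    children: dict[str, set[str]] = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)

    seen: set[str] = set()
    stack = list(children.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


print(sorted(downstream("stg_user_groups", PARENTS)))
# ['dim_users', 'int_users']
```

Losing stg_user_groups would therefore affect int_users and, through it, dim_users, while stg_users and stg_orgs are untouched.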

Auditing projects

A potentially bold statement, but there is no such thing as a perfect DAG. DAGs are special in part because they are unique to your business, data, and data models. There’s usually room for improvement, whether that means making a CTE into its own view or performing a join earlier upstream, and your DAG can be an effective way to diagnose inefficient data models and relationships.

You can also use your DAG to help identify bottlenecks: long-running data models that severely impact the performance of your data pipeline. Bottlenecks can happen for multiple reasons:

  • Expensive joins
  • Extensive filtering or use of window functions
  • Complex logic stored in views
  • Good old large volumes of data

...to name just a few. Understanding the factors impacting model performance can help you decide on refactoring approaches, changing model materializations, replacing multiple joins with surrogate keys, or other methods.
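One way to reason about bottlenecks is to find the slowest chain of dependent models. The sketch below does this for the mini-DAG from earlier, using made-up run times. It assumes every model starts as soon as its parents finish, which ignores thread limits and warehouse contention in a real run; `RUN_SECONDS` and `critical_path` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each model maps to its direct parents (models without parents are implied).
PARENTS = {
    "int_users": {"stg_users", "stg_user_groups"},
    "dim_users": {"stg_orgs", "int_users"},
}

# Hypothetical run times in seconds, invented for illustration.
RUN_SECONDS = {
    "stg_users": 5,
    "stg_user_groups": 40,
    "stg_orgs": 3,
    "int_users": 12,
    "dim_users": 8,
}


def critical_path(parents, seconds):
    """Return the slowest dependency chain and its total run time."""
    finish = {}  # earliest finish time per model
    via = {}     # slowest parent per model, used to rebuild the chain
    for node in TopologicalSorter(parents).static_order():
        slowest = max(parents.get(node, ()), key=lambda p: finish[p], default=None)
        finish[node] = seconds[node] + (finish[slowest] if slowest else 0)
        via[node] = slowest

    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = via[node]
    return path[::-1], total


print(critical_path(PARENTS, RUN_SECONDS))
# (['stg_user_groups', 'int_users', 'dim_users'], 60)
```

With these invented numbers, stg_user_groups dominates the chain, so it would be the first model to inspect for expensive joins or a better materialization.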

A bad DAG, one that follows non-modular data modeling techniques

Modular data modeling best practices

See the DAG above? It follows a more traditional approach to data modeling where new data models are often built from raw sources instead of relying on intermediary and reusable data models. This type of project does not scale with team or data growth. As a result, analytics engineers tend to avoid DAGs that look like this.

Instead, there are some key elements that can help you create a more streamlined DAG and modular data models:

  • Leveraging staging, intermediate, and mart layers to create layers of distinction between sources and transformed data
  • Abstracting code that’s used across multiple models to its own model
  • Joining on surrogate keys versus on multiple values

These are only a few examples of some best practices to help you organize your data models, business logic, and DAG.
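To illustrate the surrogate key idea from the list above, here is a small Python sketch that hashes several column values into a single key. In a dbt project this logic would normally live in SQL; the function name, the `|` separator, the `<null>` placeholder, and the sample row are all assumptions made for this example.

```python
import hashlib


def surrogate_key(*values: object, sep: str = "|", null: str = "<null>") -> str:
    """Hash several column values into one deterministic key."""
    # A placeholder keeps NULL distinct from an empty string, and the
    # separator keeps ("a", "b") distinct from ("ab",).
    parts = [null if value is None else str(value) for value in values]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


# Hypothetical order line identified by two columns.
order_line = {"order_id": 1001, "line_number": 2}
key = surrogate_key(order_line["order_id"], order_line["line_number"])

print(len(key))  # 32 hexadecimal characters
print(key == surrogate_key(1001, 2))  # True: same inputs, same key
```

Downstream models can then join on the single key column instead of repeating a multi-column join condition.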

Is your DAG keeping up with best practices?

Instead of manually auditing your DAG for best practices, the dbt project evaluator package can help audit your project and find areas of improvement.

dbt and DAGs

The marketing team at dbt Labs would be upset with us if we told you we think dbt actually stood for “dag build tool,” but one of the key elements of dbt is its ability to generate documentation and infer relationships between models. And one of the hallmark features of dbt Docs is the Lineage Graph (DAG) of your dbt project.

Whether you’re using dbt Core or dbt Cloud, dbt Docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project.

The Lineage Graph in dbt Docs

The DAG is also available in the dbt Cloud IDE, so you and your team can refer to your lineage while you build your models.

Leverage exposures

One of the newer features of dbt is exposures, which let you define, inside your dbt project, the downstream uses of your data models that live outside of it. What does this mean? It means you can add key dashboards, machine learning or data science pipelines, reverse ETL syncs, or other downstream use cases to your dbt project’s DAG.

This level of interconnectivity can help boost data governance (who has access to and who owns this data) and transparency (which data sources and models affect your key reports).
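The governance benefit can be sketched in Python by treating exposures as extra nodes at the right-hand edge of the DAG. Everything here is hypothetical: the exposure names, the owners, and the `owners_to_notify` helper are invented for illustration and do not reflect how dbt stores exposures.

```python
# Mini-DAG from earlier: each model maps to its direct parents.
PARENTS = {
    "int_users": {"stg_users", "stg_user_groups"},
    "dim_users": {"stg_orgs", "int_users"},
}

# Hypothetical exposures with the models they read from and their owners.
EXPOSURES = {
    "weekly_users_dashboard": {"depends_on": {"dim_users"}, "owner": "analytics-team"},
    "org_sync": {"depends_on": {"stg_orgs"}, "owner": "ops-team"},
}


def upstream(model: str) -> set[str]:
    """Collect every model reachable by following parent edges."""
    seen: set[str] = set()
    stack = list(PARENTS.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS.get(node, ()))
    return seen


def owners_to_notify(changed_model: str) -> set[str]:
    """Return owners of exposures that read the changed model, directly or not."""
    owners = set()
    for exposure in EXPOSURES.values():
        for dependency in exposure["depends_on"]:
            if dependency == changed_model or changed_model in upstream(dependency):
                owners.add(exposure["owner"])
    return owners


print(sorted(owners_to_notify("stg_user_groups")))
# ['analytics-team']
print(sorted(owners_to_notify("stg_orgs")))
# ['analytics-team', 'ops-team']
```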

A directed acyclic graph (DAG) is a visual representation of your data models and their connections to each other. The key properties of a DAG are that its nodes (sources, models, and exposures) are directionally linked and don’t form closed loops. Overall, DAGs are an effective tool for understanding data lineage, dependencies, and areas of improvement in your data models.

Get started with dbt today to start building your own DAG!

Further reading

Ready to restructure (or create your first) DAG? Check out some of the resources below to better understand data modularity, data lineage, and how dbt helps bring it all together:

  • Data modeling techniques for more modularity
  • How we structure our dbt projects
  • How to audit your DAG
  • Refactoring legacy SQL to dbt

FAQs

What are the four generic tests that dbt ships with? ›

There are four generic data tests that are available out of the box, for everyone using dbt.
  • not_null ​ ...
  • unique ​ ...
  • accepted_values ​ ...
  • relationships ​ ...

What file type is used for specifying which generic tests to run by model and column? ›

In a .yml file, the test is specified within the model or the column context. A test can apply either to one specific column or to the whole model. For example, a test for a whole model could compare two columns with each other.

Does dbt build run snapshots? ›

Yes. The dbt build command will run models, test tests, snapshot snapshots, and seed seeds, in DAG order.

How to use dbt model? ›

A dbt model is a representation of a table or view in the data model. To write a model, we use a SQL SELECT statement. Here, we can use CTEs (Common Table Expressions) and apply transforms using SQL.

What is the difference between dbt run and dbt test? ›

dbt run — Runs the models you defined in your project. dbt build — Builds and tests your selected resources such as models, seeds, snapshots, and tests. dbt test — Executes the tests you defined for your project.

What is the difference between constraints and tests in dbt? ›

Constraints depend on platform-specific support, while tests are more flexible: you can test anything, as long as you can build a query for it. A failed constraint prevents the table from being materialized, whereas tests run after the model is already materialized.

What is the unique key test in dbt? ›

The unique test operates by examining each value in the targeted column(s) of a model (such as a table or view) to verify that no two rows have the same value in that column(s). This is essential for fields that are expected to uniquely identify each record, such as order IDs, customer IDs, or any composite keys.

What is the empty flag in dbt? ›

The --empty flag, available in dbt 1.8, allows dbt to perform schema-only dry runs without processing large datasets, which makes it useful for efficient CI/CD workflows.

What is the difference between dbt compile and dbt run? ›

dbt compile is similar to dbt run except that it doesn't materialize the model's compiled SQL into an existing table.

Does dbt build include seed? ›

Yes. dbt build loads seeds along with running models, snapshots, and tests. As the name implies, dbt seeds are a part of the dbt framework: including seeds in your dbt project allows you to load static data into your warehouse as part of your dbt build.

What programming language does dbt use? ›

SQL: Since dbt uses SQL as its core language to perform transformations, you must be proficient in using SQL SELECT statements.

What is a generic test in dbt? ›

Generic Tests: A generic dbt test is defined in a YAML file and references a macro that contains the SQL logic. This setup allows for greater flexibility and reuse. A dbt test macro typically contains a select statement that returns records that don't pass the test.

Article information

Author: Twana Towne Ret
