A BI Developer’s First Look at dbt Labs: Why I’m a Believer

As part of my journey to become a Microsoft Fabric Analytics Engineer, I recently completed the dbt Fundamentals course using VS Code. I am new to the tool, but I am really impressed: in my eight years of BI experience, I have seen many data quality and maintenance scenarios where dbt could have helped.

Version control

I used VS Code for the course, as it made it possible to practice with Git. Version control is so fundamental for enterprise-level data products that I won't spend much time detailing its advantages; at a minimum, it documents who changed the code and why.

Testing

This feature is close to my heart, as I worked on data quality reports for a number of years. When you start using a new source system, you may find it useful to build a data quality report before moving on to reports that serve a more specific goal. There are many basic data quality checks that should catch problems early. I frequently used the Sempy library to retroactively check for referential integrity violations, using a method described in this article. Integrating these checks directly into the transformation pipeline with dbt's test command could have prevented many of those issues from ever reaching the semantic model. The freshness check is also useful: it is better to discover that data is stale before the users flag it for you. Documenting testing scenarios and possible data quality issues should be independent of semantic models and part of a scoping process; you don't want to present analysis based on problematic data, or investigate issues the stakeholders are already aware of.
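As a sketch of how such checks translate into dbt, the YAML below defines uniqueness, not-null, and referential-integrity tests plus a source freshness rule. The model, source, and column names are hypothetical, invented purely for illustration:

```yaml
# models/schema.yml -- hypothetical model, source, and column names
version: 2

sources:
  - name: erp                      # assumed upstream source system
    loaded_at_field: _loaded_at    # timestamp column used for freshness
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: sales_orders

models:
  - name: fct_sales
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          # fails if a customer_id in fct_sales has no match in dim_customer,
          # i.e. a referential integrity violation
          - relationships:
              to: ref('dim_customer')
              field: customer_id
```

Running `dbt test` executes the column tests, and `dbt source freshness` evaluates the freshness rule against the `loaded_at_field` timestamp.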

Documentation and lineage

The ability to generate documentation and lineage directly from the code is a massive time-saver. This is incredibly relevant today, as preparing a data model for an AI connection with tools like Microsoft Copilot or a Fabric data agent requires clear, coherent descriptions of tables and fields. Collecting and maintaining this information in the development tool saves a lot of time compared to scattered documents or SharePoint pages. This isn't static documentation that quickly goes stale; it's a living resource for the entire team and the users. The automatically generated data lineage graph is especially powerful, making it easier to clean up unused database objects and to plan dependencies. If you can present the lineage to your audience or customers, it helps build trust in your analysis.
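For illustration, descriptions live in the same YAML files as the model definitions, so they stay close to the code; the names below are made up:

```yaml
# models/schema.yml -- hypothetical names for illustration
version: 2

models:
  - name: fct_sales
    description: "One row per order line, loaded daily from the ERP system."
    columns:
      - name: order_id
        description: "Surrogate key; unique per order line."
      - name: customer_id
        description: "Foreign key to dim_customer."
```

`dbt docs generate` compiles these descriptions, together with the lineage graph inferred from `ref()` and `source()` calls in the models, into a static site you can browse locally with `dbt docs serve`.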

Conclusion

dbt is just one tool in the modern data stack, but it is gaining popularity; dbt Labs just announced a merger with Fivetran. Even if your company handles these processes with other tools, the principles dbt implements (testing, version control, and documentation as code) are fundamental to building reliable data platforms.
