Over the past decade, innovation in the modern data stack has pushed the boundaries of what’s possible with data. At the bottom of the modern data stack, cloud data warehouses like Snowflake and Redshift have made it easier than ever to store and query large volumes of data. At the top of the modern data stack, visualization tools like Tableau and Looker have completely democratized self-service analytics, enabling business users to answer their questions with no code. Between these layers, dbt (data build tool) has quickly become an industry standard for data transformation in the modern data stack. This blog explains why we decided to build the Numbers Station Data Transformation Assistant on top of dbt, and how Numbers Station enables any data analyst to rapidly create powerful data transformation pipelines.
dbt is one of the fastest growing tools for data modeling and transformation. It serves as a collaborative tool that enables technical data analysts, now often called analytics engineers, to build well-governed data transformation pipelines. At its core, dbt brings the best software engineering practices into the modern data stack. With dbt, analytics engineers can collaborate on their transformation pipelines, document their code, manage dependencies, define metrics, test their models, implement version control, and orchestrate production runs. The result is a powerful tool that governs the data transformation process, enforcing software engineering best practices for analytics engineers who may not be software engineering experts. With dbt, users no longer need to worry about setting up and maintaining their own infrastructure, which can be a significant barrier to entry for smaller organizations or teams.
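To make "manage dependencies" concrete: dbt models refer to one another through the `ref()` function, and dbt uses those references to work out the order in which models must be built. The toy sketch below mimics that idea in plain Python. It is not dbt's implementation; the model names, the SQL snippets, and the `build_order` helper are hypothetical and exist only for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: model name -> SQL text that uses dbt-style ref() calls.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_customers') }} using (customer_id)"
    ),
    "revenue_by_region": (
        "select region, sum(amount) as revenue "
        "from {{ ref('customer_orders') }} group by region"
    ),
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models):
    """Return model names ordered so that every model follows its upstream models."""
    # Map each model to the set of models it references.
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_order(MODELS))
```

In this sketch, `revenue_by_region` always comes after `customer_orders`, which in turn comes after both staging models, without anyone writing the run order by hand.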
Unfortunately, technical skills (e.g. SQL, Python) are still a prerequisite for using dbt, which creates a high barrier to entry for less technical data and business analysts. Consequently, business stakeholders need to constantly work with analytics engineers to create new data views, which can lead to costly communication cycles before the data is prepared and ready for analysis. Even worse, these communication costs are higher for business questions that rely on statistical and machine learning-based transformations, as the data science team also needs to be looped in.
At Numbers Station, our mission is to close this skills gap using foundation model technology. Foundation models, because of their natural language interface, enable users with limited to no technical expertise to be part of the data transformation journey. Our team pioneered applying foundation models to data transformation tasks (see our research blog on this topic) and was the first to show that foundation models can clean data, reformat columns, fill in missing values, or match duplicate records. We firmly believe that the next level of accessibility for data transformation will be powered by foundation model technology, a vision we share with Tristan Handy, dbt Labs’ CEO, who told us:
“I have 100% confidence that foundation model powered AI will play a defining role in the evolution of the analytics engineering workflow over the coming years.” Tristan Handy, dbt Labs CEO.
Available now, the Numbers Station Data Transformation Assistant democratizes the data transformation process to all skill levels in the data and analytics space. In more detail, the platform offers two main types of transformations powered by dbt: SQL-based and AI-based transformations. For data and business analysts who are not comfortable expressing their ideas in code, Numbers Station's SQL Transformations offer a natural language interface that enables users to generate SQL code for mundane transformation tasks like joining or aggregating data. For advanced tasks such as extracting values from text, classifying data, and predicting sentiment, Numbers Station's AI transformations offer a natural language interface to intelligent foundation models over users’ data. By directly producing answers for the different transformation tasks, these foundation models unlock the power of AI for any data analyst.
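As a rough illustration of the "joining or aggregating" case, the sketch below pairs a plain-English request with the kind of SQL it could translate into, and runs that SQL against a tiny in-memory SQLite database. Everything here is an assumption made for illustration: the request, the table and column names, and the hand-written query. It is not actual Numbers Station output, and SQLite merely stands in for a cloud data warehouse.

```python
import sqlite3

# Hypothetical request an analyst might type in plain English.
REQUEST = "Total order amount per customer region"

# Hand-written SQL of the kind such a request could map to
# (illustrative only, not generated by Numbers Station).
GENERATED_SQL = """
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""


def run_example():
    """Create toy tables, run the join-and-aggregate query, and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'AMER'), (3, 'EMEA');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 5.0), (3, 3, 2.5), (4, 2, 1.0);
        """
    )
    rows = conn.execute(GENERATED_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(REQUEST)
    for region, total in run_example():
        print(region, total)
```

With the toy data above, the query returns one row per region: AMER with a total of 6.0 and EMEA with a total of 12.5.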
The combination of Numbers Station and dbt can accelerate data transformation and intelligence tasks by allowing data analysts to create data pipelines directly in natural language. This saves precious time for engineers, who can now focus on more important problems than writing GROUP BY queries or reformatting entries. Of course, there is always a tension between speed and accuracy. To ensure trustworthiness in their pipelines, Numbers Station's users have the option to export their pipelines as dbt projects and share them with engineers for verification and deployment into production.
As AI technology continues to advance, the flexibility and power to gain valuable insights from data only grow stronger. Numbers Station is bringing our cutting-edge foundation model technology to analytics workflows, empowering data and business analysts to accelerate their data-driven insights. To learn more, listen to our podcast with dbt on Spotify or sign up to start your free trial.