Introducing Numbers Station Labs

Today we are excited to announce Numbers Station Labs, a research lab dedicated to designing, building, and sharing the innovation needed to bring foundation model technology to the modern data stack. Research has always been at the core of Numbers Station: we started as a group of PhDs from the Stanford AI Lab, where the term foundation models (FMs) was coined. As such, we believe that continued dedication to research-based innovation is key to our impact on the modern data stack and the success of our customers. To introduce Numbers Station Labs, we’d first like to recap how we got here, then share our research agenda and mission and a summary of our latest contributions to the field.

How We Got Here

Artificial Intelligence (AI) is undergoing a revolution with the recent development of foundation models. These large pretrained models exhibit out-of-the-box capabilities (zero/few-shot learning) that eliminate the barriers to entry that traditional AI/ML solutions suffer from (e.g. model implementation, expensive data labeling, model training). This paradigm shift is drastically changing the way we build applications around AI by making it very easy for anyone to integrate AI capabilities into their products, and we are already seeing exciting generative AI applications for content creation (e.g. marketing blogs), customer support (e.g. chatbots) and workflow automation (e.g. email generation).

At Numbers Station, we see the opportunity to bring this technology into the modern data stack to remove its current high barrier to entry, opening new possibilities for enterprise data automation. We are particularly excited about this vision for three core reasons:

Data workers today are still in real pain: they spend most of their time on mundane tasks rather than producing valuable insights for their organizations.
Our founding team all earned PhDs in a mix of AI and data systems, including individuals who designed and shipped some of the most advanced AI systems in wide use today. This problem space is near and dear to our hearts.
Classic enterprise data tasks are often neglected by the top AI talent in favor of flashier applications, like image generation, and we were in a unique position to fill this talent gap.

We made these observations prior to starting Numbers Station and quickly started experimenting more with foundation models. We were blown away by our early results and the ability of these models to match or exceed previous state-of-the-art AI data systems (many of which we designed!). Through this effort we became the first set of researchers to show that foundation models could be used to replace legacy state-of-the-art systems on a variety of tough enterprise data tasks. We saw an immense opportunity to bring this technology into a product to democratize access to the modern data stack, and this is when we decided to start Numbers Station.

Our Mission and Research Agenda

Foundation models (FMs) exhibit amazing out-of-the-box capabilities which feel like magic. With a foundation model, prototyping AI capabilities requires no ML expertise and takes minutes. For example, anyone can prototype sentiment analysis workflows by simply providing a prompt like “assign a sentiment (positive or negative) to the following text” to a foundation model, and the model will “magically” understand the task. It is important, however, to note the word prototype here. During our early experiments with foundation models, we quickly realized that using FMs out-of-the-box was not enough to solve real enterprise production use cases. There are a host of last mile problems for deploying foundation models into the modern data stack that remain unsolved.
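
As a minimal sketch of the prototyping step described above, the snippet below wraps the sentiment instruction in a zero-shot prompt. The `complete` argument is a hypothetical stand-in for whatever text-completion call you use, and `build_sentiment_prompt`, `classify_sentiment`, and `stub_model` are illustrative names rather than part of any real API.

```python
from typing import Callable

INSTRUCTION = "Assign a sentiment (positive or negative) to the following text."


def build_sentiment_prompt(text: str) -> str:
    """Combine the task instruction and the input text into one prompt."""
    return f"{INSTRUCTION}\n\nText: {text}\nSentiment:"


def classify_sentiment(text: str, complete: Callable[[str], str]) -> str:
    """Send the prompt to a model and normalize its free-text answer."""
    answer = complete(build_sentiment_prompt(text)).strip().lower()
    # Generative output is not guaranteed to be one of the two labels,
    # so anything unexpected is surfaced instead of silently accepted.
    if answer.startswith("positive"):
        return "positive"
    if answer.startswith("negative"):
        return "negative"
    return "unknown"


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real foundation model call."""
    return " Positive" if "love" in prompt else " Negative"


if __name__ == "__main__":
    print(classify_sentiment("I love this product", stub_model))
```

The `unknown` branch is a small example of the gap between a prototype and production: a deterministic pipeline has to decide what to do when a probabilistic model answers outside the expected label set.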

At Numbers Station, our goal is to solve these last mile problems and push FMs across the finish line for enterprise data tasks. That involves answering a few questions which drive our research agenda.

Applications: How can we use foundation models in the modern data stack?

In this context, we are working with very large (typically hundreds of billions of parameters), generalist, frozen models.

Beyond the engineering work that needs to go into bringing foundation model technology into the modern data stack (e.g., building integrations with popular data warehouses, connectors with metadata and backend APIs, human-in-the-loop user interfaces, …), there is a fundamental research question: when does it make sense (or not) to use these models, and how should they be applied to various enterprise data tasks to create 0-to-1 capabilities for data workers? This is a challenging problem because the concept of foundation models is itself closely tied to unstructured data (e.g., text or images) with generative, probabilistic outcomes, while data applications typically involve structured data with procedural, deterministic outcomes.

Consequently, when we first introduced the idea of applying FM technology to the structured data world in our VLDB 2022 vision paper, it sounded like a crazy concept. With the release of ChatGPT and FM popularization, people quickly realized the immense opportunity of unlocking the power of these models on structured data. We’ve seen a massive wave of startups and organizations using GPT models to generate code and automate some structured data workflows. However, there is so much more that these models can do beyond generating code.

We already showed that they can impute missing entries in tables, find duplicate records, resolve siloed records, or normalize data entries. Beyond these data wrangling tasks, we believe these models can be used to automate the full stack of data pipelines: Could they generate reports and data visualizations? Create dashboards and data summaries? Inspect data quality and provide metrics or recommend actions? Could we use FMs to close the (unnecessary) gap between data science, data analysis, and data engineering? Given the wealth of possibilities, this first line of research is focused on understanding possible applications and exploring the limits of what’s possible with FMs.
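
One way to bridge structured rows and text-based models is to serialize each record into a string and wrap it in a task question. The sketch below does this for duplicate detection. The "attribute: value" serialization, the function names, and the sample records are assumptions for illustration and not necessarily the exact format used in the VLDB 2022 paper.

```python
from typing import Mapping, Optional

Row = Mapping[str, Optional[str]]


def serialize_row(row: Row) -> str:
    """Flatten a record into 'attribute: value' text, skipping missing values."""
    return ". ".join(f"{key}: {value}" for key, value in row.items() if value is not None)


def build_match_prompt(record_a: Row, record_b: Row) -> str:
    """Phrase duplicate detection as a yes/no question over two serialized rows."""
    return (
        f"Record A is {serialize_row(record_a)}.\n"
        f"Record B is {serialize_row(record_b)}.\n"
        "Are Record A and Record B the same entity? Yes or No:"
    )


if __name__ == "__main__":
    # Hypothetical records from two siloed systems.
    crm_row = {"name": "Acme Corp", "city": None, "phone": "555-0100"}
    billing_row = {"name": "ACME Corporation", "city": "Austin", "phone": "555-0100"}
    print(build_match_prompt(crm_row, billing_row))
```

The same serialization could feed other wrangling prompts, for example asking the model to fill in the missing `city` value instead of answering yes or no.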

Algorithms: How can we make the models better for enterprise data tasks?

In this context, we are working with large models (typically tens of billions of parameters) that we specialize for data tasks using continual pretraining and finetuning.

As we continue to explore and expand the set of tasks where FMs can replace legacy technology (first research question), our second research driver is focused on developing algorithms to make existing FMs better at these tasks. The trend today is for most people to rely on a single large foundation model hosted by a third party. This limits the ability of intelligent AI systems to understand enterprises’ unique needs, workflows, and processes. As a result, it’s easy to build a cool parlor trick demo, but this rarely leads to production problems being solved.

At Numbers Station, we focus on data tasks that occur in data-intensive workflows, and this is only a small subset of what general purpose FMs can do. Our goal is therefore to understand whether we can make existing generalist FMs better at these tasks by specializing and personalizing them to the enterprise’s unique needs. For instance, if an enterprise is using FMs to resolve duplicates in different data systems, do they need an FM that can generate poetry or cooking recipes? Likely not, but it might be useful to teach FMs what the different customer profiles are, what the data schemas are, as well as any other useful organizational knowledge.

We believe there is a huge opportunity to refocus open source pretrained models on data tasks and make them better at understanding private organizational data via personalization. Not only can this improve FMs’ out-of-the-box capabilities at these tasks, but it can also make them better over time as users interact with them. Further, by focusing our efforts on specializing open source models, we have greater control over how the models are improved and can maintain enterprise privacy needs. The FM never needs to leave the enterprise’s secure ecosystem. We are continuously pushing the boundaries of how to personalize FMs and what capabilities can be enhanced by leveraging techniques like continual pretraining over organizational data, finetuning for data tasks specifically, and personalization using user feedback and interactions captured in logs.

Systems: How can we deploy FMs into production data pipelines?

In this context, we are considering much smaller foundation models (hundreds of millions of parameters) that we finetune for user-specified tasks (one model per task) to enable large scale deployment into production pipelines.

Foundation models typically need to be very large in size (tens or hundreds of billions of parameters) to have that magic feel and perform tasks out-of-the-box. This generality is essential to fix cold start problems and enable quick prototyping of varied AI capabilities. The larger size, however, comes at the cost of more expensive system requirements (e.g., model hosting) and slower inference. For applications where foundation models are used with a human-in-the-loop (e.g. generating code, writing content, etc.), this expense might not be an issue since requests are limited by the speed of humans interacting with the model. However, for enterprise data tasks that cannot be solved using code generation (e.g. resolving duplicate records, imputing missing data values, categorizing text entries, etc.), it is the FM itself that generates the answers. For real-world data use cases, models this size are impractical if not impossible to run on large-scale data problems (millions or billions of records): doing so would be both extremely expensive and extremely slow.
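
The scaling argument above is simple arithmetic, sketched below as a back-of-the-envelope estimator. All four inputs (tokens per row, price per 1,000 tokens, rows per second, and the row count) are made-up placeholder values chosen only to show the calculation; they are not measurements or quoted prices for any real model.

```python
from typing import Tuple


def estimate_batch_run(
    num_rows: int,
    tokens_per_row: int,
    price_per_1k_tokens: float,
    rows_per_second: float,
) -> Tuple[float, float]:
    """Return (cost in dollars, wall-clock hours) for one pass over a table."""
    total_tokens = num_rows * tokens_per_row
    cost_dollars = total_tokens / 1000 * price_per_1k_tokens
    hours = num_rows / rows_per_second / 3600
    return cost_dollars, hours


if __name__ == "__main__":
    # Placeholder inputs, not real pricing or throughput figures.
    cost, hours = estimate_batch_run(
        num_rows=1_000_000,
        tokens_per_row=500,
        price_per_1k_tokens=0.02,
        rows_per_second=2.0,
    )
    print(f"estimated cost: ${cost:,.0f}, estimated time: {hours:,.0f} hours")
```

Both outputs grow linearly with the number of rows, which is why a cost that is negligible for a human-in-the-loop session becomes prohibitive when the model has to answer once per record.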

At Numbers Station, we are investing research efforts into building a seamless transition from prototyping with very large models to production deployment with much smaller, faster models without sacrificing performance. Achieving the highest quality deployed model comes with challenges around how to best leverage the user interaction with the larger model and how to select and finetune a smaller model from the diverse pool of publicly available models. Some of the techniques we use include intelligent data sampling for prototyping, active learning to select examples for user feedback, and weak labeling of training data for model distillation.
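
To make the weak labeling idea concrete, here is a toy sketch of the prototype-to-production handoff: a large "teacher" model labels raw text, and a much smaller "student" is trained on those labels. `stub_teacher` is a hypothetical stand-in for a large foundation model, and `TinyStudent` is a deliberately simple word-count classifier standing in for a small finetuned model; neither reflects the actual models or training procedure used at Numbers Station.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def weak_label(texts: Iterable[str], teacher: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Label raw texts with the teacher model; no human labels are required."""
    return [(text, teacher(text)) for text in texts]


class TinyStudent:
    """Toy word-count classifier standing in for a small finetuned model."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, labeled: Iterable[Tuple[str, str]]) -> "TinyStudent":
        # Count how often each word appears under each weak label.
        for text, label in labeled:
            self.word_counts[label].update(text.lower().split())
        return self

    def predict(self, text: str) -> str:
        # Score each label by the training counts of the words in the input.
        words = text.lower().split()
        scores = {
            label: sum(counts[word] for word in words)
            for label, counts in self.word_counts.items()
        }
        return max(scores, key=scores.get)


def stub_teacher(text: str) -> str:
    """Hypothetical stand-in for a large foundation model categorizing text."""
    lowered = text.lower()
    return "billing" if "invoice" in lowered or "refund" in lowered else "technical"


if __name__ == "__main__":
    unlabeled = [
        "invoice was charged twice",
        "refund my invoice please",
        "app crashes on login",
        "login page shows error",
    ]
    student = TinyStudent().fit(weak_label(unlabeled, stub_teacher))
    print(student.predict("please refund the charged amount"))
```

Once trained, the student answers without calling the teacher at all, which is what makes running over millions of records affordable. Active learning would slot in before `fit`, choosing which weakly labeled examples to show a user for correction.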

Our Findings to Date

The questions above define our research agenda and the key technical innovations that power, and will continue to power, the Numbers Station Data Intelligence Suite (some are still works in progress!). In addition to building and releasing the first version of our product, we have core technical innovations we are excited to share back with the community today.

Applications

Can FMs Wrangle your Data? [Blog] [VLDB 2022 Paper]

Decades of research have gone into building systems to help analysts clean and wrangle their data. We simply asked: could a single foundation model do these tasks instead? This was our pioneering work showing how to get state-of-the-art performance on complex wrangling tasks with foundation models.

Algorithms

Ask Me Anything Prompting Method [Blog] [ICLR 2023 Paper]

Foundation models are notoriously finicky to prompt, and prompts often only work for a single foundation model. In this work, we developed a general prompting method that closed the performance gap between a variety of open source 6B parameter models and closed source models 30x larger across 15 benchmark tasks.

Personalizing SQL Coding Assistants [Blog]

Data analysts spend a majority of their time wrangling data with SQL, but existing foundation models for code are not personalized to the data and infrastructure of the analysts. We developed a framework to customize small, open source foundation models to the enterprise’s SQL workloads and data. Our framework can be 2000 times cheaper than alternative customization methods and generates models that surpass GPT-4 in performance on SQL benchmarks.

Systems

From Prototyping to Production Deployment with FMs [Blog]

Relying on a single large closed-source foundation model presents prohibitive security problems to enterprises and can be expensive to run over enterprise data. Running a foundation model over 1M rows of data can cost thousands of dollars! We show how to leverage user feedback to customize an 800x smaller, open source foundation model that can be deployed by the enterprise 2000x more cheaply without sacrificing performance compared to the larger model.

This is just the start of an ambitious technical innovation agenda for the team here at Numbers Station. We invite you to join us in our commitment to design new foundation model powered technology that democratizes access to the modern data stack.