October 21, 2022
Written by Ines Chami

Numbers Station: Intelligence for the modern data stack

Numbers Station is building foundation model-powered technology to bring automation to data-intensive enterprise workflows, so data workers spend less time on mundane data tasks and more time generating insights.

Efforts towards building Artificial Intelligence (AI) systems have always been about automating time-consuming tasks to let humans spend time on the things that matter most. In the past decade, we have seen amazing progress towards this goal with the development of AI systems that can automatically recognize objects, understand sentiments, classify images or translate from one language to another. However, these systems were built using an outdated machine learning paradigm in which users must define a task, label large amounts of data for that task, and then train a model specific to that task.

Foundation models are revolutionizing AI by drastically changing this paradigm, opening new avenues for task automation. A single foundation model like GPT-3 or OPT can serve as the foundation for many tasks with no task-specific data labeling, model implementation or model training. Because of this, we are starting to see task-specific models that took months or years to build being replaced by these general-purpose foundation models. Concretely, these models are pre-trained on vast amounts of text data and exhibit emergent capabilities at run-time that allow users to customize them for multiple tasks using only natural language instructions (or prompts). For instance, a prompt for an extraction task could be: “extract all the names mentioned in the following passage.” These new advances bring us one step closer to achieving Artificial General Intelligence (AGI), i.e. intelligent agents that can understand or learn any task that a human being can, and there are amazing efforts in both academia and industry actively pursuing that goal (e.g. OpenAI, AdeptAI, InflectionAI, AI21).
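To make the idea of prompting concrete, here is a minimal Python sketch of how one model could be pointed at several tasks by changing only the natural language instruction. Everything in it is an assumption made for illustration: the task names, the instruction wording (apart from the extraction prompt quoted above), and the `build_prompt` helper are hypothetical, and no real model or API is called.

```python
# Illustrative sketch only: task names, instruction wording and the prompt
# layout are assumptions, not the format of any particular model or product.
TASK_INSTRUCTIONS = {
    "extract_names": "Extract all the names mentioned in the following passage.",
    "sentiment": "Classify the sentiment of the following passage as positive or negative.",
    "translate_fr": "Translate the following passage into French.",
}


def build_prompt(task: str, passage: str) -> str:
    """Combine a natural language instruction with the input passage."""
    instruction = TASK_INSTRUCTIONS[task]
    return f"{instruction}\n\nPassage: {passage}\n\nAnswer:"


if __name__ == "__main__":
    passage = "Ada Lovelace wrote to Charles Babbage about the Analytical Engine."
    # The same passage and the same (hypothetical) model would be reused;
    # only the instruction at the top of the prompt changes per task.
    for task in TASK_INSTRUCTIONS:
        print(f"--- {task} ---")
        print(build_prompt(task, passage))
```

The point of the sketch is that switching tasks means editing a sentence of text rather than labeling data or training a new model. The resulting prompt string would then be sent to whichever foundation model is in use.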

Our focus at Numbers Station is on automating enterprise data tasks, i.e. any task involved in the management, processing and analysis of large, messy and siloed enterprise datasets.

We decided to focus on these enterprise data tasks because there is a huge need for automation in this space: data workers spend up to 80% of their time on mundane data tasks because today’s tools are decades old, error-prone, and brittle. Even worse, the technology available to automate the different stages of the enterprise data journey, from data preparation, to insight generation, and eventually downstream actions, is sparse and disjointed. Automating all of these data tasks would empower data workers to spend their time delivering value rather than on unpleasant and time-consuming data tasks.

There is a good reason that there has been no solution. For the last few decades, this problem was technically unsolvable. Unlike simple data tasks (e.g. a count over a table) that can be automated with rule-based solutions, many enterprise data tasks are complex in nature and vary drastically from one project to the next, making it hard to use existing templated solutions. Intelligent solutions can be implemented in some cases, but they take a long time to deliver value because they require a high level of technical knowledge.

At Numbers Station, we are building foundation model-powered technology that removes the barrier to entry that existing solutions face, opening new possibilities for enterprise data automation. Our team spent years in the Stanford AI lab building data task automation technology and pioneered (paper) the use of foundation models on these tasks. Our foundation models act like a trusted AI assistant that you can collaborate with in natural language to offload repetitive data tasks and be more productive at work. Their intuitive interface empowers anyone, not only technical experts, to customize and implement data task automation in their workflows. Using Numbers Station, enterprise data tasks that would typically require weeks or months to automate with traditional systems can now be automated in minutes, just by telling the model in natural language what you want it to do.

There are of course many technical details involved in making this technology work on enterprise use cases, such as improving quality on domain-specific data (e.g. legal, finance, …), making these models compute-efficient enough to use at scale, and integrating the foundation models’ natural language interface with enterprise data software tools (e.g. Snowflake, Tableau, Databricks, …) so they can take actions. We will elaborate on these challenges and how we solve them in a subsequent blog post.

We are excited to release a trial version of Numbers Station’s platform. Join our waitlist to give it a try: all you need to do is upload some data and tell our platform what you would like to automate, and it will take it from there.