“By 2024 60% of the data used to develop AI and analytics projects will be generated synthetically.” This is a Gartner prediction that you’ll find in almost every article, deck, or press release related to synthetic data.
We repeat this quote here despite its ubiquity, because it says a lot about the overall addressable synthetic data market.
Let’s unpack it. First, describing synthetic data as data that is “synthetically generated” sounds tautological, but the meaning is clear: we’re talking about data that is artificial/fake and created, rather than collected in the real world.
Then there’s the gist of the prediction: that synthetic data will be used in the development of most AI and analytics projects. Since such projects are on the rise, the corollary is that the synthetic data market will grow as well.
Last but not least is the time horizon. In our startup world, 2024 is almost today, and folks at Gartner already have a longer-term forecast: some of the team published a piece of research titled “Forget About Your Real Data – Synthetic Data Is the Future of AI.”
“The future of AI” is the kind of promise investors like to hear, so it’s no surprise that checks have been pouring into synthetic data startups.
In 2022 alone, MOSTLY AI raised a $25 million Series B round led by Molten Ventures; Datagen landed a $50 million Series B led by Scale Venture Partners; and Synthesis AI raised a $17 million Series A.
Synthetic data startups that have raised significant funding already serve a wide variety of industries, from banking and healthcare to transportation and retail. But they expect use cases to keep expanding, both in new industries and in those where synthetic data is already commonplace.
To understand what’s happening, as well as what’s going to happen once synthetic data is more widely adopted, we spoke with several CEOs and VCs over the past few months. We learned about the two main categories of synthetic data companies, which sectors they appeal to, how to size the market and more.
The tip of the iceberg
Quiet Capital’s founding partner, Astasia Myers, is among the investors who are optimistic about synthetic data and its uses. She declined to say whether she has invested in this space, but said that “there is a lot to be excited about in the world of synthetic data.”
Why the enthusiasm? “Because it gives teams faster access to data in a secure way at a lower cost,” she told Marketingwithanoy.
“We can simply say that the TAM of synthetic data and the TAM of data will converge.” Ophir Zuk (Chakon)
Access to large amounts of data has become critical for machine learning teams, and real data is often not up to the task for a variety of reasons. This is the gap that synthetic data startups hope to fill.
There are two main contexts these startups are targeting: structured data and unstructured data. The former refers to the kind of data sets contained in tables and spreadsheets, while the latter refers to what we might call media files, such as audio, text, and visual data.
“It makes sense to distinguish between structured and unstructured synthetic data companies,” Myers said, “because the synthetic data type is applied to different use cases and thus different buyers.”