Big Data in Open Innovation

Useful After All If Harnessed Appropriately

As leaders in Open Innovation, we at yet2 strive to innovate ourselves. Using Big Data in our own scouting activities has been an investment we’ve been making over the past few years. Though early returns were disappointing, we’ve recently been finding some very interesting results, and we will continue to refine our capabilities. To help make this intangible concept feel a little more real, below we share three examples of how we at yet2 leverage Big Data in our scouting:

Starting with unique, quality datasets: avoid “garbage in, garbage out.”

We believe the most important step to get right in any Big Data strategy is having access to high-quality, unique datasets. A handful of now-standard innovation datasets are accessed by many companies (clinical trials, patents, VC funding, publications, etc.). At yet2 we are building additional datasets, such as contract manufacturing organization (CMO) databases, active pharmaceutical ingredients, chemical compounds, and yet2’s own proprietary global database, which now numbers over 40,000 startups and SMEs. Most importantly, we’re finding particular value in building the capability to access unique, topic-specific datasets across different geographies. For example, we’ve curated datasets of promising CMOs in China, which became a critical scouting source for several recent projects for consumer healthcare clients.

Making sense of the noise

While some datasets are higher quality than others, you will inevitably come across data sources that are inaccurate, incomplete, or, in the worst case, filled with bad data. Rather than rejecting those sources, we’re finding value in cleaning the data upfront. This includes normalizing it, processing it (for example, merging different datasets on a unique identifier to create a single, more robust dataset), and using scripts to identify and remove irrelevant data points. At yet2 we perform an initial cleaning of each dataset, but we treat cleaning as an iterative process and repeat it to keep our data up to date.
Once datasets are clean, we use custom scripts to filter through these large datasets and extract promising leads efficiently. yet2’s team builds these scripts for each project, dataset, and topic area. This step makes the data more manageable for our team of OI consultants, cutting datasets of as many as 10,000 data points down to a few hundred of the most relevant ones for further human review and filtering.
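
The cleaning steps described above (normalization, merging on a unique identifier, and script-based removal of irrelevant records) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not yet2’s actual tooling. Records are plain dictionaries, the shared identifier is a hypothetical `registry_id` field, and the legal-suffix list, exclusion terms, and sample records are made up for the example.

```python
import re
import unicodedata

# Hypothetical list of legal-entity suffixes to strip during normalization.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "co", "corp"}


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and drop legal suffixes."""
    ascii_text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_text.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def merge_on_id(primary, secondary, key="registry_id"):
    """Merge two record lists on a shared unique identifier.

    Values from `primary` win; `secondary` only fills fields that are
    missing or empty. Records without an identifier are dropped.
    """
    merged = {rec[key]: dict(rec) for rec in primary if rec.get(key)}
    for rec in secondary:
        rec_id = rec.get(key)
        if not rec_id:
            continue
        target = merged.setdefault(rec_id, {})
        for field, value in rec.items():
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


def remove_irrelevant(records, exclude_terms):
    """Drop records with no name or whose description mentions an excluded term."""
    kept = []
    for rec in records:
        description = (rec.get("description") or "").lower()
        if not rec.get("name"):
            continue  # incomplete record
        if any(term in description for term in exclude_terms):
            continue  # flagged as irrelevant
        kept.append(rec)
    return kept


def clean(primary, secondary, exclude_terms):
    """Merge, normalize names, then filter out irrelevant records."""
    merged = merge_on_id(primary, secondary)
    for rec in merged:
        rec["name"] = normalize_name(rec.get("name") or "")
    return remove_irrelevant(merged, exclude_terms)


# Made-up sample records from two hypothetical sources.
primary = [
    {"registry_id": "A1", "name": "Nutra Labs, Inc.", "description": ""},
    {"registry_id": "B2", "name": "Café Ferments GmbH", "description": "Fermentation-derived peptides"},
]
secondary = [
    {"registry_id": "A1", "description": "Plant protein taste modulation", "country": "CN"},
    {"registry_id": "C3", "name": "Old Domain Holdings", "description": "Domain name reseller"},
]
result = clean(primary, secondary, exclude_terms={"reseller"})
for rec in result:
    print(rec)
```

Letting the primary source win on conflicts is one simple design choice. A production pipeline might instead record which source each value came from, or flag conflicts for human review. Rerunning the same function on refreshed inputs is one way to make the cleaning iterative.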
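
A filtering script of this kind might, for example, score each record by weighted keyword matches and keep only the top few hundred. The sketch below is hypothetical and is not one of yet2’s scripts. It assumes each record has a `description` field and uses hand-picked term weights. The function names, the weights, and the default cutoff of 300 are assumptions, with the cutoff chosen only to mirror the "few hundred" figure above.

```python
import heapq
import re


def score(text, weighted_terms):
    """Sum the weights of the search terms found in `text` (case-insensitive, whole words)."""
    lowered = text.lower()
    total = 0.0
    for term, weight in weighted_terms.items():
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            total += weight
    return total


def shortlist(records, weighted_terms, top_n=300, min_score=1.0):
    """Return up to `top_n` (score, record) pairs, highest score first."""
    scored = [
        (score(rec.get("description") or "", weighted_terms), rec)
        for rec in records
    ]
    candidates = [pair for pair in scored if pair[0] >= min_score]
    # key= avoids comparing the record dicts when two scores tie.
    return heapq.nlargest(top_n, candidates, key=lambda pair: pair[0])


# Hypothetical weights for the search terms of one project.
terms = {
    "taste modulation": 2.0,
    "plant protein": 1.5,
    "peptides": 1.0,
    "small molecule": 1.0,
    "taste enhancer": 1.5,
}
records = [
    {"name": "A", "description": "Peptides for taste modulation in plant protein drinks"},
    {"name": "B", "description": "Industrial solvent supplier"},
    {"name": "C", "description": "Small molecule taste enhancer"},
]
top = shortlist(records, terms, top_n=2)
for s, rec in top:
    print(s, rec["name"])
```

Keyword scoring is only one possible ranking method. Its advantage is transparency: a consultant reviewing the shortlist can see exactly why each record scored as it did.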

Leaving no stone unturned

Once we have an initial dataset, we apply different analytical tools to better understand the technology landscape it represents. We analyze the dataset to identify classes of technologies, as well as similar terms or synonyms that can then be fed into additional scouting or filtering exercises.
For example, in one project we analyzed the reference list of a promising technology publication to find additional technology categories to search. Our initial source used terms such as “small molecule,” “taste modulation,” and “plant protein.” After we ran a keyword analysis across 20+ of those reference articles, our term list expanded to include terms such as “taste perception,” “extracts,” “peptides,” “taste enhancer,” and more. We then fed these terms back into our scripts, iterating to extract even more relevant leads from the dataset.
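
One simple way to run this kind of keyword analysis is to count which words and two-word phrases recur across several reference texts, then propose the recurring ones as new search terms. The sketch below is an assumption, not the analysis yet2 ran. It uses a small illustrative stopword list, counts document frequency (how many texts contain a term), and is applied to three made-up sentences rather than real reference articles. The function names and thresholds are hypothetical.

```python
import re
from collections import Counter

# A deliberately small stopword list for illustration; a real one would be larger.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "with",
}


def candidate_terms(text):
    """Return the set of unigrams and bigrams in `text` that contain no stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = set()
    for i, word in enumerate(words):
        if word not in STOPWORDS and len(word) > 2:
            terms.add(word)
        if i + 1 < len(words):
            nxt = words[i + 1]
            if word not in STOPWORDS and nxt not in STOPWORDS:
                terms.add(f"{word} {nxt}")
    return terms


def expand_terms(seed_terms, reference_texts, min_docs=2, top_k=10):
    """Suggest new search terms that recur across reference texts.

    Each term is counted once per document, so a term repeated inside one
    article does not dominate. Seed phrases and their component words are
    excluded from the suggestions.
    """
    seeds = {t.lower() for t in seed_terms}
    seed_words = {w for t in seeds for w in t.split()}
    doc_freq = Counter()
    for text in reference_texts:
        doc_freq.update(candidate_terms(text))
    ranked = sorted(
        (
            (term, n)
            for term, n in doc_freq.items()
            if n >= min_docs and term not in seeds and term not in seed_words
        ),
        key=lambda pair: (-pair[1], pair[0]),  # most frequent first, then alphabetical
    )
    return [term for term, _ in ranked[:top_k]]


seeds = ["small molecule", "taste modulation", "plant protein"]
# Toy sentences standing in for reference articles (illustrative only).
references = [
    "Bitter taste perception was reduced by peptides from plant protein extracts.",
    "A taste enhancer derived from yeast extracts improved taste perception.",
    "Peptides act as a taste enhancer in low sodium foods.",
]
new_terms = expand_terms(seeds, references)
print(new_terms)
```

The suggestions still need human review before they are fed back into a filtering script, because single words such as "enhancer" are usually too broad on their own. Each pass can then be repeated with the expanded term list as the new seeds.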


We’ve only scratched the surface in developing our Big Data capabilities. Even so, we are already seeing the value across our scouting projects. Having shifted from skeptics to qualified believers, we’re excited to keep developing this differentiating capability so that we can continue bringing promising technologies to our clients.

Contact us to learn more about Big Data or discuss your current projects.