Dark data in chemistry R&D: Strategies for success

June 30, 2023

What is dark data?

Innovative chemistry companies hold an abundance of untapped data, commonly called "dark data." By implementing effective knowledge management strategies, these companies can turn that data into new insights, accelerate innovation, and improve their chances of breakthrough discoveries.

Dark data is typically unstructured or semi-structured data that is not easily searchable or accessible. It is estimated that 55 percent of data stored by organizations is dark data. Yet around 90 percent of global business and IT executives and managers agree that every organization will need to extract value from this unstructured data to be successful in the future.

In the context of diversified chemistry R&D, this could include data from lab notebooks, laboratory information management systems (LIMS), experimental reports, literature references, and other sources that have not been incorporated into searchable databases. This data can be valuable for identifying novel materials, improving existing formulations, and reducing R&D cycle times.

To unlock the value of dark data, diversified chemistry organizations need to identify where their most valuable data is hiding and implement effective knowledge management strategies that enable them to access, collect, organize, and analyze this data as needed.

Uncovering the hidden gems: Identifying the most valuable chemical R&D data

Dark data could be hiding throughout the chemistry R&D workflow. From early-stage research to manufacturing, formulations, characterization, and even post-market surveillance, valuable data is generated and collected, but it may not be utilized to its full potential. To unlock the value of dark data and accelerate innovation, it's crucial for R&D organizations to identify where this data is hiding and develop strategies to access and utilize it effectively.

Several types of dark data are valuable for research. For example, historical experimental data is often scattered, incomplete, or unstructured, but it can provide valuable insights into current and future projects with some organization and analysis. Looking beyond the organization's own R&D efforts, external data sources like academic papers, patents, and industry reports can also offer valuable insights and identify new opportunities for innovation and research. Lastly, unstructured data, like text data from scientific articles or laboratory notes, can hold hidden insights but requires the right tools and techniques to analyze effectively.

Organizations can identify and access this hidden data with the following steps, adapted as their workflows require:

Conducting a thorough inventory of available data sources, both internal and external, structured and unstructured, is key.
Prioritizing data sources based on their potential value to current and future R&D efforts can help organizations make the most of their resources. For instance, if you are planning to scale up a newly validated functional material, you may want to prioritize access to historical formulations and manufacturing data to help predict the ideal conditions.
Fostering a culture of data-driven decision-making and continuous improvement can help innovative chemistry organizations realize the full potential of dark data.
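
As a rough illustration of the inventory and prioritization steps above, the sketch below lists a few data sources and ranks them. The source names, the 1 to 5 scores, and the scoring rule are all hypothetical assumptions made for this example; an organization would substitute its own criteria.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    internal: bool       # True if the source belongs to the organization
    structured: bool     # True if the data is already in a structured form
    relevance: int       # 1 (low) to 5 (high) value to current or planned R&D
    accessibility: int   # 1 (hard to reach) to 5 (already searchable)


def priority(source: DataSource) -> int:
    """Hypothetical rule: favor sources that are valuable but hard to reach."""
    return source.relevance * (6 - source.accessibility)


# Illustrative inventory; the entries and scores are placeholders, not real data.
inventory = [
    DataSource("LIMS batch records", True, True, relevance=4, accessibility=3),
    DataSource("Patent literature", False, False, relevance=3, accessibility=4),
    DataSource("Paper lab notebooks", True, False, relevance=5, accessibility=1),
]

ranked = sorted(inventory, key=priority, reverse=True)
for source in ranked:
    print(f"{priority(source):>2}  {source.name}")
```

With these placeholder scores, the lab notebooks rank first because they are rated as highly relevant and the least accessible. Changing the scoring rule changes the ranking, which is the point of making the rule explicit.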

Four critical knowledge management strategies for unlocking dark data

Custom-curated datasets, semantic frameworks, automated data mining, and collaborative workflows are critical knowledge management strategies for unlocking the value of dark data and driving innovation. Here's a closer look at how these strategies can help:

Custom curation
Custom curation is the manual curation of chemical data by domain experts to create high-quality datasets that are specific to the needs of the organization. Using custom curation, scientists in functional materials, cosmetics, agriculture, or other diversified chemistry fields can ensure that the data they are working with is accurate, up-to-date, and relevant to their research goals. By working with expert data curators, organizations can also connect information internally and to the world’s science, making their internal data more robust. You can take this even further to empower AI-based digital transformation initiatives by getting custom-curated datasets specially designed for machine learning models.

Semantic frameworks
Semantic frameworks are standardized approaches for organizing and classifying concepts and relationships in a specific domain, such as functional materials. These frameworks may include elements of specialized lexicons, ontologies, and taxonomies, and are designed to provide a common language and understanding of chemical data across an organization. This approach can help speed up R&D and enable scientists to make more informed decisions.

For example, suppose a researcher is trying to identify a novel material for use in a new electronic device. They might start by using specialized lexicons, ontologies, and taxonomies to categorize and organize the properties and characteristics of known materials. They could use a specialized taxonomy to categorize materials by their electrical conductivity, optical properties, or thermal stability. By organizing materials in this way, the researcher can more easily identify gaps in knowledge or areas where new materials might be needed. They could also use ontologies to define the relationships between different properties of materials, such as the relationship between a material's structure and its electronic properties. This can help the researcher make more informed decisions about which materials to investigate further.
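
The example above can be sketched in a few lines of code. This is a minimal illustration only: the material names, category labels, and relationship triples are hypothetical placeholders, and a production semantic framework would typically rely on dedicated ontology tooling rather than plain dictionaries.

```python
# Hypothetical taxonomy: the allowed categories along two axes.
taxonomy = {
    "conductivity": ["conductor", "semiconductor", "insulator"],
    "thermal_stability": ["high", "low"],
}

# Placeholder materials, each filed under one category per axis.
materials = {
    "Material A": {"conductivity": "conductor", "thermal_stability": "high"},
    "Material B": {"conductivity": "semiconductor", "thermal_stability": "low"},
    "Material C": {"conductivity": "insulator", "thermal_stability": "high"},
}

# Ontology-style (subject, relation, object) triples linking concepts.
relations = [
    ("structure", "influences", "conductivity"),
    ("structure", "influences", "optical_properties"),
]


def find_gaps(axis_a: str, axis_b: str) -> list[tuple[str, str]]:
    """Return category combinations that no known material covers."""
    covered = {(m[axis_a], m[axis_b]) for m in materials.values()}
    return [
        (a, b)
        for a in taxonomy[axis_a]
        for b in taxonomy[axis_b]
        if (a, b) not in covered
    ]


def influenced_by(concept: str) -> list[str]:
    """Return the concepts recorded as influencing the given concept."""
    return [s for s, rel, o in relations if rel == "influences" and o == concept]


gaps = find_gaps("conductivity", "thermal_stability")
print("Uncovered combinations:", gaps)
print("Conductivity is influenced by:", influenced_by("conductivity"))
```

Here the uncovered combinations stand in for the "gaps in knowledge" a researcher might investigate, and the triples show how a relationship between structure and electronic properties could be recorded and queried.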

Automated data mining
Automated data mining techniques enable R&D organizations to uncover hidden patterns and insights in large volumes of unstructured chemical data. Machine learning and advanced analytics can analyze chemical data from previous experiments, manufacturing conditions, scientific papers, patents, and other sources to identify relationships between chemicals, reactions, and formulations. These findings can reveal new opportunities for R&D and deepen understanding of existing products and processes.

For example, a researcher could scan thousands of articles related to their research area and extract key information such as material properties, synthesis methods, and performance metrics. Once this information is extracted, the researcher can use machine learning algorithms to analyze the data and identify patterns or correlations that could lead to the discovery of a novel material. The researcher may discover that certain synthesis methods or scale-up conditions consistently produce materials with desirable properties or that materials with certain structural characteristics tend to perform well in specific applications.
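
A heavily simplified sketch of this extract-then-analyze workflow is shown below. It uses regular expressions and a per-method average as stand-ins for the text mining and machine learning steps described above, and the snippets, synthesis methods, and values are invented placeholders rather than real literature data.

```python
import re
from collections import defaultdict
from statistics import mean

# Made-up sentences standing in for text pulled from articles; not real data.
snippets = [
    "Sample S1 was prepared by sol-gel synthesis and showed a conductivity of 12.5 S/cm.",
    "Sample S2, made by hydrothermal synthesis, reached a conductivity of 3.0 S/cm.",
    "Using sol-gel synthesis, sample S3 gave a conductivity of 11.5 S/cm.",
    "This sentence mentions neither a method nor a value.",
]

# Hypothetical patterns for the two fields of interest.
METHOD = re.compile(r"(sol-gel|hydrothermal) synthesis", re.IGNORECASE)
VALUE = re.compile(r"conductivity of (\d+(?:\.\d+)?) S/cm")


def extract(text: str):
    """Return (method, value) if both are found in the text, else None."""
    method = METHOD.search(text)
    value = VALUE.search(text)
    if method and value:
        return method.group(1).lower(), float(value.group(1))
    return None


# Group extracted values by synthesis method.
by_method = defaultdict(list)
for text in snippets:
    record = extract(text)
    if record is not None:
        method, value = record
        by_method[method].append(value)

# A simple per-method average, in place of a real statistical or ML model.
averages = {method: mean(values) for method, values in by_method.items()}
for method, avg in averages.items():
    print(f"{method}: mean conductivity {avg} S/cm over {len(by_method[method])} samples")
```

In practice, extraction from scientific text is far less regular than these patterns assume, which is why the approach described above relies on machine learning and curated training data rather than hand-written rules.
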
Collaboration tools
Tools and technology for collaboration, such as centralized databases and integrated LIMS, offer an efficient and reliable way for R&D teams to share knowledge and insights and break down data silos. By providing access to a centralized data repository, R&D organizations can improve communication and accelerate innovation. A centralized, cloud-based database can also improve knowledge sharing between remote teams and researchers who may be geographically dispersed.
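
To make the idea of a centralized repository concrete, the sketch below stores results from two teams in one table and queries them together. It uses an in-memory SQLite database purely as a stand-in for a shared, cloud-based system; the table layout, team names, and values are hypothetical.

```python
import sqlite3

# In-memory database used here only as a stand-in for a shared repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        id INTEGER PRIMARY KEY,
        team TEXT NOT NULL,
        material TEXT NOT NULL,
        property TEXT NOT NULL,
        value REAL NOT NULL
    )
    """
)

# Placeholder results contributed by two geographically separate teams.
rows = [
    ("Site 1", "Material A", "viscosity", 1.2),
    ("Site 2", "Material A", "viscosity", 1.4),
    ("Site 2", "Material B", "viscosity", 0.9),
]
conn.executemany(
    "INSERT INTO experiments (team, material, property, value) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()


def lookup(material: str) -> list[tuple[str, str, float]]:
    """Return every team's recorded results for one material."""
    cursor = conn.execute(
        "SELECT team, property, value FROM experiments "
        "WHERE material = ? ORDER BY team",
        (material,),
    )
    return cursor.fetchall()


results = lookup("Material A")
for team, prop, value in results:
    print(f"{team}: {prop} = {value}")
```

Because both teams write to the same table, a single query returns everything known about a material, regardless of which site ran the experiment.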

Modern digital ecosystems also facilitate knowledge transfer between organizations. This is especially valuable for joint projects between academia and industry and during mergers and acquisitions (M&A), where researchers need to share knowledge of a material’s characteristics or performance data based on prior research. With a digital R&D ecosystem that fosters collaboration, organizations can better identify potential opportunities for innovation.

By leveraging dark data and implementing effective knowledge management strategies, chemistry organizations can accelerate innovation and improve R&D outcomes. They can reduce cycle times, identify new research opportunities, improve product formulations, and make more informed decisions about which research projects to pursue.

Partner with an expert to put knowledge management strategies into action
The complexity of scientific information across the entire chemical R&D workflow makes it challenging for any in-house IT team to tame. An outside partner can help you build solutions for storing and connecting existing data in a structured format, giving all employees simple and efficient access to valuable R&D data. The partner's experience, including its insight into best practices and its expertise in knowledge management, can help ensure your efforts are successful.