Predictive chemical deformulation flips the formulation challenge

Adam Sanford

Deformulation, also known as chemical reverse engineering, is the process of determining the exact composition of a known product. Starting from the known relative proportions of its ingredients, deformulation determines the precise amount of each ingredient.

Chemical product deformulation enables organizations to:

  • Extrapolate new recipes from existing formulations.
  • Improve competitive intelligence.
  • Benchmark competitive products.
  • Identify counterfeits.
  • Develop private-label products.

While researchers have turned to machine learning for the discovery and optimization of chemicals and materials, deformulation is typically performed experimentally with the help of analytical chemistry methods. The relatively limited amount of structured data available for chemical formulations hinders many AI-driven deformulation efforts. Much of the widely available formulation data is incomplete and inconsistent in its records of ingredients and their amounts.

Training predictive models to enable rapid, data-driven suggestions of formulation recipes

The Industrial & Engineering Chemistry Research publication, "Toward Predictive Chemical Deformulation Enabled by Deep Generative Neural Networks," shows that unsupervised generative models, specifically variational autoencoders (VAEs), can be trained to make rapid, data-driven suggestions of formulation recipes.
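To make the idea of a VAE more concrete, here is a minimal sketch in Python of the loss that such a model minimizes for a single formulation. It is not the architecture from the publication. The function names, the use of a softmax to turn decoder outputs into weight fractions, and the squared-error reconstruction term are all assumptions chosen for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def softmax(logits):
    """Turn raw decoder outputs into positive fractions that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vae_loss(target_fractions, decoded_logits, mu, log_var):
    """Squared reconstruction error plus the KL regularizer (assumed loss form)."""
    predicted = softmax(decoded_logits)
    recon_error = sum((t - p) ** 2 for t, p in zip(target_fractions, predicted))
    return recon_error + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for one formulation (2 latent dimensions).
    mu, log_var = [0.3, -0.1], [-1.0, -0.5]
    z = reparameterize(mu, log_var, rng)
    # Hypothetical decoder outputs for three ingredients.
    logits = [2.0, 0.5, -1.0]
    print("latent sample:", z)
    print("loss:", vae_loss([0.7, 0.2, 0.1], logits, mu, log_var))
```

In a real model, the encoder and decoder would be neural networks whose weights are adjusted to reduce this loss across many curated formulations. The sketch only shows the two terms being balanced: how well the recipe is reconstructed, and how close the latent representation stays to a standard normal distribution.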

A VAE neural network trained with CAS scientist-curated formulation data learns meaningful representations of formulations in product classes such as antiperspirants and oral care, and on average it performed better than more conventional approaches. The article states that this approach "produces estimates that are significantly more accurate than nearest neighbor methods, extrapolates better to formulations that are significantly different than previously seen formulations, and provides a way to leverage large datasets for industrially relevant capabilities."
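For context, a nearest neighbor method of the kind the article uses as a point of comparison can be sketched in a few lines of Python. This is an illustrative baseline, not the one evaluated in the publication. The Jaccard similarity measure, the function names, and the example formulations are all hypothetical.

```python
def jaccard(a, b):
    """Similarity of two ingredient collections: shared items / all items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_estimate(ingredients, known_formulations):
    """Borrow amounts from the most similar known formulation.

    `known_formulations` is a list of dicts mapping ingredient name to
    weight fraction. Ingredients missing from the neighbor get 0, and the
    result is rescaled so the estimated fractions sum to 1.
    """
    nearest = max(known_formulations, key=lambda f: jaccard(ingredients, f))
    raw = {name: nearest.get(name, 0.0) for name in ingredients}
    total = sum(raw.values())
    if total == 0:
        # No overlap with the neighbor at all: fall back to equal parts.
        return {name: 1.0 / len(ingredients) for name in ingredients}
    return {name: amount / total for name, amount in raw.items()}


if __name__ == "__main__":
    # Made-up formulations for illustration only.
    known = [
        {"water": 0.7, "glycerin": 0.2, "fragrance": 0.1},
        {"water": 0.5, "ethanol": 0.5},
    ]
    print(nearest_neighbor_estimate(["water", "glycerin"], known))
```

A baseline like this can only reuse amounts it has already seen, which helps explain why it would struggle with formulations that differ substantially from those in the reference set.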

The curated formulations in the CAS Content Collection™ offer consistent and highly structured representations of the formulations and chemical identities of their components. Due to unique curation processes that utilize both specialized technologies and scientific expertise, CAS can consistently identify each formulation's chemical components, their groupings, and their amounts. The authors report that “without the CAS dataset, the practical validation of these generative methods for deformulation applications would not have been possible.”

Explore these findings in the full publication, "Toward Predictive Chemical Deformulation Enabled by Deep Generative Neural Networks".

Interested in achieving more accurate deformulation predictions? CAS Custom Services tailors our specialized technologies, scientific expertise, and unparalleled content to meet your unique needs.  
