Scientists at Merck published an article about using ChromGenius to predict retention times of new compounds.

Abstract

We introduce a new workflow that relies heavily on chemical quantitative structure-retention relationship (QSRR) models to accelerate method development for micro/mini-scale high-throughput purification (HTP). This provides faster access to new active pharmaceutical ingredients (APIs) through high-throughput experimentation (HTE). By comparing fingerprint structural similarity (e.g., Tanimoto index) with small training data sets containing a few hundred diverse small molecule antagonists of a lipid metabolizing enzyme, we can predict retention time (RT) of new compounds. Machine learning (ML) helps to identify optimal separation conditions for purification without performing the traditional crude QC step involving ultrahigh performance liquid chromatography (UHPLC) analyses of each compound. This green-chemistry approach with the use of predictive tools reduces cost and significantly shortens the design-make-test (DMT) cycle of new drugs by way of HTE.

ChromGenius uses a structural similarity search to select the compounds in its training database that are most similar to the compound under consideration. The more similar two structures are, the more similar their retention mechanisms should be. The retention model for the compound under consideration is therefore built from the retention times and physical properties of those most similar compounds. In our case, for each test set compound we selected the 50 training set records with the highest Tanimoto similarity coefficient and used them to predict RT.
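
The neighbour-selection idea described above can be illustrated with a minimal Python sketch. It is not ChromGenius's actual algorithm, which is not detailed here. The sketch makes three assumptions: fingerprints are represented as sets of "on" bit indices, the function names (`tanimoto`, `top_k_similar`, `predict_rt`) are hypothetical, and the retention model is simplified to a similarity-weighted average of the neighbours' retention times, whereas the text says the real model also draws on physical properties.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def top_k_similar(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 50,
) -> List[Tuple[float, float]]:
    """Return (similarity, retention_time) for the k most similar records."""
    scored = [(tanimoto(query, fp), rt) for fp, rt in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict_rt(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 50,
) -> float:
    """Illustrative stand-in for the retention model: a similarity-weighted
    mean of the neighbours' retention times (an assumption, not the
    ChromGenius model)."""
    neighbours = top_k_similar(query, training, k)
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        raise ValueError("no training record shares a fingerprint bit with the query")
    return sum(sim * rt for sim, rt in neighbours) / total_weight
```

The default `k=50` mirrors the 50 records selected per test compound in the text. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the weighted mean would be replaced by a regression over the neighbours' physical properties.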