What's New in PhysChem Suite™ Version 2024

Version 2024 of PhysChem Suite introduces dramatic enhancements in prediction accuracy for pK_a, logP, and logD, as well as for the related pH profile of aqueous solubility and ADME endpoints such as absorption, bioavailability, and blood-brain barrier penetration. These gains result from thorough algorithm improvements paired with a significant expansion of the training sets. Read below for details, and contact us for help upgrading your software.

Significant revisions and improvements to the pK_a, logP, and logD algorithms and equations with particular focus on complex heterocycles and novel therapeutic modalities (PROTACs)
Further significant enhancements to the pK_a Classic algorithm, building upon the previously released improvements, with the addition of more than 5000 experimental pK_a values from diverse chemical space, including aromatic and aliphatic heterocycles
The LogP GALAS training set of experimental values has been increased by 25% (~4750 new experimental logP values) after careful curation, reinterpretation, and verification of new data, improving accuracy across varied and diverse chemical classes
Technical improvements to the logD algorithm, in addition to enhanced prediction accuracy as a result of improved logP and pK_a predictors

You can expect greatly improved accuracy of pK_a calculations due to a significant expansion of the algorithm training sets, with several diverse chemical datasets. We have developed novel algorithms and deeply revisited models for specific classes. Data quality and chemical diversity guide our training efforts for each release. Through our own research, and ongoing partnership initiatives, our algorithm training expands covered chemical space, and allows us to enrich, curate, and prioritize historically available training data, for better predictive performance.

After the addition of new datasets, the full ACD/Labs pK_a training data set contains ~26,000 compounds with more than 42,000 experimental pK_a values. Note that not all compounds and pK_a values are used for the algorithm's Hammett equations or pK_a0 values. However, with new datasets available, previously unused “old” data can be used for new algorithm improvements.

The improvements in pK_a prediction accuracy from v2022 to v2024, evaluated on the historically renowned BioByte Master file, are shown in the table below:

Version	Number of predicted pK_a values	Average prediction accuracy
v2022	12,224	0.419
v2023	12,421	0.412
v2024	12,458	0.395

Improved pK_a Prediction Accuracy for Expanded Pharmaceutical Space

You can have greater trust in the accuracy and reliability of pK_a calculations thanks to a significant expansion of the pK_a Classic algorithm training set. A brand-name pharmaceutical company has provided data for over 1100 compounds with more than 2200 pK_a values for this project. The dataset contained exciting modern compounds and members of chemical classes underrepresented in previous versions of our software. Collaborative efforts allowed us to improve prediction accuracy for this chemistry, which has resulted in overall improvement of pK_a predictions for current and future users of the software. The linear scatter plots below present the comparison of experimental versus predicted pK_a, before and after training (v2023 versus v2024), for the new data set provided by the pharmaceutical company. The average error of prediction is significantly lower in v2024 (0.27 log units vs. 0.76 log units in v2023), and R² is closer to 1 (0.98 vs. 0.86 in v2023).

Linear regression plots demonstrate the improved correlation between experimental and predicted pK_a (pK_a Classic algorithm) for 2200 ionization centers from v2023 to v2024.

Training of the algorithm resulted in significant improvements in prediction accuracy for these 1100 novel pharmaceutical compounds, with more ionization centers calculated using v2024 (2262 vs. 2248 in v2023). In v2024, 84% of pK_a values for these compounds are predicted within 0.5 log units (99% within 1 log unit), compared to 47% within 0.5 log units (72% within 1 log unit) in v2023.

Improvement in prediction accuracy (pK_a Classic algorithm) for 2200 ionization centers in 1100 pharmaceutical compounds in v2024.

Enhanced pK_a Prediction Accuracy for Diverse Chemical Compounds

You can now expect significant improvement in pK_a prediction accuracy for compounds belonging to diverse chemical classes, such as aromatic and aliphatic heterocycles, which were previously underrepresented in our training sets. Working with publicly available data sources and a government organization, we have augmented the training dataset with open-source data comprising 3150 curated ionization centers from 2500 chemical compounds.

Comparison of experimental vs. predicted pK_a (v2024).

The linear scatter plots below show the comparison of experimental vs. predicted pK_a, before and after training. The average error of prediction is significantly lower in v2024 (0.43 log units vs. 0.61 log units in v2023 and 0.88 in v2022), and R² is closer to 1 (0.98 versus 0.93 in v2023 and 0.83 in v2022).

Training of the algorithm resulted in significant improvements in prediction accuracy for these 2500 compounds, with more ionization centers calculated using v2024 (3158 vs. 3023 in v2023). In v2024, 68% of pK_a values for these compounds are predicted within 0.5 log units (91% within 1 log unit), compared to 58% within 0.5 log units (81% within 1 log unit) in v2023 and 48% within 0.5 log units (70% within 1 log unit) in v2022.

Improvement in prediction accuracy for 3150 ionization centers in 2500 compounds in v2024.

Enhanced Prediction for Novel Therapeutic Modalities of High Interest

You can now expect improved pK_a predictions for PROTACs (Proteolysis Targeting Chimeric Molecules). We have collected data on PROTACs and their precursors with measured experimental pK_a values that originated from both our industry collaborators and public sources.

PROTACs are composite molecules designed to attenuate the function of specific proteins by binding to the target protein and inducing its degradation by the intracellular ubiquitin-proteasome system. Structurally, PROTACs are heterobifunctional molecules composed of two small-molecule ligands connected by a covalent linker: one ligand is responsible for recognition of the target protein, the other for the recruitment of a ubiquitin E3 ligase, which initiates the degradation pathway. Overall, this results in larger molecules that fall beyond the traditional chemical space governed by the Rule of Five and may therefore prove challenging for various property prediction algorithms.
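
To make the Rule of Five remark concrete, the sketch below counts violations of the commonly cited thresholds (molecular weight over 500 Da, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The function name and the descriptor values are hypothetical and chosen only for illustration; they are not taken from any compound in the datasets described here, and this is not part of PhysChem Suite.

```python
def rule_of_five_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Return a list describing which Rule of Five thresholds are exceeded."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if logp > 5:
        violations.append("logP > 5")
    if h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    return violations


# Hypothetical descriptor values for a PROTAC-sized molecule (illustrative only).
example = rule_of_five_violations(
    mol_weight=950.0, logp=4.2, h_bond_donors=4, h_bond_acceptors=14
)
for item in example:
    print(item)
```

With two ligands and a linker in one structure, molecular weight and acceptor count alone are often enough to place such a molecule outside this space.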

Two comparisons below present the results of the enhanced prediction between v2023 and v2024.

Publication Comparison

We have reviewed data from a 2022 publication, “PROTACs bearing piperazine-containing linkers: what effect on their protonation state?”, which contained 34 PROTACs bearing piperazine-containing linkers, as well as their precursors, with 49 experimental pK_a values.

The linear scatter plots below show the comparison of experimental versus predicted pK_a of this data, for version 2023 and the new version 2024. The average error of prediction is lower in v2024 (0.30 log units vs. 0.42 log units in v2023).

Linear regression plots demonstrate the improved correlation between experimental and predicted pK_a values for 49 ionization centers of PROTACs reviewed in the publication.

Examples of PROTACs and the comparable predicted pK_a values from v2024


Exp. pK_a	Pred. pK_a v2024	Exp. pK_a	Pred. pK_a v2024
2.74	2.90	4.69	3.85
6.27	6.02	7.98	7.82
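
The accuracy figures quoted on this page (average error, share of values within 0.5 or 1 log unit) can be reproduced for any set of experimental and predicted values. The sketch below is a minimal illustration that assumes "average error" means mean absolute error, which the text does not state explicitly; the function name is hypothetical. It uses the four experimental and v2024-predicted pK_a pairs from the example table above.

```python
from statistics import mean


def accuracy_summary(experimental, predicted):
    """Mean absolute error plus the share of predictions within 0.5 and 1 log unit."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(e - p) for e, p in zip(experimental, predicted)]
    n = len(abs_errors)
    return {
        "mean_abs_error": mean(abs_errors),
        "within_0.5": sum(err <= 0.5 for err in abs_errors) / n,
        "within_1.0": sum(err <= 1.0 for err in abs_errors) / n,
    }


# The four experimental / v2024-predicted pK_a pairs from the example table.
exp_pka = [2.74, 6.27, 4.69, 7.98]
pred_pka = [2.90, 6.02, 3.85, 7.82]
summary = accuracy_summary(exp_pka, pred_pka)
print(summary)
```

Four values are far too few to characterize the algorithm; the point is only to show how the reported statistics are defined under the stated assumption.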

Overall PROTAC Assessment

The linear scatter plots below show the comparison of experimental vs. predicted pK_a, for v2023 and v2024 for a dataset of 253 PROTAC molecules with 491 ionization centers. The average error of prediction is significantly lower in v2024 (0.28 log units vs. 0.52 log units in v2023).

Linear regression plots demonstrate the improved correlation between experimental and predicted pK_a for 491 ionization centers from v2023 to v2024.

Improved Accuracy of the LogP GALAS and Resulting LogP Consensus Algorithms

You can have greater trust in the calculations from a significant expansion of the logP GALAS algorithm training set. For the most part, the newly collected data represents logD_7.4 values that have been back-calculated to logP using the ACD/pK_a Classic algorithm (which features improvements to minimize uncertainty related to evaluation of the compound’s ionization state).
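
As an illustration of what back-calculating logP from logD_7.4 involves, the sketch below uses the textbook relation for a compound with a single ionizable group, assuming only the neutral species partitions into octanol. This is an assumption made for illustration; the procedure actually used to build the training set is not described here, and the function names are hypothetical.

```python
import math


def ionization_correction(pka, ph, is_acid):
    """log10(1 + 10^(pH - pKa)) for an acid, log10(1 + 10^(pKa - pH)) for a base."""
    exponent = (ph - pka) if is_acid else (pka - ph)
    return math.log10(1.0 + 10.0 ** exponent)


def logp_from_logd(logd, pka, ph=7.4, is_acid=True):
    """Back-calculate logP of the neutral form from logD measured at a given pH."""
    return logd + ionization_correction(pka, ph, is_acid)


def logd_from_logp(logp, pka, ph=7.4, is_acid=True):
    """Forward direction: logD at a given pH from logP and pKa."""
    return logp - ionization_correction(pka, ph, is_acid)


# Hypothetical monoprotic base: pKa 9.4, measured logD at pH 7.4 of 1.0.
logp = logp_from_logd(1.0, pka=9.4, ph=7.4, is_acid=False)
print(round(logp, 2))
```

Under this simple model, an error in pK_a for a largely ionized compound carries over almost one-to-one into the back-calculated logP, which is why the quality of the pK_a prediction matters for this step.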

Improvements include:

Inclusion of ~4750 new experimental values
Increase in the number of entries to more than 22,000 (>25% increase in entries overall)
Addition of series of compounds with novel scaffolds and varying substituents, such as heterocycles and functional groups which are usually seen as challenging to source and predict
Improved prediction accuracy of high MW compounds, such as PROTACs

The improvements in calculation accuracy are available with both ACD/LogP GALAS and ACD/LogP Consensus algorithms utilizing the new LogP v. 1.4 built-in self-training library.

The linear regression scatter plots below show the comparison of experimental vs. predicted logP values for newly acquired compounds, before and after the training.

Linear regression plots demonstrate the improved correlation between experimental and predicted logP values for ~4750 compounds from v2023 to v2024.

Examples of PROTACs and the comparable predicted logP values from v2024 and v2023.


PROTAC example	Exp. logP	Pred. logP v2023	Reliability Index v2023	Pred. logP v2024	Reliability Index v2024
1	4.23	5.68	0.44 (Borderline)	4.44	0.86 (High)
2	4.21	2.81	0.49 (Borderline)	3.87	0.75 (High)

Predict LogD More Accurately

You can now expect greater accuracy for logD calculations due to several technical improvements to the algorithm, including enhancements to the normalization procedure
- Inaccuracies between the simulated logD pH curves and the plot expected from the compound’s apparent pK_a profile have been resolved
Starting from the current release, the shape of the simulated logD pH curve will closely match the shape expected from the compound’s apparent pK_a values
You can have greater trust in the accuracy of logD predictions and water solubility pH profile due to improvements in the pK_a and logP algorithms

Updated Reporting Templates

You can generate compliance reports with confidence, as the QPRF report templates have been updated to v2.0 in full accordance with the QAF checklist.

Improved Batch Reporting

You can now automatically save all individual reports as a .zip file, preventing duplication of file names.
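
For readers who post-process reports in their own scripts, the sketch below shows one way to bundle files into a .zip archive while keeping entry names unique. It is a generic illustration using Python's standard library, not the software's implementation; the naming scheme (a numeric suffix) and the function names are assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath


def unique_name(name, used):
    """Return name unchanged, or with a numeric suffix if it was already used."""
    candidate = name
    path = PurePosixPath(name)
    counter = 2
    while candidate in used:
        candidate = f"{path.stem}_{counter}{path.suffix}"
        counter += 1
    used.add(candidate)
    return candidate


def zip_reports(reports):
    """Pack (file_name, content_bytes) pairs into a zip archive held in memory."""
    buffer = io.BytesIO()
    used = set()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in reports:
            archive.writestr(unique_name(name, used), content)
    return buffer.getvalue()


# Hypothetical report names and contents; two of the names collide.
archive_bytes = zip_reports(
    [("report.pdf", b"first"), ("report.pdf", b"second"), ("other.pdf", b"third")]
)
names = zipfile.ZipFile(io.BytesIO(archive_bytes)).namelist()
print(names)
```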

Improved Export of Data

With the latest update you can now:

Export your data in the less limiting .xlsx file format
Export molecular structures as vector images, which offer better scaling support
Include atom numbers with structure images, enabling you to easily analyze atom-specific properties

Convert Units in Calculations

You can now easily configure the software to convert certain units of calculated values to user-specific dimensions by default (for example, converting concentration from molar to mass).
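
The molar-to-mass conversion mentioned above is a multiplication by the molar mass. The sketch below shows the arithmetic with a hypothetical function name and illustrative numbers; it does not reflect how the setting is configured in the software.

```python
def molar_to_mass_concentration(molar_conc_mol_per_l, molar_mass_g_per_mol):
    """Convert mol/L to g/L (numerically equal to mg/mL)."""
    return molar_conc_mol_per_l * molar_mass_g_per_mol


# Illustrative values: a 1 mmol/L solution of a compound with molar mass 250 g/mol.
mass_conc = molar_to_mass_concentration(0.001, 250.0)
print(f"{mass_conc:.2f} g/L")
```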

ACD/Labs’ development team is eager to collaborate with organizations to improve predictions for novel compounds. Do you have accurately measured experimental values for the predictions we support? Contact us to discuss how we may work together.

You Might Also Be Interested In...

ADME Suite

Version 2024 of ADME Suite introduces significant improvements to the P-gp Substrate specificity predictions, including a new quantitative efflux model and expansion of the training libraries for P-gp Substrates and Protein Binding.

Tox Suite

Version 2024 of Tox Suite includes enhancements to the Ames Test and Aquatic Toxicity predictors, improving the accuracy and reliability of predicted data, among other features.

Want to learn more?

Read more about the full features of PhysChem Suite, or contact us for help upgrading your software.
