September 11, 2007
by Ryan Sasaki, NMR Product Manager, ACD/Labs
Those of you who have been following this blog through its first few months will no doubt have seen a few posts comparing ACD/Labs, Modgraph, CSEARCH, and the NMRShiftDB. For those of you who have not followed this story but are intrigued, you have a lot of reading to do, starting here:
http://acdlabs.typepad.com/my_weblog/2007/05/nmrshiftdb_acdl.html
If you go to Modgraph’s website for example, you will find the following web page:
http://www.modgraph.co.uk/product_nmr_shiftdb.htm
There, Modgraph presents results that include, most notably to this blogger:
Modgraph NMRPredict: 1.40 ppm overall average deviation
ACD/Labs CNMR Predictor: 1.59 ppm overall average deviation
These values correspond to the average deviation of predicted chemical shifts as compared to the experimental values (over 200,000) in the NMRShiftDB. Let me first state that the quoted average deviation for ACD/Labs CNMR Predictor is INCORRECT in this context. The number is correct in a different context. More on that in a moment.
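For concreteness, the "overall average deviation" quoted in these comparisons is simply the mean absolute difference between predicted and experimental chemical shifts, averaged over every carbon in the test set. A minimal sketch in Python, using made-up shift values (none of this is actual NMRShiftDB data):

```python
def average_deviation(predicted, experimental):
    """Mean absolute deviation (ppm) between matched lists of chemical shifts."""
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must be matched one-to-one")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)

# Toy data: predicted vs. experimental 13C shifts (ppm) for four carbons.
pred = [128.4, 21.0, 170.2, 55.7]
expt = [128.9, 19.8, 171.5, 56.1]
deviation = average_deviation(pred, expt)  # mean of the per-carbon |errors|
```

The reported 1.40 ppm and 1.59 ppm figures are this quantity computed over the hundreds of thousands of shifts in the test set.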
In the meantime, there is one flaw in these numbers that I continue to object to, yet Modgraph continually chooses to pass over. It involves the following statement from their website:
For this evaluation the combined databases from CSEARCH and SPECINFO holding a total of 345,308 reference spectra were used. Based on this higher number of reference spectra a somewhat higher structural overlap between our databases and the NMRShiftDB-testdata has been detected. In order to compensate for this, we have recalculated our overall average deviation of 1.40 ppm using the lower structural overlap as detected by ACD. The value of 1.40 ppm corresponds to 92,927 known carbon environments and 121,209 unknown carbon environments – without this compensation our overall average deviation would be slightly better, but a comparison with ACD’s results would be impossible.
Am I the only one who thinks that the final sentence of that statement is rather inappropriate and unscientific?
We have acknowledged the overlap between our database and the NMRShiftDB to be 57%. Modgraph declines to share what their overlap is and admits only that it is higher. So in order to compensate for this they have used our lower structural overlap? So what structures and shifts are they removing from their databases to compensate for this overlap? How do they decide which ones get removed?
Regardless of how this study was conducted, it should be known that the 1.59 ppm average deviation reported on their website is incorrect as a point of comparison with Modgraph's number. I have already addressed that figure in an earlier post on this blog.
The average deviation of ACD/Labs CNMR Predictor compared to the entire NMRShiftDB (with an overlap of 57% included) is actually 0.96 ppm.
Of course, I will reiterate here the thoughts I shared in my original post on this subject. These reported average deviations (0.96 ppm for ACD/Labs and 1.40 ppm for Modgraph) are likely not the best measure of how well NMR prediction software will perform for an end user. Why? Because, as clearly stated, these numbers are based on an experimental dataset (the NMRShiftDB) that has a 57% overlap with the databases inside the prediction engines.
So unless you, the end user, can expect a very significant overlap (on the order of 57%) between your own compounds and the predictor's database, you shouldn't expect this kind of accuracy. In reality, chances are you are working with novel chemistry, and the majority of your compounds are simply not represented in any prediction software's database.
In my opinion, we have come up with a best practice for NMR prediction validation. Not only do we provide an average deviation for the entire NMRShiftDB dataset (0.96 ppm, acknowledging a 57% overlap), but we also provide a validation study on the completely novel chemical shifts. That study evaluated only the chemical shifts that are not present in our database (the other 43%).
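The two-number practice just described can be sketched as follows. This is a hypothetical illustration (the structure IDs, records, and `split_validation` helper are invented for the example, not ACD/Labs' actual validation code): report one deviation over the full test set, and a second over only the shifts whose parent structures are absent from the predictor's database.

```python
def split_validation(records, db_structures):
    """records: list of (structure_id, predicted_ppm, experimental_ppm) tuples.
    db_structures: set of structure IDs present in the predictor's database.
    Returns (overall deviation, deviation over novel structures only)."""
    full = [(p, e) for _, p, e in records]
    novel = [(p, e) for sid, p, e in records if sid not in db_structures]
    mad = lambda pairs: sum(abs(p - e) for p, e in pairs) / len(pairs)
    return mad(full), mad(novel)

db = {"S1", "S2"}                        # structures the predictor has "seen"
recs = [("S1", 30.1, 30.0), ("S2", 77.5, 77.0),
        ("S9", 140.2, 142.0), ("S8", 18.3, 19.1)]
overall, novel_only = split_validation(recs, db)
```

As the toy numbers suggest, the deviation over novel structures is typically worse than the overall figure, which is exactly why reporting both is more informative for an end user.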
Modgraph has yet to produce a study consistent with the second approach (that is, a study on completely novel chemical shifts not represented in their database). In fairness to them, it can be argued that this would not be a fair comparison either, as we would likely be comparing two completely different datasets with completely different structures; our overlaps are not the same, after all.
Nonetheless, ACD/Labs has provided the public with both numbers for reference. The bottom line is that Modgraph has chosen to publish a number without a clear explanation behind it. It represents neither an evaluation of the entire dataset nor one excluding all overlap; their reported number is somewhere in between, and they have declined to offer details. If they produced an average deviation for the entire dataset (including all overlap), or for a dataset excluding any overlap, it would most certainly differ from the 1.40 ppm value they have quoted. All we hear from them is that if they didn't compensate for the difference in overlap between ACD/Labs and Modgraph, their results would be SLIGHTLY better.
That’s fine, and while I question their experimental method of removing chemical shifts to compensate for the difference between ACD/Labs’ and Modgraph’s overlap, this is the study they have chosen to go public with. So, in the end, the results stand as follows:
Modgraph NMRPredict: 1.40 ppm overall average deviation
ACD/Labs CNMR Predictor: 0.96 ppm overall average deviation
Ryan – I just finished edits tonight on the final form of a manuscript submitted to JCIM, entitled “Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comparison of Neural Network and Least Squares Regression Based Approaches”. The abstract reads: “The efficacy of neural network (NN) and partial least squares (PLS) methods is compared for the prediction of NMR chemical shifts for both 1H and 13C nuclei using very large databases containing millions of chemical shifts. The chemical structure description scheme used in this work is based on individual atoms rather than functional groups. The performances of each of the methods were optimized in a systematic manner described in this work. Both of the methods, least squares and neural network analysis, produce results of a very similar quality, but the least squares algorithm is approximately 2-3 times faster.” It might make interesting reading for your readers when it appears in JCIM shortly.