Computer Assisted Structure Elucidation of Two Polyenol Natural Products (Adenosine and Uridine) Using Structure Elucidator Suite

November 5, 2024

by Mikhail Elyashberg, Leading Researcher, ACD/Labs

Two Polyenol Natural Products

In 2023, an article entitled “Structural reassignment of two polyenol natural products” [1] was published in Eur. J. Org. Chem. The article attracted considerable interest, receiving more than 2,000 views within a year. The two natural product structures revised in that article were similar, and the reasons for the misassignment turned out to be the same in both cases. The Structure Elucidator Suite expert system was used to verify and subsequently revise these structures. Here we describe how both structures, initially assigned to the polyenol class, were reassigned.

In 2016, Ma et al. [2] reported the isolation of (5S,6R,7S,8R)-5-amino-(2Z,4Z)-1,2,3-trihydroxybuta-2,4-dienyloxy-pentane-6,7,8,9-tetraol (1) from the Southeast Asian spice Murraya koenigii (L.).

More recently, Siebatcheu et al. [3] isolated 1, together with a related isomer, (Z)-5-amino-5-(1,1,2-trihydroxybuta-1,3-dienyloxy)pentane-6,7,8,9-tetraol (2), from an endophytic fungus, Trichoderma erinaceum.

These purported natural products were elucidated using extensive spectroscopic methods (e.g., NMR, MS, UV, IR, and CD) and were isolated as stable solid substances (i.e., colorless and white powders, respectively). From a structural perspective, however, both 1 and 2 contain multiple enol moieties that would, under normal chemical circumstances, be expected to exist in the respective keto (carbonyl) tautomeric forms (Scheme 1).

**Scheme 1.** A) and B) Suspected misassigned natural products 1 and 2 highlighted in red, and associated keto tautomers (3, 4, and 7); hemiaminal ether (highlighted in green) degradation products 5 and 6.

For example, 1 would be better represented as either 3 or 4 (and 2 as 7). In both cases the hemiaminal ether moiety would also be expected to be quite sensitive to even mild acid, undergoing hydrolysis to give degradation products 5 and 6 (Scheme 1). Furthermore, stable enols are rare and exist only when stabilized in some form, while stable ene-diols are even rarer, with ascorbic acid (vitamin C) being one of very few examples. Based on these chemical principles, it was suspected that both 1 and 2 had been misassigned.

Revision of Structure 1

The proposed structure 1 (C₉H₁₇NO₈) was entered into Structure Elucidator Suite and ¹³C chemical shift prediction was performed using the empirical methods implemented in the expert system (see Figure 1).

**Figure 1.** Structure 1, for which ¹³C chemical shift prediction was carried out using the HOSE code-based method, neural networks, and the incremental approach. The average deviations of the ¹³C chemical shifts determined by these methods are denoted *d_A*, *d_N*, and *d_I*, respectively. Each atom is colored according to the difference between its experimental and calculated ¹³C chemical shifts: green for a difference of 0 to 3 ppm, yellow for >3 to 15 ppm, and red for >15 ppm.

Given that acceptable deviations for correct structures are usually in the range of 1-2.5 ppm, Figure 1 unambiguously shows that structure 1 is incorrect. Therefore, the ¹³C, ¹H, HSQC, and HMBC data presented in the work of Ma et al. [2] were entered into the program (Table 1), and a Molecular Connectivity Diagram (MCD) was generated (Figure 2).

Table 1. NMR spectroscopic data of compound 1 [1]

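| Label | δ_C (ppm) | δ_C calc (HOSE, ppm) | XHn | δ_H (ppm) | Multiplicity (H) | H to C HMBC |
|---|---|---|---|---|---|---|
| C 1 | 153.5 | 127.05 | CH | 8.19 | s | C 3, C 2 |
| C 2 | 157.6 | 133.63 | C | | | |
| C 3 | 150 | 141.82 | C | | | |
| C 4 | 142 | 132.49 | CH | 8.32 | s | C 3 |
| C 5 | 91.3 | 81.46 | CH | 5.97 | d | C 6, C 4 |
| C 6 | 75.5 | 72.92 | CH | 4.75 | t | C 8, C 5 |
| C 7 | 72.7 | 75.64 | CH | 4.34 | u | C 9, C 8, C 5 |
| C 8 | 88.2 | 71.75 | CH | 4.18 | u | C 7 |
| C 9 | 63.5 | 64.01 | CH2 | 3.92 | u | C 7, C 8 |
| C 9 | 63.5 | 64.01 | CH2 | 3.77 | u | |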
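The per-atom comparison behind the color coding of Figure 1 can be reproduced from the two ¹³C columns of Table 1. The following is a minimal Python sketch, not the program's own algorithm: it assumes that an "average deviation" can be approximated by the mean absolute difference between experimental and calculated shifts (the exact definition of *d_A* used by the software is not given here), and the names `TABLE_1`, `classify`, and `mean_abs_dev` are illustrative.

```python
# Experimental and HOSE-calculated 13C shifts (ppm), copied from Table 1.
TABLE_1 = {
    "C 1": (153.5, 127.05),
    "C 2": (157.6, 133.63),
    "C 3": (150.0, 141.82),
    "C 4": (142.0, 132.49),
    "C 5": (91.3, 81.46),
    "C 6": (75.5, 72.92),
    "C 7": (72.7, 75.64),
    "C 8": (88.2, 71.75),
    "C 9": (63.5, 64.01),
}


def classify(diff_ppm):
    """Map an absolute shift difference to the color bands of the Figure 1 legend."""
    if diff_ppm <= 3:
        return "green"
    if diff_ppm <= 15:
        return "yellow"
    return "red"


# Absolute experimental-minus-calculated difference for every carbon.
abs_diffs = {label: abs(exp - calc) for label, (exp, calc) in TABLE_1.items()}
colors = {label: classify(diff) for label, diff in abs_diffs.items()}

# Assumption: the mean absolute difference stands in for the average deviation.
mean_abs_dev = sum(abs_diffs.values()) / len(abs_diffs)

for label, diff in abs_diffs.items():
    print(f"{label}: |exp - calc| = {diff:5.2f} ppm -> {colors[label]}")
print(f"mean absolute deviation: {mean_abs_dev:.2f} ppm")
```

With the Table 1 values, the mean absolute difference comes out at about 11.2 ppm, far outside the 1-2.5 ppm range quoted above for correct structures, and three of the nine carbons fall in the red band.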

**Figure 2.** Molecular connectivity diagram (MCD) for **C₉H₁₇NO₈**. Hybridizations of carbon atoms are marked by corresponding colors: *sp²* – violet, *sp³* – blue. The labels “ob” and “fb” are set by the program on carbon atoms for which a neighboring heteroatom is either obligatory (ob) or forbidden (fb). HMBC connectivities are marked by green arrows.

Structure generation was performed from the MCD with the following results: k = 25, t_g = 1 s, where k is the number of structures and t_g is the processor time. ¹³C chemical shift prediction was carried out for the output file, and the structures were ranked in increasing order of d_A deviations. The top six structures of the ranked structural file are shown in Figure 3.

**Figure 3.** The six top-ranked structures of the output file obtained as a result of structure generation from MCD, Figure 2.

It turned out that proposed structure 1 was placed in the first position with very large deviations. This means that no structure characterized by average deviations of acceptable values can be generated from the initial data. Therefore, structure 1 should be revised.

The reassessment of 1 was initially approached by inspecting the non-controversial moiety of the molecule, which in this case was the carbohydrate fragment (right-hand fragment) comprising carbons C6-C9. All four carbon chemical shift values (δ_c 75.5, 72.7, 88.2, 63.5 ppm) were in the range expected for hydroxylated sp³ carbon atoms. The hemiaminal ether carbon C5 would also be expected to resonate in the recorded region (i.e., δ_c 91.3 ppm), and this was further supported by the observed HSQC correlations.

Inspection of the ¹³C NMR spectrum of compound 1 revealed an additional resonance at ~121 ppm. Attention was therefore turned to the reported molecular formula, i.e., C₉H₁₇NO₈ (m/z 290.0853 [M+Na]⁺, calcd. for 290.0846), which contained an unusually high number of hydrogen atoms. The observed value of m/z 290.0853 was also in agreement with the molecular formula C₁₀H₁₃N₅O₄ (calcd. for m/z 290.0860 [M+Na]⁺), which requires an additional carbon atom. Considering that the new molecular formula also indicated the presence of four additional nitrogen atoms, and that purine bases are often associated with carbohydrate moieties, it was conceivable that a nucleoside had been isolated.
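The two calculated [M+Na]⁺ values can be checked with a few lines of arithmetic. The Python sketch below sums standard monoisotopic masses for each candidate formula, adds sodium, and subtracts one electron mass for the positive charge; the helper names (`sodium_adduct_mz`, `ppm_error`) and the rounded mass constants are illustrative choices, not part of any software mentioned in this article.

```python
# Monoisotopic masses (u) of the most abundant isotopes, rounded to 8 decimals.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON = 0.00054858  # mass of one electron (u)


def sodium_adduct_mz(formula):
    """m/z of the singly charged [M+Na]+ ion for a formula given as {element: count}."""
    neutral = sum(MONO[element] * count for element, count in formula.items())
    return neutral + MONO["Na"] - ELECTRON


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


OBSERVED = 290.0853  # m/z reported for compound 1

CANDIDATES = {
    "C9H17NO8": {"C": 9, "H": 17, "N": 1, "O": 8},
    "C10H13N5O4": {"C": 10, "H": 13, "N": 5, "O": 4},
}

results = {name: sodium_adduct_mz(formula) for name, formula in CANDIDATES.items()}

for name, mz in results.items():
    print(f"{name}: calcd. {mz:.4f}, error {ppm_error(OBSERVED, mz):+.1f} ppm")
```

Both formulas land within about 2.5 ppm of the observed value (on opposite sides of it), which illustrates why the accurate mass alone could not distinguish them and why every formula compatible with the mass spectrum has to be considered.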

A new MCD was created from the spectroscopic data (Table 1) and the molecular formula C₁₀H₁₃N₅O₄ (Figure 4), and structure generation was repeated, which gave the following results: k = 2,160,807 → (Filter) → 2151 → (duplicate removal) → 1978, t_g = 43 min.

**Figure 4.** Molecular connectivity diagram for the molecular formula **C₁₀H₁₃N₅O₄**.

The six top-ranked structures ranked in increasing order of average deviations are presented in Figure 5.

**Figure 5.** The six top-ranked structures of the output file obtained as a result of structure generation from MCD, Figure 4.

It turned out that the best structure in Figure 5 coincided with the structural formula of a known compound, adenosine. This assignment was supported by the DP4 probabilities calculated for the three top-ranked structures (Figure 6).

**Figure 6.** DP4 probabilities calculated for structures #1 – #3.
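For readers unfamiliar with DP4, the idea can be sketched in a few lines of Python. The sketch follows the original DP4 formulation of Smith and Goodman: each shift error is scored with the upper-tail probability of a Student's t distribution, the per-atom scores are multiplied, and the products are normalized over all candidates. The ¹³C parameters (σ = 2.306 ppm, ν = 11.38) are taken from that original DP4 paper and are an assumption here, as is everything about how the probabilities in Figure 6 were actually computed. The per-atom errors are hypothetical and serve only to exercise the function.

```python
import math

# Student's t parameters for 13C errors from the original DP4 paper
# (assumed here; not stated in this article).
SIGMA_C = 2.306  # ppm
NU_C = 11.38     # degrees of freedom


def t_pdf(x, nu):
    """Probability density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(x, nu, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    # Clamp so rounding error can never yield a non-positive probability.
    return max(0.5 - total * h / 3, 1e-300)


def dp4(errors_by_candidate, sigma=SIGMA_C, nu=NU_C):
    """Return a normalized DP4-style probability for each candidate structure."""
    log_like = {
        name: sum(math.log(t_upper_tail(abs(e) / sigma, nu)) for e in errors)
        for name, errors in errors_by_candidate.items()
    }
    best = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(value - best) for name, value in log_like.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# Hypothetical calculated-minus-experimental errors (ppm), for illustration only.
hypothetical_errors = {
    "candidate A": [0.5, 1.0, -0.8],
    "candidate B": [5.0, 8.0, -6.0],
}
probabilities = dp4(hypothetical_errors)
for name, p in probabilities.items():
    print(f"{name}: {p:.4%}")
```

Because the scores are multiplied over all atoms, a candidate with a few large errors is penalized heavily, so one structure usually dominates the normalized probabilities.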

DU8ML [4], the DFT-empowered NMR chemical shift and coupling constant prediction tool enabled with machine learning capability, also confirmed the hypothesis of adenosine, which was subsequently proven beyond doubt through direct ¹H and ¹³C NMR comparison to commercial material.

Revision of Structure 2

Having deduced the presence of nucleosides, the focus was turned to the assessment of 2. Evaluation of the HRMS-ESI spectrum for 2 revealed two peaks [2]. The major peak closely matched the molecular formula previously seen for adenosine [i.e., C₁₀H₁₃N₅O₄ (m/z 268.1041 [M+H]⁺, calcd. for 268.1046)], and the minor peak was observed at m/z 245.0769. Therefore, assuming the minor ion was also an [M+H]⁺ peak, a molecular formula of C₉H₁₃N₂O₆ (calcd. for 245.0774) could be generated.

The revision of structure 2 followed the same steps as described above, so only the figures showing the data processed or obtained are presented here, with short explanations. In the case of 2, Structure Elucidator again demonstrated that the proposed structure was incorrect (see Figure 7).

**Figure 7.** Structure 2 (**C₉H₁₇NO₈**) with predicted ¹³C chemical shifts. See the legend for Figure 1.

In ref. [3], the COSY and main HMBC correlations were presented graphically:

**Figure 8.** COSY and main HMBC correlations.

¹³C, ¹H, HSQC, and HMBC data presented in Figure 8 were entered into the program, and a Molecular Connectivity Diagram (MCD) was created (Figure 9).

**Figure 9.** Molecular connectivity diagram (MCD) for **C₉H₁₇NO₈**. Hybridizations of carbon atoms are marked by corresponding colors: *sp²* – violet, *sp³* – blue, *not sp* – light blue. The labels “ob” and “fb” are set by the program on carbon atoms for which a neighboring heteroatom is either obligatory (ob) or forbidden (fb). HMBC connectivities are marked by green arrows, and COSY connectivities by blue arrows. A connectivity of nonstandard length is marked in violet.

Results of structure generation: k = 75 → (removal of duplicates) → 58, t_g = 2.5 s, where k is the number of structures and t_g is the processing time. ¹³C chemical shift prediction was performed for the output file, and the structures were ranked in increasing order of d_A deviations. The top six structures of the ranked structural file are shown in Figure 10. Proposed structure 2 was placed in fourth position by the ranking procedure, while the “best” structure is characterized by very large average and maximum deviations.

**Figure 10.** The six top-ranked structures of the output file (**C₉H₁₇NO₈**).

The subsequent steps were the same as in the previous case, so only the corresponding figures are presented.

**Figure 11.** Molecular connectivity diagram for the revised molecular formula **C₉H₁₃N₂O₆**.

Results of structure generation: k = 1300 → (Filter) → 402 → (duplicate removal) → 402, t_g = 9 s. The six top-ranked structures are shown in Figure 12.

**Figure 12.** The six top-ranked structures (**C₉H₁₃N₂O₆**).

**Figure 13**. DP4 probabilities calculated for structures #1 – #6.

As in the previous case, the structure of uridine was confirmed by DU8ML calculations, as well as by comparison of the experimental ¹H spectrum with the literature.

Thus, structures 1 and 2 were revised to adenosine and uridine, respectively.

Several methodological conclusions can be drawn from the examples considered:

First and foremost, chemical knowledge must be used to assess the stability of each proposed chemical structure.
All possible variants of the molecular formula that follow from the mass spectrum should be carefully checked.
It is necessary to predict ¹³C chemical shifts for the proposed structures, which is easily done with fast empirical methods. In both cases considered, this would have made it possible to establish immediately that the structures were wrong.
As much information as possible about functional groups should be extracted from the infrared spectrum. The infrared spectrum of compound 2 (if recorded) would evidently show the presence of a carbonyl group in the molecule.

References

[1] A. G. Kutateladze, R. W. Bates, M. E. Elyashberg, C. M. Williams. (2023). Structural reassignment of two polyenol natural products. Eur. J. Org. Chem., 26, e202201316.
[2] Q.-G. Ma, K. Xu, Z.-P. Sang, R.-R. Wei, W.-M. Liu, Y.-L. Su, J.-B. Yang, A.-G. Wang, T.-F. Ji, L.-J. Li. (2016). Alkenes with antioxidative activities from Murraya koenigii (L.) Spreng. Bioorg. Med. Chem. Lett., 26, 799.
[3] C. Siebatcheu, D. Wetadieu, O. Y. Youassi, M. A. B. Boat, K. G. Bedane, N. S. Tchameni, M. L. Sameza. (2022). Secondary metabolites from an endophytic fungus Trichoderma erinaceum with antimicrobial activity towards Pythium ultimum. Nat. Prod. Res., 37(4), 657-662.
[4] I. M. Novitskiy, A. G. Kutateladze. (2022). DU8ML: Machine learning-augmented Density Functional Theory nuclear magnetic resonance computations for high-throughput in silico solution structure validation and revision of complex alkaloids. J. Org. Chem., 87, 4818.

4 Replies to “Computer Assisted Structure Elucidation of Two Polyenol Natural Products (Adenosine and Uridine) Using Structure Elucidator Suite”

Luis Arcangel Rivera Montalvo says:

November 25, 2024 at 9:31 am

Dear Sir or Madam: Please provide me with more information about interpreting MS-MS spectra to arrive at tentative candidate structures for my sample. Thanks in advance,
Dr. Luis A. Rivera Montalvo, University of Puerto Rico at Mayaguez, Department of Chemistry, PO Box 9000 Puerto Rico 00681

Mikhail Elyashberg says:

December 2, 2024 at 6:33 pm

Dear Dr. Montalvo,

The Structure Elucidator Suite uses the molecular formula and 1D and 2D NMR spectra for structure elucidation. HRMS is utilized only for the molecular formula determination. The application of MS-MS is not provided in our software.

Therefore, unfortunately I am unable to help you.

Kind regards,
Mikhail Elyashberg

Luis Arcangel Rivera Montalvo says:

December 6, 2024 at 12:06 pm

December 6, 2024.

Dear Sir or Madam:

Please provide a quote for “Structure Elucidation Suite”

Thanks in advance.

Sherry Myles says:

December 6, 2024 at 2:58 pm

Hello Luis,

I have asked an Account Manager to reach out to you with more information about pricing for Structure Elucidator Suite.

Thank you,
Sherry
