Molecular design in drug discovery: a comprehensive review of deep generative models - Briefings in Bioinformatics

Table of Contents

A pharmacophore-guided deep learning approach for bioactive molecular generation
Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery
About Nature Portfolio
‘Designer molecules’ could create tailor-made quantum devices
Performance of deep neural networks
Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR
Professional development

These advances in molecular generation also herald a promising future for related problems such as retrosynthesis. As user-friendly, easy-to-use automated tools mature, collaboration between chemists and computational scientists will further accelerate drug discovery. Based on the generation process, existing graph-based models can be roughly classified into two types: sequential iterative generation and one-shot generation. More specifically, they can be divided into atom-by-atom and subgraph-based (fragment-based) models. To reduce the number of edge predictions and to cope with the many possible node permutations during training, some models, such as RationaleRL and MolecularRNN, generate graphs in breadth-first-search (BFS) order.
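To illustrate why a BFS ordering reduces the number of edge predictions, here is a minimal sketch (not the RationaleRL or MolecularRNN code) that orders the atoms of a molecular graph by BFS and measures how far back, in generation order, any bond reaches. Under BFS the new node can only bond to nodes in a bounded window of recent nodes, so a decoder needs to score fewer candidate edges. The adjacency-dict input and the helper names `bfs_order` and `edge_window` are illustrative assumptions.

```python
from collections import deque

def bfs_order(adj, start=0):
    """Return node indices in breadth-first order starting from `start`.
    Assumes a connected graph (true for a single molecule)."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in sorted(adj[node]):  # sorted for a deterministic order
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

def edge_window(adj, order):
    """Largest gap, in generation order, between two bonded atoms: the number
    of previous nodes a decoder must consider when predicting a new node's edges."""
    pos = {node: i for i, node in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u in adj for v in adj[u])

# Heavy atoms of toluene: ring atoms 0-5, methyl carbon 6 attached to atom 0.
toluene = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4],
           4: [3, 5], 5: [4, 0], 6: [0]}
order = bfs_order(toluene)
print(order)                              # [0, 1, 5, 6, 2, 4, 3]
print(edge_window(toluene, order))        # 3
print(edge_window(toluene, list(range(7))))  # 6 with the arbitrary input labels
```

On this small example BFS halves the window; on larger molecules the BFS window stays bounded by the frontier size, while an arbitrary ordering can require looking back over the whole graph.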

A pharmacophore-guided deep learning approach for bioactive molecular generation

Design, synthesis, and evaluation of PD-1/PD-L1 small-molecule inhibitors bearing a rigid indane scaffold - ScienceDirect.com


Posted: Sat, 05 Aug 2023 07:00:00 GMT

To assess reconstructability under RNN decoding, the input descriptors were evaluated by trying to retrieve the molecules they represent. Among 10,000 strings generated from seed molecules in the test dataset, after canonicalizing each sampled SMILES string, almost 62.4% had the same canonical form as the molecule behind the seeding ECFP. Deep molecular generative models based on graphs have become a popular trend in graph research, with promising prospects for drug discovery.
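As a rough sketch of how such a reconstruction rate could be scored (an assumption about the bookkeeping, not the authors' evaluation script), the function below counts the fraction of sampled strings whose canonical SMILES matches that of the seed molecule. The canonicalizer is passed in so the logic does not depend on a particular toolkit; `reconstruction_rate` and `canonicalize` are hypothetical names.

```python
from typing import Callable, Iterable, Optional

# With RDKit installed, a canonicalizer could look like:
#   from rdkit import Chem
#   def rdkit_canon(s):
#       mol = Chem.MolFromSmiles(s)          # None for unparsable strings
#       return Chem.MolToSmiles(mol) if mol is not None else None

def reconstruction_rate(seed_smiles: str,
                        samples: Iterable[str],
                        canonicalize: Callable[[str], Optional[str]]) -> float:
    """Fraction of sampled SMILES whose canonical form equals the seed's.
    `canonicalize` must return None for strings that do not parse."""
    target = canonicalize(seed_smiles)
    samples = list(samples)
    if not samples or target is None:
        return 0.0
    hits = sum(1 for s in samples if canonicalize(s) == target)
    return hits / len(samples)

# Toy canonicalizer: a lookup table standing in for a real toolkit.
toy = {"CCO": "CCO", "OCC": "CCO", "C(O)C": "CCO", "CCC": "CCC"}.get
print(reconstruction_rate("CCO", ["OCC", "CCO", "CCC", "C(O)C"], toy))  # 0.75
```

Comparing canonical forms rather than raw strings matters because one molecule has many valid SMILES spellings ("OCC" and "CCO" are both ethanol).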

Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery

The constructed energy-based model takes the generated molecular descriptors f and the molecular property range y as input. It is trained on a set of molecule–property pairs, using samples drawn from a quantum annealer to estimate the gradients required by the parameter update rules. After training, the energy-based model has learned the conditional probability distribution \(p(y|f)\), as shown in Fig. The conditional energy-based model also uses latent variable representations h, which can be regarded as a compressed representation of the chemical space spanned by the molecules and their properties.
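The training loop described above can be sketched with a small conditional restricted-Boltzmann-machine-style model. This is a classical toy under stated assumptions, not the paper's model: the energy form, the binary encoding of y and f, and the use of a short classical Gibbs chain in place of quantum-annealer samples are all illustrative choices. The update follows the usual energy-based recipe: data statistics minus model statistics (contrastive divergence).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ConditionalRBM:
    """Toy conditional energy-based model over binary vectors:
        E(y, h | f) = -y.U.h - f.W.h - a.y - c.h
    f: descriptor bits (conditioning input), y: property-range bits,
    h: binary latent units (the compressed representation)."""

    def __init__(self, n_f, n_y, n_h, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_f, n_h))
        self.U = 0.01 * self.rng.standard_normal((n_y, n_h))
        self.a = np.zeros(n_y)
        self.c = np.zeros(n_h)

    def energy(self, y, h, f):
        return -(y @ self.U @ h) - (f @ self.W @ h) - self.a @ y - self.c @ h

    def p_h(self, y, f):
        # p(h_j = 1 | y, f)
        return sigmoid(y @ self.U + f @ self.W + self.c)

    def p_y(self, h):
        # p(y_i = 1 | h); in this toy energy f reaches y only through h
        return sigmoid(self.U @ h + self.a)

    def negative_sample(self, y, f, steps=1):
        # Stand-in for the quantum annealer: short classical Gibbs chain, f held fixed.
        bern = lambda p: (self.rng.random(p.shape) < p).astype(float)
        h = bern(self.p_h(y, f))
        for _ in range(steps):
            y = bern(self.p_y(h))
            h = bern(self.p_h(y, f))
        return y, h

    def update(self, y, f, lr=0.05):
        # Gradient estimate of log p(y|f): <stats>_data - <stats>_model
        h_pos = self.p_h(y, f)
        y_neg, h_neg = self.negative_sample(y, f)
        self.U += lr * (np.outer(y, h_pos) - np.outer(y_neg, h_neg))
        self.W += lr * (np.outer(f, h_pos) - np.outer(f, h_neg))
        self.a += lr * (y - y_neg)
        self.c += lr * (h_pos - h_neg)
```

In a hardware-assisted setting, only `negative_sample` would change: the model statistics would be estimated from annealer samples instead of a Gibbs chain.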

About Nature Portfolio

The appearance of such techniques opened the door to computer-aided drug design. The major challenge was how to represent and store molecules so that computers can process them accurately while chemists can still read them. A flurry of molecular representations has been designed in recent years, driven by the rapid development of computing [24]. Here we introduce two representations commonly used in de novo molecular design: SMILES strings and graphs. Deep learning, whose prototype was the perceptron, a neural network for pattern recognition [7], aims to learn the latent distribution and representation of data.
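To make the link between the two representations concrete, the sketch below turns a SMILES string into a graph (atom list plus bond list). It is deliberately a restricted parser written for illustration, not a complete SMILES implementation: bracket atoms, charges, stereochemistry and `%nn` ring labels are unsupported, and aromatic bonds are recorded as order 1. Real work should rely on a cheminformatics toolkit such as RDKit.

```python
def smiles_to_graph(smiles):
    """Parse a restricted SMILES subset into (atoms, bonds).
    atoms: element symbols (graph nodes); bonds: (i, j, order) tuples (graph edges).
    Supports organic-subset atoms, branches '(' ')', single-digit ring
    closures, and the bond symbols '-', '=', '#'."""
    atoms, bonds = [], []
    branch_stack, ring_open = [], {}
    prev, order, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in ("Cl", "Br"):      # two-letter atoms first
            sym, i = smiles[i:i + 2], i + 2
        elif ch in "BCNOPSFIbcnops":
            sym, i = ch, i + 1
        else:
            sym = None
        if sym is not None:
            atoms.append(sym)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))  # bond to the previous atom
            prev, order = idx, 1
            continue
        if ch == "(":
            branch_stack.append(prev)             # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()             # return to the branch point
        elif ch in "-=#":
            order = {"-": 1, "=": 2, "#": 3}[ch]
        elif ch.isdigit():
            if ch in ring_open:                   # close the ring
                start, open_order = ring_open.pop(ch)
                bonds.append((start, prev, max(order, open_order)))
            else:                                 # open a ring at the current atom
                ring_open[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character {ch!r}")
        i += 1
    return atoms, bonds

print(smiles_to_graph("CC(=O)O"))  # acetic acid
# (['C', 'C', 'O', 'O'], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
```

The parentheses and ring digits are exactly the parts of SMILES that encode graph structure in a linear string, which is why the same molecule can be written in many ways while its graph is unique up to node ordering.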

A key trick here was to use a Gaussian process to reach points with the target attributes. The contribution can be described as a new method for exploring molecular space that requires no prior knowledge to construct a compound library by hand. At the same time, the model captured the characteristics of the molecules and showed good predictive ability. In our view, not every output of this model can be converted back to the original space, owing to the non-uniqueness of SMILES. One remedy is to give the model explicit constraints on how to produce valid molecules; for instance, GVAE [49] incorporated the grammar production rules of SMILES into the model.
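The grammar idea behind GVAE can be illustrated with a toy decoder: instead of emitting characters, the decoder emits production rules in a leftmost derivation, and at each step it masks every rule whose left-hand side is not the non-terminal currently on top of the stack, so every completed output obeys the grammar. The grammar below is a hypothetical toy, far smaller than the SMILES grammar GVAE actually uses, and the score matrix stands in for the output of a trained decoder.

```python
import numpy as np

# Hypothetical toy grammar (left-hand side, right-hand side).
RULES = [
    ("smiles", ["chain"]),
    ("chain", ["atom"]),
    ("chain", ["atom", "chain"]),
    ("atom", ["C"]),
    ("atom", ["O"]),
    ("atom", ["N"]),
]
NONTERMINALS = {lhs for lhs, _ in RULES}

def decode(logits, start="smiles"):
    """Turn a (steps x n_rules) score matrix into a string by a masked
    leftmost derivation. Returns None if the scores run out first."""
    stack, out, steps = [start], [], iter(logits)
    while stack:
        symbol = stack.pop()
        if symbol not in NONTERMINALS:
            out.append(symbol)                  # terminal: emit it
            continue
        scores = next(steps, None)
        if scores is None:
            return None                         # derivation left unfinished
        mask = np.array([lhs == symbol for lhs, _ in RULES])
        rule = int(np.argmax(np.where(mask, scores, -np.inf)))
        stack.extend(reversed(RULES[rule][1]))  # leftmost symbol ends on top
    return "".join(out)

print(decode(np.zeros((5, 6))))  # C
```

Because invalid rules are masked rather than merely discouraged, any score matrix that is long enough decodes to a grammatical string; the grammar guarantees syntactic validity, although not chemical validity.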

Deep learning workflow for the inverse design of molecules with specific optoelectronic properties

The evolutionary design framework, in which a genetic algorithm (GA) finds a design route toward the target under the guidance of deep learning models, is illustrated in Fig. This approach automatically optimizes the structure of seed molecules through the combined work of an encoding function e(∙), a decoding function d(∙), and a property prediction function f(∙). As shown in Fig. 1a, the encoding function e(∙) transforms the molecular structure m, given as a canonical simplified molecular input line entry system (SMILES)33 string, into the corresponding extended-connectivity fingerprint (ECFP)34 vector x. The decoding function d(∙) then converts the bit-string ECFP vector x back into a SMILES string m so that it can be recognized as a real chemical structure.
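The loop formed by e(∙), d(∙), and f(∙) can be sketched as a mutation-only GA. In the framework described above these would be an ECFP encoder, an RNN decoder, and a deep property predictor; here they are caller-supplied functions, and the bit-flip mutation, the distance-to-target fitness, and all parameter names are illustrative assumptions rather than the authors' settings.

```python
import random
from typing import Callable, List, Optional

def evolve(seeds: List[str],
           encode: Callable[[str], List[int]],            # e(.): SMILES -> fingerprint bits
           decode: Callable[[List[int]], Optional[str]],  # d(.): bits -> SMILES, or None
           predict: Callable[[str], float],               # f(.): SMILES -> property value
           target: float,
           generations: int = 20,
           pop_size: int = 50,
           flip_rate: float = 0.01,
           seed: int = 0) -> str:
    """Mutation-only GA: perturb fingerprints, decode them, and keep the
    molecules whose predicted property is closest to `target`."""
    rng = random.Random(seed)

    def fitness(m: str) -> float:
        return -abs(predict(m) - target)

    population = list(dict.fromkeys(seeds))
    for _ in range(generations):
        children = []
        for parent in population:
            # flip each fingerprint bit independently with probability flip_rate
            bits = [b ^ (rng.random() < flip_rate) for b in encode(parent)]
            child = decode(bits)
            if child is not None:                  # drop undecodable vectors
                children.append(child)
        pool = list(dict.fromkeys(population + children))  # deduplicate, keep order
        population = sorted(pool, key=fitness, reverse=True)[:pop_size]
    return population[0]
```

Because parents compete with their children for the next population (elitism), the best predicted fitness never decreases from one generation to the next. For a quick dry run, bit strings can stand in for molecules, e.g. `encode=lambda m: [int(c) for c in m]`, `decode=lambda b: "".join(map(str, b))`, `predict=lambda m: m.count("1")`.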

Performance of deep neural networks

First, we assess the performance of current state-of-the-art artificial intelligence (AI)-guided molecular design tools, focusing mainly on small molecules for therapeutic design and discovery. We start with an extensive discussion of popular molecular representations, their various formulations, and the data generation tools used in advanced machine learning (ML) and deep learning (DL) models. We also benchmark physics-informed predictive ML models by comparing their predictions of various properties, which are critical for small-molecule design. Finally, we highlight cutting-edge AI tools that use these ML models for inverse design toward desired properties.

Accelerating how new drugs are made with machine learning - University of Cambridge news


Posted: Mon, 15 Jan 2024 08:00:00 GMT

In their models, information propagates back and forth across the molecule in the form of waves, which makes it possible to pass information locally while still traversing the entire molecule in a single pass. Following the unprecedented success of learned molecular representations in predictive modeling, such representations have also been adopted successfully in generative models [57,69]. A comparison figure shows the distributions of the properties of molecules generated with various molecular design frameworks, including the proposed QC-based technique, conditional variational autoencoder (CVAE), masked graph model (MGM), and graph-based genetic algorithm (GBGA), for different property targets for QED (a–e) and LogP (f–j).

Jørgensen et al. [80] further improved the performance of SchNet by making the edge features depend on the atom receiving the message. Moreover, today's high-impact materials come from exploring only a fraction of the known chemical space. Much larger portions of chemical space remain unexplored and are expected to contain exotic materials with the potential to bring unprecedented advances to state-of-the-art technologies. Exploring such a large space with conventional experiments would take considerable time and resources [4,5,6,7]. In this scenario, complete laboratory automation is long overdue, although it has met with only limited success in the past [8,9,10,11,12]. Automating the computational design of small molecules by integrating physics-based simulations and optimization with ML approaches is instead a feasible and efficient alternative that contributes significantly to expediting autonomous molecular design.
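The receiver-aware edge idea can be sketched as one simplified SchNet-style interaction step: distances are expanded in radial basis functions, the edge feature for a message from atom v to atom w is computed from those features together with the receiver's state h_w, and the result acts as a continuous filter on the sender's state. The single linear layer with tanh used for the edge update, the shapes, and all names are assumptions for illustration; the published models use larger learned networks.

```python
import numpy as np

def rbf_expand(d, centers, gamma=10.0):
    """Expand distances into Gaussian radial-basis features."""
    return np.exp(-gamma * (d[..., None] - centers) ** 2)

def interaction(h, pos, W_edge, W_filter, centers):
    """One simplified SchNet-style interaction with receiver-aware edges.
    h: (n, F) atom states, pos: (n, 3) coordinates,
    W_edge: (K + F, K), W_filter: (K, F), centers: (K,)."""
    n, F = h.shape
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)  # d[v, w]
    e = rbf_expand(d, centers)                                       # (n, n, K)
    receiver = np.broadcast_to(h[None, :, :], (n, n, F))             # h_w on edge v -> w
    e = np.tanh(np.concatenate([e, receiver], axis=-1) @ W_edge)     # edge sees the receiver
    filt = e @ W_filter                                              # continuous filter (n, n, F)
    not_self = (1.0 - np.eye(n))[:, :, None]                         # no message from v to itself
    msg = (not_self * h[:, None, :] * filt).sum(axis=0)              # sum over senders v
    return h + msg                                                   # residual update
```

Because the edge feature now depends on h_w, the message from v to w differs from the message from w to v even though the distance is the same; the step remains equivariant to reordering the atoms.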

Deep learning approaches can also be tailored to the conditional generation of molecules that satisfy target property requirements27. Despite their potential, deep neural networks depend on large amounts of diverse training data comprising molecules whose properties span the chemical space, which may cast doubt on their generative abilities in the presence of out-of-domain uncertainty. In this paper, we propose a hybrid quantum-classical computational framework for molecular design that uses QC-based learning and optimization strategies to navigate the chemical space efficiently for targeted molecular generation. We construct an energy-based deep learning model that can be trained with QC-assisted generative training to extract latent representations from molecular graphs for property prediction. We further present a QC-assisted optimization technique that efficiently traverses the chemical space to identify molecules with desired properties by inverting the structure–activity relationship captured by the energy-based model.
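Quantum annealers natively minimize quadratic unconstrained binary optimization (QUBO) energies, so one common way to use them in such a pipeline is to cast the search over binary descriptor bits as a QUBO. The sketch below uses classical simulated annealing as a stand-in for the quantum hardware; how the matrix Q would be derived from the trained energy-based model is not shown and is an assumption here, as are the function names and the cooling schedule.

```python
import math
import random

def qubo_energy(x, Q):
    """E(x) = sum_ij Q[i][j] * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def simulated_annealing(Q, steps=5000, t0=2.0, t1=0.01, seed=0):
    """Classical stand-in for an annealer: single-bit-flip Metropolis moves
    under a geometric cooling schedule. Returns the best state found."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = qubo_energy(x, Q)
    best, best_e = x[:], e
    for k in range(steps):
        t = t0 * (t1 / t0) ** (k / max(1, steps - 1))  # temperature for this step
        i = rng.randrange(n)
        x[i] ^= 1                                     # propose flipping bit i
        e_new = qubo_energy(x, Q)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            e = e_new                                 # accept the move
            if e < best_e:
                best, best_e = x[:], e
        else:
            x[i] ^= 1                                 # reject: undo the flip
    return best, best_e

# Toy QUBO: bits 0 and 2 lower the energy, bit 1 raises it.
print(simulated_annealing([[-1, 0, 0], [0, 2, 0], [0, 0, -3]]))  # ([1, 0, 1], -4)
```

Recomputing the full energy at every step costs O(n^2) and is only suitable for toy sizes; a practical implementation would update the energy change of a single bit flip incrementally.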

VAE models without extra constraints are likely to generate invalid molecules. Language models, in contrast, automatically extract information at both the grammatical and semantic levels. RNNs are connectionist models that capture the dynamics of sequences through recurrent (cyclic) connections among the units in the network. Consequently, they can readily process inputs and outputs that consist of sequences.
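The recurrence that lets an RNN model sequences can be shown with a minimal character-level sampler. The weights below are random and untrained, so the output is mostly invalid SMILES, which mirrors the validity problem noted above until the model is trained on real molecules. The toy vocabulary (with `^` as start token and `$` as end token), the hidden size, and the class name are illustrative assumptions.

```python
import numpy as np

# Toy vocabulary: '^' starts a sequence, '$' ends it.
VOCAB = ["^", "$", "C", "O", "N", "(", ")", "=", "1"]

class CharRNN:
    """Minimal Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
    p(next char) = softmax(W_hy h_t + b_y). Untrained, random weights."""

    def __init__(self, vocab, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.vocab, self.idx = vocab, {c: i for i, c in enumerate(vocab)}
        V = len(vocab)
        self.Wxh = 0.1 * self.rng.standard_normal((hidden, V))
        self.Whh = 0.1 * self.rng.standard_normal((hidden, hidden))
        self.Why = 0.1 * self.rng.standard_normal((V, hidden))
        self.bh, self.by = np.zeros(hidden), np.zeros(V)

    def step(self, char, h):
        x = np.zeros(len(self.vocab))
        x[self.idx[char]] = 1.0                              # one-hot input
        h = np.tanh(self.Wxh @ x + self.Whh @ h + self.bh)   # recurrent connection
        logits = self.Why @ h + self.by
        p = np.exp(logits - logits.max())                    # stable softmax
        return p / p.sum(), h

    def sample(self, max_len=40):
        h, char, out = np.zeros(self.Whh.shape[0]), "^", []
        for _ in range(max_len):
            p, h = self.step(char, h)
            char = self.vocab[self.rng.choice(len(p), p=p)]
            if char == "$":
                break
            out.append(char)
        return "".join(out)

print(CharRNN(VOCAB).sample(max_len=20))  # random, usually invalid, string
```

The hidden state h carries information from all previous characters, which is what allows a trained model to learn long-range SMILES constraints such as matching parentheses and ring-closure digits.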

The research was conducted as part of the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium between MIT and eight pharmaceutical companies, announced in May. The consortium identified lead optimization as one key challenge in drug discovery. Chemist Corwin Hansch, who pioneered the field of relating a molecule’s chemical structure to its biological activity, an approach widely used in developing new drugs and other commercial chemicals, died in Claremont on May 8. In Freedman and colleagues’ qubits, a single chromium ion, an electrically charged atom, sits at the center of the molecule. The qubit’s value is represented by that chromium ion’s electronic spin, a measure of the angular momentum of its electrons. Additional groups of atoms are attached to the chromium; by swapping out some of the atoms in those groups, the researchers can change the qubit’s properties to alter how it functions.

MolGrow generates molecular structures by recursively splitting a single node into two; these splitting steps can be regarded as plug-and-play modules. The model also achieved better performance while learning on a fixed atom ordering. In this review, we mainly focus on deep generative models for molecular generation in drug discovery. We first introduce molecular representation methods and summarize the prevalent databases. For generative models, we emphasize recent advances based on different representations in the de novo molecular design domain. The objective evaluation and comparison of state-of-the-art models should help readers select and improve models.

Powerful deep learning techniques have driven the development of generative models. After training on real data, generative models can produce new synthetic data similar to the given samples. A central question for deep generative models is how to capture the unknown data distribution and reveal its hidden internal structure. One approach is to learn data representations that can be modeled easily [45]. In de novo molecular design, a good representation must also be readily convertible back into molecules.

Owing to its simple syntax, SMILES has proven comparatively easy for deep learning models to learn. Sequence-based methods can be further divided into models based on variational autoencoders (VAEs) [46], generative adversarial networks (GANs) [47], and recurrent neural networks (RNNs) [48]. The holy grail of materials science is de novo molecular design, that is, engineering molecules with desired characteristics. The introduction of generative deep learning has greatly advanced efforts in this direction, yet molecular discovery remains challenging and often inefficient.
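The core of a VAE-based model can be summarized in two formulas: the reparameterization trick, which samples a latent vector z = μ + σ·ε with ε ~ N(0, I) so that sampling stays differentiable, and the KL term that pulls the encoder's Gaussian toward a standard normal prior. The sketch below shows both in NumPy; the decoder that would supply the reconstruction log-likelihood (for example a SMILES character decoder) is not shown, and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); gradients flow through mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)

def negative_elbo(recon_log_likelihood, mu, logvar):
    """Per-example VAE loss: -log p(x|z) + KL term."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, logvar)

mu, logvar = np.zeros((2, 4)), np.zeros((2, 4))
z = reparameterize(mu, logvar, np.random.default_rng(0))  # shape (2, 4)
print(kl_to_standard_normal(np.array([1.0, 0.0]), np.zeros(2)))  # 0.5
```

The KL term is what makes the latent space smooth enough for sampling and optimization, but nothing in it enforces SMILES syntax, which is why unconstrained VAEs frequently decode invalid molecules.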


Tuesday, April 30, 2024