Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
BACKGROUND
As of July 2020, a Web of Science search for “machine learning (ML)” nested within a search for “pharmacokinetics or pharmacodynamics” yielded over 100 publications. The publications were assigned to one of the following areas of the pharmaceutical sciences: PK/PD; dose optimization; quantitative structure–activity relationship (QSAR); adverse drug event (ADE) prediction and drug–drug interactions (DDIs); and clinical trial simulation (CTS). Supervised classification, regression, or a combination of the two were the most frequently used ML methods, and QSAR emerged as the most frequent application area. Table 1 summarizes the pharmaceutical sciences applications and ML algorithms that have been investigated in the literature. As noted previously, several recent reviews have focused on identifying the most promising and appropriate application areas and ML tools. Mak et al. provided an accessible primer on ML concepts that included a case study employing random forest regression (RFR) to investigate the structure–activity relationships of inhibitors of the putrescine transporter of trypanosome parasites. Hutchinson et al. proposed an implementation framework with two hypothetical examples that used deep learning to perform global parameter sensitivity analysis and to combine PK parameters with imaging and omics data. Koch et al. used CART-based ML approaches for covariate selection in the context of simulated data and a clinically relevant example of phototherapy for bilirubinaemia in neonates [81]. In a commentary, Chaturvedula et al. highlighted the use of genetic algorithms (GAs) for model selection in population modelling and of deep learning for target identification in drug repurposing.
OBJECTIVE
Applying deep learning to mine the growing datasets in drug discovery enables us not only to learn from the past but also to predict future drug repurposing, especially in solid dosage formulations. Deep learning was applied to predict successful pharmaceutical formulations by constructing regression models. One of the main difficulties in formulation prediction is the small dataset with an imbalanced input space, a consequence of the limited experimental data. To improve performance, a data splitting algorithm and evaluation criteria suited to pharmaceutical formulation data were introduced. The DNNs were trained on data for two types of pharmaceutical dosage forms: oral fast disintegrating films (OFDF) and oral sustained release matrix tablets (SRMT). Machine learning (ML) is enabling leap-step advances in several fields, including drug discovery and materials science, and has been one of the most active research areas in recent years. Machine learning can make data-driven predictions from existing oral formulation experimental data, which provides a great opportunity for efficient oral formulation development [1–6]. A well-designed machine learning method can greatly speed up development, optimize oral formulations, reduce cost, maintain product consistency, and accumulate and preserve the specific knowledge and expertise of experts in a well-defined domain [1]. Table 1 summarizes recent developments in machine learning applied to oral formulation. An expert system (ES) is an intelligent program with the ability to accumulate and preserve the knowledge and experience of experts in a specific area (e.g., pharmaceutical formulations) [12].
Machine Learning Tools            | Oral Formulation Design (Capsule/Tablet)
Hybrid expert system with ANNs    | Hard gelatin capsule formulations
Expert system (SeDeM Diagram)     | Orally disintegrating tablets
Expert system with ANNs           | Osmotic pump tablets
Ontology-based expert system      | Immediate release tablets
ME_expert 2.0                     | Microemulsions formulation
Fuzzy logic-based expert system   | Freeze-dried formulations
Cubist and Random Forest          | Cyclodextrin formulations
Table 1. Machine Learning Applications in Oral Formulation Design
There are four parenteral routes of administration for injectables: (a) subcutaneous (mostly under the skin), which releases the drug into the patient’s systemic circulation; (b) intramuscular (mostly in a muscle), injected deep into muscle layers with a rich blood supply; (c) intravenous (in a vein), which delivers the dose directly and rapidly into the systemic circulation; and (d) intrathecal (around the spinal cord), injected directly into the vertebral column to bypass the blood–brain barrier. One critical aspect of parenteral formulation is the pH level. Drug formulations should have a target pH as close as possible to physiological pH. Because of potential irritation issues, the acceptable range is pH 2–11 for IV and intramuscular injections and pH 4–9 for subcutaneous injection. Long-acting injectables (LAIs) that are amenable to parenteral administration can confer several advantages over conventional drug formulations, including increased patient compliance and drug bioavailability [14]. Moreover, LAIs can be engineered to provide either local (e.g., Zilretta®) or systemic (e.g., Lupron Depot®) drug exposure over a prolonged period, making them ideal formulation strategies for the treatment of chronic diseases [15]. Several strategies have been investigated to inform decision-making and expedite the drug formulation development process. For instance, mathematical models have been used to describe drug release mechanisms and have greatly enhanced our understanding of them [16]. However, the application of these empirical models is limited to post hoc analysis of the in vitro drug release profiles of LAIs, and they do not offer information on in vitro drug release from LAIs a priori. More recently, molecular dynamics simulations have been investigated [17].
These techniques have been useful in quantifying links between drug release rates and formulation parameters (including particle size and drug loading levels) [18]. While the development of these techniques is an active area of research, molecular scale simulations of entire drug delivery systems are computationally intensive. These approaches can provide useful information on potential LAI systems; however, they cannot currently be used in place of experimental drug release assays [14]. Several studies have also investigated machine learning (ML) approaches [19].
METHODS
Both drug and drug product components were used to represent the properties of the active pharmaceutical ingredient (API) and the inactive ingredients (excipients). Each drug was described by 4 molecular descriptors: molecular weight, logP, pKa, and solubility. The excipient types were encoded as distinct numbers. The excipient performance and process parameters include % purity, % recovery, % in vitro and in vivo release of the drug at different timepoints, and the granulation process, together with the percentages of some excipients.
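The encoding described above can be illustrated with a minimal sketch. It assumes one feature vector per drug product composition; the function name `encode_formulation`, the `EXCIPIENT_CODES` mapping, and all numeric values are hypothetical and are not taken from the study.

```python
# Hypothetical excipient vocabulary. The source states only that excipient
# types were encoded as numbers, not which numbers were used.
EXCIPIENT_CODES = {"sucrose": 0, "mannitol": 1, "polysorbate 80": 2}


def encode_formulation(api, excipient, excipient_pct):
    """Return one flat feature vector for a single drug product composition."""
    return [
        api["molecular_weight"],             # molecular descriptor 1
        api["logP"],                         # molecular descriptor 2
        api["pKa"],                          # molecular descriptor 3
        api["solubility"],                   # molecular descriptor 4
        float(EXCIPIENT_CODES[excipient]),   # excipient type as a number
        excipient_pct,                       # percentage of that excipient
    ]


# Illustrative values only, not data from the study.
example_api = {"molecular_weight": 500.0, "logP": 2.1, "pKa": 7.4, "solubility": 0.8}
features = encode_formulation(example_api, "mannitol", 5.0)
```

Integer codes impose an arbitrary ordering on excipient types. That is harmless for tree-based models but may mislead linear or distance-based ones, where one-hot encoding is a common alternative.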
Data splitting strategy and hyperparameters of machine learning methods
A three-dataset (training/validation/test) splitting strategy was used. The training set is used to train the models, and the validation set is used to tune hyperparameters and select the best model. The accuracy on the test set indicates the model's predictive ability on unseen data. This strategy is widely adopted in machine learning. For each dosage form, the pharmaceutical data were split into three subsets: the validation set and the test set each include 20 formulations, and the rest of the data were used to train the models.
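As a rough illustration of the three-way split, the sketch below holds out 20 formulations each for validation and testing and trains on the remainder. It uses a plain seeded random shuffle, which is a stand-in and not the data splitting algorithm the study introduced for imbalanced data. The function name `three_way_split` and the 109 placeholder IDs are assumptions.

```python
import random


def three_way_split(formulations, n_val=20, n_test=20, seed=0):
    """Shuffle once, then carve out fixed-size test and validation sets."""
    items = list(formulations)
    if len(items) <= n_val + n_test:
        raise ValueError("not enough formulations to leave a training set")
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    test_set = items[:n_test]
    val_set = items[n_test:n_test + n_val]
    train_set = items[n_test + n_val:]
    return train_set, val_set, test_set


# 109 placeholder formulation IDs, purely for illustration.
train_set, val_set, test_set = three_way_split(range(109))
```

With a small, imbalanced dataset, a purely random split can leave regions of the input space unrepresented in the held-out sets, which is one reason a dedicated splitting method may be preferable.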
RESULTS
There are many reasons to classify drugs, ranging from understanding the usefulness of different types of drugs to formulating treatment plans based on properties such as solubility, hydrophobicity (logP), and pKa for biological drugs. The quality of each drug will be characterized by several screening methods, as shown in Figures 4-6.
Deep learning, used here for regression, is a type of representation learning with multiple levels of transformation modules; it contains more parameters than other learning algorithms and requires more data for training. However, one of the main difficulties in pharmaceutical drug product prediction is the small dataset with an imbalanced input space, a consequence of the limited experimental data. The subcutaneous injection dataset contains 3 APIs, each with only around 109 drug product compositions. Selecting representative datasets for training and testing is therefore very important for drug product composition prediction. In our research, specific evaluation criteria were introduced, and several data splitting methods were investigated. Moreover, deep learning was compared with other machine learning techniques for drug product composition prediction (Zhang W et al., Sedan G et al., Ouyang D et al.) [5-7]. The drug products are injectable formulations made up of different excipients. The experimental data were extracted from our published data in patents and from Web of Science. Three search terms (solubility in buffer, solvents, and molecular weight) were used to search the literature on the development of injectable formulations.
Molecular descriptors were used to represent the properties of the APIs. Each drug was described by three molecular descriptors: molecular weight, logP, and pKa. The excipient types were encoded as distinct numbers. The process parameters include the lyophilization process and other excipient characteristics, together with the compositions.
A three-dataset (training/validation/test) splitting strategy was used. The training set is used to train the models, and the validation set is used to tune hyperparameters and select the best model. The accuracy on the test set indicates the model's predictive ability on unseen data. This strategy is widely adopted in machine learning. For each dosage form, the pharmaceutical data were split into three subsets: the validation set and the test set each include 109 formulations, and the rest of the data were used to train the models (Rowe R.C. et al., Zhang ZH et al., Mendyk A et al.) [6-8].
Three machine learning methods were introduced to construct regression models for comparison with the DNNs: multiple linear regression (MLR), random forest (RF), and k-nearest neighbors (k-NN). These regression models were trained using the scikit-learn package [9]. For excipients, the number of components was set to a maximum of 3. In RF, the maximum depth of the tree was set to 3. In k-NN, the number of neighbors was set to 5. The 3 models were developed using the same hyperparameters. In RF, the maximum depth of the tree was set to 5. In k-NN, the number of neighbors was set to 3.
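To make the role of the k-NN "number of neighbors" hyperparameter concrete, here is a dependency-free sketch of k-NN regression. The study itself used scikit-learn, so this is an illustration and not the authors' code. The function name `knn_predict` and the toy data are assumptions.

```python
import math


def knn_predict(train_X, train_y, query, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each Euclidean distance with its target, then sort by distance.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Toy one-feature data, for illustration only.
train_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
train_y = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
pred_k3 = knn_predict(train_X, train_y, [0.0], k=3)  # averages 0, 10, 20
pred_k5 = knn_predict(train_X, train_y, [0.0], k=5)  # averages 0, 10, 20, 30, 40
```

A larger k averages over more distant formulations, which smooths the prediction and can bias it at the edge of the data, as the two calls above show.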
In machine learning, the correlation coefficient and the coefficient of determination are commonly adopted as evaluation metrics for regression problems. The correlation coefficient indicates the strength of the linear relationship between two variables. The coefficient of determination indicates how much of the variance in the real values is explained by the predicted values. However, neither metric can properly evaluate the performance of pharmaceutical formulation prediction models. In pharmaceutics, good models for predicting drug dissolution profiles should have less than 10% error [40]. Thus, specific criteria suited to pharmaceutics should be introduced to evaluate model performance.
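The limitation of correlation-based metrics can be seen in a small sketch. It assumes that release is expressed in percent and that the 10% criterion is read as an absolute difference in percentage points; both are assumptions for illustration, as are the function names and the toy profiles.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mo = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def max_abs_error(observed, predicted):
    """Largest absolute difference between observed and predicted values."""
    return max(abs(o - p) for o, p in zip(observed, predicted))


# Toy profile: every prediction is 15 percentage points too low.
observed = [20.0, 35.0, 50.0, 65.0, 80.0]
predicted = [5.0, 20.0, 35.0, 50.0, 65.0]
r = pearson_r(observed, predicted)
r2 = r_squared(observed, predicted)
worst = max_abs_error(observed, predicted)
```

In this toy case the correlation coefficient is perfect even though every predicted point misses by 15 percentage points, so a high correlation alone does not guarantee an acceptable dissolution prediction.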
Following the recommendation of the U.S. Food and Drug Administration (FDA) to use the similarity factor f2 to evaluate the similarity of drug dissolution profiles [40], f2 was introduced to evaluate the performance of the models in predicting the cumulative drug release curves. If f2 is greater than or equal to 50, the prediction is considered successful (Han X et al.) [10].
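A minimal sketch of the similarity factor, using the standard definition f2 = 50 · log10(100 / sqrt(1 + mean squared difference)) with release values in percent. The function name `similarity_factor_f2` and the toy profiles are assumptions; how the study selected timepoints is not stated in the text.

```python
import math


def similarity_factor_f2(reference, predicted):
    """Similarity factor f2 for two release profiles given in percent."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    mean_sq_diff = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


# Toy cumulative release profiles (percent released), for illustration only.
reference = [20.0, 40.0, 60.0, 80.0]
f2_identical = similarity_factor_f2(reference, reference)
f2_off_by_5 = similarity_factor_f2(reference, [r + 5.0 for r in reference])
f2_off_by_10 = similarity_factor_f2(reference, [r + 10.0 for r in reference])
```

Identical profiles give f2 = 100, while a uniform 10-point difference falls just below 50. This is why the f2 ≥ 50 threshold corresponds roughly to an average difference of no more than 10%.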
The results are summarized in Figures 7-12. Multiple linear regression was employed in an attempt to learn a linear combination of input features that could predict the output. Multiple linear regression is simple and easy to model. Its weights and biases can be calculated directly using the least squares method. Multiple linear regression has better interpretability than nonlinear machine learning models because the weights can indicate the importance of the input features in the prediction. However, multiple linear regression and partial least squares regression can only fit linear mappings, whereas the relationship between the formulation and the key in vitro characteristics is complex and non-linear.
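The least squares calculation mentioned above can be written in closed form. The notation is an assumption introduced here for illustration: X is the n × (p+1) matrix of input features with a leading column of ones for the bias, y is the vector of measured outputs, and X^T X is assumed to be invertible.

```latex
\hat{\beta}
  = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
  = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{y} = X\hat{\beta}
```

Each component of \hat{\beta} is the weight of one input feature. This is the basis of the interpretability noted above, provided the features are on comparable scales.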
For comparison, we also trained and evaluated a series of ML models without any initial drug release measurements included as input features (i.e., without the features T = 0.1, 0.2, 0.3, 0.4, 0.5 and T = 0.6, 0.7, 0.8, 0.9, 1.0; see the correlation factors in Figure 9). Models that include initial experimental measurements as input are referred to as few-shot models, and models without such inputs are referred to as zero-shot models. Few-shot models require the first few experimental points to be measured before predictions can be made; however, the result is often a more accurate model. In this study, the addition of initial drug release measurements (i.e., few-shot models) resulted in better performance than the corresponding models without these features.
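The difference between the two model types lies only in the input features, as the sketch below illustrates. The choice of which timepoints count as "initial" (`EARLY_TIMEPOINTS`), the function name `build_features`, and the values are hypothetical; the text does not specify them.

```python
# Hypothetical choice of "initial" timepoints fed to few-shot models.
EARLY_TIMEPOINTS = (0.1, 0.2, 0.3)


def build_features(formulation_features, release_by_time, few_shot):
    """Return model inputs, appending early release measurements if few-shot."""
    features = list(formulation_features)
    if few_shot:
        # Few-shot: the first measured release values become input features.
        features.extend(release_by_time[t] for t in EARLY_TIMEPOINTS)
    return features


# Illustrative values only, not data from the study.
formulation_features = [500.0, 2.1, 7.4]
release_by_time = {0.1: 8.0, 0.2: 15.0, 0.3: 22.0, 0.4: 30.0}
zero_shot_inputs = build_features(formulation_features, release_by_time, few_shot=False)
few_shot_inputs = build_features(formulation_features, release_by_time, few_shot=True)
```

Any timepoint used as an input should be excluded from the prediction targets; otherwise the evaluation would reward the model for echoing its own inputs.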
Random forest is an ensemble learning method. By combining multiple base learners, ensemble methods can generalize better than a single base learner. Random forest often performs better than other ensemble learning models across many learning tasks. In random forest, the diversity of the base learners comes not only from perturbing the samples but also from perturbing the attributes, which increases the differences between base learners and further improves generalization. A support vector machine maps the samples from the original space to a higher dimensional feature space, where they can be separated. However, conventional machine learning methods rely heavily on feature extractors designed from subjective expert experience.
Furthermore, the representational capacity of artificial neural networks increases with the number of hidden layers and hidden nodes. The larger the model capacity, the more complex the function the model can represent. Deep learning, with more hidden layers, can therefore perform multiple levels of abstraction and feature extraction, enabling it to accomplish more complex tasks with higher accuracy than shallow artificial neural networks.
In summary, cutting-edge ML technologies are now freely available to pharmaceutical and materials scientists. The results obtained in this study demonstrate the potential for ML to expedite the development of innovative drug-delivery technologies. Among the strengths of modern ML techniques is their ability to provide insights into how models reach their predictions. Herein we demonstrate not only that ML models can be used to predict in vitro drug release from different excipients with a high degree of accuracy, but also that interpretation of such models can be used to guide the design of new formulation candidates. In the current study, we found that for this dataset, the DNN and tree-based RF models provided the most accurate prediction of fractional drug release. Given the small size of the dataset (<2000 observations) and that most of the data points contain variables that are properties of the drug or excipients, it is perhaps not surprising that the neural network models investigated perform well. As the use of ML in drug formulation development increases, we anticipate that larger datasets will become available, leading to an increase in the utility of neural networks.
CONCLUSIONS
In this paper, deep learning models were successfully developed on a small dataset to predict drug product composition for 3 different injectable drugs, each with more than 100 excipient compositions. The good generalization performance of the models was demonstrated on external datasets. The proposed models predicted the key characteristics in regression problems more effectively than models trained by other machine learning methods, because deep learning can uncover complex correlations in the data. Successful modern pharmaceutical development needs to incorporate quality by design (QbD) concepts throughout the drug development process. Machine learning methods can not only help to predict in vivo and in vitro characteristics from drug product compositions and process data, but also assist in pharmaceutical experimental design and help to control product quality throughout the product life cycle. Deep learning shows great potential for the implementation of QbD. We expect deep learning to significantly shorten the drug product development timeline and decrease material usage. Furthermore, the cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies.
CLINICALTRIAL
N/A
Title: Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
Description:
BACKGROUND
As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100 publications.
The publications were categorized to one of the following areas of the pharmaceutical sciences: PK/PD, dose optimization, quantitative structure–activity relationship (QSAR); adverse drug event (ADE) prediction and drug–drug interactions(DDIs) and clinical trial simulation (CTS).
Supervised classification, regression, or a combination of the two approaches are the most frequently used ML methods and QSAR emerged as the most frequent application area.
Table 1 summarizes the pharmaceutical sciences applications and ML algorithms that have been investigated in the literature.
As noted, previously several recent reviews have focused on identifying the most promising and appropriate application areas and ML tools.
Mak et al.
provided an accessible primer on ML concepts that included a case study that employed RFR for investigating the structure–activity relationships of inhibitors of the putrescine transporter of trypanosome parasites.
Hutchinson et al.
proposed an implementation framework with two hypothetical examples that utilized deep learning to perform global parameter sensitivity analysis and for combining PK parameters with imaging and omics data.
Koch et al.
used CART-based ML approaches for covariate selection in the context of a simulated data and clinically relevant example of phototherapy for bilirubinaemia in neonates.
81 In a commentary, Chaturvedula et al.
highlighted the use of genetic algorithms (GAs) for model selection in population modelling and deep learning for target identification in drug repurposing.
OBJECTIVE
Applying deep learning to mine the increasing datasets in drug discovery not only enables us to learn from the past but to predict future drug repurposing specially in solid dosage formulations.
Deep learning was applied to predict successful pharmaceutical formulations by constructing regression models.
One of the main difficulties in formulation prediction is the small dataset with imbalanced input space due to the limited experimental data.
For better performance, the data splitting algorithm and the evaluation criteria suitable for pharmaceutical formulation data were introduced.
The DNNs were trained on the data of two types of pharmaceutical dosage forms, including oral fast disintegrating films (OFDF) and oral sustained release matrix tablets (SRMT).
Machine learning (ML) is enabling leap-step advances in several fields including drug discovery and materials science.
Machine learning is one of the most exciting research areas in recent years.
Machine learning can make data-driven predictions with existed oral formulation experimental data, which provides a great opportunity for efficient oral formulation development1–6.
A well-designed machine learning method can greatly speed the development, optimize oral formulations, save the cost, keep products consistency, and accumulate and preserve the specific knowledge and expertise from the experts in a well-defined domain1.
Table 1 summarizes recent development of machine learning in oral formulation application.
An expert system (ES) is an intelligent program with the ability to accumulate and preserve the knowledge and experiences of the experts in a specific area (e.
g.
, pharmaceutical formulations)12.
Machine Learning Tools Oral Formulation Design (Capsule/Tablet)
Hybrid expert system with ANNs Hard gelatin capsule formulations
Expert system (SeDeM Diagram) Orally disintegrating tablets
Expert system with ANNs Osmotic pump tablets
Ontology-based expert system Immediate release tablets
ME_expert 2.
0 Microemulsions formulation
Fuzzy logic-based expert system Freeze-dried formulations
Cubist and Random Forest Cyclodextrin formulations
Table 1.
Machine Learning Applications in Oral Formulation Design
Parenteral routes of administration are four routes of injectables, namely a.
subcutaneous (mostly under the skin) to release into patient’s systemic circulation, b.
intramuscular (mostly in a muscle) injected deep into muscle layers of rich blood supply, c.
intravenous (in a vein) dose directly, rapidly into systemic circulation and d.
intrathecal (around spinal cord), directly into vertebral column to bypass blood brain barrier.
One critical aspect of parenteral formulation is the pH level.
Drug formulations should have a target pH as close as possible to physiological pH.
The acceptable range is pH 2–11 for IV and intramuscular injections, and pH 4-9 for subcutaneous injection due to potential irritation issues.
Recently, long-acting injectable (LAIs) that are amenable to parenteral administration can confer several advantages over conventional drug formulations, including increased patient compliance and bioavailability of drug14.
Moreover, LAIs can be engineered to provide either local (e.
g.
, Zilretta®) or systemic (e.
g.
, Lupron Depot®) drug exposure over a prolonged period, making them ideal formulation strategies for the treatment of chronic diseases15.
Several strategies have been investigated to inform decision-making and expedite the drug formulation development process.
For instance, mathematical models have been used to describe and greatly enhance our understanding of drug release mechanisms16.
However, the application of these empirical models is limited to post hoc analysis of the in vitro drug release profiles of LAIs,and they do not offer information on in vitro drug release from LAIs a priori.
More recently, molecular dynamics simulations have been investigated17.
These techniques have been useful in quantifying links between drug release rates and formulation parameters (including particle size and drug loading levels)18.
While the development of these techniques is an active area of research, molecular scale simulations of entire drug delivery systems are computationally intensive.
These approaches can be used to confer useful information on potential LAI systems; however, they cannot currently be used in place of experimental drug release assays14.
Several studies have also investigated machine learning (ML) approaches19.
METHODS
Both drug and drug product components were used for representing the properties of active pharmaceutical ingredient (API) and inactive ingredients (excipient).
All drugs׳ name was described with the 4 molecular descriptors, including molecular weight, logP, pKa, and solubility.
The excipient types were encoded to different numbers.
The excipient performance and process parameters include %purity, %recovery, %in vitro and in vivo release of drugs at different timepoints and granulation process, with percentage of some excipients.
Data splitting strategy and hyper parameters of machine learning methods
A three-dataset (training/validation/test datasets) splitting strategy was used.
The training set is for training models, and the validation set is for tuning hyper-parameters to find the best model.
The accuracy of the test set shows the prediction ability on unknown data.
This strategy is widely adopted in machine learning.
For each dosage form, the pharmaceutical data were split into three subsets, both the validation set and the test set include 20 formulations, the rest of the data were used to train the models.
RESULTS
There are many reasons to classify drugs, ranging from understanding the usefulness of types of drugs to formulating treatment plans based on properties such as solubilty, hydrophobicity or log P and pKa for biological drugs.
The quality of each drug will be characterized by several screening methods as shown below in Figure 4-6.
Regression and Deep learning are the type of representation learning with multiple levels of transform modules, which contains more parameters than other learning algorithms and requires more data for training.
However, one of the main difficulties in pharmaceutical drug product prediction is the small dataset with imbalanced input space due to the limited experimental data.
Each drug has only around 109 drug product compositions.
There are 3 APIs in the subcutaneous injection dataset.
And all he APIs include with 109 drug product compositions.
Therefore, selecting representative datasets for training and test is very important for the drug product composition prediction.
In our research, the specific evaluation criteria were introduced, and several data splitting methods were investigated.
Moreover, deep learning was compared with other machine learning techniques for the drug product composition prediction( Zhang W et.
al.
, Sedan G et.
al.
,Ouyang D et.
al) [5-7].
Drug products are made up of different excipients involving injectable formulations.
The experimental data were extracted from our published data in the patents and Web of Science.
Three different searching terms of solubility in buffer, solvents and molecular weight were used for search literature about the development of Injectable formulations.
The molecular descriptors were used for representing the properties of APIs.
All drugs' name were described with the three molecular descriptors, including molecular weight, logP, and pKa.
The excipient types were encoded to different numbers.
The process parameters include lyophilization process and other excipient characteristics with compositions.
A three-dataset (training/validation/test datasets) splitting strategy was used.
The training set is for training models, and the validation set is for tuning hyper-parameters to find the best model.
The accuracy of the test set shows the prediction ability on unknown data.
This strategy is widely adopted in machine learning.
For each dosage form, the pharmaceutical data were split into three subsets, both the validation set and the test set include 109 formulations, the rest of the data were used to train the models (Rowe R.
C,et.
al.
, Zhang ZH, et.
al.
, Mendyk A et.
al.
) [6-8]
Three machine learning methods were introduced to construct regression models to compare with DNNs, including multiple linear regression (MLR),, random forest (RF) and k-nearest neighbors (k-NN).
These regression models were trained using the scikit-learn package [9].
For excipients, the number of components was set to maximum of 3.
In RF, the maximum depth of the tree was set to 3.
In k-NN, the number of neighbors was set to 5.
The 3 models were developed using the same hyperparameters.
In RF, the maximum depth of the tree was set to 5.
In k-NN, the number of neighbors was set to 3.
In machine learning, correlation coefficient and coefficient of determination are usually adopted as evaluation metrics for regression problems.
Correlation coefficient indicates the linear relationship between two variables.
The coefficient of determination shows the correlation between the predicted values and the real values.
However, the correlation coefficient and the coefficient of determination cannot properly evaluate the performance of the pharmaceutical formulation prediction models.
In pharmaceutics the good models for predicting drug dissolution profiles should have less than 10% error40.
Thus, specific criteria suitable for pharmaceutics should be introduced to evaluate the model performance.
Following the FDA (the U.
S.
Food and Drug Administration) recommendation using the similarity factor f2 to evaluate the similarity of drug dissolution profiles40, the similarity factor f2 was introduced to evaluate the performance of the models for predicting the cumulative drug release curves.
If the f2 is greater than or equal to 50, it is considered a successful prediction (Han X et.
al.
) [10]
The results are summarized as below in Figure 7-12.
Multiple linear regression approaches were employed in an attempt to learn a linear combination of input features which could predict the output.
Multiple linear regression is simple and easy to model.
The weights and biases of multiple linear regression could be directly calculated by using the least squares method.
Multiple linear regression have better interpretability than other nonlinear machine learning models because the weights could indicate the importance of the input features in the prediction.
However, multiple linear regression and partial least squared regression could only fit the linear function mapping, while obviously the relationship between the formulation and the key in vitro characteristics is complex and non-linear.
For comparison, we also trained and evaluated a series of ML models without any initial drug release measurements included as input features (i.
e.
, without the features T= 0.
1, 0.
2,0.
3,0.
4,0.
5, and T= 0.
6,0.
7,0.
8,0.
9 and 1.
0 (as shown correlation factor in Figure 9.
).
Models that include initial experimental measurements as input are referred to as few-shot models, and models without such inputs are referred to as zero-shot models.
Few-shot models necessitate the measurement of the first few experimental points before the predictions can be performed; however, the result is often a more accurate model.
In this study, it was found that the addition of initial drug release measurements (i.
e.
, few-shot models) resulted in better performance than the corresponding models without these features.
Random forest is an ensemble learning method.
Ensemble learning methods combining multiple base learners could obtain better generalization ability of models than a single base learner.
Random forest usually shows better performance than other ensemble learning models in many learning tasks.
In random forest, the diversity of the base learners is not only from the sample disturbance but also from the attribute disturbance, which makes the difference between the base learners increase and the generalization ability be further improved.
Support vector machine maps the sample from the original space to the higher dimensional feature space, therefore, the sample can be divided in the higher dimensional feature space.
However, the conventional machine learning methods highly rely on the feature extractors designed by the subjective expert experiences.
Furthermore, the representative abilities of artificial neural networks enhance with the increase of the hidden layers and hidden nodes.
The larger the model capacity, the more complex function the model can achieve.
Therefore, deep learning containing more hidden layers could make multiple abstractions and feature extractions, making deep learning be able to accomplish more complex tasks to higher accuracy than the shallow artificial neural networks.
In summary, cutting-edge ML technologies are now freely available to pharmaceutical and materialsscientists.
The results obtained in this study demonstrate the potential for ML to expedite the development of innovative drug-delivery technologies.
Among the strengths of modern ML techniques is their ability to provide insights into how models reach their predictions.
Herein we demonstrate that ML models can not only be used to predict in vitro drug release from different excipients with a high degree of accuracy but also that interpretation of such models can be used to guide the design of new formulation candidates.
In the current study, we found that for this dataset, the DNN and the tree-based RF models provided the most accurate prediction of fractional drug release.
Given the small size of the dataset (<2000 observations) and that most of the data points contain variables that are properties of the drug or excipients, it is perhaps not surprising that the neural network models investigated performed well.
As the use of ML in drug formulation development increases, we anticipate that larger datasets will become available, leading to an increase in the utility of neural networks.
CONCLUSIONS
In this paper, deep learning models were successfully developed to predict drug product characteristics for three injectable drugs, covering more than 100 excipient compositions, from a small dataset.
The good generalization performance of the models was demonstrated on external datasets.
The proposed models could predict the key characteristics in regression problems more effectively than models trained with other machine learning methods, because deep learning can capture complex correlations in the data.
Successful modern pharmaceutical development needs to incorporate quality by design (QbD) concepts throughout the drug development process.
Machine learning methods could not only help to predict in vivo and in vitro characteristics from drug product composition and process data, but also assist in pharmaceutical experimental design and help to control product quality throughout the product life cycle.
Deep learning shows great potential in the implementation of QbD.
We expect deep learning to significantly shorten the drug product development timeline and decrease the material usage.
Furthermore, the cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies.
CLINICALTRIAL
N/A.