Recent findings in the identification of potential drug target sites for HIV include HIV-1 and HIV-2 structural and regulatory proteins, HIV miRNA/RNAi-based drug identification (with siRNAs found to be potential therapeutic agents), subcellular localization-based drug identification, membrane protein-based drug identification and many more, all identified and classified using bioinformatics and machine learning approaches. Discovering potential target sites requires a complete assessment of experimental, mechanistic and pharmacological studies; beyond theoretical work, molecular druggability assessment is also important, as is the suitability of the target for the disease in question, such as HIV/AIDS. This article summarizes a set of potential drug target sites that have been identified with high accuracy by machine learning techniques.
Keywords: Therapeutics, target, Disease, HIV/AIDS, Machine learning
INTRODUCTION:
The human immunodeficiency virus (HIV) is a lentivirus (a subgroup of retrovirus) that causes
HIV infection and, over time, acquired immunodeficiency syndrome (AIDS). HIV/AIDS is the second
most common infectious cause of death globally [1,2]. Since the first reported cases of AIDS and the
discovery of HIV as its cause in the early 1980s, researchers and clinicians have struggled to
develop and administer effective therapeutics to combat HIV/AIDS, and the production of a
viable vaccine remains an unrealized goal [3]. The difficulty of developing effective HIV
therapeutics and vaccines is due largely to the extraordinary mutation rate of HIV, which
enables the virus to rapidly evade the selective pressures imposed by antiretroviral medications
and potential vaccines by generating a large and genetically diverse population through
mutagenesis [4]. A comprehensive knowledge of the genetic diversity characteristic of HIV
populations in infected individuals - what have been termed viral quasispecies - is therefore
essential for the discovery and delivery of effective HIV medications and vaccines. Both HIV-1
and HIV-2 are believed to have originated in non-human primates in West-central Africa, and
are believed to have transferred to humans (a process known as zoonosis) in the early 20th
century [5,6]. HIV-1 is thought to have jumped the species barrier on at least three separate
occasions, giving rise to the three groups of the virus, M, N, and O. The RNA genome consists of
at least seven structural landmarks (LTR, TAR, RRE, PE, SLIP, CRS, and INS) and nine genes (gag,
pol, env, tat, rev, nef, vif, vpr, and vpu, and sometimes a tenth, tev, which is a fusion of tat, env
and rev), encoding 19 proteins. Three of these genes, gag, pol, and env, contain information
needed to make the structural proteins for new virus particles [7]. For example, env codes for a
protein called gp160 that is cut in two by a cellular protease to form gp120 and gp41. The six
remaining genes, tat, rev, nef, vif, vpr, and vpu (or vpx in the case of HIV-2), are regulatory
genes for proteins that control the ability of HIV to infect cells, produce new copies of virus
(replicate), or cause disease [8]. The complete genomic structure is …
Using bioinformatics approaches and machine learning techniques, the target sites for HIV are identified and classified. Although the vast majority of targets currently addressed in drug discovery are proteins, nucleic acids could gain increasing importance as drug targets in the near future [9,10]. The overall families of drug targets have recently been analysed using the DrugBank database [11], one of the most important sources of information on drugs and drug targets. Cutting-edge chemical (chemoinformatics) approaches have identified novel mechanisms of drug-molecule interaction and have assessed the suitability of drugs in the treatment process. A druggable target is a protein, peptide or nucleic acid whose activity can be modulated by a drug. The drug may be a small-molecular-weight compound acting on, e.g., enzymes, receptors, protein-protein interfaces or nucleic acids such as RNA, or a biologic acting on, e.g., extracellular proteins or cell-surface receptors. A good target has the following properties:
Properties of an ideal drug target:
(i) The target is disease-modifying and/or has a proven function in the pathophysiology of a disease.
(ii) Modulation of the target is less important under physiological conditions or in other diseases.
(iii) If druggability is not obvious (as it is, e.g., for kinases), a 3D structure of the target protein or a close homolog should be available for a druggability assessment.
(iv) The target has a favourable 'assayability', enabling high-throughput screening.
(v) Target expression is not uniformly distributed throughout the body.
(vi) A target- or disease-specific biomarker exists to monitor therapeutic efficacy.
(vii) Potential side effects are favourably predicted from phenotype data (e.g. in knockout mice or genetic mutation databases).
(viii) The target has a favourable intellectual property (IP) situation, i.e. no competitors on the target and freedom to operate [12].
Machine learning and computational approaches to drug trials for any disease help to develop a model for wet-lab work at low cost and with high efficiency. The scheme is shown in Figure 2.
1. Understanding of disease; identification of molecular drug target
2. Characterization of molecular drug target; interaction of drug with target site
3. Clinical data, if existing
4. Expression at molecular level; functional pathway analysis
5. Utilize data phenotypically
6. Target modulation
7. Control mechanism with suitable model development by machine learning techniques
8. Host-disease interaction with respect to target identification
9. Effect on biomarkers
10. In-silico drug trial on model
11. If the above steps are successful, apply to animal model for clinical trials
Figure 2: Schematic representation of drug finding opportunities to clinical drug trial.
The above scheme presents the steps followed during in-silico model development for a drug trial. If the drug is not successful in the animal model, the process returns to the characterization of the molecular drug target, and the remaining steps are repeated accordingly. This provides a low-cost, highly efficient approach to clinical drug trials.
In this review, potential drug target sites for HIV are summarized to support the future identification and characterization of HIV drug delivery. They are discussed as follows.
[A] HIV-1 and HIV-2 structural and functional proteins: The structural and functional proteins of HIV have also been found to be potential drug target sites on the basis of their amino acid composition [12]. Knowledge of protein structure plays a crucial role in the analysis of protein function, simulation of protein-ligand interaction, rational drug discovery and many other applications [49]. HIV-1 leads to faster disease progression than HIV-2. HIV-1 and HIV-2 have been classified with support vector machines (SVMs) using the amino acid composition of their structural and regulatory proteins [13]. According to that study, the amino acid compositions of the structural proteins of HIV-1 and HIV-2 are similar; they differ only in the regulatory proteins Vpu and Vpx, which are unique to HIV-1 and HIV-2, respectively. This review therefore emphasizes Vpu and Vpx, the proteins that distinguish the two viruses, as major potential drug-targeting proteins for HIV.
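The SVM classification in [13] works on amino acid composition (AAC): each protein is reduced to the percentage of each of the 20 standard amino acids. A minimal Python sketch of that feature extraction follows, assuming percentages computed over standard residues only (non-standard letters such as X are skipped); the exact normalization used in [13] is not stated here, and the function name and the toy fragment are illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues in a fixed order

def amino_acid_composition(sequence: str) -> list[float]:
    """Return the percentage of each standard amino acid in `sequence`.

    Letters outside the 20 standard residues (e.g. X, B, Z) are skipped,
    so the percentages are relative to standard residues only.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]

# Usage on a hypothetical 10-residue fragment (not a real HIV sequence)
features = amino_acid_composition("MQPIVAIVAL")
print(dict(zip(AMINO_ACIDS, features)))
```

Each protein then becomes a fixed-length 20-value vector, which is what allows an SVM to compare proteins of different lengths.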
Viral protein U (Vpu) is a lentiviral viroporin encoded by human immunodeficiency virus type 1 (HIV-1) and some simian immunodeficiency virus (SIV) strains. This small protein of 81 amino acids contains a single transmembrane domain that allows for supramolecular organization via homo-oligomerization or interaction with other proteins. The topology and trafficking of Vpu through subcellular compartments result in pleiotropic effects in host cells. Notwithstanding the high variability of its amino acid sequence, the functionality of Vpu is well conserved in pandemic virus isolates. The regulation of cellular physiology by Vpu and the validity of this viroporin as a therapeutic target have also been discussed. It is possible that HIV-1 regulatory proteins produced from multiply spliced transcripts, as a result of basal transcription in latently infected cells, might alter several pathways to enhance the homing, spreading and survival of infected lymphocytes, thus contributing to the establishment and maintenance of viral latency.
HIV-1 infection ensures a balance between cell survival and apoptosis. HIV-1-induced apoptosis plays an important role in the pathogenesis of AIDS, and several viral proteins contribute to its induction, including Vpr, Vpu and Tat [19-22]. Although a growing body of evidence suggests that the HIV-1 accessory proteins Nef and Vpr could be involved in the depletion of CD4+ and non-CD4+ cells and in tissue atrophy, they have also been implicated in delaying the death of HIV-1-infected cells [23]. These apparently contradictory observations can be explained by the fact that cell depletion is likely to be predominantly a bystander effect caused by extracellular or cell-surface-associated components of HIV-infected cells [24,25].
[B] miRNA/RNAi based drug identification: MicroRNAs (miRNAs) are small RNAs of 21–25 nucleotides that specifically regulate cellular gene expression at the post-transcriptional level. miRNAs are derived from imperfect stem-loop structures of ~70 nucleotides through maturation by cellular RNase III enzymes. Correctly identifying the miRNAs that regulate cellular processes and affect economically important traits is therefore relevant for drug discovery. This requires a better understanding of the characteristics of miRNAs, which can be gained by studying the differences between the miRNAs of different organisms [14]. Such classification also has applications in forensic science, where the organism to which a miRNA belongs can be identified, especially as the classification is extended to incorporate all organisms in the miRBase registry [50]. However, a literature survey suggests that no attempt has been made to develop computational approaches for classifying plant, animal and HIV miRNAs [15]. There is thus a need for new algorithms that are robust, fast and economical, given the financial and time constraints of existing laboratory techniques.
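Classifying miRNAs by organism first requires turning each sequence into numeric features. One common choice, assumed here and not necessarily the one used in [14,15], is the relative frequency of nucleotide k-mers. The sketch below computes dinucleotide (k = 2) frequencies by default; the function name and the handling of ambiguous bases are illustrative choices.

```python
from itertools import product

def kmer_composition(rna: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of every k-mer over the RNA alphabet A, C, G, U.

    T is converted to U so DNA-style input is accepted; windows that contain
    any other symbol (e.g. N) are skipped.
    """
    seq = rna.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return {word: (n / total if total else 0.0) for word, n in counts.items()}

# Usage: dinucleotide profile of the mature let-7a miRNA sequence
profile = kmer_composition("UGAGGUAGUAGGUUGUAUAGUU")
print(sorted(profile.items(), key=lambda kv: -kv[1])[:3])
```

The resulting 16-value profile (4^k values in general) can be passed to any of the classifiers discussed in this article.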
siRNAs are also being evaluated as potential therapeutic agents. A number of publications have shown that siRNAs can inhibit the replication of HIV [44] and hepatitis B virus [45]. The most significant hurdle for the therapeutic use of siRNA is delivery: how can siRNAs be targeted to specific cells? Delivering nucleic acids to specific organs, tissues and cells will require significant advances in nucleic acid chemistries, including possible novel conjugations and/or formulations that target certain cells specifically. The first indication for which siRNA reaches clinical trials is likely to target the VEGF receptor in wet age-related macular degeneration. Other indications that require systemic application of siRNA will need new formulations to ensure that the siRNA reaches the desired organ and tissue. Two other hurdles for siRNA therapeutics, drug stability and manufacturability, are shared by all nucleic acid therapeutics. The chemical modifications that address them are the result of years of research on antisense therapeutics, ribozyme therapeutics and aptamer technologies, providing a head start for siRNA therapeutics. If these hurdles are resolved and clinical trials are successful, these new technologies could support major pharmaceutical products for mankind [16].
[C] Subcellular localization based drug identification: Amino acids are critical to life and have a variety of roles in metabolism. One particularly important function is as the building blocks of proteins, which are linear chains of amino acids. Every protein is chemically defined by this primary structure, and amino acids can be linked together in varying sequences to form a vast variety of proteins [16]. Given the importance of proteins, a machine learning simulation model is being developed to classify and predict the subcellular localization of HIV apoptosis proteins [17]. Available prediction tools, i.e. EukMPloc, SubLoc and VirusPloc, were used to predict the subcellular location of HIV apoptosis proteins in that study. A comparative analysis of these tools with support vector machines shows which one is better suited to a particular analysis of the available data, as in the study by Dubey et al. [17]:
Subcellular localization of proteins | EukMPloc: No. of proteins | EukMPloc: Accuracy (%) | SubLoc: No. of proteins | SubLoc: Accuracy (%) | VirusPloc: No. of proteins | VirusPloc: Accuracy (%)
Plasma membrane | 115 | 99.900 | 165 | 86 | - | -
Cytoplasm | 7 | 98.889 | 7 | 82.9167 | 112 | 94.382
Nucleus | 91 | 96.667 | 94 | 81.3187 | 29 | 94.953
Mitochondria | 21 | 99.1304 | 1 | 90 | 18 | 99.2308
Secreted proteins | 22 | 98.113 | - | - | - | -
Cytoskeleton | 8 | - | - | - | - | -
Extracellular | 6 | 96.2104 | 1 | 87.9167 | 94 | 98.889
Table 1. Comparative analysis of various software tools with their accuracies
EukMPloc is best suited for predicting subcellular localization in the plasma membrane and cytoplasm. The site of subcellular localization can be used to predict HIV progression: localization in mitochondria together with a large number of dying cells suggests that the infected person is in stage III or IV of HIV infection, whereas localization in the plasma membrane, cytoplasm and extracellular space indicates that the infected person is in stage I or II [17].
[D] Membrane protein based drug identification: Membrane proteins are attractive drug targets, but determining membrane protein structures or topologies experimentally is expensive and time consuming. Effective computational methods for predicting membrane protein types or transmembrane helices are therefore needed, as they can provide useful information for large numbers of protein sequences [18]. HIV protein sequences were collected from the UniProt database, and bioinformatics and machine learning techniques were used to identify and classify them into membrane proteins and soluble proteins [18]. The WEKA software package was used to classify membrane and soluble proteins [51]. Support vector machine (SVM) classification of HIV membrane and soluble proteins on the basis of amino acid composition gave 97% accuracy [18, 26]. Further analysis in this study showed that gp120, gp41, Gag, Pol and the Gag-Pol polyprotein are the HIV proteins most frequently classified as membrane proteins. These may offer better opportunities for targeted drug delivery.
Artificial intelligence-based techniques such as SVMs, neural networks and WEKA classifiers are elegant approaches for extracting complex patterns from biological sequence data.
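The membrane versus soluble classification in [18] used an SVM on amino acid composition through WEKA. WEKA's own SVM implementation is not reproduced here; instead, the following is a from-scratch linear SVM trained with the Pegasos stochastic sub-gradient method, a swapped-in technique chosen because it fits in a few lines of standard-library Python. The two-feature toy vectors in the usage example are hypothetical stand-ins for real composition features, and `lam`, `epochs` and `seed` are illustrative defaults.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    X: list of feature vectors (e.g. amino acid composition values).
    y: labels in {-1, +1} (here +1 = membrane, -1 = soluble; an assumption).
    A constant 1.0 is appended to every vector so the last weight acts as a
    bias term (it is regularised together with the other weights).
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias included)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Usage with hypothetical two-feature vectors (membrane-like vs soluble-like)
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.05], [0.1, 0.9], [0.2, 0.8], [0.05, 0.85]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

With real data, each row of `X` would be a 20-value composition vector per protein; accuracy should then be measured on held-out sequences rather than on the training set.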
[E] Motif/domain based prediction: The domain-based classification of HIV-1, HIV-2 and their subtypes [26] could help in developing novel approaches to wet-lab techniques for devising novel drugs and therapeutics. The correlation between protein domains and structure explored in [26] can provide better insight into these proteins. SBASE gave the best accuracy in predicting protein domains in the given dataset. As more and more sequences are deposited in databases, the model developed can be further improved [26].
Domain | Function
RNA recognition motif | Binds single-stranded RNA
K homology domain | Nucleic acid binding; RNA binding and recognition
Glycine-rich domain | Nuclear localization, protein binding
Arginine-glycine-glycine box | RNA binding
Proline-rich domain | Protein interaction domain
Zinc finger domain | DNA binding
Asp-Glu-rich acidic domain | DNA/RNA mimicry
Table 2. Domains identified by available software tools.
According to Dubey A, the zinc finger domain and the Asp-Glu-rich acidic domain are predominantly observed across all subtypes of HIV-1, making them potential drug target sites.
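As a simple illustration of motif-level scanning (not the SBASE-based domain prediction of [26]), the sketch below searches a protein sequence for the CCHC zinc-knuckle consensus C-X2-C-X4-H-X4-C found in retroviral nucleocapsid proteins, and flags Asp/Glu-rich windows as candidate acidic stretches. The window length and the 60% D+E cut-off are hypothetical parameters, not values from the cited work.

```python
import re

# CCHC zinc-knuckle consensus C-X2-C-X4-H-X4-C; the lookahead lets overlapping hits be reported.
ZINC_KNUCKLE = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")

def find_zinc_knuckles(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each zinc-knuckle match."""
    return [(m.start() + 1, m.group(1)) for m in ZINC_KNUCKLE.finditer(protein.upper())]

def acidic_windows(protein: str, window: int = 10, min_fraction: float = 0.6) -> list[int]:
    """Return 1-based starts of windows whose Asp+Glu fraction is >= min_fraction.

    `window` and `min_fraction` are hypothetical defaults, not published cut-offs.
    """
    seq = protein.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        acidic = segment.count("D") + segment.count("E")
        if acidic / window >= min_fraction:
            starts.append(i + 1)
    return starts

# Usage on short hypothetical fragments
print(find_zinc_knuckles("MGQRKCFNCGKEGHIAKNCRAP"))
print(acidic_windows("AAEDDEEDDEAA"))
```

A pattern scan like this only finds sequence signatures; dedicated domain databases also score divergent matches that a fixed regular expression would miss.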
[F] G-protein coupled receptor (GPCR) based drug design: GPCR crystal structures are used for structure-based drug design (SBDD), which relies on three-dimensional (3D) protein structures [27]. The impact of GPCR crystal structures on SBDD has been immediate and has led to the discovery of novel ligands for multiple GPCRs. The crystal structures have also provided opportunities for homology modelling to identify novel GPCR target sites relevant to HIV, such as the CXCR4 co-receptor [27, 28]. GPCR structures are also important for understanding evolutionary relationships with other species, for interpreting naturally occurring receptor mutations in patients, and for establishing the structural relevance of individual GPCRs to other associated diseases. Such studies also enable signalling-pathway analysis, which is helpful for studying GPCR protein-protein interactions and for choosing the right path for drug delivery.
[G] Residue based drug design: HIV protease can be used for drug design because it has an extremely elegant, economical structure, made up largely of β-strands, and preserves perfect two-fold symmetry if no substrate or inhibitor is bound [29]. Structure and function are intimately related, and X-ray crystallography has been the principal method for determining the three-dimensional structures of HIV protein molecules. Dubey et al. [30] studied the classification of HIV protein structures on the basis of alpha-helix, beta-strand and other residues using machine learning techniques. This can be further used in computer-aided drug discovery, structural identification and comparison of functional sites. Studying the interactions of targeted residues with drug molecules can reveal better target sites for HIV.
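A coarse alpha/beta/mixed assignment of the kind described for [30] can be sketched from per-residue secondary structure. The function below assumes a DSSP-style string in which 'H' marks helix and 'E' marks strand; the thresholds are illustrative assumptions rather than the features or cut-offs used by Dubey et al.

```python
def structural_class(ss: str, helix_min: float = 0.4, strand_min: float = 0.25,
                     minor_max: float = 0.1) -> str:
    """Assign a coarse structural class from a DSSP-style secondary-structure string.

    'H' counts as helix and 'E' as strand; every other symbol is treated as coil.
    All three thresholds are illustrative assumptions.
    """
    if not ss:
        return "other"
    helix = ss.count("H") / len(ss)
    strand = ss.count("E") / len(ss)
    if helix >= helix_min and strand < minor_max:
        return "all-alpha"
    if strand >= strand_min and helix < minor_max:
        return "all-beta"
    if helix >= minor_max and strand >= minor_max:
        return "mixed alpha/beta"
    return "other"

# Usage on hypothetical secondary-structure strings
for s in ["HHHHHHHHCC", "EEEECCEEEE", "HHHEEECCCC"]:
    print(s, structural_class(s))
```

In a machine learning setting, the helix and strand fractions would typically become features for a trained classifier instead of being compared against fixed cut-offs.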
[H] Target related biomarker based drug design:
There are two major types of biomarkers: biomarkers of exposure, which are used in risk prediction, and biomarkers of disease, which are used in screening, diagnosis and monitoring of disease [53]. Biomarkers used in risk prediction include the CD4+ T-cell count and viral load, whose screening indicates the stage of HIV infection [48]. Biomarkers of disease include interferon γ, RANTES, MIP-1β, D-dimer, IP-10 and fibrinogen [53]. The immune system produces a milieu of cytokines that may help or hinder virus growth and reservoir establishment. When used as biomarkers, cytokines such as interferon γ-induced protein 10, IFN-γ, IL-7, IL-15, IL-6 and IL-12p40/70 can help to predict future viral load and disease progression. Results suggest that CD4+ cells, IL-10 and sometimes p24 could be useful biomarkers for diagnosing HIV/AIDS in individuals who screen positive, regardless of whether or not they have AIDS. Treatment should also be available for those who screen HIV positive with AIDS. Machine learning techniques have been used to describe and classify HIV biomarkers [48], and these biomarkers can help to target drugs effectively against HIV/AIDS.
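CD4+ T-cell counts are routinely converted into a disease stage. The sketch below assumes the CDC 2014 surveillance thresholds for people aged 6 years and older (stage 1: ≥500 cells/µL, stage 2: 200–499, stage 3: <200). It uses the CD4 count alone, ignoring CD4 percentage and stage-3-defining opportunistic illnesses, and this three-stage scheme differs from the stage I to IV terminology used elsewhere in this article. The function name is illustrative.

```python
def cdc_stage_from_cd4(cd4_cells_per_ul: float) -> int:
    """Return HIV stage 1-3 from the CD4 count alone (assumed CDC 2014 adult thresholds).

    Ignores CD4 percentage and stage-3-defining opportunistic illnesses.
    """
    if cd4_cells_per_ul < 0:
        raise ValueError("CD4 count cannot be negative")
    if cd4_cells_per_ul >= 500:
        return 1
    if cd4_cells_per_ul >= 200:
        return 2
    return 3

# Usage
for count in (650, 350, 150):
    print(count, cdc_stage_from_cd4(count))
```

A rule like this can also serve as a labelling step when training classifiers on additional biomarkers such as the cytokines listed above.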
The potential targets discussed above, as well as multi-target drugs and drug combinations, were collected from existing databases and the literature. Multi-target drugs were obtained from the Therapeutic Target Database (TTD) and the ClinicalTrials.gov database [21], followed by the literature, from which their developmental status was collected. Drug combinations were obtained from Drugs@FDA or PubMed [31, 32]. By combining the primary therapeutic targets of all drugs in a given drug combination, a target combination can be generated. The biochemical class, structural fold and pathway information of each target in a specific target combination were also obtained from the UniProt/Swiss-Prot database [33], e.g. HIV-1 and HIV-2 sequences. Protein Data Bank (PDB) structures are also available for each HIV enzyme, as are mutant variants for further drug-related experiments. CATH, Gene3D and other databases such as KEGG are also useful for studying enzymatic pathways. The effect of drug design on the model in silico is …
A naive Bayes classifier is a simple probabilistic classifier based on applying
Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the
underlying probability model would be "independent feature model". In simple terms, a naive
Bayes classifier assumes that the presence (or absence) of a particular feature of a class is
unrelated to the presence (or absence) of any other feature, given the class variable [38,41].
Using Bayes' theorem, the conditional probability can be written as
\[ p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)} \tag{2.6} \]
Or, in simple terms,
\[ \text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}} \]
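Under the naive independence assumption the likelihood factorizes as p(F_1, ..., F_n | C) = Π_i p(F_i | C), and the evidence term can be dropped when comparing classes because it does not depend on C. A minimal categorical naive Bayes sketch follows; Laplace (add-alpha) smoothing and log-probabilities are assumptions added to avoid zero probabilities and numerical underflow, and the toy features in the usage example are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical naive Bayes."""
    class_counts = Counter(labels)
    n_features = len(samples[0])
    # feature_counts[c][j][v]: number of class-c samples whose feature j equals v
    feature_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    seen_values = [set() for _ in range(n_features)]
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
            seen_values[j].add(v)
    return class_counts, feature_counts, seen_values, alpha

def predict_naive_bayes(model, x):
    """Return the class maximising log p(C) + sum_i log p(F_i | C); evidence is omitted."""
    class_counts, feature_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(x):
            # add-alpha smoothed log likelihood of feature j taking value v in class c
            k = len(seen_values[j])
            score += math.log((feature_counts[c][j][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Usage with hypothetical features: (transmembrane helix predicted?, hydrophobicity level)
samples = [("yes", "high"), ("yes", "high"), ("yes", "low"),
           ("no", "low"), ("no", "low"), ("no", "high")]
labels = ["membrane"] * 3 + ["soluble"] * 3
model = train_naive_bayes(samples, labels)
print(predict_naive_bayes(model, ("yes", "high")))
```

Summing logarithms instead of multiplying probabilities gives the same argmax as the product form of Bayes' theorem while staying numerically stable for many features.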
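(vi) BAYES NET: A Bayesian network represents a set of random variables and their conditional dependencies as a directed acyclic graph; inference in the network is based on Bayes' theorem.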
(vii) LOGISTIC: In statistics, logistic regression (the logistic model) is used to predict the probability of occurrence of an event by fitting data to a logistic curve (logit function). It is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. Logistic regression is used extensively in the medical and social sciences, as well as in marketing applications such as predicting a customer's propensity to purchase a product or cancel a subscription [39,40].
An explanation of logistic regression begins with an explanation of the logistic function, which, like a probability, always takes values between zero and one:
\[ f(z) = \frac{e^{z}}{e^{z} + 1} = \frac{1}{1 + e^{-z}} \]
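The logistic function above is the link that turns a linear score into a probability. A minimal sketch of logistic regression fitted by batch gradient descent on the mean log-loss follows; the learning rate, number of epochs and the one-feature toy data are illustrative assumptions, and no regularization is applied.

```python
import math

def logistic(z: float) -> float:
    """f(z) = 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; y in {0, 1}. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            # gradient of the log-loss w.r.t. the linear score is (prediction - target)
            error = logistic(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j in range(d):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Probability of the positive class for feature vector x."""
    return logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Usage on hypothetical one-feature data
w, b = fit_logistic_regression([[0], [1], [2], [3], [4], [5]], [0, 0, 0, 1, 1, 1])
print(predict_proba(w, b, [0]), predict_proba(w, b, [5]))
```

On perfectly separable data such as this toy set, the weights keep growing with more epochs; in practice an L2 penalty or early stopping is normally added.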
These are called intelligent machine learning techniques because machine learning works by applying complex algorithms and a set of predefined rules: it uses past data to learn patterns and, based on that analysis, generates relevant output or performs the intended task according to the defined rules and algorithms.
DISCUSSION: An important aspect in judging the validity of a given target is the indication for which the target is considered. In particular, the safety and tolerability requirements for a drug used preventively are more challenging in diseases like HIV/AIDS. A careful evaluation is needed of whether a multiple-target approach is preferable or whether the "one drug, one target" guidance should be followed [43].
The potential drug target sites of HIV discussed above still need to prove their therapeutic use. A pharmaceutical company's contribution to the value chain is a patentable chemical compound that becomes a drug, rather than the target itself. As many targets are initially identified in the scientific literature, there is a need to establish a direct relationship between the degree of validation and the competition around a given target. The scientific community and pharmaceutical companies therefore need to work together to save time and cost in reaching the target while producing beneficial products for mankind. Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process, because secreted proteins and plasma membrane proteins are easily accessible to drug molecules owing to their localization in the extracellular space or on the cell surface. The study of a hybrid model for classifying HIV-1 and HIV-2 proteins on the basis of amino acid composition and dipeptide composition shows the interaction between these two. These prediction results help to find dipeptide motifs, domain interactions, protein interactions and protein folding, since the model provides global information about a protein.
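The hybrid model combines amino acid composition (20 values) with dipeptide composition (400 values, one per ordered pair of adjacent residues). The sketch below assumes both are expressed as percentages and simply concatenated into a 420-dimensional vector; the exact normalization and feature ordering of the cited hybrid model are not given here, and the function names are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def dipeptide_composition(sequence: str) -> list[float]:
    """Percentage of each adjacent residue pair; pairs with non-standard letters are skipped."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [100.0 * counts[d] / total if total else 0.0 for d in DIPEPTIDES]

def hybrid_features(sequence: str) -> list[float]:
    """420-value vector: 20 AAC percentages followed by 400 dipeptide percentages."""
    seq = sequence.upper()
    aa_counts = [seq.count(aa) for aa in AMINO_ACIDS]
    n = sum(aa_counts)
    aac = [100.0 * c / n if n else 0.0 for c in aa_counts]
    return aac + dipeptide_composition(seq)

# Usage on a hypothetical fragment
vector = hybrid_features("MQPIVAIVAL")
print(len(vector))
```

Dipeptide composition adds local-order information that amino acid composition alone discards, which is why the two are often combined.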
Structural and regulatory proteins of HIV-1 and HIV-2 have been an active area of research. Owing to highly efficient data mining and machine learning techniques, structural classification of HIV proteins and enzymes can be done with fair accuracy. Here, structural classification was done on the basis of alpha, beta and other residues. Such approaches can provide new insights into the structural classification of HIV proteins for finding drug targets, protein engineering and database development, and any newly engineered or discovered protein can be classified with the models developed. The above model is useful for generating information that can be of great use in predicting the structure and function of all the enzyme structures present, since they are key drug targets. The protein structure belonging to a particular class will have functional domains, …
CONCLUDING REMARK
The present manuscript provides a point of view on potential drug target sites for HIV/AIDS. These intelligent machine learning techniques, or combinations of these approaches, are expected to help in thorough target validation for HIV/AIDS, which should help to reduce attrition rates in the later stages of drug development. Complementary approaches, such as molecular barcoding, will also be required to understand how the mutant spectrum changes temporally or spatially within an infected host. Finally, future drug and vaccine studies will need to be carried out in well-defined animal models, as subtle differences can have a significant impact on experimental outcomes. Despite these obstacles, quasispecies theory is expected to move out of the laboratory soon and begin to influence the control and treatment of HIV/AIDS. As new therapeutics are identified or validated, the databases will be further improved and can be analysed for future use. The potential sites discussed in this review will also play an important role in vaccine development for HIV/AIDS, which is still in its infancy but may advance rapidly with the help of machine learning techniques.
DR ANUBHA DUBEY INDEPENDENT RESEARCH INDIA
PHONE NO,9993210963