Machine learning model for analysis of critically important antimicrobials for human medicine.


Abstract
As antimicrobials have been developed, microbes have adapted and become resistant
to earlier antimicrobial agents. WHO has therefore published a complete list that ranks
antimicrobials as critically important, highly important or important. Critically
important antimicrobials for human medicine need to be identified so that their use can
be reserved for humans. In this paper, a machine learning model is developed that
classifies critically important antimicrobials on the basis of their amino acid
composition with high accuracy.

Keywords: antimicrobials, WHO, machine learning, amino acid composition

Introduction

Medicine is the science and practice of the diagnosis, treatment
and prevention of disease. Its purpose is to maintain and restore
health by preventing and treating ill effects. Antimicrobial medicines
kill microbes or slow their growth. Organisms such as bacteria and
viruses that are not visible to the naked eye are called
micro-organisms or microbes. Some categories of microbes are listed
in Table 1.
Table 1 Variety of microbes with examples and the infections they cause
Microbe   | Example                      | Type of infection caused
Bacteria  | Staphylococcus aureus, etc.  | Some staph infections
Virus     | Influenza                    | Flu
Fungi     | Candida albicans, etc.       | Yeast infections
Parasites | Plasmodium falciparum, etc.  | Malaria
Different classes of antimicrobials are used to treat human
diseases. When these antimicrobials are used regularly, microbes
develop resistance to them, which is called antimicrobial resistance
(AMR). The genes responsible for this resistance are called
antimicrobial resistance genes. For example, the ndm-1 gene, which
encodes resistance to the carbapenem family, was first discovered in
Klebsiella pneumoniae isolated from an infected person.1 Most AMR is
hazardous to human health. The characteristics by which antimicrobials
are classified are as follows:
Characteristic 1 (C1): The antimicrobial class is used to treat
serious ill effects caused by bacteria in people.
Characteristic 2 (C2): The antimicrobial class acts against (a)
bacteria that are transmitted to humans from nonhuman sources, or (b)
bacteria that may acquire resistance genes from nonhuman sources.

Antimicrobials vs antibiotics
Antibiotics are medicines that act against bacteria and are used to
prevent and treat bacterial infections. Antibiotic resistance develops
when bacteria change in response to the repeated use of antibiotics.
Antimicrobial resistance is the broader term: it also covers resistance
to drugs that treat infections caused by other microbes such as
parasites (e.g. malaria), viruses (e.g. HIV) and fungi (e.g. Candida).

Some antimicrobials are among the few alternatives available for the
treatment of serious bacterial infections in humans, and so occupy an
important place in human medicine. Serious infections are those likely
to result in significant morbidity or mortality if left untreated.
Multidrug resistance can also influence the outcome of disease, and the
seriousness of a disease may relate to the site of infection (e.g.
pneumonia, meningitis) or to the host (e.g. an infant or an
immunosuppressed patient). The use of such antibacterial agents should
be preserved, because loss of efficacy in these drugs due to the
emergence of resistance has a significant impact on human health,
especially for people with life-threatening infections.

Antimicrobial agents are also used to treat diseases caused by
bacteria that are transmitted to humans from non-human sources, i.e.
water, food, the environment or animals. These are considered highly
important antimicrobials because such infections are the most amenable
to risk management. The non-human sources and the bacteria that cause
human disease are linked; examples include non-typhoidal Salmonella,
Campylobacter spp. and E. coli. Commensal organisms are also involved:
the commensals themselves may be pathogenic in immunosuppressed hosts,
and the transfer of their genes contributes to the transmission of AMR.

The categorization of antimicrobial classes is interpreted as follows:

Critically important: Antimicrobial classes that meet both C1
and C2 are termed critically important for human medicine.
Highly important: Antimicrobial classes that meet either C1 or
C2 are termed highly important for human medicine.
Important: Antimicrobial classes used in humans that meet
neither C1 nor C2 are termed important for human medicine.

The list below shows examples of members of each class of drugs.
Many antimicrobials, such as aminoglycosides, ansamycins,
carbapenems and other penems, cephalosporins, glycopeptides,
glycylcyclines, lipopeptides, macrolides and ketolides, monobactams,
oxazolidinones, penicillins, phosphonic acid derivatives,
polymyxins, quinolones, sulfones, tetracyclines and nitrofurantoins,
are classified according to their mode of action and the three
categories explained above. Details of these antimicrobials are given
in Table 2, which also describes their significance in treating disease
and the respective causative organisms.
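The three-way categorization defined above by C1 and C2 can be written as a small decision rule. The following is only an illustrative sketch: the function name `categorize` and its boolean arguments are assumptions made for this example, and deciding whether a class actually meets C1 or C2 still requires expert assessment.

```python
def categorize(meets_c1: bool, meets_c2: bool) -> str:
    """Map the two characteristics C1 and C2 to an importance category.

    Hypothetical helper for illustration; the inputs are assumed to have
    been decided elsewhere for the antimicrobial class in question.
    """
    if meets_c1 and meets_c2:
        # Both characteristics are met.
        return "critically important"
    if meets_c1 or meets_c2:
        # Exactly one of the two characteristics is met.
        return "highly important"
    # Neither characteristic is met (class is still used in humans).
    return "important"


if __name__ == "__main__":
    for c1 in (True, False):
        for c2 in (True, False):
            print(c1, c2, categorize(c1, c2))
```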
Materials & methods
For  classication  of  antimicrobials,  machine  learning  (ML)
techniques are employed. Because it is good in data analysis and
model building. ML  is a branch  of  articial intelligence3–9 it makes
system learn from data, identify patterns and make great decisions
without human interference. As there is huge amount of variety of
data computational processing are a need to understand huge data in
a better way for further use. These ML computational techniques are
cheaper and powerful tools to apply. Here in this paper author tries to
classify and develop model for critically important antimicrobials for
human medicine by support vector machines (SVM). It can be dened
as a  discriminative classier  means two objects or set of objects are
classied by a separating hyperplane. It could be said that, as labelled
training data (supervised learning) is given, the algorithm outputs
an  optimal  hyperplane  which  categorizes  new  examples.  Hence
hyperplane is a line dividing a plane in two parts where in each class
lay in either side in a two dimensional space.10–16
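To make the idea of a separating hyperplane concrete, the sketch below classifies a point by the side of the hyperplane w·x + b = 0 on which it falls. The weights `w` and offset `b` are made-up values chosen for illustration, not parameters learned in this work, and a trained SVM with an RBF kernel uses a more general decision function than this linear one.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Signed score w.x + b of the point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Return +1 for points on or above the hyperplane, otherwise -1."""
    return 1 if decision_value(w, x, b) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical 2-D hyperplane: the line x1 - x2 = 0.
    w, b = (1.0, -1.0), 0.0
    print(classify(w, (2.0, 1.0), b))  # point below the line x2 = x1
    print(classify(w, (1.0, 3.0), b))  # point above the line x2 = x1
```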
Data
This section describes the preparation of the training and testing
datasets. The amino acid composition of every protein sequence
is taken from PROCOS (Protein composition server).17 This approach is
accurate, although it is very time consuming. Amino acid composition
has also been used to predict the subcellular localization of proteins,
as described in reference 4, and, because of the importance of amino
acids, in other related work. The amino acid composition is the
fraction of each type of amino acid within a protein.
\[
\text{Amino acid composition}_i = \frac{\text{total number of amino acid } i}{\text{total number of amino acids in the protein}} \qquad (1)
\]
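Equation 1 can be computed directly from a sequence string. The sketch below is an assumption-laden illustration rather than the PROCOS implementation: it assumes one-letter codes for the 20 standard amino acids, ignores any other characters, and returns fractions rather than percentages. The example sequence is invented.

```python
from collections import Counter
from typing import Dict

# One-letter codes of the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in a protein sequence.

    Characters outside the 20-letter alphabet are ignored, so the
    fractions always sum to 1 over the counted residues.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    composition = amino_acid_composition("MKTAYIAKQR")  # invented peptide
    # The 20 fractions form one feature vector for the classifier.
    print([round(composition[aa], 3) for aa in AMINO_ACIDS])
```

Each protein is thus reduced to a fixed-length vector of 20 numbers, whatever its original length.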
After all the protein sequence data (peptides) were gathered, they
were divided into groups called datasets. There are three datasets,
according to the importance of the antimicrobials.18
Datasets
Dataset 1: Critically important antimicrobials: The microbes’
protein data available in the UniProt database are taken, and their
amino acid composition, computed with the PROCOS software, is used as
input for the SVM. These sequences form the training set and are the
positive samples to be classified. For testing, we took negative
samples from another enzymatic group.
Dataset 2: Highly important antimicrobials: Dataset 2 is prepared
in the same way as dataset 1.
Dataset 3: Important antimicrobials: Similarly, dataset 3 is
prepared for the important antimicrobials.
Negative samples: Because the SVM is a discriminative approach, it
needs negative examples alongside the positive samples in order to
classify the positive samples accurately. When experimental methods do
not report an interaction between two proteins, the absence of a
positive signal does not by itself imply a negative one, that is, that
there is no interaction. Real negative examples are therefore an
important part of obtaining better results.
Feature selection with SVD: Singular value decomposition (SVD) is a
method for reducing the dimensions and selecting the most relevant and
informative features. Principal component analysis (PCA)19,20 is also
used for feature selection and dimensionality reduction. The higher the
value associated with a linear combination of attributes, the more
important that combination is. For any feature, the corresponding
eigenvalue (for PCA) or singular value (for SVD) is found. Singular
values are a good basis for choosing features, and in this work SVD
has the lower computational cost. In SVD, the rows belonging to
proteins play a role in the combination coefficients, whereas in PCA
the covariance between attributes is calculated over all the training
proteins together. Suppose A={MO;ST} is the training dataset containing
positive and negative examples. A matrix of size d × l is generated,
where d=p+n is the number of training vectors, p is the number of
positive examples, n is the number of negative examples and l is the
length of each vector.

After the amino acid composition of the different datasets has been
extracted, the results are fed as input to the support vector machine,
together with feature selection and outlier detection. It is important
to find the hyperplane that clearly distinguishes each dataset from
its negatives. For each run of the SVM, a classifier is developed and
its performance is measured.
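The d × l training matrix described above, with d = p + n rows, can be assembled by stacking the positive and negative composition vectors and recording a label for each row. This is a minimal sketch: the helper name `build_training_matrix`, the +1/-1 label convention and the short example vectors are assumptions for illustration, and the SVD or PCA step itself is not shown.

```python
from typing import List, Sequence, Tuple


def build_training_matrix(
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Stack p positive and n negative vectors into a (p + n) x l matrix.

    Returns the matrix as a list of rows plus a parallel list of labels
    (+1 for positive examples, -1 for negative examples).
    """
    rows = [list(v) for v in positives] + [list(v) for v in negatives]
    labels = [1] * len(positives) + [-1] * len(negatives)
    # Every vector must have the same length l.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all vectors must be non-empty and of equal length")
    return rows, labels


if __name__ == "__main__":
    # Invented 3-dimensional vectors; real vectors would have l = 20.
    matrix, labels = build_training_matrix(
        positives=[[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]],
        negatives=[[0.7, 0.1, 0.2]],
    )
    print(len(matrix), len(matrix[0]), labels)  # d, l and the labels
```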
Performance evaluation: The performance of our classifier was
judged by 10-fold cross-validation. LIBSVM provides a parameter
selection tool that uses the RBF kernel: cross-validation via grid
search. For each of Dataset 1, Dataset 2 and Dataset 3, a grid search
is performed over c and gamma. In each run, 10% of all samples form the
test set and the remaining samples are used for training. SVMs can
suffer from overfitting, where the system converges on a set of rules
that fits the training data too closely, but this can be handled
efficiently. The test set and training set are identified properly,
and cross-validation is used to estimate how often the classification
is correct. The different rule sets are applied to classify the test
cases, and the rule with the best predictive ability is taken as the
best model. Overfitting of the data leads to pruning.21
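The 10-fold procedure, in which each run holds out 10% of the samples for testing and trains on the rest, can be sketched as an index splitter. This is an illustration only and not LIBSVM's internal routine: the function name `k_fold_indices`, the shuffling and the fixed seed are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for run, (train, test) in enumerate(k_fold_indices(100, k=10), start=1):
        print(f"run {run}: {len(train)} training samples, {len(test)} test samples")
```

Every sample appears in exactly one test fold, so averaging the per-fold accuracy uses each sample once for testing.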
Results & discussion
A machine learning algorithm for the classification of antimicrobials
for human medicine is implemented in this paper. All three datasets
were run in LIBSVM, and the best result for each is obtained in the
form of a model.
Model development
This is the final step, reached once the data are classified as
wanted. After the testing data have been labelled and several
classifiers generated, the last task is to choose the classifier that
best fits the classification and to develop a model from it for future
use. Figure 2 shows the model for critically important antimicrobials.
Figure 2 Model for critically important antimicrobials.
During model development with the SVM, c, g and accuracy are
calculated together; they are reported in Table 3, and the details are
described later in this paper.
Table 3 Support vector machine results
Dataset   | c   | g        | Accuracy (%)
Dataset 1 | 120 | 0.007813 | 99.8012
Dataset 2 | 120 | 0.0025   | 99.5
Dataset 3 | 120 | 0.0078   | 98.5
Figure 2 and Table 3 show that our datasets are classified with
high accuracy. Our focus is on critically important antimicrobials
(CIA), which were classified with 99.8012% accuracy. The result also
holds for similar sequences. Amino acid composition is well suited to
classifying such sequences. A detailed description follows:
Accuracy is calculated as:
\[
\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}
\]
where tp = all the true positives in the samples,
tn = all the true negatives in the samples,
fp = all the false positives, i.e. negative samples classified as positive,
fn = all the false negatives, i.e. positive samples classified as negative.
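The accuracy formula, together with the standard definitions of precision and recall, can be checked with a few lines of code. The confusion-matrix counts in the example are invented for illustration and are not results from this study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(tp + tn) / (tp + tn + fp + fn): share of all samples classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def precision(tp: int, fp: int) -> float:
    """tp / (tp + fp): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """tp / (tp + fn): share of true positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    tp, tn, fp, fn = 50, 45, 3, 2
    print(f"accuracy  = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"precision = {precision(tp, fp):.4f}")
    print(f"recall    = {recall(tp, fn):.4f}")
```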
Precision, recall and accuracy are calculated with LIBSVM. Once
suitable c and g are chosen, the software calculates all the measures
and returns the result within minutes, depending on the volume of data.
The results clearly differentiate the characteristics of antimicrobials
into three groups, so any newly discovered antibiotic can be assigned
to one of the categories defined above. The values of c, g and accuracy
were identified for all three datasets. c and g are the two parameters
of the RBF kernel. It cannot be known beforehand which values are best,
but LIBSVM has a parameter selection tool that finds the best c and g
and the corresponding accuracy. If good values of c and g are
identified, the classifier predicts better. The prediction accuracy
indicates the performance in classifying an independent dataset, so it
shows how well the classifier will handle an ‘unknown’ dataset. For
this, cross-validation is again performed. In n-fold cross-validation
the training set is first divided into n subsets of equal size.
Sequentially, each subset is tested using the classifier trained on the
remaining (n-1) subsets. The cross-validation accuracy is therefore the
percentage of data that is correctly classified. Cross-validation helps
to prevent overfitting. The grid search approach is used because (a) it
searches the parameters exhaustively rather than relying on
approximations or heuristics, (b) the computational time is small, as
there are only two parameters, and (c) each pair of c and g can be
evaluated independently. Hence SVM is one of the best computational
methods, keeps the cost of cross-validation low, and is well suited to
biological data classification.
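The roles of c and g can be illustrated with the RBF kernel, K(u, v) = exp(-g·||u - v||²), and an exhaustive search over candidate (c, g) pairs. This sketch is not the LIBSVM parameter selection tool: `grid_search` is a hypothetical helper, and the `score` argument stands in for the cross-validation accuracy that a real run would compute by training an SVM for each pair. The toy score and candidate values below are invented.

```python
import math
from itertools import product
from typing import Callable, Iterable, Sequence, Tuple


def rbf_kernel(u: Sequence[float], v: Sequence[float], g: float) -> float:
    """RBF kernel exp(-g * ||u - v||^2) between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-g * squared_distance)


def grid_search(
    c_values: Iterable[float],
    g_values: Iterable[float],
    score: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Try every (c, g) pair and return the best as (c, g, score).

    Each pair is scored independently of the others, which is why the
    search is simple and easy to run in parallel.
    """
    best = None
    for c, g in product(c_values, g_values):
        s = score(c, g)
        if best is None or s > best[2]:
            best = (c, g, s)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


if __name__ == "__main__":
    # Toy score standing in for cross-validation accuracy (assumption).
    toy_score = lambda c, g: -(c - 8) ** 2 - (g - 0.5) ** 2
    print(grid_search([2, 8, 32], [0.125, 0.5, 2], toy_score))
```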

Conclusion
Machine learning is an active area of research that requires experts
who handle data safely and understand data as an information retrieval
system. Here a machine learning model is developed for antimicrobials
used in human medicine. WHO has taken the initiative of recommending
which antimicrobials are critically important for human medicine, and
the importance of these medicines needs to be described publicly.
In this paper the author has therefore tried to classify critically
important antimicrobials for human medicine with high accuracy. Future
treatment should take the effect of antimicrobials into account.
Any new microbe or antimicrobial that emerges should be grouped into a
category according to its amino acid composition, using the machine
learning model developed here.
DR Anubha Dubey
Maulana Azad National Institute of Technology, Bhopal | MANIT · Department of Bioinformatics
MSc biotechnology, PhD bioinformatics,
phone No,9993210963
 