Title: | Prediction of Amyloid Proteins |
---|---|
Description: | Predicts amyloid proteins using random forests trained on n-gram encoded peptides. The implemented algorithm can be accessed both from the command line and from a shiny-based GUI. |
Authors: | Michal Burdukiewicz [cre, aut] , Piotr Sobczyk [ctb], Jaroslaw Chilimoniuk [ctb] , Stefan Roediger [ctb] , Dominik Rafacz [ctb] |
Maintainer: | Michal Burdukiewicz <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2024-11-03 03:50:50 UTC |
Source: | https://github.com/michbur/amylogram |
Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer's, Creutzfeldt-Jakob's and Huntington's diseases). Despite their diversity, all amyloid proteins can undergo aggregation initiated by 6- to 15-residue segments called hot spots. Subsequently, amyloids form unique, zipper-like beta-structures, which are often harmful. To find the patterns defining the hot spots, we developed AmyloGram, a novel random forest-based predictor of amyloidogenicity.
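Because hot spots are at least 6 residues long, one simple way to scan a longer sequence is to cut it into overlapping 6-residue windows that can each be scored. The base R sketch below only illustrates that windowing step; sliding_windows is a hypothetical helper, and whether AmyloGram scans sequences in exactly this way is an assumption, not something stated in this manual.

```r
# Hypothetical helper, not part of AmyloGram: cut a peptide into all
# overlapping windows of `width` residues (6 = shortest hot-spot length).
sliding_windows <- function(peptide, width = 6) {
  residues <- strsplit(peptide, "")[[1]]
  n <- length(residues)
  # Peptides shorter than one window yield no windows at all
  if (n < width) return(character(0))
  # One window starts at every position that leaves room for `width` residues
  vapply(seq_len(n - width + 1), function(i) {
    paste(residues[i:(i + width - 1)], collapse = "")
  }, character(1))
}

windows <- sliding_windows("KLVFFAEDV")
print(windows)
# [1] "KLVFFA" "LVFFAE" "VFFAED" "FFAEDV"
```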
AmyloGram is available as an R function (predict.ag_model) or as a shiny GUI (AmyloGram_gui). The package also includes the benchmark data set pep424.
Burdukiewicz MJ, Sobczyk P, Roediger S, Duda-Madej A, Mackiewicz P, Kotulska M. (2017) Amyloidogenic motifs revealed by n-gram analysis. Scientific Reports 7 https://doi.org/10.1038/s41598-017-13210-9
Launches a graphical user interface that predicts the presence of amyloids.
AmyloGram_gui()
Ad-blocking software may cause the GUI to malfunction.
Random forest grown using the ranger package, with additional information.
A list of length three: random forest, a vector of important n-grams and the best-performing encoding.
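The model stores a vector of important n-grams, i.e. short residue motifs used as features. To illustrate what n-gram encoding means, the hypothetical count_ngrams function below turns a peptide into counts of a chosen set of n-grams. It assumes plain contiguous n-grams over the full one-letter amino acid alphabet; the package's own best-performing encoding (the third list element) is not reproduced here, so treat this as a sketch only.

```r
# Hypothetical sketch: encode a peptide as counts of selected n-grams.
# Assumes contiguous n-grams over the full alphabet; AmyloGram's own
# encoding is NOT reproduced here.
count_ngrams <- function(peptide, ngrams) {
  residues <- strsplit(peptide, "")[[1]]
  # Feature vector with one named slot per requested n-gram, all zero
  counts <- integer(length(ngrams))
  names(counts) <- ngrams
  # Handle each requested n-gram length separately
  for (k in unique(nchar(ngrams))) {
    if (length(residues) < k) next
    # All contiguous substrings of length k
    grams <- vapply(seq_len(length(residues) - k + 1), function(i) {
      paste(residues[i:(i + k - 1)], collapse = "")
    }, character(1))
    tab <- table(grams)
    # Copy counts only for n-grams that were requested and actually occur
    hits <- intersect(names(tab), ngrams)
    counts[hits] <- as.integer(tab[hits])
  }
  counts
}

important <- c("V", "FF", "VFF", "QQ")  # made-up feature set
res <- count_ngrams("KLVFFAEDV", important)
print(res)
```

The resulting named integer vector is the kind of fixed-length feature row a random forest can be trained on.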
Checks whether an object is a protein (contains letters from the one-letter amino acid code).
is_protein(object)
object | |
TRUE or FALSE.
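The check described above can be approximated in base R by testing every character against the 20 standard one-letter amino acid codes. The function name looks_like_protein is hypothetical, and the exact alphabet that is_protein accepts (for example, whether ambiguity codes such as X are allowed) is an assumption here.

```r
# Hypothetical re-implementation for illustration only; the package's
# is_protein() may accept a different alphabet or input type.
looks_like_protein <- function(x) {
  # The 20 standard one-letter amino acid codes
  aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
          "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  # Split into single characters, ignoring letter case
  chars <- toupper(unlist(strsplit(as.character(x), "")))
  # Non-empty and every character is an amino acid code
  length(chars) > 0 && all(chars %in% aa)
}

looks_like_protein("KLVFFAEDV")   # TRUE
looks_like_protein("KLVFFAEDV1")  # FALSE: "1" is not an amino acid code
```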
Benchmark data set for PASTA 2.0. Five sequences shorter than 6 amino acids (1% of the original data set) were removed.
pep424
a list of 424 peptides (class SeqFastaAA).
Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.
Recognizes amyloids using the AmyloGram algorithm.
## S3 method for class 'ag_model'
predict(object, newdata, ...)
object | |
newdata | |
... | further arguments passed to or from other methods. |
library(AmyloGram)
data(AmyloGram_model)
data(pep424)
predict(AmyloGram_model, pep424[c(4, 10)])
Prints ag_model objects.
## S3 method for class 'ag_model'
print(x, ...)
x | |
... | further arguments passed to or from other methods. |
library(AmyloGram)
data(AmyloGram_model)
print(AmyloGram_model)
Prints ag_prediction objects.
## S3 method for class 'ag_prediction'
print(x, ...)
x | |
... | further arguments passed to or from other methods. |
library(AmyloGram)
data(AmyloGram_model)
data(pep424)
pred <- predict(AmyloGram_model, pep424[c(4, 10)])
print(pred)
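predict.ag_model and print.ag_prediction appear to follow R's usual S3 pattern: predict() dispatches on the class of the model object, and its result carries a second class with its own print() method. The sketch below reproduces only that dispatch pattern with hypothetical classes toy_model and toy_prediction. It does not call AmyloGram, and its placeholder score (fraction of residues from an arbitrary letter set) has no biological meaning.

```r
# Toy S3 classes (hypothetical names) mirroring the pattern of
# predict.ag_model() returning an object printed by print.ag_prediction().
toy_model <- structure(list(cutoff = 0.5), class = "toy_model")

# Method for the stats::predict() generic, chosen by class "toy_model"
predict.toy_model <- function(object, newdata, ...) {
  # Placeholder score: share of residues from an arbitrary letter set
  scores <- vapply(newdata, function(s) {
    res <- strsplit(s, "")[[1]]
    mean(res %in% c("A", "V", "I", "L", "M", "F", "W", "Y"))
  }, numeric(1))
  # Tag the result with its own class so print() dispatches to our method
  structure(list(scores = scores, cutoff = object[["cutoff"]]),
            class = "toy_prediction")
}

# Method for print(), chosen by class "toy_prediction"
print.toy_prediction <- function(x, ...) {
  calls <- ifelse(x[["scores"]] >= x[["cutoff"]], "positive", "negative")
  print(data.frame(score = round(x[["scores"]], 2), call = calls), ...)
  invisible(x)
}

pred <- predict(toy_model, c(p1 = "KLVFFA", p2 = "DEEKRS"))
print(pred)
```

Returning invisible(x) from the print method is the usual R convention, so that print() can be used inside pipelines without printing twice.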
Reads sequence data saved in a text file.
read_txt(connection)
connection | a |
The input file should contain one or more amino acid sequences separated by empty line(s).
a list of sequences. Each element has class SeqFastaAA. If connection contains no characters, the function issues a warning and returns NULL.
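The input format described above (sequences separated by one or more empty lines) can be parsed in base R by numbering blocks of consecutive non-empty lines. The sketch below is a hypothetical parser named parse_seq_blocks, not the package's read_txt: it returns plain character vectors of residues instead of SeqFastaAA objects, and the assumption that a sequence may span several lines is ours.

```r
# Hypothetical parser for illustration; not the package's read_txt().
# Each returned element is a plain character vector of residues.
parse_seq_blocks <- function(lines) {
  lines <- trimws(lines)
  nonempty <- nzchar(lines)
  # Mirror the documented behaviour: warn and return NULL on empty input
  if (!any(nonempty)) {
    warning("input contains no sequences")
    return(NULL)
  }
  # A block starts at a non-empty line whose predecessor is empty (or absent)
  starts <- nonempty & !c(FALSE, nonempty[-length(nonempty)])
  block <- cumsum(starts)
  # Join the lines of each block into one string
  joined <- vapply(split(lines[nonempty], block[nonempty]),
                   paste, character(1), collapse = "")
  # Split every sequence into upper-case single residues
  lapply(unname(joined), function(s) strsplit(toupper(s), "")[[1]])
}

txt <- c("KLVFF", "AEDV", "", "", "GGVVIA", "")
seqs <- parse_seq_blocks(txt)
length(seqs)  # 2
```

To read from a file, the lines would come from readLines(), e.g. parse_seq_blocks(readLines("sequences.txt")), where the file name is only a placeholder.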
Sensitivity, specificity and Matthews correlation coefficient (MCC) of AmyloGram for different cutoffs, computed on the pep424 data set.
spec_sens
a data frame with four columns and 99 rows.
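The column names of spec_sens are not listed here; one natural reading of "four columns and 99 rows" is one row per cutoff from 0.01 to 0.99 with columns for the cutoff and the three measures, but that layout is an assumption. The sketch below computes those three measures for arbitrary probability scores. The function name cutoff_metrics and the toy prob/truth vectors are hypothetical and are not pep424 results. MCC is computed as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```r
# Sketch: sensitivity, specificity and MCC of a probabilistic classifier
# for cutoffs 0.01, 0.02, ..., 0.99 (99 values, one row each).
cutoff_metrics <- function(prob, truth, cutoffs = seq(0.01, 0.99, by = 0.01)) {
  rows <- lapply(cutoffs, function(cutoff) {
    pred <- prob >= cutoff
    # Confusion-matrix counts as doubles to avoid integer overflow below
    tp <- as.numeric(sum(pred & truth))
    tn <- as.numeric(sum(!pred & !truth))
    fp <- as.numeric(sum(pred & !truth))
    fn <- as.numeric(sum(!pred & truth))
    denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    data.frame(cutoff = cutoff,
               sensitivity = tp / (tp + fn),
               specificity = tn / (tn + fp),
               # Convention chosen here: MCC = 0 when it is undefined
               mcc = if (denom == 0) 0 else (tp * tn - fp * fn) / denom)
  })
  do.call(rbind, rows)
}

# Made-up scores and labels, for illustration only
prob  <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
m <- cutoff_metrics(prob, truth)
m[which.min(abs(m$cutoff - 0.5)), ]
```

At a cutoff of 0.5 the toy data give TP = 2, FN = 1, FP = 1 and TN = 2, hence sensitivity and specificity of 2/3 and MCC = 3/9 = 1/3.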
Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.