Package 'AmyloGram'

Title: Prediction of Amyloid Proteins
Description: Predicts amyloid proteins using random forests trained on n-gram-encoded peptides. The implemented algorithm can be accessed from both the command line and a Shiny-based GUI.
Authors: Michal Burdukiewicz [cre, aut], Piotr Sobczyk [ctb], Jaroslaw Chilimoniuk [ctb], Stefan Roediger [ctb], Dominik Rafacz [ctb]
Maintainer: Michal Burdukiewicz <[email protected]>
License: GPL-3
Version: 1.2
Built: 2024-11-03 03:50:50 UTC
Source: https://github.com/michbur/amylogram


Prediction of amyloids

Description

Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer's, Creutzfeldt-Jakob and Huntington's diseases). Despite their diversity, all amyloid proteins can undergo aggregation initiated by 6- to 15-residue segments called hot spots. As a result, amyloids form unique, zipper-like beta-structures, which are often harmful. To find the patterns defining the hot spots, we developed AmyloGram, a novel predictor of amyloidogenicity based on random forests.

Details

AmyloGram is available as an R function (predict.ag_model) or a Shiny GUI (AmyloGram_gui).

The package also includes the benchmark data set pep424.

Author(s)

Maintainer: Michal Burdukiewicz <[email protected]>

References

Burdukiewicz MJ, Sobczyk P, Roediger S, Duda-Madej A, Mackiewicz P, Kotulska M. (2017) Amyloidogenic motifs revealed by n-gram analysis. Scientific Reports 7 https://doi.org/10.1038/s41598-017-13210-9


AmyloGram Graphical User Interface

Description

Launches a graphical user interface that predicts the presence of amyloids.

Usage

AmyloGram_gui()

Warning

Ad-blocking software may cause the GUI to malfunction.


Random forest model of amyloid proteins

Description

Random forest grown using the ranger package, stored together with additional information.

Format

A list of length three: the random forest, a vector of important n-grams and the best-performing encoding.

See Also

ranger
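
To make the terms "encoding" and "n-gram" concrete, the following is a minimal sketch in base R. It assumes that an encoding is a reduced amino acid alphabet (groups of residues treated as one symbol) and that n-grams are contiguous subsequences counted in the encoded peptide. The grouping, the peptide and all object names are hypothetical; they are not the encoding or the n-grams stored in AmyloGram_model.

```r
# Hypothetical reduced alphabet: the grouping is invented for illustration
# and is not the encoding stored in AmyloGram_model.
groups <- list("1" = c("G", "A", "S", "T", "P", "C"),
               "2" = c("V", "I", "L", "M", "F", "W", "Y"),
               "3" = c("D", "E", "N", "Q"),
               "4" = c("K", "R", "H"))

# Lookup table: amino acid letter -> group label.
aa_to_group <- setNames(rep(names(groups), lengths(groups)),
                        unlist(groups, use.names = FALSE))

# Illustrative peptide, one amino acid per element.
peptide <- c("G", "N", "N", "Q", "Q", "N", "Y")
encoded <- unname(aa_to_group[peptide])

# Count contiguous 2-grams in the encoded peptide.
n <- 2
starts <- seq_len(length(encoded) - n + 1)
ngrams <- vapply(starts,
                 function(i) paste(encoded[i:(i + n - 1)], collapse = ""),
                 character(1))
table(ngrams)
```

Counts of this kind can serve as features for a random forest; how AmyloGram builds and selects its features is described in the reference given for the package.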


Protein test

Description

Checks if an object is a protein (contains letters from the one-letter amino acid code).

Usage

is_protein(object)

Arguments

object

character vector where each element represents one amino acid.

Value

TRUE or FALSE.
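
The kind of check described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: looks_like_protein and standard_aa are hypothetical names, and restricting the alphabet to the 20 standard residues while ignoring case is an assumption.

```r
# Hypothetical helper, not the package's is_protein(): checks that every
# element is a single letter from the 20 standard one-letter amino acid codes.
standard_aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                 "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

looks_like_protein <- function(object) {
  is.character(object) &&
    length(object) > 0 &&
    all(toupper(object) %in% standard_aa)
}

# One amino acid per element, as the 'object' argument expects.
looks_like_protein(c("G", "N", "N", "Q", "Q", "N", "Y"))
looks_like_protein(c("G", "N", "1", "Q"))
```

The first call returns TRUE and the second FALSE, because "1" is not an amino acid letter.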


pep424 data set

Description

Benchmark dataset for PASTA 2.0. Five sequences shorter than 6 amino acids (1% of the original dataset) were removed.

Usage

pep424

Format

A list of 424 peptides (class SeqFastaAA).

Source

Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.
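
The length filter mentioned in the description (dropping sequences shorter than 6 amino acids) can be sketched in base R as below. The peptides are toy data invented for this example, not entries from pep424, and the object names are hypothetical.

```r
# Toy data, not taken from pep424: each peptide is a character vector
# with one amino acid per element.
peptides <- list(
  pep_a = c("V", "Q", "I", "V", "Y", "K"),
  pep_b = c("K", "L", "V"),
  pep_c = c("G", "N", "N", "Q", "Q", "N", "Y")
)

# Keep only peptides with at least 6 residues.
min_length <- 6
kept <- peptides[lengths(peptides) >= min_length]
names(kept)
```

Here pep_b has only three residues and is dropped, so pep_a and pep_c remain.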


Predict amyloids

Description

Recognizes amyloids using the AmyloGram algorithm.

Usage

## S3 method for class 'ag_model'
predict(object, newdata, ...)

Arguments

object

ag_model object.

newdata

list of sequences (for example as given by read.fasta).

...

further arguments passed to or from other methods.

Examples

data(AmyloGram_model)
data(pep424)
predict(AmyloGram_model, pep424[c(4, 10)])

Print AmyloGram object

Description

Prints ag_model objects.

Usage

## S3 method for class 'ag_model'
print(x, ...)

Arguments

x

ag_model object.

...

further arguments passed to or from other methods.

Examples

data(AmyloGram_model)
print(AmyloGram_model)

Print AmyloGram prediction

Description

Prints ag_prediction objects.

Usage

## S3 method for class 'ag_prediction'
print(x, ...)

Arguments

x

ag_prediction object.

...

further arguments passed to or from other methods.

Examples

data(AmyloGram_model)
data(pep424)
pred <- predict(AmyloGram_model, pep424[c(4, 10)])
print(pred)

Read sequences from .txt file

Description

Reads sequence data saved in a text file.

Usage

read_txt(connection)

Arguments

connection

a connection to the text (.txt) file.

Details

The input file should contain one or more amino acid sequences separated by empty line(s).

Value

A list of sequences. Each element has class SeqFastaAA. If connection contains no characters, the function issues a warning and returns NULL.
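
The input layout described in Details (sequences separated by empty lines) can be illustrated with a small base R sketch. parse_blank_separated is a hypothetical helper written for this example and is not the package's read_txt; it returns plain character vectors rather than SeqFastaAA objects, and the sequences in the file are illustrative.

```r
# Write a small example file: two sequences separated by an empty line.
# The sequences are illustrative only.
tmp <- tempfile(fileext = ".txt")
writeLines(c("VQIVYK", "", "GNNQQNY"), tmp)

# Hypothetical base R parser, not the package's read_txt().
parse_blank_separated <- function(path) {
  lines <- trimws(readLines(path))
  if (length(lines) == 0 || all(lines == "")) {
    warning("No sequences found.")
    return(NULL)
  }
  # Every empty line starts a new block of sequence lines.
  block_id <- cumsum(lines == "")
  non_empty <- lines != ""
  blocks <- split(lines[non_empty], block_id[non_empty])
  # Join the lines of each block, then split into single residues.
  seqs <- lapply(blocks, function(b) strsplit(paste(b, collapse = ""), "")[[1]])
  unname(seqs)
}

seqs <- parse_blank_separated(tmp)
lengths(seqs)
```

The result is a list with one character vector per sequence, one amino acid per element, which mirrors the shape of input that predict.ag_model expects in newdata.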


Specificity/sensitivity balance

Description

Sensitivity, specificity and Matthews correlation coefficient (MCC) of AmyloGram for different cutoffs, computed on the pep424 dataset.

Usage

spec_sens

Format

A data frame with four columns and 99 rows.

Source

Walsh, I., Seno, F., Tosatto, S.C.E., and Trovato, A. (2014). PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Research gku399.
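
For readers who want to see how a table of this kind could be produced, the sketch below computes sensitivity, specificity and MCC over a grid of cutoffs in base R. The scores and labels are toy values invented for illustration, not AmyloGram predictions on pep424. The helper metrics_at, the column names and the cutoff grid from 0.01 to 0.99 are assumptions chosen to give a four-column, 99-row result; they are not taken from the package.

```r
# Toy scores and labels, invented for illustration (not from pep424).
scores <- c(0.91, 0.78, 0.62, 0.55, 0.43, 0.30, 0.22, 0.10)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: metrics for a single cutoff.
metrics_at <- function(cutoff, scores, labels) {
  pred <- scores >= cutoff
  tp <- as.numeric(sum(pred & labels))
  tn <- as.numeric(sum(!pred & !labels))
  fp <- as.numeric(sum(pred & !labels))
  fn <- as.numeric(sum(!pred & labels))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  data.frame(
    cutoff = cutoff,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    # MCC is undefined when any marginal count is zero.
    mcc = if (denom == 0) NA_real_ else (tp * tn - fp * fn) / denom
  )
}

# Assumed grid of 99 cutoffs.
cutoffs <- seq(0.01, 0.99, by = 0.01)
balance <- do.call(rbind, lapply(cutoffs, metrics_at,
                                 scores = scores, labels = labels))
dim(balance)
```

Raising the cutoff trades sensitivity for specificity; a table like this lets the user pick the cutoff that suits their tolerance for false positives.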