Weighted k-word matches: a sequence comparison tool for proteins

Date

2011

Authors

Jing, Junmei
Wilson, Susan
Burden, Conrad

Publisher

Australian Mathematical Society

Abstract

The use of k-word matches was developed as a fast, alignment-free method for comparing DNA sequences in cases where long-range contiguity has been compromised, for example by shuffling, duplication, deletion or inversion of extended blocks of sequence. Here we extend the algorithm to amino acid sequences. We define a new statistic, the weighted word match, which reflects the varying degrees of similarity between pairs of amino acids. We compute the mean and variance, and simulate the distribution function, of several forms of this statistic for sequences of independent and identically distributed letters. We present these results together with a method for choosing an optimal word size. The efficiency of the method is tested using simulated evolutionary sequences, and the results are compared with BLAST.
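To make the idea of a k-word match statistic concrete, the sketch below counts the standard unweighted statistic (often written D2): the number of position pairs (i, j) at which the length-k words of two sequences are identical. It then adds a weighted variant. The abstract does not give the exact definition of the weighted word match, so this variant is an assumption. Here each pair of k-words contributes the product of per-letter similarity weights. The weight function `toy_weight` and its "I/L/V" similarity group are hypothetical illustrations, not BLOSUM scores or the authors' weights. The expected-value helper also assumes that letters are i.i.d. within each sequence and that the two sequences are independent of each other. Under those assumptions, every letter pair in a word pair is independent, so the expectation factorises into a per-letter term raised to the power k.

```python
from collections import Counter


def kwords(seq, k):
    """All overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def d2(a, b, k):
    """Unweighted D2: number of pairs (i, j) whose k-words in a and b are identical."""
    ca, cb = Counter(kwords(a, k)), Counter(kwords(b, k))
    return sum(n * cb[w] for w, n in ca.items())


def weighted_d2(a, b, k, weight):
    """Assumed weighted form: every pair of k-words (u, v) contributes
    the product of per-letter weights weight(u[l], v[l]) for l = 0..k-1."""
    total = 0.0
    for u in kwords(a, k):
        for v in kwords(b, k):
            s = 1.0
            for x, y in zip(u, v):
                s *= weight(x, y)
            total += s
    return total


def expected_weighted_d2(m, n, k, freqs, weight):
    """Expected weighted D2 for independent sequences of lengths m and n whose
    letters are i.i.d. with probabilities freqs (a dict letter -> probability)."""
    per_letter = sum(freqs[x] * freqs[y] * weight(x, y) for x in freqs for y in freqs)
    return (m - k + 1) * (n - k + 1) * per_letter ** k


def identity_weight(x, y):
    """Exact matching only; with this weight, weighted_d2 reduces to d2."""
    return 1.0 if x == y else 0.0


def toy_weight(x, y):
    """Hypothetical weights: identity scores 1, pairs within {I, L, V} score 0.5."""
    if x == y:
        return 1.0
    if x in "ILV" and y in "ILV":
        return 0.5
    return 0.0


if __name__ == "__main__":
    print(d2("ACDA", "ACDC", 2))                  # AC and CD match: 2
    print(weighted_d2("IL", "LV", 1, toy_weight))  # 0.5 + 0.5 + 1 + 0.5 = 2.5
```

When the weight is the identity indicator, the weighted sum collapses to the exact-match count. This is a convenient sanity check for any choice of weighting. For 20 equiprobable letters and the identity weight, the per-letter term is 1/20, so the expected count for two sequences of length 100 with k = 2 is 99 × 99 × (1/20)^2.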

Source

ANZIAM Journal

Type

Journal article

Restricted until

2037-12-31