The following article is Open access

On splice site prediction using weight array models: a comparison of smoothing techniques

Published under licence by IOP Publishing Ltd
Citation: Leila Taher et al 2007 J. Phys.: Conf. Ser. 90 012004, DOI 10.1088/1742-6596/90/1/012004

Abstract

In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
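To make the zero-probability problem concrete, the following minimal sketch contrasts options (a) and (b) on a toy data set: it estimates m-tuple probabilities at one window position, first from raw relative frequencies and then with a uniform pseudo count. The function name `tuple_probabilities`, the toy sequences, and the pseudo count value of 1.0 are assumptions made for illustration and are not taken from the article. The Gaussian smoothing procedure (c) is not sketched here, because the abstract does not specify it.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def tuple_probabilities(sequences, start, m, pseudocount=0.0):
    """Estimate the probability of each m-tuple at window position `start`.

    With pseudocount=0.0 this is the plain relative frequency (no smoothing).
    With pseudocount>0 the same constant is added to every tuple count
    before normalising, so no tuple receives probability zero.
    """
    # Count the m-tuple observed at the given position in each training window.
    counts = Counter(seq[start:start + m] for seq in sequences)
    # Enumerate all 4**m possible tuples, including those never observed.
    tuples = ["".join(t) for t in product(ALPHABET, repeat=m)]
    total = sum(counts[t] for t in tuples) + pseudocount * len(tuples)
    return {t: (counts[t] + pseudocount) / total for t in tuples}

# Hypothetical aligned training windows (illustrative only, not real data).
training = ["AGGTAAGT", "CAGGTGAG", "AAGGTAAG", "TGGTAAGT"]

raw = tuple_probabilities(training, start=2, m=2)
smoothed = tuple_probabilities(training, start=2, m=2, pseudocount=1.0)

print(raw["GT"], raw["AA"])            # unseen tuple "AA" gets probability 0
print(smoothed["GT"], smoothed["AA"])  # every tuple now has nonzero probability
```

In the unsmoothed estimate, any dinucleotide absent from the training windows gets probability zero, so a candidate site containing it would be scored as impossible. The pseudo count removes those zeros at the cost of shifting probability mass away from the observed tuples, which is most noticeable when the training set is small.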
