Leila Taher et al 2007 J. Phys.: Conf. Ser. 90 012004 doi:10.1088/1742-6596/90/1/012004
Leila Taher, Peter Meinicke and Burkhard Morgenstern
In most eukaryotic genes, protein-coding exons are separated by non-coding introns, which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Computational prediction of splice sites is therefore crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of estimating parameters from m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
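To make the difference between estimation schemes (a) and (b) concrete, the following is a minimal sketch, not the authors' implementation. It assumes a first-order weight array model (m = 2, i.e. each position is conditioned on the preceding nucleotide), a uniform pseudo count added to every dinucleotide, and a toy set of aligned sequences; the function name `estimate_wam`, the pseudo count value and the example sequences are all hypothetical. The Gaussian smoothing procedure (c) is not shown because the abstract does not specify it.

```python
from collections import Counter

ALPHABET = "ACGT"

def estimate_wam(sequences, pseudocount=0.0):
    """Estimate P(x_i = b | x_{i-1} = a) for each position i >= 1.

    sequences: equal-length aligned strings over ACGT (assumed training data).
    pseudocount: 0.0 gives non-smoothed estimates; > 0 adds a uniform pseudo count.
    Returns a list whose entry i-1 maps (a, b) to the conditional probability at position i.
    """
    length = len(sequences[0])
    tables = []
    for i in range(1, length):
        # Dinucleotide (2-tuple) counts spanning positions i-1 and i.
        pair_counts = Counter((s[i - 1], s[i]) for s in sequences)
        table = {}
        for a in ALPHABET:
            total = sum(pair_counts[(a, b)] for b in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for b in ALPHABET:
                # Assumption: an unseen context without smoothing is reported as 0.0.
                table[(a, b)] = (pair_counts[(a, b)] + pseudocount) / total if total > 0 else 0.0
        tables.append(table)
    return tables

# Hypothetical toy training set, for illustration only.
training = ["AGGT", "AGGT", "CGGT", "AGGA"]
raw = estimate_wam(training)                        # (a) non-smoothed
smoothed = estimate_wam(training, pseudocount=1.0)  # (b) pseudo counts
```

In this toy setting the non-smoothed table assigns probability zero to every dinucleotide that never occurs in the training data, so any candidate site containing one receives a total probability of zero; with a pseudo count every entry stays strictly positive.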
87.15.B- Structure of biomolecules
Issue 1 (2007)