# Using a VOM model for reconstructing potential coding regions in EST sequences

@article{Shmilovici2007UsingAV, title={Using a VOM model for reconstructing potential coding regions in EST sequences}, author={Armin Shmilovici and Irad Ben-Gal}, journal={Computational Statistics}, year={2007}, volume={22}, pages={49-69} }

This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage in using VOM models is that their order may vary for different sequences, depending on the sequences’ statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for…
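To make the idea of a context-dependent order concrete, here is a minimal sketch of a variable order Markov model over DNA. It is not the paper's modified VOM model: the count-threshold pruning rule, the add-one smoothing, and the names `train_vom`, `prob`, `log_likelihood`, `max_order`, and `min_count` are all assumptions made for illustration. Contexts that are rarely seen in training are dropped, so prediction falls back to a shorter context, which is what lets the effective order vary with the statistics of the data.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"


def train_vom(sequences, max_order=3, min_count=2):
    """Count next-symbol occurrences for every context of length 0..max_order.

    Contexts observed fewer than min_count times are pruned (an assumed,
    simplified rule); the empty context is always kept when it was observed.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[seq[i - k:i]][sym] += 1
    return {
        ctx: dict(nxt)
        for ctx, nxt in counts.items()
        if ctx == "" or sum(nxt.values()) >= min_count
    }


def prob(model, history, sym, max_order=3):
    """P(sym | longest retained suffix of history), with add-one smoothing."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in model:
            nxt = model[ctx]
            return (nxt.get(sym, 0) + 1) / (sum(nxt.values()) + len(ALPHABET))
    # Empty model (no training data): fall back to a uniform distribution.
    return 1.0 / len(ALPHABET)


def log_likelihood(model, seq, max_order=3):
    """Sum of log-probabilities of each symbol given its preceding symbols."""
    return sum(
        math.log(prob(model, seq[:i], sym, max_order))
        for i, sym in enumerate(seq)
    )


if __name__ == "__main__":
    model = train_vom(["ACGACGACGACG"], max_order=2, min_count=2)
    print(prob(model, "AC", "G", max_order=2))
    print(log_likelihood(model, "ACGACG", max_order=2))
    print(log_likelihood(model, "GGGGGG", max_order=2))
```

In a coding versus noncoding setting, one plausible use of such a sketch is to train one model per class and compare the log-likelihoods of a query segment under each; whether the paper does exactly this is not stated in the abstract above.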

#### 19 Citations

Gene-finding with the VOM model

- Mathematics, Computer Science
- J. Comput. Methods Sci. Eng.
- 2007

Experiments with the proposed gene-finder (GF) on three prokaryotic genomes indicate its potential advantage on the detection of short genes.

Single Species Gene Finding

- Computer Science
- 2015

This chapter covers five of the most commonly used mathematical models in single species gene finding: hidden Markov models, generalized hidden Markov models, interpolated Markov models, neural networks, and decision trees.

MicroRNA Prediction Using a Fixed-Order Markov Model Based on the Secondary Structure Pattern

- Biology, Medicine
- PloS one
- 2012

A new generation of miRNA prediction algorithm is provided, which successfully realizes a full-function recognition of the mature miRNAs directly from the hairpin sequences and presents a new understanding of the biological recognition based on the strongest signal’s location detected by FOMmiR.

Classical and Quantum Algorithms for Constructing Text from Dictionary Problem

- Computer Science, Mathematics
- Nat. Comput.
- 2021

The classical algorithm is optimal up to a log factor, and the quantum algorithm shows a speed-up compared to any classical algorithm in the case of non-constant length of strings in the dictionary.

Representing higher-order dependencies in networks

- Computer Science, Physics
- Science Advances
- 2016

The higher-order network (HON) representation is proposed, including accuracy, scalability, and direct compatibility with the existing suite of network analysis methods, and it is illustrated how HON can be applied to a broad variety of tasks, such as random walking, clustering, and ranking.

Distributions of pattern statistics in sparse Markov models

- Mathematics
- Annals of the Institute of Statistical Mathematics
- 2019

Markov models provide a good approximation to probabilities associated with many categorical time series, and thus they are applied extensively. However, a major drawback associated with them is that…

High-Order Entropy-Based Population Diversity Measures in the Traveling Salesman Problem

- Computer Science, Medicine
- Evolutionary Computation
- 2020

Three types of population diversity measures that address high-order dependencies between the variables are proposed, in order to investigate the effectiveness of considering such dependencies.

A boosting method with asymmetric mislabeling probabilities which depend on covariates

- Mathematics, Computer Science
- Comput. Stat.
- 2012

A new boosting method for a kind of noisy data is developed, where the probability of mislabeling depends on the label of a case. The mechanism of the model is based on a simple idea and gives…

Representing Big Data as Networks: New Methods and Insights

- Computer Science, Physics
- ArXiv
- 2017

This dissertation proposes the higher-order network, which is a critical piece for representing higher-order interaction data; it introduces a scalable algorithm for building the network and visualization tools for interactive exploration, and presents broad applications of the higher-order network in the real world.

Measuring the Efficiency of the Intraday Forex Market with a Universal Data Compression Algorithm

- Economics
- 2009

Universal compression algorithms can detect recurring patterns in any type of temporal data, including financial data, for the purpose of compression. The universal algorithms actually find a model of…

#### References

ESTScan: A Program for Detecting, Evaluating, and Reconstructing Potential Coding Regions in EST Sequences

- Biology, Computer Science
- ISMB
- 1999

It is shown that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors.

Modeling sequencing errors by combining Hidden Markov models

- Computer Science, Medicine
- ECCB
- 2003

This research improves the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs.

A VOM based gene-finder that specializes in short genes

- Computer Science
- 2004 23rd IEEE Convention of Electrical and Electronics Engineers in Israel
- 2004

The proposed VOM gene-finder outperforms traditional gene-finders that are based on fifth-order Markov models for short newly sequenced bacterial genomes.

Interpolated Markov chains for eukaryotic promoter recognition

- Computer Science, Biology
- Bioinform.
- 1999

A new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes based on three interpolated Markov chains of different order which are trained on coding, non-coding and promoter sequences is described.

ExonHunter: a comprehensive approach to gene finding

- Biology, Medicine
- ISMB
- 2005

ExonHunter is a new and comprehensive gene finding system that outperforms existing systems and features several new ideas and approaches and gives a new method for modeling the length distribution of intergenic regions in hidden Markov models.

DIANA-EST: a statistical analysis

- Medicine, Computer Science
- Bioinform.
- 2001

The goal of this work is the development of a new program called DNA Intelligent Analysis for ESTs (DIANA-EST) based on a combination of Artificial Neural Networks and statistics for the characterization of the coding regions within ESTs and the reconstruction of the encoded protein.

Finding borders between coding and noncoding DNA regions by an entropic segmentation method.

- Computer Science, Medicine
- Physical review letters
- 2000

It is found that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets.

Variations on probabilistic suffix trees: statistical modeling and prediction of protein families

- Biology, Computer Science
- Bioinform.
- 2001

Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model that is trained from a multiple alignment of the input sequences, while being much faster.

Assessment of protein coding measures.

- Biology, Medicine
- Nucleic acids research
- 1992

This paper reviews and synthesizes the underlying coding measures from published algorithms and concludes that a very simple and obvious measure, counting oligomers, is more effective than any of the more sophisticated measures.

EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance

- Medicine, Biology
- BMC Bioinformatics
- 2002

EasyGene is a new automated gene-finding method that estimates the statistical significance of a predicted gene based on a hidden Markov model (HMM) that is automatically estimated for a new genome.