Sequence Collection: Entries from 3699 RefSeq prokaryotic genome, plasmid, or bacteriophage projects available in Nov 2012 were searched with the Bruce and Aragorn tmRNA-finding software. Complete bacterial genomes that did not yield a convincing Bruce/Aragorn tmRNA sequence were searched by multiple additional methods; this identified additional tmRNA sequences but left, for the first time, some complete bacterial genomes with no convincing tmRNA sequence. The resulting sequences were merged with those from the previous collection, yielding 1633 unique primary tmRNA sequences. These sequences were used to query all NCBI BLAST nucleotide databases (except pdbnt), retaining only hits that were perfect complete matches, or perfect matches truncated at entry ends, and that unambiguously matched only one query. Among the 12509 hits, some allowed completion of previously truncated primary sequences.
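The hit-retention rule can be sketched in Python. This is a minimal illustration, not the pipeline actually used: the `Hit` record, its field names, and the helpers `classify`, `locus`, and `retain` are all hypothetical, and "unambiguously matching only one query" is interpreted here as "no other retained query hits the same subject interval".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST alignment (coordinates 1-based, inclusive)."""
    query_id: str
    query_len: int
    q_start: int
    q_end: int
    subject_id: str
    subject_len: int
    s_start: int  # s_start > s_end denotes a minus-strand alignment
    s_end: int
    mismatches: int
    gaps: int


def classify(hit):
    """Return 'complete', 'truncated', or None if the hit is rejected."""
    if hit.mismatches or hit.gaps:
        return None  # only perfect matches are kept
    missing5 = hit.q_start - 1             # unaligned query bases at the 5' end
    missing3 = hit.query_len - hit.q_end   # unaligned query bases at the 3' end
    if missing5 == 0 and missing3 == 0:
        return "complete"
    if hit.s_start <= hit.s_end:           # plus strand
        at_edge5 = hit.s_start == 1
        at_edge3 = hit.s_end == hit.subject_len
    else:                                  # minus strand
        at_edge5 = hit.s_start == hit.subject_len
        at_edge3 = hit.s_end == 1
    # A partial match is acceptable only if each missing part runs off an entry end.
    if (missing5 == 0 or at_edge5) and (missing3 == 0 or at_edge3):
        return "truncated"
    return None


def locus(hit):
    """Strand-independent subject interval covered by the hit."""
    return (hit.subject_id, min(hit.s_start, hit.s_end), max(hit.s_start, hit.s_end))


def retain(hits):
    """Keep accepted hits whose subject interval is matched by exactly one query."""
    accepted = [(h, classify(h)) for h in hits]
    accepted = [(h, kind) for h, kind in accepted if kind is not None]
    queries_at = defaultdict(set)
    for h, _ in accepted:
        queries_at[locus(h)].add(h.query_id)
    return [(h, kind) for h, kind in accepted if len(queries_at[locus(h)]) == 1]
```

A truncated hit is the case most likely to be ambiguous, since a short fragment at an entry end may be a perfect match to more than one primary sequence; the sketch drops such hits rather than assigning them arbitrarily.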
Gene Coordinates: Determining the gene coordinates for standard tmRNA genes is straightforward because of their canonical tRNA-like sequences. For the permuted genes that encode two-piece tmRNAs, the outer gene coordinates are idiosyncratic and not tRNA-like. We use data from Mao et al. 2009, Sharkady and Williams 2004, and Gaudin et al. 2002 on promoter relationships and terminal secondary structures to make these calls.
Abbreviated Taxonomy: We use NCBI taxonomy, but in an abbreviated three-level form, to speed taxonomy-based browsing. Each taxon is assigned a "Phylum" name that is usually the legitimate phylum name of the taxon; otherwise it is BACTERIOPHAGES, Bacteria (when no phylum is assigned), artificial sequences, or unclassified sequences. The two largest phyla, Firmicutes and Proteobacteria, are further broken down by class. For hits to eukaryotic taxa, a meaningful high-level taxonomic grouping near the phylum level was chosen, and these were sorted into mitochondrial, plastid, or "co-eukaryote" categories, the latter being probable bacterial symbiont or microflora tmRNA sequences found contaminating other eukaryotic genome projects. Each taxon is also assigned a "Genus" name, which is simply the first word (unless "Candidatus") from the NCBI species name; for the metagenomic taxon names of the "unclassified sequences" Phylum, these are descriptors such as "marine", "mine", and "metagenome".
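The "Genus" rule is simple enough to sketch in Python. The function name `genus_label` is hypothetical, and the sketch assumes that "unless Candidatus" means the provisional-status word is skipped and the next word is taken; the text does not spell this out.

```python
def genus_label(species_name):
    """Return the abbreviated 'Genus' label for an NCBI species name.

    Assumption: a leading 'Candidatus' is skipped and the following
    word is used instead.
    """
    words = species_name.split()
    if not words:
        raise ValueError("empty species name")
    if words[0] == "Candidatus" and len(words) > 1:
        return words[1]
    return words[0]
```

Under this rule "Escherichia coli" is filed under "Escherichia", and a metagenomic name such as "marine metagenome" is filed under the descriptor "marine".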
Sequence Webpages: There is one page per primary tmRNA sequence. Each is named according to its first-sequenced taxon.
Sequence: The following are shown in lower case: the tag reading frame, the three positions of the CCA tail (regardless of the DNA sequence found there), the intervening sequence in two-piece tmRNA genes, and group I introns.
Proteolysis Tag: The encoded proteolysis tag peptide sequence, preceded by (A) as a reminder that the alanine charged onto tmRNA itself is the first residue added, and followed by one asterisk for each tandem stop codon.
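The tag display format can be illustrated with a short Python sketch. The function name `format_tag` is hypothetical, and the sketch assumes the standard codon assignments (shared by NCBI translation tables 1 and 11); genomes with other genetic codes, for example those that read TGA as tryptophan, would need a different table.

```python
from itertools import product

# Standard codon assignments, bases ordered T, C, A, G (assumed genetic code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def format_tag(frame):
    """Format a tag reading frame (DNA or RNA, any case) as '(A)PEPTIDE**'.

    The frame is expected to start at the first tag codon and to include
    the terminal stop codon(s); a trailing partial codon is ignored.
    """
    seq = frame.upper().replace("U", "T")
    residues = []
    stops = 0
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            stops += 1       # count each tandem stop codon
        elif stops:
            break            # a sense codon ends the run of stops
        else:
            residues.append(aa)
    # '(A)' stands for the alanine carried by the charged tmRNA itself.
    return "(A)" + "".join(residues) + "*" * stops
```

For the illustrative input `gcaaacgacgaaaactacgctttagcagcttaataa`, which ends in two stop codons, the function returns `(A)ANDENYALAA**`.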
CDS: coordinates of the proteolysis tag reading frame.
Acceptor Piece, IVS, Coding Piece: coordinates of the three segments of permuted tmRNA genes.
Exon1, Intron, Exon2: coordinates of the three segments of group I intron-interrupted tmRNA genes.
CCA: coordinates and genome-encoded sequence of the three positions of the CCA acceptor tail of mature tmRNA.
Identical tmRNA Sequences: hits to GenBank entries are grouped by taxon, showing the accession and coordinates hit; truncated hits are noted.
SmpB: NCBI DNA sequence files were six-frame translated, and SmpB sequences were collected using five CDD profiles (cd09294, COG0691, pfam01668, PRK05422, TIGR00086) with RPSTBLASTN and the Pfam HMM SmpB with HMMER. The sequences presented here run from the start codon, which we determined comparatively, to the stop codon. Amino acids whose codons are potential start codons are capitalized in the sequence text; Z symbolizes the stop.
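The six-frame translation and the display convention for SmpB sequences can be sketched in Python. This is an illustration only: the names `translate_marked` and `six_frames` are hypothetical, the standard codon assignments are assumed, the set of "potential start codons" is assumed to be ATG, GTG, and TTG (the text does not list them), and codons containing ambiguous bases are rendered as `x`.

```python
from itertools import product

# Standard codon assignments, bases ordered T, C, A, G (assumed genetic code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START_CODONS = {"ATG", "GTG", "TTG"}  # assumption, not stated in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_marked(seq):
    """Translate one frame: start-capable residues upper case, others lower, stop as Z."""
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon, "x")  # 'x' for codons with ambiguous bases
        if aa == "*":
            out.append("Z")
        elif codon in START_CODONS:
            out.append(aa)                # potential start codon: capitalized
        else:
            out.append(aa.lower())
    return "".join(out)


def six_frames(dna):
    """Return the six marked translations keyed '+1'..'+3' and '-1'..'-3'."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate_marked(seq[offset:])
    return frames
```

For the toy input `ATGAAAGTGTAA`, frame `+1` reads `MkVZ`: the ATG and GTG codons give capitalized residues, AAA gives a lower-case `k`, and the TAA stop is shown as `Z`.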