Nucleotides are the molecules that make up RNA (ribonucleic acid) and DNA (deoxyribonucleic acid); they are the basic building blocks of nucleic acids. RNA and DNA are polymers made up of long chains of nucleotides. A nucleotide consists of a sugar molecule (ribose in RNA, deoxyribose in DNA) attached to a phosphate group and a nitrogen-containing base. The bases used in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T). In RNA, the base uracil (U) takes the place of thymine. Nucleotide sequence homology (shared ancestry, inferred from sequence similarity) cannot be reliably detected below roughly 75% identity. Below 50% identity, most hits in a database search are probably noise. Most nucleotide searches are therefore for medium- or high-identity matches, where the NCBI algorithms are usually effective.
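To make the identity thresholds above concrete, here is a minimal sketch of how percent identity can be computed for two sequences that have already been aligned. The function names (`percent_identity`, `identity_band`) and the band labels are illustrative assumptions, not part of any NCBI tool, and the sketch ignores columns containing gaps, whereas BLAST reports identity over the full alignment length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of compared columns in which both sequences carry the same base.

    Assumption: seq_a and seq_b are already aligned (equal length, '-' for gaps).
    Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted in this simplified definition
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def identity_band(pct: float) -> str:
    """Map a percent identity onto the rough thresholds quoted in the text."""
    if pct >= 75.0:
        return "homology likely detectable"
    if pct >= 50.0:
        return "unreliable"
    return "probably noise"


if __name__ == "__main__":
    pct = percent_identity("ACGTACGT", "ACGTACGA")
    print(pct, identity_band(pct))
```

The 75% and 50% cut-offs are taken directly from the paragraph above; they are rules of thumb, not hard limits.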
Nucleotide searching uses Entrez's Nucleotide database, where gene and nucleotide sequences are freely searchable. The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA (Third Party Annotation) and PDB (Protein Data Bank). When searching databases of nucleotide or protein sequences, one of the main tasks is finding a local alignment between two sequences. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery. Gene and nucleotide sequences are used to find:
You can reach the Nucleotide database from PubMed’s main page, under "Popular" (at the bottom of the page), or by switching databases: the pull-down menu above the search box lets you select Nucleotide. When possible, run BLAST on amino acid sequences using blastp. For nucleotide sequences, blastn assumes that all base substitutions are equally likely, which is not true. The rate of transition mutations (purine to purine or pyrimidine to pyrimidine) is approximately 1.5 to 5 times that of transversion mutations (purine to pyrimidine or vice versa) in all genomes where it has been measured (see Wakeley, Mol Biol Evol 11(3):436-42, 1994).
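The distinction between transitions and transversions can be illustrated with a short sketch that classifies each mismatch between two aligned sequences. The function names are hypothetical and the input is assumed to be pre-aligned; this is not how BLAST scores alignments, only a way to see the two substitution classes side by side.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(base_a: str, base_b: str) -> str:
    """Return 'match', 'transition', 'transversion', or 'other' for one column."""
    # Treat RNA uracil as thymine so DNA and RNA input behave the same way.
    a = base_a.upper().replace("U", "T")
    b = base_b.upper().replace("U", "T")
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        return "other"  # gaps or ambiguity codes are not classified
    if a == b:
        return "match"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a, seq_b):
        kind = classify_substitution(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    ts, tv = count_ts_tv("AACCGGTT", "GACTGTTA")
    print(f"transitions={ts} transversions={tv}")
```

A scoring scheme that penalizes every mismatch equally treats both classes the same, which is the limitation of blastn described above.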
Code degeneracy. Most amino acids are coded by more than one codon (e.g., serine is coded by both UCU and AGC, among others). As a result, nucleotide sequences that differ substantially can encode the same protein, so a nucleotide-level comparison can miss similarity that is clear at the protein level. It is nevertheless useful to run BLAST on nucleotide sequences. Treat it like an experiment: try blastn, megablast, and blastx or tblastx.
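The effect of code degeneracy can be seen in a small sketch that translates two nucleotide sequences with the standard genetic code. The helper names (`translate`, `synonymous_codons`) are illustrative assumptions; the example sequences are made up for the demonstration, and only the first reading frame is translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a nucleotide sequence in frame 1; unknown codons become 'X'."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def synonymous_codons(amino_acid: str) -> list[str]:
    """All DNA codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    seq_1 = "UCUUCUUCU"  # three serine codons (UCU)
    seq_2 = "AGCAGCAGC"  # three serine codons (AGC)
    shared = sum(a == b for a, b in zip(seq_1, seq_2))
    print(translate(seq_1), translate(seq_2), shared)
    print(synonymous_codons("S"))
```

The two example sequences share no bases at any position, yet both translate to the same peptide. This is why translated searches such as blastx and tblastx, which compare sequences at the protein level, can be worth trying alongside blastn.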
BLAST search tool
What is being searched?
Nucleotide searches retrieve results from three databases:
Sources for the sequences
How to search Nucleotide
How search results are displayed
By default, results are displayed from newest to oldest, according to the date each sequence was entered into the database. Searchers can instead sort results by accession number, organism name, taxonomy ID, or the date the entry was modified or released. Click on the “Sort By” drop-down menu to make your selection.
How to read a sequence record
Records can be viewed in a number of formats, including GenBank, FASTA, Graphics, ASN.1, Revision History, and GenBank (Full). The following is a description of the GenBank flat file, the default display.
See a sample sequence record.
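Of the display formats listed above, FASTA is the simplest to read programmatically: each record is a header line beginning with ">" followed by one or more lines of sequence. The following is a minimal sketch of a reader for text in that format. The function name `parse_fasta` and the sample record (including its identifier) are hypothetical and are not taken from the Nucleotide database.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs.

    The header is returned without the leading '>'; sequence lines are
    joined into one string. Blank lines are ignored.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = ">EXAMPLE001.1 hypothetical example sequence\nACGTACGTAC\nGGTTAACC\n"
    for header, sequence in parse_fasta(sample):
        print(header, len(sequence))
```

For real work, an established library such as Biopython is a more robust choice than a hand-written parser; the sketch is only meant to show how little structure the format has.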