Condensed representation

The chemical structure representation of a protein sequence can be compressed efficiently by representing each unmodified natural amino acid as a single so-called pseudo atom.

[Image: Example protein sequence.]

Example protein sequence. The second residue of the second chain has been modified from Glu to 4-carboxy-glutamate.

[Image: Full structure version of example protein.]

Full structure representation of example protein sequence.

[Image: Condensed structure version of example protein.]

Condensed structure representation of example protein sequence. Only the 4-carboxy-glutamate is represented as a full structure. All other residues are represented by single atoms.
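The condensing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Residue` class, the `condense` function, and the two short chains are hypothetical names and data invented for this sketch, and the output is a plain list of labelled nodes rather than an actual molfile.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The 20 standard amino acids, by three-letter code.
NATURAL = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

@dataclass(frozen=True)
class Residue:
    """Hypothetical residue record: a code plus an optional modification name."""
    code: str
    modification: Optional[str] = None

def condense(chain: List[Residue]) -> List[Tuple[str, str]]:
    """Map each residue to either a pseudo atom or a full-structure node."""
    nodes = []
    for residue in chain:
        if residue.modification is None and residue.code in NATURAL:
            # Unmodified natural residue: a single pseudo atom is enough.
            nodes.append(("pseudo", residue.code))
        else:
            # Modified or non-natural residue: keep the full structure.
            nodes.append(("full", residue.modification or residue.code))
    return nodes

# Illustrative two-chain sequence (assumed data, not the example in the figure).
# The second residue of the second chain is Glu modified to 4-carboxy-glutamate.
chains = [
    [Residue("Gly"), Residue("Ile"), Residue("Cys")],
    [Residue("Phe"), Residue("Glu", "4-carboxy-glutamate"), Residue("Cys")],
]
condensed = [condense(chain) for chain in chains]
```

In this sketch only the modified residue ends up as a `"full"` node; every other residue collapses to one `"pseudo"` node.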


Below are examples of full structure molfiles generated from protein entries with a varying number of expressed amino acids.

If you attempt to register these molecules in a normal chemistry database, you will find that registration eventually becomes a very lengthy process as the number of residues grows.

Using the condensed representation instead greatly reduces the size and complexity of the molfile, without loss of chemical information. Compare the file sizes and registration performance of the full-structure files above with the equivalent condensed representations using pseudo atoms below.
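As a rough back-of-the-envelope illustration of the size difference, the following Python sketch compares atom counts for an unmodified linear chain. It is an estimate under stated assumptions: hydrogens are ignored, the chain has no modified residues, and the function names are hypothetical. Since a molfile atom block holds one line per atom, the atom count is a reasonable proxy for file size.

```python
from typing import List

# Heavy (non-hydrogen) atoms per residue inside a peptide chain,
# i.e. the free amino acid minus the oxygen lost on peptide bond formation.
HEAVY_ATOMS = {
    "Gly": 4, "Ala": 5, "Ser": 6, "Cys": 6, "Val": 7, "Thr": 7, "Pro": 7,
    "Leu": 8, "Ile": 8, "Asn": 8, "Asp": 8, "Met": 8, "Gln": 9, "Glu": 9,
    "Lys": 9, "His": 10, "Phe": 11, "Arg": 11, "Tyr": 12, "Trp": 14,
}

def full_atom_count(chain: List[str]) -> int:
    """Heavy atoms in the full structure of one unmodified linear chain."""
    # The extra 1 is the C-terminal hydroxyl oxygen.
    return sum(HEAVY_ATOMS[code] for code in chain) + 1

def condensed_atom_count(chain: List[str]) -> int:
    """Pseudo atoms needed when every residue is unmodified: one per residue."""
    return len(chain)

tripeptide = ["Gly", "Ala", "Trp"]
full = full_atom_count(tripeptide)            # 4 + 5 + 14 + 1 = 24
condensed = condensed_atom_count(tripeptide)  # 3
```

For a hypothetical chain of 100 alanine residues, the same functions give 501 heavy atoms in the full structure against 100 pseudo atoms in the condensed one.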

The condensed representation above is known as the "simple condensed" format. Symyx/MDL Draw users will probably want to use the "MDL condensed format", which includes extra annotations that enable the drawing tool to expand and contract residues and to lay out disulfide bridges neatly.