Creating a new plugin: 5

Displaying bonds between atoms

Now that we have atoms, we need to display the bonds between them. There are different kinds of bonds we might want to show: single bonds, double bonds, peptide bonds (which are single bonds but we might want to display them differently) and disulphide bonds between peptide chains, if there is more than one chain.

None of this is difficult. All we need to do is create a spline to represent each bond, and this can then be rendered directly or dropped into a Sweep object to create geometry. But where are the splines located? We can draw them between atom positions, which we already know from creating the atom spheres. The problem is, which atoms connect to which? Unlike .mol or .sdf files, PDB files contain little or no detail about bonds. Some files contain CONECT records, each of which specifies the bonds from one atom to others, but there are usually only a few of them and they are only present when there is a reason for them.
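To show what a CONECT record gives us when one is present, here is a minimal sketch in Python (not the plugin's own code) that reads one. It assumes the usual fixed-width layout, with the atom's serial number in columns 7-11 and up to four bonded serial numbers in the five-character fields that follow. The function name `parse_conect` and the serial numbers in the example are made up for illustration.

```python
def parse_conect(line):
    """Return (atom_serial, [bonded_serials]) for one CONECT record."""
    if not line.startswith("CONECT"):
        raise ValueError("not a CONECT record")
    # Serial numbers sit in fixed five-character fields: columns 7-11 hold
    # the atom itself, columns 12-31 hold up to four bonded atoms.
    fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
    serials = [int(f) for f in fields if f]
    return serials[0], serials[1:]


# Illustrative record with made-up serial numbers.
example = "CONECT" + "".join(f"{n:5d}" for n in (413, 412, 414))
atom, bonded = parse_conect(example)

# Store each bond once, whichever atom's record mentions it.
bonds = {tuple(sorted((atom, other))) for other in bonded}
print(sorted(bonds))
```

Sorting each pair before adding it to the set means a bond listed from both of its atoms is still only stored once.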

Take a look at this slice from the glucagon PDB file:

Leucine shown in a PDB file.

This is a single leucine residue. There are eight atoms (the hydroxyl group which is part of the terminating carboxyl group is not included in the atoms list because it is lost when one amino acid is linked to another). We can see from the rightmost column that there are six carbon atoms, a nitrogen and an oxygen. You might guess that, in the order shown, the atoms simply bond to one another in a string, but that's definitely not the case. The leucine molecule looks like this (carbon is grey, oxygen red and nitrogen blue):

Leucine molecule (hydrogen atoms not shown); alpha carbon indicated

Each atom is given a name so that we can tell which atom bonds to which other one(s) - these names are the strings in the third column: 'N', 'CA', 'CD1' and so on. Note that 'CA' is absolutely NOT the atomic symbol for calcium! To understand how this works, it's useful to know a couple of details about amino acid structure. All amino acids have an amino group (-NH2) and a carboxyl group (-COOH). Both these groups link to the same carbon atom, known as the 'alpha carbon'. In fact, in the very simplest of all amino acids, glycine, that's all there is, as shown here with the alpha carbon indicated:

Glycine molecule (hydrogen atoms not shown); alpha carbon indicated

Atom names and their use

All amino acids other than glycine have a side chain, which also links to the alpha carbon. So, going back to the slice from the PDB file, it seems a fair guess that 'CA' is the alpha carbon - which is correct. 'N' is the nitrogen from the amino group, 'C' is the carbon of the carboxyl group, and 'O' is the oxygen in the carboxyl group, linked by a double bond to 'C'.
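As a rough illustration of where these names sit in the file, the sketch below pulls the atom name, residue name, position and element out of one ATOM record using the standard fixed-width columns. It is written in Python for brevity and is not the plugin's own code. The `Atom` class and the `parse_atom` and `format_atom` helpers are hypothetical names, and the demo line uses made-up coordinates rather than values from the glucagon file.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str        # e.g. 'CA', unique within its residue
    res_name: str    # e.g. 'LEU'
    chain: str
    res_seq: int
    pos: tuple       # (x, y, z) in angstroms
    element: str     # e.g. 'C'


def parse_atom(line):
    """Read the fixed-width fields of one ATOM record (1-based columns noted)."""
    return Atom(
        name=line[12:16].strip(),       # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain=line[21],                 # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        pos=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        element=line[76:78].strip(),    # columns 77-78
    )


def format_atom(serial, name, res_name, chain, res_seq, pos, element):
    """Build an ATOM line for the demo; occupancy and B-factor are dummies."""
    x, y, z = pos
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


# Made-up alpha carbon of a leucine residue.
line = format_atom(1, " CA ", "LEU", "A", 14, (1.0, 2.0, 3.0), "C")
atom = parse_atom(line)
print(atom.name, atom.element, atom.pos)
```

The name field reads 'CA' while the element field reads 'C', which is what separates an alpha carbon from a calcium atom, whose element field would read 'CA'.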

Fine. The other four carbons in a leucine molecule are the side chain. As you might expect, the first is 'CB', which links to 'CA'. But...then there is a 'CG' followed by two 'CD' atoms, 'CD1' and 'CD2'. Where do those names come from? Why aren't they in alphabetical order? In fact they are, but to understand this, there are two important principles: no atoms in the same residue can have the same name; and the first letter of the atom's name identifies the element (so 'CG1' would be a carbon, 'NZ' would be a nitrogen, and so on).

What happens when you have two or more atoms of the same type? This is where the trailing characters 'B' or 'G' or 'D1' come into play. The concept is that as an atom increases in remoteness - as measured by the number of intervening links - from the alpha carbon 'CA' it gets a suffix in alphabetic sequence. To make sense of this, you have to know that the sequence used is the Greek alphabet. This goes alpha, beta, gamma, delta, etc. These characters can't be used in a plain ASCII file, so there are corresponding letters like so:

  1. alpha = A
  2. beta = B
  3. gamma = G
  4. delta = D
  5. epsilon = E
  6. zeta = Z
  7. eta = H
  8. ...and so on

This explains the letter sequence. The first side-chain carbon atom linked to 'CA' is 'CB'. Linked to that is 'CG' (not 'CC' as you might expect - Greek alphabet, remember). The next would be 'CD' but if you look at the image of the leucine molecule above, you can see that there are two carbon atoms linked to 'CG' and both have the same remoteness from 'CA'. Since they can't have the same name, they are named 'CD1' and 'CD2'.

I mention all this, not because a user of the loader needs to understand it, but because if you are trying to make sense of a PDB file it is so easy to become confused (as I did). For example, suppose an atom has the name 'NH2'. It is difficult at first to realise that this is NOT an amino group with one nitrogen and two hydrogen atoms, but a nitrogen atom at the 'eta' level of remoteness and that there is at least one other atom at the same level.
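The naming scheme can be summed up in a few lines of code. This is only a sketch in Python, assuming one-letter element symbols (true for the atoms in the standard amino acids) and covering only the Greek letters listed above. The function name `decode_atom_name` is my own invention.

```python
GREEK = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta",
         "E": "epsilon", "Z": "zeta", "H": "eta"}


def decode_atom_name(name):
    """Split an atom name into (element, remoteness, branch).

    Assumes a one-letter element symbol. Backbone names with no suffix
    ('N', 'C', 'O') return None for both remoteness and branch.
    """
    element = name[0]
    remoteness = GREEK.get(name[1]) if len(name) > 1 else None
    branch = int(name[2:]) if len(name) > 2 and name[2:].isdigit() else None
    return element, remoteness, branch


for name in ("N", "CA", "CB", "CD1", "NH2"):
    print(name, decode_atom_name(name))
```

Run on 'NH2', this gives a nitrogen at the eta level with branch number 2, not an amino group.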

What do we do with all this information?

Once you understand all this, you can see how links can be made between atoms in the molecule. Fortunately, in the vast majority of cases the atoms in any given amino acid are the same in any protein, and the internal structure of the molecule is unchanged. How do we use that data to work out the bonds?

There are three ways it could be done. They are:

  1. Assume that atoms join to the nearest atom to them in physical space. We can work that out by checking the distance between atoms. With leucine, doing that would work fine, but for other molecules it might not, so bonds might not be made where they should or might be made where they don't exist in reality. Despite these problems, it may be necessary to do that if we have to handle molecules whose internal structure is not always known.
  2. Develop an algorithm that works out, depending on atom names, which other atom(s) they should link to. Again, this would work with leucine as it's such a simple molecule, but for others it might (would!) become extremely complicated.
  3. Since there are only 20 'standard' amino acids in proteins plus two others that are much less common, and since the internal structure of the amino acid is fixed, we could produce a lookup table of bonds in any amino acid. For example, for leucine that table would specify seven bonds - CA to N, CA to C, C to O, CA to CB, CB to CG, CG to CD1 and CG to CD2. All we would need to do for each residue is to get the list of bonds for that residue type, then for each bond get the positions of the relevant atoms in that specific residue in the PDB file, which would give us the two points of the spline representing the bond. The lookup table could also specify the type of bond - single, double, etc.
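For comparison, option 1 might look something like the sketch below. Note that it uses a distance cutoff rather than strictly the single nearest atom, since an atom can bond to more than one neighbour. The 1.9 angstrom cutoff is an assumed value chosen for illustration, and the coordinates are made up. This is Python, not the plugin's own code.

```python
import math
from itertools import combinations


def guess_bonds(atoms, max_dist=1.9):
    """Pair up atoms closer than max_dist (angstroms, an assumed cutoff)."""
    bonds = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(atoms.items(), 2):
        if math.dist(pos_a, pos_b) <= max_dist:
            bonds.append((name_a, name_b))
    return bonds


# Three made-up atoms in a straight line, 1.5 angstroms apart.
demo = {
    "CA": (0.0, 0.0, 0.0),
    "CB": (1.5, 0.0, 0.0),
    "CG": (3.0, 0.0, 0.0),
}
print(guess_bonds(demo))
```

The weakness described in option 1 shows up in the cutoff: set it too low and real bonds are missed, set it too high and atoms that merely sit close together get joined.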

I'm going to use option 3, because although it involves the most initial work in determining all those bonds, coding the bonds is easiest that way. It does not, of course, take into account any other molecules in a protein that aren't amino acids, but we'll get to that later. So in the next part of this series, we'll code the addition of bonds in the amino acid residues themselves.
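To give an idea of the shape of option 3 before we code it properly, here is a minimal sketch in Python. The table holds only the seven leucine bonds listed above, with C to O marked as the double bond. The names `RESIDUE_BONDS` and `bond_segments` are hypothetical, and the atom positions are placeholders, not real coordinates.

```python
# Bond table for one residue type, taken from the leucine example.
RESIDUE_BONDS = {
    "LEU": [
        ("CA", "N", "single"), ("CA", "C", "single"), ("C", "O", "double"),
        ("CA", "CB", "single"), ("CB", "CG", "single"),
        ("CG", "CD1", "single"), ("CG", "CD2", "single"),
    ],
}


def bond_segments(res_name, positions):
    """Return (start, end, bond_type) for each bond whose atoms are present."""
    segments = []
    for a, b, kind in RESIDUE_BONDS.get(res_name, []):
        if a in positions and b in positions:
            segments.append((positions[a], positions[b], kind))
    return segments


# Placeholder positions keyed by atom name, as read from one residue.
names = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]
positions = {name: (float(i), 0.0, 0.0) for i, name in enumerate(names)}

for start, end, kind in bond_segments("LEU", positions):
    print(start, end, kind)
```

Each returned pair of positions is the two points of one bond spline. A residue type that is not in the table simply yields no bonds, which is the gap for non-amino-acid molecules mentioned above.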

Page last updated May 26th 2026

© 2021-2025 Microbion. All Rights Reserved.