Counting and Enumerating Isomers

Structural isomers are chemical compounds having the same molecular formula (i.e., the same atoms) but different structural formulas (i.e., different topologies). Isomer counting and enumerating are long standing problems. Isomer counting was first studied more than 100 years ago by A. Caley who developed a counting series to compute the number of isomers of paraffin structures. The counting problem has also been investigated by G. Polya, who illustrated the first paper on his theory of counting with organic compounds. Although, counting series have been developed in specific cases, it is very unlikely that a general formula will ever be found to count all isomers. Due to the advance in computer technology in the 1960s, the isomer problem switched from counting to enumeration, that is the search of a systematic algorithm to list and construct all the chemical structures corresponding to a given molecular formula. The first attempt was made with the DENDRAL project and lead to the design of the first expert system. Isomer enumeration is still a very active field of research since no algorithm has yet been found that generates an exhaustive and irredundant (non-isomorphic) list of compounds in a reasonable computational time. In particular, the question whether or not a polynomial delay (i.e., polynomial time per output) algorithm can be found, remains open.

We have proposed a new technique to enumerate isomers that drastically reduces the number of isomorphic structures output by the algorithm. The method consists of partitioning vertices into equivalent classes during the generation process. We have proven that the final list of isomers generated is exhaustive and irredundante. The technique is polynomial delay is specific cases, such as, when enumerating of non-cyclic structures. We have also designed a randomized version of the enumeration algorithm for approximate counting purposes. Our randomized counting algorithm is based on Knuth technique to approximate the number of leaves of a backtracking tree. The time complexity of the approximate counting algorithm is illustrated in the figure below and compared with the time complexity of an exact counting.


References

  • Faulon, J.L., On Using the Graph-Equivalent Classes for the Structure Elucidation of Large Molecules, J. Chem. Inf. Comput. Sci., 1992, 32, 338-348. (.pdf)
  • Faulon J. L., Visco D. Churchwell, C. J., The Signature Molecular Descriptor. 2. Enumerating Molcules from their Extended Valence Sequences, J. Chem. Inf. Comput. Sci., 43 (3), 721 -734, 2003. [PMID: 12767130]. (link to journal)