# Chemical Graph Theory

This is a PLOS Topic Page draft


Authors

Mehmet Aziz Yirik
AFFILIATION: Analytical Chemistry, University of Jena , Lessingstrasse 8, 07743, Jena, Germany
https://orcid.org/0000-0001-7520-7215

Kumsal Ecem Colpan
AFFILIATION: Molecular Biology and Genetics, Boğaziçi University , Boğaziçi University, 34342 Bebek/Istanbul Turkey
https://orcid.org/0000-0002-8689-218X

AFFILIATION: Analytical Chemistry, University of Jena , Lessingstrasse 8, 07743, Jena, Germany
https://orcid.org/0000-0002-4802-228X

Maria Sorokina
AFFILIATION: Analytical Chemistry, University of Jena , Lessingstrasse 8, 07743, Jena, Germany
https://orcid.org/0000-0001-9359-7149

Christoph Steinbeck
AFFILIATION: Analytical Chemistry, University of Jena , Lessingstrasse 8, 07743, Jena, Germany
https://orcid.org/0000-0001-6966-0814

## Abstract

Chemical graph theory is an interdisciplinary field that combines methods and theorems from mathematics and chemistry to solve chemical problems. In this field, molecular structures are represented as mathematical graphs. In such a molecular graph, nodes and edges represent atoms and bonds. These graphs can then be reduced to graph-theoretical descriptors or indices, which reflect the physical properties of molecules. One of the most famous examples of a graph-based molecular descriptor is the Wiener index, which corresponds to the sum of the lengths of all shortest paths in a molecule and correlates with its boiling point. Besides chemical indices, applications of graph theory in chemistry include isomer enumeration, molecular substructure searching in chemical databases and molecular structure generation.

## History

Graph theory and chemistry have long been combined to solve chemistry-related tasks and problems. Different studies related to both disciplines proved the utility of their association and formed the interdisciplinary field now known as chemical graph theory[1]. Graph theory itself started with the problem Leonhard Euler addressed in 1736: the Königsberg bridges problem. The question was whether it was possible to cross all seven bridges over the river Pregel in Königsberg without crossing any of them twice. Euler settled the question, showing that no such walk exists, by using the very first graph-theoretical techniques, thus laying the foundation for this new discipline[2]. Since then, various studies from different scientific fields, such as Cayley’s isomer enumeration[3] and Kirchhoff’s electrical circuit studies[4], have enriched the applications of graph theory.

The first applications of chemical graphs were recorded in the late eighteenth century, when chemistry, together with other disciplines, was heavily influenced by the ideas of Isaac Newton. Studies of the interactions between atoms received special attention during that century, even though chemical bonds had not yet been identified. Therefore, the very first use of chemical graphs was to represent the hypothetical forces between atoms within molecules[5]. In 1805, John Dalton built the very first atomic model by representing different atom types with circles, a representation that is now commonly used to depict molecular structures[6]. However, his work was limited by the chemical knowledge of his time, and he could only show the chemical positions and atom numbers in a molecule[7]. Some years later, August Kekulé showed both the physical positions and the orientations of atoms in a molecule: in his “Tetrahedral Carbon Atom” model, he classified several organic molecules and depicted the bond orders between atoms, using the benzene ring as an example[8]. This work led to three-dimensional thinking in chemical modelling, and his tetrahedral carbon atom model inspired structure modelling for both organic and inorganic chemical compounds. After Kekulé, Alfred Werner developed coordination chemistry, stating the idea that atoms have specific natural properties that are related to their location in a molecule [9]. He was also the first to represent complex compounds with octahedral models. Following that, more molecules were represented with a three-dimensional structure [10]. In 1861, Alexander Butlerov introduced the term “molecular structure”, leading to the acceptance that every chemical substance should have a fixed structure: an immutable set of atoms and bonds between them, which also explains some of the chemical properties of the substance[11].
To develop this concept further, illustrations, analyses and formulations of different chemical compounds were proposed over the years by various scientists such as Johann Döbereiner, Alexander Williamson and Archibald Couper. In particular, the modern representation of a bond as a line between a pair of atoms was first used by William Higgins in his chemical structure models[12]. However, these lines did not yet represent specific bonds, but only the interatomic forces. For this reason, it is Couper who is considered the father of the chemical bond representation, since he used graphical edges to depict bonds. He also defined the molecular formula of acetic acid and illustrated its chemical structure with straight lines between atoms to represent the chemical bonds[13].

The idea of fixed valence bonds between different atom types was first published in Edward Frankland's book “Lecture Notes for Chemical Students” in 1866. In this book, he showed several atom types, such as hydrogen, zinc, boron and carbon, with their specific valence bonds[14]. Following this study, more chemistry-based graph-theoretical analyses were performed, in particular by Arthur Cayley[15] and James Joseph Sylvester[16], who constructed chemical graphs with respect to the structural formulations of chemical substances. Cayley also developed the representation of alkanes as trees, so-called kenograms, which enables the enumeration of the structural isomers of this chemical class. At the same time, Sylvester had the idea of labelling vertices with letters to enhance chemical graphs with a variety of properties and improve their readability. The cyclomatic number[17] also played an important role in chemical graph theory, since it is central to the analysis of rings in molecules. It can be defined as the number of independent cycles in a graph, and based on this number, William Kingdon Clifford demonstrated a hydrocarbon form without any cyclic substructures in 1875[18].

Discoveries of structure and valence theories in chemistry showed that chemistry and mathematical analysis work well together to expand the knowledge about chemicals and their properties. The development of chemical graph theory advanced particularly in the twentieth century with the discovery and synthesis of new molecules. To determine the possible number of chemical structures for molecules, isomer enumeration became popular in the 1930s; following that, in the late 1950s, the topological properties of chemical bonding, such as topological indices, gained the interest of researchers. Besides isomer enumeration, the construction of chemical graphs is another combinatorial problem in chemical graph theory. In general, chemical graph generators produce all possible structures based on given molecular formulas and substructures. The earliest chemical graph generator, CONGEN, came from a Stanford team in the 1960s.

## Graph Theory Background

Fig 1. Graph representation of dopamine molecule. (A) Molecular structure of dopamine. (B) Graph representation of the molecule.

In chemistry, molecules are commonly represented as graphs, where vertices and edges respectively represent atoms and bonds. Vertex degrees and edge multiplicities correspond to atom valences and bond multiplicities. The distance between two vertices is the number of edges in the shortest path between them. In a graph, a path is a walk in which no vertex is visited more than once. In a connected graph, there is at least one path between every pair of vertices. Molecular graphs are mostly connected graphs. Two vertices are adjacent if they are joined by an edge. Graph adjacency is mostly represented by adjacency matrices. In 1874, Sylvester represented the connectivity of an organic molecule with an adjacency matrix. It is a square matrix of dimension ${\displaystyle n\times n}$ for a molecule with ${\displaystyle n}$ atoms.

${\displaystyle [A(G)]_{i,j}:={\begin{cases}1&{\text{if }}i\neq j{\text{ and }}(i,j)\in E(G)\\0&{\text{if }}i=j{\text{ or }}(i,j)\notin E(G)\end{cases}}}$
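
As an illustration of this definition, the following minimal Python sketch builds the adjacency matrix of a small molecule. The function name `adjacency_matrix` and the atom numbering are assumptions made for this example; the heavy-atom skeleton of ethanol (C0-C1-O2) serves as input.

```python
def adjacency_matrix(n_atoms, bonds):
    """Return the n x n adjacency matrix for bonds given as (i, j) index pairs."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        if i != j:  # diagonal entries stay 0, as in the definition
            matrix[i][j] = 1
            matrix[j][i] = 1  # molecular graphs are undirected
    return matrix


# Heavy-atom skeleton of ethanol, numbered C0-C1-O2 (hypothetical numbering)
ethanol = adjacency_matrix(3, [(0, 1), (1, 2)])
for row in ethanol:
    print(row)
```

The matrix is symmetric because bonds have no direction, and the row sums give the vertex degrees.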

By providing the bond orders between atoms, the adjacency matrix can be turned into a connectivity matrix, which is another common representation of molecular connectivity. Different graphs might be topologically equal. In the classification of graphs, equivalence classes comprise isomorphic graphs. Two graphs are called isomorphic if there is a bijection between their vertex sets that maps every pair of adjacent vertices in one graph to a pair of adjacent vertices in the other, and every non-adjacent pair to a non-adjacent pair. Constitutional isomers of a molecule correspond to non-isomorphic molecular graphs that share the same molecular formula. Graph isomorphism and connectivity are key criteria in chemical graph theory, especially in chemical graph generation. The term subgraph is equivalent to molecular substructure in chemical graph theory. In the field, the key substructures are cycles. In graph theory, the cyclomatic number is the number of independent cycles in a graph. For a connected graph, the cyclomatic number is:

${\displaystyle c=n_{e}-n_{v}+1}$

In this equation, ${\displaystyle n_{e}}$ represents the number of edges and ${\displaystyle n_{v}}$ represents the number of vertices in a graph. In addition, several mathematicians developed different formulas to determine the number of cycles in both organic and inorganic substances. One of the extended versions of the cyclomatic number formula was developed by Oliver Lodge.[19] He used the valence concept to obtain closed chemical formulas for different organic and inorganic compounds, and he proposed the formula below:

${\displaystyle c={\frac {1}{2}}[r(k-2)+s(l-2)+t(m-2)+\dots ]+1}$

With this formula, where ${\displaystyle k,l,m}$ represent the atom valences and ${\displaystyle r,s,t}$ the numbers of atoms with these valences, he calculated the number of cycles in different covalent molecules.
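
The two formulas can be checked against each other on a small example. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the graph is assumed to be connected, and the valence-based version takes a mapping from each valence to the number of atoms having that valence. For cyclohexane (C6H12, hydrogens included) both routes give one cycle.

```python
def cyclomatic_number(n_vertices, n_edges):
    """Cyclomatic number of a connected graph: edges - vertices + 1."""
    return n_edges - n_vertices + 1


def cyclomatic_from_valences(valence_counts):
    """Valence-based form; valence_counts maps a valence to the number of atoms with it."""
    total = sum(count * (valence - 2) for valence, count in valence_counts.items())
    return total // 2 + 1  # the sum is always even for a valid molecular graph


# Cyclohexane with explicit hydrogens: 18 atoms, 6 C-C bonds + 12 C-H bonds
print(cyclomatic_number(18, 18))
print(cyclomatic_from_valences({4: 6, 1: 12}))
```

When bond multiplicities are counted as parallel edges, each double bond contributes one additional cycle, so benzene (six carbons of valence 4, six hydrogens of valence 1) gives 4 rather than 1.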

## Applications

### Isomer Enumeration

Fig 2. Three isomers of dichlorobenzene. (A) Molecular structure of 1,2-Dichlorobenzene. (B) Molecular structure of 1,3-Dichlorobenzene. (C) Molecular structure of 1,4-Dichlorobenzene.

Isomer enumeration gave rise to the first application of graph-theoretical techniques to a chemical problem. In 1811, Gay-Lussac found that some compounds have the same atom and bond sets but different structures, an observation that underlies the current definition of molecular isomerism. In 1830, Berzelius defined isomerism for chemical compounds as "possessing the same chemical constitution and molecular weight but differing properties"[20]. The modern definition of chemical isomerism was confirmed and popularized by Butlerov, who obtained different isomers of methane molecules substituted by chlorine in 1862[21][22].

Nowadays, isomers are classified into two groups: constitutional isomers and stereoisomers. Constitutional isomers provide only atom neighbourhood information. Configurational information, such as bond angles, bond lengths and distances between non-bonded atoms, is retained only in stereoisomerism. Pasteur was the first scientist to introduce the term stereoisomers[23], which are molecules that have the same number and type of chemical bonds and the same planar structure, but different spatial orientations of the latter[24]. Chemical graphs are particularly suitable for the enumeration of structural isomers because they show only the topological positions of the atoms in a molecule. They are, however, not convenient for stereoisomers, because they give no information about the spatial positions of atoms in molecules.[25]

James Joseph Sylvester was the first mathematician to contribute to the systematic enumeration of isomers. In his study, he used the enumeration of rooted trees, which in chemistry correspond to alkyl radicals with i non-hydrogen atoms. This enumeration method relied on the polynomial representation of molecular connectivity. In other words, the coefficients of the polynomials represented the number of bonds between atoms. In 1874, Cayley extended the problem to the enumeration of unrooted trees, using alkanes as chemical examples for this approach. This contribution increased the interest of the scientific community in the isomer enumeration problem. Following this study, Schiff, Hermann, Tiemann and many others worked on the enumeration of alkanes. However, none of them found the general formula for alkane isomer enumeration. Only later, in 1937, was the general formula for isomer enumeration finally found by Pólya. Pólya’s enumeration theory is a general counting method with applications in a variety of fields. The method relies on symmetry operations. From the mathematical point of view, symmetry groups and their actions are used for the polynomial representations, and the cycle indices of each symmetry are calculated to obtain the polynomial coefficients. The calculation of the cycle indices themselves is based on the conjugacy classes of the acting symmetry group. The step-by-step construction of Pólya’s polynomial can be found in a mathematical chemistry and cheminformatics textbook[26].

Pólya’s enumeration method was then successfully used for the enumeration of the boat and chair cyclohexane isomers by Pevac[27]. For a given molecule, its symmetry group is calculated first. Then, a polynomial representation is constructed based on cycle indices. In the literature, isomer enumeration studies have mostly addressed special compound classes such as alkanes[28], aromatic hydrocarbons[29] and polycyclic aromatic compounds[30].
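
The role of symmetry in such counts can be illustrated with a short Python sketch. It does not build Pólya's cycle index polynomial; instead it applies Burnside's lemma, the orbit-counting result on which the method rests, by brute force. The function names are hypothetical, and the symmetry group is assumed to be the group of rotations and reflections of a flat ring. Applied to two chlorine atoms on the six positions of benzene, it reproduces the three dichlorobenzene isomers of Fig 2.

```python
from itertools import combinations


def ring_symmetries(n):
    """The 2n rotations and reflections of an n-membered ring, as position maps."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations + reflections


def count_substitution_isomers(n_positions, n_substituents):
    """Count substitution patterns on a ring up to symmetry (Burnside's lemma)."""
    symmetries = ring_symmetries(n_positions)
    fixed = 0
    for positions in combinations(range(n_positions), n_substituents):
        chosen = set(positions)
        # Count the symmetries that map this substitution pattern onto itself
        fixed += sum(1 for p in symmetries if {p[i] for i in chosen} == chosen)
    return fixed // len(symmetries)


print(count_substitution_isomers(6, 2))  # dichlorobenzene
```

The brute-force loop is only practical for small rings; the cycle index approach described above avoids enumerating every substitution pattern.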

### Chemical Graph Generation

Molecular structure generation is a branch of the graph generation problem. The earliest molecular structure generators were modified versions of graph generators. Structure generators produce computer representations of chemical structures that adhere to certain boundary conditions. These generators are mostly BFS-based combinatorial algorithms that require two basic inputs: a molecular formula and substructures. To elucidate the structure of an unknown molecule, all combinatorially possible molecular extensions should be taken into account. These algorithms extend intermediate structures recursively until the molecule is saturated. The extension of intermediate structures causes a combinatorial explosion. Thus, many structure generators have been designed in line with mathematical theorems. Group theory, especially permutation group theory, has been applied to accelerate the calculation of bond extensions. Compared to bond-by-bond extension, group-theoretical methods complete the extension process in one go. CONGEN was the earliest structure generator and a part of the first computer-assisted structure elucidation (CASE) system, DENDRAL.[31] CONGEN first built a tree-based structure. Each node of the tree represented a substructure of the unknown molecule. These substructures were extended based on group-theoretical lemmas. Besides group theory, many other mathematical theorems have been applied in the field. MASS, a tool for the mathematical synthesis and analysis of molecular structures, was a matrix-based structure generator. This method was considered an adjacency matrix generation algorithm.[32] In the literature, structure generators are classified into two groups: structure assembly and structure reduction methods.

Assembly methods start the generation with a set of atoms from a molecular formula. Atoms are combinatorially assembled until saturation. The earliest assembly method was ASSEMBLE from Munk and Shelley.[33] This generator was a part of a CASE system that was simply called "CASE".[34] ASSEMBLE was not able to deal with substructural overlaps. In contrast to ASSEMBLE, GENOA was a constructive substructure search-based assembly method that handled substructural overlaps well.[35] In the 1980s, Japanese scientists contributed CHEMICS to the field, described in a set of CASE papers.[36] The vector representation of components and their usage in the generation process were the novelties of the study. All component sets were ranked from primary to tertiary for use in the extension process.
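
The assembly idea can be sketched with a deliberately naive brute-force generator. This is not the algorithm of ASSEMBLE, GENOA or CHEMICS; it is a minimal illustration under stated assumptions: the function names and the `VALENCE` table are hypothetical, hydrogens are the only monovalent atoms and are attached implicitly, bond orders range from 0 to 3, and duplicates are removed by trying every relabelling of same-element atoms, which only scales to a handful of heavy atoms.

```python
from itertools import combinations, permutations, product

VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed standard valences


def is_connected(n, bonds):
    """Check that all n atoms are reachable from atom 0 via bonds of order > 0."""
    reached, stack = {0}, [0]
    while stack:
        a = stack.pop()
        for (i, j), order in bonds.items():
            if order and a in (i, j):
                b = j if a == i else i
                if b not in reached:
                    reached.add(b)
                    stack.append(b)
    return len(reached) == n


def count_constitutional_isomers(heavy_atoms, n_hydrogens):
    """Count saturated, connected bond-order matrices up to atom relabelling."""
    n = len(heavy_atoms)
    pairs = list(combinations(range(n), 2))
    # Relabellings that map every atom onto an atom of the same element
    relabelings = [p for p in permutations(range(n))
                   if all(heavy_atoms[i] == heavy_atoms[p[i]] for i in range(n))]
    unique = set()
    for orders in product(range(4), repeat=len(pairs)):
        bonds = dict(zip(pairs, orders))
        used = [0] * n
        for (i, j), order in bonds.items():
            used[i] += order
            used[j] += order
        free = [VALENCE[atom] - u for atom, u in zip(heavy_atoms, used)]
        # Free valences must be non-negative and exactly filled by the hydrogens
        if min(free) < 0 or sum(free) != n_hydrogens:
            continue
        if not is_connected(n, bonds):
            continue
        # Canonical form: smallest bond-order tuple over all relabellings
        canonical = min(
            tuple(bonds[tuple(sorted((p[i], p[j])))] for i, j in pairs)
            for p in relabelings
        )
        unique.add(canonical)
    return len(unique)


print(count_constitutional_isomers(["C", "C", "O"], 6))  # C2H6O
```

For C2H6O the sketch finds the two expected structures, ethanol and dimethyl ether. Every candidate matrix is enumerated before filtering, which is exactly the combinatorial explosion that practical generators are designed to avoid.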

In assembly methods, molecular extensions usually end up in a combinatorial explosion. To cope with this problem, orderly structure generators have been developed. MOLGEN is a well-known orderly structure generator.[37] As a descendant of the DENDRAL and MASS methods, MOLGEN first generates structures as connectivity matrices with respect to chemical constraints, then stores them in an output file. In the matrix generation, rows (or columns) are built in descending order. MOLGEN is an efficient but commercial structure generator. Another commercial structure generator comes from one of the MASS developers, Michael Elyashberg, at ACD Labs. This structure generator is a part of the commercial CASE system StrucEluc.[38]

Unlike assembly methods, reduction methods construct a hypergraph with all possible bonds among atoms. First, the existence of substructures is checked in the hypergraph, then the irrelevant bonds are removed. If a substructure is no longer in the hypergraph, it is removed from the chemical constraints. In some assembly methods, substructural overlaps were not taken into account; however, all structure reduction methods handle structural overlaps well due to the hypergraph structure. COCOA, from Munk and Shelley, was the earliest reduction method[39]; it was later integrated into the CASE system SESAMI.[40] In the COCOA algorithm, substructures were represented as atom-centred fragments, that is, the lists of the atoms’ first neighbours. HOUDINI was later released as an improved version of COCOA.[41] Although structural overlaps were taken into account, the massive size of hypergraphs was the main disadvantage of reduction approaches. Bohanec combined the two method types, assembly and reduction, to overcome the disadvantages of both. The resulting algorithm, GEN[42], avoided irrelevant bonds and assembled atoms based on substructural information.

### Molecular Fragmentation

Fig 3. Four fragments of ZINC42650.

In metabolomics, one major challenge is the identification of unknown molecules. For this purpose, liquid chromatography (LC), mass spectrometry (MS), tandem mass spectrometry (MS/MS) and nuclear magnetic resonance (NMR) are commonly used techniques.[43] Once a spectrum for a molecule has been acquired, the very first step is to check whether this spectrum is present in spectral libraries and, if so, to retrieve the corresponding structure. However, searching in spectral libraries has limitations, in particular the library size and the reliability of the spectral data.[44] Therefore, successful structure identification using spectral libraries requires a massive increase in the number and size of accurate spectral libraries [45]. Besides direct spectral matching against relevant libraries, an indirect search in comprehensive molecular structure databases can also be used. This approach consists of computationally predicting spectra for known molecular structures from these databases, and then matching these spectra to the unknown one. The computational prediction of spectra for molecular structures involves simulating their fragmentation in the same way the molecules would be fragmented in the mass spectrometer. For this purpose, a variety of rule-based and combinatorial in silico fragmentation methods have been developed. The rule-based methods rely on manually determined rules, derived from experimental fragmentation patterns, which provide particular structural properties[46]. The DENDRAL project, which started in 1965, was the first to use such an approach for the structure prediction of unknown spectra, but it was not successful at structure elucidation based on mass spectral data. Since then, rule-based fragmenters have been considerably improved, and the most popular ones, Mass Frontier, ACD/MS and MOLGEN/MS, are being successfully used in the scientific community.
Nowadays, using manually determined rules to fragment structures provides consistent, fast and accurate simulated spectra for molecular structures, ready for matching against experimental spectra. The main disadvantage of rule-based methods is the need for expert-curated rules[47], since the constant discovery of new molecular classes requires new fragmentation rules [48].

To face the constant increase in known molecular classes, combinatorial fragmentation approaches, which do not require any manual curation, have become increasingly popular[49]. In contrast to the rule-based methods, combinatorial algorithms generate fragments based on the cleavage of chemical bonds in a molecule: the candidate bonds are first scored, then bonds are disconnected based on these penalty scores. In this way, for a given molecular graph, a range of chemically relevant fragmentations is performed and possible spectra are computed from them. These synthetic spectra are then ready to be matched against the experimental spectrum. The most used combinatorial molecular graph fragmenters include FiD (Fragment iDentificator)[50], MetFrag[51], MAGMA[52], CFM-ID[53] and Sirius[54]. In both approaches, molecular fragmenters rely on molecular graph theory to find the most chemically relevant ways to fragment molecules.
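
The graph operation behind combinatorial fragmentation can be illustrated with a minimal sketch. It is not the algorithm of any of the tools named above: the function names are hypothetical, only a single bond is cleaved at a time, and the bond scoring step is left out entirely. Each bond is removed in turn, and the connected components of the remaining graph are collected as fragments.

```python
def connected_components(atoms, bonds):
    """Return the connected components of a graph as frozensets of atom labels."""
    neighbours = {a: set() for a in atoms}
    for u, v in bonds:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in atoms:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            a = stack.pop()
            if a not in component:
                component.add(a)
                stack.extend(neighbours[a] - component)
        seen |= component
        components.append(frozenset(component))
    return components


def single_bond_fragments(atoms, bonds):
    """Fragments obtained by cleaving each bond once; ring bonds yield none."""
    fragments = set()
    for bond in bonds:
        remaining = [b for b in bonds if b != bond]
        for component in connected_components(atoms, remaining):
            if len(component) < len(atoms):  # the molecule actually split
                fragments.add(component)
    return fragments


# Heavy-atom skeleton of 1-propanol with hypothetical atom labels
atoms = ["C1", "C2", "C3", "O4"]
bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "O4")]
for fragment in sorted(single_bond_fragments(atoms, bonds), key=sorted):
    print(sorted(fragment))
```

Cleaving a single ring bond leaves the molecule in one piece, which is why practical fragmenters also consider the simultaneous cleavage of several bonds.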

### Molecular Descriptors

Molecular descriptors are defined by Todeschini and Consonni, modern leaders in research on the topic, as:

“The molecular descriptor is the final result of a logical and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.”[55][56]

Molecular descriptors are, in a broad sense, a way to describe and quantify a chemical structure with mathematical and cheminformatics tools. It needs to be clear that there is no molecular descriptor that fits all applications and that the same molecule can be meaningfully analyzed and described with different descriptors depending on the question to be answered and aims to be reached. Various types of molecular descriptors exist [57], and many involve chemical graph theory in their definition. Among those, there are chemical indices, topological indices, autocorrelation descriptors, geometrical descriptors and some of the molecular fingerprints.

##### Topological indices

Topological indices are two-dimensional molecular descriptors based on the topology of the molecular structure when represented as a graph. The molecular graph is the first topological index, as it is the 2D graph representation of a molecule. Therefore, the graph G=(V,E) represents the molecular structure, the set of vertices V represents the set of atoms where each vertex is an atom and the set of edges E represents the bonds between the atoms. The molecular graph is an undirected weighted sparse multigraph. The usage of a graph to depict a molecular structure allows applying well-known graph theory algorithms to it, in order to extract meaningful topological information. These molecular graphs are commonly represented as adjacency matrices or as collections of adjacency lists, where two atoms are adjacent if there is a chemical bond connecting them.

However, the adjacency matrix is only one of the matrices that can be calculated from a molecular graph, and other matrices contain different information. These matrices, also called graph-theoretical matrices, can be molecular descriptors by themselves or starting points for other molecular descriptors. There are three main types of graph-theoretical matrices: vertex matrices, edge matrices and incidence matrices. Vertex matrices are square matrices whose rows and columns refer to the graph vertices, that is, the atoms of the molecule, and each element of the matrix contains information related to a pair of atoms. Edge matrices are also square matrices; their rows and columns refer to the graph edges, that is, the chemical bonds of the molecule, and each element of the matrix contains information related to a pair of bonds. Incidence matrices are generally not square and contain information about different types of objects in their rows and columns, such as atoms versus edges, or even bigger molecular elements, such as cycles, molecular fragments or paths.

The adjacency matrix, which is a vertex matrix, allows calculating the Lovász–Pelikán index [58], which is the largest eigenvalue of the matrix. The other most-used vertex matrix is the topological distance matrix, where each entry ${\displaystyle d_{ij}}$ represents the number of edges in the shortest path between the vertices i and j in the molecular graph. The best-known and most-used topological index in chemistry, based on the topological distance matrix, is the Wiener index. This index is defined as the sum of the lengths of all shortest paths between pairs of non-hydrogen atoms. For a molecule, half the sum of all entries of its distance matrix gives its Wiener index. Many more molecular descriptors based on vertex, edge and incidence matrices have been described over the years by numerous researchers and have been compiled by Todeschini and Consonni [59].
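
The relation between the distance matrix and the Wiener index can be sketched in a few lines of Python. The function names are hypothetical, the graph is the hydrogen-suppressed molecular graph, and it is assumed to be connected. Shortest paths are found with a breadth-first search from every atom.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Topological distance matrix of a connected graph via breadth-first search."""
    neighbours = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    matrix = []
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in neighbours[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        matrix.append([dist[j] for j in range(n_atoms)])
    return matrix


def wiener_index(n_atoms, bonds):
    """Half the sum of all entries of the distance matrix."""
    return sum(map(sum, distance_matrix(n_atoms, bonds))) // 2


n_butane = [(0, 1), (1, 2), (2, 3)]   # carbon chain C0-C1-C2-C3
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon C0 with three methyl groups
print(wiener_index(4, n_butane), wiener_index(4, isobutane))
```

The branched isomer has the smaller index (9 versus 10), which illustrates how the index captures molecular compactness.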

Using a matrix representation of molecular graphs allows applying a multitude of matrix operators and manipulators enabling calculation of numerous sets of molecular descriptors that highlight diverse information about the molecules.

##### Chemical Indices

The first usage of chemical indices came from the studies of Calingaert and Hladky [60]. In their study, the ratio between the molecular volume and the number of carbon atoms in a hydrocarbon was described as a chemical index for the properties of hydrocarbons. A similar index was also used in the study of Kopp [61], which summed the different atom types to describe the volumes and densities of molecules. Besides the usage of atom numbers as indices, Wiener introduced one of the earliest graph invariants for the correlation between molecular properties and structural features. In his study, the structural index was used to determine the boiling points of paraffins [62]. In 1971, Hosoya introduced a new index, the Hosoya index, which is the number of edge matchings in a graph and is formulated as:

${\displaystyle H=\sum _{i=0}^{\lfloor n/2\rfloor }P(G,i)}$

In this formula, P(G,i) is the number of matchings with i mutually non-adjacent edges in the graph, and n is the number of vertices. The index has been used in a variety of applications, such as the modelling of the physicochemical properties of hydrocarbons [63]. The usage of the index was described in Hosoya’s review [64]. In 1947, bonds were differentiated by the valences of their end vertices in the study of Hartmann [65]. This study was extended by Randić’s topological index in 1975 [66]. It is also called the connectivity index, since the formula is based on bond weights. In the formula given below, d(i) and d(j) represent the atom valences for vertices i and j. The bond weight is calculated as ${\displaystyle [d(i)\cdot d(j)]^{-1/2}}$. Thus, the Randić index is calculated by summing all bond weights:

${\displaystyle \chi =\sum _{\text{all edges}}[d(i)\cdot d(j)]^{-1/2}}$

This index is not a reliable descriptor for the characterization of molecules, since non-isomorphic structures might have the same Randić index [67]. Later, Balaban contributed to the field by slightly modifying the Randić index [68]. Unlike in the Randić index, the average of the bond weight sum is taken in the Balaban index. The index formula is:

${\displaystyle J={\frac {M}{\mu +1}}\sum _{\text{all edges}}[d(i)\cdot d(j)]^{-1/2}}$

The average value is calculated by multiplying the Randić index by ${\displaystyle M/(\mu +1)}$, where ${\displaystyle M}$ is the number of edges and ${\displaystyle \mu }$ is the number of cycles in the graph.
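
The Hosoya and Randić indices defined above can be computed with a short Python sketch. The function names are hypothetical; the input is assumed to be the hydrogen-suppressed molecular graph given as a list of edges, and d(i) is taken as the vertex degree in that graph. The Hosoya index uses the standard recursion over matchings: either the first edge is left out, or it is included and every edge touching its end vertices is removed.

```python
from collections import Counter
from math import sqrt


def hosoya_index(edges):
    """Number of matchings in the graph, including the empty matching."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    without_edge = hosoya_index(rest)
    with_edge = hosoya_index([e for e in rest if u not in e and v not in e])
    return without_edge + with_edge


def randic_index(edges):
    """Sum of the bond weights [d(i) * d(j)]^(-1/2) over all edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 / sqrt(degree[u] * degree[v]) for u, v in edges)


n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]
print(hosoya_index(n_butane), hosoya_index(isobutane))
print(round(randic_index(n_butane), 4), round(randic_index(isobutane), 4))
```

The recursive matching count grows exponentially with the number of edges, so it is suited to small molecules only.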

In the literature, there are more than 100 different chemical indices, often described as topological indices, which they are generally similar to.

##### Other theoretical molecular descriptors

Geometrical descriptors are based on three-dimensional molecular structures, where the position of the atoms in the 3D space is known and the connections between them are defined. Molecular 3D structures can be experimentally elucidated from crystallographic or NMR data or computed using molecular optimization algorithms. Geometrical descriptors have a higher information content than those based on 2D structures, such as topological descriptors and chemical indices, but have to be treated with the awareness that their values heavily depend on the molecule conformation, and can vary depending on the latter. Two of the most known classes of three-dimensional descriptors are the WHIM (Weighted Holistic Invariant Molecular) descriptors [69] and the GETAWAY (Geometry, Topology, and AtomWeights Assembly) descriptors [70].

Molecular fingerprints are structural keys, that is, well-defined bit lists of molecular features that are present or absent in a molecule. These features can be 2D and/or 3D structural properties and have been very useful for molecular searches, in particular similarity searches, due to their computational efficiency. They are particularly suitable for Tanimoto and Tversky similarity indices. Molecular fingerprints are determined by fragmenting the molecule into all possible substructures following a set of rules. MACCS and PubChem fingerprints are the most widely used nowadays. The MACCS fingerprint was developed by MDL Information Systems (now BIOVIA), and the public version contains 166 pre-computed molecular features that might be present or absent in a molecule. The PubChem fingerprint was developed to enable efficient and fast similarity searches in the PubChem database and is composed of 881 pre-computed structural features. An alternative to structural keys is hashed fingerprints. This type of fingerprint does not require a pre-computed set of molecular features, but rather breaks the molecular structure into a set of all possible substructures following a set of pre-defined rules. Path-based fingerprints and atom-centred circular fingerprints, such as Morgan fingerprints, are the most-used hashed fingerprints, and they are generally used to find structural patterns and substructural similarities in molecules.
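
The two similarity indices mentioned above can be sketched on fingerprints stored as sets of "on" bit positions. The function names, the example bit positions and the convention of returning 0.0 for two empty fingerprints are assumptions made for this illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by all on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a, b, alpha, beta):
    """Tversky similarity with weights alpha and beta for the unshared bits."""
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical fingerprints, given as the positions of the bits set to 1
query = {1, 2, 3}
candidate = {2, 3, 4}
print(tanimoto(query, candidate))
print(tversky(query, candidate, alpha=1.0, beta=0.0))
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto index, while asymmetric weights make it usable as a substructure-like measure.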

##### Rules for good molecular descriptors

Research on (theoretical) molecular description has been growing since the beginning of the 21st century, driven by the exponentially increasing availability of chemical structures in databases and the need to predict unavailable experimental data (for drug toxicity, for example). The growing number of molecular descriptors prompted Randić to establish 13 simple rules that descriptors should comply with [71], later completed with one more rule by Guha and Willighagen [72]:

1. Should have structural interpretation
2. Should have a good correlation with at least one property
3. Should preferably discriminate among isomers
4. Should be possible to apply to local structure
5. Should be possible to generalize to "higher" descriptors
6. Descriptors should preferably be independent
7. Should be simple
8. Should not be based on properties
9. Should not be trivially related to other descriptors
10. Should be possible to construct efficiently
11. Should use familiar structural concepts
12. Should have the correct size dependence
14. Should preferably have calculated values in a suitable numerical range for the set of molecules where it is applicable to

## Toolkits

All cheminformatics toolkits can also be considered as chemical graph theory software. Besides cheminformatics toolkits, the general graph theory libraries are also listed. These graph libraries are used for different purposes, such as subgraph search and graph isomorphism. The list of notable software is given below:

| Software | License | Language(s) | Website | Reference |
|---|---|---|---|---|
| ASSEMBLE | - | - | http://www.upstream.ch/main.html?src=%2Findex.html | - |
| CDK | Open source | Java | https://cdk.github.io | [73] |
| COCON | - | - | http://cocon.nmr.de | - |
| Cytoscape | Open source | Java | https://cytoscape.org | - |
| DENDRAL | - | - | http://www.softwarepreservation.org/projects/AI/DENDRAL/DENDRAL-CONGEN_GENOA.zip/view | - |
| igraph | GNU GPL 2 | C/C++, Mathematica, R, Python | https://igraph.org/ | - |
| JGraphT | LGPL 2.1 and EPL 2.0 | Java, Python | https://jgrapht.org/ | - |
| LSD | - | - | http://eos.univ-reims.fr/LSD/index_ENG.html | - |
| MOLGEN | Proprietary | C | http://www.molgen.de/ | - |
| MolSig | Open source | C | http://molsig.sourceforge.net | - |
| NAUTY | Open source | C | http://pallini.di.uniroma1.it | [74] |
| OMG | Open source | C, Java | https://sourceforge.net/p/openmg/code/ci/master/tree/src/org/omg/ | - |
| OpenBabel | Open source | C, Python, Ruby | http://openbabel.org/ | [75] |
| PMG | Open source | C, Java | https://sourceforge.net/projects/pmgcoordination/ | - |
| RDKit | Open source | C++, Java, KNIME, Python | http://www.rdkit.org/ | - |
| SENECA | Open source | - | https://github.com/steinbeck/seneca | - |

## References

1. ^ Bonchev D. Chemical graph theory: introduction and fundamentals. CRC Press; 1991.
2. ^ Alexanderson G. About the cover: Euler and Königsberg’s Bridges: A historical view. Bull Am Math Soc. 2006;43: 567–573.
3. ^ Cayley A. On the analytical forms called trees, with application to the theory of chemical combinations. Rep Brit Assoc Adv Sci. 1875;45: 257–305.
4. ^ Kirchhoff G. Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Ann Phys. 1847;148: 497–508.
5. ^ Bonchev D. Chemical graph theory: introduction and fundamentals. CRC Press; 1991.
6. ^ Cardwell DSL, Dalton J, others. John Dalton & the progress of science. 1968.
7. ^ Dalton J. A New System of Chemical Philosophy: Pt. 1/2. Dawson; 1808.
8. ^ Hein GE. Kekule and the architecture of molecules. Adv Chem Kekule Centen. 1966;61: 1–12.
9. ^ Werner A. Chem. 3 (1893) 267.(b) A. Werner. Justus Liebigs Ann Chem. 1912;386.
10. ^ Karrer P. Helv. Chim. Acta 3 (1920) 620; ders. u. Mitarbeiter. Helv Chim Acta. 1921;4: 185–263.
11. ^ Butlerov AM. Einiges über die chemische Struktur der Körper. Z Chem Pharm. 1861;4: 549–60.
12. ^ FRS WH. A Comparative View of the Phlogistic and Antiphlogistic Theories.. J. Murray; 1791.
13. ^ Couper AS. Sur une nouvelle théorie chimique. Annales de chimie et de physique. 1858. pp. 488–489.
14. ^ Frankland E. Lecture notes for chemical students: Embracing mineral and organic chemistry. J. Van Voorst; 1866.
15. ^ Cayley P. LVII. On the mathematical theory of isomers. Lond Edinb Dublin Philos Mag J Sci. 1874;47: 444–447.
16. ^ Sylvester JJ. On an application of the new atomic theory to the graphical representation of the invariants and covariants of binary quantics, with three appendices. Am J Math. 1878;1: 64–104.
17. ^ Acharya B. On the cyclomatic number of a hypergraph. Discrete Math. 1979;27: 111–116.
18. ^ Lodge OJ. XLII. On nodes and loops in connexion with chemical formulæ. Lond Edinb Dublin Philos Mag J Sci. 1875;50: 367–376.
19. ^ Cayley P. LVII. On the mathematical theory of isomers. Lond Edinb Dublin Philos Mag J Sci. 1874;47: 444–447.
20. ^ Berzelius J. On the composition of tartaric acid and racemic acid (John’s acid from the Vosges Mountains), on the atomic weight of lead oxide, together with general remarks on those substances which have the same composition but different properties. Poggendorf’s Ann Phys Chem. 1830;19: 305.
21. ^ Butlerov AM. Ueber die Verwandtschaft der mehraffinen Atome. Z Für Chem. 1862;5: 297–304.
22. ^ Al R, others. Carbohydrate chemistry: Fundamentals and applications. World Scientific Publishing Company; 2018.
23. ^ Pasteur L. Comptes rendus hebdomadaires de l’Académie des Sciences, Paris. 1848.
24. ^ Al R, others. Carbohydrate chemistry: Fundamentals and applications. World Scientific Publishing Company; 2018.
25. ^ Bonchev D. Chemical graph theory: introduction and fundamentals. CRC Press; 1991.
26. ^ Kerber A, Laue R, Meringer M, Rücker C, Schymanski E. Mathematical chemistry and chemoinformatics: structure generation, elucidation and quantitative structure-property relationships. Walter de Gruyter; 2013.
27. ^ Pevac S, Crundwell G. Polya’s isomer enumeration method: A unique exercise in group theory and combinatorial analysis for undergraduates. J Chem Educ. 2000;77: 1358.
28. ^ Bytautas L, Klein DJ. Chemical combinatorics for alkane-isomer enumeration and more. J Chem Inf Comput Sci. 1998;38: 1063–1078.
29. ^ Dias JR. A periodic table for polycyclic aromatic hydrocarbons. IV. Isomer enumeration of polycyclic conjugated hydrocarbons. 2. J Chem Inf Comput Sci. 1984;24: 124–135.
30. ^ Balasubramanian K. Combinatorial Enumeration of Isomers of Superaromatic Polysubstituted Cycloarenes and Coronoid Hydrocarbons with Applications to NMR. J Phys Chem A. 2018;122: 8243–8257.
31. ^ Sutherland G. DENDRAL - A computer program for generating and filtering chemical structures. Stanf Artifical Intell. 49: 34.
32. ^ Serov VV, Elyashberg ME, Gribov LA. Mathematical synthesis and analysis of molecular structures. J Mol Struct. 1976;31: 381–397. doi:10.1016/0022-2860(76)80018-X
33. ^ Badertscher M, Korytko A, Schulz KP, Madison M, Munk ME, Portmann P, et al. Assemble 2.0: A structure generator. Chemom Intell Lab Syst. 2000;51: 73–79. doi:10.1016/S0169-7439(00)00056-3
34. ^ Shelley CA, Munk ME. Case, a computer model of the structure elucidation process. Anal Chim Acta. 1981;133: 507–516. doi:10.1016/S0003-2670(01)95416-9
35. ^ Carhart RE. GENOA: A computer program for structure elucidation utilizing overlapping and alternative substructures. J Org Chem. 1981;46: 1708–1718.
36. ^ Sasaki SI, Abe H, Hirota Y, Ishida Y, Kudo Y, Ochiai S, et al. CHEMICS-F: A Computer Program System for Structure Elucidation of Organic Compounds. J Chem Inf Comput Sci. 1978;18: 211–222. doi:10.1021/ci60016a007
37. ^ Gugisch R, Kerber A, Kohnert A, Laue R, Meringer M, Rücker C, et al. MOLGEN 5.0, a Molecular Structure Generator in Advances in Mathematical Chemistry. Adv Math Chem Basak SC Restrepo G Villaveces JL Eds.
38. ^ Blinov K, Elyashberg M, Molodtsov S, Williams A, Martirosian E. An expert system for automated structure elucidation utilizing 1H-1H, 13C-1H and 15N-1H 2D NMR correlations. Fresenius J Anal Chem. 2001;369: 709–714.
39. ^ Christie BD, Munk ME. Structure Generation by Reduction: A New Strategy for Computer-Assisted Structure Elucidation. J Chem Inf Comput Sci. 1988;28: 87–93. doi:10.1021/ci00058a009
40. ^ Madison M, Schulz K, Korytko A, Munk M. SESAMI: An integrated desktop structure elucidation tool. Internet J Chem. 1998;1: CP1–U22.
41. ^ Korytko A, Schulz K-P, Madison MS, Munk ME. HOUDINI: A New Approach to Computer-Based Structure Generation. J Chem Inf Comput Sci. 2003;43: 1434–1446. doi:10.1021/ci034057r
42. ^ Bohanec S. Structure Generation by the Combination of Structure Reduction and Structure Assembly. J Chem Inf Comput Sci. 1995;35: 494–503. doi:10.1021/ci00025a017
43. ^ Allard P-M, Péresse T, Bisson J, Gindro K, Marcourt L, Pham VC, et al. Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication. Anal Chem. 2016;88: 3317–3323. doi:10.1021/acs.analchem.5b04804
44. ^ Djoumbou-Feunang Y, Pon A, Karu N, Zheng J, Li C, Arndt D, et al. CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification. Metabolites. 2019;9: 72. doi:10.3390/metabo9040072
45. ^ Hufsky F, Böcker S. Mining molecular structure databases: Identification of small molecules based on fragmentation mass spectrometry data. Mass Spectrom Rev. 2017;36: 624–633. doi:10.1002/mas.21489
46. ^ Allard P-M, Péresse T, Bisson J, Gindro K, Marcourt L, Pham VC, et al. Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication. Anal Chem. 2016;88: 3317–3323. doi:10.1021/acs.analchem.5b04804
47. ^ Hufsky F, Scheubert K, Böcker S. Computational mass spectrometry for small-molecule fragmentation. TrAC Trends Anal Chem. 2014;53: 41–48. doi:10.1016/j.trac.2013.09.008
48. ^ Ridder L, van der Hooft JJJ, Verhoeven S, de Vos RCH, van Schaik R, Vervoort J. Substructure-based annotation of high-resolution multistage MS(n) spectral trees. Rapid Commun Mass Spectrom RCM. 2012;26: 2461–2471. doi:10.1002/rcm.6364
49. ^ Bohanec S. Structure Generation by the Combination of Structure Reduction and Structure Assembly. J Chem Inf Comput Sci. 1995;35: 494–503. doi:10.1021/ci00025a017
50. ^ Heinonen M, Rantanen A, Mielikäinen T, Kokkonen J, Kiuru J, Ketola RA, et al. FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data. Rapid Commun Mass Spectrom. 2008;22: 3043–3052. doi:10.1002/rcm.3701
51. ^ Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J Cheminformatics. 2016;8: 3. doi:10.1186/s13321-016-0115-9
52. ^ Ridder L, van der Hooft JJJ, Verhoeven S, de Vos RCH, van Schaik R, Vervoort J. Substructure-based annotation of high-resolution multistage MS(n) spectral trees. Rapid Commun Mass Spectrom RCM. 2012;26: 2461–2471. doi:10.1002/rcm.6364
53. ^ Felicity Allen, Allison Pon, Michael Wilson, Russ Greiner, David Wishart, CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra, Nucleic Acids Research, Volume 42, Issue W1, 1 July 2014, Pages W94–W99, https://doi.org/10.1093/nar/gku436
54. ^ Sebastian Böcker, Matthias C. Letzel, Zsuzsanna Lipták, Anton Pervukhin, SIRIUS: decomposing isotope patterns for metabolite identification, Bioinformatics, Volume 25, Issue 2, 15 January 2009, Pages 218–224, https://doi.org/10.1093/bioinformatics/btn603
55. ^ Todeschini R, Consonni V. Handbook of Molecular Descriptors. John Wiley & Sons; 2008. Available: https://books.google.com/books?hl=en&lr=&id=TCuHqbvgMbEC&pgis=1
56. ^ Mauri, A., Consonni, V., & Todeschini, R. (2016). Molecular Descriptors. Handbook of Computational Chemistry, 1–29. doi:10.1007/978-94-007-6169-8_51-1
57. ^ Mauri, A., Consonni, V., & Todeschini, R. (2016). Molecular Descriptors. Handbook of Computational Chemistry, 1–29. doi:10.1007/978-94-007-6169-8_51-1
58. ^ Lovász L, Pelikán J. On the eigenvalues of trees. Periodica Mathematica Hungarica. 1973;3: 175–182.
59. ^ Todeschini R, Consonni V. Molecular descriptors for chemoinformatics: volume I: alphabetical listing/volume II: appendices, references. John Wiley & Sons; 2009.
60. ^ Calingaert G, Hladky JW. A Method of Comparison and Critical Analysis of the Physical Properties of Homologs and Isomers. The Molecular Volume of Alkanes. Journal of the American Chemical Society. 1936;58: 153–157.
61. ^ Kopp H. Ueber den Zusammenhang zwischen der chemischen Constitution und einigen physikalischen Eigenschaften bei flüssigen Verbindungen. Justus Liebigs Ann Chem. 1844;50: 71–144.
62. ^ Wiener H. Structural determination of paraffin boiling points. J Am Chem Soc. 1947;69: 17–20.
63. ^ Hosoya H, Hosoi K, Gutman I. A topological index for the total π-electron energy. Theor Chim Acta. 1975;38: 37–47.
64. ^ Hosoya H, Trinajstic N. Mathematics and Computational concepts in Chemistry. Horwood Chichester. 1986; 110.
65. ^ Hartmann H. Eine neue quantenmechanische Behandlung von CH4 und NH4+. Z Für Naturforschung A. 1947;2: 489–492.
66. ^ Randic M. Characterization of molecular branching. J Am Chem Soc. 1975;97: 6609–6615.
67. ^ Trinajstic N. Chemical graph theory. Routledge; 2018
68. ^ Balaban AT. Highly discriminating distance-based topological index. Chem Phys Lett. 1982;89: 399–404. doi:10.1016/0009-2614(82)80009-2
69. ^ Todeschini R, Gramatica P. New 3D Molecular Descriptors: The WHIM theory and QSAR Applications. In: Kubinyi H, Folkers G, Martin YC, editors. 3D QSAR in Drug Design: Ligand-Protein Interactions and Molecular Similarity. Dordrecht: Springer Netherlands; 1998. pp. 355–380. doi:10.1007/0-306-46857-3_19
70. ^ Consonni V, Todeschini R, Pavan M. Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 1. Theory of the novel 3D molecular descriptors. Journal of Chemical Information and Computer Sciences. 2002;42: 682–692. doi:10.1021/ci015504a
71. ^ Randić M. Molecular bonding profiles. J Math Chem. 1996;19: 375–392. doi:10.1007/BF01166727
72. ^ Guha R, Willighagen E. A survey of quantitative descriptions of molecular structure. Curr Top Med Chem. 2012;12: 1946–1956. PMID: 23110530
73. ^ Willighagen et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 2017; 9(3), doi:10.1186/s13321-017-0220-4
74. ^ McKay, B.D. and Piperno, A., Practical Graph Isomorphism, II, Journal of Symbolic Computation, 60 (2014), pp. 94-112, https://doi.org/10.1016/j.jsc.2013.09.003
75. ^ N M O'Boyle, M Banck, C A James, C Morley, T Vandermeersch, and G R Hutchison. "Open Babel: An open chemical toolbox." J. Cheminf. (2011), 3, 33. DOI:10.1186/1758-2946-3-33