Quantitative Structure-Activity Relationships (QSAR) are empirical relationships that use molecular descriptors to predict a specific biological activity or chemical property from the molecular structure. Typically, QSAR refers to a process in which the structures of a set of compounds are encoded numerically and a model is then fit to the measured values of their biological activity or physical property. The result is a mathematical model that can be used to predict the activity or property value of new compounds.
The independent variables in the QSAR equation are given in terms of the molecular descriptors, or operators on the molecular graph that strive to characterize the molecular structure. Such descriptors are based on molecular orbital theory, molecular topology, and molecular properties (lipophilicity, electronic, thermodynamic, quantum-chemical).
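To make the model-building step concrete, here is a minimal sketch of a forward QSAR fit, assuming a single descriptor and a linear relationship. The descriptor values, activities, and function names (`fit_linear_qsar`, `predict`) are hypothetical and chosen only for illustration; real QSAR models typically use many descriptors and may be nonlinear.

```python
# Toy training set: one hypothetical descriptor value per compound and a
# made-up measured activity. None of these numbers come from real data.
descriptors = [1.0, 2.0, 3.0, 4.0, 5.0]
activities = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_linear_qsar(x, y):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, descriptor_value):
    """Apply the fitted equation to the descriptor of a new compound."""
    slope, intercept = model
    return slope * descriptor_value + intercept


model = fit_linear_qsar(descriptors, activities)
new_compound_descriptor = 6.0  # descriptor of a compound outside the training set
predicted_activity = predict(model, new_compound_descriptor)
print(model, predicted_activity)
```

The fitted pair `(slope, intercept)` plays the role of the QSAR equation: once it is known, predicting the activity of a new compound only requires computing its descriptor.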
The quality and predictive capacity of the QSAR equation depend on the size and diversity of the molecules in the training set. The larger and more diverse the training set, the better equipped the QSAR equation is to predict the activity of a new compound.
The QSAR problem uses independent variables, represented by molecular descriptors, to solve for the activity (dependent variable) of a new compound. We call this the forward QSAR problem. In contrast, the inverse-QSAR problem seeks values of the molecular descriptors that correspond to a desired activity/property value.
The inverse problem is difficult for a number of reasons. First, one needs to solve the forward QSAR problem for a given activity. If this can be done, then the solutions of the inverse problem will be given in terms of the molecular descriptors. The problem then lies in constructing a viable molecule from these descriptors. This is typically the limiting factor of most inverse-QSAR methods, since the descriptors are not reversible: a set of descriptor values does not readily map back to a molecular structure.
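The following sketch illustrates why solving the inverse problem in descriptor space is only half the task. It assumes a hypothetical linear forward model over two integer-valued descriptors (for example, counts of some structural feature) and enumerates descriptor vectors that reproduce a target activity. The weights, bias, bounds, and function names are assumptions made for illustration, not part of any published method.

```python
from itertools import product

# Hypothetical forward model: activity = 0.5 * d1 + 1.5 * d2 - 1.0,
# where d1 and d2 are integer-valued descriptors. Coefficients are
# illustrative only.
WEIGHTS = (0.5, 1.5)
BIAS = -1.0


def forward(descriptor_vector):
    """Forward QSAR: predicted activity for a vector of descriptor values."""
    return BIAS + sum(w * d for w, d in zip(WEIGHTS, descriptor_vector))


def inverse(target, tolerance, max_value):
    """Enumerate descriptor vectors whose predicted activity is near the target."""
    candidates = product(range(max_value + 1), repeat=len(WEIGHTS))
    return [v for v in candidates if abs(forward(v) - target) <= tolerance]


solutions = inverse(target=5.0, tolerance=1e-9, max_value=12)
for vector in solutions:
    print(vector, forward(vector))
```

Each element of `solutions` is only a vector of descriptor values. Nothing in this procedure says whether a real molecule with those values exists, or how to build one, which is the reconstruction step described above.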
The key to an effective solution methodology therefore lies in the use of a molecular descriptor that facilitates the reconstruction of the solutions into actual compounds. Such a descriptor needs to be information rich, have good correlative abilities in QSAR applications, and most importantly, be computationally efficient. A computationally efficient descriptor should have low degeneracy, meaning that it should lead to a limited number of solutions when used with inverse-QSAR. We describe one such descriptor, named signature, that we believe meets the above criteria.
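As a rough illustration of degeneracy, the sketch below groups a few small alkane skeletons by the value of two simple stand-in descriptors: the heavy-atom count and the sorted degree sequence of the molecular graph. Neither is the signature descriptor; they are assumptions chosen only to show how the number of structures sharing one descriptor value can differ between descriptors.

```python
from collections import defaultdict

# Hydrogen-suppressed carbon skeletons as adjacency lists
# (atom index -> indices of bonded atoms).
STRUCTURES = {
    "n-butane": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
    "isobutane": {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]},
    "n-pentane": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},
    "isopentane": {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]},
    "neopentane": {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]},
}


def atom_count(graph):
    """Stand-in descriptor 1: number of heavy atoms."""
    return len(graph)


def degree_sequence(graph):
    """Stand-in descriptor 2: sorted list of atom degrees."""
    return tuple(sorted(len(neighbors) for neighbors in graph.values()))


def degeneracy(descriptor):
    """Map each descriptor value to the structures that share it."""
    groups = defaultdict(list)
    for name, graph in STRUCTURES.items():
        groups[descriptor(graph)].append(name)
    return dict(groups)


by_count = degeneracy(atom_count)
by_degrees = degeneracy(degree_sequence)
print(by_count)
print(by_degrees)
```

On this toy set, the atom count maps several isomers to the same value, so an inverse-QSAR solution expressed in that descriptor would leave several candidate structures to sort through. The degree sequence separates all five skeletons here, although it would not do so for larger sets of molecules.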