CAESA, Computer-Aided Estimation of Synthetic Accessibility, has a simple
aim: to score a target compound by estimating the difficulty faced in
attempting to synthesise it in the chemical laboratory and produce a list
of starting materials for the synthesis. It can do so within a few
seconds, far faster than a team of post-docs checking laboratory shelves
and supply catalogues. Indeed, CAESA can provide feasible solutions to
some very complex synthetic problems.
The program works partly by analysing a target chemical's structure for
complex features such as fused or bridged rings and assigning a complexity
score. CAESA also recognises that apparent complexity might be circumvented
if starting materials incorporating these complex features are available.
By carrying out a retrosynthetic analysis, CAESA works backwards
from the target looking for appropriate and available starting materials
that could be stitched together by known chemical reactions to rebuild the
product. A higher score means a more difficult or a lower-yielding
reaction.
It sounds like a straightforward and sensible idea. A medicinal chemist
may have designed a nice molecule to dock with a disease's enzyme, but how
easy will it be to make the compound for testing? To be commercially
useful, CAESA must match synthetic chemists doing the job themselves. There
are four criteria on which a judgement might be made.
First, the program should find the lowest required number of synthetic
steps. Secondly, it must take into account the reaction difficulty and/or
the plausible yields at each step. Thirdly, starting materials selected
have to be either "off-the-shelf" or easily made in any laboratory. The
developers collaborate with supply companies including Acros, Lancaster
and Sigma Aldrich to keep the databases of starting materials current.
There are about 75,000 structures included in the starting materials
database of the CAESA package at present. Finally, the scheme must involve
the minimum number of FGIs (functional group interconversions); syntheses
that have many FGIs are never easy or cheap.
For example, a simple FGI such as the conversion of a halide to an alcohol
is assigned a rating of 1. In contrast, a much more sophisticated chemical
change, such as the Pauson-Khand reaction, a [2+2+1] cycloaddition that
converts an alkene, an alkyne and carbon monoxide into a cyclopentenone,
scores 4 in CAESA's eyes.
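The two ratings quoted above can be made concrete with a small Python
sketch. It assumes, purely for illustration, that a route's overall
difficulty is the sum of its step ratings; nothing here says how CAESA
actually combines them, and the names STEP_RATINGS and route_score are
hypothetical.

```python
# Only these two ratings come from the article; the table name and the
# idea of summing ratings over a route are illustrative assumptions.
STEP_RATINGS = {
    "halide_to_alcohol": 1,  # simple functional group interconversion
    "pauson_khand": 4,       # [2+2+1] cycloaddition giving a cyclopentenone
}


def route_score(steps):
    """Add up the rating of every step in a proposed route."""
    return sum(STEP_RATINGS[step] for step in steps)


route = ["pauson_khand", "halide_to_alcohol"]
print(route_score(route))  # 5
```

Under this assumption a route that needs one Pauson-Khand step already
costs as much as four simple halide-to-alcohol conversions.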
Users have a degree of control over how CAESA will rank a product's
synthetic complexity. A synthetic distance is entered to begin the
analysis, and this can be used to limit the total difficulty or number of
steps a user will tolerate. CAESA then selects starting materials based on
their total coverage of the target compound, so that the starting material
which covers the largest part of the target is the most favoured. At its
core, CAESA is a rule-based expert system: various built-in knowledge
bases carry out different functions in order to identify starting
materials and estimate the synthetic accessibility.
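The selection step described above might be sketched in Python as follows.
This is a toy model under stated assumptions, not CAESA's implementation:
the target is represented as a set of atom indices, coverage is taken to
be the fraction of those atoms a starting material supplies, and the
names Candidate, coverage and rank_candidates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    covered_atoms: frozenset  # indices of target atoms this material supplies
    distance: int             # assumed difficulty of reaching the target from it


def coverage(candidate, target_atoms):
    """Fraction of the target's atoms accounted for by the candidate."""
    return len(candidate.covered_atoms & target_atoms) / len(target_atoms)


def rank_candidates(target_atoms, candidates, max_distance):
    """Drop candidates beyond the user's limit, then sort by coverage."""
    allowed = [c for c in candidates if c.distance <= max_distance]
    return sorted(allowed, key=lambda c: coverage(c, target_atoms), reverse=True)


target = frozenset(range(10))  # a made-up ten-atom target
candidates = [
    Candidate("A", frozenset(range(0, 6)), distance=3),
    Candidate("B", frozenset(range(0, 8)), distance=9),
    Candidate("C", frozenset(range(0, 4)), distance=2),
]
ranked = rank_candidates(target, candidates, max_distance=5)
print([c.name for c in ranked])  # ['A', 'C']
```

Candidate B covers the most of the target but lies beyond the chosen
synthetic distance, so it is discarded before the ranking takes place.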
CAESA has three ever-growing retrosynthetic knowledge bases, each
containing different transformations and information to allow an
evaluation to be carried out. The first contains strategic disconnections,
the retrosynthetic equivalent of a bond-making reaction. These simple and
versatile disconnections, such as amide or ester disconnections, allow
CAESA to cut a large target molecule into manageable chunks. The second
knowledge base contains more general disconnections. This database is the
program's main information source and is called into action in most
analyses. The third knowledge base contains simple FGIs and is used to
transform simple intermediates. Another trick up CAESA's sleeve is a
two-directional approach to the analysis. Some simple synthetic
transformations are applied to all starting materials in the database,
generating an expanded virtual library of available materials.
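The forward half of this two-directional approach can be illustrated with
a short Python sketch. It assumes each starting material is labelled with
a single functional group and that a transformation simply swaps one
label for another; real structures are far richer, and the names
TRANSFORMS and expand_library are hypothetical.

```python
# Forward transformations keyed by functional group. The halide-to-alcohol
# entry echoes the article's FGI example; using it here is an assumption.
TRANSFORMS = {"halide": ["alcohol"]}


def expand_library(catalogue, transforms):
    """Return the catalogue plus one-step virtual derivatives.

    Each result is (source_name, functional_group, forward_steps).
    """
    expanded = [(name, group, 0) for name, group in catalogue]
    for name, group in catalogue:
        for new_group in transforms.get(group, []):
            expanded.append((name, new_group, 1))
    return expanded


catalogue = [("1-bromobutane", "halide"), ("benzoic acid", "carboxylic acid")]
library = expand_library(catalogue, TRANSFORMS)
for entry in library:
    print(entry)
```

A retrosynthetic search that reaches an alcohol intermediate can then
match the virtual entry derived from 1-bromobutane, even though only the
bromide itself sits in the catalogue.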
The developers carried out a number of validation tests on CAESA with 75
disparate chemical structures. The primary test involved comparing
CAESA's answers with those devised by a group of expert synthetic chemists
at a large pharmaceutical company. Out of the 75 target structures, 65
analyses were in full agreement with the schemes generated by the experts.
Of the ten cases that were not quite right, seven contained fused
heterocycles. Information on these common but synthetically complex groups
is not currently included in CAESA in great detail but will be added in a
future release. The same applies to compounds with phosphate groups, an
area with few transformations. Most synthetic chemists know far more
reactions than CAESA has in its current knowledge base. In contrast,
CAESA knows far more about the commercial availability of starting
materials. CAESA has arrived, analysed and could one day conquer.
Industrial chemists and pharmaceutical scientists will love CAESA - it
saves them time and effort! It allows targets to be assessed and organised
according to complexity of synthesis as well as helping in the search for
more efficient routes to a compound.
CAESA started life in the chemistry department at Leeds University. The
client side works via a web browser on any operating system, and the
server-side system can be customized to the users' own databases and ways
of working.