Bayesian Inference

"Beyond active and inactive": Bayesian statistical modelling and ROC techniques

Traditional QSAR (quantitative structure-activity relationship) techniques are opaque to the user. Too many models function as a black box: all there is to show for what may be days of computation is a single verdict, "active" or "inactive". Without any means of interpreting the computation, the result is close to worthless. How confident is the computer in the prediction? On what evidence is it based, and how was it derived? How much of the result is hopeful extrapolation, and how much is firmly founded?

Choracle is a world leader in answering these questions for the users of its models. Both Choracle's modelling techniques and its prediction strategies are grounded in Bayesian statistics, which allows the greatest possible value to be extracted from the data and gives the greatest power in interpreting predictive results.

Bayesian statistical principles go back to Thomas Bayes (whose essay was published in 1763) and to Pierre-Simon Laplace, who developed them extensively in the late 18th and early 19th centuries. After a period of neglect in the 20th century, they are increasingly used in all areas of information processing (medicine, astronomy, physics) where data are scarce and robust inference is necessary.

Choracle uses Bayesian modelling techniques, which automatically guard against models that make untenable inferences from their data by "overfitting" to peculiarities and noise in the original data set. This lets the data be used more efficiently, since less of it has to be held out as a "control" set for independent validation.

Furthermore, Choracle uses Bayesian methods for the presentation and interpretation of model results. All knowledge, and especially knowledge derived from QSAR modelling, is uncertain, and users of a prediction need to know how much confidence to attach to it. Rather than a simple rating of "active" or "inactive", Choracle models provide an "odds ratio" in favour of activity that summarises the information obtained by running the model.
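To illustrate why odds are more informative than a bare verdict, the sketch below applies Bayes' rule in its odds form: posterior odds = prior odds × likelihood ratio. It is a minimal example under stated assumptions, not Choracle's implementation: it assumes the model's evidence for a compound can be summarised as a single likelihood ratio, and the function names `posterior_odds` and `odds_to_prob` are hypothetical.

```python
def posterior_odds(prior_prob: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior_prob: assumed base rate of activity before the model is consulted.
    likelihood_ratio: P(evidence | active) / P(evidence | inactive) (an assumption
    about how the model's output is expressed).
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior_prob / (1.0 - prior_prob)
    return prior_odds * likelihood_ratio


def odds_to_prob(odds: float) -> float:
    """Convert odds in favour of activity back to a probability."""
    return odds / (1.0 + odds)


# Hypothetical example: 1% of screened compounds are active, and the model's
# evidence is 20 times more likely for an active than for an inactive compound.
odds = posterior_odds(0.01, 20.0)
print(f"posterior odds {odds:.3f}, P(active) {odds_to_prob(odds):.3f}")
```

With a 1% base rate, even evidence that is 20 times more likely under activity gives posterior odds of only 20/99 (about 0.20), or a probability of activity of about 0.17. A single "active" verdict would hide exactly this distinction.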
Each model prediction is plotted on a "Receiver Operating Characteristic" (ROC) curve, which lets users see at a glance which region of the model's operating curve a given prediction comes from. ROC curves have a long history of productive use in medicine, where the outcome of possibly expensive tests must be linked to the correct diagnostic response reliably and rapidly.

Bayesian techniques inherently recognise that QSAR prediction is only a small component of a much larger chain of activities, and that at some point a prediction will be used to inform an action (testing a compound, or taking a hit forward to a further stage of development) whose different outcomes carry different costs. By providing odds-based output and ROC information, Choracle allows model users to account both for the inherent inaccuracy of the model as a whole and for the uncertainty of a particular prediction. In this way, chemists are given the maximum possible help in deciding which computer predictions are best followed up and which are best ignored. Each Choracle prediction is traceable back to the literature references and/or test data on which it is founded, allowing chemists to apply their own judgement in identifying sound inferences.
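The idea of locating a prediction on a model's ROC curve can be sketched as follows. This is a minimal illustration, not Choracle's method: it assumes the model emits a numeric score (higher meaning more likely active) and that a labelled validation set is available (label 1 = active, anything else = inactive). The names `roc_points`, `auc` and `operating_point` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each ROC point is (threshold, false positive rate, true positive rate).
RocPoint = Tuple[float, float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[RocPoint]:
    """Sweep a threshold from high to low, predicting "active" when
    score >= threshold, and record (threshold, FPR, TPR) at each distinct score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_active = sum(1 for y in labels if y == 1)
    n_inactive = len(labels) - n_active
    if n_active == 0 or n_inactive == 0:
        raise ValueError("need at least one active and one inactive example")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[RocPoint] = [(float("inf"), 0.0, 0.0)]  # threshold above every score
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((threshold, fp / n_inactive, tp / n_active))
    return points


def auc(points: Sequence[RocPoint]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def operating_point(points: Sequence[RocPoint], score: float) -> Tuple[float, float]:
    """(FPR, TPR) obtained if a new prediction's score were used as the threshold."""
    fpr, tpr = 0.0, 0.0
    for threshold, x, y in points:  # thresholds are in decreasing order
        if threshold < score:
            break
        fpr, tpr = x, y
    return fpr, tpr
```

For a toy validation set with scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, the area under the curve is 8/9, and a new prediction scoring 0.7 sits at FPR 1/3, TPR 2/3. The ratio TPR/FPR at that point (here 2) is the positive likelihood ratio of "score ≥ 0.7", which is one way ROC information can feed into odds-based output.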
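Because different outcomes carry different costs, odds output can be turned into a follow-up decision by minimising expected cost. The sketch below is one standard way to do this, offered as an illustration rather than as Choracle's procedure. It assumes correct decisions cost nothing, that `cost_fp` is the (hypothetical) cost of following up a compound that turns out inactive, and that `cost_fn` is the cost of ignoring a true active. Under those assumptions, following up is cheaper exactly when the posterior odds exceed `cost_fp / cost_fn`.

```python
from typing import Tuple


def expected_costs(p_active: float, cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Expected cost of (following up, ignoring) a compound with P(active) = p_active.
    Assumes correct decisions cost nothing, so only errors are charged."""
    follow_up = (1.0 - p_active) * cost_fp  # pay cost_fp if it turns out inactive
    ignore = p_active * cost_fn             # pay cost_fn if a true active is discarded
    return follow_up, ignore


def should_follow_up(posterior_odds: float, cost_fp: float, cost_fn: float) -> bool:
    """Follow up when that has the lower expected cost. With p = odds / (1 + odds),
    (1 - p) * cost_fp < p * cost_fn is equivalent to odds > cost_fp / cost_fn."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return posterior_odds > cost_fp / cost_fn


# Hypothetical prediction with posterior odds of 20/99 (about 0.20).
odds = 20 / 99
print(should_follow_up(odds, cost_fp=1.0, cost_fn=10.0))  # threshold 0.1
print(should_follow_up(odds, cost_fp=1.0, cost_fn=2.0))   # threshold 0.5
```

The same prediction is worth following up when missing an active is ten times as costly as a wasted test, but not when it is only twice as costly. This is why a single "active"/"inactive" cut-off chosen in advance cannot suit every downstream use.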