Tuesday Lunchtime Discussion
Tuesday Afternoon Discussion: Is our code doing the correct thing?
We need some standard set of tests so that different groups can verify they are doing the same thing
- Some debate: should we agree to within experimental error or statistical error?
- We ideally want to replace the experiment, so we need to agree with it to within experimental error
- But we need to agree to within statistical error for consistency
- It's hard to converge these numbers
- We can actually learn a lot by converging simple calculations
- Between packages, we really can't make direct comparisons since certain aspects are treated differently (e.g. temperature)
- But we really need to converge to the same answer that someone will use to make a prediction
- We need to come up with some baseline for what our convergence criteria should be
- What is considered a good correlation?
- Companies are throwing tons of money at a number they can trust
- For most companies, there is a limit to what they consider efficient; beyond some compute cost, experiments are cheaper
- For people developing new code, there needs to be some kind of standard test set
- We have regression tests which test the code, but not the methods
- GROMACS for example has hundreds of regression tests but still isn't good enough to catch many bugs.
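One way to make the "agree to within statistical error" criterion above concrete is to compare the difference between two estimates against their combined uncertainty. This is only a minimal sketch, assuming each package reports a free energy with an independent standard error; the function name, the 2-sigma threshold, and the numbers are illustrative assumptions, not a convergence criterion agreed on in the discussion.

```python
import math

def agree_within_error(dg_a, se_a, dg_b, se_b, n_sigma=2.0):
    """Return (agrees, z) for two independent free energy estimates.

    z is the absolute difference divided by the combined standard error.
    """
    combined_se = math.sqrt(se_a**2 + se_b**2)
    z = abs(dg_a - dg_b) / combined_se
    return z <= n_sigma, z

# Hypothetical results from two packages, in kcal/mol
agrees, z = agree_within_error(-6.3, 0.2, -5.9, 0.3)
print(f"z = {z:.2f}, agree within 2 sigma: {agrees}")
```

Agreement with experiment could be checked the same way, with the experimental uncertainty substituted for one of the standard errors.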
We are not necessarily competing with experiments
Assays come out and chemists make the easiest compounds; simulations can work on the rest, so after 1 week we have computational suggestions for more compounds to test
There really is not much agreement on what we should test, or how, in some kind of optimized test set
- We need numbers that are converged to be reliable
- MRS/Pande paper of sidechain analogue results is a good benchmark set
- We should get down to a very small number
How accurately do we predict binding affinities? Wall Street approach:
- A model is proposed and run against historical data to see if it reproduces what actually happened
- May have issue of overfitting
- This may be viable if we look at previously designed drugs and lead optimization procedures
- We don't really have the data yet to do this though
- We are beginning to accrete a body of data that we did not have 2 years ago; from it we can come up with the questions people would actually want to ask
- e.g. How long would it take to converge a calculation if I introduce a new atom, or what happens if I change ring size?
- Gathering the body of data will allow people to phrase their questions within the database of what we have done
- As industry uses it, more practical problem questions can actually be raised
Going straight to lead optimization may be going too fast
- Let's build on what we are confident in (test accuracy on it first)
- How much more complicated does it get from there?
- Once we are precise, we can work on being accurate
- We had a question previously: has an FEP database ever been built, and if so, where is it?
- We have databases of chemical reactions, why can't we have them for free energy transformations?
How fast do we get a useful result?
- If used in tandem with experiment, it's okay to be a bit slow (~1 week)
- Lead optimization can really be a 5-year process, so we have a bit of time
- Mostly agree, but there are different kinds of problems in lead optimization
- Lots of chemists design problems that are solvable in the time frame of the posed question
- There are different thresholds to cross: a 24-hour problem, a 1-week problem, a 1-month problem
It may be realistic that molecular weight correlates with affinity
- Dispersion always makes a big contribution
- Water is always a less dense fluid than the protein
- so dispersion is enhanced in pocket
- so a still tighter-fitting ligand will always bind better (tox etc. excluded)
Are we getting the right answers for the right reasons?
- do we get the same answer from different programs? different people? different protocols?
- deviation from X-ray structures may indicate problems? how to quantify?
- evaluate crystal quality beforehand, especially ligand density; automated?
- insufficient number of counterions; poor counterion sampling/decorrelation
- what parts of active site move for ligands in a congeneric series? if a part of the protein we hadn’t seen move before changes conformation, that is potentially a red flag
- infrequent events are bad for convergence; how can we detect them? number of rotamer transitions? MSMs? correlations with dV/dlambda?
- detecting ergodicity problems with simulations we’ve run vs. gross errors vs. setup errors
- compare different community simulation pipelines for same input?
- consistency in thermodynamic pathways / cycle closure / redundancy
- replication of the same simulation, varying things we think don’t matter (e.g. velocities, initial binding modes, water placement, slightly different preps of same protein, bound vs unbound structure); check consistency
- error bar estimates from multiple replicates of same simulation
- “TurboTax” style warnings and sensible default recommendations
- “Phenix”-style choices for modeling, tailoring options to user
- tooltips are useful for users in setup
- reverse convergence of DeltaG with time
- energy drift or large fluctuations; check energy conservation
- using the right method (e.g. can’t use Zwanzig formula if work distribution is too broad)
- RMSD of protein and ligand
- were motions we expected to sample actually sampled? (e.g. loops)
- time autocorrelation of property you are interested in (simple statistical hygiene)
- need at least N independent data points to have some idea of how things are behaving
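The cycle-closure check in the list above can be sketched as follows: relative free energies summed around a closed thermodynamic cycle should come out to zero within the propagated statistical error. This is a minimal illustration, assuming independent errors on each edge; the ligand names, edge values, and function name are hypothetical.

```python
import math

def cycle_closure(edges, cycle):
    """Sum relative free energies around a closed cycle of ligands.

    edges maps (a, b) -> (ddG for a->b, standard error).
    cycle is an ordered list of nodes; the last node connects back to the first.
    Returns (closure error, propagated standard error).
    """
    total, var = 0.0, 0.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        if (a, b) in edges:
            ddg, se = edges[(a, b)]
        else:
            # Edge was computed in the opposite direction: flip the sign.
            ddg, se = edges[(b, a)]
            ddg = -ddg
        total += ddg
        var += se**2
    return total, math.sqrt(var)

# Hypothetical relative binding free energies in kcal/mol
edges = {
    ("A", "B"): (1.2, 0.2),
    ("B", "C"): (-0.5, 0.2),
    ("A", "C"): (0.9, 0.3),
}
closure, se = cycle_closure(edges, ["A", "B", "C"])
print(f"cycle closure = {closure:.2f} +/- {se:.2f} kcal/mol")
```

A closure error several times larger than its propagated uncertainty would point to a sampling or setup problem on at least one edge of the cycle.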
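The last two items (time autocorrelation, and needing N independent data points) can be illustrated by estimating the statistical inefficiency g of a time series, so that the effective number of independent samples is roughly N/g. The sketch below uses a simple estimator with a common truncation heuristic, not any particular package's implementation; the synthetic AR(1) data and its parameters are assumptions made for the example.

```python
import random

def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * sum_t (1 - t/N) * C(t) from a time series.

    C(t) is the normalized autocorrelation; the sum is truncated at the
    first lag where C(t) drops to zero or below (a simple heuristic).
    """
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    var = sum(d * d for d in dx) / n
    if var == 0.0:
        raise ValueError("time series has zero variance")
    g = 1.0
    for t in range(1, n - 1):
        c = sum(dx[i] * dx[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)

# Synthetic correlated data: an AR(1) process with hypothetical parameters.
random.seed(0)
phi, x = 0.9, [0.0]
for _ in range(1999):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

g = statistical_inefficiency(x)
n_eff = len(x) / g
print(f"g = {g:.1f}, effective independent samples = {n_eff:.0f}")
```

If the effective sample count is only a handful, error bars computed as if every frame were independent will be too small.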