2014 Workshop Wednesday Lunchtime Discussion
Discussion leaders: Vijay Pande and David Case
Discussion: What sort of forcefields are needed?
- Water
- Proteins
- Case shameless promotion: two new fixed-charge Amber ff: ff14SB, ff14ipq (implicit polarization charges)
- other "prominent": c35/6, OPLS 2.1, GROMOS
- polarizable forcefields, other "advanced" forms
- Druglike molecules, cofactors, substrates, ...
- virtual screening applications: e.g. GAFF, GAFF2, CGenFF, OPLS-AA, parm@Frosst
- lead opt
- what is current (best?!) practice
- is a community effort feasible? atom typing, parameters, etc
- Lipids, carbohydrates, nucleic acids
Bernie Brooks: What you need depends on what you're trying to do. E.g. if you reweight to QM like B3LYP, you need to match its deficiencies
Alan Mark: What target are we trying to reproduce?
Dave Case: We all want to get the fluctuations and response to perturbations of proteins right
GAFF is null model now; have to show you can do better than GAFF
GAFF2 is a major extension that will happen sometime soon
What do people actually use?
Chodera: What parameterization engines are actually available? Antechamber, ParamChem, Schrödinger tool, ATB
Mobley: Our approach has just been to try to get the right answer for the forcefield
Alan Mark: Transferability is an enormous problem. We make a lot of assumptions about functional forms. We probably have to be able to depart from rigid combination rules. QM calculations for 10,000 molecules stored on back-end. Want to iteratively be able to improve over time. Working on using hydration free energies right now.
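For reference, the "rigid combination rules" point can be illustrated with a minimal sketch. It assumes the Lorentz-Berthelot rule and a 12-6 Lennard-Jones form (the discussion did not name a specific rule), and the atom types, parameter values, and the `pair_overrides` table are hypothetical. A pair-specific override that takes precedence over the rule is one possible way to depart from it.

```python
import math

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic mean for sigma, geometric mean for epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)

def pair_parameters(type_i, type_j, atom_params, pair_overrides=None):
    """Return (sigma, epsilon) for a pair of atom types.

    If a pair-specific override exists it wins over the combination
    rule; otherwise the rule is applied to the per-atom parameters.
    """
    key = tuple(sorted((type_i, type_j)))
    if pair_overrides and key in pair_overrides:
        return pair_overrides[key]
    sigma_i, eps_i = atom_params[type_i]
    sigma_j, eps_j = atom_params[type_j]
    return lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j)

def lj_12_6(r, sigma, eps):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# Hypothetical atom types and values, for illustration only.
atom_params = {"A": (3.0, 0.1), "B": (4.0, 0.4)}
pair_overrides = {("A", "B"): (3.4, 0.25)}

rule_based = pair_parameters("A", "B", atom_params)
overridden = pair_parameters("A", "B", atom_params, pair_overrides)
```

With the rule alone, every A-B interaction is fixed by the per-atom values; the override table lets a single pair be refit without touching either atom type.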
?: Benoît's group has developed a tool called GAAMP.
Case: Would it be useful to have a repository of bad things in GAFF? There is currently no centralized way of encouraging people to share this information.
Chodera: GitHub? Blog post? Need a central maintainer
Woolf: Our framework could help with this; could host it at Hopkins, but can't be point person
Mobley: We've asked to have dielectric corrections to GAFF incorporated, but no developments.
Mark: Warning about many contributors. "Berger lipid parameters" represent many different forms of this forcefield all called the same thing with the same citation.
Bayly: Forcefield development is not a light undertaking. At least a repository of issues with GAFF would be good.
Chodera: Would love to see open community forcefield effort with governing board, open GitHub repo with versioned forcefields and datasets.
?: If we're using hydration free energies to improve forcefields, we should make sure the water models can also reproduce bulk water properties very well.
?: Is there some sort of exercise that evaluates how well properties are reproduced for all drugs on the market?
Stouch: What functional form should the forcefield have? Other examples like CFF forcefields and forcefields with cross-terms
Challenges
- Finding the best parameters
- do we have enough experimental data?
- what is the right level of QM for parameterization?
- Choosing the right functional form
- what's physical? (12-6 LJ?)
- what's necessary (better electrostatics, polarization, etc)
- Building a community effort
- a lot for a single lab to do
Vijay's suggestion: if we can agree on a protocol (e.g. like ForceBalance) then we can break up the effort into multiple subparts (water, protein, lipid, NA, carbohydrates, small molecules) and still be consistent
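One way to picture a shared protocol of this kind is a single objective function that every subgroup evaluates the same way. The sketch below is an assumption for illustration, not ForceBalance's actual API: each target contributes a weighted, scaled squared residual, and a harmonic penalty keeps parameters near their starting values. The dictionary keys and the `penalty` argument are hypothetical names.

```python
def weighted_objective(params, targets, initial_params, penalty=0.1):
    """Weighted least-squares objective over several fitting targets.

    Each target is a dict with:
      "predict":   function mapping params to a computed property
      "reference": the QM or experimental reference value
      "scale":     a typical magnitude used to make residuals unitless
      "weight":    relative importance of this target
    """
    total = 0.0
    for target in targets:
        residual = (target["predict"](params) - target["reference"]) / target["scale"]
        total += target["weight"] * residual ** 2
    # Harmonic restraint toward the starting parameters (regularization).
    total += penalty * sum((p - p0) ** 2 for p, p0 in zip(params, initial_params))
    return total

# Toy targets standing in for, e.g., a density and a hydration free energy.
toy_targets = [
    {"predict": lambda p: p[0], "reference": 1.0, "scale": 1.0, "weight": 1.0},
    {"predict": lambda p: 2.0 * p[1], "reference": 4.0, "scale": 2.0, "weight": 0.5},
]
value = weighted_objective([2.0, 2.0], toy_targets, [1.0, 2.0], penalty=0.5)
```

If every subgroup (water, protein, lipid, and so on) agrees on the target definitions, scales, and weights, their separately fitted parameters are at least optimized against the same criterion.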
Dave Case: QM is a surprisingly good target function.
Bayly: Torsions are the Waterloo of forcefields. No alternative for torsion terms.
Beauchamp: Lots of progress has been made on protein forcefield using NMR data. Are QM and NMR-derived torsions converging? What about utility for small molecules?
Case: NMR may not be so helpful for small molecules?
?: How do we validate forcefields?
Pande: Have to have a training set and a test set
Case: Small molecule crystal simulations are probably a great test. Many small molecule structures also have enough waters to be partially hydrated.
Bernie Brooks: This is how charmm19 parameters were generated!
Bayly: Division into different types of forcefields communicates that the challenges are different for each class. Only one water, but want to be really really good. Protein forcefields have a limited set of residues but need transferability. Small molecule space is huge.
Case: Progress will be incremental. Two years ago we all agreed there needs to be a test set.
?: To address funding issue, would like to know from Schrödinger how many man-years went into new forcefield development effort.
Abel: A lot. Worked at it for five years. Lots of people involved, but changing set of people.
Stouch: Alex MacKerell has a parameterization engine (ParamChem). How fast is that?
Shirts: If enough people supported it, could get time on XSEDE resources for a server.
Case: So we can't have a Kickstarter campaign?
Mark: Expensive part is doing QM calculations. These are fixed for the sorts of molecules. Would like to see QM calculations in a repository somewhere. 99% of computational cost goes to repeating calculations that others have already done with Gaussian, GAMESS, etc. Database is currently a private database, but if we could arrange this to be a public shared dataset, then it would prevent duplication of effort.
Shirts: NIST has a lot of these.
Mark: This sort of database could potentially be a commercial product.
?: Materials science community has managed to precompute QM properties for 10M chemicals. "Materials Genome Initiative"
Stouch: Could we crowdsource QM calculations once we set up a database?
Christ: We should carefully read our software licenses before we do things like that.
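The shared QM repository idea discussed above amounts to looking up a calculation before running it. A minimal sketch, assuming each calculation can be identified by a canonical molecule identifier plus method and basis set; `calculation_key`, `QMResultStore`, and the in-memory dictionary are hypothetical stand-ins for whatever a real shared database would use.

```python
import hashlib
import json

def calculation_key(molecule_id, method, basis):
    """Stable key for one QM calculation.

    molecule_id is assumed to already be a canonical identifier,
    so the same molecule always produces the same key.
    """
    record = {"molecule": molecule_id, "method": method.lower(), "basis": basis.lower()}
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class QMResultStore:
    """In-memory stand-in for a shared results database."""

    def __init__(self):
        self._results = {}

    def get_or_compute(self, molecule_id, method, basis, compute):
        """Return a stored result, running compute() only on a miss."""
        key = calculation_key(molecule_id, method, basis)
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]

# Usage: the second request for the same calculation is served from the store.
run_count = []

def placeholder_calculation():
    run_count.append(1)
    return -76.4  # placeholder number, not a real QM result

store = QMResultStore()
first = store.get_or_compute("water", "B3LYP", "6-31G*", placeholder_calculation)
second = store.get_or_compute("water", "b3lyp", "6-31g*", placeholder_calculation)
```

The sketch leaves out the hard parts raised in the discussion: agreeing on canonical molecule identifiers, recording program and version, and checking that software licenses permit sharing the outputs.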