AI Rebuilds Molecules From Exploding Fragments

By Ula Chrobak

Researchers at the Department of Energy’s SLAC National Accelerator Laboratory and collaborating institutions recently built a generative AI model that can recreate molecular structures from the movement of the molecule’s ions after they are blasted apart by X-rays, a technique called Coulomb explosion imaging.

The research, published in Nature Communications, is an important step toward being able to take snapshots of molecules during chemical reactions – an advance that could have important impacts in medicine and industry. The machine learning model closely predicted the geometries of a range of different molecules made of fewer than ten atoms, paving the way for applying the technique to larger molecules. “We were pretty excited about this,” said Xiang Li, an associate scientist at SLAC’s Linac Coherent Light Source (LCLS) and lead author of the study. “It is the first AI model built for molecular structure reconstruction from Coulomb explosion imaging.”

A new way to see molecules

Currently, there are limited options available for imaging isolated gas-phase molecules. With electron microscopy, for example, subjects must be fixed in place, making it impossible to image free-floating molecules. And for diffraction-based techniques to work, the sample of molecules needs to be dense enough to generate a strong signal in the detector. The resulting image is technically an average of many molecules, which prevents researchers from studying details visible only when imaging isolated molecules.

In the paper, the researchers instead focused on Coulomb explosion imaging. In this technique, an X-ray pulse hits a single molecule in a vacuum chamber, ripping off the molecule’s electrons. This leaves behind positive ions that explosively repel away from each other and smash into a detector. The detector captures their momentum, which can be used to reconstruct the structure of the molecule. “This technique has the ability to isolate minor details that are chemically relevant,” said James Cryan, LCLS interim deputy director for science, research and development, associate professor of photon science at SLAC and coauthor of the paper.

But this reconstruction process has so far been largely infeasible due to computing constraints. After the X-ray pulse strips away electrons, the remaining ions do not explode apart instantly. During this brief delay, the atoms can shift slightly, making it difficult to reconstruct the original structure using Coulomb’s law for electrostatic forces. “It will not be accurate because a simple use of that law only works if the charge-up process is instantaneous,” explained Li.
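To make the idealized picture concrete, the sketch below implements the kind of “simple use” of Coulomb’s law that Li describes: every ion is assumed to receive its final charge instantaneously, after which the point charges simply repel one another. It is an illustration only, not the simulation used in the study. The function names, the charges assigned to each ion, the time step and the approximate water geometry are all assumptions chosen for the example, and everything is expressed in atomic units (Coulomb constant equal to 1).

```python
import math

def coulomb_forces(positions, charges):
    """Pairwise Coulomb forces between point charges (atomic units)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = charges[i] * charges[j] / r**3
            for k in range(3):
                forces[i][k] += scale * d[k]   # like charges push i away from j
                forces[j][k] -= scale * d[k]   # equal and opposite force on j
    return forces

def explode(positions, charges, masses, dt=1.0, steps=20000):
    """Propagate ions that start at rest, using velocity Verlet.

    Assumes instantaneous charge-up: the charges are fixed from t = 0.
    Returns (final positions, final momenta).
    """
    pos = [list(p) for p in positions]
    mom = [[0.0, 0.0, 0.0] for _ in positions]
    forces = coulomb_forces(pos, charges)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                mom[i][k] += 0.5 * dt * forces[i][k]    # first half kick
                pos[i][k] += dt * mom[i][k] / masses[i]  # drift
        forces = coulomb_forces(pos, charges)
        for i in range(len(pos)):
            for k in range(3):
                mom[i][k] += 0.5 * dt * forces[i][k]    # second half kick
    return pos, mom

if __name__ == "__main__":
    # Approximate water geometry in bohr (O at the origin); illustrative only.
    water = [(0.0, 0.0, 0.0), (1.81, 0.0, 0.0), (-0.453, 1.752, 0.0)]
    charges = [2, 1, 1]             # hypothetical final charge states
    masses = [29166, 1837, 1837]    # approximate O and H masses in electron masses
    _, momenta = explode(water, charges, masses)
    for label, p in zip(["O", "H", "H"], momenta):
        print(label, [round(c, 2) for c in p])
```

This forward direction, from structure to momenta, is the easy part. The reconstruction problem runs the other way, and the idealization breaks down precisely because real molecules charge up over a finite time while the atoms are already moving.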

Making things even messier, the complexity of the problem grows exponentially with every additional atom in the molecule. “It’s very challenging to work backwards to get the original structure,” said co-author Phay Ho, a physicist with DOE’s Argonne National Laboratory. “It’s kind of like breaking a glass and trying to put it back together from how the pieces flew apart. Many problems in modern physics and chemistry involve reconstructing hidden structures from indirect measurements. This work demonstrates how AI can help tackle such inverse problems.”

Machine learning for molecular structures

The research team set out to build a machine learning model that could overcome this computing constraint. They developed and trained the model at SLAC’s Shared Science Data Facility (S3DF). Generative AI models are well-suited for the task because they “think” differently than a standard computer simulation. Instead of working through a series of equations, they learn by finding patterns in training data. Then, they use those patterns to make statistical predictions.

To gather training data, the team turned to a simulation built by Ho. The simulation analyzes molecular structures and calculates the momentum of their ions following a Coulomb explosion. After running for over a month, the computing-intensive simulation, using both quantum mechanics and classical physics equations, produced a dataset of 76,000 molecular samples.

Initially, the researchers trained the AI on this dataset alone, which is small by AI-training standards, and found that the model predicted inaccurate structures from explosion data. So they redid the training, adding a second dataset derived using only classical physics. The second set was less precise but about 100 times larger than the first.

This two-step training was the trick for predicting precise structures.

The researchers tested the AI model by prompting it to predict molecular structures in a portion of the simulation data it had not seen in training. The model, which the team named MOLEXA (short for “molecular structure reconstruction from Coulomb explosion imaging”), took the ion momenta and calculated the most likely structures. “We found that this two-step training process suppressed the prediction error by a factor of two,” said Li.

The team then tested MOLEXA with experimental datasets recorded at the Small Quantum Systems (SQS) instrument of the European X-ray Free-Electron Laser facility (European XFEL) in Germany. The molecules they tested included water, tetrafluoromethane and ethanol. They entered the experimental ion momenta into the model, reconstructed the molecular structures, and then compared the reconstructions to known structures listed by the National Institute of Standards and Technology.

They found the predictions largely overlapped with the established structures. Overall, the bonds were in the right spots, with only slight variations in their angles. The errors in position were generally less than half the length of a typical chemical bond. “The model is actually, most of the time, doing better than that,” added Li. “It is only a starting point for future research, which will not only improve model accuracy but also extend its applicability to larger molecular systems.”
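One simple way to put a number on how close a reconstruction is to a reference geometry is to compare the distances between every pair of atoms, which does not depend on how the molecule is rotated or shifted in space. The sketch below is an illustration under that assumption and is not the error measure used in the study. The function names are hypothetical, the atoms must be listed in the same order in both structures, and the water geometry uses approximate textbook values (an O–H bond of about 0.958 Å and an H–O–H angle of about 104.5°).

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Distances between every pair of atoms, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def mean_distance_error(predicted, reference):
    """Mean absolute difference in inter-atomic distances.

    Both structures must list the same atoms in the same order.
    The result has the same length unit as the input coordinates.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    if len(reference) < 2:
        raise ValueError("need at least two atoms to compare distances")
    d_pred = pairwise_distances(predicted)
    d_ref = pairwise_distances(reference)
    return sum(abs(p - r) for p, r in zip(d_pred, d_ref)) / len(d_ref)

# Approximate water geometry in angstroms: O, H, H (illustrative values).
BOND = 0.958
ANGLE = math.radians(104.5)
water_reference = [
    (0.0, 0.0, 0.0),
    (BOND, 0.0, 0.0),
    (BOND * math.cos(ANGLE), BOND * math.sin(ANGLE), 0.0),
]

if __name__ == "__main__":
    # A hypothetical reconstruction with one O-H bond slightly too long.
    guess = [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0), water_reference[2]]
    print(mean_distance_error(guess, water_reference))
```

A distance-based comparison like this cannot tell a molecule from its mirror image, so it is best read as a quick sanity check rather than a full structural comparison.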

Expanding to larger molecules and chemical reactions

The paper is a major step in advancing Coulomb explosion imaging, which has long been limited by the challenge of reconstructing molecular structures from experimental measurements. In future work, the researchers plan to scale up the number of atoms the machine learning model can piece back together and apply the model to time-resolved experiments at the LCLS and European XFEL. That will help researchers to reconstruct snapshots of molecules in motion, creating flip-book-like molecular movies with insights into how chemical reactions unfold. It will also help with the interpretation of data collected at the high X-ray pulse rates delivered by SLAC’s superconducting X-ray laser, Cryan said.

The team is also now testing the model’s ability to reconstruct molecules from incomplete data. Much of the time, the detector misses an ion produced in the Coulomb explosion. Li wants to know, for example: Can the AI still reconstruct an ethanol molecule if one or more of its hydrogen ions are not registered in the detector?

If these challenges are resolved, the technique could become more applicable in biology and chemistry research. Proteins, for instance, can consist of thousands of atoms. “That’s really the goal,” said Li. “We will be able to study systems that are more biologically or industrially relevant.”

The team also included researchers from the Stanford PULSE Institute; Stanford University; Kansas State University; European XFEL, Germany; the Max Planck Institute for Nuclear Physics, Germany; Fritz Haber Institute, Germany; and Sorbonne University, France. Large parts of this work were funded by the Department of Energy’s Office of Science. LCLS is an Office of Science user facility.
