New markup language a FAIR way of sharing data in enzymology and catalysis

Author: Deanna Csomo Ferrell

An international collaboration of scientists developed a new markup language to improve the way experiments can be reproduced in the biomedical sciences, especially within fields that study how enzymes speed up chemical reactions.

The EnzymeML toolbox is a seamless communication channel among experimental platforms, electronic lab notebooks, tools for modeling enzyme kinetics, publication platforms, and enzymatic reaction databases. Santiago Schnell, the William K. Warren Foundation Dean of the College of Science at the University of Notre Dame, and professor in the Departments of Biological Sciences and Applied and Computational Mathematics and Statistics, co-led the project, which was described in Nature Methods.

The collaboration included scientists from Europe, South Africa and the United States, all of whom delved into the issue of managing and communicating data consistently in enzymology and biocatalysis. The paper presents six user scenarios in which the publicly accessible markup language is used to analyze raw experimental data and process experimental design protocols for different enzymatic reactions, using the research software tools and databases that scientists in these fields typically rely on.

The development of the markup language follows the Standards for Reporting Enzymology Data guidelines recommended by international chemistry journals, which help compare, evaluate and reproduce results, and supports what is known as the FAIR (findable, accessible, interoperable and reusable) representation of experimental data. 

“The biggest problem for us in this field has been the way data is captured; it can be biased,” said Jan Range, one of the lead authors and EnzymeML developers and a doctoral student at the Institute of Biochemistry and Technical Biochemistry at the University of Stuttgart in Stuttgart, Germany. “Some publications will say the temperature used in the experiment was ‘room temperature.’ But what is room temperature? When you’re in the lab you cannot reproduce that.

“Other times, pH value isn’t always documented, but it is vital for enzymatic reactions,” he continued. “When you work with biological or chemical systems, you know, they are very sensitive to small changes, and when we find these deviations due to non-reporting of certain values, we can’t really pinpoint the reason for the results.”
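The reporting gaps Range describes are the kind of thing software can check for. The sketch below is only an illustration and is not part of EnzymeML: the field names `temperature_celsius` and `ph` and the function `find_reporting_gaps` are hypothetical, chosen here to show how a record with a vague or missing condition might be flagged before it is shared.

```python
from __future__ import annotations

from typing import Any

# Hypothetical required fields; not taken from the EnzymeML specification.
REQUIRED_NUMERIC_FIELDS = ("temperature_celsius", "ph")


def find_reporting_gaps(conditions: dict[str, Any]) -> list[str]:
    """Return one message per required condition that is missing or non-numeric."""
    problems = []
    for name in REQUIRED_NUMERIC_FIELDS:
        value = conditions.get(name)
        if value is None:
            problems.append(f"{name}: not reported")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: {value!r} is not a number")
    return problems


# A record as it might be transcribed from a paper that says "room temperature"
# and never mentions pH, next to one that states both values explicitly.
vague = {"temperature_celsius": "room temperature"}
explicit = {"temperature_celsius": 25.0, "ph": 7.4}

vague_report = find_reporting_gaps(vague)
explicit_report = find_reporting_gaps(explicit)
print(vague_report)
print(explicit_report)
```

In this sketch the vague record produces two messages, one for the non-numeric temperature and one for the missing pH, while the explicit record produces none.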

An EnzymeML document contains experimental information about reaction conditions, the substrate and product concentrations, and information on kinetic modeling. The tool will replace manual management of data, which can be laborious and error-prone, said Jürgen Pleiss, bioinformatics group leader and professor at the Institute of Biochemistry and Technical Biochemistry at the University of Stuttgart. In addition to developing the new instrument, the project also made colleagues aware of the relevance and opportunities of managing data according to FAIR, he said.
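To make the contents of such a document concrete, here is a minimal sketch of a record holding reaction conditions, concentration measurements and a kinetic model name, written out as XML. It assumes nothing about the real EnzymeML schema: the class names, field names and XML tags below are all hypothetical, and "Michaelis-Menten" appears only as a familiar example of a rate law.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Measurement:
    species: str              # e.g. "substrate" or "product"
    time_s: float             # time of the reading, in seconds
    concentration_mM: float   # concentration, in millimolar


@dataclass
class ReactionRecord:
    temperature_celsius: float
    ph: float
    kinetic_model: str        # free-text name of the rate law
    measurements: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize the record with illustrative (not EnzymeML) tag names."""
        root = ET.Element("reactionRecord")
        ET.SubElement(
            root,
            "conditions",
            temperatureCelsius=str(self.temperature_celsius),
            pH=str(self.ph),
        )
        ET.SubElement(root, "kineticModel").text = self.kinetic_model
        series = ET.SubElement(root, "measurements")
        for m in self.measurements:
            ET.SubElement(
                series,
                "measurement",
                species=m.species,
                timeSeconds=str(m.time_s),
                concentrationMillimolar=str(m.concentration_mM),
            )
        return ET.tostring(root, encoding="unicode")


record = ReactionRecord(
    temperature_celsius=25.0,
    ph=7.4,
    kinetic_model="Michaelis-Menten",
    measurements=[
        Measurement("substrate", 0.0, 10.0),
        Measurement("product", 60.0, 1.2),
    ],
)
xml_text = record.to_xml()
print(xml_text)
```

Because conditions, measurements and model travel together in one machine-readable file, another tool can parse the record back with `ET.fromstring(xml_text)` instead of relying on values retyped by hand.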

The explosion of data through a variety of new experimental techniques has added to the volume of information, and the overload can actually limit a researcher’s productivity.

“More and more researchers feel like they are drowning in a data tsunami,” Pleiss said. Digitizing the information and letting researchers enter it through a user-friendly interface managed with EnzymeML will increase efficiency.

“It improves the reproducibility of experiments and data analysis, and thus promotes trust in science,” said Pleiss.

The international nature of the collaboration, with researchers in various fields, has also strengthened the tool because EnzymeML is not simply one institution’s or one researcher’s instrument, Schnell said.

“Catalysts and enzymes are truly fundamental parts of life. For example, every cell runs on enzymes. Enzymes are important pharmacological targets. We use them to make and degrade plastics, textiles, housecleaning, food and beverages processing, animal nutrition, fuel cars or energy generation,” he said. “The Gold Standard of Science is replicability. The most effective way to establish a ‘FAIR’ standard for reporting of physical constants of catalysis and enzymes with rigor, precision and replicability is through a multidisciplinary and international collaboration.”

Next, the group plans a round-robin replicability study that includes many other researchers to investigate how they can report the physical constants of enzymes and catalysis and use EnzymeML to enhance communication of data across electronic lab notebooks and instruments, software tools, databases and publications, Range said.

In addition to Range, Pleiss and Schnell, other collaborators on the project include researchers from the Institute of Bio- and Geosciences in Jülich, Germany; Aachen University in Aachen, Germany; Universidade do Porto in Porto, Portugal; Georgia Institute of Technology in Atlanta, Georgia; Technical University of Denmark; Durban University of Technology in Durban, South Africa; Heidelberg University in Heidelberg, Germany; Stellenbosch University in Stellenbosch, South Africa; Heidelberg Institute for Theoretical Studies in Heidelberg, Germany; Beilstein-Institut in Frankfurt am Main, Germany; and the University of Liverpool in Liverpool, United Kingdom.

The research was partially sponsored by the Beilstein Institute, and funded by several organizations and institutions, including the German Research Foundation, the German Federal Ministry of Education and Research, the Klaus Tschira Foundation, the National Research Foundation of South Africa, the U.S. National Science Foundation, the Sino-Danish Center for Education and Research, the Technical University of Denmark, and the U.S. Food and Drug Administration. Additional support was provided by the University of Michigan and the University of Notre Dame.


Originally published by Deanna Csomo Ferrell on February 10, 2023.