Allostery plays a key role in the catalytic mechanism of many enzymes. Notwithstanding its importance, the mechanism of allostery is still not fully understood. Purine nucleoside phosphorylases (PNPs) are oligomeric enzymes that catalyse the synthesis of purine nucleotides in the purine salvage pathway and represent an ideal case for studying allostery. PNPs are essential enzymes in some microorganisms, such as Helicobacter pylori, which rely exclusively on recycled purines for their protein synthesis. Thus, inhibiting PNPs represents a promising way of suppressing bacterial growth and curing bacterial infection. Consequently, finding novel modes of inhibition of PNPs opens new treatment possibilities against serious pathogens. Understanding allostery in PNPs would open new routes for designing allosteric inhibitors with higher specificity that are not tied exclusively to targeting the enzyme’s active site, as is currently the case.
To understand such a fundamentally dynamical phenomenon as allostery, it is necessary to employ not only static structures from X-ray crystallography (XC), but also their dynamic counterparts from molecular dynamics (MD) simulations. This project will merge XC and MD information on PNPs and store it in the form of a specialized graph database. Graph databases are well suited for tracing information that spans many nodes, as is the case with allosteric interactions between protein residues. Storing the data in such an organised way will allow machine learning algorithms to be employed to find correlations between different interactions in the XC and MD domains, leading to the discovery of hidden allosteric pathways in PNPs and guiding the ongoing research.
The phenomenon of allostery has been recognized for more than a century now, although the term allostery itself was coined fifty years ago in a seminal work of Monod and Jacob. Ever since, various models have been proposed to explain the mechanism of allostery, but despite huge advances, the effect is far from being fully understood. The accumulated knowledge clearly shows that the complex and multifaceted mechanism of allostery can only be revealed by a multidisciplinary approach. Such an approach must consider not only the intrinsic structural basis of the mechanism of any enzyme, but also the dynamical nature of proteins and the interactions between constituent units at various levels, from amino acids to entire chains and subdomains.
Proteins can naturally be represented as networks, where nodes are amino acids and edges are the interactions between them, be it peptide bonds along the main chain or the various non-covalent interactions that may form and break during time evolution. If allosteric communication between amino acids is transmitted through non-covalent interactions, then the most natural way of tracing the allosteric pathways through a protein is by following the time evolution of the underlying interaction networks. Thus, it is no surprise that the network view has recently emerged as one of the major approaches in the research of allostery [6, 7], leading to significant breakthroughs.
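The network representation described above can be illustrated with a minimal sketch. It assumes a toy list of interactions; the residue labels, the interaction types and the helper names `build_network` and `noncovalent_neighbours` are all hypothetical and chosen only for illustration, not taken from any PNP structure or from the project's code.

```python
from collections import defaultdict

# Hypothetical toy data: (residue_a, residue_b, interaction_type).
# Labels follow an assumed "chain:residue" convention; none of these
# interactions is taken from a real PNP structure.
INTERACTIONS = [
    ("A:Gly20", "A:Ser21", "peptide_bond"),
    ("A:Ser21", "A:Arg22", "peptide_bond"),
    ("A:Gly20", "A:Arg22", "hydrogen_bond"),
    ("A:Arg22", "B:Asp204", "salt_bridge"),
]


def build_network(interactions):
    """Return an undirected map: residue -> {neighbour: set of interaction types}."""
    network = defaultdict(lambda: defaultdict(set))
    for a, b, kind in interactions:
        network[a][b].add(kind)
        network[b][a].add(kind)
    return network


def noncovalent_neighbours(network, residue):
    """Neighbours linked to `residue` by at least one non-covalent interaction."""
    return sorted(
        other
        for other, kinds in network[residue].items()
        if kinds - {"peptide_bond"}  # keep edges that are more than backbone
    )


network = build_network(INTERACTIONS)
print(noncovalent_neighbours(network, "A:Arg22"))
```

In this sketch the edge between chains A and B is the kind of inter-monomer contact through which an allosteric signal could, in principle, be passed on.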
Purine nucleoside phosphorylases (PNPs), a class of oligomeric enzymes that exhibit allosteric signalling and regulation, have been a subject of research in the host laboratory for several years [10–13], most recently as part of the project Allosteric communication pathways in oligomeric enzymes (ALOKOMP, https://alokomp.irb.hr/). PNPs catalyse the synthesis of purine nucleotides in the purine salvage pathway and represent excellent model enzymes for studying allostery and oligomerization. They are trimeric in eukaryotes and hexameric in bacteria. Each monomer in the oligomer can be found in one of two states, open or closed, depending on the conformation of the active site. The crystal structures of hexameric PNPs have shown a wide variety of distributions of open and closed states of the monomeric units (all open, five open and one closed, four open and two closed, three open and three closed, etc.). Along with the unknown mechanism of allosteric communication between monomers, the different possible distributions of active site conformations represent unsolved puzzles in these enzymes. The end goal of this project is to find a possible correlation between these conformational changes and allosteric regulation, and to establish its implications for the catalytic mechanism of PNPs.
Beyond being a model for allosteric signalling, PNPs have high pharmacological significance. Some bacteria, such as Helicobacter pylori, rely exclusively on recycled purines for their protein synthesis, with PNP acting as the essential enzyme. The first structure of PNP from H. pylori was determined recently in the host laboratory. Inhibiting PNP as the key enzyme represents a promising way of suppressing bacterial growth and curing bacterial infection. Consequently, finding novel modes of inhibition of PNP opens new treatment possibilities against this very serious pathogen. Understanding allostery in PNPs would open new routes for designing allosteric inhibitors with higher specificity that are not tied exclusively to targeting the enzyme’s active site, as is currently the case.
The main activity of the host laboratory is the determination of three-dimensional structures of enzymes at atomic resolution using X-ray crystallography (XC) on protein crystals. Although this method gives the most detailed view of the enzymes, it suffers from one major drawback: it provides only a static view, and the determined structure is just one of the many possible states the enzyme can adopt. Therefore, XC alone is not capable of disclosing the fundamentally dynamic basis of allostery. For this reason, XC structures will serve as the starting point for in silico molecular dynamics (MD) simulations. These simulations will provide the missing dynamic information on how the enzymes move in time, highlighting the residues that play a key role in transferring information from one part of the enzyme to another. Strong correlations are expected between the static (XC) and dynamic (MD) data, confirming the importance of certain regions in the enzymes. All this will yield new information on the mechanism that underpins the information transfer between the monomers in this class of oligomeric enzymes.
The movements of proteins (enzymes) in time are at the heart of their capability to transmit information from one protein domain to another through so-called allosteric communication pathways. Representing the MD trajectories in a form suited to extracting useful interaction patterns can thereby provide insights into allosteric communication modes, thus revealing the allosteric pathways in PNPs. Currently, the number of determined structures of PNPs from different organisms deposited in the Protein Data Bank (PDB) is around 220. Each structure of a hexameric PNP contains six chains of 200-250 residues, making the structures of PNPs quite demanding for MD simulations due to their large size. Considering that the MD simulation of each structure would have to run for at least 100 ns and be done in several replicas, this makes the job quite daunting even on modern high-performance computing facilities. But that is only the beginning of the problem, as processing huge amounts of MD simulation data requires adequate automated methods of analysis that greatly reduce its size and represent it in a manageable form.
A convenient way of reducing such MD simulation data is to use the high-quality data analysis tools already available in the Python programming language (NumPy, scikit-learn, pandas, MDAnalysis) and apply them to the data represented in the form of a relational database. As part of the ongoing ALOKOMP project in the host institution, a relational database is being constructed that will contain all the static XC and dynamic MD data combined. The building blocks of this relational database are the residues of all the available PNPs, and the relationships between them are either static (such as hydrogen bonds, close contacts, etc.) or dynamic (such as the time evolution of the appearance of certain contacts). Having all of this information connected in a single database is essential, because it allows static and dynamic observations on the same residues to be queried together.
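One simple way in which a trajectory can be reduced to dynamic relationships between residues is to record, for every residue pair, the fraction of frames in which a contact is present. The sketch below assumes that the contacts of each frame have already been extracted into sets of residue pairs; the data, the 0.5 threshold and the function names are hypothetical, and only the Python standard library is used so that the example stays self-contained.

```python
from collections import Counter

# Hypothetical per-frame contact sets, as they might look after contacts
# have been extracted from an MD trajectory. Residue labels are invented.
FRAMES = [
    {("A:Arg22", "B:Asp204"), ("A:Gly20", "A:Arg22")},
    {("A:Arg22", "B:Asp204")},
    {("A:Arg22", "B:Asp204"), ("A:Ser21", "B:Tyr160")},
    {("A:Gly20", "A:Arg22")},
]


def contact_occupancy(frames):
    """Fraction of frames in which each residue pair is in contact."""
    counts = Counter()
    for contacts in frames:
        # Sort each pair so that (a, b) and (b, a) count as the same contact.
        counts.update({tuple(sorted(pair)) for pair in contacts})
    n_frames = len(frames)
    return {pair: count / n_frames for pair, count in counts.items()}


def persistent_contacts(frames, min_occupancy=0.5):
    """Contacts present in at least `min_occupancy` of the frames (assumed cutoff)."""
    occupancy = contact_occupancy(frames)
    return sorted(pair for pair, value in occupancy.items() if value >= min_occupancy)


occupancy = contact_occupancy(FRAMES)
print(occupancy[("A:Arg22", "B:Asp204")])
print(persistent_contacts(FRAMES))
```

A table of such occupancies is far smaller than the trajectory it summarises, and each entry maps directly onto a relationship between two residues in the database.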
This proposal brings a further key enhancement to the ALOKOMP project, namely a switch to an entirely different form of underlying database: instead of a relational database, amino acid interactions would be represented in the form of a graph database. A graph database stores the data as a graph, essentially a network of nodes connected by relationships, which is ideally suited to representing the network structure of proteins. It is no wonder that the PDB has recently put its entire database in this form [https://www.ebi.ac.uk/pdbe/pdbe-kb/]. But the reason for employing a graph database is more fundamental and has to do with how database performance depends on the length of the queries that can be performed. Allosteric interaction pathways usually span many residues, and in a relational database every additional hop along such a pathway requires another table join. Executing queries that involve many such lookups is computationally prohibitive: as soon as the number of hops reaches a certain (small) number, the time it takes for the query to execute increases exponentially. This is in sharp contrast to graph databases, where the relationships are stored as objects themselves and the query time scales linearly with the length of the path. Therefore, it is in principle possible to follow relationships between amino acids irrespective of their physical distance, which is exactly what is needed for studying allostery.
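The kind of multi-hop query discussed above can be sketched as a breadth-first search over stored relationships. This is a plain-Python stand-in for what a graph database would do internally, not the query interface of any particular database product; the edge list, residue labels and function names are hypothetical.

```python
from collections import deque

# Hypothetical non-covalent contacts between residues of different chains.
EDGES = [
    ("A:Gly20", "A:Arg22"),
    ("A:Arg22", "B:Asp204"),
    ("B:Asp204", "B:Tyr160"),
    ("B:Tyr160", "C:Phe159"),
    ("D:Ala1", "D:Ala2"),  # not connected to the rest of the network
]


def build_adjacency(edges):
    """Store each relationship directly on both of its end nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the list of residues on a shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:  # walk back to the source
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


adjacency = build_adjacency(EDGES)
print(shortest_path(adjacency, "A:Arg22", "C:Phe159"))
```

Each hop here is a direct lookup of a node's neighbours, which is the property that makes long pathways cheap to follow when relationships are stored with the nodes rather than reconstructed through table joins.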
To take full advantage of such enormous amounts of data in the form of a graph database, it is necessary to leverage machine learning (ML) algorithms that operate on this highly structured and connected data. Along with the above-mentioned libraries, Python features a mature ML ecosystem, which will allow patterns in the relationships to be identified. The main goal of the ML analysis is to identify highly correlated changes in residue parameters (such as concerted movements, simultaneous changes in conformation, coinciding hydrogen bond formation, etc.), but also to find regularities and similarities in residue environments between different enzymes.
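As a first illustration of what "highly correlated changes" could mean in practice, the sketch below computes Pearson correlation coefficients between binary time series (1 = present, 0 = absent in a given frame). This is a basic statistical stand-in rather than the ML method the project would eventually adopt; the series names, the data and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical per-frame observations (1 = present, 0 = absent).
SERIES = {
    "hbond A:Arg22-B:Asp204": [1, 1, 0, 0, 1, 1, 0, 0],
    "active site A closed": [1, 1, 0, 0, 1, 1, 0, 0],
    "hbond B:Tyr160-C:Phe159": [0, 0, 1, 1, 0, 0, 1, 1],
    "contact A:Gly20-A:Arg22": [1, 0, 1, 0, 1, 0, 1, 0],
}


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def strongly_coupled(series, threshold=0.9):
    """Pairs of observables whose |correlation| reaches the assumed threshold."""
    coupled = []
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            coupled.append((name_a, name_b, round(r, 3)))
    return coupled


coupled = strongly_coupled(SERIES)
for name_a, name_b, r in coupled:
    print(f"{name_a} <-> {name_b}: r = {r}")
```

Both positive and negative correlations are kept, since an interaction that forms whenever another one breaks is as informative about coupling as two interactions that form together.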
This project aims to utilize this novel combination of XC, MD and ML operating on highly interconnected data in the form of a graph database. The first part of the project would focus on obtaining the dynamic MD data using the static XC data as the starting point. This is ongoing work at the host institute, and I would join to help finalize it. The MD data would then be organised as a graph database, allowing different ML algorithms to be tested in the search for allosteric pathways. Finally, amino acids identified as allosteric “hotspots” would be probed using molecular biology, namely site-directed mutagenesis followed by testing the impact of each mutation on the enzyme’s kinetics. Altogether, this project will yield a new perspective on the phenomenon of allostery in PNPs, with the possibility of application to enzymes in general.