Rafał Adamczak, PhD, Department of Informatics, NCU: “Combining Fragment-based and Topological Profiles for Efficient Clustering of Big Macromolecular Data”
Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to sample conformational space far more extensively. The rapidly growing number of experimentally resolved structures in databases such as the Protein Data Bank also calls for large-scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of comparing and clustering large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and the development of efficient software. uQlust and PhiClust, packages developed in our department, are versatile, easy-to-use tools for ultrafast ranking and clustering of large data sets. They use structural profiles of proteins and nucleic acids and combine a linear-time algorithm for implicit comparison of all pairs of models with profile hashing, which enables efficient clustering of large data sets with a low memory footprint. uQlust is dedicated to ranking and clustering large sets of protein or RNA models, while PhiClust is designed for gene expression data.
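The claim that all pairs of models can be compared implicitly in linear time can be illustrated with a minimal sketch. It assumes, hypothetically, that each model is reduced to a fixed-length 1D profile (a string of per-residue states such as H, E and C) and that similarity between two models is the number of positions at which their states agree. Counting the states at each position once lets every model's summed agreement with all other models be computed in O(N·L) rather than O(N²·L). The function names `jury_scores`, `brute_force_scores` and `hash_clusters` are illustrative only and are not uQlust's or PhiClust's actual API. `hash_clusters` shows only the simplest form of profile hashing, exact-match bucketing.

```python
from collections import Counter
from itertools import combinations


def jury_scores(profiles):
    """Summed positional agreement of each model with all others, in O(N*L).

    Hypothetical sketch: profiles are equal-length strings of per-residue states.
    """
    length = len(profiles[0])
    # Count how often each state occurs at each position across all models.
    counts = [Counter(p[i] for p in profiles) for i in range(length)]
    # At position i, model p agrees with counts[i][p[i]] - 1 other models
    # (subtracting 1 removes the model's agreement with itself).
    return [sum(counts[i][p[i]] - 1 for i in range(length)) for p in profiles]


def brute_force_scores(profiles):
    """Reference O(N^2 * L) version that compares every pair explicitly."""
    scores = [0] * len(profiles)
    for a, b in combinations(range(len(profiles)), 2):
        agree = sum(x == y for x, y in zip(profiles[a], profiles[b]))
        scores[a] += agree
        scores[b] += agree
    return scores


def hash_clusters(profiles):
    """Bucket models whose profiles are identical under the same hash key."""
    groups = {}
    for idx, p in enumerate(profiles):
        groups.setdefault(p, []).append(idx)
    return groups


if __name__ == "__main__":
    # Hypothetical toy profiles for four models of a 6-residue chain.
    profiles = ["HHHEEC", "HHHEEC", "HHCEEC", "CCCEEC"]
    print(jury_scores(profiles))         # [14, 14, 14, 10]
    print(brute_force_scores(profiles))  # [14, 14, 14, 10]
    print(hash_clusters(profiles))
```

Both scoring functions return the same values. Only the counting version scales linearly with the number of models, and that is the property that makes ranking very large decoy sets feasible.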