The interactions between nucleic acids (DNA/RNA) and other molecules (e.g., ligands, proteins and other nucleic acids) are essential for fundamental biological processes. Characterizing and predicting nucleic acid binding sites helps to clarify the mechanisms of these interactions. Compared with the considerable effort devoted to protein binding sites, the study of binding sites in nucleic acids is still in its infancy. By systematically investigating different types of binding sites in RNA and DNA, we show that RNA can adopt binding pockets to interact with ligands and protruding surfaces to bind proteins and nucleic acids, while DNA may use its middle regions to form contacts with other molecules. Based on these biological insights, we developed a feature-based ensemble learning classifier that exploits the interplay among different machine learning techniques, feature spaces and sample spaces. In parallel, we built a template-based classifier that exploits structural conservation. The complementarity between these two classifiers motivated us to establish an integrative framework that improves prediction performance. Finally, we proposed a post-processing procedure based on the random walk algorithm to refine the integrative predictions. The unified prediction framework, called NABS, achieves promising results for different types of nucleic acid binding sites and outperforms existing methods.
Citation: Dissecting and predicting different types of binding sites in nucleic acids based on structural information.
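The abstract does not specify how the random-walk post-processing is formulated. As an illustration of the general idea only, the following is a minimal sketch of a random walk with restart that smooths per-nucleotide binding scores over a contact graph, so that a nucleotide surrounded by high-scoring neighbours is pulled upward and an isolated high score is pulled downward. The function name `smooth_scores`, the `restart` parameter, the symmetric 0/1 contact matrix and the toy scores are all assumptions made for this example and are not taken from NABS.

```python
def smooth_scores(adjacency, scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart over a nucleotide contact graph (illustrative).

    adjacency: symmetric n x n list of lists with 0/1 entries (hypothetical
               spatial contacts between nucleotides), zero diagonal.
    scores:    initial per-nucleotide binding scores from some classifier.
    restart:   probability of jumping back to the initial scores at each step.
    """
    n = len(scores)
    # For a symmetric matrix the row sum equals the number of neighbours.
    degree = [sum(row) for row in adjacency]
    current = list(scores)
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            if degree[i] == 0:
                # An isolated nucleotide keeps its own score (implicit self-loop).
                incoming = current[i]
            else:
                # Each neighbour j spreads its score equally among its neighbours.
                incoming = sum(
                    current[j] / degree[j]
                    for j in range(n)
                    if adjacency[i][j]
                )
            updated.append((1.0 - restart) * incoming + restart * scores[i])
        change = max(abs(u - c) for u, c in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current


if __name__ == "__main__":
    # Toy chain of five nucleotides: 0-1-2-3-4 (hypothetical contacts).
    chain = [
        [0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    raw = [0.9, 0.1, 0.8, 0.0, 0.0]
    print(smooth_scores(chain, raw))
```

In this toy case nucleotide 1 has a low raw score but sits between two high-scoring neighbours, so its smoothed score rises. A larger `restart` value keeps the result closer to the original predictions, and a smaller one lets the graph structure dominate.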