Volume 25 Issue 1 - October 11, 2013
AutoBind: automatic extraction of protein–ligand-binding affinity data from biological literature
Darby Tien-Hao Chang1, Chao-Hsuan Ke2, Jung-Hsin Lin3,4 and Jung-Hsien Chiang2,*
1 Department of Electrical Engineering, 2 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70101, 3 School of Pharmacy, National Taiwan University, Taipei 10051 and 4 Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
Determining the binding affinity of a protein-ligand complex is important for specifying quantitatively whether a particular small molecule will bind to the target protein. Over the past decades, several databases of protein-ligand binding affinities have been built by visually inspecting the literature and extracting the values by hand. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is a pressing need for a high-precision automatic extraction method for collecting binding affinities.

Recently, we have created a new database of protein-ligand binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles where the binding affinities have been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17221 articles.
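To make the idea of pattern matching, sentence scoring, and evaluation concrete, here is a minimal sketch in Python. It is not the AutoBind implementation: the single regular expression, the cue-word list, the scoring rule, and the names `AFFINITY_RE`, `CUE_WORDS`, `extract` and `precision_recall` are all assumptions made for illustration, and the four sentence patterns and the scoring function used by AutoBind are not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (an assumption, not one of AutoBind's four patterns):
# an affinity type, an optional connector, a number, and a concentration unit.
AFFINITY_RE = re.compile(
    r"\b(?P<kind>Ki|Kd|IC50|EC50)\b"
    r"(?:\s+values?)?\s*(?:of|was|is|=|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>pM|nM|uM|µM|mM|M)\b"
)

# Hypothetical cue words that raise the score of a matching sentence.
CUE_WORDS = ("bind", "affinity", "inhibit")


@dataclass
class Hit:
    kind: str
    value: float
    unit: str
    score: int
    sentence: str


def extract(text):
    """Return affinity mentions found in text, highest-scoring first."""
    hits = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        cue_score = sum(cue in lowered for cue in CUE_WORDS)
        for m in AFFINITY_RE.finditer(sentence):
            hits.append(
                Hit(m["kind"], float(m["value"]), m["unit"], 1 + cue_score, sentence)
            )
    return sorted(hits, key=lambda h: h.score, reverse=True)


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = (
        "Compound 1 inhibits the kinase with a Ki of 4.5 nM. "
        "The sample was stored at 4 C. "
        "The Kd = 0.3 uM was measured."
    )
    for hit in extract(sample):
        print(hit.kind, hit.value, hit.unit, hit.score)
```

In this sketch a sentence scores one point for containing an affinity expression plus one point per cue word, so sentences that state a value in a binding or inhibition context rank first. Precision is the fraction of extracted values that are correct, and recall is the fraction of manually annotated values that were extracted, which is how figures such as the 84.22% and 79.07% above are conventionally defined.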

Table 1. Comparison of AutoBind to other databases

Database       Protein complexes   Entries (affinity data)   Last updated          First published
AutoBind       37929               13616                     February 13, 2013     This work
PDBBind        7986                7986                      September 22, 2011    2004
Binding MOAD   16955               5630                      2010                  2005
BindingDB      3056                1817                      March, 2010           2001

