Predicting Biodegradable Quality of Chemicals with the TGI+.3 Classifier

Julia Sidorova, Alberto Fernandez, Josep Cester, Robert Rallo, and Francesc Giralt

Keywords

structure-activity relationship, biodegradation, SMILES, tree grammar inference

Abstract

This work addresses the task of predicting the biodegradability of chemical compounds. The proposed prediction scheme takes as input the chemical formula written in the simplified molecular-input line-entry system (SMILES). The classification scheme has a syntactic part, which learns a formal language from examples of readily biodegradable and not readily biodegradable chemicals, and a statistical part, which operates on feature vectors containing syntactic information (edit distances). The syntactic part implements a tree grammar inference algorithm; the statistical part implements an entropy-based decision tree (C4.5). We further wrap this scheme in a hierarchical classification (also referred to as classification decomposition), in which the tree structure (the classification path) reflects a grouping of the chemicals by the complexity of their SMILES strings. Specifically, we treat separately 1) simple atomic chains, 2) atomic chains with substituents, and 3) atomic chains with cyclic compounds.
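
The grouping and the edit-distance features described in the abstract can be illustrated with a minimal Python sketch. It is not the TGI+.3 implementation. Three things in it are assumptions made for illustration: the routing heuristic (ring-closure labels mark cyclic compounds, parentheses mark substituents), the use of plain string-level Levenshtein distance as a stand-in for the distances derived from the inferred tree grammars, and the choice of "distance to the nearest reference example of each class" as the feature vector. The names smiles_group, edit_distance, and distance_features are hypothetical.

```python
import re
from typing import List


def smiles_group(smiles: str) -> str:
    """Route a SMILES string to one of three groups (illustrative heuristic)."""
    # Bracket atoms such as [NH4+] or [13C] contain digits that are not
    # ring-closure labels, so mask them before inspecting the string.
    masked = re.sub(r"\[[^\]]*\]", "*", smiles)
    if re.search(r"[0-9%]", masked):
        return "cyclic"        # ring-closure labels present
    if "(" in masked:
        return "substituted"   # branches written in parentheses
    return "chain"             # simple atomic chain


def edit_distance(a: str, b: str) -> int:
    """String-level Levenshtein distance (stand-in, not a tree edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_features(smiles: str,
                      class_a_refs: List[str],
                      class_b_refs: List[str]) -> List[int]:
    """Distance to the nearest reference example of each class (assumed features)."""
    return [min(edit_distance(smiles, r) for r in class_a_refs),
            min(edit_distance(smiles, r) for r in class_b_refs)]


if __name__ == "__main__":
    for s in ["CCO", "CC(C)O", "c1ccccc1"]:
        print(s, smiles_group(s))
```

Under these assumptions, each group would get its own classifier, and vectors such as those returned by distance_features would be passed to a decision tree learner such as C4.5.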
