An Empirical Comparison of Token Encoding Strategies in the Generation of Vector Representations of Structured Data

J. Vincent and R.C. Mintram (UK)


Keywords: Connectionist models, token encoding, vector representation (VREP), General Encoder Decoder (GED)


A consequence of the connectionist approach to artificial intelligence is the requirement for structured data to be encoded into fixed-width vector representations (VREPs). This paper provides an empirical comparison of six different strategies for encoding the tokens that appear within tree representations of this structured data. A new two-element real-valued token encoding is presented, and empirical results show that it produces more compact vectors than previously possible with conventional encodings. This assessment is conducted within the General Encoder/Decoder (GED) framework and makes use of the VREP recovery profile (VRP) graphical representation to enable quantitative and qualitative judgements to be made.
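The abstract does not specify how the two-element real-valued token encoding is constructed. As a minimal illustrative sketch only (not the authors' actual scheme), one simple way to give each token in a vocabulary a distinct two-element real-valued code is to place the tokens on a grid in the unit square; the `encode_tokens` function below is a hypothetical example of this idea:

```python
import math

def encode_tokens(vocab):
    """Map each distinct token to a unique two-element real-valued code.

    Hypothetical illustration: tokens are laid out on a square grid in
    [0, 1) x [0, 1), so every token receives a distinct (x, y) pair.
    This is NOT the encoding proposed in the paper, merely a sketch of
    the general idea of a compact two-element token code.
    """
    tokens = sorted(set(vocab))
    side = math.ceil(math.sqrt(len(tokens))) or 1
    return {tok: (i % side / side, i // side / side)
            for i, tok in enumerate(tokens)}

codes = encode_tokens(["if", "then", "else", "x", "y"])
```

Each token here maps to exactly two real numbers, so the per-token storage cost is fixed regardless of vocabulary size, which is the property that makes such encodings attractive for building compact vector representations of trees.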
