ICONIP 2022
Jun-Hyun Bae*, Taewon Park*, Minho Lee
Kyungpook National University
* Equal Contribution
📄 Paper

Abstract

Learning associative reasoning is necessary for implementing human-level artificial intelligence, even when a model faces unfamiliar associations of learned components. However, conventional memory augmented neural networks (MANNs) show degraded performance on systematically different data, since they lack consideration of systematic generalization. In this work, we propose a novel architecture for MANNs that explicitly aims to learn recomposable representations with a modular structure of RNNs. Our method binds learned representations with a Tensor Product Representation (TPR) to manifest their associations and stores the associations in a TPR-based external memory. In addition, to demonstrate the effectiveness of our approach, we introduce a new benchmark for evaluating systematic generalization performance on associative reasoning, which contains systematically different combinations of words between training and test data. In our experiments, our method shows superior test accuracy on systematically different data compared to other models. Furthermore, we validate the models that use TPR by analyzing whether the learned representations have symbolic properties.


Overview

๊ธฐ์กด MANN์ด ์ฒด๊ณ„์ ์œผ๋กœ ๋‹ค๋ฅธ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์—์„œ ์‹คํŒจํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, modular encoder์™€ TPR ๊ธฐ๋ฐ˜ ์™ธ๋ถ€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๊ฒฐํ•ฉํ•œ ์ƒˆ๋กœ์šด ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ œ์•ˆํ•œ๋‹ค.

  1. Modular encoding — Recurrent Independent Mechanisms (RIMs) encode the input with \(N\) independent modules that compete with one another, learning recomposable representations.
  2. TPR binding — a Tensor Product Representation mathematically binds the associations between roles and fillers: \(T = \sum_{k=1}^N \mathbf{r}_k \otimes \mathbf{f}_k\)
  3. Memory-based recall — the associations are stored in a TPR-based external memory, enabling systematic reasoning even over unseen combinations.
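The bind/unbind arithmetic in step 2 can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation; the dimensions and the choice of orthonormal roles (which makes unbinding exact) are assumptions for the demo.

```python
import numpy as np

# Toy TPR binding: T = sum_k r_k (outer) f_k
rng = np.random.default_rng(0)
N, d_r, d_f = 3, 4, 5

# Orthonormal role vectors (columns of Q) make unbinding exact: T^T r_k = f_k
roles, _ = np.linalg.qr(rng.standard_normal((d_r, N)))  # columns are r_k
fillers = rng.standard_normal((N, d_f))                  # rows are f_k

# Bind each (role, filler) pair with an outer product and superpose
T = sum(np.outer(roles[:, k], fillers[k]) for k in range(N))

# Unbind: contracting T with role k recovers filler k exactly
recovered = T.T @ roles[:, 1]
print(np.allclose(recovered, fillers[1]))  # True
```

With non-orthogonal roles the unbinding result would contain crosstalk from the other fillers, which is exactly the symbolic property the Analysis section checks for.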

Method

Conventional memory augmented neural networks (MANNs) suffer a sharp performance drop on test data that is systematically different from the training data. We combine a modular RNN encoder with a TPR-based external memory to achieve systematic generalization.

Key components:

  • Recurrent Independent Mechanisms (RIMs): \(N\) RNN modules each learn an independent encoding mechanism through competitive learning
  • Tensor Product Representation (TPR): associations are mathematically bound as tensor products of roles and fillers — \(T = \sum_{k=1}^N \mathbf{r}_k \otimes \mathbf{f}_k\)
  • TPR-based External Memory: role/filler representations extracted at each time step are superposed into memory
  • Systematic Associative Recall (SAR): a new benchmark we propose for evaluating systematic generalization
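The RIMs-style competition among the \(N\) modules can be sketched as follows. This is a minimal stand-in for the actual RIMs mechanism: the per-module scoring function, the top-k rule, and the tanh update are simplifying assumptions for illustration, not the paper's exact equations.

```python
import numpy as np

# Toy sketch of competitive module selection (RIMs-like):
# each module scores the input, only the top-k winners update their state.
rng = np.random.default_rng(0)
N, d_h, d_x, k = 4, 8, 6, 2

W = rng.standard_normal((N, d_h, d_x))  # per-module input weights (toy)
h = rng.standard_normal((N, d_h))       # module hidden states
x = rng.standard_normal(d_x)            # current input

# Competition: score each module's affinity for the input
scores = np.array([h[n] @ (W[n] @ x) for n in range(N)])
active = np.argsort(scores)[-k:]        # indices of the k winning modules

# Only winners update; the losing modules keep their state unchanged
h_new = h.copy()
for n in active:
    h_new[n] = np.tanh(W[n] @ x + h[n])
```

The key design point is sparsity of updates: because only the winning modules see each input, each module specializes, which is what makes the learned representations recomposable across novel combinations.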

Results

Quantitative

SAR Results

SAR ํƒœ์Šคํฌ์—์„œ DNC, FWM, ์ œ์•ˆ ๋ฐฉ๋ฒ•์˜ ํ•™์Šต/ํ…Œ์ŠคํŠธ ์ •ํ™•๋„ ๋น„๊ต. DNC์™€ FWM์€ ์ฒด๊ณ„์ ์œผ๋กœ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ(test different)์—์„œ ํฐ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ๋ณด์ด์ง€๋งŒ, ์šฐ๋ฆฌ ๋ชจ๋ธ์€ ์„ฑ๊ณต์ ์œผ๋กœ ์ฒด๊ณ„์  ์ผ๋ฐ˜ํ™”๋ฅผ ๋‹ฌ์„ฑํ•œ๋‹ค.

| Model | Test Accuracy |
| --- | --- |
| LSTM | 80.88% |
| Transformer-XL | 87.66% |
| Meta-learned Neural Memory | 88.97% |
| Fast Weight Memory (FWM) | 96.75% |
| FWM (our trial) | 94.94% |
| Ours | 96.63% |

๋Œ€๊ทœ๋ชจ ์งˆ์˜์‘๋‹ต ํƒœ์Šคํฌ(catbAbI)์—์„œ๋„ FWM์— ํ•„์ ํ•˜๋Š” ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•˜๋ฉฐ, ๋ชจ๋“ˆ ๊ธฐ๋ฐ˜ ์ธ์ฝ”๋”์˜ ์ผ๋ฐ˜์  ์œ ํšจ์„ฑ์„ ํ™•์ธ.

Analysis

ํ•™์Šต๋œ ํ‘œํ˜„์ด ์˜ฌ๋ฐ”๋ฅธ symbolic property๋ฅผ ๊ฐ–๋Š”์ง€ ๊ฒ€์ฆํ•œ๋‹ค. Role ๋ฒกํ„ฐ์™€ unbinding ๋ฒกํ„ฐ ๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋ถ„์„ํ•˜๋ฉด, FWM์€ orthogonalํ•˜์ง€ ์•Š์ง€๋งŒ ์šฐ๋ฆฌ ๋ฐฉ๋ฒ•์€ ๊ฑฐ์˜ ์™„๋ฒฝํ•œ orthogonality๋ฅผ ๋ณด์ธ๋‹ค.

FWM role-unbinding

(a) FWM

Ours role-unbinding

(b) Ours

Similarity matrices between role vectors and unbinding vectors. FWM's vectors are not orthogonal, whereas our method shows nearly perfect orthogonality, confirming that it has learned proper symbolic representations.
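The orthogonality check behind these similarity matrices amounts to computing pairwise cosine similarities between role and unbinding vectors and comparing against the identity. The snippet below sketches this with synthetic stand-ins for the learned vectors (the ideal case, where unbinding vectors equal the roles, is an assumption for the demo).

```python
import numpy as np

# Cosine-similarity matrix between role vectors and unbinding vectors.
# A near-identity matrix indicates near-perfect orthogonality, i.e. the
# representations behave like symbolic roles.
rng = np.random.default_rng(0)
N, d = 4, 16
roles, _ = np.linalg.qr(rng.standard_normal((d, N)))  # stand-in learned roles
unbind = roles                                        # ideal: unbinding == role

# Normalize columns and take all pairwise dot products
r_n = roles / np.linalg.norm(roles, axis=0)
u_n = unbind / np.linalg.norm(unbind, axis=0)
sim = r_n.T @ u_n

print(np.allclose(sim, np.eye(N), atol=1e-8))  # True in the ideal case
```

For FWM-style vectors the off-diagonal entries of `sim` would be far from zero, which visually corresponds to the noisy similarity matrix in panel (a).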

๋™์ผํ•œ ๋Œ€์ƒ ๊ฐ์ฒด์— ๋Œ€ํ•œ read ๋ฒกํ„ฐ์˜ ์ผ๊ด€์„ฑ์„ ๋ถ„์„ํ•˜๋ฉด, ์šฐ๋ฆฌ ๋ฐฉ๋ฒ•์—์„œ ์กฐํ•ฉ์— ๊ด€๊ณ„์—†์ด ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค.

FWM read vectors

(a) FWM

Ours read vectors

(b) Ours

๋™์ผํ•œ ๋Œ€์ƒ ๊ฐ์ฒด์— ๋Œ€ํ•œ read ๋ฒกํ„ฐ ๊ฐ„ ์œ ์‚ฌ๋„. ์šฐ๋ฆฌ ๋ฐฉ๋ฒ•์—์„œ read ์ถœ๋ ฅ์ด ์กฐํ•ฉ์— ๊ด€๊ณ„์—†์ด ๊ฑฐ์˜ ๋™์ผํ•˜์—ฌ, ์ฒด๊ณ„์ ์ธ ์—ฐ๊ด€ ์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.


BibTeX

@inproceedings{bae2022learning,
  author    = {Bae, Jun-Hyun and Park, Taewon and Lee, Minho},
  title     = {Learning Associative Reasoning Towards Systematicity Using Modular Networks},
  booktitle = {International Conference on Neural Information Processing (ICONIP)},
  year      = {2022},
  publisher = {Springer},
  doi       = {10.1007/978-3-031-30108-7_10}
}