ACCV 2024
Jun-Hyun Bae, Minho Lee, Heechul Jung
Kyungpook National University

Abstract

Training deep neural networks with empirical risk minimization (ERM) often captures dataset biases, hindering generalization to new or unseen data. Previous solutions either require prior knowledge of the biases or train intentionally biased models as auxiliaries; however, they still struggle when multiple biases are present. To address this, we introduce Adaptive Bias Discovery (ABD), a novel learning framework designed to mitigate the impact of multiple unknown biases. ABD adapts an auxiliary model to the biases starting from the debiased parameters of the debiasing phase, which lets it move from one bias to the next. Samples are then reweighted according to the discovered biases to update the debiased parameters. Extensive evaluations on synthetic and real-world datasets demonstrate that ABD consistently outperforms existing methods, particularly in real-world applications where multiple unknown biases are prevalent.


Overview

We propose a learning framework that sequentially discovers and removes multiple biases present in the data, without any prior information about those biases.

  1. Bias-adapted model: starting from the debiased parameters \(\theta\), one step of gradient descent produces an auxiliary model \(f_\phi\) that is sensitive to bias.
  2. Adaptive group formation: the predictions of \(f_\phi\) split the data into a bias-aligned group (\(G^\odot\)) and a non-aligned group (\(G^\otimes\)).
  3. Iterative debiasing: group DRO minimizes the worst-case group loss; once \(\theta\) becomes robust to one bias, \(\phi\) naturally discovers the next one.
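
The worst-case objective in step 3 can be written compactly. This is a sketch that assumes the standard two-group form of group DRO; the per-sample loss symbol \(\ell\) and the averaging notation are assumptions and do not appear on this page.

\[
\min_{\theta} \; \max_{G \in \{G^\odot,\, G^\otimes\}} \; \frac{1}{|G|} \sum_{(x, y) \in G} \ell\big(f_\theta(x), y\big)
\]

Since both groups are recomputed from \(f_\phi\) as training proceeds, the group that attains the inner maximum can change from one step to the next.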

ABD Framework

Overview of the ABD framework, illustrated with two biases (Bias1, Bias2) and two training steps.


Method

Models trained with ERM readily pick up spurious correlations present in the data, which degrades generalization. Existing methods are limited: they either need the bias information in advance or can handle only a single bias.

ABD consists of two phases. First, one step of gradient descent from the debiased parameters \(\theta\) yields the bias-adapted parameters \(\phi = \theta - \alpha \nabla_\theta \mathcal{L}(f_\theta)\). Because \(f_\phi\) responds strongly to superficial patterns in the data, its predictions are used to split the data into a bias-aligned group (\(G^\odot\)) and a non-aligned group (\(G^\otimes\)). Then \(\theta\) is updated with group DRO so as to minimize the loss of the worst-case group.
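
A minimal sketch of the first phase on a toy problem may help. It assumes a linear logistic model in place of a deep network, and it assumes that the bias-aligned group consists of the samples \(f_\phi\) classifies correctly while the non-aligned group consists of the samples it misclassifies. The helper names (`adapt`, `split_groups`) and the toy data are hypothetical and are not taken from the paper's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_grad(theta, xs, ys):
    """Gradient of the mean binary cross-entropy of a linear model w.r.t. theta."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / len(xs)
    return grad

def adapt(theta, xs, ys, alpha):
    """One gradient step from theta: phi = theta - alpha * grad L(f_theta)."""
    g = mean_grad(theta, xs, ys)
    return [t - alpha * gj for t, gj in zip(theta, g)]

def split_groups(phi, xs, ys):
    """Assumed split: correctly classified by f_phi -> aligned, otherwise non-aligned."""
    aligned, conflicting = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = 1 if sum(p * xi for p, xi in zip(phi, x)) > 0 else 0
        (aligned if pred == y else conflicting).append(i)
    return aligned, conflicting

# Toy data (hypothetical): feature 0 is a strong spurious cue that agrees with the
# label on the first four samples only; feature 1 is a weak but reliable signal.
xs = [[1.0, 0.2], [1.0, 0.2], [-1.0, -0.2], [-1.0, -0.2], [-1.0, 0.2], [1.0, -0.2]]
ys = [1, 1, 0, 0, 1, 0]

theta = [0.0, 0.0]
phi = adapt(theta, xs, ys, alpha=1.0)
aligned, conflicting = split_groups(phi, xs, ys)
print(phi, aligned, conflicting)
```

In this toy case the single step moves \(\phi\) mostly along the spurious feature, so the two samples where that feature disagrees with the label end up in the non-aligned group.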

The key point is that \(\phi\) is regenerated from \(\theta\) at every step. Once \(\theta\) becomes robust to the first bias, \(\phi\) naturally captures the next most salient bias. Thanks to this MAML-like structure, multiple biases can be discovered and removed sequentially without prior bias information.
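
The regeneration of \(\phi\) at every step can be sketched as a short training loop. This is an illustration under simplifying assumptions, not the authors' implementation: it uses a linear logistic model, splits groups by whether \(f_\phi\) classifies a sample correctly, and replaces full group DRO with a hard-max variant that descends on whichever group currently has the higher mean loss. The function `train_abd_sketch`, its hyperparameters, and the toy data are hypothetical.

```python
import math

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, x, y):
    """Logistic loss for a label y in {0, 1}."""
    margin = (2 * y - 1) * score(w, x)
    return math.log1p(math.exp(-margin))

def grad(w, xs, ys, idx):
    """Mean logistic-loss gradient over the samples listed in idx."""
    g = [0.0] * len(w)
    for i in idx:
        p = 1.0 / (1.0 + math.exp(-score(w, xs[i])))
        for j, xij in enumerate(xs[i]):
            g[j] += (p - ys[i]) * xij / len(idx)
    return g

def step(w, g, lr):
    return [wi - lr * gi for wi, gi in zip(w, g)]

def train_abd_sketch(xs, ys, steps=20, alpha=1.0, lr=0.5):
    n = len(xs)
    theta = [0.0] * len(xs[0])
    history = []
    for _ in range(steps):
        # phi is rebuilt from the current theta; it is never carried over.
        phi = step(theta, grad(theta, xs, ys, range(n)), alpha)
        aligned = [i for i in range(n) if (score(phi, xs[i]) > 0) == (ys[i] == 1)]
        conflicting = [i for i in range(n) if i not in aligned]
        # Simplified group DRO: update theta on the non-empty group with the
        # higher mean loss under the current theta.
        groups = [g for g in (aligned, conflicting) if g]
        worst = max(groups, key=lambda g: sum(loss(theta, xs[i], ys[i]) for i in g) / len(g))
        theta = step(theta, grad(theta, xs, ys, worst), lr)
        history.append((aligned, conflicting))
    return theta, history

# Toy data (hypothetical): feature 0 is spurious, feature 1 is weak but reliable.
xs = [[1.0, 0.2], [1.0, 0.2], [-1.0, -0.2], [-1.0, -0.2], [-1.0, 0.2], [1.0, -0.2]]
ys = [1, 1, 0, 0, 1, 0]
theta, history = train_abd_sketch(xs, ys)
print(theta, history[0], history[-1])
```

Inspecting `history` shows how the split changes as \(\theta\) moves; with real data and several biases, this is where a shift from one bias to the next would become visible.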

The GradCAM visualization below shows that the attention of the biased model \(f_\phi\) shifts to different regions as training progresses, which indicates that ABD adaptively discovers different biases during training.

Biased Model Evolution

GradCAM visualizations of the ERM model and of ABD's biased model \(f_\phi\). As training proceeds, the attention of \(f_\phi\) shifts to different bias features.


Results

Colored MNIST

OoD test accuracy (%) under two bias settings: Color only, and Color and Patch present together.

| Algorithm | Color (OoD) | Color & Patch (OoD) |
|---|---|---|
| ERM | 16.4 | 14.0 |
| IRM | 66.9 | 13.4 |
| Group DRO | 13.6 | 14.1 |
| PI | 70.2 | 15.3 |
| ABD (Ours) | 70.7 | 62.3 |
| Optimal | 75.0 | 75.0 |

PI discovers only the most dominant bias (Color), whereas ABD discovers multiple biases sequentially, first Color and then Patch.

PI Baseline

Within-group Pearson correlation coefficients for PI. PI discovers only the Color bias and fails to capture Patch.

Bias Discovery - ABD

Within-group Pearson correlation coefficients for ABD. As training progresses, the biases are discovered in the order Color → Patch.
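
One way to compute such a within-group coefficient is sketched below. The page does not state which two quantities are correlated; this example assumes the coefficient is taken between a binary bias attribute (for example, color) and the label, restricted to the samples of one group. The function names and the toy values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def within_group_correlation(attribute, labels, group):
    """Correlation between a bias attribute and the label inside one group."""
    idx = list(group)
    return pearson([attribute[i] for i in idx], [labels[i] for i in idx])

# Toy values (hypothetical): color matches the label in the aligned group
# and is flipped in the non-aligned group.
color = [1, 1, 0, 0, 0, 1]
label = [1, 1, 0, 0, 1, 0]
aligned, conflicting = [0, 1, 2, 3], [4, 5]

r_aligned = within_group_correlation(color, label, aligned)
r_conflicting = within_group_correlation(color, label, conflicting)
print(r_aligned, r_conflicting)
```

Under this reading, a coefficient of large magnitude inside a group suggests that the current split tracks that attribute, while a value near zero suggests it does not.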

Real-World Tasks

CivilComments (worst-case acc.), MultiNLI (worst-case acc.), Camelyon17 (OoD acc.), FMoW (worst-region acc.).

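| Algorithm | CivilComments | MultiNLI | Camelyon17 | FMoW |
|---|---|---|---|---|
| ERM | 56.0 | 61.8 | 70.3 | 32.3 |
| Group DRO | 70.0 | 62.7 | 68.4 | 30.8 |
| JTT | 69.3 | 63.2 | 63.8 | 33.4 |
| PI | 61.1 | 61.5 | 71.7 | 31.2 |
| LISA | - | - | 77.1 | 35.5 |
| ABD (Ours) | 71.1 | 67.1 | 81.1 | 34.1 |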
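
For readers unfamiliar with the worst-case metrics reported above, the following sketch shows how a worst-group accuracy is typically computed: accuracy is measured separately per group and the minimum is reported. The group labels and predictions are made-up toy values, and `worst_group_accuracy` is a hypothetical helper, not part of any benchmark's official evaluation code.

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Return (worst group id, its accuracy, per-group accuracy dict)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    worst = min(per_group, key=per_group.get)
    return worst, per_group[worst], per_group

# Toy values (hypothetical): group "a" has four samples, group "b" has two.
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b"]

worst, worst_acc, per_group = worst_group_accuracy(preds, labels, groups)
print(worst, worst_acc, per_group)
```

Average accuracy over all six samples would hide the weaker group here, which is why worst-group numbers are the more informative measure for debiasing methods.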

MultiNLI Analysis

Change in the bias composition of the misclassified group \(G^\otimes\) on MultiNLI. After the Negation bias is discovered, the Overlap bias gradually emerges.

GradCAM

GradCAM visualizations on MetaShift test data. ERM relies on the background, whereas ABD focuses on the target object.


BibTeX

@InProceedings{Bae_2024_ACCV,
  author    = {Bae, Jun-Hyun and Lee, Minho and Jung, Heechul},
  title     = {Adaptive Bias Discovery for Learning Debiased Classifier},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2024},
  pages     = {3074-3090}
}