GFH003 The action is the parameter; the loss is the context


Question

Q. What is the fairness foundation of the Hedge algorithm?


Formulation

Problem setup: the "action $w_t$" is our "estimate $\beta_t$"; the "loss $\langle l_{t}, w_{t}\rangle$" is our "expected mean $\langle x_{t}, \beta_{t}\rangle$". What, then, do they require of the loss vector distribution?

Problem setup:

  • The learner chooses an "action" $w_{t}$, which is a d-dimensional vector (unlike a bandit, which picks a single action).
  • The learner observes the loss $f_{t}(w_{t})$.
  • Concrete case, linear loss: $f_{t}(w_{t})=\langle l_{t}, w_{t}\rangle$. (Flipping the loss into a reward matches the setting we want, though it would then become a contextual bandit?)
  • Cumulative loss of expert j: $L_{t,j}=\sum_{s=1}^{t}l_{s,j}$
  • Cumulative loss of the learner: $\hat{L}_{t}=\sum_{s=1}^{t}\langle l_{s}, w_{s}\rangle$ (see the protocol sketch below).
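
To make the protocol concrete, here is a minimal Python sketch of the setup above; the uniform placeholder learner and the i.i.d. uniform losses in $[0,1]^d$ are illustrative assumptions, not part of the paper's setting.

```python
import numpy as np

# Minimal sketch of the DTOL protocol with d experts and linear losses.
# The uniform learner and the random loss vectors are placeholder assumptions.
rng = np.random.default_rng(0)
T, d = 1000, 5
expert_cumloss = np.zeros(d)   # L_{t,j} = sum_{s<=t} l_{s,j}
learner_cumloss = 0.0          # sum_{s<=t} <l_s, w_s>

for t in range(1, T + 1):
    w_t = np.full(d, 1.0 / d)        # learner's action: a point in the d-simplex
    l_t = rng.uniform(size=d)        # environment reveals the loss vector l_t
    learner_cumloss += l_t @ w_t     # linear loss f_t(w_t) = <l_t, w_t>
    expert_cumloss += l_t            # update each expert's cumulative loss L_{t,j}

regret = learner_cumloss - expert_cumloss.min()  # regret against the best expert
print(learner_cumloss, regret)
```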

Solution: FTRL (Follow the Regularized Leader): a "data-dependent sequence of regularizers" parallels an "online regularization scheme"; their goal is good regret; our goal is to control the rate of convergence, in the name of "statistical error".

Solution: FTRL (Follow the Regularized Leader)

  • A data-dependent sequence of regularizers (very similar to our online regularization).
  • The "action" found this way is like the Lasso found under an $\ell_1$ regularizer; see the display below. (Connecting the dots, it seems that what we did before was itself a kind of Follow the Regularized Leader. But what is our context? It still seems different: in our setting it is the mean reward, and the optimal one's mean reward is itself given by an algorithm; so our $\beta_{t}$ is the $w_{t}$ here.)
  • (So, as long as the regret can be turned into a function of $|w-w_{t}|$, and we then work out the convergence rate of $w_{t}$, we obtain the convergence rate of the regret.)
  • (Compared with optimization: here they care about regret; in statistics we care about whether $w_{t}$ also converges to the true parameter $w$, because the true parameter represents the population, i.e., the data-generating model we want to test.)
  • With this, the whole picture makes much more sense! Very cool; this is really quite remarkable… 🥳
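
To pin down the analogy with regularized estimation, here is the textbook FTRL update over the probability simplex $\Delta_d$ with the negative Shannon entropy as the regularizer (the data dependence enters through the learning rate $\eta_{t}$), together with its closed-form solution; this is the standard derivation, not anything specific to the paper:

$$w_{t}=\arg\min_{w\in\Delta_d}\Big\{\langle L_{t-1}, w\rangle+\frac{1}{\eta_{t}}\sum_{j=1}^{d} w_{j}\ln w_{j}\Big\}
\quad\Longrightarrow\quad
w_{t,j}=\frac{\exp(-\eta_{t} L_{t-1,j})}{\sum_{k=1}^{d}\exp(-\eta_{t} L_{t-1,k})}.$$

Just as the Lasso minimizes a squared loss plus an $\ell_1$ penalty, the Hedge action minimizes the cumulative linear loss plus an entropy penalty.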

Evidence: 2021: Best-Case Lower Bounds in Online Learning

Anytime Hedge: good worst-case regret

Anytime Hedge has a good worst-case regret bound:

  • The Hedge algorithm with an adaptively tuned (time-varying) learning rate has good worst-case regret (see the quote and the sketch below).

For example, in the setting of decision-theoretic online learning (DTOL) with d experts (Freund and Schapire, 1997), the well-known anytime version of Hedge (which uses the time-varying learning rate $\eta_{t} \asymp \sqrt{\log (d) / t}$ ) enjoys $O(\sqrt{T \log d})$ worst-case regret and, as we show, has $-O(\sqrt{T \log d})$ best-case regret.
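
A minimal sketch of this anytime Hedge with $\eta_{t}=\sqrt{\log(d)/t}$; the i.i.d. uniform losses and the constant 1 in front of $\eta_{t}$ are illustrative assumptions.

```python
import numpy as np

def anytime_hedge(losses):
    """Anytime Hedge with the time-varying learning rate eta_t = sqrt(log(d)/t).

    losses: (T, d) array of loss vectors l_t, assumed to lie in [0, 1]^d.
    Returns the learner's cumulative loss and its regret against the best expert.
    """
    T, d = losses.shape
    L = np.zeros(d)          # cumulative expert losses L_{t-1,j}
    learner_loss = 0.0
    for t in range(1, T + 1):
        eta_t = np.sqrt(np.log(d) / t)       # decreasing learning rate
        logits = -eta_t * L
        w = np.exp(logits - logits.max())    # numerically stable exponential weights
        w /= w.sum()
        l_t = losses[t - 1]
        learner_loss += l_t @ w              # suffer the linear loss <l_t, w_t>
        L += l_t                             # update L_{t,j}
    return learner_loss, learner_loss - L.min()

# Illustrative run: d = 5 experts, T = 10_000 rounds of uniform random losses.
rng = np.random.default_rng(0)
print(anytime_hedge(rng.uniform(size=(10_000, 5))))
```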

Drawback of constant-learning-rate Hedge: group fairness that is not anytime

Drawback of constant-learning-rate Hedge: group fairness that is not anytime:

  • Constant-learning-rate Hedge: its best-case regret is lower bounded by $-O(\sqrt{T})$.
  • If each of finitely many experts satisfies "group fairness", then a clever use of Hedge (running it separately on each group) also satisfies the same notion of "group fairness" while still achieving $O(\sqrt{T})$ regret; a sketch of this construction follows the quote below.
  • Drawback: it is not an anytime algorithm. As a result, their notion of group fairness is tied to a fixed time horizon, and the final outcome can look very unfair even though it technically counts as fair (see the quoted passage below). (👻 A point worth writing up at length; think about how to turn it into a piece of work.)
  • The method in this paper gives an "anytime version of group fairness", which feels like what we actually want!

A key motivation for our work is a recent result of Blum et al. (2018) which shows, in the setting of DTOL, that Hedge with constant learning rate has best-case regret lower bounded by $-O(\sqrt{T})$. This result, taken together with worst-case upper bounds of order $O(\sqrt{T})$, is then used to show that if each of finitely many experts approximately satisfies a certain notion of group fairness, then a clever use of the Hedge algorithm (running it separately on each group) also approximately satisfies the same notion of group fairness while still enjoying $O(\sqrt{T})$ regret. However, we stress that their result is very limited in that it applies only to Hedge when run with a known time-horizon. The fixed time horizon assumption also implies that their notion of group fairness also is inherently tied to a fixed time horizon (see Section 4 for a detailed discussion), and this latter implication can lead to experts that seem very unfair but which, based on a fixed horizon view of group fairness, are technically considered to be fair. Our best-case lower bounds enable the results of Blum et al. (2018) to hold in much greater generality; in particular, our results enable the use of an anytime version of group fairness, which we feel is truly needed.
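
A minimal sketch of one reading of "running it separately on each group": each round carries a group label, and only that group's Hedge instance plays and is updated. The group labels, the anytime learning rate, and the random losses are illustrative assumptions; this is not the exact construction of Blum et al. (2018).

```python
import numpy as np

def per_group_hedge(losses, groups, d, n_groups):
    """Sketch: one Hedge instance per group; in round t only the instance of
    group g_t selects weights and is updated with that round's loss vector."""
    L = np.zeros((n_groups, d))          # per-group cumulative expert losses
    learner_loss = np.zeros(n_groups)    # per-group learner cumulative loss
    seen = np.zeros(n_groups)            # rounds observed by each group so far
    for l_t, g in zip(losses, groups):
        seen[g] += 1
        eta = np.sqrt(np.log(d) / seen[g])   # anytime learning rate per group
        logits = -eta * L[g]
        w = np.exp(logits - logits.max())
        w /= w.sum()
        learner_loss[g] += l_t @ w
        L[g] += l_t
    return learner_loss - L.min(axis=1)  # per-group regret vs. that group's best expert

rng = np.random.default_rng(1)
T, d, n_groups = 5_000, 4, 2
losses = rng.uniform(size=(T, d))
groups = rng.integers(n_groups, size=T)
print(per_group_hedge(losses, groups, d, n_groups))
```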

Decreasing Hedge (Mourtada and Gaïffas, 2019): good anytime best-case lower bound and worst-case upper bound

Decreasing Hedge (Mourtada and Gaïffas, 2019) has a good anytime best-case lower bound and worst-case upper bound:

  • Uses the negative Shannon entropy as the regularizer;
  • paired with a matching (decreasing) learning rate, so that the resulting "decision" follows expert j's advice with weight proportional to $\exp(-\eta_{t} L_{t-1,j})$; see the numerical check below. (This feels like it can be connected to the story of arm j.)
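
A numerical sanity check, under illustrative values of $L_{t-1}$ and $\eta_{t}$, that solving the negative-entropy FTRL problem over the simplex reproduces the exponential-weights rule above.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative cumulative losses L_{t-1} and a decreasing learning rate at t = 10.
d = 4
L_prev = np.array([3.0, 2.5, 4.0, 1.0])
eta = np.sqrt(np.log(d) / 10)

def ftrl_objective(w):
    # <L_{t-1}, w> + (1/eta) * sum_j w_j log w_j  (negative Shannon entropy regularizer)
    return L_prev @ w + (w * np.log(np.clip(w, 1e-12, None))).sum() / eta

res = minimize(
    ftrl_objective,
    x0=np.full(d, 1.0 / d),
    method="SLSQP",
    bounds=[(1e-9, 1.0)] * d,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
)

closed_form = np.exp(-eta * L_prev)
closed_form /= closed_form.sum()
print(res.x)        # numerical FTRL solution on the simplex
print(closed_form)  # exponential-weights closed form: the two should agree
```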

Answer

A. The action is the parameter; the loss is the context

Can we establish a notion of group fairness connected to minimax optimality?

The insight from these 50 minutes: the action really is the parameter, and the loss really is the context. The various techniques of online learning mainly address concerns at the level of regret; once the statistical concerns about estimation are understood, we can further discuss statistical error and then carry out statistical tasks more concretely. Research is truly interesting. Martin Wainwright's related work must be following exactly this academic code!

Copyright

CC BY-NC-ND 4.0
