MUR048 Optimal Policies for Dynamic Pricing and Inventory Management under Nonparametric Censored Demand

Table of Contents

Optimal Policies for Dynamic Pricing and Inventory Management under Nonparametric Censored Demand

Day 48 of the 紫式晦澀 daily article series

Preface

  1. Today is day 46 of 2022, week 7 of the year, the third Tuesday of February. Today we broaden our horizons and read about optimal policies for dynamic pricing and inventory management under nonparametric censored demand, to get a feel for a business-school problem.

  2. Today's material comes mainly from the paper:

Code: ICN (Inventory Control Nonparametric): nonparametric inventory management.

Abstract

ICN000 Censored demand, non-concave reward functions, and a lower bound via generalized squared Hellinger distance

  • Inventory replenishment control

  • Learning-while-doing framework

  • $T$ consecutive review periods

  • The firm does not know the demand curve a priori

  • Censored demand

  • Non-concave reward functions

  • Lower bound via generalized squared Hellinger distance

We study the fundamental model in joint pricing and inventory replenishment control under the learning-while-doing framework, with $T$ consecutive review periods and the firm not knowing the demand curve a priori. At the beginning of each period, the retailer makes both a price decision and an inventory order-up-to level decision, and collects revenues from consumers' realized demands while suffering costs from either holding unsold inventory items, or lost sales from unsatisfied customer demands. We make the following contributions to this fundamental problem:

  1. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices, directly tackling the fundamental challenge brought by censored demands, without raising the order-up-to levels to unnaturally high levels to collect more demand information. Based on this technical innovation, we design bisection and trisection search methods that attain an $\widetilde{O}(\sqrt{T})$ regret, assuming the reward function is concave and only twice continuously differentiable.
  2. In the more general case of non-concave reward functions, we design an active tournament elimination method that attains $\widetilde{O}\left(T^{3 / 5}\right)$ regret, based also on the technical innovation of consistent estimates of reward differences at two prices.
  3. We complement the $\widetilde{O}\left(T^{3 / 5}\right)$ regret upper bound with a matching $\widetilde{\Omega}\left(T^{3 / 5}\right)$ regret lower bound. The lower bound is established by a novel information-theoretical argument based on generalized squared Hellinger distance, which is significantly different from conventional arguments that are based on Kullback-Leibler divergence. This lower bound shows that no learning-while-doing algorithm could achieve $\widetilde{O}(\sqrt{T})$ regret without assuming the reward function is concave, even if the untruncated revenue as a function of demand rate or price is concave.

Both the upper bound technique based on the “difference estimator” and the lower bound technique based on generalized Hellinger distance are new in the literature, and can be potentially applied to solve other inventory or censored demand type problems that involve learning.
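To make the search-based upper bound concrete: on a concave reward curve, trisection (ternary) search discards a third of the interval per step by comparing the reward at two interior points. Below is a minimal sketch on a known, noiseless concave function; the paper's algorithm instead works from noisy, censored observations, and the function name and quadratic reward here are purely illustrative.

```python
def trisection_search(r, lo, hi, iters=60):
    """Shrink [lo, hi] around the maximizer of a concave function r."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # Concavity: if r(m1) < r(m2), the maximizer cannot lie in [lo, m1].
        if r(m1) < r(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Illustrative concave reward r(p) = p * (10 - p), maximized at p* = 5.
p_star = trisection_search(lambda p: p * (10.0 - p), 0.0, 10.0)
```

Each comparison removes a constant fraction of the interval, so the search converges geometrically; the paper's contribution is making such pairwise comparisons reliable when each reward value can only be estimated from censored demand samples.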

Introduction

ICN101 Instantaneous reward: out of stock, you lose sales opportunities; overstocked, you pay holding costs

  • Setting: joint "pricing" and "inventory replenishment" decisions, with no fixed "ordering cost" and with "inventories" that carry over rather than perish.
  • Decisions: the "inventory decision" and the "price decision"
  • $b$: unit cost of lost sales
  • $h$: unit cost of holding inventory
  • $y_{t}$: on-hand stock level (the order-up-to level)
  • $d_{t}=\lambda(p_{t})+\epsilon$: demand, a function of the price $p_{t}$
  • $\lambda(\cdot)$: demand curve
  • $p_{t}$: price
  • $o_{t}$: censored demand: with too little stock you only observe your own stock level; with ample stock you observe the true demand.
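The notation above can be sketched as a one-period simulation. The function name, the Gaussian noise, and the cost values below are hypothetical, chosen only to illustrate how censoring hides demand information:

```python
import random

def one_period(p_t, y_t, demand_curve, h=1.0, b=2.0, rng=random.Random(0)):
    """Simulate one review period: post price p_t, stock up to level y_t."""
    eps = rng.gauss(0.0, 1.0)
    d_t = max(demand_curve(p_t) + eps, 0.0)   # realized demand lambda(p) + eps
    o_t = min(d_t, y_t)                       # censored observation
    reward = p_t * o_t                        # revenue from realized sales
    reward -= h * max(y_t - d_t, 0.0)         # holding cost on leftover stock
    reward -= b * max(d_t - y_t, 0.0)         # lost-sales penalty
    return o_t, reward, d_t > y_t             # last flag: was demand censored?

# With ample stock we observe the true demand; with no stock we observe nothing.
o1, r1, c1 = one_period(5.0, 1000.0, lambda p: 10.0 - p)
o2, r2, c2 = one_period(5.0, 0.0, lambda p: 10.0 - p)
```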

ICN102: Myopic optimization of price and stock level: order too little and you bear the lost-sales cost plus the forgone revenue at the posted price; order too much and you bear the holding cost of the excess.

  • Myopic optimization: maximize the single-period "expected reward"
  • Expected reward: previously we studied the "profit $p\lambda(p)$", but now we must additionally decide the "replenishment" level. Order too little and you bear the lost-sales cost plus the forgone revenue at the posted price; order too much and you bear the holding cost of the excess.
  • The logic here: optimize the stock level first, then optimize the price.
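When the demand curve and the noise distribution are known, "inventory first, then price" has a clean classical form: for each fixed price the optimal order-up-to level is the newsvendor fractile $F^{-1}\big((p+b)/(p+b+h)\big)$ of demand, after which the price can be searched over a grid. A sketch assuming Gaussian noise; all names and numbers are illustrative, not the paper's:

```python
import math
from statistics import NormalDist

def expected_reward(p, y, mu, sigma, h, b):
    """G(p, y) = p*E[min(D,y)] - h*E[(y-D)+] - b*E[(D-y)+], D ~ N(mu, sigma)."""
    nd = NormalDist()
    z = (y - mu) / sigma
    shortfall = sigma * (nd.pdf(z) - z * (1.0 - nd.cdf(z)))  # E[(D - y)+]
    overage = (y - mu) + shortfall                           # E[(y - D)+]
    return p * (mu - shortfall) - h * overage - b * shortfall

def myopic_policy(demand_curve, sigma, h, b, prices):
    """Inventory first (newsvendor fractile), then grid-search the price."""
    best_p, best_y, best_g = None, None, -math.inf
    for p in prices:
        fractile = (p + b) / (p + b + h)          # critical ratio for y*
        y = demand_curve(p) + sigma * NormalDist().inv_cdf(fractile)
        g = expected_reward(p, y, demand_curve(p), sigma, h, b)
        if g > best_g:
            best_p, best_y, best_g = p, y, g
    return best_p, best_y

p_star, y_star = myopic_policy(lambda p: 10.0 - p, 1.0, 1.0, 2.0,
                               [1.0 + 0.1 * k for k in range(80)])
```

Since the critical ratio here exceeds one half, the myopic level stocks above mean demand, hedging against the lost-sales penalty.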

Reward engineering is genuinely deep and deserves concrete study.

ICN103 Regret: the expected reward forgone by decisions made without knowledge of the demand curve or the noise distribution

  • Missing information: the demand curve is unknown, and so is the noise distribution
  • Signal: the censored demand
  • Regret: the expected reward forgone, given the missing demand curve and noise distribution.
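As a worked toy example of the definition above (the numbers are made up): the regret over $T$ periods is $T \cdot G(p^*, y^*) - \sum_t G(p_t, y_t)$, the total expected reward forgone relative to the clairvoyant optimum.

```python
def regret(optimal_reward, realized_rewards):
    """T * G(p*, y*) minus the sum of per-period expected rewards."""
    return len(realized_rewards) * optimal_reward - sum(realized_rewards)

# Three periods earning 3, 4, 5 against an optimum of 5 per period:
# regret = (5 - 3) + (5 - 4) + (5 - 5) = 3.
reg = regret(5.0, [3.0, 4.0, 5.0])
```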

How can one do "demand learning" in this situation?


Methods

ICN201 Demand functions of different shapes

  • Interesting: it seems they do not estimate the parameters inside the demand function here?


Results


Discussion


Postscript

There is much to learn from how lectures are delivered; grasping the key points lets you see the main line of argument.

2022.02.14. 紫蕊, West Lafayette, Indiana, USA.


Version Date Summary
0.1 2022-02-13 Seeing a deeper world through analysis

Copyright

CC BY-NC-ND 4.0

Comments