Let us consider the following single-item inventory problem:
At time t, the shop owner can order items (total stock cannot exceed L). They are delivered at time t+ (just after t).
Between t+ and t+1, demands by customers are satisfied if the stock is sufficient.
The final time is denoted H.
The reward function has the following form (beware: some terms are rewards and others are costs):
Ordering k items costs c1·k + c2.
Holding stock s for one time unit (between t and t+1) costs c3·s.
Selling n items earns c4·n.
Unsatisfied demand of m items (when the stock is empty) costs c5·m.
Final stock of x items at time H earns c6·x.
c1, c2, c3, c4, c5, c6 are constants.
The demand follows a known distribution: p(i) denotes the probability that the demand will be i in the current time slot (for i=0,…,D).
Question: write down the Bellman equation and write a computer program that solves it. Try to infer the structure of the optimal policy when H is large.
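
One possible way to approach the question is backward induction on the stock level. The sketch below is not a definitive solution; it rests on assumptions that the statement leaves open. The state s is the stock at time t before ordering, and V_t(s) is the best expected total reward from t to H. With y = s+k the stock after delivery, a candidate Bellman equation is:
V_H(x) = c6·x
V_t(s) = max over k in 0..L-s of [ -order(k) - c3·y + sum over i of p(i)·( c4·min(i,y) - c5·max(i-y,0) + V_{t+1}(y - min(i,y)) ) ]
Assumptions (not stated in the problem): order(k) = c1·k + c2 when k > 0 and 0 when k = 0, the holding cost is charged on the stock after delivery, and demand that cannot be served is lost. The function name solve_inventory and its argument layout are hypothetical.

```python
def solve_inventory(L, H, p, c1, c2, c3, c4, c5, c6):
    """Backward induction for the inventory problem (a sketch under assumptions).

    p[i] is the probability that the demand in one time slot equals i.
    Returns (V, policy): V[t][s] is the optimal expected reward from time t
    with stock s before ordering; policy[t][s] is an optimal order size.
    """
    V = [[0.0] * (L + 1) for _ in range(H + 1)]
    policy = [[0] * (L + 1) for _ in range(H)]

    # Terminal condition: the final stock x earns c6 * x.
    for x in range(L + 1):
        V[H][x] = c6 * x

    for t in range(H - 1, -1, -1):
        for s in range(L + 1):
            best_value, best_k = None, 0
            for k in range(L - s + 1):  # total stock may not exceed L
                y = s + k  # stock just after delivery, at time t+
                # Assumption: the fixed cost c2 is paid only when k > 0.
                order_cost = c1 * k + c2 if k > 0 else 0.0
                # Assumption: holding cost is charged on the stock after delivery.
                value = -order_cost - c3 * y
                for i, prob in enumerate(p):
                    sold = min(i, y)
                    unmet = i - sold  # assumption: unmet demand is lost
                    value += prob * (c4 * sold - c5 * unmet + V[t + 1][y - sold])
                # Strict comparison: ties are broken toward the smallest order.
                if best_value is None or value > best_value:
                    best_value, best_k = value, k
            V[t][s] = best_value
            policy[t][s] = best_k
    return V, policy


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the problem.
    V, policy = solve_inventory(L=10, H=50, p=[0.2, 0.3, 0.3, 0.2],
                                c1=1.0, c2=2.0, c3=0.1, c4=3.0, c5=1.0, c6=0.5)
    print("order sizes at t=0, by stock level:", policy[0])
```

To explore the structure of the optimal policy for large H, one can print policy[0] for increasing H and check whether the order sizes stabilise, and whether they follow a threshold pattern (no order above some stock level, and an order up to a common target level below it).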