Policy Evaluation Using Markov Decision Process for Inventory Optimization
A Markov Decision Process (MDP) is a discrete-time state-transition framework suited to sequential decision making under uncertainty. It has had a significant impact on operations management in recent years. This thesis seeks an optimal policy for an adaptive inventory model with the help of an MDP. We consider a seller who orders a single type of product, at most one unit per cycle, to satisfy demand. Whether demand for that product arrives in a given cycle is stochastic, and demand is likewise limited to one unit per cycle. The seller may either stock the product or issue a backorder; however, both the inventory size and the number of outstanding backorders are bounded. In each inventory state, the seller chooses one of two actions: buy the product or do not buy it, based on the stochastic demand pattern. We approach this problem by using the MDP to iterate through candidate policies. Our goal is to find the optimal policy, one that prescribes, for each inventory state, the action that minimizes the expected cost of stocking and backordering.
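The setup above can be sketched with policy iteration on a small toy MDP. All parameter values below (inventory and backorder limits, demand probability, holding and backorder costs, discount factor) are illustrative assumptions, not figures from the thesis; states are net inventory levels, with negative values denoting backorders.

```python
# Hypothetical parameters -- chosen for illustration, not taken from the thesis.
S_MAX = 2        # maximum units that can be stocked
B_MAX = 2        # maximum outstanding backorders
P_DEMAND = 0.6   # probability that one unit of demand arrives in a cycle
HOLD_COST = 1.0  # cost per unit held in stock, per cycle
BACK_COST = 3.0  # cost per unit backordered, per cycle
GAMMA = 0.95     # discount factor

# States: net inventory level (negative = backorders, positive = stock).
STATES = list(range(-B_MAX, S_MAX + 1))
ACTIONS = (0, 1)  # 0 = do not buy, 1 = buy one unit


def step(s, a, d):
    """Next state after ordering a units and observing demand d, clamped to limits."""
    return max(-B_MAX, min(S_MAX, s + a - d))


def cost(s):
    """Per-cycle cost: holding cost on stock, penalty on backorders."""
    return HOLD_COST * s if s >= 0 else BACK_COST * (-s)


def expected_value(s, a, value):
    """Expected one-step cost plus discounted value, over Bernoulli demand."""
    return sum(p * (cost(s) + GAMMA * value[step(s, a, d)])
               for d, p in ((1, P_DEMAND), (0, 1 - P_DEMAND)))


def policy_iteration():
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in STATES}
    value = {s: 0.0 for s in STATES}
    while True:
        # Policy evaluation: sweep until state values converge under the current policy.
        while True:
            delta = 0.0
            for s in STATES:
                v = expected_value(s, policy[s], value)
                delta = max(delta, abs(v - value[s]))
                value[s] = v
            if delta < 1e-9:
                break
        # Policy improvement: pick the cost-minimizing action in every state.
        stable = True
        for s in STATES:
            best = min(ACTIONS, key=lambda a: expected_value(s, a, value))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, value
```

Under these assumed costs, the resulting policy orders a unit when the seller is backordered and declines to buy when the inventory is full, matching the threshold behavior one would expect from a stock-versus-backorder trade-off.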