Deep Reinforcement Learning for Multi-Echelon Inventory Control: A study at Volvo Group
2020 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Deep reinforcement learning for inventory control in a multi-echelon distribution network (Swedish)
Abstract [en]
In this study, a Deep Reinforcement Learning (DRL) model was developed to solve a multi-echelon inventory control problem at Volvo Group, a manufacturer of heavy vehicles. According to recent studies, DRL has shown potential to be an effective method for solving inventory management problems by successfully balancing the trade-off between cost and service. Excess stock leads to high inventory costs, while too-low inventory levels increase the risk of stockouts and backorders, which can lead to lost sales and loss of goodwill. This trade-off is especially important for Volvo's spare part distribution, since stockouts can have severe consequences for Volvo's customers. However, keeping the availability of spare parts high increases inventory carrying costs.
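To make this trade-off concrete, a single-location, single-period cost can be written as a holding cost on leftover stock plus a penalty on unmet demand. This is a generic illustration, not the thesis's actual cost model; all names and parameter values below are assumptions:

    def period_cost(inventory, demand, holding_cost=1.0, stockout_cost=10.0):
        """Per-period cost for a single stocking location.

        Leftover stock incurs a holding cost; unmet demand incurs a
        stockout penalty. A high stockout_cost / holding_cost ratio pushes
        a policy toward higher inventory levels (higher availability),
        while a low ratio pushes it toward leaner stock.
        """
        leftover = max(inventory - demand, 0)   # units carried to next period
        shortage = max(demand - inventory, 0)   # units backordered or lost
        return holding_cost * leftover + stockout_cost * shortage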
The thesis was divided into three main parts. First, the study identified which logistics complexities and cost complexities should be included in a DRL model. Second, it identified how the DRL configuration itself should be designed. Third, different combinations of logistics models and DRL configurations were tested in eight DRL models. By comparing the results of these eight models against evaluation baselines (a random policy, a 1:1 policy, and a base stock policy; see the sketch below), their performance could be assessed. By identifying the most promising models, conclusions could be drawn regarding how multi-agent reinforcement learning can be used in a distribution network.
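For reference, a random policy orders a random quantity each period, a 1:1 policy (read here as one-for-one replenishment of each demanded unit) reorders exactly what was consumed, and a base stock (order-up-to) policy can be sketched as below. The thesis does not publish its implementations, so the function and variable names here are assumptions:

    def base_stock_order(on_hand, on_order, backorders, base_stock_level):
        """Order-up-to-S policy: restore the inventory position to S.

        Inventory position = on-hand stock + pipeline stock (on order)
        minus backorders. The order quantity is whatever is needed to
        bring the position back up to the base stock level (never negative).
        """
        inventory_position = on_hand + on_order - backorders
        return max(base_stock_level - inventory_position, 0)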
The results presented in this study indicate that multi-agent DRL can be used to solve a multi-echelon inventory management problem, given a sufficiently simple setting. It is shown that the agents are able to learn fundamental dynamics such as the cost-availability trade-off. When stockouts are penalized more heavily, the agents are shown to compensate with generally higher inventory levels, and consequently higher availability. By comparing different methods of penalizing stockouts, this study stresses the importance of choosing an appropriate cost function in order to reach adequate availability levels. Finally, by considering order handling costs, this study investigated the use of a novel action space that includes the option to not place an order (illustrated below). While the results indicate that the agents are able to adjust to this increased complexity, further adjustments appear necessary to fully benefit from such an action. This study also introduces a compilation of logistics complexities that can be considered when developing a DRL-based inventory management model.
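A minimal sketch of how such an action space and cost function might fit together, with action index 0 reserved for "place no order". The action set, cost weights, and function names are illustrative assumptions, not the thesis's actual configuration:

    # Discrete actions for one agent: index 0 means "do not place an order",
    # which avoids the order handling cost; the rest map to order quantities.
    ORDER_QUANTITIES = [0, 5, 10, 20]  # hypothetical action set

    def step_cost(action_idx, on_hand, demand,
                  holding_cost=1.0, stockout_penalty=10.0,
                  order_handling_cost=2.0):
        """One-step cost for a single agent/location.

        Raising stockout_penalty relative to holding_cost is the lever
        that pushes agents toward higher inventory and availability.
        """
        order_qty = ORDER_QUANTITIES[action_idx]
        leftover = max(on_hand - demand, 0)
        shortage = max(demand - on_hand, 0)
        cost = holding_cost * leftover + stockout_penalty * shortage
        if order_qty > 0:              # the "no order" action skips this cost
            cost += order_handling_cost
        return cost

In an actor-critic setup such as the Advantage Actor-Critic listed among the keywords below, the negated cost would typically serve as each agent's per-step reward signal.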
It is recognized that the logistical scenarios evaluated in this thesis are highly simplified, and substantial research remains before a practically usable DRL model can be derived. The authors found DRL challenging to apply in practice because of its technical complexity and the vast number of adjustable components. Furthermore, combining the fields of DRL and inventory management requires fairly extensive interdisciplinary competence: the model must be sufficiently advanced technically while also capturing the logistics aspects that make it logistically relevant.
Place, publisher, year, edition, pages
2020, p. 132
Keywords [en]
Multi-echelon inventory control, Inventory management, Supply-chain management, Reinforcement learning, Deep reinforcement learning, Replenishment optimisation, Advantage Actor-Critic, Discrete event simulation, Logistics, Machine learning
National Category
Transport Systems and Logistics
Identifiers
URN: urn:nbn:se:liu:diva-167097
ISRN: LIU-IEI-TEK-A--20/03845--SE
OAI: oai:DiVA.org:liu-167097
DiVA, id: diva2:1447877
External cooperation
Volvo Group AB
Subject / course
Industrial Management
Presentation
2020-06-10, Linköping, 14:30 (Swedish)
Available from: 2021-06-11 Created: 2020-06-26 Last updated: 2021-06-11 Bibliographically approved