We develop online scheduling policies to minimize the sum average age of information (AoI) subject to transmission capacity and long-run average resource constraints in a multi-source two-hop system, where independent sources randomly generate status update packets that are sent to the destination via a relay over error-prone links. A stochastic optimization problem is formulated and solved for both known and unknown environments. For the known environment, an online near-optimal low-complexity policy is developed using the drift-plus-penalty method. For the unknown environment, a deep reinforcement learning policy is developed by combining Lyapunov optimization theory with a dueling double deep Q-network. Simulation results show that the proposed policy improves performance by up to 136% compared to a greedy-based baseline policy.
Funding Agencies|Infotech Oulu; Academy of Finland [340171, 323698, 346208]