Grant-free multiple access (GFMA) reduces uplink handshake overhead and thereby supports low-latency communication by transmitting payload data together with the pilot (preamble). However, the channel capacity under random access is limited by the number of available orthogonal pilots and by the lack of coordination among devices. We consider a delay-constrained GFMA system in which each device generates data traffic randomly and must deliver its packets before a predetermined deadline. The pilot selection problem is formulated to minimize the average packet drop rate of the worst user. A centralized policy based on priority sorting is derived by introducing a fairness-promoting function. For decentralized operation, we propose a multi-agent policy optimization algorithm that improves sample efficiency by exploiting the model structure. Simulation results show that the proposed scheme achieves near-optimal coordination among devices using only partial state information.
Funding agencies: This work was supported in part by the Excellence Center at Linköping-Lund in Information Technology (ELLIIT) and by the Knut and Alice Wallenberg Foundation.