Constrained Deep Reinforcement Learning for Cost-Efficient and Timely Packet Delivery


NYU Wireless P.I.s

Research Overview

Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery. In this context, the goal is to reliably deliver packets within the strict deadlines imposed by the application while minimizing the overall resource allocation cost. Fig. 1 illustrates how queues evolve according to packet lifetimes.

Existing methods leverage stochastic optimization techniques for dynamic routing and scheduling; however, they fall short when faced with per-packet delay requirements. To address this, we formulate the minimum-cost delay-constrained network control problem as a constrained Markov decision process and use constrained deep reinforcement learning (CDRL) techniques to minimize the total resource allocation cost while keeping timely throughput above a target reliability level. Using multi-agent deep deterministic policy gradient (MADDPG), we train a centralized router and distributed schedulers at every node, as shown in Fig. 2. As shown in Fig. 3, our solutions achieve lower cost than existing baselines and meet target reliability levels where other methods fall short [1].
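The overview does not spell out how the CDRL agents enforce the reliability constraint. One common approach for constrained MDPs is Lagrangian relaxation: each constraint is folded into the reward with a non-negative multiplier that is adjusted by dual ascent. The sketch below assumes that approach; it is not claimed to be the exact method of [1], and the names `LagrangianReward`, `lr`, `shaped_reward`, and `dual_update` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical helper: Lagrangian relaxation of per-commodity reliability
# constraints (an assumed technique, not necessarily the one used in [1]).
@dataclass
class LagrangianReward:
    targets: list[float]          # reliability target per commodity, e.g. [0.7, 0.6]
    lr: float = 0.05              # dual step size (illustrative value)
    multipliers: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One non-negative multiplier per constraint, starting at zero.
        if not self.multipliers:
            self.multipliers = [0.0] * len(self.targets)

    def shaped_reward(self, cost: float, on_time: list[int]) -> float:
        # Per-step reward seen by the agents: negative resource cost plus a
        # multiplier-weighted bonus for packets delivered before their deadline.
        # Raw on-time counts are used as a simple proxy for reliability.
        return -cost + sum(lam * n for lam, n in zip(self.multipliers, on_time))

    def dual_update(self, reliability: list[float]) -> None:
        # Projected dual ascent: a multiplier grows while its commodity misses
        # the target and shrinks (never below zero) once the target is met.
        self.multipliers = [
            max(0.0, lam + self.lr * (target - rel))
            for lam, target, rel in zip(self.multipliers, self.targets, reliability)
        ]


if __name__ == "__main__":
    lag = LagrangianReward(targets=[0.7, 0.6])
    lag.dual_update(reliability=[0.65, 0.8])  # illustrative numbers, not results from [1]
    print(lag.multipliers)  # first multiplier rises above 0, second stays at 0
```

In a training loop under this assumption, `shaped_reward` would replace the raw cost signal during policy updates, and `dual_update` would run once per episode with each commodity's measured reliability.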

Fig. 1: Illustration of lifetime-based queue dynamics. Packets turn red as their lifetime decreases, blue arrows represent queue evolution, and packets without edges and packets with dashed edges indicate the packets available at times t-1 and t, respectively.
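The lifetime-based queue of Fig. 1 can be modeled by counting packets per remaining lifetime. The sketch below is a minimal single-queue version under stated assumptions: integer packet counts, a maximum lifetime `L` equal to the queue length, and the per-slot order of Fig. 2 (arrivals, then forwarding or dropping, then aging). The function name `lifetime_queue_step` and its arguments are hypothetical, and which packets leave the queue is passed in as an input standing in for the scheduler's decision.

```python
from __future__ import annotations


def lifetime_queue_step(
    queue: list[int], arrivals: int, served: list[int]
) -> tuple[list[int], int]:
    """Advance one time slot of a lifetime-indexed queue (hypothetical helper).

    queue[l - 1] is the number of packets with remaining lifetime l (l = 1..L).
    served[l - 1] is how many of those packets are forwarded or dropped this slot.
    Returns the next queue and the number of packets that expired.
    """
    if len(served) != len(queue):
        raise ValueError("served must have one entry per lifetime level")
    q = list(queue)
    # Arrivals enter with the maximum lifetime L.
    q[-1] += arrivals
    # Forwarded or dropped packets leave the queue.
    for i, s in enumerate(served):
        if s < 0 or s > q[i]:
            raise ValueError(f"cannot remove {s} packets from lifetime level {i + 1}")
        q[i] -= s
    # Aging: packets with lifetime 1 that are still queued miss their deadline
    # and expire; every other packet moves down one lifetime level.
    expired = q[0]
    return q[1:] + [0], expired


if __name__ == "__main__":
    nxt, expired = lifetime_queue_step([2, 1, 0], arrivals=3, served=[1, 0, 2])
    print(nxt, expired)  # [1, 1, 0] 1
```

Indexing by remaining lifetime keeps aging a simple shift, and any packet still at lifetime 1 after the scheduling step is exactly the set that misses its deadline.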

Fig. 2: Illustration of routing and scheduling with 1) packet arrivals, 2) path selection by the routing agent, 3-4) packet forwarding/dropping by scheduling agents, and 5) packet aging.

Fig. 3: Reliability and cost per episode for two commodities with reliability constraints of 0.7 and 0.6, respectively. CDRL-NC achieves lower cost than existing baselines and stays above the reliability requirements even when other baselines cannot.
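To make the reliability axis of Fig. 3 concrete, the sketch below assumes that a commodity's reliability in an episode is the fraction of its arriving packets delivered before their deadlines, and checks it against the targets 0.7 and 0.6 from the caption. This definition, the `EpisodeStats` type, and the treatment of episodes with no arrivals are assumptions for illustration; the counts in the usage lines are made up, not taken from Fig. 3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EpisodeStats:
    # Hypothetical per-commodity episode summary.
    arrived: int            # packets that entered the network
    delivered_on_time: int  # packets delivered before their deadline


def reliability(stats: EpisodeStats) -> float:
    # Assumed definition: timely throughput divided by arrivals.
    # An episode with no arrivals is treated as trivially reliable.
    if stats.arrived == 0:
        return 1.0
    return stats.delivered_on_time / stats.arrived


def meets_targets(episode: list[EpisodeStats], targets: list[float]) -> list[bool]:
    # One flag per commodity: does its reliability reach the target?
    return [reliability(s) >= t for s, t in zip(episode, targets)]


if __name__ == "__main__":
    episode = [EpisodeStats(100, 72), EpisodeStats(80, 44)]  # made-up counts
    print([round(reliability(s), 3) for s in episode])  # [0.72, 0.55]
    print(meets_targets(episode, targets=[0.7, 0.6]))   # [True, False]
```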