Getting Started with ns3-gym (2): The linear-mesh Example Explained

I. Background: Random Access

Controlling random access in an IEEE 802.11 mesh network is challenging because the network nodes compete for the shared radio resources. It is known that assigning the same channel access probability to each node is not optimal [17], so the literature has proposed solutions in which, for example, the channel access probability depends on the network load (queue size) of a node.
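The contention window (CW) is a key concept, closely tied to the collision avoidance mechanism of the medium access control (MAC) protocol. Specifically, the CW controls the random backoff time a device waits before sending data. CWmin and CWmax are the minimum and maximum values of the contention window. A smaller CW lets a node attempt to send packets more often but can also cause more collisions; a larger CW does the opposite. In more detail: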

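Collision avoidance: in a WiFi network, several devices may try to access the shared wireless channel at the same time. If two or more devices transmit simultaneously, their packets collide. To reduce such collisions, WiFi uses a mechanism called carrier sense multiple access with collision avoidance (CSMA/CA).
Backoff algorithm: the CSMA/CA backoff algorithm specifies that when a device detects that the channel is busy, it does not retry immediately but waits for a random period of time. This random period is determined by the contention window (CW).
Range of the CW: the size of the contention window changes dynamically within the range [CW_min, CW_max]. On its first transmission attempt, a device picks a random number of time slots (slot time) from the range [0, CW_min] and waits that long. Each time a collision occurs, the CW roughly doubles until it reaches CW_max. This mechanism is called exponential backoff.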
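
The doubling rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the exact 802.11 procedure: the function names are hypothetical, the values 16 and 1024 are illustrative assumptions, and the window is doubled directly rather than via the standard's exact update formula.

```python
import random


def next_cw(cw, cw_max):
    """Double the window after a collision, capped at cw_max (simplified)."""
    return min(2 * cw, cw_max)


def draw_backoff_slots(cw, rng):
    """Pick a uniformly random number of idle slots in [0, cw]."""
    return rng.randint(0, cw)


def cw_after_collisions(cw_min, cw_max, collisions):
    """Return the window for the first attempt and after each collision."""
    windows = [cw_min]
    for _ in range(collisions):
        windows.append(next_cw(windows[-1], cw_max))
    return windows


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Illustrative values only, not taken from the example's configuration.
    for cw in cw_after_collisions(cw_min=16, cw_max=1024, collisions=8):
        print("CW =", cw, "backoff slots =", draw_backoff_slots(cw, rng))
```

Note that when cw_min equals cw_max the list stays constant, which is the uniform backoff case used later in this example.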

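The goal in this example is to control the channel access probability as a function of the network load. In ns-3 we create a linear topology of five nodes and set up a saturated UDP packet flow from the leftmost node to the rightmost node.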

observation - queue length of each node
actions - set the channel access probability for each node; here both CWmin and CWmax are set to the same value, i.e. uniform backoff (the window stays constant even when packet collisions occur)
reward - the number of packets received at the flow's ultimate destination during the last step interval
gameover - end of simulation time

II. Key Steps

① Collecting the state, GetObservation: gather the queue length of every node and store it in a Box-type observation container

Ptr<OpenGymDataContainer>
MyGymEnv::GetObservation()
{
  NS_LOG_FUNCTION (this);
  uint32_t nodeNum = NodeList::GetNNodes ();
  std::vector<uint32_t> shape = {nodeNum,};
  Ptr<OpenGymBoxContainer<uint32_t> > box = CreateObject<OpenGymBoxContainer<uint32_t> >(shape);
  for (NodeList::Iterator i = NodeList::Begin (); i != NodeList::End (); ++i) {
    Ptr<Node> node = *i;
    Ptr<WifiMacQueue> queue = GetQueue (node);
    uint32_t value = queue->GetNPackets();
    box->AddValue(value);
  }
  NS_LOG_UNCOND ("MyGetObservation: " << box);
  return box;
}

GetQueue, the helper that returns a node's WiFi MAC queue (its length is then read with GetNPackets):

Ptr<WifiMacQueue>
MyGymEnv::GetQueue(Ptr<Node> node)
{
  Ptr<NetDevice> dev = node->GetDevice (0);
  Ptr<WifiNetDevice> wifi_dev = DynamicCast<WifiNetDevice> (dev);
  Ptr<WifiMac> wifi_mac = wifi_dev->GetMac ();
  Ptr<RegularWifiMac> rmac = DynamicCast<RegularWifiMac> (wifi_mac);
  PointerValue ptr;
  rmac->GetAttribute ("Txop", ptr);
  Ptr<Txop> txop = ptr.Get<Txop> ();
  Ptr<WifiMacQueue> queue = txop->GetWifiMacQueue ();
  return queue;
}

② Reward: determined by the number of packets received

float
MyGymEnv::GetReward()
{
  NS_LOG_FUNCTION (this);
  static float lastValue = 0.0;
  float reward = m_rxPktNum - lastValue;
  lastValue = m_rxPktNum;
  NS_LOG_UNCOND ("MyGetReward: " << reward);
  return reward;
}
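
GetReward turns a cumulative packet counter into a per-step reward by remembering the counter value from the previous call. A minimal Python sketch of the same bookkeeping follows; the class name RewardTracker is hypothetical and only mirrors the role of the static lastValue variable.

```python
class RewardTracker:
    """Turn a cumulative packet counter into a per-step reward."""

    def __init__(self):
        # Counter value seen at the previous step (plays the role of lastValue).
        self.last_value = 0.0

    def get_reward(self, rx_pkt_num):
        """Return how many packets arrived since the previous call."""
        reward = rx_pkt_num - self.last_value
        self.last_value = rx_pkt_num
        return reward


if __name__ == "__main__":
    tracker = RewardTracker()
    # Hypothetical cumulative counts observed at three consecutive steps.
    for total in (10, 25, 25):
        print(tracker.get_reward(total))
```

A step in which no new packet reaches the destination therefore yields a reward of zero.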

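Counting received packets (NotifyPktRxEvent also calls Notify(), while CountRxPkts only increments the counter):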

void
MyGymEnv::NotifyPktRxEvent(Ptr<MyGymEnv> entity, Ptr<Node> node, Ptr<const Packet> packet)
{
  NS_LOG_DEBUG ("Client received a packet of " << packet->GetSize () << " bytes");
  entity->m_currentNode = node;
  entity->m_rxPktNum++;
  NS_LOG_UNCOND ("Node with ID " << entity->m_currentNode->GetId() << " received " << entity->m_rxPktNum << " packets");
  entity->Notify();
}
void
MyGymEnv::CountRxPkts(Ptr<MyGymEnv> entity, Ptr<Node> node, Ptr<const Packet> packet)
{
  NS_LOG_DEBUG ("Client received a packet of " << packet->GetSize () << " bytes");
  entity->m_currentNode = node;
  entity->m_rxPktNum++;
}

③ Action: adjust CWmin and CWmax to change the random access probability

bool
MyGymEnv::ExecuteActions(Ptr<OpenGymDataContainer> action)
{
  NS_LOG_FUNCTION (this);
  NS_LOG_UNCOND ("MyExecuteActions: " << action);
  Ptr<OpenGymBoxContainer<uint32_t> > box = DynamicCast<OpenGymBoxContainer<uint32_t> >(action);
  std::vector<uint32_t> actionVector = box->GetData();
  uint32_t nodeNum = NodeList::GetNNodes ();
  for (uint32_t i=0; i<nodeNum; i++)
  {
    Ptr<Node> node = NodeList::GetNode(i);
    uint32_t cwSize = actionVector.at(i);
    SetCw(node, cwSize, cwSize);
  }
  return true;
}

SetCw, the function that adjusts the CW:

bool
MyGymEnv::SetCw(Ptr<Node> node, uint32_t cwMinValue, uint32_t cwMaxValue)
{
  Ptr<NetDevice> dev = node->GetDevice (0);
  Ptr<WifiNetDevice> wifi_dev = DynamicCast<WifiNetDevice> (dev);
  Ptr<WifiMac> wifi_mac = wifi_dev->GetMac ();
  Ptr<RegularWifiMac> rmac = DynamicCast<RegularWifiMac> (wifi_mac);
  PointerValue ptr;
  rmac->GetAttribute ("Txop", ptr);
  Ptr<Txop> txop = ptr.Get<Txop> ();
  // if both set to the same value then we have uniform backoff?
  if (cwMinValue != 0) {
    NS_LOG_DEBUG ("Set CW min: " << cwMinValue);
    txop->SetMinCw(cwMinValue);
  }
  if (cwMaxValue != 0) {
    NS_LOG_DEBUG ("Set CW max: " << cwMaxValue);
    txop->SetMaxCw(cwMaxValue);
  }
  return true;
}
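
To make the data flow of ExecuteActions and SetCw easier to follow, here is a small Python model of the same logic. It is only a sketch: nodes are represented by plain dictionaries (an assumption for illustration, not the ns-3 API), and the starting values 15 and 1023 are hypothetical.

```python
def set_cw(node_cw, cw_min_value, cw_max_value):
    """Update one node's CW bounds; a value of 0 leaves that bound unchanged."""
    if cw_min_value != 0:
        node_cw["cw_min"] = cw_min_value
    if cw_max_value != 0:
        node_cw["cw_max"] = cw_max_value
    return node_cw


def execute_actions(nodes, action_vector):
    """Apply one CW value per node, using it for both the min and the max."""
    for node_cw, cw_size in zip(nodes, action_vector):
        set_cw(node_cw, cw_size, cw_size)
    return nodes


if __name__ == "__main__":
    # Hypothetical starting configuration for three nodes.
    nodes = [{"cw_min": 15, "cw_max": 1023} for _ in range(3)]
    print(execute_actions(nodes, [8, 0, 100]))
```

Because the same value is passed for both bounds, every non-zero action pins the node's window to a single size, and an action of 0 keeps whatever the node had before.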

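④ Coupling the network environment with the ns3-gym interface
All of the code above comes from mygym.cc. In sim.cc, the regular network scenario is configured first and then the interface is wired in: mainly the initialization of the Env and the registration of the callback that counts received packets.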

  // OpenGym Env
  Ptr<OpenGymInterface> openGymInterface = CreateObject<OpenGymInterface> (openGymPort);
  Ptr<MyGymEnv> myGymEnv;
  if (eventBasedEnv)
  {
    myGymEnv = CreateObject<MyGymEnv> ();
  } else {
    myGymEnv = CreateObject<MyGymEnv> (Seconds(envStepTime));
  }
  myGymEnv->SetOpenGymInterface(openGymInterface);
  // connect OpenGym entity to event source
  Ptr<UdpServer> udpServer = DynamicCast<UdpServer>(sinkApps.Get(0));
  if (eventBasedEnv)
  {
    udpServer->TraceConnectWithoutContext ("Rx", MakeBoundCallback (&MyGymEnv::NotifyPktRxEvent, myGymEnv, dstNode));
  } else {
    udpServer->TraceConnectWithoutContext ("Rx", MakeBoundCallback (&MyGymEnv::CountRxPkts, myGymEnv, dstNode));
  }

⑤ Agent learning (dqn_agent_v1.py)

Our RL agent was able to learn to assign lower CWmin/CWmax values to nodes closer to the flow destination. Hence it was able to outperform the baseline where all nodes were assigned the same CWmin/CWmax.

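In other words, the agent gradually learns how to tune the CWmin/CWmax values based on the node queue lengths. With a lower CW, nodes closer to the destination can attempt to send packets more often, which reduces transmission delay and possible interruptions and so improves the overall performance of the network.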

First, define the DQN structure; the TensorFlow framework is used here

import numpy as np
import tensorflow as tf
from tensorflow import keras

class DqnAgent(object):
    """Small Q-network: one ReLU hidden layer and a softmax output layer."""
    def __init__(self, inNum, outNum):
        super(DqnAgent, self).__init__()
        self.model = keras.Sequential()
        self.model.add(keras.layers.Dense(inNum, input_shape=(inNum,), activation='relu'))
        self.model.add(keras.layers.Dense(outNum, activation='softmax'))
        # tf.train.AdamOptimizer is the TensorFlow 1.x API
        self.model.compile(optimizer=tf.train.AdamOptimizer(0.001),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

    def get_action(self, state):
        # greedy action: index of the largest network output
        return np.argmax(self.model.predict(state)[0])

    def predict(self, next_state):
        return self.model.predict(next_state)[0]

    def fit(self, state, target, action):
        # overwrite only the output of the chosen action, then train one epoch
        target_f = self.model.predict(state)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)

Next, define the agents

agent0 = DqnAgent(inputQueues, cwSize)
agent1 = DqnAgent(inputQueues, cwSize)
agent2 = DqnAgent(inputQueues, cwSize)
agent3 = DqnAgent(inputQueues, cwSize)
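
Each of the four agents is fed only part of the observation: the slices used further down (0:2, 1:3, 2:4, 3:5) give agent i the queue lengths at positions i and i+1. The sketch below shows that windowing on a plain Python list. The helper name local_view and the sample queue lengths are hypothetical, and the original code slices a 2-D NumPy array instead.

```python
def local_view(queue_lengths, agent_index, width=2):
    """Return the part of the observation that one agent is given."""
    return queue_lengths[agent_index:agent_index + width]


if __name__ == "__main__":
    # Hypothetical queue lengths for the five nodes.
    observation = [5, 3, 0, 7, 1]
    for i in range(4):
        print("agent", i, "sees", local_view(observation, i))
```

Under this reading, each agent decides from its own node's queue and the queue of the next node in the line.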

Define how the agents choose actions; an ε-greedy policy is used here

        # Choose action
        if np.random.rand(1) < epsilon:
            action0 = np.random.randint(cwSize)
            action1 = np.random.randint(cwSize)
            action2 = np.random.randint(cwSize)
            action3 = np.random.randint(cwSize)
        else:
            action0 = agent0.get_action(state[:,0:2])
            action1 = agent1.get_action(state[:,1:3])
            action2 = agent2.get_action(state[:,2:4])
            action3 = agent3.get_action(state[:,3:5])
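
For reference, the selection rule for a single agent can be written as a standalone function. This is a minimal sketch using only the standard library; the name choose_action and the sample values are hypothetical. One difference from the code above: there, a single random draw decides whether all four agents explore or exploit together, while this sketch handles one agent at a time.

```python
import random


def choose_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action index
    # Index of the largest estimated value (the first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q_values = [0.1, 0.7, 0.2]  # hypothetical network outputs
    print(choose_action(q_values, epsilon=0.0, rng=rng))  # always exploits
    print(choose_action(q_values, epsilon=1.0, rng=rng))  # always explores
```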

Execute the actions and get the feedback

        # Step
        actionVec = [action0, action1, action2, action3, 100]
        next_state, reward, done, _ = env.step(actionVec)

Train and update based on the reward; the Q-value targets are computed with the Bellman equation

        # Train
        target0 = reward
        target1 = reward
        target2 = reward
        target3 = reward
        if not done:
            target0 = reward + 0.95 * np.amax(agent0.predict(next_state[:,0:2]))
            target1 = reward + 0.95 * np.amax(agent1.predict(next_state[:,1:3]))
            target2 = reward + 0.95 * np.amax(agent2.predict(next_state[:,2:4]))
            target3 = reward + 0.95 * np.amax(agent3.predict(next_state[:,3:5]))
        agent0.fit(state[:,0:2], target0, action0)
        agent1.fit(state[:,1:3], target1, action1)
        agent2.fit(state[:,2:4], target2, action2)
        agent3.fit(state[:,3:5], target3, action3)
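
The target computed for each agent follows the usual one-step rule: target = reward + gamma * max over next actions of Q(next_state, action), with the bootstrap term dropped on the final step. A minimal sketch for one agent is shown below; the function name td_target and the sample numbers are hypothetical, and 0.95 is the discount factor that appears in the code above. Note that all four agents are trained on the same global reward.

```python
def td_target(reward, next_q_values, done, gamma=0.95):
    """One-step Q-learning target for a single transition."""
    if done:
        # Terminal step: nothing to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Hypothetical reward and next-state network outputs.
    print(td_target(2.0, [0.1, 0.5], done=False))
    print(td_target(2.0, [0.1, 0.5], done=True))
```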

Update the state, the cumulative reward, and the exploration rate epsilon

        state = next_state
        rewardsum += reward
        if epsilon > epsilon_min: epsilon *= epsilon_decay
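
The decay rule shrinks epsilon multiplicatively on every step until it is no longer above epsilon_min. The sketch below traces that schedule; the helper names and the numbers 1.0, 0.2 and 0.5 are chosen purely for illustration and are not the values used in dqn_agent_v1.py.

```python
def decay_epsilon(epsilon, epsilon_min, epsilon_decay):
    """Apply one decay step, mirroring the check used in the training loop."""
    if epsilon > epsilon_min:
        epsilon *= epsilon_decay
    return epsilon


def epsilon_schedule(epsilon, epsilon_min, epsilon_decay, steps):
    """Return epsilon before the first step and after each of the given steps."""
    values = [epsilon]
    for _ in range(steps):
        values.append(decay_epsilon(values[-1], epsilon_min, epsilon_decay))
    return values


if __name__ == "__main__":
    # Illustrative parameters only.
    print(epsilon_schedule(1.0, epsilon_min=0.2, epsilon_decay=0.5, steps=4))
```

Because the check happens before the multiplication, the last decay step can leave epsilon slightly below epsilon_min, after which it stays fixed.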

Record the data

    time_history.append(time)
    rew_history.append(rewardsum)

Plot the training progress

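import matplotlib.pyplot as plt

# one point per episode: last time step index and cumulative reward
plt.plot(range(len(time_history)), time_history, label='Time')
plt.plot(range(len(rew_history)), rew_history, label='Reward sum')
plt.xlabel('Episode')
plt.ylabel('Time / reward sum')
plt.legend()
plt.show()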

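III. Thoughts on the Interfaces
①ns3->ns3gym
Initializing the environment effectively hooks up all of the callback functions.
It is best to put executing actions and collecting the state into separate helper functions, and to keep their parameters consistent.

②ns3gym->python
This side is much the same as the two most basic examples; the main addition is the core reinforcement learning code.
The DQN used here is a simple one without experience replay. It has two fully connected (Dense) layers: the hidden layer uses the ReLU activation function and the output layer uses the Softmax activation function.
The input is the node queue lengths and the output is the contention window size.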