I. Code Scenario
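The FTP server is designed to be multithreaded.
When handling client requests, it deals with data-connection descriptors (dataFd) and control-connection descriptors (ctrlFd) separately, each group handled in one place.
To make multiplexing with select() convenient, two threads are started to dispatch incoming-connection events.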
1. Overall Framework
```cpp
// Hand a newly accepted control connection to the (lazily created) ControlServer
void
addToControlServer (int connFd, struct sockaddr_in clientAddr)
{
    static Server *server = new ControlServer();
    server->addClient (connFd, clientAddr);
}

// Hand a newly accepted data connection to the (lazily created) DataServer
void
addToDataServer (int connFd, struct sockaddr_in clientAddr)
{
    static Server *server = new DataServer();
    server->addClient (connFd, clientAddr);
}

// Accept loop for control connections
void
solveControlConnection ()
{
    // listenFd, clientAddr and clientAddrLen are set up elsewhere (not shown)
    while (1) {
        int connfd = accept (listenFd, (struct sockaddr *)&clientAddr, &clientAddrLen);
        addToControlServer (connfd, clientAddr);
    }
}

// Accept loop for data connections
void
solveDataConnection ()
{
    ...
    while (1) {
        int connfd = accept (listenFd, (struct sockaddr *)&clientAddr, &clientAddrLen);
        addToDataServer (connfd, clientAddr);
    }
}

int main()
{
    std::thread ctrlThread ([] { solveControlConnection(); });
    std::thread dataThread ([] { solveDataConnection(); });
    ctrlThread.join();
    dataThread.join();
    return 0;
}
```
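Both addToControlServer() and addToDataServer() create their server through a function-local static pointer, and they are called from accept threads. Since C++11, the initialization of a function-local static is guaranteed to run exactly once even if several threads enter the function at the same time. A minimal sketch of that guarantee, where CountingServer, getServer() and constructFromThreads() are hypothetical stand-ins, not part of the original code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for ControlServer: counts how often it is constructed.
std::atomic<int> g_constructed{0};

struct CountingServer {
    CountingServer() { ++g_constructed; }
};

// Same pattern as addToControlServer(): a function-local static pointer.
// C++11 guarantees this initializer runs exactly once, even under concurrent calls.
CountingServer *getServer()
{
    static CountingServer *server = new CountingServer();
    return server;
}

// Calls getServer() from nThreads threads at once and returns how many
// CountingServer objects were built in total.
int constructFromThreads(int nThreads)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
        threads.emplace_back([] { getServer(); });
    for (auto &t : threads)
        t.join();
    return g_constructed.load();
}
```

So the lazy creation itself is safe. What matters for the deadlock is which thread later runs each Server method.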
2. Server Implementation
Under the hood, Server uses select() for multiplexing.
```cpp
class Server : public ErrorUtil {
public:
    Server();
    void addClient (int fd, struct sockaddr_in addr);
    void removeClient (int fd);

protected:
    virtual void preAdd (int fd, struct sockaddr_in addr);
    virtual void postRemove (int fd);
    virtual void workWhenDataCome (int fd) = 0;

private:
    class Impl;
    std::shared_ptr<Impl> m_pImpl = nullptr;
};

class Server::Impl {
public:
    Impl (Server *server)
    {
        assert (server);
        m_server = server;
        // Start the worker thread that loops on select() via the dispatcher
        auto workThread = std::thread (&Impl::threadEntry, this);
        workThread.detach();
        dispatcher.waitForStartCompleted();
    }

    // Runs on the caller's thread (e.g. an accept thread)
    void addClient (int fd, struct sockaddr_in addr)
    {
        m_server->preAdd (fd, addr);
        dispatcher.addFd (fd);
        dispatcher.stopWait();   // interrupt select() so the new fd is picked up
    }

    // Runs on the caller's thread
    void removeClient (int fd)
    {
        dispatcher.removeFd (fd);
        dispatcher.stopWait();   // interrupt select() so the fd is dropped
        m_server->postRemove (fd);
    }

private:
    // Body of the worker thread: wait for readable fds, then handle each one
    void threadEntry ()
    {
        while (true) {
            std::vector<int> readAbleFds = dispatcher.waitForReadAble();
            std::for_each (readAbleFds.begin(), readAbleFds.end(),
                           [&] (int fd) { m_server->workWhenDataCome (fd); });
        }
    }

    EasySelect dispatcher;
    Server *m_server;
};
```
ControlServer handles the logic for control connections:
```cpp
class ControlServer : public Server {
    ...
private:
    void stopService (int fd)
    {
        ...
        removeClient (fd);
    }

    void workWhenDataCome (int fd) override
    {
        ...
        stopService (fd);
        ...
    }
    ...
};
```
II. Where the Deadlock Happens
There is quite a lot of code, but there is no need to read all of it.
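The deadlock sits in ControlServer's override of the pure virtual function workWhenDataCome (declared in Server as virtual void workWhenDataCome (int fd) = 0;), in this branch: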
```cpp
if (nRead == 0) {   // the client closed the control connection
    ...
    stopService (fd);
    ...
}
```
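The nRead == 0 branch relies on standard read() semantics: on a stream socket, read() returns 0 (end of file) once the peer has closed its end. A small sketch that uses a local socketpair() as a stand-in for a real client connection; readAfterPeerClose() is a hypothetical helper:

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns what read() yields on one end of a connected stream socket
// after the other end has been closed (0 means EOF).
ssize_t readAfterPeerClose()
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    close(sv[1]);                    // the "client" hangs up
    char buf[16];
    ssize_t nRead = read(sv[0], buf, sizeof(buf));
    close(sv[0]);
    return nRead;                    // 0 here is what sends the server into stopService(fd)
}
```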
The problem lies in stopService():
```cpp
void stopService (int fd)
{
    ...
    removeClient (fd);
}
```
stopService() calls removeClient(), and removeClient() in turn calls the stopWait() method of another class, EasySelect.
EasySelect here is just a convenience wrapper around select().
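```cpp
void
EasySelect::Impl::stopWait()
{
    char ANY_CHAR = 0;
    // Write one byte into the pipe to wake up the thread blocked in select()
    IOUtil::writen (m_pipeWrite, &ANY_CHAR, 1);
    // Block until waitForReadable() confirms that select() returned because of the pipe
    std::unique_lock<std::mutex> lock (m_notifyMutex);
    m_stopFromPipeCond.wait (lock, [&] { return m_stopFromPipe == true; });
    m_stopFromPipe = false;
    // Drain the byte so the pipe is no longer readable
    IOUtil::readn (m_pipeRead, &ANY_CHAR, 1);
}
```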
waitForReadable() is essentially the select() system call; internally it uses an anonymous pipe so that select() can be interrupted.
```cpp
std::vector<int>
EasySelect::Impl::waitForReadable()
{
    fd_set fdSet;
    FD_ZERO (&fdSet);
    int fd_limits = -1;
    std::for_each (m_fds.begin(), m_fds.end(), [&] (int fd) {
        FD_SET (fd, &fdSet);
        fd_limits = std::max (fd_limits, fd);
    });
    FD_SET (m_pipeRead, &fdSet);   // also watch the pipe's read end
    fd_limits = std::max (fd_limits, m_pipeRead);
    ++fd_limits;                   // select() takes the highest fd + 1
    {
        m_isWaiting = true;
        m_startCompleted.notify_one();
    }
    int nReadAble = ::select (fd_limits, &fdSet, NULL, NULL, NULL);
    if (nReadAble == -1) {
        setError (strerror (errno));
        return {};
    }
    m_isWaiting = false;
    std::vector<int> ret;
    std::for_each (m_fds.begin(), m_fds.end(), [&] (int fd) {
        if (FD_ISSET (fd, &fdSet) && fd != m_pipeRead) {
            ret.emplace_back (fd);
        }
    });
    // No client fd is readable: select() was woken by the pipe, so notify stopWait()
    if (ret.empty()) {
        std::lock_guard<std::mutex> lock (m_notifyMutex);
        m_stopFromPipe = true;
        m_stopFromPipeCond.notify_one();
    }
    return ret;
}
```
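Together, stopWait() and waitForReadable() implement a variant of the well-known self-pipe trick plus a condition-variable handshake. Below is a stripped-down, hypothetical model of that handshake (MiniSelect, waitOnce() and handshakeAcrossThreads() are made-up names; only the pipe is watched, there are no client fds). It assumes the normal case in which stopWait() runs on a different thread from the select() loop, and there the handshake completes:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical, stripped-down model of EasySelect::Impl.
class MiniSelect {
public:
    MiniSelect()
    {
        int rc = pipe(m_pipe);
        assert(rc == 0);
        (void)rc;
    }
    ~MiniSelect() { close(m_pipe[0]); close(m_pipe[1]); }

    // Like waitForReadable(): block in select() until the pipe becomes
    // readable, then tell stopWait() that the wake-up came from the pipe.
    bool waitOnce()
    {
        fd_set fdSet;
        FD_ZERO(&fdSet);
        FD_SET(m_pipe[0], &fdSet);
        int n = ::select(m_pipe[0] + 1, &fdSet, nullptr, nullptr, nullptr);
        if (n <= 0 || !FD_ISSET(m_pipe[0], &fdSet))
            return false;
        std::lock_guard<std::mutex> lock(m_mutex);
        m_stopFromPipe = true;
        m_cond.notify_one();
        return true;
    }

    // Like stopWait(): poke the pipe, wait for the acknowledgement, drain the byte.
    void stopWait()
    {
        char anyChar = 0;
        ssize_t w = write(m_pipe[1], &anyChar, 1);
        (void)w;
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [&] { return m_stopFromPipe; });
        m_stopFromPipe = false;
        ssize_t r = read(m_pipe[0], &anyChar, 1);
        (void)r;
    }

private:
    int m_pipe[2];
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_stopFromPipe = false;
};

// stopWait() on this thread, the select() loop on another: the handshake completes.
bool handshakeAcrossThreads()
{
    MiniSelect sel;
    bool woke = false;
    std::thread worker([&] { woke = sel.waitOnce(); });
    sel.stopWait();      // returns only after the worker has acknowledged
    worker.join();
    return woke;
}
```

If stopWait() writes the byte before the worker reaches select(), select() still returns immediately because the pipe is already readable, so the ordering of the two threads does not matter here.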
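The program currently has, in effect, only two threads: the main thread (used only to accept connections) and the ctrlFd handling thread.
The original plan was to additionally give each client its own thread to run its tasks; for testing, only the ctrlFd thread is used for now, and it processes all tasks serially.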
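In the threadEntry() loop below, workWhenDataCome() ends up calling stopWait() (via stopService() and removeClient()), and it does so on the ctrlFd thread.
However, stopWait() must wait for the condition variable m_stopFromPipeCond to be signaled, and that signal is produced only inside waitForReadAble(), which is called by this very same thread. The thread is now stuck in stopWait(), so it never gets back to waitForReadAble().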
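```cpp
void threadEntry ()
{
    while (true) {
        // The only place waitForReadAble() is called, and thus the only
        // place that can signal m_stopFromPipeCond
        std::vector<int> readAbleFds = dispatcher.waitForReadAble();
        // workWhenDataCome() -> stopService() -> removeClient() -> stopWait(),
        // all on this same thread
        std::for_each (readAbleFds.begin(), readAbleFds.end(),
                       [&] (int fd) { m_server->workWhenDataCome (fd); });
    }
}
```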
The thread is waiting for a signal that only it can send, so it deadlocks.
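This self-deadlock can be reproduced without sockets. The sketch below is a hypothetical, sequential model (Handshake, runLoopIteration() and withSeparateSignaler() are made-up names): it uses condition_variable::wait_for with a timeout instead of the original wait() so that the demo terminates rather than hanging, and it contrasts the single-thread case with the case where a separate thread plays the waitForReadable() role:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the state shared by stopWait() and waitForReadable().
struct Handshake {
    std::mutex mutex;
    std::condition_variable cond;
    bool stopFromPipe = false;

    // waitForReadable()'s side: sets the flag after a pipe wake-up.
    void acknowledge()
    {
        std::lock_guard<std::mutex> lock(mutex);
        stopFromPipe = true;
        cond.notify_one();
    }

    // stopWait()'s side, with a timeout so this demo does not hang forever
    // like the original wait(). Returns false on timeout.
    bool stopWaitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex);
        bool acked = cond.wait_for(lock, timeout, [&] { return stopFromPipe; });
        stopFromPipe = false;
        return acked;
    }
};

// One iteration of threadEntry() on the single loop thread, after select()
// reported a readable client fd (so acknowledge() was NOT called).
// workWhenDataCome -> stopService -> removeClient -> stopWait then runs on the
// same thread; acknowledge() could only happen on a later iteration.
bool runLoopIteration()
{
    Handshake hs;
    return hs.stopWaitFor(std::chrono::milliseconds(100));   // false: stuck
}

// Contrast: if another thread plays the waitForReadable() role, stopWait() returns.
bool withSeparateSignaler()
{
    Handshake hs;
    std::thread other([&] { hs.acknowledge(); });
    bool acked = hs.stopWaitFor(std::chrono::milliseconds(1000));
    other.join();
    return acked;
}
```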
III. Lessons Learned
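Always be clear about which thread each function runs on, just as, whenever you see a variable, you should know which region of memory it lives in.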
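One way to act on this lesson is to make the dispatcher itself aware of its loop thread. The sketch below is one possible fix, not the original author's code, and all names in it (Dispatcher, removeFd(), needsHandshakeFromOtherThread()) are hypothetical. It relies on two facts from the code above: on the loop thread select() is not running at that moment, and waitForReadable() rebuilds its fd_set from m_fds on every call. Under those assumptions, removing an fd on the loop thread can skip the pipe write and the wait entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical dispatcher that remembers which thread runs its select() loop.
class Dispatcher {
public:
    explicit Dispatcher(std::thread::id loopThread) : m_loopThread(loopThread) {}

    void addFd(int fd)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_fds.insert(fd);
    }

    // Removes fd and returns true if the caller still has to interrupt select()
    // and wait for the handshake, false if it ran on the loop thread itself.
    bool removeFd(int fd)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_fds.erase(fd);
        }
        // On the loop thread the next iteration rebuilds its fd_set from m_fds,
        // and waiting here would block the only thread that could acknowledge.
        return std::this_thread::get_id() != m_loopThread;
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_fds.size();
    }

private:
    const std::thread::id m_loopThread;
    std::mutex m_mutex;
    std::set<int> m_fds;
};

// Calls removeFd() from a freshly started thread and reports whether a
// handshake would be needed there.
bool needsHandshakeFromOtherThread(Dispatcher &d, int fd)
{
    bool needs = false;
    std::thread t([&] { needs = d.removeFd(fd); });
    t.join();
    return needs;
}
```

An alternative design is to have the loop thread never call stopWait() at all, for example by queuing removals and applying them at the top of the next loop iteration; either way, the decision hinges on knowing which thread the call is made from.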