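Test environment:
Windows host machine: 192.168.1.4
WSL Ubuntu 20.04.6 LTS: 172.19.32.196
An HTTP server running on Windows, HFS (HTTP File Server), which looks roughly like this:
The client is the Ubuntu machine and the server is this HTTP server (called "the server" below). The server's IP address is passed to the program as a command-line argument.
The full source code is at the end of the post.
The main program consists of the following parts: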
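1. Parse the arguments, use netlink to choose the network interface for the connection (in practice this Ubuntu instance has only one interface, with IP 172.19.32.196), and initialize the server's and client's IP addresses and ports. For how netlink is used to choose the interface, see the following article:
Netlink communication: reading the routing table to get the IP of the outgoing interface
Partial code: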
...
// netlink communication: find the local IP used to reach dst
uint32_t src_address = getLocalIPAddress(inet_addr(dst));
...
src_addr.sin_family = AF_INET;
src_addr.sin_port = htons((uint16_t) getpid()); // use the current process ID as the source port
src_addr.sin_addr = *(struct in_addr *) &src_address;

dst_addr.sin_family = AF_INET;
dst_addr.sin_port = htons(HTTP_PORT);
dst_addr.sin_addr.s_addr = inet_addr(dst);
...
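A much shorter technique can be used to cross-check what the netlink lookup returns. You connect() a UDP socket to the destination, which makes the kernel perform the route lookup without sending anything, and then read back the chosen local address with getsockname(). The program above does not do this. The following is a minimal sketch under that assumption, and the function name local_ip_for is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel which local IPv4 address it would use
 * to reach dst_ip. connect() on a UDP socket only performs the route lookup;
 * no datagram is sent. Writes the dotted-quad address into out (len bytes).
 * Returns 0 on success, -1 on error. */
int local_ip_for(const char *dst_ip, char *out, socklen_t len)
{
    struct sockaddr_in dst, local;
    socklen_t local_len = sizeof(local);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(80); /* any non-zero port works for the lookup */

    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *) &dst, sizeof(dst)) < 0 ||
        getsockname(fd, (struct sockaddr *) &local, &local_len) < 0 ||
        inet_ntop(AF_INET, &local.sin_addr, out, len) == NULL) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Run on the WSL machine with the Windows host's IP, this would be expected to report the WSL interface address (172.19.32.196 in this setup). That result should match what getLocalIPAddress finds.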
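2. Create two sockets, one for sending and one for receiving. Bind the receive socket to the client address (optional here, because the connection's 4-tuple never changes in this example), then set the socket options.
Partial code: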
...
send_sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
recv_sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
bind(recv_sock_fd, (const struct sockaddr *) &src_addr, sizeof(struct sockaddr_in));
int one = 1;
setsockopt(recv_sock_fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one));
...
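Creating the send and receive sockets:
AF_INET selects the IPv4 (TCP/IP) protocol family;
SOCK_RAW makes the socket a raw socket;
The third argument is the protocol. IPPROTO_RAW means the application builds the whole IP packet itself. Using it for the send socket means we must construct the headers of every outgoing packet ourselves and compute their checksums (on Linux, IPPROTO_RAW also implies IP_HDRINCL, and such a socket can only send, not receive). IPPROTO_TCP means the socket receives incoming IP packets that carry TCP.
Binding the client IP and port: this step can be skipped because, as mentioned above, the IP addresses and ports of both sides never change. For a raw socket, bind() only restricts the local address; the port is not used.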
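Setting socket options: setsockopt sets an option on the receive socket. Its second argument is the option level and its third is the option name. Common options are:
(1) Socket-level options (SOL_SOCKET)
SO_REUSEADDR: allow reuse of a local address.
SO_RCVBUF: set the receive buffer size.
SO_SNDBUF: set the send buffer size.
SO_BROADCAST: allow sending broadcast messages.
SO_KEEPALIVE: enable keepalive probes to detect whether the connection is still alive.
(2) IP-level options (IPPROTO_IP)
IP_TTL: set the time-to-live (TTL) of outgoing IP datagrams.
IP_HDRINCL: the application supplies the IP header.
(3) TCP-level options (IPPROTO_TCP)
TCP_NODELAY: disable Nagle's algorithm to reduce latency.
TCP_MAXSEG: set the TCP maximum segment size.
This program sets the IP-level (IPPROTO_IP) option IP_HDRINCL. Strictly speaking, IP_HDRINCL tells the kernel that packets sent on the socket already contain an IP header; it does not affect reception, because IPv4 raw sockets always deliver the IP header along with the payload. The send socket, created with IPPROTO_RAW, already has IP_HDRINCL enabled implicitly, so setting it on the receive socket is harmless but has no practical effect here.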
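The three option levels can be seen side by side on an ordinary TCP socket. Unlike the raw sockets in this program, such a socket needs no root privileges. The sketch below sets one option per level and reads it back with getsockopt. demo_socket_options is a hypothetical name, and IP_HDRINCL is left out because it only applies to raw sockets. On Linux, getsockopt(SO_RCVBUF) typically reports about double the requested size, because the kernel reserves room for bookkeeping.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: set SO_RCVBUF (SOL_SOCKET), IP_TTL (IPPROTO_IP) and
 * TCP_NODELAY (IPPROTO_TCP) on a TCP socket, then read each one back.
 * Returns 0 on success, -1 on error. */
int demo_socket_options(int *rcvbuf, int *ttl, int *nodelay)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int want_rcvbuf = 65536, want_ttl = 64, one = 1;
    socklen_t len;
    int rc = -1;

    /* one option per level */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want_rcvbuf, sizeof(want_rcvbuf)) < 0 ||
        setsockopt(fd, IPPROTO_IP, IP_TTL, &want_ttl, sizeof(want_ttl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
        goto out;

    /* read the values back at the same levels */
    len = sizeof(*rcvbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len) < 0)
        goto out;
    len = sizeof(*ttl);
    if (getsockopt(fd, IPPROTO_IP, IP_TTL, ttl, &len) < 0)
        goto out;
    len = sizeof(*nodelay);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, nodelay, &len) < 0)
        goto out;
    rc = 0;
out:
    close(fd);
    return rc;
}
```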
Starting the three-way handshake:
connect_tcp(send_sock_fd, recv_sock_fd, &dst_addr, &src_addr);
// Blocking call
int connect_tcp(int send_fd, int recv_fd, struct sockaddr_in* dst_addr,
        struct sockaddr_in* src_addr)
{
    int ret = 0;

    // Initialize the TCP session state with the given details
    bzero(&tcp_state, sizeof(tcp_state__t));
    tcp_state.max_segment_size = MAX_CLIENT_SEGMENT_SIZE; // initial MSS
    tcp_state.client_window_size = CLIENT_WINDOW_SIZE;    // receive window advertised to the server
    tcp_state.client_next_seq_num = STARTING_SEQUENCE;    // seq of the client's next segment
    tcp_state.session_info.dst_addr = *dst_addr;          // destination address
    tcp_state.session_info.src_addr = *src_addr;          // source address
    tcp_state.session_info.recv_fd = recv_fd;             // receive socket
    tcp_state.session_info.send_fd = send_fd;             // send socket
    tcp_state.syn_retries = 5;                            // SYN retry count
    tcp_state.cwindow_size = 1;                           // congestion window
    initialize_mutex(&tcp_state.tcp_state_lock);
    initialize_mutex(&tcp_state.session_info.send_fd_lock);

    tcp_flags_t flags = {0};
    flags.ack = 1;
    flags.syn = 1;
    if (((ret = send_syn()) < 0) || ((ret = receive_syn_ack_segment(&flags)) < 0)
            || ((ret = send_ack_segment(0)) < 0))
    {
        printf("Failed to set up TCP Connection!!");
        ret = -1;
        goto EXIT;
    }
    tcp_state.tcp_current_state = ESTABLISHED;

EXIT:
    return ret;
}
The handshake flow looks roughly like this:
As shown, it consists of three steps:
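1. Send the SYN segment: send_syn();
create_packet() allocates a TCP packet. This function is central and neatly designed: the IP header, TCP header and payload all live in one buffer, and an array of offset pointers records where each one starts. The SYN flag is of course set to 1. The headers are then built by build_packet_headers, which fills in the TCP header, computes the TCP checksum, fills in the IP header and computes the IP checksum. In the TCP state diagram, a client moves from CLOSED to SYN_SENT once it has sent a SYN, so the code also sets tcp_state.tcp_current_state = SYN_SENT.
Now the packet can be sent. Besides sending it with sendto, the code creates a retransmission timer with a callback, because the SYN must be sent again if no reply arrives before the timeout. To make that possible, the packet is also stored in a circular retransmission queue; when the timer fires, the saved packet is taken from the queue and resent. The circular queue is not covered in detail in this post.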
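The following is a standalone sketch of what build_tcp_header and calculate_tcp_checksum do for a SYN, to make the construction concrete. It builds a 20-byte TCP header plus the 4-byte MSS option (hence doff = 6) and computes the checksum over the 12-byte pseudo-header followed by the segment. The helper names sum16, fold_csum, tcp_checksum and build_syn are hypothetical and not part of the program. Unlike the program's csum(), this version explicitly reads bytes as big-endian 16-bit words, so it does not depend on host byte order. The window value 12000 simply mirrors CLIENT_WINDOW_SIZE.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add len bytes to a running one's-complement sum, as big-endian 16-bit words. */
static uint32_t sum16(const void *data, size_t len, uint32_t sum)
{
    const uint8_t *p = data;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t) ((p[0] << 8) | p[1]);
    if (len == 1)
        sum += (uint32_t) p[0] << 8;   /* odd length: pad with a zero byte */
    return sum;
}

/* Fold the carries into 16 bits, complement, return in network byte order. */
static uint16_t fold_csum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((uint16_t) ~sum);
}

/* TCP checksum over the IPv4 pseudo-header followed by the segment.
 * saddr/daddr are in network byte order (as returned by inet_addr). */
uint16_t tcp_checksum(uint32_t saddr, uint32_t daddr, const void *seg, uint16_t seg_len)
{
    uint8_t pseudo[12];
    memcpy(pseudo, &saddr, 4);
    memcpy(pseudo + 4, &daddr, 4);
    pseudo[8] = 0;                       /* placeholder */
    pseudo[9] = IPPROTO_TCP;             /* protocol */
    pseudo[10] = (uint8_t) (seg_len >> 8);
    pseudo[11] = (uint8_t) seg_len;      /* TCP length: header + data */
    return fold_csum(sum16(seg, seg_len, sum16(pseudo, sizeof pseudo, 0)));
}

/* Build a SYN carrying one MSS option into buf (at least 24 bytes).
 * Ports are in host byte order. Returns the segment length. */
size_t build_syn(uint8_t *buf, uint32_t saddr, uint32_t daddr,
                 uint16_t sport, uint16_t dport, uint32_t isn, uint16_t mss)
{
    struct tcphdr th;
    memset(&th, 0, sizeof th);
    th.source = htons(sport);
    th.dest = htons(dport);
    th.seq = htonl(isn);
    th.doff = 6;                   /* 5 words of fixed header + 1 word of options */
    th.syn = 1;
    th.window = htons(12000);      /* mirrors CLIENT_WINDOW_SIZE */
    memcpy(buf, &th, sizeof th);   /* 20 bytes; checksum field is still 0 */

    buf[20] = 2;                   /* option kind: MSS */
    buf[21] = 4;                   /* option length, counting kind and length bytes */
    buf[22] = (uint8_t) (mss >> 8);
    buf[23] = (uint8_t) mss;

    uint16_t check = tcp_checksum(saddr, daddr, buf, 24);
    memcpy(buf + 16, &check, sizeof check);   /* checksum is at byte offset 16 */
    return 24;
}
```

The receiver can verify the segment by running the same computation over the segment with its checksum field included. A correct segment then yields 0.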
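2. Receive the SYN/ACK segment: receive_syn_ack_segment(&flags);
Data is received with recvfrom, followed by a series of checks: the IP checksum, whether the source/destination IP addresses and ports belong to this connection, the TCP checksum, and whether the segment is a duplicate (retransmitted) one. The code then sets the offset pointers to the IP header, TCP header and payload. Next it checks that the SYN and ACK flags are both 1 and whether the segment is a RST. After that, the SYN/ACK is processed. The code records the seq expected in the server's next segment and stores the server's acknowledgment number. It updates the server's receive window, grows the congestion window, and removes the now-acknowledged SYN from the retransmission queue. Finally, it updates the MSS from the server's options and frees the buffer allocated to receive this packet.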
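The MSS update at the end of this step depends on parsing the TCP options of the SYN/ACK. The following is a standalone sketch of that parsing. It assumes the options area has already been located, which is (doff - 5) * 4 bytes after the fixed 20-byte header. parse_mss_option is a hypothetical name. Two details matter: the MSS value is in network byte order, and each option must be skipped using its own length byte, read before advancing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan a TCP options area of len bytes and return the
 * peer's MSS in host byte order, or -1 if no well-formed MSS option exists. */
int parse_mss_option(const uint8_t *opt, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t kind = opt[i];
        if (kind == 0)                /* End of Option List */
            break;
        if (kind == 1) {              /* No-Operation: a single padding byte */
            i++;
            continue;
        }
        if (i + 1 >= len)             /* no room for the length byte */
            return -1;
        uint8_t olen = opt[i + 1];    /* length covers kind + length + data */
        if (olen < 2 || i + olen > len)
            return -1;                /* malformed: stop instead of looping forever */
        if (kind == 2 && olen == 4)   /* MSS: 2 data bytes, network byte order */
            return (opt[i + 2] << 8) | opt[i + 3];
        i += olen;                    /* skip this option by its own length */
    }
    return -1;
}
```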
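3. Send the ACK segment: send_ack_segment(0);
The third step is simple: send the acknowledgment that completes the three-way handshake. The argument 0 means the FIN flag is 0, so this is a plain ACK. Finally, the TCP state is set to ESTABLISHED.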
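The following sketch shows the sequence-number bookkeeping behind these three steps. A SYN consumes one sequence number even though it carries no data. That is why the program advances client_next_seq_num by 1 after its SYN and calls process_ack(tcph, 1) for the SYN/ACK. The type and function names are hypothetical. The ack-number check in on_syn_ack is an addition: the program itself only checks the SYN and ACK flags. Arithmetic on uint32_t wraps modulo 2^32, as TCP sequence numbers do.

```c
#include <stdint.h>

/* Hypothetical handshake bookkeeping. */
typedef struct {
    uint32_t snd_nxt;   /* next seq we will send (cf. client_next_seq_num) */
    uint32_t rcv_nxt;   /* next seq we expect from the peer (cf. server_next_seq_num) */
} handshake_seq_t;

/* After sending our SYN with initial sequence number isn. */
void on_syn_sent(handshake_seq_t *s, uint32_t isn)
{
    s->snd_nxt = isn + 1;   /* the SYN occupies one sequence number */
    s->rcv_nxt = 0;         /* nothing known about the peer yet */
}

/* Check a SYN/ACK (seq = server ISN, ack = its ack number, host byte order).
 * Returns 1 and records the peer's next seq if it acknowledges our SYN. */
int on_syn_ack(handshake_seq_t *s, uint32_t seq, uint32_t ack)
{
    if (ack != s->snd_nxt)
        return 0;           /* does not acknowledge our SYN (RFC 793 answers with a RST) */
    s->rcv_nxt = seq + 1;   /* the server's SYN also occupies one sequence number */
    return 1;
}

/* Fields of the final ACK: seq = snd_nxt, ack = rcv_nxt. */
void final_ack_fields(const handshake_seq_t *s, uint32_t *seq, uint32_t *ack)
{
    *seq = s->snd_nxt;
    *ack = s->rcv_nxt;
}
```

For example, with STARTING_SEQUENCE = 1 and a server ISN of 5000, the final ACK would carry seq 2 and ack 5001.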
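So what is the result? Run the program and capture the traffic with Wireshark:
The capture shows that after sending the SYN and receiving the SYN/ACK, the client unexpectedly sends a RST before it sends its ACK. The program reports success, but the three-way handshake has actually failed.
Why does this happen?
This program talks to the server through raw sockets instead of the kernel's TCP stack. A raw socket only receives a copy of each incoming packet, and the kernel's TCP stack still processes the server's SYN/ACK as well. The kernel looks for a matching TCP socket (one created through the normal socket API, for example with connect()), finds none, and therefore replies with a RST, which aborts the handshake.
How can this be fixed?
Since this program is mainly for experimentation and learning, one fix is to use iptables to drop the RST packets sent by this host. The server then never receives the client's RST, and the connection is established successfully. Note that this rule drops every outgoing RST on the machine. It can be narrowed, for example by adding -d <server IP> --dport 80, and removed afterwards with iptables -D OUTPUT followed by the same rule specification.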
The shell script:
#!/bin/sh
if ! iptables -C OUTPUT -p tcp --tcp-flags RST RST -j DROP; then
    iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
fi
./handshake "$@"
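The script first uses iptables -C to check whether the rule "OUTPUT -p tcp --tcp-flags RST RST -j DROP" already exists, and appends it with -A only if it does not. Both iptables and the program itself need root privileges (raw sockets require CAP_NET_RAW).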
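Checking the result:
Success!
Finally, note that this program implements only the three-way handshake, not the four-way connection teardown.
Source code: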
run.sh
#!/bin/sh
if ! iptables -C OUTPUT -p tcp --tcp-flags RST RST -j DROP; then
    iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
fi
./handshake "$@"
Makefile
CFLAGS= -g -Werror -lrt -lpthread
CC=gcc

all:
	$(CC) handshake.c routing_table.c tcp_handler.c $(CFLAGS) -o handshake

clean:
	rm -rf handshake
handshake.c
#include "routing_table.h"
#include "tcp_handler.h"
#include <ctype.h>
#include <fcntl.h>
#include <unistd.h>

#define WRITE_BUFFER_SIZE 2048
#define RECV_BUFFER_LENGTH 32768
#define REQ_LENGTH 256

#define STRIP_LEADING_NEWLINE_CHAR(ptr) \
    while(*ptr == '\n') \
        ptr++;

#define STRIP_LEADING_WHITESPACES(ptr) \
    while(*ptr == ' ') \
        ptr++;

#define STRIP_TRAILING_CARRIAGE_RETURN(ptr) (ptr[strlen(ptr)-1] = '\0')

int main(int argc, char** argv)
{
    int send_sock_fd = -1, recv_sock_fd = -1;
    struct sockaddr_in src_addr, dst_addr;
    char dst[REQ_LENGTH] = {0};

    if (argc != 2)
    {
        printf("Usage: ./handshake ip\n");
        exit(1);
    }
    strncpy(dst, argv[1], REQ_LENGTH - 1); // leave room for the terminating '\0'

    memset(&src_addr, 0, sizeof(struct sockaddr_in));
    memset(&dst_addr, 0, sizeof(struct sockaddr_in));

    // netlink communication: find the local IP used to reach dst
    uint32_t src_address = getLocalIPAddress(inet_addr(dst));
    src_addr.sin_family = AF_INET;
    src_addr.sin_port = htons((uint16_t) getpid()); // use the current process ID as the source port
    src_addr.sin_addr = *(struct in_addr *) &src_address;

    dst_addr.sin_family = AF_INET;
    dst_addr.sin_port = htons(HTTP_PORT);
    dst_addr.sin_addr.s_addr = inet_addr(dst);

    // IPPROTO_RAW: we build complete IP packets ourselves (implies IP_HDRINCL)
    send_sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (send_sock_fd < 0)
    {
        printf("Error: Creation of Raw Socket failed: %s!!\n", strerror(errno));
        exit(1);
    }

    // IPPROTO_TCP: receive incoming TCP packets (the IP header is included)
    recv_sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
    if (recv_sock_fd < 0)
    {
        printf("Error: Creation of Raw Socket failed: %s!!\n", strerror(errno));
        exit(1);
    }

    if (bind(recv_sock_fd, (const struct sockaddr *) &src_addr,
            sizeof(struct sockaddr_in)) < 0)
    {
        printf("Error: Unable to bind the receiving socket: %s\n", strerror(errno));
        exit(1);
    }

    // IP_HDRINCL tells the kernel that packets sent on this socket include the IP header
    int one = 1;
    if (setsockopt(recv_sock_fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one)) < 0)
    {
        perror("Error setting IP_HDRINCL");
        exit(1);
    }

    char psrc_addr[256] = {0}, pdst_addr[256] = {0};
    printf("Src Address: %s Destination Address: %s\n",
            inet_ntop(AF_INET, &src_addr.sin_addr.s_addr, psrc_addr, 256),
            inet_ntop(AF_INET, &dst_addr.sin_addr.s_addr, pdst_addr, 256));

    if (connect_tcp(send_sock_fd, recv_sock_fd, &dst_addr, &src_addr) < 0)
    {
        printf("TCP Connection Failed\n");
        goto EXIT;
    }
    else
        printf("TCP Connection Successful\n");

EXIT:
    close(send_sock_fd);
    close(recv_sock_fd);
    return 0;
}
routing_table.c
#include <stdio.h>
#include <stdlib.h>
#include <bits/sockaddr.h>
#include <asm/types.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <errno.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netdb.h>
#include <unistd.h>
#include <string.h>

#define BUFFER_LENGTH 8192

typedef struct rt_request
{
    struct nlmsghdr nl;
    struct rtmsg rt;
    char payload[BUFFER_LENGTH];
} rt_request;

uint32_t fetch_interface_ip(uint32_t if_index)
{
    int family;
    struct ifreq ifreq;
    char host[256] = { 0 }, if_name[256] = { 0 };
    uint32_t src_addr;
    int fd;

    if_indextoname(if_index, if_name); // map the interface index to its name, e.g. eth0
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
    {
        perror("socket()");
        exit(EXIT_FAILURE);
    }

    memset(&ifreq, 0, sizeof ifreq);
    strncpy(ifreq.ifr_name, if_name, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFADDR, &ifreq) != 0) // get the interface's IP address
    {
        /* perror(name); */
        close(fd);
        return -1; /* ignore */
    }

    switch (family = ifreq.ifr_addr.sa_family)
    {
    case AF_UNSPEC:
        close(fd);
        return -1; /* ignore */
    case AF_INET:
    case AF_INET6:
        getnameinfo(&ifreq.ifr_addr, sizeof ifreq.ifr_addr, host, sizeof host,
                0, 0, NI_NUMERICHOST);
        break;
    default:
        sprintf(host, "unknown (family: %d)", family);
    }
    inet_pton(AF_INET, host, &src_addr);
    close(fd);
    return src_addr;
}

void formRequest(rt_request* req)
{
    bzero(req, sizeof(*req)); // clear the whole struct, not just a pointer's worth of bytes

    /*
     * struct nlmsghdr is netlink's own message header. The kernel's netlink
     * implementation uses it to multiplex and demultiplex all netlink protocol
     * types and for other control purposes, which is why it is also called the
     * netlink control block. An application must supply this header when it
     * sends a netlink message.
     */
    req->nl.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nl.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; // NLM_F_REQUEST: this is a request; NLM_F_DUMP: return the whole table
    req->nl.nlmsg_type = RTM_GETROUTE;                // message type: read routes

    // Fill in struct rtmsg (the routing-table message); RTM_GETROUTE only needs these two fields
    req->rt.rtm_family = AF_INET;
    req->rt.rtm_table = RT_TABLE_MAIN;
}

void sendRequest(int sock_fd, struct sockaddr_nl *pa, rt_request* req)
{
    struct msghdr msg; // message descriptor used by sendmsg()/recvmsg()
    struct iovec iov;  // describes one data buffer
    int rtn;

    bzero(pa, sizeof(*pa)); // destination is the kernel: nl_pid = 0, nl_groups = 0
    pa->nl_family = AF_NETLINK;

    bzero(&msg, sizeof(msg));
    msg.msg_name = pa;
    msg.msg_namelen = sizeof(*pa);

    iov.iov_base = (void *) req;
    iov.iov_len = req->nl.nlmsg_len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    while (1)
    {
        if ((rtn = sendmsg(sock_fd, &msg, 0)) < 0)
        {
            if (errno == EINTR)
                continue;
            else
            {
                printf("Error: Unable to send NetLink message:%s\n", strerror(errno));
                exit(1);
            }
        }
        break;
    }
}

int receiveReply(int sock_fd, char* response_buffer)
{
    char* p;
    int nll, rtl, rtn;
    struct nlmsghdr *nlp;
    struct rtmsg *rtp;

    bzero(response_buffer, BUFFER_LENGTH);
    p = response_buffer;
    nll = 0;

    while (1)
    {
        if ((rtn = recv(sock_fd, p, BUFFER_LENGTH - nll, 0)) < 0)
        {
            if (errno == EINTR)
                continue;
            else
            {
                printf("Failed to read from NetLink Socket: %s\n", strerror(errno));
                exit(1);
            }
        }

        nlp = (struct nlmsghdr*) p;
        if (nlp->nlmsg_type == NLMSG_DONE) // end of the dump
            break;

        p += rtn;
        nll += rtn;
    }
    return nll;
}

uint32_t readReply(char *response, int nll, in_addr_t dst_address)
{
    struct nlmsghdr *nlp = NULL;
    struct rtmsg *rtp = NULL;
    struct rtattr *rtap = NULL;
    int rtl = 0, found_route = 0, default_route = 0;
    uint32_t route_addr, net_mask;
    uint32_t if_index = -1;

    nlp = (struct nlmsghdr*) response;
    // NLMSG_OK: check that nlp points to a complete message
    // NLMSG_NEXT: advance from the current message to the next one
    for (; NLMSG_OK(nlp, nll); nlp = NLMSG_NEXT(nlp, nll))
    {
        rtp = (struct rtmsg *) NLMSG_DATA(nlp); // NLMSG_DATA: skip the nlmsghdr to the payload
        if (rtp->rtm_table != RT_TABLE_MAIN)
            continue;

        // RTM_RTA: address of the first attribute of a route message.
        // struct rtattr is the generic wrapper for optional route attributes in netlink messages.
        rtap = (struct rtattr *) RTM_RTA(rtp);
        // RTM_PAYLOAD: length of the attribute data after struct rtmsg
        // (much like a TCP payload length is the total length minus the IP and TCP headers)
        rtl = RTM_PAYLOAD(nlp);
        found_route = 0;
        default_route = 1;

        // RTA_OK: check that the attribute is valid
        // RTA_NEXT: subtract the current attribute's full length from rtl and return the next attribute
        for (; RTA_OK(rtap, rtl); rtap = RTA_NEXT(rtap, rtl))
        {
            switch (rtap->rta_type)
            {
            // destination IPv4 address
            case RTA_DST:
                default_route = 0;
                route_addr = *((uint32_t*) RTA_DATA(rtap));
                net_mask = 0xFFFFFFFF;
                net_mask <<= (32 - rtp->rtm_dst_len);
                net_mask = ntohl(net_mask);
                if (route_addr == (dst_address & net_mask))
                    found_route = 1;
                else if (route_addr == 0)
                    default_route = 1;
                break;
            // unique ID (index) of the output network interface
            case RTA_OIF:
                if (found_route || default_route)
                    if_index = *((uint32_t*) RTA_DATA(rtap));
                break;
            default:
                break;
            }
        }
        if (found_route)
            break;
    }
    return if_index;
}

// Netlink layered model and message format: https://onestraw.github.io/linux/netlink-message/
uint32_t getLocalIPAddress(in_addr_t dst_address)
{
    int route_sock_fd = -1, res_len = 0;
    struct sockaddr_nl sa, pa; // sa: our own netlink address (receiver of the reply); pa: the kernel's
    uint32_t if_index;
    rt_request req = {0};
    char response_payload[BUFFER_LENGTH] = {0};

    // Open routing socket
    if ((route_sock_fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)) == -1)
    {
        printf("Error: Failed to open routing socket: %s\n", strerror(errno));
        exit(1);
    }

    bzero(&sa, sizeof(sa));
    // nl_groups == 0: unicast, no multicast groups
    sa.nl_family = AF_NETLINK;
    sa.nl_pid = getpid(); // nl_pid: port ID of the receiving socket (this process)
    bind(route_sock_fd, (struct sockaddr*) &sa, sizeof(sa));

    formRequest(&req);                                            // build the netlink request
    sendRequest(route_sock_fd, &pa, &req);                        // send it to the kernel
    res_len = receiveReply(route_sock_fd, response_payload);      // receive the routing table dump
    if_index = readReply(response_payload, res_len, dst_address); // pick the output interface (if_index)

    close(route_sock_fd);
    return fetch_interface_ip(if_index); // look up the interface's IP from if_index
}
routing_table.h
#include <sys/types.h>
#include <netinet/in.h>

#ifndef ROUTING_TABLE_H
#define ROUTING_TABLE_H

uint32_t getLocalIPAddress(in_addr_t dst_address);

#endif
tcp_handler.c
#include "tcp_handler.h"

#define STARTING_SEQUENCE 1
#define TCP_WORD_LENGTH_WITH_NO_OPTIONS 5
#define HAS_TCP_OPTIONS(ptr) (ptr->doff > TCP_WORD_LENGTH_WITH_NO_OPTIONS)
#define TCP_OPTION_OFFSET(ptr) ((char*)ptr + (TCP_WORD_LENGTH_WITH_NO_OPTIONS * WORD_LENGTH))
#define TCP_OPTION_LENGTH(ptr) ((ptr->doff - TCP_WORD_LENGTH_WITH_NO_OPTIONS) * WORD_LENGTH)
#define END_OF_TCP_OPTION_CHECK(ptr) ((*ptr) == 0)
#define TCP_OPTIONS_LEN(ptr) ((ptr->doff - TCP_WORD_LENGTH_WITH_NO_OPTIONS) * WORD_LENGTH )
#define IS_NO_OPERATION(ptr) ((*ptr) == 1)
#define IS_MSS(ptr) ((*ptr) == 2)
#define OPTION_LENGTH(ptr) (*(ptr+1))
#define min(a,b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a < _b ? _a : _b; })
#define TCP_OPTION_DATA_OFFSET 2

#define IS_DUPLICATE_TCP_SEGMENT(tcph) (ntohl(tcph->seq) < tcp_state.server_next_seq_num)
#define IS_DUPLICATE_ACK(tcph) (tcph->ack && (tcph->ack_seq == tcp_state.last_acked_seq_num) )
// Advance a circular-buffer index; valid indices are 0..MAX_BUFFER_SIZE-1
#define WRAP_ROUND_BUFFER_SIZE(index) \
    ({ __typeof__ (index) _index = (index); \
       ( _index + 1) >= MAX_BUFFER_SIZE ? 0 : (_index + 1); })

tcp_state__t tcp_state;

/* Generic Internet checksum calculation */
static unsigned short csum(uint16_t *ptr, unsigned int nbytes)
{
    uint32_t sum;
    uint16_t answer;

    sum = 0;
    while (nbytes > 1)
    {
        sum += *ptr++;
        nbytes -= 2; // add up 16-bit words
    }
    if (nbytes == 1) // odd length: pad the last byte with a zero byte
    {
        sum += *(unsigned char*) ptr;
    }
    // Fold the 32-bit sum into 16 bits: add the high 16 bits to the low 16 bits,
    // add the carry back into the low 16 bits, then take the one's complement
    sum = (sum >> 16) + (sum & 0xffff);
    sum = sum + (sum >> 16);
    answer = (short) ~sum;
    return (answer);
}

static void calculate_tcp_checksum(struct tcphdr* tcph,
        uint16_t tcp_payload_len, uint32_t src_addr, uint32_t dst_addr)
{
    pseudo_header psh;
    char* pseudogram;
    uint16_t tcphdr_len = (tcph->doff * WORD_LENGTH); // tcph->doff: TCP header length in 32-bit words

    // pseudo-header
    bzero(&psh, sizeof(pseudo_header));
    psh.source_address = src_addr;
    psh.dest_address = dst_addr;
    psh.protocol = IPPROTO_TCP;
    psh.tcp_length = htons(tcphdr_len + tcp_payload_len);

    int psize = sizeof(pseudo_header) + tcphdr_len + tcp_payload_len;
    pseudogram = malloc(psize);

    // layout: TCP pseudo-header, TCP header, TCP payload
    bzero(pseudogram, psize);
    memcpy(pseudogram, &psh, sizeof(pseudo_header));
    memcpy(pseudogram + sizeof(pseudo_header), tcph, tcphdr_len + tcp_payload_len);

    // compute the checksum (tcph->check is 0 at this point)
    tcph->check = csum((uint16_t*) pseudogram, (unsigned int) psize);
    free(pseudogram);
}

static int validate_ip_checksum(struct iphdr* iph)
{
    int ret = -1;
    uint16_t received_checksum = iph->check;
    iph->check = 0;
    if (received_checksum == csum((uint16_t*) iph, (unsigned int) (iph->ihl * WORD_LENGTH)))
        ret = 1;
    return ret;
}

static int validate_tcp_checksum(struct tcphdr* tcph, uint16_t tcp_payload_length)
{
    int ret = -1;
    uint16_t received_checksum = tcph->check;
    tcph->check = 0;
    // For a received segment the server is the source and the client is the destination
    calculate_tcp_checksum(tcph, tcp_payload_length,
            *(uint32_t *) &tcp_state.session_info.dst_addr.sin_addr.s_addr,
            *(uint32_t *) &tcp_state.session_info.src_addr.sin_addr.s_addr);
    if (received_checksum == tcph->check)
        ret = 1;
    if (ret < 0)
    {
        printf("received_checksum:%d, tcph->check:%d\n", received_checksum, tcph->check);
        char psrc_addr[256] = {0}, pdst_addr[256] = {0};
        printf("Src Address: %s Destination Address: %s\n",
                inet_ntop(AF_INET, &tcp_state.session_info.src_addr.sin_addr.s_addr, psrc_addr, 256),
                inet_ntop(AF_INET, &tcp_state.session_info.dst_addr.sin_addr.s_addr, pdst_addr, 256));
    }
    return ret;
}

static packet_t* create_packet()
{
    packet_t* packet = malloc(sizeof(packet_t));

    bzero(packet, sizeof(packet_t));
    // All layers share one buffer; offset[] points at the start of each layer
    packet->offset[IP_OFFSET] = packet->payload;
    packet->offset[TCP_OFFSET] = packet->payload + sizeof(struct iphdr);
    packet->offset[DATA_OFFSET] = packet->payload + sizeof(struct tcphdr) + sizeof(struct iphdr);
    packet->retransmit_timer_id = NULL;
    return packet;
}

static void adjust_layer_offset(packet_t* packet)
{
    struct tcphdr *tcph;
    struct iphdr *iph;

    // Recompute offsets from the actual header lengths of a received packet
    iph = (struct iphdr *) packet->payload;
    tcph = (struct tcphdr *) (packet->payload + (iph->ihl * WORD_LENGTH));
    packet->offset[TCP_OFFSET] = (char*) tcph;
    packet->offset[DATA_OFFSET] = (char*) (packet->offset[TCP_OFFSET] + (tcph->doff * WORD_LENGTH));
}

static void destroy_packet(packet_t* packet)
{
    if (packet->retransmit_timer_id != NULL)
        timer_delete(packet->retransmit_timer_id);
    free(packet);
}

static void remove_acked_entries(uint32_t next_expected_seq)
{
    pthread_mutex_lock(&tcp_state.sender_info.tcp_retx_lock);
    while ((tcp_state.sender_info.retx_buffer[tcp_state.sender_info.retx_buffer_head].packet_seq
            < next_expected_seq)
            && !(tcp_state.sender_info.retx_buffer_head == tcp_state.sender_info.retx_buffer_tail))
    {
        destroy_packet(tcp_state.sender_info.retx_buffer[tcp_state.sender_info.retx_buffer_head].packet);
        tcp_state.sender_info.retx_buffer[tcp_state.sender_info.retx_buffer_head].packet = NULL;
        tcp_state.sender_info.retx_buffer_head =
                WRAP_ROUND_BUFFER_SIZE(tcp_state.sender_info.retx_buffer_head);
    }
    pthread_mutex_unlock(&tcp_state.sender_info.tcp_retx_lock);
}

static void reset_packet_retransmission_timer(timer_t* timer_id, uint16_t timeInSecs)
{
    struct itimerspec timer_value = {0};
    timer_value.it_interval.tv_sec = timeInSecs;
    timer_value.it_value.tv_sec = timeInSecs; // 0 disarms the timer

    if (timer_settime(*timer_id, 0, &timer_value, NULL) < 0)
    {
        printf("Failed to set time!!");
        timer_delete(*timer_id);
        *timer_id = NULL;
    }
}

static void build_ip_header(struct iphdr* iph, uint16_t ip_payload_len)
{
    iph->daddr = *(uint32_t*) &tcp_state.session_info.dst_addr.sin_addr.s_addr;
    iph->saddr = *(uint32_t*) &tcp_state.session_info.src_addr.sin_addr.s_addr;
    iph->ihl = 5;
    iph->protocol = IPPROTO_TCP;
    iph->ttl = 255;
    iph->version = 4;
    // tot_len is kept in host byte order and also used as the send length.
    // On Linux the kernel always fills in tot_len and the IP checksum for
    // IP_HDRINCL packets (see raw(7)), so the header on the wire is still correct.
    iph->tot_len = sizeof(struct iphdr) + ip_payload_len;
    iph->check = csum((unsigned short*) iph, sizeof(struct iphdr));
}

static void build_tcp_header(struct tcphdr* tcph, tcp_flags_t* flags, uint16_t payload_len)
{
    tcph->dest = *(uint16_t*) &tcp_state.session_info.dst_addr.sin_port;
    tcph->source = *(uint16_t*) &tcp_state.session_info.src_addr.sin_port;
    tcph->window = htons(tcp_state.client_window_size);
    tcph->seq = htonl(tcp_state.client_next_seq_num);
    // SYN and FIN each consume one sequence number
    tcp_state.client_next_seq_num += (flags->syn || flags->fin) ? 1 : payload_len;
    tcph->doff = (flags->syn) ? 6 : 5; // a SYN carries a 4-byte MSS option
    tcph->syn = flags->syn;
    tcph->ack = flags->ack;
    tcph->fin = flags->fin;
    tcph->psh = flags->psh;
    tcph->ack_seq = htonl(tcp_state.server_next_seq_num);

    if (flags->syn)
    {
        char* tcp_options = ((char *) tcph) + sizeof(struct tcphdr);
        tcp_options_t mss = {0};
        mss.option_type = 2;
        mss.option_len = 4;
        mss.option_value = htons(1460);
        memcpy(tcp_options++, &mss.option_type, sizeof(char));
        memcpy(tcp_options++, &mss.option_len, sizeof(char));
        memcpy(tcp_options, &mss.option_value, sizeof(uint16_t));
    }
}

static void build_packet_headers(packet_t* packet, int payload_len, tcp_flags_t* flags)
{
    struct tcphdr* tcph = (struct tcphdr*) packet->offset[TCP_OFFSET];
    struct iphdr* iph = (struct iphdr*) packet->offset[IP_OFFSET];

    build_tcp_header(tcph, flags, payload_len);
    calculate_tcp_checksum(tcph, payload_len,
            *(uint32_t *) &tcp_state.session_info.src_addr.sin_addr.s_addr,
            *(uint32_t *) &tcp_state.session_info.dst_addr.sin_addr.s_addr);
    build_ip_header(iph, ((tcph->doff * WORD_LENGTH) + payload_len));
}

static int send_packet(void *buffer, int total_packet_len)
{
    int ret = -1;

    pthread_mutex_lock(&tcp_state.session_info.send_fd_lock);
    while (total_packet_len > 0)
    {
        // Send the packet
        if ((ret = sendto(tcp_state.session_info.send_fd, buffer, total_packet_len, 0,
                (struct sockaddr *) &tcp_state.session_info.dst_addr,
                sizeof(struct sockaddr_in))) < 0)
        {
            if (errno == EINTR)
            {
                printf("Sendto() Interrupted!!");
                continue;
            }
            else
            {
                perror("sendto failed");
                goto EXIT;
            }
        }
        if (ret == total_packet_len)
            break;

        total_packet_len -= ret;
        buffer += ret;
    }

EXIT:
    pthread_mutex_unlock(&tcp_state.session_info.send_fd_lock);
    return ret;
}

static void handle_packet_retransmission()
{
    packet_t* packet = NULL;
    pthread_mutex_lock(&tcp_state.sender_info.tcp_retx_lock);
    int index = tcp_state.sender_info.retx_buffer_head;
    while (index != tcp_state.sender_info.retx_buffer_tail)
    {
        packet = tcp_state.sender_info.retx_buffer[index].packet;
        // Disarm the retransmission timer, resend, then re-arm it for 60 s
        reset_packet_retransmission_timer(&packet->retransmit_timer_id, 0);
        if (send_packet(packet->payload, packet->payload_len) < 0)
            printf("Failed to retransmit packet!!");
        reset_packet_retransmission_timer(&packet->retransmit_timer_id, 60);
        index = WRAP_ROUND_BUFFER_SIZE(index); // the queue is circular
    }
    pthread_mutex_unlock(&tcp_state.sender_info.tcp_retx_lock);
}

static int send_ack_segment(uint8_t fin)
{
    int ret = -1;
    packet_t* packet = create_packet();
    tcp_flags_t flags = { 0 };

    flags.ack = 1;
    flags.fin = fin;
    build_packet_headers(packet, 0, &flags);

    if ((ret = send_packet(&packet->payload,
            ((struct iphdr*) packet->offset[IP_OFFSET])->tot_len)) < 0)
    {
        printf("Send error!! Exiting.. ");
    }

EXIT:
    destroy_packet(packet);
    return ret;
}

static int receive_packet(packet_t *packet)
{
    int ret = -1;
    while (1)
    {
        if ((ret = recvfrom(tcp_state.session_info.recv_fd, &packet->payload,
                sizeof(packet->payload), 0, NULL, NULL)) < 0)
        {
            if (errno == EINTR)
                continue;
            else
            {
                perror("recv failed");
                return ret;
            }
        }

        // Data received successfully; a raw IPv4 socket also delivers the IP header
        struct iphdr *iph = (struct iphdr *) &packet->payload;
        // printf("packet->payload:%s\n", packet->payload);

        if (validate_ip_checksum(iph) < 0)
        {
            printf("IP Checksum validation failed!! Packet dropped!!\n");
            continue;
        }

        uint16_t iphdr_len = iph->ihl * WORD_LENGTH;
        struct tcphdr *tcph = (struct tcphdr *) ((char*) iph + iphdr_len);
        uint16_t tcphdr_len = tcph->doff * WORD_LENGTH;

        // Skip segments that do not belong to this connection: any mismatch in the
        // source IP, destination port or source port means the segment is not ours
        if (iph->saddr != tcp_state.session_info.dst_addr.sin_addr.s_addr
                || tcph->dest != tcp_state.session_info.src_addr.sin_port
                || tcph->source != tcp_state.session_info.dst_addr.sin_port)
            continue;

        if (validate_tcp_checksum(tcph, (ntohs(iph->tot_len) - iphdr_len - tcphdr_len)) < 0)
        {
            printf("TCP Checksum validation failed!! Packet dropped!!\n");
            continue;
        }

        if (IS_DUPLICATE_ACK(tcph))
        {
            handle_packet_retransmission();
            continue;
        }
        else if (IS_DUPLICATE_TCP_SEGMENT(tcph))
        {
            send_ack_segment(0);
            continue;
        }

        adjust_layer_offset(packet);
        packet->payload_len = (ntohs(iph->tot_len) - iphdr_len - tcphdr_len);
        // printf("packet->payload_len:%d\n", packet->payload_len);
        break;
    }
    return ret;
}

static void process_ack(struct tcphdr *tcph, uint16_t payload_len)
{
    // This segment starts at seq and occupies payload_len sequence numbers,
    // so the server's next segment should start at seq + payload_len
    tcp_state.server_next_seq_num = (ntohl(tcph->seq) + payload_len);
    // The server's acknowledgment number: the next seq it expects from us
    tcp_state.last_acked_seq_num = (ntohl(tcph->ack_seq));

    pthread_mutex_lock(&tcp_state.tcp_state_lock);
    tcp_state.server_window_size = ntohs(tcph->window); // update the peer's receive window
    if (tcp_state.cwindow_size < MAX_CONGESTION_WINDOW_SIZE) // grow the congestion window up to the cap
        tcp_state.cwindow_size++;
    pthread_cond_signal(&tcp_state.send_window_low_thresh);
    pthread_mutex_unlock(&tcp_state.tcp_state_lock);

    remove_acked_entries(ntohl(tcph->ack_seq)); // drop acknowledged packets from the retransmission queue

    // Update tcp_state.max_segment_size from the peer's MSS option
    if (HAS_TCP_OPTIONS(tcph))
    {
        char* tcp_options_offset = (char*) TCP_OPTION_OFFSET(tcph);
        uint16_t total_options_len = TCP_OPTIONS_LEN(tcph);

        while (total_options_len > 0 && !END_OF_TCP_OPTION_CHECK(tcp_options_offset))
        {
            if (IS_NO_OPERATION(tcp_options_offset))
            {
                tcp_options_offset++;
                total_options_len--;
                continue;
            }
            // Every other option is <kind, length, data...>
            if (total_options_len < 2)
                break;
            uint8_t option_len = (uint8_t) OPTION_LENGTH(tcp_options_offset);
            if (option_len < 2 || option_len > total_options_len)
                break; // malformed option list
            if (IS_MSS(tcp_options_offset) && option_len == 4)
            {
                uint16_t peer_mss;
                memcpy(&peer_mss, tcp_options_offset + TCP_OPTION_DATA_OFFSET, sizeof(peer_mss));
                tcp_state.max_segment_size = min(tcp_state.max_segment_size, ntohs(peer_mss));
            }
            // Advance by this option's own length (read before moving the pointer)
            tcp_options_offset += option_len;
            total_options_len -= option_len;
        }
    }
}

static void retransmission_timer_handler(union sigval value)
{
    int buffer_index = value.sival_int;
    packet_t* packet = NULL;

    // A timeout shrinks the congestion window back to 1
    pthread_mutex_lock(&tcp_state.tcp_state_lock);
    tcp_state.cwindow_size = 1;
    pthread_mutex_unlock(&tcp_state.tcp_state_lock);

    pthread_mutex_lock(&tcp_state.sender_info.tcp_retx_lock);
    if (tcp_state.sender_info.retx_buffer[buffer_index].packet == NULL
            || buffer_index < tcp_state.sender_info.retx_buffer_head)
        goto EXIT;

    packet = tcp_state.sender_info.retx_buffer[buffer_index].packet;
    if (send_packet(&packet->payload,
            ((struct iphdr*) packet->offset[IP_OFFSET])->tot_len) < 0)
    {
        printf("Failed to retransmit packet!!\n");
    }

EXIT:
    pthread_mutex_unlock(&tcp_state.sender_info.tcp_retx_lock);
}

void create_retransmission_timer(timer_t* timer, int send_buffer_index)
{
    union sigval val;
    struct sigevent sev;
    struct itimerspec timer_value = {0};

    memset(&val, 0, sizeof(val));
    memset(&sev, 0, sizeof(sev));
    val.sival_int = send_buffer_index;

    // SIGEV_THREAD: on expiry, a new thread is started in this process (with
    // sigev_notify_attributes as its attributes) to run sigev_notify_function,
    // passing sigev_value as its argument.
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_value = val;
    sev.sigev_notify_function = retransmission_timer_handler; // on expiry, retransmit the packet

    // Create the timer.
    // CLOCK_MONOTONIC: a monotonic clock that is not affected by changes to the system time
    if (timer_create(CLOCK_MONOTONIC, &sev, timer) < 0)
    {
        printf("Failed to create the retransmission timer!!");
        *timer = NULL;
        goto EXIT;
    }

    timer_value.it_interval.tv_sec = 60; // it_interval: period of subsequent expirations, 60 s
    timer_value.it_value.tv_sec = 60;    // it_value: time until the first expiration, 60 s

    // Arm the timer
    if (timer_settime(*timer, 0, &timer_value, NULL) < 0)
    {
        printf("Failed to set time!!");
        timer_delete(*timer);
        *timer = NULL;
    }

EXIT:
    return;
}

static int send_tcp_segment(packet_t* packet)
{
    int ret = 0;

    if ((ret = send_packet(&packet->payload,
            ((struct iphdr*) packet->offset[IP_OFFSET])->tot_len)) < 0)
    {
        printf("Send error!! Exiting.. ");
        goto EXIT;
    }

    // Create the retransmission timer so the packet is resent on timeout
    create_retransmission_timer(&packet->retransmit_timer_id,
            tcp_state.sender_info.retx_buffer_tail);

    pthread_mutex_lock(&tcp_state.sender_info.tcp_retx_lock);
    // Store the packet in the circular retransmission queue. offset[TCP_OFFSET]
    // already points at the TCP header; seq is kept in host byte order because
    // remove_acked_entries compares it with the host-order ack number.
    tcp_state.sender_info.retx_buffer[tcp_state.sender_info.retx_buffer_tail].packet_seq =
            ntohl(((struct tcphdr*) packet->offset[TCP_OFFSET])->seq);
    tcp_state.sender_info.retx_buffer[tcp_state.sender_info.retx_buffer_tail].packet = packet;
    // Advance the tail to the next free slot
    tcp_state.sender_info.retx_buffer_tail =
            WRAP_ROUND_BUFFER_SIZE(tcp_state.sender_info.retx_buffer_tail);
    pthread_mutex_unlock(&tcp_state.sender_info.tcp_retx_lock);

EXIT:
    return ret;
}

static int send_syn()
{
    int ret = -1;
    packet_t* packet = create_packet();
    tcp_flags_t flags = {0};

    flags.syn = 1;
    build_packet_headers(packet, 0, &flags);
    tcp_state.tcp_current_state = SYN_SENT; // CLOSED -> SYN_SENT

    return send_tcp_segment(packet);
}

static int receive_syn_ack_segment(tcp_flags_t* flags)
{
    int ret = -1;
    packet_t* packet = create_packet();
    struct tcphdr *tcph;

    while (1)
    {
        if ((ret = receive_packet(packet)) < 0)
        {
            printf("Receive error!! Exiting.. ");
            goto EXIT;
        }

        tcph = (struct tcphdr *) packet->offset[TCP_OFFSET];

        if (tcph->ack == flags->ack && tcph->syn == flags->syn)
            break;

        if (tcph->rst || !tcp_state.syn_retries)
        {
            ret = -1;
            goto EXIT;
        }
    }

    process_ack(tcph, 1); // the server's SYN occupies one sequence number

EXIT:
    destroy_packet(packet);
    return ret;
}

static int initialize_mutex(pthread_mutex_t* mutex)
{
    int ret = -1;
    pthread_mutexattr_t mutex_attr;

    if ((ret = pthread_mutexattr_init(&mutex_attr)) != 0)
    {
        printf("Failed to initialize mutex attribute\n");
        ret = -1;
        goto EXIT;
    }

    if ((ret = pthread_mutexattr_settype(&mutex_attr, PTHREAD_MUTEX_RECURSIVE)) != 0)
    {
        printf("Failed to set mutex attribute\n");
        ret = -1;
        goto EXIT;
    }

    if ((ret = pthread_mutex_init(mutex, &mutex_attr)) != 0)
    {
        printf("Failed to initialize mutex!!\n");
        ret = -1;
    }

EXIT:
    return ret;
}

static void get_wait_time(struct timespec* timeToWait, uint16_t timeInSeconds)
{
    struct timeval now;
    int rt;

    gettimeofday(&now, NULL);
    timeToWait->tv_sec = now.tv_sec + timeInSeconds;
    timeToWait->tv_nsec = 0;
}

// Blocking call
int connect_tcp(int send_fd, int recv_fd, struct sockaddr_in* dst_addr,
        struct sockaddr_in* src_addr)
{
    int ret = 0;

    // Initialize the TCP session state with the given details
    bzero(&tcp_state, sizeof(tcp_state__t));
    tcp_state.max_segment_size = MAX_CLIENT_SEGMENT_SIZE; // initial MSS
    tcp_state.client_window_size = CLIENT_WINDOW_SIZE;    // receive window advertised to the server
    tcp_state.client_next_seq_num = STARTING_SEQUENCE;    // seq of the client's next segment
    tcp_state.session_info.dst_addr = *dst_addr;          // destination address
    tcp_state.session_info.src_addr = *src_addr;          // source address
    tcp_state.session_info.recv_fd = recv_fd;             // receive socket
    tcp_state.session_info.send_fd = send_fd;             // send socket
    tcp_state.syn_retries = 5;                            // SYN retry count
    tcp_state.cwindow_size = 1;                           // congestion window
    initialize_mutex(&tcp_state.tcp_state_lock);
    initialize_mutex(&tcp_state.session_info.send_fd_lock);

    tcp_flags_t flags = {0};
    flags.ack = 1;
    flags.syn = 1;
    if (((ret = send_syn()) < 0) || ((ret = receive_syn_ack_segment(&flags)) < 0)
            || ((ret = send_ack_segment(0)) < 0))
    {
        printf("Failed to set up TCP Connection!!");
        ret = -1;
        goto EXIT;
    }
    tcp_state.tcp_current_state = ESTABLISHED;

EXIT:
    return ret;
}

static int send_fin()
{
    int ret = -1;
    packet_t* packet = create_packet();
    tcp_flags_t flags = {0};

    flags.fin = 1;
    flags.ack = 1;
    build_packet_headers(packet, 0, &flags);

    return send_tcp_segment(packet);
}

int close_tcp()
{
    int ret = -1;

    pthread_mutex_lock(&tcp_state.tcp_state_lock);
    if (!((tcp_state.tcp_current_state & ESTABLISHED)
            || (tcp_state.tcp_current_state & CLOSE_WAIT)))
    {
        pthread_mutex_unlock(&tcp_state.tcp_state_lock);
        goto EXIT;
    }
    pthread_mutex_unlock(&tcp_state.tcp_state_lock);

    if ((ret = send_fin()) < 0)
        goto EXIT;

    struct timespec timeToWait;
    get_wait_time(&timeToWait, 10);

    pthread_mutex_lock(&tcp_state.tcp_state_lock);
    if (tcp_state.tcp_current_state & ESTABLISHED)
        tcp_state.tcp_current_state = FIN_WAIT_1;
    else
        tcp_state.tcp_current_state = LAST_ACK;
    tcp_state.tcp_write_end_closed = 1;
    pthread_cond_timedwait(&tcp_state.tcp_session_closed_notify,
            &tcp_state.tcp_state_lock, &timeToWait);
    pthread_mutex_unlock(&tcp_state.tcp_state_lock);

EXIT:
    return ret;
}

static void release_and_update_recv_buffer(packet_t* packet)
{
    pthread_mutex_lock(&tcp_state.recv_info.tcp_recv_lock);
    tcp_state.recv_info.recv_buffer[tcp_state.recv_info.recv_buffer_head].packet = NULL;
    tcp_state.recv_info.recv_buffer_head =
            WRAP_ROUND_BUFFER_SIZE(tcp_state.recv_info.recv_buffer_head);
    destroy_packet(packet);
    pthread_cond_signal(&tcp_state.recv_info.recv_buffer_full);
    pthread_mutex_unlock(&tcp_state.recv_info.tcp_recv_lock);
}

int receive_data(char* buffer, int buffer_len)
{
    int total_bytes_read = 0, ret = -1;
    packet_t* packet = NULL;
    struct timespec timeToWait;

    while (buffer_len > 0)
    {
        get_wait_time(&timeToWait, 5);
        pthread_mutex_lock(&tcp_state.recv_info.tcp_recv_lock);
        if (tcp_state.recv_info.recv_buffer_head == tcp_state.recv_info.recv_buffer_tail)
        {
            if (total_bytes_read > 0)
            {
                pthread_mutex_unlock(&tcp_state.recv_info.tcp_recv_lock);
                break;
            }
            else
            {
                if ((ret = pthread_cond_timedwait(&tcp_state.recv_info.recv_buffer_empty,
                        &tcp_state.recv_info.tcp_recv_lock, &timeToWait)) != 0)
                {
                    pthread_mutex_unlock(&tcp_state.recv_info.tcp_recv_lock);
                    if (ret == ETIMEDOUT)
                    {
                        pthread_mutex_lock(&tcp_state.tcp_state_lock);
                        if (tcp_state.tcp_read_end_closed)
                        {
                            printf("TCP Server Closed!!\n");
                            total_bytes_read = -1;
                            pthread_mutex_unlock(&tcp_state.tcp_state_lock);
                            break;
                        }
                        pthread_mutex_unlock(&tcp_state.tcp_state_lock);
                        continue;
                    }
                    else
                        break;
                }
            }
        }

        packet = tcp_state.recv_info.recv_buffer[tcp_state.recv_info.recv_buffer_head].packet;
        pthread_mutex_unlock(&tcp_state.recv_info.tcp_recv_lock);

        int copied_bytes = 0;
        if (packet->payload_len > buffer_len)
        {
            printf("CHUNKED TRANSFER: %d:%d\n", packet->payload_len, buffer_len);
            memcpy((buffer + total_bytes_read), packet->offset[DATA_OFFSET], buffer_len);
            packet->offset[DATA_OFFSET] += buffer_len;
            packet->payload_len -= buffer_len;
            total_bytes_read += buffer_len;
            copied_bytes = buffer_len;
            buffer_len = 0;
        }
        else
        {
            memcpy((buffer + total_bytes_read), packet->offset[DATA_OFFSET], packet->payload_len);
            buffer_len -= packet->payload_len;
            total_bytes_read += packet->payload_len;
            copied_bytes = packet->payload_len;
            release_and_update_recv_buffer(packet);
        }

        // Data handed to the application frees space in our receive window
        pthread_mutex_lock(&tcp_state.tcp_state_lock);
        tcp_state.client_window_size += copied_bytes;
        tcp_state.client_window_size =
                (tcp_state.client_window_size > CLIENT_WINDOW_SIZE) ?
                        CLIENT_WINDOW_SIZE : tcp_state.client_window_size;
        pthread_mutex_unlock(&tcp_state.tcp_state_lock);
    }
    return total_bytes_read;
}
tcp_handler.h
#ifndef TCP_HANDLER_H_
#define TCP_HANDLER_H_

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <string.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <sys/time.h>

#define TOTAL_LAYERS 2
#define IP_LAYER_OFFSET 0
#define TCP_LAYER_OFFSET 1
#define PAYLOAD_OFFSET 2
#define CLIENT_PORT 35555
#define HTTP_PORT 80
#define RTAX_MAX 8
#define IP_OFFSET 0
#define TCP_OFFSET 1
#define DATA_OFFSET 2
#define MAX_BUFFER_SIZE 400
#define MAX_CLIENT_SEGMENT_SIZE 1460
// #define CLIENT_WINDOW_SIZE 16384
#define CLIENT_WINDOW_SIZE 12000
#define WORD_LENGTH 4
// #define PACKET_MAX_SIZE 16384
#define PACKET_MAX_SIZE 12000
#define MAX_PAYLOAD_LEN (PACKET_MAX_SIZE - sizeof(struct iphdr) - sizeof(struct tcphdr))
#define MAX_CONGESTION_WINDOW_SIZE 1000

typedef enum
{
    SYN_SENT = 1,
    ESTABLISHED = 2,
    FIN_WAIT_1 = 4,
    FIN_WAIT_2 = 8,
    CLOSE_WAIT = 16,
    CLOSING = 32,
    LAST_ACK = 64,
    CLOSED = 128
} tcp_state_machine_t;

typedef struct
{
    uint8_t syn :1;
    uint8_t ack :1;
    uint8_t fin :1;
    uint8_t psh :1;
} tcp_flags_t;

typedef struct
{
    uint8_t option_type;
    uint8_t option_len;
    uint16_t option_value;
} tcp_options_t;

typedef struct
{
    char payload[PACKET_MAX_SIZE];
    char* offset[TOTAL_LAYERS + 1];
    timer_t retransmit_timer_id;
    uint16_t payload_len;
} packet_t;

typedef struct
{
    packet_t* packet;
    uint32_t packet_seq;
} buffered_packet_t;

// TCP pseudo-header
typedef struct
{
    u_int32_t source_address;
    u_int32_t dest_address;
    u_int8_t placeholder;
    u_int8_t protocol;
    u_int16_t tcp_length;
} pseudo_header;

typedef struct
{
    struct sockaddr_in src_addr;
    struct sockaddr_in dst_addr;
    uint16_t src_port;
    uint16_t dst_port;
    int send_fd;
    int recv_fd;
    pthread_mutex_t send_fd_lock;
} session_info__t;

typedef struct
{
    buffered_packet_t send_buffer[MAX_BUFFER_SIZE];
    uint16_t send_buffer_head;
    uint16_t send_buffer_tail;
    buffered_packet_t retx_buffer[MAX_BUFFER_SIZE];
    uint16_t retx_buffer_head;
    uint16_t retx_buffer_tail;
    pthread_mutex_t tcp_send_lock;
    pthread_mutex_t tcp_retx_lock;
    pthread_cond_t send_buffer_empty;
    pthread_cond_t send_buffer_full;
} tcp_send_data_t;

typedef struct
{
    buffered_packet_t recv_buffer[MAX_BUFFER_SIZE];
    uint16_t recv_buffer_head;
    uint16_t recv_buffer_tail;
    pthread_mutex_t tcp_recv_lock;
    pthread_cond_t recv_buffer_empty;
    pthread_cond_t recv_buffer_full;
} tcp_recv_data_t;

typedef struct
{
    session_info__t session_info;
    uint32_t client_next_seq_num; // seq of the next segment this side will send
    uint32_t last_acked_seq_num;  // ack number carried by the last ACK received from the peer
    uint32_t server_next_seq_num; // seq expected in the peer's next segment (its next data should start here)
    uint16_t server_window_size;
    uint16_t client_window_size;
    uint16_t max_segment_size;
    uint16_t cwindow_size;
    uint16_t ssthresh;
    pthread_cond_t send_window_low_thresh;
    uint8_t syn_retries;
    tcp_send_data_t sender_info;
    tcp_recv_data_t recv_info;
    pthread_mutex_t tcp_state_lock;
    pthread_cond_t tcp_session_closed_notify;
    uint8_t tcp_write_end_closed;
    uint8_t tcp_read_end_closed;
    pthread_t tcp_worker_threads[2];
    tcp_state_machine_t tcp_current_state;
} tcp_state__t;

int connect_tcp(int send_fd, int recv_fd, struct sockaddr_in* dst_addr,
        struct sockaddr_in* src_addr);
int send_data(char* buffer, int buffer_len);
int receive_data(char* buffer, int buffer_len);
int close_tcp();

#endif /* TCP_HANDLER_H_ */
This code is based on: https://github.com/praveenkmurthy/Raw-Sockets