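Now let's look at how to fetch the captcha image. Wherever a captcha appears there is usually a refresh button next to it, and the code behind that button contains the URL the captcha is fetched from. If you have read my earlier post "Simulating Web Page Behavior: Tools" (《模拟网页行为之工具篇》), locating that code is easy. Once you have found it, look at the figure below: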
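It shows that the captcha is fetched from https://ipin.siren24.com/stickyCaptcha. That alone is not enough, though. As discussed in the previous post, the refresh request must carry the session cookie, otherwise the captcha you get back is not valid for your session. To see how that cookie is obtained, look at the figure below: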
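The cookie is host-only. That means it was set without a Domain attribute, so the browser sends it only to the exact host that set it (ipin.siren24.com) and not to other subdomains. With this packet-capture analysis we now have a basic picture of how the captcha flow works.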
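The remaining work therefore breaks down into two steps:
1. Get the cookie for https://ipin.siren24.com/stickyCaptcha.
2. Refresh the captcha with that cookie and retrieve the image data.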
The C++ code to get the host-only cookie is as follows:
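std::string CWebLoginDlg::GetCookie( std::string url )
{
    // Reads every cookie WinINet holds for url (host-only cookies included).
    // Needs <wininet.h> and <vector>; link with wininet.lib.
    // When a call fails with ERROR_INSUFFICIENT_BUFFER, WinINet writes the
    // required size into dwSize and we retry with a larger buffer.
    DWORD dwSize = 0;
    std::vector<char> buffer(1, '\0');
    while (!InternetGetCookieA(url.c_str(), NULL, buffer.data(), &dwSize))
    {
        if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        {
            // e.g. ERROR_NO_MORE_ITEMS: there is no cookie for this URL
            ATLTRACE("cookie is null\n");
            return std::string();
        }
        buffer.assign(dwSize + 1, '\0');
    }
    // dwSize may include the terminating NUL, so build the string up to the first NUL
    return std::string(buffer.data());
}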
The argument passed in is https://ipin.siren24.com/stickyCaptcha.
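If the cookie is HttpOnly, it has to be read differently, because InternetGetCookie does not return HttpOnly cookies. Instead, call InternetGetCookieEx with the INTERNET_COOKIE_HTTPONLY flag. The C++ code is: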
std::wstring CWebLoginDlg::GetCookieEx( std::wstring url )
{
    // INTERNET_COOKIE_HTTPONLY (0x00002000) lets WinINet return HttpOnly cookies.
    // Only the cookie named JSESSIONID is requested here.
    // On ERROR_INSUFFICIENT_BUFFER, dwSize receives the required size in bytes,
    // which is at least the number of wchar_t needed, so dwSize + 1 wchar_t is enough.
    DWORD dwSize = 0;
    std::vector<wchar_t> buffer(1, L'\0');
    while (!InternetGetCookieExW(url.c_str(), L"JSESSIONID", buffer.data(), &dwSize,
                                 INTERNET_COOKIE_HTTPONLY, NULL))
    {
        if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        {
            ATLTRACE("cookie is null\n");
            return std::wstring();
        }
        buffer.assign(dwSize + 1, L'\0');
    }
    // Stop at the first NUL rather than trusting dwSize as a character count
    return std::wstring(buffer.data());
}
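To refresh the captcha and download the image data I use the libcurl library. In fact, any request a web page makes over HTTP can be reproduced with curl, but here it is used only to fetch the captcha image. Once the cookie has been obtained in the previous step, append the domain and path attributes to it as shown below, then pass the result as the cookie argument to the function that fetches the response. In C++: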
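// Step 1: read the cookie WinINet stored for the captcha URL
std::string cookie = GetCookie("https://ipin.siren24.com/stickyCaptcha");
// Add the attributes in Set-Cookie style so libcurl can import it; building it as a
// std::string avoids overflowing a fixed 1024-byte buffer when the cookie is long
std::string cookieLine = cookie + "; domain=ipin.siren24.com; path=/; hostOnly";
// Step 2: request the captcha with that cookie; ret receives the raw image bytes
std::string ret;
m_pCurlClient->GetURLResource("https://ipin.siren24.com/stickyCaptcha", cookieLine, ret);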
GetURLResource is implemented as follows:
// Buffer that the write callback grows as response data arrives
struct MemoryStruct
{
    char  *memory;
    size_t size;
};

// libcurl write callback. It must be declared static in CurlClient,
// because libcurl calls it through a plain function pointer.
size_t CurlClient::WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    size_t realsize = size * nmemb;
    if (userp == NULL)
    {
        return realsize;   // nowhere to store the data: discard it
    }
    struct MemoryStruct *mem = (struct MemoryStruct *)userp;
    // Grow via a temporary so the old block is not leaked if realloc fails
    char *grown = (char *)realloc(mem->memory, mem->size + realsize + 1);
    if (grown == NULL)
    {
        printf("not enough memory (realloc returned NULL)\n");
        return 0;          // returning less than realsize makes libcurl abort the transfer
    }
    mem->memory = grown;
    memcpy(&(mem->memory[mem->size]), contents, realsize);
    mem->size += realsize;
    mem->memory[mem->size] = 0;   // keep the buffer NUL-terminated
    return realsize;
}

bool CurlClient::GetURLResource( std::string url, std::string cookie, std::string &rev)
{
    bool ssl = (url.compare(0, 8, "https://") == 0);
    CURLcode res = CURLE_FAILED_INIT;   // returned as-is if curl_easy_init fails
    struct MemoryStruct chunk;
    chunk.memory = (char *)malloc(1);
    chunk.size = 0;
    if (chunk.memory == NULL)
    {
        return false;
    }
    CURL *curl = curl_easy_init();
    if (curl)
    {
        if (!cookie.empty())
        {
            // A "Set-Cookie: ..." line adds one cookie to libcurl's in-memory
            // cookie store and also enables the cookie engine
            std::string line = "Set-Cookie: " + cookie;
            curl_easy_setopt(curl, CURLOPT_COOKIELIST, line.c_str());
        }
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        // Pick a User-Agent; in multi-threaded mode spread them across threads
        int agentIndex = m_Multi ? GetCurrentThreadId() % m_UserAgentList.size() : 0;
        curl_easy_setopt(curl, CURLOPT_USERAGENT, m_UserAgentList[agentIndex].c_str());
        if (ssl)
        {
            // Certificate verification is switched off; acceptable for testing only
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        }
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &CurlClient::WriteMemoryCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&chunk);
        // libcurl expects long arguments for numeric options
        curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
        curl_easy_setopt(curl, CURLOPT_FORBID_REUSE, 1L);    // multi-threaded use: close the connection as soon as the task is done
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L);        // whole transfer, in seconds
        curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 15L); // connection phase, in seconds
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        res = curl_easy_perform(curl);
        if (res != CURLE_OK)
        {
            char curlerror[1024 * 5] = {0};
            sprintf_s(curlerror, _countof(curlerror), "curl error: %s", curl_easy_strerror(res));
            m_Error = curlerror;
        }
        rev = std::string(chunk.memory, chunk.size);   // binary-safe: image data may contain NUL bytes
        curl_easy_cleanup(curl);
    }
    free(chunk.memory);   // freed on every path, even if curl_easy_init failed
    return res == CURLE_OK;
}
With that, the captcha image data has been retrieved.