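I've recently been building a data analysis platform, and what a product like this needs most during development is, naturally, test data. Building industry test data by hand is tedious, so when I came across the demo data in a peer product I couldn't pass it up. The collection process used BeetleX.Http.Clients to fetch data from a third-party HTTPS API, so I'm taking the opportunity to note down how BeetleX.Http.Clients is used.
First create a console application, then reference the BeetleX.Http.Clients component via NuGet; once it is referenced you can get to work. The component provides an HttpClient object that makes it easy to call a web API, and even HTTPS calls work without any extra configuration.
Next, let's fetch the demo product data from https://console.bce.xxxx.com/sugar/.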
BeetleX.Http.Clients.HttpJsonClient client = new BeetleX.Http.Clients.HttpJsonClient("https://sugar.bce.baidu.com/");
client.SetHeader("csrf-token", "xDgdnfJ8-NhQZ0xWD8ZsjBrs1pTAGyp-CJ6U");
client.SetHeader("cookie", "BIDUPSID=127E2C6CE0EBA549524FAA4EE738C5F7; PSTM=1559696211; BAIDUID=D1235FD0F3793CD52877199489385314:FG=1; MCITY=-257%3A; H_WISE_SIDS=107320_110085_127969_131423_132549_144966_154213_155931_164108_164869_165135_166148_167086_167296_168030_168490_168542_169061_169307_169708_169882_170149_170155_170221_170244_170355_170474_170579_170583_170590_170607_170762_170810_170817_170873_170957_171216_171223_171234_171523_171584_171622_171816_171837_171850_171989_172128_172247_172496_172679; CAMPAIGN_TRACK=cp%3Aotheronline-media%7Cpf%3Apc%7Cpp%3Abaiduyunduanxin-huodong-21kainianshengdian-laoyonghu%7Cpu%3Aduanxin%7Cci%3A2021knsd%7Ckw%3A10074020; CAMPAIGN_TRACK_TIME=2021-03-29+14%3A18%3A33; sugarbisid=s%3Ao_q8jIFFbRjcEf8-x-CCXG2yL_pZdEnt.948a4PF6nEojy%2FSYM0Y05l2f8%2Br%2F6dpTc9NCMuOQj78; sugar-company=scp_1013e-2xjcwe8b-oqpvmj; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BAIDUID_BFESS=AFCE5C81652686EBDB40FA33174550AB:FG=1; __yjs_duid=1_e8da08a67d6144100f2ea8eb59d6178f1617255570813; BDRCVFR[feWj1Vr5u3D]=I67x6TjHwwYf0; delPer=0; PSINO=6; ab_sr=1.0.0_YjQyYWZhZWE2Yjc4ZThlNWY1MWYwMDNlMTc2MmEyNDc2ZTI4ZjFmMGYxMDg0NTVjZTFiYWI4ZDg2MzVlM2RlOGQ5YTM1NTE3ZDJjNzk1NDUxYTExYjYzODI1YWEyYTAwOTNkMmFhYjg4NDQwNmU5NmQwYjRiMzk0Zjc0MDBiMzc=; H_PS_PSSID=33797_33639_33740_33272_33689_33760_33675_33392_33624_33163_26350_22159; BA_HECTOR=a4a5000g8ka1a081ov1g6damo0q; __bce-console-referrer__=; BCE_MONITOR_TRACK_SESSION_ID=161734115059605fe; Hm_lvt_28a17f66627d87f1d046eae152a1c93d=1614946024,1615288720,1616998877,1617341151; BDUSS=TI2b01Qa1ZQWTV3dzhnV0JDWWVtbTdHNGV-bzdCRE95LXdibmRhN0VUcnBONDVnSUFBQUFBJCQAAAAAAAAAAAEAAADrubswZmFuaGVucnlmYW4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOmqZmDpqmZgW; BDUSS_BFESS=TI2b01Qa1ZQWTV3dzhnV0JDWWVtbTdHNGV-bzdCRE95LXdibmRhN0VUcnBONDVnSUFBQUFBJCQAAAAAAAAAAAEAAADrubswZmFuaGVucnlmYW4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOmqZmDpqmZgW; x-bce-login-redirect-url=https%3A%2F%2Fconsole.bce.baidu.com%2Fcdn%2F%3F_%3D1611315494638#/cdn/list; bce-login-type=PASSPORT; bce-passport-stoken=d32f2a3af24c12142043dbb85ccd70f44dac7b8696f4af61b520a7d98b0654bc; bce-auth-type=PASSPORT; bce-sessionid=001f9c8d3c3652c440b8ec3194c735a27ea; bce-ctl-client-cookies=\"BDUSS, bce - passport - stoken, bce - device - cuid, bce - device - token, BAIDUID\"; bce-ctl-client-parameters=brt; bce-ctl-client-headers=\"\"; bce-user-info=\"2021 - 04 - 02T13: 26:03Z | ad8d8a4a97be5a3da27db213fbdd14f7\"; bce-login-display-name=fanhenryfan; bce-userbind-source=PASSPORT; bce-session=0cb2dc02d3454eb6aaa545892e3e29d034da1482052a4e2cab27863d4f43d8a4|2d53879befa5dddebb239f0b4fe7e8f9; bce-ctl-sessionmfa-cookie=bce-session; bce-login-expire-time=\"2021 - 04 - 02T05: 56:03Z | 802ff95a53ab810c6938f59a506512d3\"; bce-locale=zh-cn; BCE_MONITOR_TRACK_SESSION_ID=161734115059605fe; BAIDU_CLOUD_TRACK_PATH=https%3A%2F%2Fcloud.baidu.com%2Fproduct%2Fsugar.html; Hm_lpvt_28a17f66627d87f1d046eae152a1c93d=1617341420; _csrf=EIyOZMiPySNSeutEtga6VE9j; Hm_lvt_0369a83cfe6c3d97357eea08cc40e92f=1616998914,1617341442; Hm_lpvt_0369a83cfe6c3d97357eea08cc40e92f=1617341442");
client.SetHeader("referer", "https://sugar.bce.xxxx.com/group/first/manage/dbPreview?database=d_1013e-akrxglq5-kej8q1&__scp__=scp_1013e-2xjcwe8b-oqpvmj");
List<Row> datas = new List<Row>();
for (int i = 1; i <= 980; i++)
{
    // Request page i, 10 rows per page.
    client.SetBody(new { page = i, perPage = 10 });
    var result = await client.Post("/api/group/g_1013e-1x6fmdc9-1pyz2x/database/d_1013e-akrxglq5-kej8q1/getTableData?table=medical_list&_replace=1");
    var data = ((JToken)result.Body)["data"];
    if (data != null)
    {
        var row = data["rows"];
        if (row != null)
        {
            // Convert the JSON rows to Row objects and collect them.
            var items = row.ToObject<List<Row>>();
            datas.AddRange(items);
            // Pause 200 ms between requests.
            System.Threading.Thread.Sleep(200);
        }
    }
}
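The loop above relies on a Row type that is not shown in the post. The following is only a sketch of what it might look like and how one page of the response could be converted, assuming the { "data": { "rows": [...] } } shape that the loop reads and using Newtonsoft.Json (the library that provides JToken). The Row fields, the PageParser helper and the sample payload are all made up for illustration; the real medical_list columns are not given here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical sample payload in the shape the loop reads: data -> rows -> objects.
string json = @"{ ""data"": { ""rows"": [
    { ""id"": 1, ""name"": ""first"" },
    { ""id"": 2, ""name"": ""second"" } ] } }";

List<Row> rows = PageParser.ExtractRows(JToken.Parse(json));
Console.WriteLine(rows.Count);

// Hypothetical row type; replace the properties with the real table columns.
public class Row
{
    [JsonProperty("id")]
    public int Id { get; set; }

    [JsonProperty("name")]
    public string Name { get; set; } = "";
}

public static class PageParser
{
    // Returns the rows of one page, or an empty list when "data" or "rows"
    // is missing, null, or not of the expected JSON type.
    public static List<Row> ExtractRows(JToken? body)
    {
        var data = (body as JObject)?["data"] as JObject;
        var rows = data?["rows"] as JArray;
        return rows?.ToObject<List<Row>>() ?? new List<Row>();
    }
}
```

Returning an empty list instead of null keeps the calling loop free of nested null checks, since AddRange can then be called unconditionally.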
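Calling the API requires some access credentials, which can be obtained by visiting the site in a browser.
Visiting the site once in a browser is enough to get the full cookie and token values; add them to the HTTP headers. Once that is configured the rest is simple: after calling Post, parse the Body data as needed. Since the platform does not rate-limit calls, the 900-plus pages of data were fetched successfully within a few minutes.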
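Because the cookie and token copied from the browser are session credentials, one option is to keep them out of the source file and read them at startup. The sketch below is one possible way to do that, not part of the original program: the CredentialHeaders helper and the SUGAR_CSRF_TOKEN and SUGAR_COOKIE variable names are hypothetical, and the demo uses an in-memory lookup so it runs without any environment setup.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Demo lookup standing in for the real environment (values are placeholders).
var fake = new Dictionary<string, string>
{
    ["SUGAR_CSRF_TOKEN"] = "token-copied-from-browser",
    ["SUGAR_COOKIE"] = "name=value; other=value",
};
var headers = CredentialHeaders.Load(name => fake.TryGetValue(name, out var v) ? v : null);
foreach (var h in headers)
    Console.WriteLine($"{h.Key}: {h.Value.Length} chars");

public static class CredentialHeaders
{
    // Header name -> hypothetical environment variable that holds its value.
    private static readonly (string Header, string Variable)[] Mapping =
    {
        ("csrf-token", "SUGAR_CSRF_TOKEN"),
        ("cookie", "SUGAR_COOKIE"),
    };

    // Builds the header set from the given lookup and fails fast when a value is missing.
    public static Dictionary<string, string> Load(Func<string, string?> lookup)
    {
        var result = new Dictionary<string, string>();
        foreach (var (header, variable) in Mapping)
        {
            var value = lookup(variable);
            if (string.IsNullOrWhiteSpace(value))
                throw new InvalidOperationException($"Missing credential: {variable}");
            result[header] = value.Trim();
        }
        return result;
    }
}
```

In a real run the lookup would be Environment.GetEnvironmentVariable, and each entry could then be applied to the client shown earlier with client.SetHeader(h.Key, h.Value).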
BeetleX
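Open-source cross-platform communication framework (supports TLS)
Provides high-performance services and big data processing solutions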
https://beetlex.io