
Web Scraping: Notes on a Curious Asynchronous-Request Crawl

 

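This was a company requirement: scrape company information from a certain company-lookup site. To avoid a lawyer's letter, I won't mention that friendly site's name anywhere in this post ("the site" below). The post is mainly about one approach to getting past anti-scraping measures. If you are a beginner and run into problems while scraping, don't panic: there are no supernatural events in development, and every strange result has a cause. Settle down and work through it step by step; after all, anti-scraping measures are written by people too. The post may be on the long side, but don't let that scare you, a lot of it is pictures.

Fetching data from an asynchronous response is routine, so why single this one out? Let me explain.

The site's anti-scraping is quite something. This time the process went like this: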

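1. Found the request URL for the target data in the developer tools and built the request with requests, using a session object that carried the cookies;

2. Found that the response did not match what the browser received;

3. On inspection, the cookie on this request in the browser differed from the cookie on the other requests in the same session, so JS had to be behind it; went looking for the JS;

4. Sure enough, the JS changes a few cookie values while it runs, but not arbitrarily: before the target request it first sends a request that fetches the latest cookie;

5. Looked at the response to that "latest cookie" request: it is a list made up entirely of numbers;

6. Kept reading the JS: the backend sends a series of numbers, the front end turns them into a string with fromCharCode(), the string is JS code, and the latest cookie value is extracted from it.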
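Step 6 is the heart of the trick, so here is a minimal sketch of it in Python. The sample list is made up for illustration (it is not a real response from the site), and `decode_char_codes` is a name chosen here, not something from the site's code.

```python
def decode_char_codes(codes):
    """Mimic JavaScript's String.fromCharCode for a list of small integers."""
    return "".join(chr(int(c)) for c in codes)


# Made-up payload: the character codes of a tiny JS statement.
sample = [100, 111, 99, 117, 109, 101, 110, 116, 46, 99, 111, 111,
          107, 105, 101, 61, 39, 107, 61, 118, 39]
print(decode_char_codes(sample))  # document.cookie='k=v'
```

The decoded text is itself JavaScript, which is why the browser can simply `eval` it.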

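Here it is step by step, with screenshots.

1. The screenshot below is the site's front-end page. I need the detail content under "企业图谱" (enterprise map).

 2. Click "查看详情" (view details). (The page below actually has a background image with the site's logo; I removed it in the front end XD)

 3. In the developer tools, find the request that returns the data above.

 4. Try to simulate this request. Unsurprisingly, it comes back as 403: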

import requests


# For convenience, copy the cookie straight from the developer tools to simulate a logged-in user
cookie = '_ga=GA1.2.1955355838.1558056696; _gid=GA1.2.54470962.1558056696; bannerFlag=undefined; _rutm=d46be5d8a6024caf8e35921087eade99; rtoken=ae0769f415f4409c8165f231f1231032; Hm_lpvt_e92c8d65d92d534b0fc290df538b4758=1558059476; Hm_lvt_e92c8d65d92d534b0fc290df538b4758=1558056696; auth_token=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODc0NTY4ODY1NSIsImlhdCI6MTU1ODA1OTQ2OSwiZXhwIjoxNTg5NTk1NDY5fQ.iJxAQ0jZi1biugR51cUkFcwsDQoz-Iv0bROVMpPD-TVbmXwCiBfvLpnobjfQ0EokimPoztzdr1mz2zm7CZ7n-w; tyc-user-info=%257B%2522claimEditPoint%2522%253A%25220%2522%252C%2522myAnswerCount%2522%253A%25220%2522%252C%2522myQuestionCount%2522%253A%25220%2522%252C%2522signUp%2522%253A%25220%2522%252C%2522explainPoint%2522%253A%25220%2522%252C%2522privateMessagePointWeb%2522%253A%25220%2522%252C%2522nickname%2522%253A%2522%25E9%2583%25AD%25E9%259D%2596%2522%252C%2522integrity%2522%253A%25220%2525%2522%252C%2522privateMessagePoint%2522%253A%25220%2522%252C%2522state%2522%253A%25220%2522%252C%2522announcementPoint%2522%253A%25220%2522%252C%2522isClaim%2522%253A%25220%2522%252C%2522vipManager%2522%253A%25220%2522%252C%2522discussCommendCount%2522%253A%25221%2522%252C%2522monitorUnreadCount%2522%253A%2522191%2522%252C%2522onum%2522%253A%25220%2522%252C%2522claimPoint%2522%253A%25220%2522%252C%2522token%2522%253A%2522eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODc0NTY4ODY1NSIsImlhdCI6MTU1ODA1OTQ2OSwiZXhwIjoxNTg5NTk1NDY5fQ.iJxAQ0jZi1biugR51cUkFcwsDQoz-Iv0bROVMpPD-TVbmXwCiBfvLpnobjfQ0EokimPoztzdr1mz2zm7CZ7n-w%2522%252C%2522pleaseAnswerCount%2522%253A%25221%2522%252C%2522redPoint%2522%253A%25220%2522%252C%2522bizCardUnread%2522%253A%25220%2522%252C%2522vnum%2522%253A%25220%2522%252C%2522mobile%2522%253A%252218745688655%2522%257D; _gat_gtag_UA_123487620_1=1; RTYCID=cb7b8a8a10fb4057907f2b50dd0cd778; TYCID=ee79e4001de611e9bf3d3ff6d3a1d2e4; ssuid=2068339864; undefined=ee79e4001de611e9bf3d3ff6d3a1d2e4'
# Convert to a dict
cookies = dict([l.split("=", 1) for l in cookie.split("; ")])
# Set the request headers properly; that is just how this site is, it even checks the User-Agent spoofing every beginner knows
headers = {
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1 Safari/605.1.15'
}
resp = requests.get('https://dis.tianyancha.com/dis/enterpriseMap.json?id=24416401',cookies=cookies,headers=headers).text
print(resp)

'''
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<h1>403 Forbidden</h1>
<p>You don't have permission to access the URL on this server. Sorry for the inconvenience.<br/>
Please report this message and include the following information to us.<br/>
Thank you very much!</p>
<table>
<tr>
<td>URL:</td>
<td>https://dis.tianyancha.com/dis/enterpriseMap.json?id=24416401</td>
</tr>
<tr>
<td>Server:</td>
<td>iz2zef8sue94bxg3w0librz</td>
</tr>
<tr>
<td>Date:</td>
<td>2019/05/17 10:29:25</td>
</tr>
</table>
<hr/>Powered by Tengine</body>
</html>
'''

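5. From experience, this means the server has judged the request to be illegitimate. Since the flow itself is fine, the clue has to be in the cookies. I compared the cookie on this request with the cookie on the other requests in the same session, and they do differ. (In Chrome an extension caused the "Provisional headers are shown" message, so the cookie was not visible; I did the comparison in Safari.)

 6. So the cookie gets changed at the very end of a session. Off to the JS, then. I located the target JS file and found the relevant code: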
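The comparison in step 5 can also be done in code rather than by eye. Below is a small sketch, assuming you have copied the raw Cookie header of two requests out of the browser; the two header strings are shortened, made-up stand-ins, and both helper names are chosen here for illustration.

```python
def parse_cookie_header(header):
    """Split a raw 'k1=v1; k2=v2' Cookie header into a dict."""
    pairs = (item.split("=", 1) for item in header.split("; ") if "=" in item)
    return {name: value for name, value in pairs}


def diff_cookies(a, b):
    """Return {name: (value_in_a, value_in_b)} for every cookie that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Made-up headers standing in for an earlier request and the target request.
earlier = parse_cookie_header("ssuid=2068339864; rtoken=aaa; _rutm=111")
target = parse_cookie_header("ssuid=2068339864; rtoken=bbb; _rutm=222")
print(diff_cookies(earlier, target))
# {'_rutm': ('111', '222'), 'rtoken': ('aaa', 'bbb')}
```

Any cookie that shows up in the diff is a candidate for something the page's JS rewrites before the target request.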

exports.pre_relation_company = exports.getEnterpriseMap = void 0;
    var _regenerator = __webpack_require__(108),
    _regenerator2 = _interopRequireDefault(_regenerator),
    _asyncToGenerator2 = __webpack_require__(111),
    _asyncToGenerator3 = _interopRequireDefault(_asyncToGenerator2),
    _promise = __webpack_require__(51),
    _promise2 = _interopRequireDefault(_promise),
    getEnterpriseMap = exports.getEnterpriseMap = function() {
        var _ref = (0, _asyncToGenerator3.
    default)(_regenerator2.
    default.mark(function _callee(nodeId) {
            var res, data, arr, fnStr, i, fxck, fxckStr;
            return _regenerator2.
        default.wrap(function _callee$(_context) {
                for (;;) switch (_context.prev = _context.next) {
                case 0:
                    return _context.next = 2,
                    pre_relation_company(nodeId);
                case 2:
                    for (res = _context.sent, data = res.data, arr = data.v.split(","), fnStr = "", i = 0; i < arr.length; i++) fnStr += String.fromCharCode(arr[i]);
                    if (eval(fnStr), window.$SoGou$ = (0, _ms2.default)(nodeId), window.wtf) {
                        for (fxck = window.wtf().split(","), fxckStr = "", i = 0; i < fxck.length; i++) fxckStr += window.$SoGou$[fxck[i]];
                        document.cookie = "_rutm=" + fxckStr + ";path=/;",
                        delete window.wtf
                    }
                    return _context.next = 12,
                    get("/dis/enterpriseMap.json", {
                        id: nodeId
                    });
                case 12:
                    return _context.abrupt("return", _context.sent);
                case 13:
                case "end":
                    return _context.stop()
                }
            },
            _callee, this)
        }));
        return function(t) {
            return _ref.apply(this, arguments)
        }
    } (),
    pre_relation_company = exports.pre_relation_company = function() {
        var t = (0, _asyncToGenerator3.
    default)(_regenerator2.
    default.mark(function t(e) {
            return _regenerator2.
        default.wrap(function(t) {
                for (;;) switch (t.prev = t.next) {
                case 0:
                    return t.next = 2,
                    get("/qq/" + e + ".json?random=" + Date.now());
                case 2:
                    return t.abrupt("return", t.sent);
                case 3:
                case "end":
                    return t.stop()
                }
            },
            t, this)
        }));
        return function(e) {
            return t.apply(this, arguments)
        }
    } (),

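7. Let's analyze the JS:

① getEnterpriseMap runs and calls pre_relation_company() with the company ID as its argument. The ID is a unique value assigned by the site and can be found in the URL of the company's page;

② pre_relation_company makes a single GET request, 'get("/qq/" + e + ".json?random=" + Date.now());' in the code. This is the request whose response is the all-numeric list mentioned in the overview above;

③ Further down, fnStr is built by calling String.fromCharCode() on every element of data['v'] from that response and concatenating the results. To understand exactly what the JS does, I looked up String.fromCharCode(): it converts a Unicode value (strictly speaking, a UTF-16 code unit) into a character. The Python counterpart is chr(n). The code below writes chr(n % 256), which gives the same result here because every value in the payload is an ASCII code below 128. I tried decoding it: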

l = [33,102,117,110,99,116,105,111,110,40,110,41,123,100,111,99,117,109,101,110,116,46,99,111,111,107,105,101,61,39,114,116,111,107,101,110,61,54,100,99,52,100,49,56,55,49,48,98,48,52,56,99,56,57,97,102,53,54,98,102,97,52,97,56,53,54,55,51,55,59,112,97,116,104,61,47,59,39,59,110,46,119,116,102,61,102,117,110,99,116,105,111,110,40,41,123,114,101,116,117,114,110,39,50,56,44,49,57,44,50,57,44,55,44,52,44,51,48,44,50,56,44,49,52,44,49,56,44,50,56,44,51,50,44,51,44,50,57,44,50,56,44,51,49,44,50,55,44,49,56,44,49,56,44,51,49,44,49,51,44,55,44,52,44,55,44,51,48,44,49,52,44,51,50,44,49,56,44,48,44,50,55,44,50,55,44,51,50,44,52,39,125,125,40,119,105,110,100,111,119,41,59]
s = ''
for i in l:
    s += chr(i %256)
print(s)
'''
!function(n){document.cookie='rtoken=6dc4d18710b048c89af56bfa4a856737;path=/;';n.wtf=function(){return'28,19,29,7,4,30,28,14,18,28,32,3,29,28,31,27,18,18,31,13,7,4,7,30,14,32,18,0,27,27,32,4'}}(window);
'''

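④ It decodes to JS code. Next comes eval(fnStr). Anyone familiar with Python will recognize eval(): it executes a string as code. The effect is plain to see: it sets an rtoken value in the cookie and defines a function named wtf on window;

⑤ There is no need to read any further here, because in testing I found that once the new rtoken is added, the data can already be fetched normally.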
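For reference, the part of the JS skipped in ⑤ can be sketched without running any JavaScript. The snippet below pulls `rtoken` and the `wtf` index list out of the decoded string from ③ with regular expressions, then shows the shape of the `_rutm` lookup (`fxckStr += window.$SoGou$[fxck[i]]`). The contents of `window.$SoGou$` come from `_ms2.default(nodeId)`, which is not shown in the post, so `PLACEHOLDER_TABLE` is purely hypothetical and the resulting `_rutm` is not a real value. All function names are chosen here for illustration.

```python
import re
import string

# The decoded string printed in step 3.
DECODED_JS = (
    "!function(n){document.cookie='rtoken=6dc4d18710b048c89af56bfa4a856737;path=/;';"
    "n.wtf=function(){return'28,19,29,7,4,30,28,14,18,28,32,3,29,28,31,27,18,18,31,"
    "13,7,4,7,30,14,32,18,0,27,27,32,4'}}(window);"
)


def extract_rtoken(js):
    """Return the rtoken value set by the decoded JS, or None if absent."""
    match = re.search(r"rtoken=([0-9a-f]+);", js)
    return match.group(1) if match else None


def extract_wtf_indices(js):
    """Return the list of integer indices that window.wtf() would return."""
    match = re.search(r"return'([\d,]+)'", js)
    return [int(n) for n in match.group(1).split(",")] if match else []


def build_rutm(indices, table):
    """Mirror the JS loop: concatenate table[i] for each index i."""
    return "".join(table[i] for i in indices)


# HYPOTHETICAL lookup table; the real one is produced by site code not shown here.
PLACEHOLDER_TABLE = string.ascii_lowercase + string.digits

indices = extract_wtf_indices(DECODED_JS)
print(extract_rtoken(DECODED_JS))  # 6dc4d18710b048c89af56bfa4a856737
print(len(indices))                # 32
print(len(build_rutm(indices, PLACEHOLDER_TABLE)))  # 32
```

The 32 indices line up with the 32-character `_rutm` cookie seen in the browser, which suggests each index picks one character of the new value.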

import requests
import time
import json
import re

# For convenience, copy the cookie straight from the developer tools to simulate a logged-in user
cookie = 'bannerFlag=undefined; Hm_lpvt_e92c8d65d92d534b0fc290df538b4758=1558059480; Hm_lvt_e92c8d65d92d534b0fc290df538b4758=1558056696; _ga=GA1.2.1955355838.1558056696; _gid=GA1.2.54470962.1558056696; _rutm=d46be5d8a6024caf8e35921087eade99; rtoken=ae0769f415f4409c8165f231f1231032; auth_token=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODc0NTY4ODY1NSIsImlhdCI6MTU1ODA1OTQ2OSwiZXhwIjoxNTg5NTk1NDY5fQ.iJxAQ0jZi1biugR51cUkFcwsDQoz-Iv0bROVMpPD-TVbmXwCiBfvLpnobjfQ0EokimPoztzdr1mz2zm7CZ7n-w; tyc-user-info=%257B%2522claimEditPoint%2522%253A%25220%2522%252C%2522myAnswerCount%2522%253A%25220%2522%252C%2522myQuestionCount%2522%253A%25220%2522%252C%2522signUp%2522%253A%25220%2522%252C%2522explainPoint%2522%253A%25220%2522%252C%2522privateMessagePointWeb%2522%253A%25220%2522%252C%2522nickname%2522%253A%2522%25E9%2583%25AD%25E9%259D%2596%2522%252C%2522integrity%2522%253A%25220%2525%2522%252C%2522privateMessagePoint%2522%253A%25220%2522%252C%2522state%2522%253A%25220%2522%252C%2522announcementPoint%2522%253A%25220%2522%252C%2522isClaim%2522%253A%25220%2522%252C%2522vipManager%2522%253A%25220%2522%252C%2522discussCommendCount%2522%253A%25221%2522%252C%2522monitorUnreadCount%2522%253A%2522191%2522%252C%2522onum%2522%253A%25220%2522%252C%2522claimPoint%2522%253A%25220%2522%252C%2522token%2522%253A%2522eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODc0NTY4ODY1NSIsImlhdCI6MTU1ODA1OTQ2OSwiZXhwIjoxNTg5NTk1NDY5fQ.iJxAQ0jZi1biugR51cUkFcwsDQoz-Iv0bROVMpPD-TVbmXwCiBfvLpnobjfQ0EokimPoztzdr1mz2zm7CZ7n-w%2522%252C%2522pleaseAnswerCount%2522%253A%25221%2522%252C%2522redPoint%2522%253A%25220%2522%252C%2522bizCardUnread%2522%253A%25220%2522%252C%2522vnum%2522%253A%25220%2522%252C%2522mobile%2522%253A%252218745688655%2522%257D; RTYCID=cb7b8a8a10fb4057907f2b50dd0cd778; TYCID=ee79e4001de611e9bf3d3ff6d3a1d2e4; ssuid=2068339864; undefined=ee79e4001de611e9bf3d3ff6d3a1d2e4'

# Convert to a dict
cookies = dict([l.split("=", 1) for l in cookie.split("; ")])
# Set the request headers properly; that is just how this site is, it even checks the User-Agent spoofing every beginner knows
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1 Safari/605.1.15'
}
# Step 1: request the rtoken value first
url = 'https://dis.tianyancha.com/qq/24416401.json?random=' + str(int(time.time() * 1000))
resp = requests.get(url, cookies=cookies, headers=headers).text
print(resp)
'''
{"state":"ok","message":"","special":"","vipMessage":"","isLogin":0,"data":{"name":"24416401","uv":0,"pv":0,"v":"33,102,117,110,99,116,105,111,110,40,110,41,123,100,111,99,117,109,101,110,116,46,99,111,111,107,105,101,61,39,114,116,111,107,101,110,61,49,53,56,49,51,48,55,100,49,51,48,102,52,51,50,99,56,50,50,49,49,97,48,98,54,98,52,56,97,99,98,48,59,112,97,116,104,61,47,59,39,59,110,46,119,116,102,61,102,117,110,99,116,105,111,110,40,41,123,114,101,116,117,114,110,39,49,44,50,57,44,51,50,44,49,44,50,56,44,55,44,49,57,44,55,44,50,55,44,49,44,51,44,50,57,44,50,57,44,51,48,44,49,56,44,49,52,44,51,50,44,49,56,44,50,56,44,51,48,44,49,52,44,51,50,44,51,50,44,50,56,44,51,48,44,50,57,44,49,44,51,44,48,44,51,48,44,52,44,48,39,125,125,40,119,105,110,100,111,119,41,59"}}
'''

v = json.loads(resp)['data']['v']


# Decode the character codes
def fun(n):
    return chr(n % 256)


s = ''
for i in v.split(','):
    s += fun(int(i))
# rtoken is now available
rtoken = re.findall('rtoken=(.*?);', s)[0]
print(rtoken) # e.g. 57e01e89920f4768895098b99c58a59e (the value differs between requests)
# Add it to the cookies
cookies['rtoken'] = rtoken
# Step 2: fetch the data we actually want
resp = requests.get('https://dis.tianyancha.com/dis/enterpriseMap.json?id=24416401', cookies=cookies, headers=headers).text
print(resp)
'''
{"state":"ok","message":"","special":"","vipMessage":"","isLogin":0,"data":{"id":24416401,"staff":[{"id":2353360077,"typeJoin":"监事","type":2,"cid":24416401,"name":"李杰"},{"id":2019209279,"typeJoin":"董事长","type":2,"cid":24416401,"name":"梁华"},{"id":1979639199,"typeJoin":"监事","type":2,"cid":24416401,"name":"李健"},{"id":2041924268,"typeJoin":"董事","type":2,"cid":24416401,"name":"汪涛"},{"id":2082668154,"typeJoin":"监事","type":2,"cid":24416401,"name":"田峰"},{"id":2224718404,"typeJoin":"副董事长","type":2,"cid":24416401,"name":"郭平"},{"id":2260140428,"typeJoin":"董事","type":2,"cid":24416401,"name":"陈黎芳"},{"id":1756836232,"typeJoin":"董事","type":2,"cid":24416401,"name":"丁耘"},{"id":1942927204,"typeJoin":"副董事长","type":2,"cid":24416401,"name":"徐直军"},{"id":1884280345,"typeJoin":"副董事长","type":2,"cid":24416401,"name":"孟晚舟"},{"id":1941417178,"typeJoin":"董事","type":2,"cid":24416401,"name":"徐文伟"},{"id":2124715048,"typeJoin":"副董事长","type":2,"cid":24416401,"name":"胡厚崑"},{"id":1960510540,"typeJoin":"监事","type":2,"cid":24416401,"name":"易翔"},{"id":1775069157,"typeJoin":"监事","type":2,"cid":24416401,"name":"任树录"},{"id":1775134421,"typeJoin":"董事,经理","type":2,"cid":24416401,"name":"任正非"},{"id":1978908566,"typeJoin":"监事","type":2,"cid":24416401,"name":"李今歌"},{"id":1981952211,"typeJoin":"监事","type":2,"cid":24416401,"name":"李大丰"},{"id":1780306088,"typeJoin":"董事","type":2,"cid":24416401,"name":"何庭波"},{"id":1990566837,"typeJoin":"董事","type":2,"cid":24416401,"name":"李英涛"},{"id":1934905129,"typeJoin":"董事","type":2,"cid":24416401,"name":"彭中阳"},{"id":1871593745,"typeJoin":"董事","type":2,"cid":24416401,"name":"姚福海"},{"id":2261243733,"typeJoin":"董事","type":2,"cid":24416401,"name":"陶景文"},{"id":1785159340,"typeJoin":"董事","type":2,"cid":24416401,"name":"余承东"},{"id":1851919567,"typeJoin":"监事","type":2,"cid":24416401,"name":"周代琪"},{"id":1890637610,"typeJoin":"监事","type":2,"cid":24416401,"name":"宋柳平"},{"id":1899596362,"typeJoin":"监事","type":2,"cid":24416401,"name":"尹绪全"},{"id":2317588346,"typeJoin":"董事","type":2,"cid":244
16401,"name":"阎力大"}],"holder":[{"id":17066311,"percent":"100.00%","type":1,"name":"华为投资控股有限公司"}],"historyHolder":[{"type":1,"id":2349747865,"name":"深圳市华为新技术有限公司工会委员会"},{"id":2109001043,"type":2,"cid":24416401,"name":"纪平"},{"type":1,"id":26231095,"name":"深圳市华为投资控股有限公司"},{"type":1,"id":2453598792,"name":"深圳市华为新技术股份有限公司"}],"name":"华为技术有限公司","branch":[{"type":1,"id":2351300838,"name":"华为技术有限公司东莞分公司"},{"type":1,"id":2334237855,"name":"华为技术有限公司哈尔滨分公司"},{"type":1,"id":2352787391,"name":"华为技术有限公司海口分公司"},{"type":1,"id":916214586,"name":"华为技术有限公司重庆分公司"},{"type":1,"id":2351300851,"name":"华为技术有限公司广州分公司"},{"type":1,"id":2315417851,"name":"华为技术有限公司上海分公司"},{"type":1,"id":1224702459,"name":"华为技术有限公司武汉研究所"},{"type":1,"id":360915771,"name":"华为技术有限公司银川办事处"},{"type":1,"id":7899774,"name":"华为技术有限公司北京分公司"},{"type":1,"id":1031138292,"name":"华为技术有限公司大连办事处"},{"type":1,"id":3258295862,"name":"华为技术有限公司重庆研究所"},{"type":1,"id":2322867554,"name":"华为技术有限公司长沙研究所"},{"type":1,"id":422242600,"name":"华为技术有限公司合肥办事处"},{"type":1,"id":444914357,"name":"华为技术有限公司杭州研究所"},{"type":1,"id":2333642370,"name":"华为技术有限公司成都研究所"},{"type":1,"id":2313046539,"name":"华为技术有限公司西安研究所"},{"type":1,"id":2311078035,"name":"华为技术有限公司济南工作处"},{"type":1,"id":2808586702,"name":"华为技术有限公司驻广州办事处"},{"type":1,"id":7904640,"name":"华为技术有限公司北京研究所"},{"type":1,"id":2348950155,"name":"华为技术有限公司南京研究所"},{"type":1,"id":2984269825,"name":"深圳市华为技术有限公司上海研究所"},{"type":1,"id":2514043895,"name":"深圳市华为技术有限公司烟台办事处"},{"type":1,"id":226459349,"name":"华为技术有限公司长春办事处"},{"type":1,"id":2808595696,"name":"深圳市华为技术有限公司天津办事处"},{"type":1,"id":2586102880,"name":"深圳市华为技术有限公司青岛办事处"},{"type":1,"id":2585269843,"name":"深圳市华为技术有限公司长沙办事处"},{"type":1,"id":2349747866,"name":"深圳市华为技术有限公司沈阳办事处"},{"type":1,"id":3187121512,"name":"深圳市华为技术有限公司杭州研究所"},{"type":1,"id":3187121519,"name":"深圳市华为技术有限公司南京研究所"},{"type":1,"id":3283679516,"name":"深圳市华为技术有限公司济南工作处"}],"inverst":[{"id":1481505221,"percent":"100.00%","type":1,"name":"华为数字技术(苏州)有限公司"},{"id":10736630,"percent":"100.00%","type":1,"name
":"华为软件技术有限公司"},{"id":26044960,"percent":"100.00%","type":1,"name":"北京华为数字技术有限公司"},{"id":2325552661,"percent":"100.00%","type":1,"name":"深圳市海思半导体有限公司"},{"id":571706809,"percent":"100.00%","type":1,"name":"海思光电子有限公司"},{"id":2326017429,"percent":"100.00%","type":1,"name":"深圳力捷科技有限公司"},{"id":605816572,"percent":"100.00%","type":1,"name":"深圳市华为技术软件有限公司"},{"id":605819557,"percent":"100.00%","type":1,"name":"深圳市华为技术服务有限公司"},{"id":628453038,"percent":"100.00%","type":1,"name":"西安华为技术有限公司"},{"id":628458055,"percent":"100.00%","type":1,"name":"深圳市华为安捷信电气有限公司"},{"id":1114800056,"percent":"100.00%","type":1,"name":"上海华为技术有限公司"},{"id":1194491236,"percent":"100.00%","type":1,"name":"杭州华为数字技术有限公司"},{"id":1208288762,"percent":"100.00%","type":1,"name":"华为技术服务有限公司"},{"id":2983753263,"percent":"100.00%","type":1,"name":"东莞华为服务有限公司"},{"id":1255771345,"percent":"100.00%","type":1,"name":"杭州华为企业通信技术有限公司"},{"id":3202709139,"percent":"100.00%","type":1,"name":"上海海思技术有限公司"},{"id":1473222193,"percent":"100.00%","type":1,"name":"华为机器有限公司"},{"id":3286894224,"percent":"100.00%","type":1,"name":"华为(杭州)培训中心有限公司"},{"id":1481503614,"percent":"100.00%","type":1,"name":"成都华为技术有限公司"},{"id":100565680,"percent":"94.37%","type":1,"name":"北京北方华为通信技术有限公司"},{"id":2540785786,"percent":"93.74%","type":1,"name":"安徽华为通信技术有限责任公司"},{"id":2349747864,"percent":"89.99%","type":1,"name":"深圳市华为新技术有限公司"},{"id":2358818928,"percent":"83.23%","type":1,"name":"山东华为通信技术有限责任公司"},{"id":2453597089,"percent":"75.76%","type":1,"name":"深圳市华为集成电路设计有限公司"},{"id":2345867655,"percent":"71.00%","type":1,"name":"浙江华为通信技术有限公司"},{"id":2324084957,"percent":"51.98%","type":1,"name":"深圳市华为电气股份有限公司"},{"id":2412631259,"percent":"50.00%","type":1,"name":"上海华为信息技术有限公司"},{"id":818559653,"percent":"10.00%","type":1,"name":"贵州艾玛特信息超市项目开发有限公司"},{"id":3192961110,"percent":"7.05%","type":1,"name":"华为终端(深圳)有限公司"},{"id":2450711940,"percent":"6.00%","type":1,"name":"上海宇梦通信科技有限公司"},{"id":2321478908,"percent":"-","type":1,"name":"深圳市三维电器有限公司"},{"id":6164
28616,"percent":"-","type":1,"name":"中芯国际集成电路新技术研发(上海)有限公司"}],"historyLegal":[]}}
'''
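The overview mentioned requesting with a session object, while the script above passes a cookie dict to each call. Here is a sketch of the same two-step flow wrapped in one function that works on a session. It assumes `session` behaves like `requests.Session` (a `get` method returning an object with `.text`, and a `cookies.set(name, value)` method) and already carries the login cookies and a browser-like User-Agent; the function names are chosen here, and whether the site still responds this way is not something this sketch can promise.

```python
import json
import re
import time

BASE_URL = "https://dis.tianyancha.com"  # host taken from the script above


def decode_char_codes(v):
    """Turn the comma-separated character codes in data['v'] into a string."""
    return "".join(chr(int(n)) for n in v.split(","))


def fetch_enterprise_map(session, company_id):
    """Run the two-step flow and return the parsed JSON of the second request."""
    # Step 1: ask for the fresh rtoken, with a millisecond timestamp like Date.now().
    stamp = int(time.time() * 1000)
    pre = session.get(f"{BASE_URL}/qq/{company_id}.json?random={stamp}")
    decoded_js = decode_char_codes(json.loads(pre.text)["data"]["v"])

    match = re.search(r"rtoken=([^;]+);", decoded_js)
    if match is None:
        raise ValueError("no rtoken found in the decoded payload")
    session.cookies.set("rtoken", match.group(1))

    # Step 2: request the target data with the refreshed cookie.
    resp = session.get(f"{BASE_URL}/dis/enterpriseMap.json?id={company_id}")
    return json.loads(resp.text)
```

With requests installed, usage would look roughly like: create `s = requests.Session()`, call `s.headers.update(headers)` and `s.cookies.update(cookies)` with the values from the script above, then call `fetch_enterprise_map(s, 24416401)`.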

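All done~

This is just one approach to getting past anti-scraping measures; in your own work you still have to take your own situation into account. The daily battle of wits with the site really is fun, and I suppose that is the charm of scraping.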
