League of Legends Python Crawlers
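Hero overview page: https://lol.qq.com/data/info-heros.shtml
1. Crawling heroes
https://lol.qq.com/data/info-heros.shtml
A GET request fetches the information for a specific hero:
https://lol.qq.com/data/info-heros.shtml?id=xxx
Here id=xxx is the ID of the hero to look up.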
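As a minimal sketch of the GET parameter described above, the query string can be built with the standard library instead of by hand. The helper name hero_url is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Hero page quoted in this post
HERO_PAGE = "https://lol.qq.com/data/info-heros.shtml"


def hero_url(hero_id):
    """Return the hero page URL with the id query parameter appended."""
    return HERO_PAGE + "?" + urlencode({"id": hero_id})


if __name__ == "__main__":
    print(hero_url(1))
```

Passing the result to requests.get() would then issue the GET request for that hero.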
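2. Fetching all hero information from the JS file

    import json

    import requests
    from faker import Factory

    f = Factory.create()


    def get_all_heros():
        url = "https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js"
        # Use a random User-Agent generated by faker
        headers = {"user-agent": f.user_agent()}
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        c = r.text
        # The response body is JSON; the hero list sits under the "hero" key
        l = json.loads(c)["hero"]
        for i in l[:50]:
            print("ID: {0}  Name: {1}  Alias: {2}".format(i["heroId"], i["name"], i["alias"]))


    if __name__ == "__main__":
        get_all_heros()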
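A small sketch of the parsing step on its own, so it can be tried without a network call. The function name format_heroes and the sample payload are made up for illustration; only the "hero", "heroId", "name" and "alias" keys come from the script above.

```python
import json


def format_heroes(payload_text, limit=50):
    """Parse the hero-list JSON text and return one summary line per hero."""
    heroes = json.loads(payload_text)["hero"]
    return [
        "ID: {0}  Name: {1}  Alias: {2}".format(h["heroId"], h["name"], h["alias"])
        for h in heroes[:limit]
    ]


if __name__ == "__main__":
    # Illustrative payload with placeholder values, not real game data
    sample = json.dumps({
        "hero": [
            {"heroId": "1", "name": "Hero A", "alias": "AliasA"},
            {"heroId": "2", "name": "Hero B", "alias": "AliasB"},
        ]
    })
    for line in format_heroes(sample):
        print(line)
```

Keeping the parsing separate from requests.get() makes it easier to see whether a failure comes from the network or from a changed JSON layout.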
3. Crawling match data
First LOL site
http://www.wanplus.com/lol/playerstats
This site uses a CSRF token. The POST request only needs to carry the CSRF token taken from the Set-Cookie header of an earlier response.

    import json

    import requests
    from faker import Factory
    from urllib import parse

    f = Factory.create()


    def get_token():
        url = "http://www.wanplus.com/lol/playerstats"
        headers = {
            "user-agent": f.user_agent(),
            "Referer": "http://www.wanplus.com/lol/teamstats",
            "Host": "www.wanplus.com",
        }
        r = requests.get(url, headers=headers, allow_redirects=False)
        r.encoding = r.apparent_encoding
        c = r.cookies
        r.close()
        myCookies = c.get_dict()
        print(myCookies)
        # The operator between the two int() calls was lost in the source; "+" is assumed
        return str(int(c.get("wanplus_csrf")[9:]) + int(16777216)), myCookies


    def get_competition():
        url = "http://www.wanplus.com/ajax/stats/list"
        token, myCookies = get_token()
        headers = {
            "user-agent": f.user_agent(),
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            "Host": "www.wanplus.com",
            "Origin": "http://www.wanplus.com",
            "Referer": "http://www.wanplus.com/lol/playerstats",
            "X-CSRF-Token": token,
            "X-Requested-With": "XMLHttpRequest",
        }
        # The 21 table columns, in the order the page sends them
        column_names = [
            "order", "playername", "teamname", "meta", "appearedTimes", "kda",
            "attendrate", "killsPergame", "mostkills", "deathsPergame",
            "mostdeaths", "assistsPergame", "mostassists", "goldsPermin",
            "lasthitPermin", "damagetoheroPermin", "damagetoheroPercent",
            "damagetakenPermin", "damagetakenPercent", "wardsplacedPermin",
            "wardskilledPermin",
        ]
        formdata = {"_gtk": token, "draw": "1"}
        for idx, name in enumerate(column_names):
            prefix = "columns[{0}]".format(idx)
            formdata[prefix + "[data]"] = name
            formdata[prefix + "[name]"] = ""
            formdata[prefix + "[searchable]"] = "true"
            # Columns 0-3 are not orderable; columns 4-20 are
            formdata[prefix + "[orderable]"] = "true" if idx >= 4 else "false"
            formdata[prefix + "[search][value]"] = ""
            formdata[prefix + "[search][regex]"] = "false"
        formdata.update({
            "order[0][column]": "4",
            "order[0][dir]": "desc",
            "start": "0",
            "length": "20",
            "search[value]": "",
            "search[regex]": "false",
            "area": "",
            "eid": "1065",
            "type": "player",
            "gametype": "2",
            "filter": '{"team":{},"player":{},"meta":{}}',
        })
        # Convert the dict into k1=v1&k2=v2 form
        data = parse.urlencode(formdata)
        print(data)
        r = requests.post(url, cookies=myCookies, data=data, headers=headers, allow_redirects=False)
        r.encoding = r.apparent_encoding
        c = r.text
        # A very short body means the request was rejected
        if len(c) < 100:
            print("Fetch failed, retrying!")
            return False
        print("Fetch succeeded!")
        l = json.loads(c)["data"]
        for i in l[:20]:
            print("Team ID: {0}  Team: {1}  Player: {2}".format(i["teamid"], i["teamname"], i["playername"]))
        return True


    def cookie_to_dic(mycookie):
        # Turn a raw "k1=v1; k2=v2" cookie string into a dict
        dic = {}
        for i in mycookie.split("; "):
            dic[i.split("=")[0]] = i.split("=")[1]
        return dic


    if __name__ == "__main__":
        # Retry until one request succeeds
        while 1:
            ok = get_competition()
            if ok is True:
                break
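The two string conversions used in this script, cookie string to dict and form dict to k1=v1&k2=v2, can be tried offline. The following is a sketch under assumptions: the names cookie_to_dict and encode_form and the sample inputs are hypothetical. Unlike a plain split on "=", it splits each pair only once, so a cookie value that itself contains "=" is kept whole.

```python
from urllib import parse


def cookie_to_dict(raw_cookie):
    """Split a 'k1=v1; k2=v2' cookie string into a dict."""
    result = {}
    for pair in raw_cookie.split(";"):
        pair = pair.strip()
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)  # split once: values may contain '='
        result[key] = value
    return result


def encode_form(formdata):
    """Encode a flat dict as an application/x-www-form-urlencoded body."""
    return parse.urlencode(formdata)


if __name__ == "__main__":
    print(cookie_to_dict("a=1; b=x=y"))
    print(encode_form({"draw": "1", "columns[0][data]": "order"}))
```

Note that urlencode percent-encodes the square brackets in keys such as columns[0][data], which is what a form-encoded POST body carries on the wire.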
Second LOL site
http://lol.admin.pentaq.com
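This site has no anti-crawling measures and no CSRF token check:

    import json

    import requests
    from faker import Factory

    f = Factory.create()


    def fun():
        # Separators in the path, query string and JSON keys were lost in the
        # source; the underscores, "&" and "=" below are assumed
        url = "http://lol.admin.pentaq.com/api/tournament_team_data?tour=29&patch="
        headers = {"user-agent": f.user_agent()}
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        c = r.text
        r.close()
        l = json.loads(c)["data"]["teams_data"]
        for i in l[:20]:
            print("Team name: {0}  Team ID: {1}  win: {2}".format(i["team_full_name"], i["team_id"], i["win"]))


    if __name__ == "__main__":
        fun()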
Third LOL site
http://www.op.gg/champion/statistics
BeautifulSoup is enough for this one.

    import requests
    from bs4 import BeautifulSoup
    from faker import Factory

    f = Factory.create()


    def fun():
        url = "http://www.op.gg/champion/statistics"
        headers = {
            "user-agent": f.user_agent(),
            "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        }
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        if r.status_code != 200:
            return False
        c = r.text
        r.close()
        print(c)  # debug: dump the raw HTML
        # A short page means the real content was not returned
        if len(c) < 10000:
            return False
        html = BeautifulSoup(c, "html.parser")
        l = html.find("tbody", class_="tabItem champion-trend-tier-TOP").find_all("tr")
        for x in l[:5]:
            a = x.find_all("td")
            tmp = a[3]
            b = tmp.find_all("p")
            name = b[0].text
            # The characters being stripped were lost in the source; whitespace is assumed
            pos = b[1].text.replace("\n", "").replace("\t", "")
            print("rank: {0}  name: {1}  pos: {2}  win rate: {3}  pick rate: {4}".format(
                a[0].text, name, pos, a[4].text, a[5].text))
        return True


    if __name__ == "__main__":
        # Retry until the page is fetched and parsed
        while True:
            ok = fun()
            if ok:
                break

4. Multithreaded crawling of LOL hero skin images
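1. Get the list of hero URLs: get_url_list()
2. Download each image and save it to a folder: download()
3. main() starts a thread pool to run the crawl

    import json
    import os
    import time
    from multiprocessing.dummy import Pool as ThreadPool

    import requests
    from faker import Factory

    f = Factory.create()
    headers = {"user-agent": f.user_agent()}


    def get_url_list():
        url = "https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js"
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        c = r.text
        Heros = json.loads(c)["hero"]  # information for 156 heroes
        idList = []
        for hero in Heros:
            hero_id = hero["heroId"]
            idList.append(hero_id)
        print(idList)
        return idList


    def spider(url):
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        c = r.text
        r.close()
        res_dict = json.loads(c)
        skins = res_dict["skins"]  # around 15 skin entries for this hero
        # enumerate() gives the index, which is used to name the image file
        for index, hero in enumerate(skins):
            item = {}
            item["name"] = hero["heroName"]
            item["skin_name"] = hero["name"]
            # Skip entries that have no main image
            if not hero["mainImg"]:
                continue
            item["imgLink"] = hero["mainImg"]
            print(item)
            download(index + 1, item)


    def download(index, contdict):
        name = contdict["name"]
        path = os.path.join("skins", name)
        # exist_ok avoids a race when several threads create the same folder
        os.makedirs(path, exist_ok=True)
        content = requests.get(contdict["imgLink"], headers=headers).content
        # Skin names may contain "/", which is not valid in a file name
        filename = contdict["skin_name"].replace("/", "_") + str(index) + ".jpg"
        with open(os.path.join(path, filename), "wb") as fp:
            fp.write(content)


    def main():
        start = time.time()
        pool = ThreadPool(6)
        page = []
        for i in range(1, 11):
            newpage = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js".format(i)
            print(newpage)
            page.append(newpage)
        result = pool.map(spider, page)
        pool.close()
        pool.join()
        end = time.time()
        print("Elapsed:", end - start)


    if __name__ == "__main__":
        main()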
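To see how the thread pool behaves without touching the network, here is a minimal sketch. The names build_pages, fake_spider and run_pool are hypothetical; fake_spider is only a stand-in for the real spider() and simply reads the hero ID back out of the URL.

```python
from multiprocessing.dummy import Pool as ThreadPool

# Per-hero JS URL pattern used by main() in this post
HERO_JS = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"


def build_pages(hero_ids):
    """Build one URL per hero ID."""
    return [HERO_JS.format(i) for i in hero_ids]


def fake_spider(url):
    """Stand-in worker: no request is made, the hero ID is parsed from the URL."""
    return int(url.rsplit("/", 1)[1].split(".")[0])


def run_pool(worker, pages, threads=6):
    """Run worker over pages on a thread pool; results keep the input order."""
    with ThreadPool(threads) as pool:
        return pool.map(worker, pages)


if __name__ == "__main__":
    pages = build_pages(range(1, 11))
    print(run_pool(fake_spider, pages))
```

pool.map blocks until every task has finished and returns the results in the same order as the input list, so swapping fake_spider for a real worker does not change how the results are collected.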