Table of contents
- Preface
- Source code analysis
- dirpro.py
- start.py
- backup.py
- rely.py
- results.py
- end.py
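Preface
About the tool
dirpro is a "pro" directory scanner written in Python. It is simple to use, powerful, and highly automated.
It automatically re-sorts and re-evaluates the scan results based on the returned status code and response length, which makes the results considerably more accurate.
Project URL
The project is open source on GitHub; a star would be much appreciated.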
https://github.com/coleak2021/dirpro
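Implemented features
- Configurable number of scan threads
- Batch scanning from a URL file, with results saved separately per URL
- Detection of status code 429: the program exits automatically and suggests using fewer threads
- Scan progress shown automatically after every 10% scanned
- Configurable wordlist file
- Configurable proxy for scan traffic
- Random User-Agent chosen automatically
- Automatic normalization of the target URL, with sensitive paths generated dynamically from the input URL
- A strong default wordlist (top10000)
- Automatic second-pass sorting and evaluation of results based on status code and response length
- Results saved automatically to the scan_result directory, under a collision-resistant file name of the form domain + timestamp
Scan options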
    options:
      -h, --help  show this help message and exit
      -u U        url
      -t T        thread:default=30
      -w W        dirfile path
      -a A        proxy,such as 127.0.0.1:7890
      -f F        urlfile,urls in the file
      -b          fastly to find backup files and sensitive files
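As a rough illustration of how such an option list can be produced, here is a minimal argparse sketch. The option names and help strings are copied from the help text above; the function name build_parser, the prog value, and the int type for -t are assumptions and may not match the project's actual code.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="dirpro.py")
    parser.add_argument("-u", help="url")
    # Assumption: the thread count is parsed as an int with a default of 30.
    parser.add_argument("-t", type=int, default=30, help="thread:default=30")
    parser.add_argument("-w", help="dirfile path")
    parser.add_argument("-a", help="proxy,such as 127.0.0.1:7890")
    parser.add_argument("-f", help="urlfile,urls in the file")
    # -b takes no value, so it is modelled as a boolean flag.
    parser.add_argument("-b", action="store_true",
                        help="fastly to find backup files and sensitive files")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["-u", "http://example.com/", "-t", "10", "-b"])
print(args.u, args.t, args.b)
```

Options that are not passed, such as -w here, come back as None, which is what a check like "if not args.w" relies on.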
Source code analysis
Directory structure
dirpro.py
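The project entry point. It accepts the command-line arguments and calls the functions that do the rest of the work.

    if not args.f:
        # Single target: drop any '/' at the ends of the URL
        rooturl = args.u.strip('/')
        (time1, ret) = __start(args, rooturl)
        __end(rooturl, time1, ret)
    else:
        # URL file: scan each line as its own target
        urlfile = open(args.f, 'r')
        urls = urlfile.read().splitlines()
        for rooturl in urls:
            rooturl = rooturl.strip('/')
            (time1, ret) = __start(args, rooturl)
            __end(rooturl, time1, ret)

The code checks whether a URL file was passed in, strips the trailing / from each URL, and calls __start(args,rooturl).
That call returns (time1,ret), which is then handed to __end(rooturl,time1,ret)
to process the scan results.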
start.py
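    sem = threading.Semaphore(args.t)
    urlList = []
    urlList.extend(searchFiles(rooturl))

Cap the number of concurrently running threads, reset urlList (so that, when scanning from a URL file, the list built for an earlier URL does not leak into later scans), and call searchFiles(rooturl) to generate sensitive paths and add them to urlList.

    if args.a:
        proxies['http'] = f"http://{args.a}"
        proxies['https'] = f"http://{args.a}"

Set up the proxy if one was given.

    if args.b:
        sem = threading.Semaphore(5)
        searchdir(urlList, sem, rooturl)
    else:
        if not args.w:
            defaultword = './wordlist/default'
        else:
            defaultword = args.w
        f = open(defaultword, 'r')
        files = f.read().splitlines()
        for file in files:
            urlList.append(f'{rooturl}/{file}')
        f.close()
        searchdir(urlList, sem, rooturl)
    return (time_1, ret)

Choose between fast scan and normal scan. Fast scan uses a small thread count (its built-in wordlist is small). Normal scan checks whether a wordlist file was passed in, appends every entry of the wordlist to urlList, and calls searchdir(urlList,sem,rooturl). Finally, (time_1,ret) is returned.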
backup.py
searchFiles(rooturl) generates sensitive paths and adds them to urlList.

    for file in FILE_LIST:
        urlList.append(f'{rootUrl}/{file}')
        urlList.append(f'{rootUrl}/{file}.bak')
        urlList.append(f'{rootUrl}/{file}~')
        urlList.append(f'{rootUrl}/{file}.swp')
        urlList.append(f'{rootUrl}/.{file}.swp')
        urlList.append(f'{rootUrl}/.{file}.un~')

Add backup-file paths.

    SOURCE_LIST = [
        '.svn', '.svn/wc.db', '.svn/entries',  # svn
        '.git/', '.git/HEAD', '.git/index', '.git/config', '.git/description', '.gitignore',  # git
        '.hg/',  # hg
        'CVS/', 'CVS/Root', 'CVS/Entries',  # cvs
        '.bzr',  # bzr
        'WEB-INF/web.xml', 'WEB-INF/src/', 'WEB-INF/classes', 'WEB-INF/lib', 'WEB-INF/database.propertie',  # java
        '.DS_Store',  # macos
        'README', 'README.md', 'README.MD',  # readme
        '_viminfo', '.viminfo',  # vim
        '.bash_history',
        '.htaccess'
    ]
    for source in SOURCE_LIST:
        urlList.append(f'{rootUrl}/{source}')

Add source-control and source-code paths.

    suffixList = ['.rar', '.zip', '.tar', '.tar.gz', '.7z']
    keyList = ['www', 'wwwroot', 'site', 'web', 'website', 'backup', 'data', 'mdb', 'WWW',
               '新建文件夹', 'ceshi', 'databak', 'db', 'database', 'sql', 'bf', '备份',
               '1', '2', '11', '111', 'a', '123', 'test', 'admin', 'app', 'bbs', 'htdocs', 'wangzhan']
    # Add the text between the first and second '.' of the URL as an extra key
    num1 = rootUrl.find('.')
    num2 = rootUrl.find('.', num1 + 1)
    keyList.append(rootUrl[num1 + 1:num2])
    for key in keyList:
        for suff in suffixList:
            urlList.append(f'{rootUrl}/{key}{suff}')

Add archive-file paths.
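To see what the two find() calls actually extract, the slice can be tried in isolation. The helper name domain_key is hypothetical; its body repeats the three lines from the snippet above.

```python
def domain_key(root_url):
    """Return the text between the first and second '.' of root_url."""
    num1 = root_url.find('.')
    num2 = root_url.find('.', num1 + 1)
    # When there is no second '.', find() returns -1 and the slice
    # ends one character before the end of the string.
    return root_url[num1 + 1:num2]


print(domain_key("http://www.example.com"))  # example
print(domain_key("http://example.com"))      # co
```

For a host with a www prefix this yields the site name, so paths such as example.zip get probed. For a bare domain with a single dot the key is less useful, which may be worth keeping in mind when choosing the target URL.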
rely.py
The scanning logic is concentrated in this file.

    def __random_agent():
        user_agent_list = [
            {'User-Agent': 'Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; FDM; SV1; .NET CLR 3.0.04506.30)'},
            {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'},
            ......
        ]
        return random.choice(user_agent_list)

Pick a random User-Agent for each request.
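    def searchdir(urlList, sem, rooturl):
        global d
        global _sem
        global _list
        d = 0
        _sem = sem
        thread_array = []
        n = len(urlList)
        k = int(n / 10)
        # Progress checkpoints: 10%, 20%, ..., 90% of the URL count
        for i in range(1, 10):
            _list.append(k * i)
        print(f"[*]开始扫描{rooturl}")  # "[*] scan of <rooturl> started"
        for i in urlList:
            t = Thread(target=__get, args=(i,))
            thread_array.append(t)
            t.start()
        for t in thread_array:
            t.join()

_list stores the progress checkpoints. The for loop creates one thread per URL, each running __get, and adds it to the thread list. The t.join() calls then keep the main thread blocked until every worker thread has finished.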
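The checkpoint arithmetic in searchdir is easy to check by hand. The sketch below repeats it in a hypothetical helper named checkpoints so the values can be inspected for a given number of URLs.

```python
def checkpoints(n):
    """Return the nine progress checkpoints computed for n URLs."""
    k = int(n / 10)  # size of one 10% step, rounded down
    return [k * i for i in range(1, 10)]


print(checkpoints(95))  # [9, 18, 27, 36, 45, 54, 63, 72, 81]
print(checkpoints(7))   # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because the step is rounded down, the percentages are approximate: with 95 URLs the last checkpoint is reached after 81 requests. With fewer than 10 URLs every checkpoint is 0, so a counter that starts counting at 1 never matches one.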
    def __get(url):
        count = 0
        global d
        with _sem:
            # Retry the request up to 3 times
            while count < 3:
                try:
                    r = requests.get(url, headers=__random_agent(), proxies=proxies)
                except:
                    count += 1
                    continue
                break
            # Check whether the request succeeded
            if count >= 3:
                print(f'visit failed:{url}')
                return
            l = len(r.text)
            if r.status_code != 404 and r.status_code != 429:
                log = f'{r.status_code:<6}{l:<7}{url}'
                print(log)
            elif r.status_code == 429:
                print('Too Many Requests 429 so that Request terminated,please Set up smaller threads')
                os._exit(0)
            d += 1
            if d in _list:
                print(f"[*]已经扫描{(_list.index(d)+1)*10}%")  # "[*] scanned N%"
            # Append to ret
            ret.append({
                'status_code': r.status_code,
                'length': l,
                'url': url
            })
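with _sem
This is equivalent to calling sem.acquire() on entry and sem.release() on exit.
The semaphore blocks inside the thread until an earlier thread has finished. In other words, one thread is started for every task; the threads beyond the limit simply wait inside their own body.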
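The following standalone sketch illustrates that behaviour: ten threads are started at once, but a Semaphore(3) lets at most three of them run the guarded section at the same time. The names worker, active, and max_active are made up for this illustration, and time.sleep stands in for the HTTP request.

```python
import threading
import time

sem = threading.Semaphore(3)  # at most 3 threads inside the guarded section
lock = threading.Lock()       # protects the two counters below
active = 0
max_active = 0


def worker():
    global active, max_active
    with sem:                 # acquire on entry, release on exit
        with lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)      # stand-in for the HTTP request
        with lock:
            active -= 1


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()                 # all 10 threads exist immediately
for t in threads:
    t.join()

print(max_active)             # never more than 3
```

The cost of this design is that a large wordlist creates a large number of mostly idle threads. A worker pool such as concurrent.futures.ThreadPoolExecutor would be one alternative that keeps the thread count equal to the limit.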
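os._exit(0)
When a 429 status code comes back, this call terminates the whole program. Calling exit() here would only end the worker thread it runs in.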
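The difference can be observed with a small experiment. In this sketch a worker thread calls sys.exit(), which raises SystemExit inside that thread only; the main thread carries on. The names worker and main_still_running are illustrative.

```python
import sys
import threading


def worker():
    # Raises SystemExit, which ends this thread but not the process.
    sys.exit(0)


t = threading.Thread(target=worker)
t.start()
t.join()

# Execution reaches this point because only the worker thread stopped.
main_still_running = True
print("main thread is still running")
```

os._exit() ends the process immediately, without running cleanup handlers or flushing buffered output, so it suits a hard stop like the 429 case but not a graceful shutdown.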
results.py
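Second-pass sorting and evaluation of the scan results.

    t = f"./scan_result/{rooturl.split('//')[1].replace(':', '')}{int(time.time())}"
    try:
        f = open(t, 'w', encoding="utf-8")
    except:
        # Fall back to a timestamp-only file name in the current directory
        f = open(f"{int(time.time())}", 'w', encoding="utf-8")

Build the name of the file that the scan results are saved to.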
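To make the naming scheme concrete, the same expression can be wrapped in a small function and called with a fixed timestamp. The function name result_path and its now parameter are assumptions added for this illustration.

```python
import time


def result_path(rooturl, now=None):
    """Build the result file path the same way as the snippet above."""
    now = int(time.time()) if now is None else int(now)
    # Keep everything after '//' and drop ':' so a port does not break the name.
    host = rooturl.split('//')[1].replace(':', '')
    return f"./scan_result/{host}{now}"


print(result_path("http://127.0.0.1:8080", now=1700000000))
# ./scan_result/127.0.0.180801700000000
print(result_path("http://example.com/app", now=1700000000))
# ./scan_result/example.com/app1700000000
```

The second call shows one case where opening the file can fail: a target URL that contains a path produces a name with an extra / in it, pointing into a subdirectory that probably does not exist. The timestamp-only fallback covers that case.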
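    for result in ret:
        statusCode = result['status_code']
        length = result['length']
        statusCodeMap[statusCode] = statusCodeMap.get(statusCode, 0) + 1
        lenMap[length] = lenMap.get(length, 0) + 1

Count how many times each response length and each status code occurs.

    for result in ret:
        if result['length'] != maxLength:
            __log(f'{result["status_code"]:<6}{result["length"]:<7}{result["url"]}')
    f.close()
    return t

Log the URLs whose response length differs from maxLength (computed in code not shown here), that is, the responses that stand out from the rest.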
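The idea of filtering by length can be sketched with collections.Counter. This assumes that maxLength is the most frequent response length, which the excerpt does not show, so treat it as a guess at the intent rather than the project's code. The function name unusual_results and the sample data are made up.

```python
from collections import Counter


def unusual_results(results):
    """Return the results whose length differs from the most common length."""
    if not results:
        return []
    length_counts = Counter(r['length'] for r in results)
    # Assumption: the most frequent length is the site's generic response.
    common_length, _ = length_counts.most_common(1)[0]
    return [r for r in results if r['length'] != common_length]


sample = [
    {'status_code': 200, 'length': 512, 'url': 'http://example.com/a'},
    {'status_code': 200, 'length': 512, 'url': 'http://example.com/b'},
    {'status_code': 200, 'length': 512, 'url': 'http://example.com/c'},
    {'status_code': 200, 'length': 1024, 'url': 'http://example.com/admin'},
]
for r in unusual_results(sample):
    print(f'{r["status_code"]:<6}{r["length"]:<7}{r["url"]}')
```

Sites that answer every unknown path with the same 200 page produce many results of identical length, so dropping the dominant length leaves only the responses that differ.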
end.py
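    result = __Results(rooturl, ret)
    time2 = time.time()
    # Prints the total time taken in seconds and where the results were saved
    print("总共花费: ", time2 - time1, "秒,", f"结果保存在{result}")
    ret.clear()

Clear ret so that it does not affect later scans.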