How to write a Python script that downloads code from OpenGrok to your local machine

License: CC 4.0 BY-SA


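This post describes how a developer, finding the online code browser OpenGrok hard to use over an unreliable network, wrote a Python script to download the code locally. The script fetches the code starting from a given link and saves it in the same directory structure, so reading it no longer depends on the network.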


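I have been reading code on OpenGrok a lot lately. The tool makes searching and looking up callers very convenient. The drawback is that it runs on a server, so when the network is poor, reading code becomes painful. I therefore wanted to download the code and read it locally whenever the network is bad. I first looked online for a download tool, but none of them fit. In the end I had to do it myself: I found a Python script online, adapted it to my situation, and it runs well.
The code is below. Because it was written for my environment, the keywords the script uses to extract directories and download links (for example the project name 'Qualcomm' and the class="p" / class="r" markers) must be changed to match yours.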

#!/usr/bin/env python3
import requests
import re
import os
import sys

def help(script):
    # Print the command-line usage
    text = 'python3 %s  <link_address>  <path>' % script
    print(text)

def get_file(url, path):  # Download every file listed on the directory page at url
    content = requests.get(url).text
    file_str1 = "class=\"p\""
    if file_str1 in content:
        # Get the download links from the HTML; change the keywords to match your own pages
        sub_url = re.findall('class="p".*?href="(/source/download.*?)"', content)
        length = len(sub_url)
        print("\033[31m======Total %s files, will download=======\033[0m" % length)
        # The host cannot be extracted from the HTML, so take it from the input URL
        domainName = url[0:url.rfind('/source/xref/Qualcomm')]
        for sub_path in sub_url:
            # Keep everything from the project directory onward as the local relative path
            cpath = sub_path[sub_path.rfind('/Qualcomm/'):]
            fileName = sub_path.split('/')[-1]
            print("downloading  ->  %-60s" % fileName, end=" ")
            dir_path = path + cpath
            # Skip files that already exist; delete this check to download everything again
            if os.path.isfile(dir_path):
                print("\033[33mFile already exists, skipping!\033[0m")
                continue
            res = requests.get(domainName + sub_path)
            res.raise_for_status()  # make sure the program stops when a download fails
            # Create the parent directory in case it does not exist yet
            os.makedirs(os.path.dirname(dir_path) or ".", exist_ok=True)
            with open(dir_path, 'wb') as playFile:
                for chunk in res.iter_content(100000):
                    playFile.write(chunk)
            print("\033[32mDone\033[0m")

def get_dir(url, path):  # Walk a directory page and recurse into its subdirectories
    content = requests.get(url).text
    dir_str1 = "class=\"r\""   # directory marker in the HTML
    file_str1 = "class=\"p\""  # single-file marker in the HTML
    if dir_str1 in content:
        sub_url = re.findall('class="r".*?href="(/source/history.*?)"', content)
        for sub_path in sub_url:
            path_slice = sub_path[sub_path.rfind('/Qualcomm/'):]
            if not os.path.exists(path + path_slice):
                print("will create directory %s" % (path + path_slice))
                os.makedirs(path + path_slice)
            i = sub_path.split('/')[-1]
            # Pass the output path down (the original always passed ".") and avoid a double slash
            get_dir(url.rstrip('/') + "/" + i, path)
    if file_str1 in content:
        get_file(url, path)
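
if __name__ == '__main__':
    if len(sys.argv) < 2:
        help(sys.argv[0])
        sys.exit(1)
    # The second argument is the local output directory; default to the current directory
    out_path = sys.argv[2] if len(sys.argv) > 2 else "."
    get_dir(sys.argv[1], out_path)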
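
Since the keywords have to be adapted to each OpenGrok instance, it can help to try them on a saved page before running the full download. The following is only a minimal sketch, not part of the original script: the helper names (extract_links, local_path, base_url), the PROJECT constant, and the sample HTML are all hypothetical, and the real markup on your server may differ. It also swaps in urllib.parse.urlsplit for the string slicing used above to get the host, and uses the first occurrence of the project name rather than the last.

```python
import re
from urllib.parse import urlsplit

# Hypothetical project name; the script above hard-codes 'Qualcomm'.
PROJECT = "Qualcomm"

# Made-up fragment imitating the kind of listing the script parses (assumption).
SAMPLE_HTML = '''
<tr><td class="r"><a href="/source/history/Qualcomm/kernel">H</a></td></tr>
<tr><td class="p"><a href="/source/download/Qualcomm/kernel/main.c">D</a></td></tr>
<tr><td class="p"><a href="/source/download/Qualcomm/kernel/Makefile">D</a></td></tr>
'''

def extract_links(html, css_class, prefix):
    """Return hrefs starting with prefix that follow a class="css_class" marker."""
    pattern = 'class="%s".*?href="(%s.*?)"' % (re.escape(css_class), re.escape(prefix))
    return re.findall(pattern, html)

def local_path(href, project=PROJECT):
    """Map a link to a relative path that starts at the project directory."""
    marker = "/%s/" % project
    idx = href.find(marker)
    if idx < 0:
        # Fail loudly instead of silently producing a wrong path
        raise ValueError("project %r not found in %r" % (project, href))
    return href[idx + 1:]

def base_url(page_url):
    """Return scheme://host[:port] of the page URL."""
    parts = urlsplit(page_url)
    return "%s://%s" % (parts.scheme, parts.netloc)

if __name__ == "__main__":
    for href in extract_links(SAMPLE_HTML, "p", "/source/download"):
        print(href, "->", local_path(href))
```

If the lists come back empty for a page saved from your own server, the class names or the /source/... prefixes are the first things to check.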
