Handling the Error with Page_Error() Instead of Setting validateRequest=false

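This article covers the XSS (cross-site scripting) protection built into ASP.NET: the default security feature, how to handle HttpRequestValidationException properly, and a safe strategy for rich text input.

ASP.NET 1.1 introduced automatic checking of submitted forms for XSS (cross-site scripting). When a user tries to influence the returned page with input such as HTML tags, the ASP.NET engine raises an HttpRequestValidationException. By default it returns a page with the following text: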

Quoted excerpt:
Server Error in '/YourApplicationPath' Application

A potentially dangerous Request.Form value was detected from the client
(txtName="<b>").

Description: Request Validation has detected a potentially dangerous client input value, and processing of the request has been aborted. This value may indicate an attempt to compromise the security of your application, such as a cross-site scripting attack. You can disable request validation by setting validateRequest=false in the Page directive or in the configuration section. However, it is strongly recommended that your application explicitly check all inputs in this case.

Exception Details: System.Web.HttpRequestValidationException: A potentially dangerous Request.Form value was detected from the client (txtName="<b>").

....

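This is an important security feature that ASP.NET provides. Many programmers have little notion of security and do not even know that XSS attacks exist, and fewer still defend against them proactively. ASP.NET is secure by default on this point, so programmers who know little about security can still write sites with a reasonable level of protection.

However, when I searched Google for HttpRequestValidationException or "A potentially dangerous Request.Form value was detected from the client", I was surprised to find that most of the suggested solutions simply disable the feature by setting validateRequest=false in the ASP.NET page directive, without asking whether the site in question can really do without it. That alarmed me. Security awareness should be with every programmer at all times; however much or little you know about security, keeping it actively in mind makes your site considerably safer.

Why do so many programmers want to disable validateRequest? Some genuinely need users to be able to enter characters such as "<>", and that case needs no further comment. Others do not actually allow users to enter XSS-prone characters; they just dislike the form the error takes. A long block of English plus a typical ASP.NET exception page makes it look as though the site has failed, not as though the user entered illegal characters, and they do not know how to suppress that page and handle the error themselves.

If you want to handle this error gracefully rather than show the default ASP.NET exception page, do not set validateRequest=false.

The right approach is to add a Page_Error() function to the current page to catch any exception raised during page processing that was not otherwise handled, and then show the user a sensible error message. If the current page has no Page_Error(), the exception is passed on to Application_Error() in Global.asax, where you can write a general-purpose error handler. Only if neither place has a handler is the default error page shown.

For example, handling this exception takes only a few lines. Add the following to the page's code-behind: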

    // Runs when an exception raised while processing this page goes unhandled.
    protected void Page_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();
        if (ex is HttpRequestValidationException)
        {
            Response.Write("Please enter valid input.");
            Server.ClearError(); // Without ClearError() the exception continues on to Application_Error()
        }
    }

The page now intercepts HttpRequestValidationException and returns a reasonable error message of the programmer's choosing.
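The site-wide fallback described earlier, Application_Error() in Global.asax, could look roughly like the sketch below. It assumes a code-behind class named Global (a hypothetical name) and runs only inside an ASP.NET application. The check for a wrapping HttpUnhandledException is a defensive assumption about how page-level errors may reach the application handler; it is not something the original article states.

```csharp
using System;
using System.Web;

// Hypothetical Global.asax.cs code-behind; the class name "Global" is an assumption.
public class Global : HttpApplication
{
    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();

        // Errors raised inside a page may arrive wrapped in HttpUnhandledException,
        // so inspect the inner exception when one is present.
        if (ex is HttpUnhandledException && ex.InnerException != null)
        {
            ex = ex.InnerException;
        }

        if (ex is HttpRequestValidationException)
        {
            Response.Clear();    // drop any partially rendered output
            Response.Write("Please enter valid input.");
            Server.ClearError(); // mark the error as handled so the default page is not shown
        }
    }
}
```

Any page that defines its own Page_Error() and calls Server.ClearError() never reaches this handler, so the two can coexist.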

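The code is simple, so I hope that anyone who does not truly need to accept characters such as "<>" from users will not casually disable this security feature. If all you need is error handling, code like the above is enough.

Programmers who have explicitly disabled this feature must understand exactly what they are doing and must check every string that needs filtering themselves; otherwise the site is wide open to cross-site scripting.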
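As one illustration of what checking input by hand can mean, the sketch below validates a name field against an allow-list pattern instead of trying to enumerate bad characters. The class name InputChecks, the method IsValidName, and the permitted character set are all assumptions made for this example, not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputChecks
{
    // Assumed rule: letters, spaces, apostrophes, periods and hyphens, 1 to 40 characters.
    // \A and \z anchor the match to the entire string.
    private static readonly Regex NamePattern = new Regex(@"\A[\p{L} '.\-]{1,40}\z");

    // Returns true only when the whole input matches the allow-list pattern.
    public static bool IsValidName(string input)
    {
        return input != null && NamePattern.IsMatch(input);
    }
}
```

With this rule, a value such as "<b>" is rejected because "<" and ">" are simply not in the permitted set; each field would need its own pattern.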

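How should pages with a Rich Text Editor be handled?

If a page contains a rich text editor control, HTML tags will inevitably be posted back. In that case we have no choice but to set validateRequest="false". How, then, do we handle security and do as much as possible to prevent cross-site scripting?

Following Microsoft's recommendation, we should adopt the policy known in security as "deny by default, allow explicitly".

First, encode the input string with HttpUtility.HtmlEncode(), which disables every HTML tag in it.

Then use Replace() to restore only the tags we want and know to be safe. For example, to allow the "<b>" tag, explicitly replace "&lt;b&gt;" with "<b>".

Sample code: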

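    // Requires: using System; using System.Text; using System.Web;
    void submitBtn_Click(object sender, EventArgs e)
    {
        // Encode the input string so that every HTML tag in it is disabled.
        StringBuilder sb = new StringBuilder(
            HttpUtility.HtmlEncode(htmlInputTxt.Text));

        // Then selectively allow <b> and <i>, restoring both opening and closing tags.
        sb.Replace("&lt;b&gt;", "<b>");
        sb.Replace("&lt;/b&gt;", "</b>");
        sb.Replace("&lt;i&gt;", "<i>");
        sb.Replace("&lt;/i&gt;", "</i>");

        Response.Write(sb.ToString());
    }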

This lets through a selected set of HTML tags while still blocking the dangerous ones.
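The same encode-then-restore idea can be written as a small helper driven by a list of allowed tags. This is only a sketch: the class name HtmlAllowList, the method Sanitize, and the choice of "b" and "i" are assumptions for illustration. It also swaps in System.Net.WebUtility.HtmlEncode in place of HttpUtility.HtmlEncode so that it compiles outside a web project.

```csharp
using System;
using System.Net;
using System.Text;

public static class HtmlAllowList
{
    // Tags that are restored after encoding; an illustrative choice.
    private static readonly string[] AllowedTags = { "b", "i" };

    public static string Sanitize(string input)
    {
        // Step 1: encode everything, so no tag survives by default.
        StringBuilder sb = new StringBuilder(WebUtility.HtmlEncode(input ?? string.Empty));

        // Step 2: restore only the exact, attribute-free forms of the allowed tags.
        foreach (string tag in AllowedTags)
        {
            sb.Replace("&lt;" + tag + "&gt;", "<" + tag + ">");
            sb.Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return sb.ToString();
    }
}
```

Because only the exact encoded form of a bare tag is restored, an opening tag that carries attributes, such as a bold tag with an onclick handler, stays encoded and is displayed as text.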

Following Microsoft's advice, be careful about allowing the following HTML tags, since each of them can lead to cross-site scripting.

     # <applet>      # <body>        # <embed>       # <frame>
     # <script>      # <frameset>    # <html>        # <iframe>
     # <img>         # <style>       # <layer>       # <link>
     # <ilayer>      # <meta>        # <object>

The most surprising entry here is probably <img>. The following code should make its danger clear.

      Quoted excerpt:

        <img src="javascript:alert('hello');">

An <img> tag can cause JavaScript to run, which lets an attacker do whatever they want in disguise.

The same applies to <style>:

     Quoted excerpt:

     <style TYPE="text/javascript">...

         alert('hello');

     </style>

