A Deep Dive into HttpParser in proxy.py: The Core HTTP Message Parsing Component

proxy.py ⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework Project repository: https://gitcode.com/gh_mirrors/pr/proxy.py

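Introduction

In modern network programming, parsing HTTP is the foundation for building web servers, proxy services, and similar network applications. The proxy.py project ships an efficient and flexible HTTP parser, HttpParser, which is one of the project's core components. This article looks at how HttpParser is designed, how to use it, and where it applies in practice.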

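HttpParser Overview

HttpParser is a general-purpose HTTP message parser with the following core features:

Bidirectional parsing: handles both HTTP request and HTTP response messages
Protocol extensibility: can parse HTTP-like protocols such as ICAP and SIP
Proxy-oriented: designed and optimized for proxy scenarios
Efficient implementation: built on memoryview for high-performance parsing

HttpParser's design is rooted in proxy use cases, which is why it copes well with the HTTP message forms that are specific to proxies.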
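To make the memoryview point above concrete, here is a minimal standard-library sketch of why a parser might prefer memoryview input. It is not proxy.py's internal code; the variable names are illustrative assumptions. It only shows that slicing a memoryview produces another view over the same buffer rather than a copy.

```python
# Illustrative sketch only; this is not proxy.py's internal code.
raw = bytearray(b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n')
view = memoryview(raw)

# Slicing a memoryview yields another view over the same buffer, with no copy.
line_end = raw.find(b'\r\n')
request_line = view[:line_end]
rest = view[line_end + 2:]

# Bytes are materialized only when explicitly requested.
first_line_before = bytes(request_line)

# Writing through the parent view is visible through the slice,
# which shows that both share one underlying buffer.
view[0:3] = b'PUT'
first_line_after = bytes(request_line)

print(first_line_before)
print(first_line_after)
```

For a parser that repeatedly peels the request line and headers off a receive buffer, avoiding these intermediate copies is where the savings would come from.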

Basic Usage Examples

1. Parsing an ordinary HTTP request

Let's start with a basic HTTP GET request:

from proxy.http.methods import httpMethods
from proxy.http.parser import HttpParser, httpParserTypes, httpParserStates
from proxy.common.constants import HTTP_1_1

get_request = HttpParser(httpParserTypes.REQUEST_PARSER)
get_request.parse(memoryview(b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'))

print(get_request.build())

# Verify the parse result
assert get_request.is_complete
assert get_request.method == httpMethods.GET
assert get_request.version == HTTP_1_1
assert get_request.host is None  # note: host is None here
assert get_request.port == 80
assert get_request._url.remainder == b'/'
assert get_request.header(b'host') == b'jaxl.com'

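Key points:

In an ordinary HTTP request, the host appears only in the Host header, not in the request line
Use header(b'host') to read the Host header value
The _url attribute holds the parsed URL (the leading underscore marks it as internal)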
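The first key point can be checked without proxy.py at all. The following is a minimal standard-library sketch, assuming a hypothetical helper named split_request that is not part of proxy.py. It splits the same request by hand and shows that the request target is only a path, while the host lives in the Host header.

```python
from typing import Dict, Tuple


def split_request(raw: bytes) -> Tuple[bytes, bytes, bytes, Dict[bytes, bytes]]:
    """Hypothetical helper: split a request head into method, target,
    version and a dict of headers keyed by lowercased name."""
    head, _, _body = raw.partition(b'\r\n\r\n')
    lines = head.split(b'\r\n')
    method, target, version = lines[0].split(b' ', 2)
    headers: Dict[bytes, bytes] = {}
    for line in lines[1:]:
        name, _, value = line.partition(b':')
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


method, target, version, headers = split_request(
    b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'
)

# The request target carries no host information, only a path.
print(target)
# The host is available only through the Host header.
print(headers[b'host'])
```

This sketch ignores repeated headers, folded lines, and malformed input, which a real parser has to handle.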

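2. Parsing an HTTP proxy request

The main difference between a proxy request and an ordinary HTTP request is that the request line contains the full URL: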

proxy_request = HttpParser(httpParserTypes.REQUEST_PARSER)
proxy_request.parse(memoryview(b'GET http://jaxl.com/ HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'))

print(proxy_request.build())  # regular form
print(proxy_request.build(for_proxy=True))  # proxy form

# Verify the parse result
assert proxy_request.host == b'jaxl.com'  # host is populated here
assert proxy_request.port == 80

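Key points:

Because the request line of a proxy request contains the full URL, the host can be read directly from that URL
build(for_proxy=True) produces the message in the form suited to forwarding through a proxy
In the proxy case, the host and port attributes are populated automatically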
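The difference between the two request-line forms (called origin-form and absolute-form in the HTTP/1.1 specification) can be sketched with urllib.parse from the standard library. The helper host_and_port and the DEFAULT_PORTS table are assumptions made for this illustration, not proxy.py APIs. The fallback to port 80 simply mirrors the assertions shown in the examples above.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports for this sketch.
DEFAULT_PORTS = {b'http': 80, b'https': 443}


def host_and_port(target: bytes) -> Tuple[Optional[bytes], int, bytes]:
    """Hypothetical helper: derive (host, port, path) from a request target."""
    parts = urlsplit(target)
    # Use the explicit port if present, else the scheme default, else 80.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 80)
    return parts.hostname, port, parts.path or b'/'


# absolute-form, as sent to a proxy: the host comes from the URL itself.
print(host_and_port(b'http://jaxl.com/'))

# origin-form, as sent to an origin server: no host in the request line.
print(host_and_port(b'/'))
```

With an origin-form target the host comes back as None, which matches the behaviour seen in the first example, where the host had to be read from the Host header instead.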

3. Parsing an HTTPS proxy request (CONNECT method)

HTTPS proxying uses the special CONNECT method to establish a tunnel:

connect_request = HttpParser(httpParserTypes.REQUEST_PARSER)
connect_request.parse(memoryview(b'CONNECT jaxl.com:443 HTTP/1.1\r\nHost: jaxl.com:443\r\n\r\n'))

print(connect_request.build())
print(connect_request.build(for_proxy=True))

# Verify the parse result
assert connect_request.is_https_tunnel
assert connect_request.host == b'jaxl.com'
assert connect_request.port == 443
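The CONNECT request above carries its target as host:port rather than as a path or a URL. As a rough illustration of how such a target splits into the host and port values asserted above, here is a standard-library sketch. The function parse_authority is a hypothetical name, not a proxy.py API.

```python
from typing import Tuple


def parse_authority(target: bytes) -> Tuple[bytes, int]:
    """Hypothetical helper: split a CONNECT target of the form host:port."""
    # rpartition splits on the last colon, so a bracketed IPv6 literal
    # such as [::1]:8443 keeps its internal colons (and its brackets).
    host, sep, port = target.rpartition(b':')
    if not sep or not host or not port.isdigit():
        raise ValueError('CONNECT target must look like host:port')
    return host, int(port)


print(parse_authority(b'jaxl.com:443'))
```

Unlike the HTTP cases, no default port applies here: a CONNECT target without an explicit port is rejected by this sketch.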

Key points: