
TORNADO源码分析
Tornado的web框架在web.py中实现,主要包括RequestHandler类(本质为对http请求处理的封装)和Application类(是一些列请求处理的集合,构成的一个web-application,源代码注释不翻译更容易理解:A collection of request handlers that make up a web application)。
RequestHandler分析
RequestHandler提供了一个针对http请求处理的基类封装,方法比较多,主要有以下功能:
提供了GET/HEAD/POST/DELETE/PATCH/PUT/OPTIONS等方法的功能接口,具体开发时RequestHandler的子类重写这些方法以支持不同需求的请求处理。
提供对http请求的处理方法,包括对headers,页面元素,cookie的处理。
提供对请求响应的一些列功能,包括redirect,write(将数据写入输出缓冲区),渲染模板(render, reander_string)等
其他的一些辅助功能,如结束请求/响应,刷新输出缓冲区,对用户授权相关处理等。
Application分析
源代码中的注释写的非常好:A collection of request handlers that make up a web application. Instances of this class are callable and can be passed directly to HTTPServer to serve the application. 该类初始化的第一个参数接受一个(regexp, request_class)形式的列表,指定了针对不同URL请求所采取的处理方法,包括对静态文件请求的处理(web.StaticFileHandler)。Application类中实现 __call__ 函数,这样该类就成为可调用的对象,由HTTPServer来进行调用。比如下边是httpserver.py中HTTPConection类的代码,该处request_callback即为Application对象。
def _on_headers(self, data):
# some codes...
self.request_callback(self._request)
__call__函数会遍历Application的handlers列表,匹配到相应的URL后通过handler._execute进行相应处理;如果没有匹配的URL,则会调用ErrorHandler。
在Application初始时有一个debug参数,当debug=True时,运行程序时当有代码、模块发生修改,程序会自动重新加载,即实现了auto-reload功能。该功能在autoreload.py文件中实现,是否需要reload的检查在每次接收到http请求时进行,基本原理是检查每一个sys.modules以及_watched_files所包含的模块在程序中所保存的最近修改时间和文件系统中的最近修改时间是否一致,如果不一致,则整个程序重新加载。
def _reload_on_update(modify_times):
for module in sys.modules.values():
# module test and some path handles
_check_file(modify_times, path)
for path in _watched_files:
_check_file(modify_times, path)
Tornado的autoreload模块提供了一个对外的main接口,可以通过下边的方法实现运行test.py程序运行的auto-reload。但是测试了一下,功能有限,相比于django的autorelaod模块(具有较好的封装和较完善的功能)还是有一定的差距。最主要的原因是Tornado中的实现耦合了一些ioloop的功能,因而autoreload不是一个可独立的模块。
# tornado
python -m tornado.autoreload test.py [args...]
# django
from django.utils import autoreload
autoreload.main(your-main-func)
asynchronous方法
该方法通常被用为请求处理函数的decorator,以实现异步操作,被@asynchronous修饰后的请求处理为长连接,在调用self.finish之前会一直处于连接等待状态。
总结
tornado源码分析2 一文中给出了一张tornado httpserver的工作流程图,调用Application发生在HTTPConnection大方框的handle_request椭圆中。那篇文章里使用的是一个简单的请求处理函数handle_request,无论是handle_request还是application,其本质是一个函数(可调用的对象),当服务器接收连接并读取http请求header之后进行调用,进行请求处理和应答。
http_server = httpserver.HTTPServer(handle_request)
http_server = httpserver.HTTPServer(application)
附录:autoreload.py文件源码 红色部分为实现debug=True,网页自动加载的功能部分
#!/usr/bin/env python
#
# Copyright 2009 Facebook
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""xAutomatically restart the server when a source file is modified.
Most applications should not access this module directly. Instead, pass the
keyword argument ``debug=True`` to the `tornado.web.Application` constructor.
This will enable autoreload mode as well as checking for changes to templates
and static resources. Note that restarting is a destructive operation
and any requests in progress will be aborted when the process restarts.
This module can also be used as a command-line wrapper around scripts
such as unit test runners. See the `main` method for details.
The command-line wrapper and Application debug modes can be used together.
This combination is encouraged as the wrapper catches syntax errors and
other import-time failures, while debug mode catches changes once
the server has started.
This module depends on `.IOLoop`, so it will not work in WSGI applications
and Google App Engine. It also will not work correctly when `.HTTPServer`'s
multi-process mode is used.
Reloading loses any Python interpreter command-line arguments (e.g. ``-u``)
because it re-executes Python using ``sys.executable`` and ``sys.argv``.
Additionally, modifying these variables will cause reloading to behave
incorrectly.
"""
from __future__ import absolute_import, division, print_function, with_statement
import os
import sys
# sys.path handling
# -----------------
#
# If a module is run with "python -m", the current directory (i.e. "")
# is automatically prepended to sys.path, but not if it is run as
# "path/to/file.py". The processing for "-m" rewrites the former to
# the latter, so subsequent executions won't have the same path as the
# original.
#
# Conversely, when run as path/to/file.py, the directory containing
# file.py gets added to the path, which can cause confusion as imports
# may become relative in spite of the future import.
#
# We address the former problem by setting the $PYTHONPATH environment
# variable before re-execution so the new process will see the correct
# path. We attempt to address the latter problem when tornado.autoreload
# is run as __main__, although we can't fix the general case because
# we cannot reliably reconstruct the original command line
# (http://bugs.python.org/issue14208).
if __name__ == "__main__":
# This sys.path manipulation must come before our imports (as much
# as possible - if we introduced a tornado.sys or tornado.os
# module we'd be in trouble), or else our imports would become
# relative again despite the future import.
#
# There is a separate __main__ block at the end of the file to call main().
if sys.path[0] == os.path.dirname(__file__):
del sys.path[0]
import functools
import logging
import os
import pkgutil
import sys
import traceback
import types
import subprocess
import weakref
from tornado import ioloop
from tornado.log import gen_log
from tornado import process
from tornado.util import exec_in
try:
import signal
except ImportError:
signal = None
_watched_files = set()
_reload_hooks = []
_reload_attempted = False
_io_loops = weakref.WeakKeyDictionary()
def start(io_loop=None, check_time=500):
"""Begins watching source files for changes using the given `.IOLoop`. """
io_loop = io_loop or ioloop.IOLoop.current()
if io_loop in _io_loops:
return
_io_loops[io_loop] = True
if len(_io_loops) > 1:
gen_log.warning("tornado.autoreload started more than once in the same process")
add_reload_hook(functools.partial(io_loop.close, all_fds=True))
modify_times = {}
callback = functools.partial(_reload_on_update, modify_times)
scheduler = ioloop.PeriodicCallback(callback, check_time, io_loop=io_loop)
scheduler.start()
def wait():
"""Wait for a watched file to change, then restart the process.
Intended to be used at the end of scripts like unit test runners,
to run the tests again after any source file changes (but see also
the command-line interface in `main`)
"""
io_loop = ioloop.IOLoop()
start(io_loop)
io_loop.start()
def watch(filename):
"""Add a file to the watch list.
All imported modules are watched by default.
"""
_watched_files.add(filename)
def add_reload_hook(fn):
"""Add a function to be called before reloading the process.
Note that for open file and socket handles it is generally
preferable to set the ``FD_CLOEXEC`` flag (using `fcntl` or
``tornado.platform.auto.set_close_exec``) instead
of using a reload hook to close them.
"""
_reload_hooks.append(fn)
def _reload_on_update(modify_times):
if _reload_attempted:
# We already tried to reload and it didn't work, so don't try again.
return
if process.task_id() is not None:
# We're in a child process created by fork_processes. If child
# processes restarted themselves, they'd all restart and then
# all call fork_processes again.
return
for module in sys.modules.values():
# Some modules play games with sys.modules (e.g. email/__init__.py
# in the standard library), and occasionally this can cause strange
# failures in getattr. Just ignore anything that's not an ordinary
# module.
if not isinstance(module, types.ModuleType):
continue
path = getattr(module, "__file__", None)
if not path:
continue
if path.endswith(".pyc") or path.endswith(".pyo"):
path = path[:-1]
_check_file(modify_times, path)
for path in _watched_files:
_check_file(modify_times, path)
def _check_file(modify_times, path):
try:
modified = os.stat(path).st_mtime
except Exception:
return
if path not in modify_times:
modify_times[path] = modified
return
if modify_times[path] != modified:
gen_log.info("%s modified; restarting server", path)
_reload()
def _reload():
global _reload_attempted
_reload_attempted = True
for fn in _reload_hooks:
fn()
if hasattr(signal, "setitimer"):
# Clear the alarm signal set by
# ioloop.set_blocking_log_threshold so it doesn't fire
# after the exec.
signal.setitimer(signal.ITIMER_REAL, 0, 0)
# sys.path fixes: see comments at top of file. If sys.path[0] is an empty
# string, we were (probably) invoked with -m and the effective path
# is about to change on re-exec. Add the current directory to $PYTHONPATH
# to ensure that the new process sees the same path we did.
path_prefix = '.' + os.pathsep
if (sys.path[0] == '' and
not os.environ.get("PYTHONPATH", "").startswith(path_prefix)):
os.environ["PYTHONPATH"] = (path_prefix +
os.environ.get("PYTHONPATH", ""))
if sys.platform == 'win32':
# os.execv is broken on Windows and can't properly parse command line
# arguments and executable name if they contain whitespaces. subprocess
# fixes that behavior.
subprocess.Popen([sys.executable] + sys.argv)
sys.exit(0)
else:
try:
os.execv(sys.executable, [sys.executable] + sys.argv)
except OSError:
# Mac OS X versions prior to 10.6 do not support execv in
# a process that contains multiple threads. Instead of
# re-executing in the current process, start a new one
# and cause the current process to exit. This isn't
# ideal since the new process is detached from the parent
# terminal and thus cannot easily be killed with ctrl-C,
# but it's better than not being able to autoreload at
# all.
# Unfortunately the errno returned in this case does not
# appear to be consistent, so we can't easily check for
# this error specifically.
os.spawnv(os.P_NOWAIT, sys.executable,
[sys.executable] + sys.argv)
sys.exit(0)
_USAGE = """\
Usage:
python -m tornado.autoreload -m module.to.run [args...]
python -m tornado.autoreload path/to/script.py [args...]
"""
def main():
"""Command-line wrapper to re-run a script whenever its source changes.
Scripts may be specified by filename or module name::
python -m tornado.autoreload -m tornado.test.runtests
python -m tornado.autoreload tornado/test/runtests.py
Running a script with this wrapper is similar to calling
`tornado.autoreload.wait` at the end of the script, but this wrapper
can catch import-time problems like syntax errors that would otherwise
prevent the script from reaching its call to `wait`.
"""
original_argv = sys.argv
sys.argv = sys.argv[:]
if len(sys.argv) >= 3 and sys.argv[1] == "-m":
mode = "module"
module = sys.argv[2]
del sys.argv[1:3]
elif len(sys.argv) >= 2:
mode = "script"
script = sys.argv[1]
sys.argv = sys.argv[1:]
else:
print(_USAGE, file=sys.stderr)
sys.exit(1)
try:
if mode == "module":
import runpy
runpy.run_module(module, run_name="__main__", alter_sys=True)
elif mode == "script":
with open(script) as f:
global __file__
__file__ = script
# Use globals as our "locals" dictionary so that
# something that tries to import __main__ (e.g. the unittest
# module) will see the right things.
exec_in(f.read(), globals(), globals())
except SystemExit as e:
logging.basicConfig()
gen_log.info("Script exited with status %s", e.code)
except Exception as e:
logging.basicConfig()
gen_log.warning("Script exited with uncaught exception", exc_info=True)
# If an exception occurred at import time, the file with the error
# never made it into sys.modules and so we won't know to watch it.
# Just to make sure we've covered everything, walk the stack trace
# from the exception and watch every file.
for (filename, lineno, name, line) in traceback.extract_tb(sys.exc_info()[2]):
watch(filename)
if isinstance(e, SyntaxError):
# SyntaxErrors are special: their innermost stack frame is fake
# so extract_tb won't see it and we have to get the filename
# from the exception object.
watch(e.filename)
else:
logging.basicConfig()
gen_log.info("Script exited normally")
# restore sys.argv so subsequent executions will include autoreload
sys.argv = original_argv
if mode == 'module':
# runpy did a fake import of the module as __main__, but now it's
# no longer in sys.modules. Figure out where it is and watch it.
loader = pkgutil.get_loader(module)
if loader is not None:
watch(loader.get_filename())
wait()
if __name__ == "__main__":
# See also the other __main__ block at the top of the file, which modifies
# sys.path before our imports
main()