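Using SQLAlchemy with multiple threads: threadlocal

This article describes a concurrency problem hit when using SQLAlchemy in Tornado: a Session is not thread-safe. The fix is scoped_session combined with a thread-local, so that every thread gets its own Session. The article walks through how thread-locals work internally, then looks at using scoped_session and a problem it can cause: concurrent queries seeing inconsistent data. It closes with two ways to fix that: commit or close the old transaction so a new one starts, or close the old connection and query on a new one.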

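Problem

When using SQLAlchemy in Tornado, SQLAlchemy raised all kinds of errors whenever the API endpoints were called concurrently.

Solution

The cause is that a session created directly with SQLAlchemy's sessionmaker is not thread-safe. To make it thread-safe, the officially recommended approach is to wrap it in a scoped_session, a session bound to a scope: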

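# Excerpt from sqlalchemy/orm/scoping.py (SQLAlchemy 1.3), with the imports it relies on
from sqlalchemy import exc as sa_exc
from sqlalchemy.util import ScopedRegistry, ThreadLocalRegistry


class scoped_session(object):

    def __init__(self, session_factory, scopefunc=None):
        # session_factory is usually a sessionmaker(...) instance
        self.session_factory = session_factory

        if scopefunc:
            # custom scope: Sessions are keyed by whatever scopefunc() returns
            self.registry = ScopedRegistry(session_factory, scopefunc)
        else:
            # default: one Session per thread, stored in threading.local()
            self.registry = ThreadLocalRegistry(session_factory)

    def __call__(self, **kw):
        if kw:
            # constructor arguments are only accepted before this scope has a Session
            if self.registry.has():
                raise sa_exc.InvalidRequestError(
                    "Scoped session is already present; "
                    "no new arguments may be specified."
                )
            else:
                sess = self.session_factory(**kw)
                self.registry.set(sess)
                return sess
        else:
            # return this scope's Session, creating it on first use
            return self.registry()

    def remove(self):
        # close this scope's Session (releasing its connection), then forget it
        if self.registry.has():
            self.registry().close()
        self.registry.clear()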
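When a scopefunc is passed, scoped_session uses ScopedRegistry instead, which keys the stored Session by whatever scopefunc() returns rather than by thread. The sketch below is a simplified stand-in for that idea using only the standard library, so it runs without SQLAlchemy; DictScopedRegistry and the manually switched request id are hypothetical names for illustration, not SQLAlchemy API.

```python
class DictScopedRegistry:
    """Simplified sketch of a scopefunc-keyed registry (hypothetical, not SQLAlchemy's class)."""

    def __init__(self, createfunc, scopefunc):
        self.createfunc = createfunc
        self.scopefunc = scopefunc
        self.registry = {}  # scope key -> object

    def __call__(self):
        key = self.scopefunc()
        if key not in self.registry:
            # first call in this scope: create and remember the object
            self.registry[key] = self.createfunc()
        return self.registry[key]

    def clear(self):
        # forget the object of the current scope only
        self.registry.pop(self.scopefunc(), None)


# hypothetical scope: a "current request id" that the application switches by hand
current_request = {"id": 1}
registry = DictScopedRegistry(createfunc=object, scopefunc=lambda: current_request["id"])

s1 = registry()
same = registry() is s1        # same scope key -> same object
current_request["id"] = 2
s2 = registry()                # new scope key -> new object
print(same, s2 is not s1)      # True True
```

The scope key can be anything hashable, which is what lets a scoped session be bound to something other than the current thread.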

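scoped_session wraps the original sessionmaker and instantiates a ThreadLocalRegistry object. Calling scoped_session() goes through __call__ -> self.registry() -> ThreadLocalRegistry.__call__.

That method first tries to read value from the thread-local storage; if this raises AttributeError (the thread has no Session yet), it creates a Session with the sessionmaker and stores it in the thread-local variable. Later calls from the same thread get the same Session.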

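# Excerpt from sqlalchemy/util/_collections.py (SQLAlchemy 1.3);
# ScopedRegistry is defined earlier in the same module
import threading


class ThreadLocalRegistry(ScopedRegistry):
    """A :class:`.ScopedRegistry` that uses a ``threading.local()``
    variable for storage.

    """

    def __init__(self, createfunc):
        self.createfunc = createfunc
        # one attribute namespace per thread
        self.registry = threading.local()

    def __call__(self):
        try:
            # this thread already has an object: reuse it
            return self.registry.value
        except AttributeError:
            # first call in this thread: create the object and store it thread-locally
            val = self.registry.value = self.createfunc()
            return val

    def has(self):
        return hasattr(self.registry, "value")

    def set(self, obj):
        self.registry.value = obj

    def clear(self):
        try:
            del self.registry.value
        except AttributeError:
            pass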
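To see the per-thread behavior without installing SQLAlchemy, the sketch below copies the lookup logic of ThreadLocalRegistry.__call__ into a stdlib-only class and calls it twice from each of three threads. ThreadLocalRegistrySketch and FakeSession are hypothetical stand-ins for the real registry and Session.

```python
import threading


class ThreadLocalRegistrySketch:
    """Stdlib-only copy of ThreadLocalRegistry's lookup logic (hypothetical name)."""

    def __init__(self, createfunc):
        self.createfunc = createfunc
        self.registry = threading.local()

    def __call__(self):
        try:
            return self.registry.value
        except AttributeError:
            # first call in this thread: create the object and keep it thread-locally
            val = self.registry.value = self.createfunc()
            return val


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session."""


registry = ThreadLocalRegistrySketch(FakeSession)
results = {}


def worker(i):
    # two calls from the same thread
    results[i] = (registry(), registry())


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(a is b for a, b in results.values()))   # True: same object within a thread
print(len({id(a) for a, _ in results.values()}))  # 3: each thread got its own
```

This is why scoped_session() can be called anywhere during a request handled by one thread without passing the session around.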
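
# Excerpt from CPython's Lib/_threading_local.py, the pure-Python implementation of
# threading.local (CPython normally uses the C version, _thread._local).
# In the real module the threading import sits at the bottom to avoid a circular import.
from contextlib import contextmanager
from threading import current_thread, RLock
from weakref import ref


class _localimpl:
    """A class managing thread-local dicts"""
    __slots__ = 'key', 'dicts', 'localargs', 'locallock', '__weakref__'

    def __init__(self):
        # The key used in the Thread objects' attribute dicts.
        self.key = '_threading_local._localimpl.' + str(id(self))
        # { id(Thread) -> (ref(Thread), thread-local dict) }
        self.dicts = {}

    def get_dict(self):
        thread = current_thread()
        return self.dicts[id(thread)][1]

    def create_dict(self):
        """Create a new dict for the current thread, and return it."""
        localdict = {}
        key = self.key
        thread = current_thread()
        idt = id(thread)
        def local_deleted(_, key=key):
            # When the localimpl is deleted, remove the thread attribute.
            thread = wrthread()
            if thread is not None:
                del thread.__dict__[key]
        def thread_deleted(_, idt=idt):
            # When the thread is deleted, remove the local dict.
            # Note that this is suboptimal if the thread object gets
            # caught in a reference loop. We would like to be called
            # as soon as the OS-level thread ends instead.
            local = wrlocal()
            if local is not None:
                dct = local.dicts.pop(idt)
        wrlocal = ref(self, local_deleted)
        wrthread = ref(thread, thread_deleted)
        thread.__dict__[key] = wrlocal
        self.dicts[idt] = wrthread, localdict
        return localdict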


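@contextmanager
def _patch(self):
    impl = object.__getattribute__(self, '_local__impl')
    try:
        # this thread's dict already exists
        dct = impl.get_dict()
    except KeyError:
        # first access from this thread: create its dict and re-run __init__
        dct = impl.create_dict()
        args, kw = impl.localargs
        self.__init__(*args, **kw)
    with impl.locallock:
        # install this thread's dict as the object's __dict__, then run the attribute access
        object.__setattr__(self, '__dict__', dct)
        yield


class local: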
    __slots__ = '_local__impl', '__dict__'

    def __new__(cls, *args, **kw):
        if (args or kw) and (cls.__init__ is object.__init__):
            raise TypeError("Initialization arguments are not supported")
        self = object.__new__(cls)
        impl = _localimpl()
        impl.localargs = (args, kw)
        impl.locallock = RLock()
        object.__setattr__(self, '_local__impl', impl)
        # We need to create the thread dict in anticipation of
        # __init__ being called, to make sure we don't call it
        # again ourselves.
        impl.create_dict()
        return self

    def __getattribute__(self, name):
        with _patch(self):
            return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        if name == '__dict__':
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):
            return object.__setattr__(self, name, value)

    def __delattr__(self, name):
        if name == '__dict__':
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):
            return object.__delattr__(self, name)

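The core of threadlocal: local is a plain object class carrying one hidden attribute, _local__impl.

_local__impl holds a dict keyed by the id of current_thread(). While holding the lock, _patch installs the current thread's dict (dct) as the object's __dict__: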

with impl.locallock:
    object.__setattr__(self, '__dict__', dct)
    yield

Once the thread's dict has been swapped in, attribute lookups and assignments on the object operate on that thread's dict:

def __getattribute__(self, name):
    with _patch(self):
        return object.__getattribute__(self, name)

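In summary: a threadlocal object keeps _local__impl, which holds one dict per thread. Before __getattribute__ or __setattr__ touches an attribute, _patch looks up the dict for id(current_thread()), acquires the lock, installs that dict as the threadlocal object's __dict__, and only then performs the actual __getattribute__ or __setattr__.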
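The lookup-by-thread idea can be illustrated with a much simpler toy. This is not how CPython implements threading.local: instead of swapping __dict__ under a lock, ToyLocal (a hypothetical class) forwards attribute access to a dict keyed by threading.get_ident(), and it skips the weak-reference cleanup shown in the excerpt.

```python
import threading


class ToyLocal:
    """Toy thread-local object: attributes live in one dict per thread id.

    Simplification: entries are never removed, and thread ids can be reused once
    a thread exits, so a new thread could inherit stale values. CPython's version
    avoids this with weak references to the Thread object.
    """

    def __init__(self):
        # bypass our own __setattr__ so these live in the instance's real __dict__
        object.__setattr__(self, "_dicts", {})
        object.__setattr__(self, "_lock", threading.RLock())

    def _current(self):
        # like get_dict/create_dict: look up, or create, this thread's dict
        with self._lock:
            return self._dicts.setdefault(threading.get_ident(), {})

    def __getattr__(self, name):
        # only reached when normal lookup fails, i.e. for per-thread attributes
        try:
            return self._current()[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._current()[name] = value

    def __delattr__(self, name):
        try:
            del self._current()[name]
        except KeyError:
            raise AttributeError(name) from None


data = ToyLocal()
data.x = "main"
seen = []


def worker():
    seen.append(hasattr(data, "x"))  # this thread starts with an empty dict
    data.x = "worker"
    seen.append(data.x)


t = threading.Thread(target=worker)
t.start()
t.join()
print(seen, data.x)  # [False, 'worker'] main
```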

 

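Takeaways

1. There are two ways to use SQLAlchemy from multiple threads in Tornado (flask-sqlalchemy works on the same principle):

a. Obtain the session from the thread-local dynamically wherever it is needed

b. Obtain it from the thread-local in the handler and pass it down through every function call (not recommended)

2. contextmanager: a decorator that turns a generator into a with-statement context manager

3. Understand the design of threadlocal, so it can be applied to other per-thread context problems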
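A minimal sketch of points 1a and 2 together, assuming the standard-library sqlite3 module in place of SQLAlchemy: get_conn() fetches the current thread's connection on demand, so helpers need no connection parameter (unlike option 1b), and transaction() is a generator turned into a with-block by contextmanager. _local, get_conn, transaction and save are hypothetical names, and each thread gets its own in-memory database.

```python
import sqlite3
import threading
from contextlib import contextmanager

_local = threading.local()  # hypothetical module-level per-thread storage


def get_conn():
    """Pattern 1a: every call site fetches the current thread's connection itself."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # sqlite3 connections may only be used by the thread that created them
        conn = _local.conn = sqlite3.connect(":memory:")
    return conn


@contextmanager
def transaction():
    """A generator turned into a with-block: commit on success, roll back on error."""
    conn = get_conn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def save(value):
    # no connection argument passed down the call chain
    with transaction() as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")
        conn.execute("INSERT INTO t VALUES (?)", (value,))


save("a")
try:
    with transaction() as conn:
        conn.execute("INSERT INTO t VALUES ('b')")
        raise RuntimeError("simulated failure")  # triggers the rollback branch
except RuntimeError:
    pass
rows = get_conn().execute("SELECT COUNT(*) FROM t").fetchone()[0]

main_conn = get_conn()
other = []


def worker():
    conn = get_conn()  # a different thread gets a different connection
    other.append(conn is not main_conn)
    conn.close()


w = threading.Thread(target=worker)
w.start()
w.join()
print(rows, other)  # 1 [True]
```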

-----------------------------------------------------------------------------------------------------------------------------

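After switching to scoped_session, under concurrency one session modifies data but another session still reads the old data. Does SQLAlchemy have a query cache?

It is not SQLAlchemy caching the data; the problem is in the scoped_session workflow. The official docs recommend the following: https://docs.sqlalchemy.org/en/13/orm/contextual.html?highlight=scoped#sqlalchemy.orm.scoping.scoped_session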

 The scoped_session.remove() method first calls Session.close() on the current Session, which has the effect of releasing any connection/transactional resources owned by the Session first, then discarding the Session itself. “Releasing” here means that connections are returned to their connection pool and any transactional state is rolled back, ultimately using the rollback() method of the underlying DBAPI connection.

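In other words, remove() calls session.close(), which releases the connection and transaction resources the Session holds, and then discards the Session itself. "Releasing" means the connection is returned to the connection pool and any transactional state is rolled back, ultimately through the rollback() method of the underlying DBAPI connection.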
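The recommended per-request workflow is: get the Session from the registry, use it, and call remove() when the request ends (in Tornado, one option would be a handler's on_finish()). The stdlib-only sketch below mimics only the __call__/remove logic of scoped_session; MiniScopedSession, FakeSession and handle_request are hypothetical stand-ins, not SQLAlchemy classes.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a Session; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class MiniScopedSession:
    """Stripped-down copy of scoped_session's __call__/remove logic (thread-local scope)."""

    def __init__(self, factory):
        self.factory = factory
        self.local = threading.local()

    def __call__(self):
        if not hasattr(self.local, "value"):
            self.local.value = self.factory()
        return self.local.value

    def remove(self):
        if hasattr(self.local, "value"):
            self.local.value.close()  # release resources first
            del self.local.value      # then forget the session


Session = MiniScopedSession(FakeSession)


def handle_request():
    """Hypothetical request handler: use the session, then always remove() it."""
    session = Session()
    try:
        return session  # real code would query and commit here
    finally:
        Session.remove()  # the next request in this thread starts with a new session


first = handle_request()
second = handle_request()
print(first.closed, second.closed, first is second)  # True True False
```

Because remove() discards the Session, the next request gets a new one, and with it a new transaction.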

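That is the recommended usage, but what is the underlying cause?

Sessions A and B have both been used before. A commits a change, yet B cannot see it, while a newly started session does see the change.

When session B is created, and again right after each commit, SQLAlchemy automatically calls begin() to start a new transaction. Transactions run at the database's default isolation level; for MySQL (InnoDB) that is REPEATABLE READ, where a transaction's consistent-read snapshot is taken at its first read. It is this automatically started transaction, combined with repeatable-read isolation, that keeps session B reading old data: once B has read inside a transaction, changes committed elsewhere stay invisible to it until that transaction ends.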

The key code (abridged from SQLAlchemy 1.3):

# Abridged from sqlalchemy/orm/session.py (SQLAlchemy 1.3); indentation fixed
class Session(_SessionClassMethods):
    def __init__(self):  # the real signature has many more parameters
        # with autocommit=False (the default) a transaction is begun immediately
        if not self.autocommit:
            self.begin()

    def commit(self):
        if self.transaction is None:
            if not self.autocommit:
                self.begin()
            else:
                raise sa_exc.InvalidRequestError("No transaction is begun.")

        self.transaction.commit()


class SessionTransaction(object):
    def commit(self):
        self._assert_active(prepared_ok=True)
        if self._state is not PREPARED:
            self._prepare_impl()

        if self._parent is None or self.nested:
            # commit the DBAPI transaction on every connection in use
            for t in set(self._connections.values()):
                t[1].commit()

            self._state = COMMITTED
            self.session.dispatch.after_commit(self.session)

            if self.session._enable_transaction_accounting:
                self._remove_snapshot()

        self.close()
        return self._parent

    def close(self, invalidate=False):
        self.session.transaction = self._parent
        if self._parent is None:
            # release the connections (or close their connection-level transactions)
            for connection, transaction, autoclose in set(
                self._connections.values()
            ):
                if invalidate:
                    connection.invalidate()
                if autoclose:
                    connection.close()
                else:
                    transaction.close()

        self._state = CLOSED
        self.session.dispatch.after_transaction_end(self.session, self)

        if self._parent is None:
            # key point: a new transaction is started right away
            if not self.session.autocommit:
                self.session.begin()
        self.session = None
        self._connections = None

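So there are two solutions:

1. Before using session B, commit to end its previous transaction; the new transaction starts at a later point in time and sees the latest changes.

2. Have session B give up its old connection and query on a new one. SQLAlchemy recommends scoped_session.remove(); with SQL logging enabled (debug mode), it logs a line like this: 2020-07-07 15:05:48,118 INFO sqlalchemy.engine.base.Engine ROLLBACK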
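Both fixes can be reproduced without a MySQL server. The sketch below swaps in SQLite in WAL mode, where a read transaction also keeps the snapshot taken at its first read, similar in effect to InnoDB's REPEATABLE READ; raw sqlite3 connections stand in for sessions A and B. It illustrates the isolation behavior only, it is not SQLAlchemy code, and read_v and stale_read_demo are hypothetical helpers.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def read_v(conn):
    # fetchall() runs the SELECT to completion so no statement is left pending
    return conn.execute("SELECT v FROM t WHERE id = 1").fetchall()[0][0]


def stale_read_demo():
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit
        with closing(sqlite3.connect(path, isolation_level=None)) as a:
            a.execute("PRAGMA journal_mode=WAL")  # readers keep a stable snapshot
            a.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
            a.execute("INSERT INTO t VALUES (1, 1)")

            with closing(sqlite3.connect(path, isolation_level=None)) as b:
                b.execute("BEGIN")       # plays the transaction a Session opens by itself
                seen.append(read_v(b))   # first read: snapshot taken here -> 1

                a.execute("UPDATE t SET v = 2 WHERE id = 1")  # "session A" commits

                seen.append(read_v(b))   # same transaction, old snapshot -> still 1

                # fix 2: a brand-new connection starts from a fresh snapshot
                with closing(sqlite3.connect(path, isolation_level=None)) as c:
                    seen.append(read_v(c))  # -> 2

                # fix 1: end B's old transaction, then read again
                b.execute("COMMIT")
                seen.append(read_v(b))   # -> 2
    return seen


print(stale_read_demo())  # [1, 1, 2, 2]
```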
