使用python-cassandra遇到的一个问题

在测试一个从Cassandra集群获取数据并生成配置文件的Python程序时,遇到运行时错误。程序在运维环境报错,原因是程序在部分连接建立时就开始执行,导致在关闭时仍有未完成的连接尝试。通过对`cassandra.cluster`模块的深入研究,发现`Cluster`构造方法中的`wait_for_all_pools_to_connect`参数能解决此问题,将其设置为True,确保所有连接建立完成后再执行程序,问题得到解决。

今天和运维在一起测试我写的一个程序时,在日志中发现了一个error,然而我写完程序后自测时是OK的,想起测试之前我拍着胸脯打包票说“我的程序肯定没问题”时,红扑扑的小脸没地儿搁了。

有问题那就解决!

先说下程序背景,我的程序的目的是从Cassandra集群中取数据,然后根据取得的数据生成一个配置文件供别的程序使用。程序结构还蛮简单的,用Python来写。

Cassandra集群有6台,这里就用编号1-6表示。连接的代码如下:


# 设置登陆Cassandra的用户、密码
auth = PlainTextAuthProvider(_config['username'], _config['password'])

cluster = Cluster(_config['addresses'], auth_provider = auth)

session = cluster.connect(_config['keyspace'])

具体使用cql语句的时候就像session.execute(cql)这样子使用。

好了,以上代码在自己的测试环境中跑毫无问题。

but,运维跑得时候就有问题了,日志如下:(这里只贴了关键日志)

2016-12-13 15:26:42,725 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 1 is now marked up
2016-12-13 15:26:42,725 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 2 is now marked up
2016-12-13 15:26:42,725 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 3 is now marked up
2016-12-13 15:26:42,725 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 4 is now marked up
2016-12-13 15:26:42,726 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 5 is now marked up
2016-12-13 15:26:42,725 - cassandra.pool - DEBUG - 76 - create_file_for_mbw : Host 6 is now marked up
2016-12-13 15:26:42,970 - cassandra.cluster - DEBUG - 76 - create_file_for_mbw : [control connection] Finished fetching ring info
######  到这里是收集完成Cassandra集群信息

2016-12-13 15:26:42,970 - cassandra.cluster - DEBUG - 76 - create_file_for_mbw : [control connection] Rebuilding token map due to topology changes
2016-12-13 15:26:43,077 - cassandra.cluster - DEBUG - 76 - create_file_for_mbw : Control connection created

######  开始初始化与集群中4、5Cassandra数据库的连接
2016-12-13 15:26:43,078 - cassandra.pool - DEBUG - 55 - thread : Initializing connection for host 5
2016-12-13 15:26:43,080 - cassandra.pool - DEBUG - 55 - thread : Initializing connection for host 4

######   在连接4、5Cassandra的时候完成用户认证
2016-12-13 15:26:43,253 - cassandra.connection - DEBUG - 378 - asyncorereactor : Connection <AsyncoreConnection(140278113775440) 10.1.5.205:9042> successfully authenticated
2016-12-13 15:26:43,265 - cassandra.connection - DEBUG - 378 - asyncorereactor : Connection <AsyncoreConnection(140278096170704) 10.1.5.204:9042> successfully authenticated

######   完成与4、5Cassandra的连接
2016-12-13 15:26:43,302 - cassandra.pool - DEBUG - 55 - thread : Finished initializing connection for host 5
2016-12-13 15:26:43,303 - cassandra.cluster - DEBUG - 55 - thread : Added pool for host 5 to session
2016-12-13 15:26:43,304 - cassandra.pool - DEBUG - 55 - thread : Finished initializing connection for host 4
2016-12-13 15:26:43,305 - cassandra.cluster - DEBUG - 55 - thread : Added pool for host 4 to session

######  准备连接6、1
2016-12-13 15:26:43,303 - cassandra.pool - DEBUG - 55 - thread : Initializing connection for host 6
2016-12-13 15:26:43,306 - cassandra.pool - DEBUG - 55 - thread : Initializing connection for host 1

######   我的程序执行日志。。。在这里会使用通过cluster.connect()连接后的session来执行cql语句。

######   我的程序执行完毕

######   关闭与集群的连接
2016-12-13 15:26:43,494 - cassandra.cluster - DEBUG - 24 - atexit : Shutting down Cluster Scheduler
2016-12-13 15:26:43,495 - cassandra.cluster - DEBUG - 24 - atexit : Shutting down control connection

######   关闭1、5、4、6的连接
2016-12-13 15:26:43,495 - cassandra.io.asyncorereactor - DEBUG - 316 - asyncorereactor : Closing connection (140278113772368) to 1
2016-12-13 15:26:43,495 - cassandra.io.asyncorereactor - DEBUG - 320 - asyncorereactor : Closed socket to 1
2016-12-13 15:26:43,496 - cassandra.io.asyncorereactor - DEBUG - 316 - asyncorereactor : Closing connection (140278113775440) to 5
2016-12-13 15:26:43,496 - cassandra.io.asyncorereactor - DEBUG - 320 - asyncorereactor : Closed socket to 5
2016-12-13 15:26:43,496 - cassandra.io.asyncorereactor - DEBUG - 316 - asyncorereactor : Closing connection (140278096170704) to 4
2016-12-13 15:26:43,497 - cassandra.io.asyncorereactor - DEBUG - 320 - asyncorereactor : Closed socket to 4
2016-12-13 15:26:48,306 - cassandra.io.asyncorereactor - DEBUG - 316 - asyncorereactor : Closing connection (140278095583888) to 6
2016-12-13 15:26:48,306 - cassandra.io.asyncorereactor - DEBUG - 320 - asyncorereactor : Closed socket to 6

###### 连接到6,在进行用户认证的时候报错,连接失败,OperationTimedOut
2016-12-13 15:26:48,306 - cassandra.connection - DEBUG - 324 - asyncorereactor : Connection to 6 was closed during the authentication process
2016-12-13 15:26:48,307 - cassandra.cluster - WARNING - 55 - thread : Failed to create connection pool for new host 6:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 2300, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:40585)
    new_pool = HostConnection(host, distance, self)
  File "cassandra/pool.py", line 329, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6245)
    self._connection = session.cluster.connection_factory(host.address)
  File "cassandra/cluster.py", line 1119, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:15085)
    return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
  File "cassandra/connection.py", line 333, in cassandra.connection.Connection.factory (cassandra/connection.c:5790)
    raise OperationTimedOut("Timed out creating connection (%s seconds)" % timeout)
OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None

######   准备连接1
2016-12-13 15:26:48,311 - cassandra.pool - DEBUG - 55 - thread : Initializing connection for host 3

######   连接到1,进行用户认证的时候报错,连接失败,OperationTimedOut
2016-12-13 15:26:48,312 - cassandra.connection - DEBUG - 324 - asyncorereactor : Connection to 1 was closed during the authentication process
2016-12-13 15:26:48,313 - cassandra.cluster - WARNING - 55 - thread : Failed to create connection pool for new host 1:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 2300, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:40585)
    new_pool = HostConnection(host, distance, self)
  File "cassandra/pool.py", line 329, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6245)
    self._connection = session.cluster.connection_factory(host.address)
  File "cassandra/cluster.py", line 1119, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:15085)
    return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
  File "cassandra/connection.py", line 333, in cassandra.connection.Connection.factory (cassandra/connection.c:5790)
    raise OperationTimedOut("Timed out creating connection (%s seconds)" % timeout)
OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None

根据以上的注释,看到这里应该很明白了,在拿到了4、5的连接,还没等所有连接完成,就开始执行我的程序了,我的程序执行完毕后cluster就开始shutdown了,而这时剩下的4个连接还没有连接完成,会继续走他的流程,去连接Cassandra,但是连接已经关闭,所以就报如上错误了。

对于这种问题,第一直觉应该是有配置项的,翻了源码(cluster__init__方法中的配置项),找了文档,没找到相关的,苦恼不已。最后在看到clusterconnect()方法时,看到这个方法是这样定义的:

def connect(self, keyspace=None, wait_for_all_pools=False)

wait_for_all_pools是什么鬼?往下继续翻。

session = self._new_session(keyspace)

if wait_for_all_pools:

    wait_futures(session._initial_connect_futures)
return session

感觉有戏了!

继续找wait_futures方法,在开头看到了

from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait as wait_futures

最后在concurrent.futures._base中找到了

def wait(fs, timeout=None, return_when=ALL_COMPLETED)

而这个方法的介绍是 **Wait for the futures in the given sequence to complete.
** 。恩,让我脸红的就是你了。

culster.connect()方法加上参数wait_for_all_pools=True,再让运维试了下,OK了!

wait_for_all_pools=True的设置是让集群中所有连接都连上后再返回一个session,而默认的做法是客户端只要与集群中有连接便返回session

心情舒畅~~

version: '3.8' services: cassandra-1: image: rdsource.tp-link.com:8088/cassandra:4.1.7 container_name: cassandra-1 environment: - CASSANDRA_CLUSTER_NAME=TestCluster - CASSANDRA_DC=DC1 - CASSANDRA_SEEDS=cassandra-1, cassandra-2 - CASSANDRA_LISTEN_ADDRESS=cassandra-1 - CASSANDRA_RPC_ADDRESS=0.0.0.0 - CASSANDRA_BROADCAST_RPC_ADDRESS=cassandra-1 volumes: - ./cassandra-data/node1:/var/lib/cassandra/data networks: - cassandra-net cassandra-2: image: rdsource.tp-link.com:8088/cassandra:4.1.7 container_name: cassandra-2 environment: - CASSANDRA_CLUSTER_NAME=TestCluster - CASSANDRA_DC=DC1 - CASSANDRA_SEEDS=cassandra-1, cassandra-2 - CASSANDRA_LISTEN_ADDRESS=cassandra-2 - CASSANDRA_RPC_ADDRESS=0.0.0.0 - CASSANDRA_BROADCAST_RPC_ADDRESS=cassandra-2 depends_on: - cassandra-1 volumes: - ./cassandra-data/node2:/var/lib/cassandra/data networks: - cassandra-net cassandra-3: image: rdsource.tp-link.com:8088/cassandra:4.1.7 container_name: cassandra-3 environment: - CASSANDRA_CLUSTER_NAME=TestCluster - CASSANDRA_DC=DC1 - CASSANDRA_SEEDS=cassandra-1, cassandra-1 - CASSANDRA_LISTEN_ADDRESS=cassandra-3 - CASSANDRA_RPC_ADDRESS=0.0.0.0 - CASSANDRA_BROADCAST_RPC_ADDRESS=cassandra-3 depends_on: - cassandra-1 volumes: - ./cassandra-data/node3:/var/lib/cassandra/data networks: - cassandra-net networks: cassandra-net: driver: bridge yz@yz-virtual-machine:/mnt/hgfs/code-docker/docker-cassandra$ docker-compose up -d Recreating cassandra-1 ... ERROR: for cassandra-1 'ContainerConfig' ERROR: for cassandra-1 'ContainerConfig' Traceback (most recent call last): File "/usr/bin/docker-compose", line 33, in <module> sys.exit(load_entry_point('docker-compose==1.29.2', 'console_scripts', 'docker-compose')()) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 81, in main command_func() File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 203, in perform_command handler(command, command_options) File "/usr/lib/python3/dist-packages/compose/metrics/decorator.py", line 18, in wrapper result = fn(*args, **kwargs) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 1186, in up to_attach = up(False) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 1166, in up return self.project.up( File "/usr/lib/python3/dist-packages/compose/project.py", line 697, in up results, errors = parallel.parallel_execute( File "/usr/lib/python3/dist-packages/compose/parallel.py", line 108, in parallel_execute raise error_to_reraise File "/usr/lib/python3/dist-packages/compose/parallel.py", line 206, in producer result = func(obj) File "/usr/lib/python3/dist-packages/compose/project.py", line 679, in do return service.execute_convergence_plan( File "/usr/lib/python3/dist-packages/compose/service.py", line 579, in execute_convergence_plan return self._execute_convergence_recreate( File "/usr/lib/python3/dist-packages/compose/service.py", line 499, in _execute_convergence_recreate containers, errors = parallel_execute( File "/usr/lib/python3/dist-packages/compose/parallel.py", line 108, in parallel_execute raise error_to_reraise File "/usr/lib/python3/dist-packages/compose/parallel.py", line 206, in producer result = func(obj) File "/usr/lib/python3/dist-packages/compose/service.py", line 494, in recreate return self.recreate_container( File "/usr/lib/python3/dist-packages/compose/service.py", line 612, in recreate_container new_container = self.create_container( File "/usr/lib/python3/dist-packages/compose/service.py", line 330, in create_container container_options = self._get_container_create_options( File "/usr/lib/python3/dist-packages/compose/service.py", line 921, in _get_container_create_options container_options, override_options = self._build_container_volume_options( File "/usr/lib/python3/dist-packages/compose/service.py", line 960, in _build_container_volume_options binds, affinity = merge_volume_bindings( File "/usr/lib/python3/dist-packages/compose/service.py", line 1548, in merge_volume_bindings old_volumes, old_mounts = get_container_data_volumes( File "/usr/lib/python3/dist-packages/compose/service.py", line 1579, in get_container_data_volumes container.image_config['ContainerConfig'].get('Volumes') or {} KeyError: 'ContainerConfig' Error in sys.excepthook: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 155, in apport_excepthook pr.write(f) File "/usr/lib/python3/dist-packages/problem_report.py", line 401, in write file.write(v.replace(b'\n', b'\n ')) OSError: [Errno 28] No space left on device During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook with os.fdopen(os.open(pr_filename, OSError: [Errno 28] No space left on device Original exception was: Traceback (most recent call last): File "/usr/bin/docker-compose", line 33, in <module> sys.exit(load_entry_point('docker-compose==1.29.2', 'console_scripts', 'docker-compose')()) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 81, in main command_func() File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 203, in perform_command handler(command, command_options) File "/usr/lib/python3/dist-packages/compose/metrics/decorator.py", line 18, in wrapper result = fn(*args, **kwargs) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 1186, in up to_attach = up(False) File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 1166, in up return self.project.up( File "/usr/lib/python3/dist-packages/compose/project.py", line 697, in up results, errors = parallel.parallel_execute( File "/usr/lib/python3/dist-packages/compose/parallel.py", line 108, in parallel_execute raise error_to_reraise File "/usr/lib/python3/dist-packages/compose/parallel.py", line 206, in producer result = func(obj) File "/usr/lib/python3/dist-packages/compose/project.py", line 679, in do return service.execute_convergence_plan( File "/usr/lib/python3/dist-packages/compose/service.py", line 579, in execute_convergence_plan return self._execute_convergence_recreate( File "/usr/lib/python3/dist-packages/compose/service.py", line 499, in _execute_convergence_recreate containers, errors = parallel_execute( File "/usr/lib/python3/dist-packages/compose/parallel.py", line 108, in parallel_execute raise error_to_reraise File "/usr/lib/python3/dist-packages/compose/parallel.py", line 206, in producer result = func(obj) File "/usr/lib/python3/dist-packages/compose/service.py", line 494, in recreate return self.recreate_container( File "/usr/lib/python3/dist-packages/compose/service.py", line 612, in recreate_container new_container = self.create_container( File "/usr/lib/python3/dist-packages/compose/service.py", line 330, in create_container container_options = self._get_container_create_options( File "/usr/lib/python3/dist-packages/compose/service.py", line 921, in _get_container_create_options container_options, override_options = self._build_container_volume_options( File "/usr/lib/python3/dist-packages/compose/service.py", line 960, in _build_container_volume_options binds, affinity = merge_volume_bindings( File "/usr/lib/python3/dist-packages/compose/service.py", line 1548, in merge_volume_bindings old_volumes, old_mounts = get_container_data_volumes( File "/usr/lib/python3/dist-packages/compose/service.py", line 1579, in get_container_data_volumes container.image_config['ContainerConfig'].get('Volumes') or {} KeyError: 'ContainerConfig'
08-30
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值