odoo 多数据库引起的问题

odoo本身支持多数据库还是挺方便的,再没有开启多worker的情况下,感觉还不错。随着业务量的增长,我们势必会要开启多核以发挥最大化的性能。然而,odoo的多核模式却还隐藏着若干个坑,现在我们要提到的就是多数据库模式的坑。

问题的起源

之前讲到过,我们的系统需要跟其他的系统进行对接,为了解耦,消息队列就成了一个必然的选择。在我们搭建对接完成之后,服务跑的还可以,以为万事大吉,可以不用再过多过问消息对队列的问题了。然而没想到的事,有一天有人说两个系统之间的数据不一致,根据日志发现,跑的好好的RabbitMQ,突然就断开了。具体的表现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2019-11-19 06:53:11,381 10002 INFO ? odoo.sql_db: ConnectionPool(used=0/count=0/max=64): Closed 4 connections to 'user=odoo password=xxx dbname=erp host=172.19.0.12 port=5432 sslmode=prefer' 
2019-11-19 06:53:11,399 10002 INFO ? odoo.sql_db: ConnectionPool(used=0/count=0/max=64): Closed 1 connections to 'user=odoo password=xxx dbname=erp-dev host=172.19.0.12 port=5432 sslmode=prefer'
2019-11-19 06:53:11,417 10002 INFO ? odoo.sql_db: ConnectionPool(used=0/count=0/max=64): Closed 1 connections to 'user=odoo password=xxx dbname=odoo host=172.19.0.12 port=5432 sslmode=prefer'
2019-11-19 06:53:11,434 10002 INFO ? odoo.sql_db: ConnectionPool(used=0/count=0/max=64): Closed 1 connections to 'user=odoo password=xxx dbname=demo host=172.19.0.12 port=5432 sslmode=prefer'
2019-11-19 06:53:16,247 10003 INFO erp odoo.addons.base.models.ir_cron: Starting job `Backup scheduler`.
2019-11-19 06:53:16,273 10003 INFO erp odoo.modules.registry: Invalidating all model caches after database signaling.
2019-11-19 06:53:16,280 10003 ERROR erp odoo.addons.base.models.ir_cron: Call from cron Backup scheduler for server action #762 failed in Job #24
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/odoo/addons/base/models/ir_cron.py", line 102, in _callback
--
ERROR: cursor already closed
2019-11-19 06:55:21,297 10002 INFO ? odoo.sql_db: ConnectionPool(used=0/count=0/max=64): Closed 5 connections to 'user=odoo password=xxx dbname=erp host=172.19.0.12 port=5432 sslmode=prefer'
2019-11-19 06:55:21,307 10002 ERROR erp odoo.addons.orssica_erp.models.consumer: Traceback (most recent call last):
File "/home/ubuntu/odoo/odoo-dev/custom/he_addons/orssica_erp/models/consumer.py", line 127, in handler_output
picking = picking_obj.create(picking_data)
File "<decorator-gen-394>", line 2, in create
File "/usr/lib/python3/dist-packages/odoo/api.py", line 430, in _model_create_single
return create(self, arg)
File "/home/ubuntu/odoo/odoo-dev/custom/he_addons/pos_retail/models/stock/stock_picking.py", line 15, in create
picking = super(stock_picking, self).create(vals)
File "<decorator-gen-194>", line 2, in create
--
psycopg2.InterfaceError: cursor already closed

从日志上分析,是原本在接收到消息之后要执行的后续动作因为数据库连接的关闭而被阻断了。为了查找连接被阻断的原因,我们查看了odoo相关的配置:

1
2
3
4
5
6
7
limit_time_cpu = 7200
limit_time_real = 7200
; limit_request = 200
workers = 4
limit_memory_hard = 1610612736
limit_memory_soft = 4294967296
; list_db = False

最开始以为是多核模式下的内存设置问题,后来经过几次调整,发现问题并未得到改善。显然也不是因为CPU时间问题,因为如果时CPU时间超时的错误提示并非直接关闭数据库。

源码分析

在几经搜索无果之后,只能去源码里一探究竟了。我们发现,报错之前总是伴随着数据库连接关闭。而数据库连接关闭的日志是INFO,显然这是odoo的主动行为而并非意外关闭。根据提示,我们定位到sql_db.py文件的相关代码:

1
2
3
4
5
6
7
8
9
10
11
@locked
def close_all(self, dsn=None):
count = 0
last = None
for i, (cnx, used) in tools.reverse_enumerate(self._connections):
if dsn is None or cnx._original_dsn == dsn:
cnx.close()
last = self._connections.pop(i)[0]
count += 1
_logger.info('%r: Closed %d connections %s', self, count,
(dsn and last and 'to %r' % last.dsn) or '')

然后顺藤摸瓜地找到了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
def process_work(self):
rpc_request = logging.getLogger('odoo.netsvc.rpc.request')
rpc_request_flag = rpc_request.isEnabledFor(logging.DEBUG)
_logger.debug("WorkerCron (%s) polling for jobs", self.pid)
db_names = self._db_list()
if len(db_names):
self.db_index = (self.db_index + 1) % len(db_names)
db_name = db_names[self.db_index]
self.setproctitle(db_name)
if rpc_request_flag:
start_time = time.time()
start_memory = memory_info(psutil.Process(os.getpid()))

from odoo.addons import base
base.models.ir_cron.ir_cron._acquire_job(db_name)

# dont keep cursors in multi database mode
if len(db_names) > 1:
odoo.sql_db.close_db(db_name)
if rpc_request_flag:
run_time = time.time() - start_time
end_memory = memory_info(psutil.Process(os.getpid()))
vms_diff = (end_memory - start_memory) / 1024
logline = '%s time:%.3fs mem: %sk -> %sk (diff: %sk)' % \
(db_name, run_time, start_memory / 1024, end_memory / 1024, vms_diff)
_logger.debug("WorkerCron (%s) %s", self.pid, logline)

self.request_count += 1
if self.request_count >= self.request_max and self.request_max < len(db_names):
_logger.error("There are more dabatases to process than allowed "
"by the `limit_request` configuration variable: %s more.",
len(db_names) - self.request_max)
else:
self.db_index = 0

从这段代码上来看,当odoo发现你是多数据库模式,新开一个worker的时候就会把之前的worker的数据库连接关掉。这就是造成,我们之前的数据库连接被莫名其妙干掉的原因。

知道了原因,解决方案就浮出水面了。既然多数据库模式下会出现这个问题,那么我们就把多余的数据库干掉就好了。

经验与教训

多数据库模式下,新开worker会关闭所有数据库连接。因此,我们在生产环境中最好不要建里多个数据库,如果要,也要用多个postgresql的用户使之独立,再结合odoo的db_filter参数,确保_db_list方法获取到的数据库只有一个就能避免这个问题。

你的支持我的动力