CheckPoint Not Executing Automatically [TimesTen Operations Basics]

CheckPoint not executing automatically:
A customer called today to report that one of their data stores had strange CheckPoint history timestamps, and that its transaction logs were never being purged.
1. Checked the transaction log holds. They did look odd: the logs were being held by the CheckPoint files, yet there was no active/standby replication and no long-running transaction involved.
Command> call ttlogholds;
< 11302, 54794696, Checkpoint                   , ocstt.ds0 >
< 11831, 3753696, Checkpoint                    , ocstt.ds1 >
2 rows found.
2. Checked the CheckPoint history:
Command> select sysdate from dual;
< 2014-05-19 17:28:54 >
1 row found.
Command> call ttckpthistory;
< 2014-06-09 09:40:32.625312, 2014-06-09 09:40:33.600128, Fuzzy           , Completed       , Checkpointer    , <NULL>, 0, 11032, 54794784, 94, 629145600, 80, 5586352, 40, 2354992, 2346248, <NULL> >
< 2014-06-09 09:30:32.410709, 2014-06-09 09:30:32.531441, Fuzzy           , Completed       , Checkpointer    , <NULL>, 1, 11032, 13583360, 94, 629145600, 80, 5586352, 40, 2354992, 2346248, <NULL> >
< 2014-06-06 16:16:18.869326, 2014-06-06 16:16:19.013980, Fuzzy           , Completed       , Checkpointer    , <NULL>, 0, 10864, 12890096, 94, 629145600, 80, 5586352, 35, 2198336, 2243848, <NULL> >
< 2014-06-06 16:06:18.671965, 2014-06-06 16:06:18.853700, Fuzzy           , Completed       , Checkpointer    , <NULL>, 1, 10863, 50556048, 91, 629145600, 74, 5296896, 27, 1934448, 1879304, <NULL> >
< 2014-06-06 15:56:18.577550, 2014-06-06 15:56:18.659042, Fuzzy           , Completed       , Checkpointer    , <NULL>, 0, 10863, 21179784, 91, 629145600, 76, 5504656, 25, 1689048, 1867016, <NULL> >
< 2014-06-06 15:46:18.444551, 2014-06-06 15:46:18.564260, Fuzzy           , Completed       , Checkpointer    , <NULL>, 1, 10862, 58908472, 91, 629145600, 76, 5504656, 33, 2225416, 2215176, <NULL> >
< 2014-06-06 15:36:18.280709, 2014-06-06 15:36:18.431814, Fuzzy           , Completed       , Checkpointer    , <NULL>, 0, 10862, 29515448, 91, 629145600, 76, 5504656, 33, 2225416, 2215176, <NULL> >
< 2014-05-18 10:22:21.430088, 2014-05-18 10:22:23.745732, Static          , Completed       , Subdaemon       , <NULL>, 1, 11831, 3753784, 775, 629145600, 763, 53787560, 775, 629145600, 53789960, <NULL> >
8 rows found.
The first few rows of the CheckPoint history are dated 2014-06-09 and 2014-06-06, but today is only 2014-05-19, so no wonder CheckPoint never executed automatically: the recorded history is in the future.
3. Checked the settings in sys.odbc.ini and the data store configuration (also viewed via the ttConfiguration built-in procedure). All CheckPoint settings were normal. The customer confirmed that the operating system clock had been adjusted.
Suspected the customer's OS clock change as the cause, and finally confirmed it via MetaLink document ID 1379020.1 (Checkpointing Not Occurring).

4. Manually executed CheckPoint twice. Both checkpoints completed normally and the transaction logs were purged, but automatic CheckPoint still did not resume at the next interval. It will not run automatically again until the operating system time passes the latest timestamp in CkptHistory.
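The manual checkpoints in step 4 can be issued from ttIsql; ttCkpt requests a fuzzy checkpoint and ttCkptBlocking a transaction-consistent one. This is a session sketch for illustration, not output from the customer's system:

```sql
Command> call ttCkpt;          -- request a fuzzy checkpoint
Command> call ttCkptBlocking;  -- or a blocking (transaction-consistent) checkpoint
Command> call ttCkptHistory;   -- verify the new entries appear in the history
```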
5. Solutions:
There are three options:
a) Export the data with ttBulkCp or ttMigrate, destroy and recreate the DSN, then import the data back.

b) Schedule a crontab job that executes CheckPoint manually at a fixed interval.

c) Change the CheckPoint trigger from a time interval to accumulated transaction log volume (e.g. call ttCkptConfig(0,1000,0); ).
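Option b) can be sketched as a cron entry. The ttIsql path and DSN name ("ocstt") below are assumptions for illustration, not values confirmed from the customer's system:

```shell
# Hypothetical crontab entry: issue a manual checkpoint every 10 minutes.
# Adjust the ttIsql path and DSN ("ocstt") for the actual TimesTen instance.
*/10 * * * * /opt/TimesTen/tt1122/bin/ttIsql -e "call ttCkpt; quit;" ocstt >> /var/log/ttckpt.log 2>&1
```

This is a config fragment; cron runs it with the instance owner's environment, so the TimesTen environment variables must be available to the job.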

----------------------------------End---------------------------------------------
Reference document:

Checkpointing Not Occurring (Doc ID 1379020.1)

Applies to:
TimesTen Data Server - Version 7.0.5.0.0 to 11.2.1 [Release 7.0 to 11.2]
Information in this document applies to any platform.
This problem could potentially occur in any TimesTen data store.


Symptoms

-Customer reported that automatic checkpointing was not occurring in a production application. All attempts to restart checkpointing failed.

-The problem occurred in 2 different data stores on the same server. It had previously been observed by another customer, where it occurred on 8 different data stores running on the same server.

-The customer was using time-interval based checkpointing, which is the default TimesTen checkpoint configuration. The default is for the checkpointer to execute a checkpoint once every 600 seconds (10 minutes).
Changes

No changes were specifically made to either of the data stores themselves. However, it turned out that a field engineer had been making changes to the system clock while TimesTen was operational.
Cause


Checkpoint histories from both nodes showed a checkpoint entry 8 years in the future. Both data stores had a checkpoint history showing that a checkpoint was performed on 11-Nov-2019:

/* NODE_1 */

Command> call ttCkptHistory;
< 2019-11-11 17:03:00.711478, 2019-11-11 17:03:23.156722, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746589, 30757968, 212794, 3221225472, 210492, 2312255864, 212794, 3221225472, 2315922824, <NULL> >
< 2011-11-17 01:19:31.976204, 2011-11-17 01:19:32.489286, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746498, 3216136, 212794, 3221225472, 210478, 2311956160, 28, 1373176, 1224072, <NULL> >
< 2011-11-17 01:14:31.758497, 2011-11-17 01:14:54.678123, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746498, 3215984, 212794, 3221225472, 210478, 2311956160, 104516, 941230496, 1181552008, <NULL> >
< 2011-11-17 01:09:31.809785, 2011-11-17 01:09:56.464715, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746498, 3148336, 212794, 3221225472, 210492, 2312255864, 119230, 1106716584, 1348410760, <NULL> >
< 2011-11-17 01:04:31.866130, 2011-11-17 01:04:56.486027, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746493, 8825888, 212794, 3221225472, 210492, 2312255864, 122153, 1135240656, 1373654408, <NULL> >
< 2011-11-17 00:59:31.964454, 2011-11-17 00:59:56.541392, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746487, 61057424, 212794, 3221225472, 210492, 2312255864, 124353, 1161729472, 1404247432, <NULL> >
< 2011-11-17 00:54:31.498796, 2011-11-17 00:54:58.654976, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746482, 38247720, 212794, 3221225472, 210492, 2312255864, 153016, 1407583008, 1602108808, <NULL> >
< 2011-11-17 22:40:01.549312, 2011-11-17 22:40:25.057321, Fuzzy , Completed , User , <NULL>, 1, 749263, 14956640, 212793, 3221225472, 210490, 2312125496, 193103, 1889295352, 1961270664, <NULL> >
8 rows found.


/* NODE_2 */

Command> call ttCkptHistory;
< 2019-11-11 17:03:00.711478, 2019-11-11 17:03:23.156722, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746589, 30757968, 212794, 3221225472, 210492, 2312255864, 212794, 3221225472, 2315922824, <NULL> >
< 2011-11-17 01:19:31.976204, 2011-11-17 01:19:32.489286, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746498, 3216136, 212794, 3221225472, 210478, 2311956160, 28, 1373176, 1224072, <NULL> >
< 2011-11-17 01:14:31.758497, 2011-11-17 01:14:54.678123, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746498, 3215984, 212794, 3221225472, 210478, 2311956160, 104516, 941230496, 1181552008, <NULL> >
< 2011-11-17 01:09:31.809785, 2011-11-17 01:09:56.464715, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746498, 3148336, 212794, 3221225472, 210492, 2312255864, 119230, 1106716584, 1348410760, <NULL> >
< 2011-11-17 01:04:31.866130, 2011-11-17 01:04:56.486027, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746493, 8825888, 212794, 3221225472, 210492, 2312255864, 122153, 1135240656, 1373654408, <NULL> >
< 2011-11-17 00:59:31.964454, 2011-11-17 00:59:56.541392, Fuzzy , Completed , Checkpointer , <NULL>, 0, 746487, 61057424, 212794, 3221225472, 210492, 2312255864, 124353, 1161729472, 1404247432, <NULL> >
< 2011-11-17 00:54:31.498796, 2011-11-17 00:54:58.654976, Fuzzy , Completed , Checkpointer , <NULL>, 1, 746482, 38247720, 212794, 3221225472, 210492, 2312255864, 153016, 1407583008, 1602108808, <NULL> >
< 2011-11-17 22:40:01.777759, 2011-11-17 22:40:28.797874, Fuzzy , Completed , User , <NULL>, 0, 749261, 15723336, 213130, 3221225472, 210827, 2315085008, 195219, 2827231536, 1997065608, <NULL> >
8 rows found.



The customer subsequently determined that changes had been made to the server's system clock, which resulted in a checkpoint being registered as having been performed on Nov 11, 2019, i.e., 8 years in the future.

Because a checkpoint with a date 8 years in the future resides in the checkpoint history structure, the next time-interval based checkpoint will not occur until <ckpt time interval> after Nov 11, 2019; in this case, no automatic checkpoint will happen until about 17:09 on Nov 11, 2019. Because of the logic used to update the internal checkpoint history structure, the bad checkpoint date will not be flushed out until 8 checkpoints with timestamps later than it have been performed. So unless the customer chooses to rebuild the data stores, they will have to operate for the next 8 years with a corrupted checkpoint history structure in the data store header.
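The scheduling arithmetic can be illustrated with a small sketch. This is a deliberate simplification of the checkpointer's real logic, using the 600-second default interval from the Symptoms section:

```python
from datetime import datetime, timedelta

def next_auto_ckpt(last_ckpt, interval_secs=600):
    """A time-interval checkpointer fires only once the wall clock
    passes: last recorded checkpoint time + interval."""
    return last_ckpt + timedelta(seconds=interval_secs)

# Corrupted history: the latest "checkpoint" is 8 years in the future.
last_ckpt = datetime(2019, 11, 11, 17, 3, 0)
now = datetime(2011, 11, 17, 22, 40, 0)   # actual wall-clock time

due = next_auto_ckpt(last_ckpt)
print(now >= due)   # False: no automatic checkpoint until late 2019
```

With the bad entry at the head of the history, `now >= due` stays false for 8 years, which matches the observed symptom.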
Solution

Customer has the following possible solutions and workarounds:

(1) Rebuild the affected data stores: export the data using ttBulkCp or ttMigrate, destroy the current data store, create a new data store with attributes identical to the old one, and import the data back in. This is the safest solution, but also the most time-consuming.

(2) Enable a cron job that wakes up at a defined interval, connects to the data store and performs a manual checkpoint by calling 'ttCkpt'.

(3) Modify the automatic checkpointing algorithm of the data store so that it depends on accumulated transaction log volume instead of a time interval. The customer can execute the following command in ttIsql:
 
call ttCkptConfig (0,1000,0);


then the checkpointer will automatically execute a checkpoint each time the amount of transaction log data generated since the last checkpoint exceeds 1000 megabytes (about 1 gigabyte). A checkpoint algorithm based on accumulated log volume causes the checkpointer thread to ignore the date stamp information in the checkpoint history structure, thus working around the date corruption in the checkpoint history. See the TimesTen Reference for more information on using 'ttCkptConfig' to change the default checkpointing behavior.
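Why this workaround sidesteps the corruption: a log-volume trigger compares byte counts rather than timestamps. A minimal sketch of the decision, not the actual TimesTen internals:

```python
def volume_ckpt_due(log_bytes_since_ckpt, threshold_mb=1000):
    """Log-volume based checkpointing: fire once the transaction log
    generated since the last checkpoint exceeds the threshold.
    No dates are consulted, so a corrupted history timestamp is harmless."""
    return log_bytes_since_ckpt > threshold_mb * 1024 * 1024

print(volume_ckpt_due(500 * 1024 * 1024))    # False: only 500 MB of log so far
print(volume_ckpt_due(1200 * 1024 * 1024))   # True: past the 1000 MB threshold
```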

References
BUG:13402829 - CORRUPTED DATES IN CHECKPOINT HISTORY ARE BLOCKING AUTOMATIC CHECKPOINTING
=======================End=================================================================