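Notes on a strange failure from this week. Roughly: our program periodically polls, or receives from the peer, a data item and checks whether it has changed. If it has, we notify the user so they can sync the data manually; with auto-sync enabled, the change triggers a sync directly. Recently, though, the peer side insisted the data item had not been modified, yet our program kept telling users the data was inconsistent and needed syncing. The users were annoyed and demanded a root cause.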
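Whether the data has changed is decided by its modification time. What gets reported periodically is really just two fields: the data item name plus an integer time value. The time value is an ordinary Unix timestamp, i.e. seconds since 1970-01-01 00:00:00 UTC, which we convert to year, month, day, hour, minute and second with a time conversion function. This logic ran for years without trouble. The logs showed that, occasionally, the same data item appeared with two different dates. One of the two matched what the customer and the peer had; the other did not. I suspected the peer was holding two dates, but the peer's programmer said there was only one copy of the data. Very odd.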
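The check described above can be sketched roughly as follows. This is not our production code: `ChangeTracker` and `update` are names I made up for illustration, and the sketch compares the raw integer timestamps rather than formatted date strings, which is an assumption about how such a check could be written.

```cpp
#include <ctime>
#include <string>
#include <unordered_map>

// Hypothetical tracker: remembers the last modification time seen per item.
class ChangeTracker {
public:
    // Returns true when `name` was seen before with a different timestamp.
    bool update(const std::string& name, std::time_t mtime) {
        auto it = last_seen_.find(name);
        if (it == last_seen_.end()) {
            last_seen_.emplace(name, mtime);
            return false;  // first report: nothing to compare against yet
        }
        bool changed = (it->second != mtime);
        it->second = mtime;  // remember the newest value either way
        return changed;
    }

private:
    std::unordered_map<std::string, std::time_t> last_seen_;
};
```

Comparing the integers keeps the "has it changed" decision independent of any date formatting step; the formatted string is then only needed for display and logging.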
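Since both sides believed their own code was fine, the only way forward was to check the logged packets. Because the failure was intermittent it was hard to catch; similar problems had come up before and were dropped without resolution. This time I wanted to prove it was the peer's fault, so I put real effort into capturing the packets and finally got one. The result was a shock: for the same integer value, the time conversion function occasionally produced a different result. The problem was on our side after all.
I went over the offending code carefully: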
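time_t time1 = /* value read from the peer over TCP */;
char strTime[32];                 // buffer size assumed here
struct tm* tt_tm;
tt_tm = ACE_OS::gmtime(&time1);   // returns a pointer, not a private copy
ACE_OS::strftime(strTime, sizeof(strTime), "%Y%m%d%H%M%S", tt_tm);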
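Just two calls. How can that go wrong?
ACE is honestly an antique by now; it wraps the operating system's libc. I asked Doubao, which said the call might not be thread-safe but was not sure, and recommended reading the ACE source. Apparently there is so little ACE material around that even large language models cannot give a definite answer. So, straight to the ACE source: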
ACE_OS::gmtime(const time_t *t) {
ACE_OSCALL_RETURN (::gmtime(t), struct tm *,0);
}
glibc-2.38/time/gmtime.c
__gmtime64(const __time64_t *t) {
return __tz_convert(*t, 0, &_tmbuf);
}
struct tm*
gmtime(const time_t *t)
{
__time64_t t64 = *t;
return __gmtime64(&t64);
}
glibc-2.38/time/localtime.c
#include <time.h>
struct tm _tmbuf;
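Case closed. _tmbuf is a single global buffer shared by every caller of gmtime (localtime writes into the same buffer), and gmtime returns a pointer to it. If another thread calls gmtime between our gmtime and our strftime, the buffer is overwritten and we end up formatting the other thread's time. With several threads running concurrently, results are bound to get scrambled.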
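The sharing can be seen even without threads. Here is a minimal sketch; `gmtime_shares_buffer` is a name made up for this post, and the behavior it checks is what I expect from glibc given the source above. The C standard allows a shared static buffer but does not require one, so other libc implementations may behave differently.

```cpp
#include <ctime>

// Returns true if two consecutive gmtime calls hand back the same buffer,
// so that the second call silently changes what the first pointer sees.
bool gmtime_shares_buffer() {
    std::time_t t1 = 0x65a904bb;  // seconds field is 11
    std::time_t t2 = 0x64dface1;  // seconds field is 45
    std::tm* a = std::gmtime(&t1);
    int sec_before = a->tm_sec;   // read the first result while it is still valid
    std::tm* b = std::gmtime(&t2);
    // Same address, and the value behind `a` has changed underneath us.
    return a == b && a->tm_sec != sec_before;
}
```

In a multi-threaded program the second call simply comes from another thread, at a moment we do not control.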
Doubao suggested switching to ACE_OS::gmtime_r, or to Boost's time facilities in boost/date_time, both of which are thread-safe.
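For reference, a minimal sketch of the gmtime_r-based replacement, written against plain POSIX rather than ACE. `format_utc` is a hypothetical helper name; I would expect ACE_OS::gmtime_r to take the same two arguments, but that should be checked against the ACE headers in use.

```cpp
#include <ctime>
#include <string>
#include <time.h>  // gmtime_r (POSIX)

// Formats `t` as YYYYMMDDHHMMSS in UTC using a caller-owned struct tm,
// so no buffer is shared between threads.
std::string format_utc(std::time_t t) {
    struct tm tm_buf;
    if (gmtime_r(&t, &tm_buf) == nullptr) {
        return std::string();  // conversion failed
    }
    char out[32];
    std::size_t n = std::strftime(out, sizeof(out), "%Y%m%d%H%M%S", &tm_buf);
    return std::string(out, n);
}
```

Each call works on its own stack buffer, so concurrent callers cannot overwrite each other's result.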
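I ran a simple experiment. The program is a bit clumsy; it exists only to check the behavior.
It uses pthread, which the default CLion build does not link in, so one line has to be added to CMakeLists.txt: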
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
// test gmtime and gmtime_r
// threads 1 and 2 use gmtime (shared buffer), threads 3 and 4 use gmtime_r
#include <cstdio>
#include <iostream>
#include <pthread.h>
#include <time.h>
#include <unistd.h>
pthread_t pthid1,pthid2,pthid3,pthid4;
void *pth1_main(void *arg);
void *pth2_main(void *arg);
void *pth3_main(void *arg);
void *pth4_main(void *arg);
int main() {
std::cout << "test gmtime" << std::endl;
if(pthread_create(&pthid1,NULL,pth1_main,(void*)0)!=0)
{
printf("create pthid1 failed\n");
return -1;
}
// sleep(1);
if(pthread_create(&pthid2,NULL,pth2_main,(void*)0)!=0)
{
printf("create pthid2 failed\n");
return -1;
}
if(pthread_create(&pthid3,NULL,pth3_main,(void*)0)!=0)
{
printf("create pthid3 failed\n");
return -1;
}
if(pthread_create(&pthid4,NULL,pth4_main,(void*)0)!=0)
{
printf("create pthid4 failed\n");
return -1;
}
// printf("thread1\n");
pthread_join(pthid1,NULL);
// printf("thread2\n");
pthread_join(pthid2,NULL);
pthread_join(pthid3,NULL);
pthread_join(pthid4,NULL);
printf("all threads finished\n");
return 0;
}
void *pth1_main(void *arg) {
printf("thread1\n");
time_t time1 = 0x65a904bb;
struct tm *timeinfo;
timeinfo = gmtime(&time1);
printf("1Year: %d ", timeinfo->tm_year + 1900);
printf("1Month: %d ", timeinfo->tm_mon + 1);
printf("1Day: %d ", timeinfo->tm_mday);
printf("1Hour: %d ", timeinfo->tm_hour);
printf("1Minute: %d ", timeinfo->tm_min);
printf("1Second: %d\n", timeinfo->tm_sec);
if(timeinfo->tm_sec != 11) {
printf("thread1 error\n");
}
pthread_exit(NULL);
}
void *pth2_main(void *arg){
printf("thread2\n");
time_t time1 = 0x64dface1;
struct tm *timeinfo;
timeinfo = gmtime(&time1);
printf("2Year: %d ", timeinfo->tm_year + 1900);
printf("2Month: %d ", timeinfo->tm_mon + 1);
printf("2Day: %d ", timeinfo->tm_mday);
printf("2Hour: %d ", timeinfo->tm_hour);
printf("2Minute: %d ", timeinfo->tm_min);
printf("2Second: %d\n", timeinfo->tm_sec);
if(timeinfo->tm_sec != 45)
printf("thread2 error\n");
pthread_exit(NULL);
}
void *pth3_main(void *arg) {
printf("thread3\n");
time_t time1 = 0x65a904bc;
struct tm timeinfo;
gmtime_r(&time1, &timeinfo);
printf("3Year: %d ", timeinfo.tm_year + 1900);
printf("3Month: %d ", timeinfo.tm_mon + 1);
printf("3Day: %d ", timeinfo.tm_mday);
printf("3Hour: %d ", timeinfo.tm_hour);
printf("3Minute: %d ", timeinfo.tm_min);
printf("3Second: %d\n", timeinfo.tm_sec);
if(timeinfo.tm_sec != 12) {
printf("thread3 error\n");
}
pthread_exit(NULL);
}
void *pth4_main(void *arg){
printf("thread4\n");
time_t time1 = 0x64dface2;
struct tm timeinfo;
gmtime_r(&time1, &timeinfo);
printf("4Year: %d ", timeinfo.tm_year + 1900);
printf("4Month: %d ", timeinfo.tm_mon + 1);
printf("4Day: %d ", timeinfo.tm_mday);
printf("4Hour: %d ", timeinfo.tm_hour);
printf("4Minute: %d ", timeinfo.tm_min);
printf("4Second: %d\n", timeinfo.tm_sec);
if(timeinfo.tm_sec != 46)
printf("thread4 error\n");
pthread_exit(NULL);
}
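Sure enough, the problem is easy to reproduce. The threads do not even need to loop: thread1, which uses gmtime, already comes out wrong. I did not try Boost because I am not familiar with it, but it reportedly has no such problem.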
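If a single run happens not to show the error, a looped variant makes the comparison more systematic. The following is only a sketch using std::thread instead of pthread; `count_mismatches` and `run_pair` are made-up names. With `reentrant` set to false the two threads race on the shared buffer, so the mismatch count varies from run to run and may even be zero; with `reentrant` set to true it should always be zero.

```cpp
#include <ctime>
#include <thread>
#include <time.h>  // gmtime_r (POSIX)

// Converts `t` repeatedly and counts results whose seconds field is wrong.
int count_mismatches(std::time_t t, int expected_sec, int iterations, bool reentrant) {
    int bad = 0;
    for (int i = 0; i < iterations; ++i) {
        struct tm local;
        if (reentrant) {
            gmtime_r(&t, &local);  // private buffer
        } else {
            local = *gmtime(&t);   // copy out of the shared buffer (racy on purpose)
        }
        if (local.tm_sec != expected_sec) ++bad;
    }
    return bad;
}

// Runs two threads with different timestamps and returns the total mismatches.
int run_pair(bool reentrant, int iterations) {
    int bad1 = 0, bad2 = 0;
    std::thread a([&] { bad1 = count_mismatches(0x65a904bb, 11, iterations, reentrant); });
    std::thread b([&] { bad2 = count_mismatches(0x64dface1, 45, iterations, reentrant); });
    a.join();
    b.join();
    return bad1 + bad2;
}
```

Calling `run_pair(false, 100000)` and `run_pair(true, 100000)` side by side gives a rough feel for how often the shared buffer gets clobbered under load.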
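I wanted to see whether Windows has the same problem, so I tried my freshly installed VS2022. It flags gmtime as deprecated and refuses to build, suggesting gmtime_s instead (annoyingly, the suffix differs from Unix's gmtime_r, and the argument order is reversed as well). After the replacement it compiled, ran correctly, and was thread-safe: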
#include <cstdio>
#include <iostream>
#include <time.h>
#include <thread>
void f1() {
time_t time1 = 0x65a904bb;
//struct tm* timeinfo;
struct tm timeinfo_s;
struct tm* timeinfo = &timeinfo_s;
//timeinfo = gmtime(&time1);
gmtime_s(&timeinfo_s,&time1);
printf("1Year: %d ", timeinfo->tm_year + 1900);
printf("1Month: %d ", timeinfo->tm_mon+1);
printf("1Day: %d ", timeinfo->tm_mday);
printf("1Second: %d\n", timeinfo->tm_sec);
}
void f2() {
time_t time1 = 0x64dface1;
struct tm timeinfo_s;
struct tm* timeinfo = &timeinfo_s;
//timeinfo = gmtime(&time1);
gmtime_s(&timeinfo_s, &time1);
printf("2Year: %d ", timeinfo->tm_year + 1900);
printf("2Month: %d ", timeinfo->tm_mon + 1);
printf("2Day: %d ", timeinfo->tm_mday);
printf("2Second: %d\n", timeinfo->tm_sec);
}
int main() {
std::cout << "test gmtime" << std::endl;
std::thread t1(f1);
std::thread t2(f2);
t1.join();
t2.join();
return 0;
}
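Since the two platforms disagree on both the name and the argument order, a small wrapper can hide the difference. This is a sketch only: `utc_breakdown` is a hypothetical helper, and the `_WIN32` check assumes the MSVC runtime on Windows and a POSIX libc everywhere else.

```cpp
#include <ctime>
#include <time.h>

// Fills `out` with the UTC breakdown of `t`; returns true on success.
bool utc_breakdown(std::time_t t, struct tm& out) {
#ifdef _WIN32
    // MSVC: destination first, returns an errno_t (0 on success).
    return gmtime_s(&out, &t) == 0;
#else
    // POSIX: source first, returns the destination pointer or NULL.
    return gmtime_r(&t, &out) != nullptr;
#endif
}
```

Both branches write into a buffer owned by the caller, which is what makes them safe to call from several threads at once.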
So a capable C++ IDE and compiler really does matter.