Understanding proc_id in pgpool-II's pool_process_context

This post looks at the proc_id field of pgpool's per-child process context, traces how it is initialized and where my_proc_id gets its value, and in doing so sheds some light on how pgpool manages its child processes.

At first I assumed that the proc_id in pool_process_context was the process's UNIX PID. It turns out that is not the case at all.

Look at the source:

/*
 * Child process context:
 * Manages per pgpool child process context
 */
typedef struct {
    /*
     * process start time, info on connection to backend etc.
     */
    ProcessInfo *process_info;
    int proc_id;            /* Index to process table (ProcessInfo) (!= UNIX's PID) */

    /*
     * PostgreSQL server description. Placed on shared memory.
     * Includes backend up/down info, hostname, data directory etc.
     */
    BackendDesc *backend_desc;
    int local_session_id;   /* local session id */
} POOL_PROCESS_CONTEXT;
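As the comment says, proc_id is only a slot number in the shared ProcessInfo table, not a UNIX PID. pgpool's own pool_process_context.c has a helper in this spirit (pool_get_my_process_info()); the sketch below is just an illustration of the indexing, with a made-up function name, not the actual implementation:

/* Illustration only: resolve this child's ProcessInfo slot from its proc_id.
 * Assumes the POOL_PROCESS_CONTEXT shown above; the function name is invented. */
static ProcessInfo *my_process_info(POOL_PROCESS_CONTEXT *ctx)
{
    /* proc_id is an array index (0 .. num_init_children - 1), not a PID */
    return &ctx->process_info[ctx->proc_id];
}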

Now look at pool_init_process_context:

/*
 * Initialize per process context
 */
void pool_init_process_context(void)
{
    process_context = &process_context_d;

    if (!process_info)
    {
        pool_error("pool_init_process_context: process_info is not set");
        child_exit(1);
    }
    process_context->process_info = process_info;

    if (!pool_config->backend_desc)
    {
        pool_error("pool_init_process_context: backend_desc is not set");
        child_exit(1);
    }
    process_context->backend_desc = pool_config->backend_desc;
    process_context->proc_id = my_proc_id;
    process_context->local_session_id = 0;  /* initialize local session counter */
}

So proc_id = my_proc_id.

And my_proc_id is assigned in the child-forking code in main.c:

int my_proc_id;

/*
 * fork a child
 */
pid_t fork_a_child(int unix_fd, int inet_fd, int id)
{
    pid_t pid;

    pid = fork();

    if (pid == 0)
    {
        /* Before we unconditionally closed pipe_fds[0] and pipe_fds[1]
         * here, which is apparently wrong since in the start up of
         * pgpool, pipe(2) is not called yet and it mistakenly closes
         * fd 0. Now we check the fd > 0 before close(), expecting
         * pipe returns fds greater than 0.  Note that we cannot
         * unconditionally remove close(2) calls since fork_a_child()
         * may be called *after* pgpool starting up.
         */
        if (pipe_fds[0] > 0)
        {
            close(pipe_fds[0]);
            close(pipe_fds[1]);
        }

        myargv = save_ps_display_args(myargc, myargv);

        /* call child main */
        POOL_SETMASK(&UnBlockSig);
        reload_config_request = 0;
        my_proc_id = id;
        run_as_pcp_child = false;
        do_child(unix_fd, inet_fd);
    }
    else if (pid == -1)
    {
        pool_error("fork() failed. reason: %s", strerror(errno));
        myexit(1);
    }

    return pid;
}
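Note that fork_a_child() returns the child's UNIX PID, and the parent records it in process_info[i].pid (see the loop below). So the mapping also works in reverse: given a UNIX PID, for example one returned by wait(), the parent can recover the slot index by scanning the table. A minimal sketch of that idea, assuming the process_info[] array and pool_config shown in this post; it is not pgpool's actual code:

/* Illustration only: map a UNIX PID back to its slot (proc_id) in process_info[]. */
static int proc_id_from_pid(pid_t pid)
{
    int i;

    for (i = 0; i < pool_config->num_init_children; i++)
    {
        if (process_info[i].pid == pid)
            return i;       /* this slot's index is the child's proc_id */
    }
    return -1;              /* not one of the pooled child processes */
}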

Specifically, main.c contains the following loop.

/* fork the children */
for (i = 0; i < pool_config->num_init_children; i++)
{
    process_info[i].pid = fork_a_child(unix_fd, inet_fd, i);
    process_info[i].start_time = time(NULL);
}

So, if num_init_children in pgpool.conf is set to 128,

then the my_proc_id of each child process will be, respectively, 0, 1, 2, ... 127.

In other words, each child's process_context->proc_id is 0, 1, 2, ... 127.
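To see the pattern in isolation, here is a small self-contained demo (not pgpool code) that mimics fork_a_child() and the fork loop above: the parent forks a few children, passes each one its slot index, and records the UNIX PID in a table, so the slot index and the PID are visibly two different numbers.

/* Standalone demo of the "slot index vs. UNIX PID" pattern; not pgpool source. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_CHILDREN 4              /* stands in for num_init_children */

static int my_proc_id;              /* set in the child, like pgpool's my_proc_id */

int main(void)
{
    pid_t child_pid[NUM_CHILDREN];  /* stands in for process_info[i].pid */
    int   i;

    for (i = 0; i < NUM_CHILDREN; i++)
    {
        pid_t pid = fork();

        if (pid == 0)
        {
            /* child: remember which slot of the shared table is mine */
            my_proc_id = i;
            printf("child: proc_id=%d, getpid()=%d\n", my_proc_id, (int) getpid());
            _exit(0);
        }
        else if (pid == -1)
        {
            perror("fork");
            exit(1);
        }

        /* parent: record the UNIX PID for slot i */
        child_pid[i] = pid;
    }

    for (i = 0; i < NUM_CHILDREN; i++)
        waitpid(child_pid[i], NULL, 0);

    return 0;
}

Running it prints proc_id values 0 through 3 next to four different UNIX PIDs, which is exactly the distinction the struct comment "(!= UNIX's PID)" is making.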
