Developing PHP .so Extension Modules in C on Linux/FreeBSD: A Worked Example

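This article explains in detail how to write a PHP extension module in C, using sample code that implements string encoding conversion and related functions.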

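There are several ways to write a PHP extension module in C. By the form of the end product there are two: compile the module directly into PHP, or compile it as a .so extension module that PHP loads. By how the module is built there are also two: build it as a standalone extension with the phpize tool (installed along with a compiled PHP), or generate the skeleton with the ext_skel tool (shipped in the PHP source tree) and build it inside the source tree. The most common and most convenient approach is to write a .so extension with ext_skel, and that is the approach this article describes.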

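The PHP source tree contains an ext directory. Throughout this article, "PHP" means PHP on Linux, not on Windows. Inside ext is a tool called ext_skel, which makes it easy to develop a PHP extension module by providing a generic set of development steps and a template. As an example, we will develop an extension module that converts between the UTF-8, GBK and GB2312 encodings inside PHP. In the end, the module provides the following function interfaces: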

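(1) string toplee_big52gbk(string s)
Converts the input string from BIG5 to GBK.
(2) string toplee_gbk2big5(string s)
Converts the input string from GBK to BIG5.
(3) string toplee_normalize_name(string s)
Processes the input string as follows: converts full-width characters to half-width, trims surrounding whitespace, and converts upper case to lower case.
(4) string toplee_fan2jian(int code, string s)
Converts an input GBK string in traditional Chinese to simplified Chinese (code is the code page passed to the conversion routines).
(5) string toplee_decode_utf(string s)
Converts a UTF-8 encoded string to Unicode.
(6) string toplee_decode_utf_gb(string s)
Converts a UTF-8 encoded string to GB.
(7) string toplee_decode_utf_big5(string s)
Converts a UTF-8 encoded string to BIG5.
(8) string toplee_encode_utf_gb(string s)
Converts an input GBK string to UTF-8.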

First, change into the ext directory and run the following command:
#./ext_skel --extname=toplee
ext_skel then creates a toplee directory under ext containing the following files:
.cvsignore
CREDITS
EXPERIMENTAL
config.m4
php_toplee.h
tests
toplee.c
toplee.php

The most important of these are config.m4 and toplee.c.
Next, edit config.m4:
#vi ./config.m4
Find the lines that look like this:

dnl PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
dnl [  --with-toplee             Include toplee support])

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])

These lines tell the PHP build which option enables our toplee extension. We use --with-toplee, so we change them as follows:

PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
[  --with-toplee             Include toplee support])

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])

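The key remaining job is writing toplee.c, the main source file of the module. Even if you change nothing in it, you already have a working PHP extension; the generated file contains code like this: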

PHP_FUNCTION(confirm_toplee_compiled)
{
    char *arg = NULL;
    int arg_len, len;
    char string[256];

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {
        return;
    }

    len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);
    RETURN_STRINGL(string, len, 1);
}

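If the new module is compiled in when PHP is built later, a PHP script can call confirm_toplee_compiled("toplee"). It returns the string "Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP.", where the module name in the second sentence is whatever string you pass in.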
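One detail in the skeleton is worth understanding. It writes into a fixed 256-byte buffer, and the %.78s precision caps each argument at 78 characters, so roughly 100 characters of fixed text plus at most 2 × 78 characters always fit. The standalone C sketch below reproduces just that formatting step. format_confirm_message is a hypothetical name, and snprintf replaces sprintf as an extra guard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the skeleton's sprintf call.
 * "%.78s" prints at most 78 characters of each argument, so the result
 * is bounded by the fixed text plus 2 * 78 characters.
 * Returns the length snprintf reports for the formatted message. */
static int format_confirm_message(char *buf, size_t size,
                                  const char *ext, const char *arg)
{
    assert(buf != NULL && size > 0 && ext != NULL && arg != NULL);
    return snprintf(buf, size,
        "Congratulations! You have successfully modified ext/%.78s/config.m4. "
        "Module %.78s is now compiled into PHP.", ext, arg);
}
```

For example, a 299-character argument still produces a message shorter than the 256-byte buffer, because only its first 78 characters are printed.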


 


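Below is our modified toplee.c, which implements the functions and interfaces planned above. The code inside it still names the module gbk (gbk_functions, gbk_module_entry, PHP_MINIT(gbk) and so on). Here is the source of toplee.c: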

/*
  +----------------------------------------------------------------------+
  | PHP Version 4                                                        |
  +----------------------------------------------------------------------+
  | Copyright (c) 1997-2002 The PHP Group                                |
  +----------------------------------------------------------------------+
  | This source file is subject to version 2.02 of the PHP license,      |
  | that is bundled with this package in the file LICENSE, and is        |
  | available at through the world-wide-web at                           |
  | http://www.php.net/license/2_02.txt.                                 |
  | If you did not receive a copy of the PHP license and are unable to   |
  | obtain it through the world-wide-web, please send a note to          |
  | license@php.net so we can mail you a copy immediately.               |
  +----------------------------------------------------------------------+
  | Author:                                                              |
  +----------------------------------------------------------------------+
 
 
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/

 
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_toplee.h"    /* the header generated by ext_skel (listed below) */
#include "toplee_util.h"

/* If you declare any globals in php_toplee.h uncomment this:
ZEND_DECLARE_MODULE_GLOBALS(gbk)
*/

 
/* True global resources - no need for thread safety here */
static int le_gbk;
 
/* {{{ gbk_functions[]
 *
 * Every user visible function must have an entry in gbk_functions[].
 */

function_entry gbk_functions[] = {
    PHP_FE(toplee_decode_utf,       NULL)
    PHP_FE(toplee_decode_utf_gb,    NULL)
    PHP_FE(toplee_decode_utf_big5,  NULL)
    PHP_FE(toplee_encode_utf_gb,    NULL)

    PHP_FE(toplee_big52gbk,         NULL)
    PHP_FE(toplee_gbk2big5,         NULL)
    PHP_FE(toplee_fan2jian,         NULL)
    PHP_FE(toplee_normalize_name,   NULL)
    {NULL, NULL, NULL}    /* Must be the last line in gbk_functions[] */
};
/* }}} */
 
/* {{{ gbk_module_entry
 */

zend_module_entry gbk_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
#endif
    "gbk",
    gbk_functions,
    PHP_MINIT(gbk),
    PHP_MSHUTDOWN(gbk),
    PHP_RINIT(gbk),        /* Replace with NULL if there's nothing to do at request start */
    PHP_RSHUTDOWN(gbk),    /* Replace with NULL if there's nothing to do at request end */
    PHP_MINFO(gbk),
#if ZEND_MODULE_API_NO >= 20010901
    "0.1", /* Replace with version number for your extension */
#endif
    STANDARD_MODULE_PROPERTIES
};
/* }}} */

/* the build defines COMPILE_DL_TOPLEE when the extension created with
   --extname=toplee is built as a shared module (--with-toplee=shared) */
#ifdef COMPILE_DL_TOPLEE
ZEND_GET_MODULE(gbk)
#endif
 
/* {{{ PHP_INI
 */

/* Remove comments and fill if you need to have entries in php.ini*/
PHP_INI_BEGIN()
    PHP_INI_ENTRY("gbk2uni",    "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2gbk",    "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2big5",   "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52uni",   "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52gbk",   "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("gbk2big5",   "", PHP_INI_SYSTEM, NULL)
//    STD_PHP_INI_ENTRY("gbk.global_value",      "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)
//    STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)
PHP_INI_END()
 
/* }}} */
 
/* {{{ php_gbk_init_globals
 */

/* Uncomment this function if you have INI entries
static void php_gbk_init_globals(zend_gbk_globals *gbk_globals)
{
    gbk_globals->global_value = 0;
    gbk_globals->global_string = NULL;
}
*/

/* }}} */
 
char gbk2uni_file[256];
char uni2gbk_file[256];
char big52uni_file[256];
char uni2big5_file[256];
char gbk2big5_file[256];
char big52gbk_file[256];
 
//utf file init flag
static int initutf=0;
 
/* {{{ PHP_MINIT_FUNCTION
 */

PHP_MINIT_FUNCTION(gbk)
{
    /* If you have INI entries, uncomment these lines
    ZEND_INIT_MODULE_GLOBALS(gbk, php_gbk_init_globals, NULL);*/

    REGISTER_INI_ENTRIES();

    memset(gbk2uni_file, 0, sizeof(gbk2uni_file));
    memset(uni2gbk_file, 0, sizeof(uni2gbk_file));
    memset(big52uni_file, 0, sizeof(big52uni_file));
    memset(uni2big5_file, 0, sizeof(uni2big5_file));
    memset(gbk2big5_file, 0, sizeof(gbk2big5_file));
    memset(big52gbk_file, 0, sizeof(big52gbk_file));

    /* copy the table paths from php.ini; the buffers stay NUL-terminated */
    strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file)-1);
    strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file)-1);
    strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file)-1);
    strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file)-1);
    strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file)-1);
    strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file)-1);

    //InitMMResource();
    InitResource();

    /* all six code table files must be configured */
    if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0')
        || (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0')
        || (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0'))
    {
        return FAILURE;
    }

    /* a non-NULL return from LoadOneCodeTable() means the table failed to load */
    if (gbk2uni_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_GBK2UNI, gbk2uni_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (uni2gbk_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_UNI2GBK, uni2gbk_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (big52uni_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_BIG52UNI, big52uni_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (uni2big5_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_UNI2BIG5, uni2big5_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (gbk2big5_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_GBK2BIG5, gbk2big5_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (big52gbk_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_BIG52GBK, big52gbk_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    initutf = 1;
    return SUCCESS;
}
/* }}} */
 
/* {{{ PHP_MSHUTDOWN_FUNCTION
 */

PHP_MSHUTDOWN_FUNCTION(gbk)
{
    /* uncomment this line if you have INI entries*/
    UNREGISTER_INI_ENTRIES();

    toplee_cleanup_mmap(NULL);
    return SUCCESS;
}
/* }}} */

/* Remove if there's nothing to do at request start */
/* {{{ PHP_RINIT_FUNCTION
 */
PHP_RINIT_FUNCTION(gbk)
{
    return SUCCESS;
}
/* }}} */

/* Remove if there's nothing to do at request end */
/* {{{ PHP_RSHUTDOWN_FUNCTION
 */
PHP_RSHUTDOWN_FUNCTION(gbk)
{
    return SUCCESS;
}
/* }}} */

/* {{{ PHP_MINFO_FUNCTION
 */
PHP_MINFO_FUNCTION(gbk)
{
    php_info_print_table_start();
    php_info_print_table_header(2, "gbk support", "enabled");
    php_info_print_table_end();

    /* Remove comments if you have entries in php.ini*/
    DISPLAY_INI_ENTRIES();
}
/* }}} */
 
 
/* Remove the following function when you have successfully modified config.m4
   so that your module can be compiled into PHP, it exists only for testing
   purposes. */

/* {{{ proto  toplee_decode_utf(string s)
    */
PHP_FUNCTION(toplee_decode_utf)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    /* DecodePureUTF works in place, so decode a copy of the argument */
    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, KEEP_UNICODE);
    RETVAL_STRING(t, 1);    /* 1: PHP copies the string, so t can be freed */
    free(t);
    return;
}
/* }}} */
 
/* {{{ proto  toplee_decode_utf_gb(string s)
    */
PHP_FUNCTION(toplee_decode_utf_gb)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, DECODE_UNICODE);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}
/* }}} */

/* {{{ proto  toplee_decode_utf_big5(string s)
    */
PHP_FUNCTION(toplee_decode_utf_big5)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, DECODE_UNICODE | DECODE_BIG5);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}
/* }}} */
/* Convert a GBK string to UTF-8: GBK -> 16-bit code units via
 * MultiByteToWideChar (code page 936), then 1-3 UTF-8 bytes per unit.
 * strDst must hold at least 2*strlen(strSrc)+1 bytes; nFlag is unused.
 * Returns the number of bytes written (0 if strDst is too small). */
int EncodePureUTF(unsigned char* strSrc,
    unsigned char* strDst, int nDstLen, int nFlag)
{
    int nRet;
    int i;
    unsigned short c;
    unsigned short* uBuf;
    int nSize;
    int nLen;
    int nReturn;

    nLen = strlen((const char*)strSrc);
    if (nDstLen < nLen*2+1)
        return 0;

    nSize = nLen+1;
    uBuf = (unsigned short*)emalloc(sizeof(unsigned short)*nSize);

    nRet = MultiByteToWideChar(936, 0, (char*)strSrc, nLen, uBuf, nSize);

    nReturn = 0;
    for (i = 0; i < nRet; i++)
    {
        c = uBuf[i];
        if (c < 0x80) {
            strDst[nReturn++] = (unsigned char) c;
        } else if (c < 0x800) {
            strDst[nReturn++] = (0xc0 | (c >> 6));
            strDst[nReturn++] = (0x80 | (c & 0x3f));
        } else {    /* an unsigned short is at most 0xFFFF: 3 bytes suffice */
            strDst[nReturn++] = (0xe0 | (c >> 12));
            strDst[nReturn++] = (0x80 | ((c >> 6) & 0x3f));
            strDst[nReturn++] = (0x80 | (c & 0x3f));
        }
    }
    strDst[nReturn] = '\0';

    efree(uBuf);
    return nReturn;
}
 
/* {{{ proto  toplee_encode_utf_gb(string s)
    */

PHP_FUNCTION(toplee_encode_utf_gb)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char* sRet;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    sRet = emalloc(strlen(s)*2+1);
    EncodePureUTF((unsigned char*)s, (unsigned char*)sRet, strlen(s)*2+1, 0);
    RETVAL_STRING(sRet, 0);    /* 0: PHP takes ownership of the emalloc'ed buffer */
    return;
}
/* }}} */
 
 
/* {{{ proto  toplee_big52gbk(string s)
    */

PHP_FUNCTION(toplee_big52gbk)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char* sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    sRet = estrdup(s);    /* convert a copy, not the caller's string */
    if (NULL == sRet)
        RETURN_FALSE;

    BIG52GBK(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);    /* 0: PHP takes ownership of the estrdup'ed buffer */
    return;
}
/* }}} */

/* {{{ proto  toplee_gbk2big5(string s)
    */
PHP_FUNCTION(toplee_gbk2big5)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char* sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    sRet = estrdup(s);
    if (NULL == sRet)
        RETURN_FALSE;

    GBK2BIG5(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);
    return;
}
/* }}} */
 
/* {{{ proto  toplee_normalize_name(string s)
    */

PHP_FUNCTION(toplee_normalize_name)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char* sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    /* s points into the caller's variable, so normalize a copy instead */
    sRet = estrndup(s, s_len);
    NormalizeName(sRet);
    RETURN_STRING(sRet, 0);
}
/* }}} */
 
/* {{{ proto  toplee_fan2jian(int code, string s)
    */

PHP_FUNCTION(toplee_fan2jian)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    long code;    /* "l" in zend_parse_parameters expects a long */
    char *pSource;
    unsigned short *pDest1 = NULL;
    char *pDest2 = NULL;
    int nSourceLen, nDestLen;

    if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    pSource = s;
    nSourceLen = s_len;
    /* at most one 16-bit unit per input byte */
    pDest1 = malloc(nSourceLen * sizeof(unsigned short));
    pDest2 = malloc(nSourceLen + 1);
    if (NULL == pDest1 || NULL == pDest2)
        goto _f2j_err;

    memset(pDest1, 0, nSourceLen * sizeof(unsigned short));
    memset(pDest2, 0, nSourceLen + 1);

    /* decode with code page 'code', then encode back with the same code page */
    nDestLen = MultiByteToWideChar(code, 0, pSource, nSourceLen, pDest1, nSourceLen);
    if (0 >= nDestLen)
        goto _f2j_err;

    nDestLen = WideCharToMultiByte(code, 0, pDest1, nDestLen, pDest2, nSourceLen, NULL, NULL);
    if (0 >= nDestLen)
        goto _f2j_err;

    RETVAL_STRING(pDest2, 1);
    free(pDest1);
    free(pDest2);
    return;

_f2j_err:
    if (pDest1 != NULL)
        free(pDest1);
    if (pDest2 != NULL)
        free(pDest2);
    RETURN_FALSE;
}
/* }}} */
 
/*
 * Local variables:
 * tab-width: 4
 * c-basic-offset: 4
 * End:
 * vim600: noet sw=4 ts=4 fdm=marker
 * vim<600: noet sw=4 ts=4
 */
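The heart of EncodePureUTF is its second step. After the project's own MultiByteToWideChar (code page 936) has turned GBK into 16-bit code units, each unit is packed into one to three UTF-8 bytes. That step can be checked without PHP or the code tables. The sketch below is a standalone reimplementation, assuming the input is already an array of BMP code units; utf16_units_to_utf8 is a hypothetical name, not part of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the packing loop in EncodePureUTF:
 * each 16-bit code unit (U+0000..U+FFFF) becomes 1, 2 or 3 UTF-8 bytes.
 * Surrogate pairs are not combined, the same limitation as the original.
 * Returns the number of bytes written (excluding the terminating '\0'),
 * or -1 if dst is too small. */
static int utf16_units_to_utf8(const unsigned short *src, size_t count,
                               unsigned char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL || count == 0);
    if (dst == NULL || dst_size == 0)
        return -1;

    for (size_t i = 0; i < count; i++) {
        unsigned short c = src[i];
        size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        if (out + need + 1 > dst_size)   /* keep room for the '\0' */
            return -1;
        if (need == 1) {
            dst[out++] = (unsigned char)c;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xE0 | (c >> 12));
            dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return (int)out;
}
```

The units 0x0041, 0x00E9 and 0x4E2D become the bytes 41, C3 A9 and E4 B8 AD. The example also shows why EncodePureUTF's 2 * strlen + 1 output buffer is enough. A two-byte GBK character becomes one code unit and at most three UTF-8 bytes, and an ASCII byte stays one byte.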

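This file defines every interface we want to implement. What remains is the C code that actually implements them. The details of that C code are outside the scope of this article, but one point matters: you can put all the C source and header files that implement the interfaces declared in toplee.c into the ext/toplee directory so that toplee.c can call them when it is compiled. This is ordinary C. The extra .c files must also be listed as sources of the extension in its build configuration; for ext_skel skeletons that use PHP_NEW_EXTENSION in config.m4, add them to that macro's file list. I (Michael) won't go into more detail; here are the files we wrote: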
chn_util.h

#ifndef __CHN_UTIL_H__
#define __CHN_UTIL_H__
 
#include "common.h"
 
#define    LANG_GB            1
#define LANG_B5            2
 
#define GB_FULL_COUNT    (20+26*2+5+4+26)
#define B5_FULL_COUNT    (20+26*2+5+4+24)
 
BOOL FullToHalf(char *str, int nLang);
 
void LowerString(char* str);
 
void TrimString(char* str);
 
#endif // __CHN_UTIL_H__

chn_util.c

#include <stdio.h>
#include <assert.h>
#include <string.h>
#include "common.h"
#include "chn_util.h"
 
 
// 0123456789 !@()-_+'<>
// full-width GBK characters; entry i corresponds to GBEnHalf[i]
static char *GBFull[GB_FULL_COUNT] =
    {"", "", "", "", "", "", "", "", "", "",
    " ", "", "", "", "", "_", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "",
    "", "·", "", "", "",
    "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", ""
};

// half-width replacements; "\?\?" prevents the ??! trigraph
static char GBEnHalf[GB_FULL_COUNT+1] =
    "0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "....&<<>>,,;;::\?\?!!-\'\'\"\"~:`|[]{}#$%";

// ⒈⒉⒊⒋⒌⒍⒎⒏⌒∨∠ˇ≌≈
static char *B5Full[B5_FULL_COUNT] =
    {"", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "ˇ", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "",
    "", "", "", "", "",
    "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "",
    "ˉ", "ˇ", "¨", "", "°", "", "", "", "", "", "",
    "", ""
};

static char B5EnHalf[B5_FULL_COUNT+1] =
    "0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "....&<<>>,,;;::\?\?!!-\'\'\"\"~|[]{}#$%";
 
 
static int _bFHSortFlag=0;
 
static void _sorttable(char* tableFull[], char* tableHalf, int nSize)
{
    int i, j;
    char* p;
    char cTemp;

    /* exchange sort into descending strcmp order, swapping the
       parallel half-width table along with it */
    for (i = 0; i < nSize; i++)
    {
        for (j = i + 1; j < nSize; j++)
        {
            if (strcmp(tableFull[i], tableFull[j]) < 0)
            {
                p = tableFull[i];
                tableFull[i] = tableFull[j];
                tableFull[j] = p;
                cTemp = tableHalf[i];
                tableHalf[i] = tableHalf[j];
                tableHalf[j] = cTemp;
            }
        }
    }
}
 
BOOL FullToHalf(char *str, int nCodePage)
{
    char *pSrc = str;
    char *pDest = str;
    char **pFull;
    char *pEnHalf;
    int nCount;
    BOOL bContinue = FALSE;
    int nHigh, nLow, nMid, nResult;

    /* sort both tables once so they can be binary-searched */
    if (!_bFHSortFlag)
    {
        _sorttable(GBFull, GBEnHalf, GB_FULL_COUNT);
        _sorttable(B5Full, B5EnHalf, B5_FULL_COUNT);
        _bFHSortFlag = 1;
    }

    assert(NULL != str);
    if ((LANG_GB == nCodePage) || (936 == nCodePage))
    {
        pFull = GBFull;
        pEnHalf = GBEnHalf;
        nCount = GB_FULL_COUNT;
    }
    else if ((LANG_B5 == nCodePage) || (950 == nCodePage))
    {
        pFull = B5Full;
        pEnHalf = B5EnHalf;
        nCount = B5_FULL_COUNT;
    }
    else
    {
        assert(FALSE);
        return FALSE;
    }

    while ('\0' != *pSrc)
    {
        if (0x81 <= (BYTE)*pSrc)
        {
            // binary search instead of a linear scan: much faster
            nLow = 0;
            nHigh = nCount - 1;
            while (nLow <= nHigh)
            {
                nMid = (nLow + nHigh) / 2;
                nResult = strncmp(pSrc, pFull[nMid], 2);
                if (0 == nResult)
                {
                    *pDest++ = pEnHalf[nMid];
                    pSrc += 2;
                    bContinue = TRUE;
                    break;
                }
                if (nResult > 0)    /* the table is sorted in descending order */
                    nHigh = nMid - 1;
                else
                    nLow = nMid + 1;
            }

            if (!bContinue)
            {
                // any other symbol (lead byte 0xA1-0xA9) becomes a space
                if ((0xA1 <= (BYTE)*pSrc) && (0xA9 >= (BYTE)*pSrc)
                    && ('\0' != pSrc[1]))
                {
                    *pDest++ = ' ';
                    pSrc += 2;
                    bContinue = TRUE;
                }
            }
 
/*            for (nIndex = 0; nIndex < nCount; nIndex++)
            {
                assert(NULL != pFull[nIndex]);
                if (NULL != pFull[nIndex])
                {
                    if (0 == strncmp(pSrc, pFull[nIndex], 2))
                    {
                        *pDest++ = pEnHalf[nIndex];    // convert full to half
                        pSrc += 2;
 
                        bContinue = TRUE;
                        break;
                    }
                }
            }*/

 
            
            if (bContinue)
            {
                bContinue = FALSE;
                continue;
            }

            *pDest++ = *pSrc++;    // copy the lead byte; the statement below copies the trail byte
            if (*pSrc == '\0')
                break;
        }

        *pDest++ = *pSrc++;    // ASCII byte (or trail byte of an unmatched character)
    }

    *pDest = '\0';
    return TRUE;
}
 
BOOL MyIsDBCSLeadByte(BYTE TestChar)
{
    if ((TestChar > 0x80) && (TestChar < 0xFF))
        return TRUE;
    else
        return FALSE;
}


void LowerString(char* str)
{
    while (*str)
    {
        if (!MyIsDBCSLeadByte(*str))
        {
            if ((*str >= 'A') && (*str <= 'Z'))
                *str = (char)(*str + ('a' - 'A'));
        }
        else
        {
            str++;    /* skip the trail byte of a double-byte character */
            if (!*str)
                break;
        }
        str++;
    }
    return;
}

BOOL myisspace(char c)
{
    return ((c == ' ') || (c == '\t') || (c == '\r') || (c == '\n'));
}
 
void TrimString(char* str)
{
    char*   pDst;
    char*   pSrc;
    char*   pLast;
    char    cCurrent;
    int     nState;

    pLast = pDst = pSrc = str;
    nState = 0;

    /* state 0: skip leading whitespace; states 1 and 2: copy characters,
       with pLast pointing just past the last non-space character copied */
    while (*pSrc)
    {
        cCurrent = *pSrc;
        switch (nState)
        {
        case 0:
            if (!myisspace(cCurrent))
            {
                nState = 1;
                continue;    /* reprocess this character in state 1 */
            }
            break;
        case 1:
            if (myisspace(cCurrent))
            {
                nState = 2;
                *pDst = cCurrent;
            }
            else
            {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        case 2:
            if (myisspace(cCurrent))
            {
                *pDst = cCurrent;
            }
            else
            {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        }
        pSrc++;
    }

    *pLast = '\0';    /* cut off trailing whitespace */
    return;
}
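FullToHalf sorts the full-width table into descending strcmp order on first use, swapping the parallel half-width table along with it, and then binary-searches it. Because the order is descending, a positive comparison moves the search toward lower indices. The self-contained sketch below shows the same technique. The two-byte ASCII keys stand in for GBK full-width characters, and the table contents and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 4

/* Hypothetical two-byte keys standing in for full-width characters,
 * and the parallel half-width replacements (entry i matches key i). */
static const char *full_keys[TABLE_SIZE] = { "AA", "ZZ", "MM", "CC" };
static char half_vals[TABLE_SIZE + 1] = "azmc";

/* Sort both arrays together into descending order, like _sorttable(). */
static void sort_desc(const char *keys[], char vals[], int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (strcmp(keys[i], keys[j]) < 0) {
                const char *k = keys[i]; keys[i] = keys[j]; keys[j] = k;
                char v = vals[i]; vals[i] = vals[j]; vals[j] = v;
            }
}

/* Binary search over the descending table; returns the half-width
 * replacement for the 2-byte sequence at p, or 0 if it is not found. */
static char lookup_half(const char *p, const char *keys[], const char vals[], int n)
{
    int lo = 0, hi = n - 1;

    assert(p != NULL);
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int r = strncmp(p, keys[mid], 2);
        if (r == 0)
            return vals[mid];
        if (r > 0)      /* p sorts higher: it lies at smaller indices */
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}
```

After sort_desc the keys are ordered ZZ, MM, CC, AA. lookup_half("MM", ...) returns 'm', and an unknown sequence such as "QQ" returns 0.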


toplee_util.c

......

int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len)
{
    static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

.......... the code in between, more than 3,000 lines, is omitted here ........

void NormalizeName( char *p )
{
    FullToHalf( p, CODE_PAGE_GBK );
    TrimString( p );
    LowerString( p );
}
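LowerString has to be DBCS-aware. In GBK the trail byte of a two-byte character can be anywhere from 0x40 to 0xFE, which overlaps ASCII 'A' to 'Z' (0x41 to 0x5A), so a naive byte-by-byte tolower would corrupt Chinese characters. The minimal sketch below shows the same idea. lower_gbk is a hypothetical name, and it treats 0x81 to 0xFE as lead bytes, the same range MyIsDBCSLeadByte accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Lower-case ASCII letters of a GBK string in place, leaving both bytes
 * of every double-byte character untouched (lead bytes 0x81..0xFE). */
static void lower_gbk(char *s)
{
    unsigned char *p = (unsigned char *)s;

    assert(s != NULL);
    while (*p) {
        if (*p >= 0x81 && *p <= 0xFE) {
            p++;                  /* skip the lead byte */
            if (*p == '\0')       /* truncated character at the end */
                break;
            p++;                  /* skip the trail byte unchanged */
        } else {
            if (*p >= 'A' && *p <= 'Z')
                *p = (unsigned char)(*p + ('a' - 'A'));
            p++;
        }
    }
}
```

Running it on the bytes 'A', 'b', 0x81, 0x41, 'C' yields 'a', 'b', 0x81, 0x41, 'c'. The trail byte 0x41 survives even though it has the same value as 'A'.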


toplee_util.h

#ifndef __TOPLEE_UTIL_INCLUDE__
#define __TOPLEE_UTIL_INCLUDE__    1
 
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
#ifdef LINUX
#include <time.h>
#endif
 
#include "common.h"
 
//#include "euc2uni.h"
 
/*
typedef int    BOOL;
*/

#ifndef TRUE
#define TRUE    1
#define FALSE    0
#endif
 
#define ASCII                0
#define HZ_HEAD                1
#define HZ_TAIL                2
 
#ifdef BIG_ENDDING
#define DEFAULT_UNICODE            0x3000
#define DEFAULT_GBK_CODE        0xA1A1
#define DEFAULT_BIG5_CODE        0xA140
#else
#define DEFAULT_UNICODE            0x0030
#define DEFAULT_GBK_CODE        0xA1A1
#define DEFAULT_BIG5_CODE        0x40A1
#endif
 
#define CODE_PAGE_GBK    936
#define CODE_PAGE_BIG5    950
#define CODE_PAGE_EUC    932
 
#define CHARSET_DEFAULT    0
#define CHARSET_UNICODE    1
#define CHARSET_UTF8        2
 
 
// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)
#define GBK_COUNT            24066
 
// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)
#define BIG5_COUNT            16999
 
typedef struct tagMMapFile2
{
    BOOL bUsed;
    struct stat finfo;
    void *mm;
} MMapFile;
 
 
//int LoadEuc2UniTable(char *strFileName);
//void FreeEuc2UniTable(void);
 
int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len);
int FromBase64(char* strSrc, int nSrcLen, void* pDest, int* nDestLen);
int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
 
int MultiByteToWideChar(unsigned int uCodePage, unsigned long lFlags,
    char *pMultiByteStr, int nMultiByte,
    unsigned short *pWideChar, int nWideChar);
int WideCharToMultiByte(unsigned int uCodePage, unsigned long dwFlags,
    unsigned short *pWideCharStr, int nWideChar,
    char *pMultiByteStr, int nMultiByte,
    const char* lpDefaultChar, int* lpUseDefaultChar);
 
#define ASCII                0
#define HZ_HEAD                1
#define HZ_TAIL                2
 
void GBK2BIG5(char *lpString, int cbString);
void BIG52GBK(char *lpString, int cbString);
 
void LowerString(char *str);
void TrimString(char *str);
void DecodeFormString(char *str);
void DecodeUTF(char *str);
 
#define DECODE_UNICODE    0
#define KEEP_UNICODE    1
 
#define DECODE_GBK        0
#define DECODE_BIG5        2
 
int DecodePureUTF(unsigned char *str, int nFlag);
 
 
#define LANG_GB            1        // used by httpstrtoint and FullToHalf
#define LANG_B5            2
#define LANG_ENG        3
#define LANG_UNKNOWN    4
 
int httpstrtoint(char* strHttp);
void lowerhttpprefix(char* strUrl);
 
 
#define FULL_COUNT    (21+26*2+5)
 
BOOL FullToHalf(char *str, int nLang);
 
 
#define    URLDESCSEPCHAR        '|'
char* DescriptFromUrl(char* strUrl);
 
#define CODE_GBK2UNI    1
#define CODE_UNI2GBK    2
#define CODE_BIG52UNI    3
#define CODE_UNI2BIG5    4
#define CODE_GBK2BIG5    5
#define CODE_BIG52GBK    6
 
 
const char *mmapOneFile(char *pFileName, MMapFile *mmapfile);
void toplee_cleanup_mmap(void *dummy);
void InitMMResource(void);
const char* LoadOneCodeTable(int nType, char* strFileName);
 
int getcuryear();
 
char* mstrncpy(char* strDest, char* strSrc, size_t nCount);
int formurlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
 
int wmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
 
#define MAX_INTERNAL_BUFF    16384
int gb2uni_encode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int unicodeencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
 
char *stristr(const char *big, const char *little);
 
 
 
typedef struct auto_string
{
    int     len, inc_len;
    char    *strval;
}struAutoString;
#define DEF_INC_LEN        (1024)
#define DEF_INT_LEN        12
 
void init_auto_string(struAutoString *astr, int inc_len);
int add_auto_string(struAutoString *astr, char *new_str);
void free_auto_string(struAutoString *astr);
 
int unistrcmp(const char *str1, int str1len, const char *str2, int str2len);
 
void NormalizeName( char *p );
 
#endif // __TOPLEE_UTIL_INCLUDE__
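The comments on GBK_COUNT and BIG5_COUNT size a code table as the number of lead bytes times the number of trail-byte values from 0x40 to 0xFE. That gives 126 × 191 = 24066 for GBK and 89 × 191 = 16999 for BIG5. The loading code is not shown in the article, so the row-by-row table layout used below is an assumption. Under that assumption, a character's slot could be computed as follows; gbk_index and check_table_sizes are hypothetical helpers.

```c
#include <assert.h>

#define GBK_COUNT   24066   /* (0xFE - 0x81 + 1) * (0xFE - 0x40 + 1) */
#define BIG5_COUNT  16999   /* (0xF9 - 0xA1 + 1) * (0xFE - 0x40 + 1) */

/* Verify the arithmetic given in the header comments. */
static void check_table_sizes(void)
{
    assert((0xFE - 0x81 + 1) * (0xFE - 0x40 + 1) == GBK_COUNT);
    assert((0xF9 - 0xA1 + 1) * (0xFE - 0x40 + 1) == BIG5_COUNT);
}

/* Hypothetical: slot of a two-byte GBK character in a table of GBK_COUNT
 * entries stored lead byte by lead byte, 191 trail values per lead byte.
 * Returns -1 if either byte is outside the ranges used above. */
static int gbk_index(unsigned char lead, unsigned char trail)
{
    if (lead < 0x81 || lead > 0xFE || trail < 0x40 || trail > 0xFE)
        return -1;
    return (lead - 0x81) * (0xFE - 0x40 + 1) + (trail - 0x40);
}
```

The first character (0x81, 0x40) maps to slot 0 and the last (0xFE, 0xFE) to slot 24065, which is GBK_COUNT - 1.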

php_toplee.h

/*
  +----------------------------------------------------------------------+
  | PHP Version 4                                                        |
  +----------------------------------------------------------------------+
  | Copyright (c) 1997-2002 The PHP Group                                |
  +----------------------------------------------------------------------+
  | This source file is subject to version 2.02 of the PHP license,      |
  | that is bundled with this package in the file LICENSE, and is        |
  | available at through the world-wide-web at                           |
  | http://www.php.net/license/2_02.txt.                                 |
  | If you did not receive a copy of the PHP license and are unable to   |
  | obtain it through the world-wide-web, please send a note to          |
  | license@php.net so we can mail you a copy immediately.               |
  +----------------------------------------------------------------------+
  | Author:                                                              |
  +----------------------------------------------------------------------+
 
 
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/

 
#ifndef PHP_GBK_H
#define PHP_GBK_H
 
extern zend_module_entry gbk_module_entry;
#define phpext_gbk_ptr &gbk_module_entry
 
#ifdef PHP_WIN32
#define PHP_GBK_API __declspec(dllexport)
#else
#define PHP_GBK_API
#endif
 
#ifdef ZTS
#include "TSRM.h"
#endif
 
PHP_MINIT_FUNCTION(gbk);
PHP_MSHUTDOWN_FUNCTION(gbk);
PHP_RINIT_FUNCTION(gbk);
PHP_RSHUTDOWN_FUNCTION(gbk);
PHP_MINFO_FUNCTION(gbk);
 
PHP_FUNCTION(confirm_gbk_compiled);    /* For testing, remove later. */
 
PHP_FUNCTION(toplee_decode_utf);
PHP_FUNCTION(toplee_decode_utf_gb);
PHP_FUNCTION(toplee_decode_utf_big5);
PHP_FUNCTION(toplee_encode_utf_gb);
 
PHP_FUNCTION(toplee_big52gbk);
PHP_FUNCTION(toplee_gbk2big5);
PHP_FUNCTION(toplee_fan2jian);
PHP_FUNCTION(toplee_normalize_name);
 
/*
      Declare any global variables you may need between the BEGIN
    and END macros here:     
 
ZEND_BEGIN_MODULE_GLOBALS(gbk)
    int   global_value;
    char *global_string;
ZEND_END_MODULE_GLOBALS(gbk)
*/

 
/* In every utility function you add that needs to use variables
   in php_gbk_globals, call TSRM_FETCH(); after declaring other
   variables used by that function, or better yet, pass in TSRMLS_CC
   after the last function argument and declare your utility function
   with TSRMLS_DC after the last declared argument.  Always refer to
   the globals in your function as GBK_G(variable).  You are
   encouraged to rename these macros something shorter, see
   examples in any other php module directory.
*/

 
#ifdef ZTS
#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)
#else
#define GBK_G(v) (gbk_globals.v)
#endif
 
#endif    /* PHP_GBK_H */
 
 
/*
 * Local variables:
 * tab-width: 4
 * c-basic-offset: 4
 * indent-tabs-mode: t
 * End:
 */

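That completes all the C code. The module also needs several code table files, such as gb2b5.tab and uni2gb.tab. I don't provide them here; you can look up how to generate them, and many such .tab files are available for download online. PHP_MINIT reads their paths from the php.ini entries gbk2uni, uni2gbk, big52uni, uni2big5, gbk2big5 and big52gbk declared between PHP_INI_BEGIN() and PHP_INI_END(). If any of these entries is empty, or a table fails to load, PHP_MINIT returns FAILURE and the module is not loaded.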
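The code tables are memory-mapped rather than read into buffers. toplee_util.h declares mmapOneFile() and an MMapFile struct that holds a struct stat plus the mapping address, and toplee_cleanup_mmap() releases the mappings. Because toplee_util.c is not reproduced, the following is only a hedged POSIX sketch of what such a helper could look like. TableMap, map_table_file and unmap_table_file are hypothetical names modeled on MMapFile, and nothing is assumed about the .tab file format.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    struct stat finfo;   /* size and other metadata of the mapped file */
    void *mm;            /* start of the read-only mapping, or NULL */
} TableMap;

/* Map a whole code table file read-only. Returns 0 on success, -1 on error. */
static int map_table_file(const char *path, TableMap *t)
{
    int fd;

    assert(path != NULL && t != NULL);
    t->mm = NULL;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &t->finfo) < 0 || t->finfo.st_size == 0) {
        close(fd);
        return -1;
    }
    t->mm = mmap(NULL, (size_t)t->finfo.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (t->mm == MAP_FAILED) {
        t->mm = NULL;
        return -1;
    }
    return 0;
}

/* Release a mapping created by map_table_file(). */
static void unmap_table_file(TableMap *t)
{
    if (t->mm != NULL) {
        munmap(t->mm, (size_t)t->finfo.st_size);
        t->mm = NULL;
    }
}
```

Mapping the tables once in PHP_MINIT lets every request share the same read-only pages, which is presumably why the module loads them at module startup rather than per request.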

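Next, we can build and test the module.

Go back to the root of the PHP source tree and run:
#./buildconf
#./configure --with-toplee=shared ……
#make
#make install

This builds the module together with PHP. Because of the shared option, the toplee module is built as toplee.so. Load it with extension=toplee.so in php.ini (or extensions.ini), or at run time with PHP's dl() function. After that, the function interfaces defined earlier can be called from PHP.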