Linux Serial Port Programming, SMS Series (2): PDU Encoding (UTF-8 to Unicode in Detail)

梦小羊

Article link: https://blog.youkuaiyun.com/weixin_45121946/article/details/107319031

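Contents

I. Phone number processing
- 1.1 Processing the SMS center number
- 1.2 Processing the recipient number
II. SMS content processing
- 2.1 UTF-8 to Unicode
- 2.2 Hex to String
III. Putting it all together
- 3.1 Assembling the PDU
- 3.2 The pdu_encod() function
IV. Module header file PDU.h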

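I. Phone number processing

1.1 Processing the SMS center number

The SMS center number is given in international format with a leading '+' followed by 13 digits. It is converted as follows: drop the '+', append 'F' to make the digit count even, swap the two digits of every pair, and prefix the result with 0891.
Implementation: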

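int Processing_center_number(char *center_number)
{
    int i;
    char temp;
    char head[20] = "0891";  //08: number of bytes that follow (type byte + 7 number bytes)   91: international number format

    /* Expects '+' followed by 13 digits, in a buffer with room for the 18-character result */
    if(!center_number || center_number[0] != '+' || strlen(center_number) != 14)
    {
        printf("Invalid input\n");
        return -1;
    }


    /* Drop the '+' sign, append 'F', and swap the two digits of each pair.
     * memmove is used because source and destination overlap, which strcpy does not allow. */
    memmove(center_number,&center_number[1],strlen(center_number));

    strcat(center_number,"F");

    for(i = 0; i < 14;i+=2)
    {
        temp = center_number[i];
        center_number[i] = center_number[i+1];
        center_number[i+1] = temp;
    }


    /* Add header */
    strcat(head,center_number);
    strcpy(center_number,head);

    return 0;
}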

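The function takes a single parameter: the array holding the SMS center number. Inside, it uses string functions such as strcpy and strcat to turn that number, in place, into the segment that the PDU format requires.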
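As a worked example of the drop, pad and swap steps, here is a minimal sketch. The helper name encode_semi_octets is hypothetical and not part of the original module; it takes the digits without the '+' and, unlike the function above, does not assume a fixed length. The sample number in the note below is only an illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original module): copies the digits of
 * `number` (no leading '+') into `out` with the two digits of every pair
 * swapped, padding an odd digit count with 'F'.
 * Returns 0 on success, -1 if `out` is too small. */
int encode_semi_octets(const char *number, char *out, size_t out_size)
{
    size_t len = strlen(number);
    size_t padded = len + (len % 2);   /* round up to an even digit count */
    size_t i;

    if (out_size < padded + 1)
        return -1;

    for (i = 0; i < padded; i += 2) {
        char first  = number[i];
        char second = (i + 1 < len) ? number[i + 1] : 'F';

        out[i]     = second;           /* the two digits trade places */
        out[i + 1] = first;
    }
    out[padded] = '\0';
    return 0;
}
```

With the digits 8613800100500 the helper yields 683108100005F0, so the complete SMS center field would be 0891683108100005F0.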

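P.S. The SMS center number differs between carriers and regions. You can look up the one for your SIM card online, and later in this series I will also write a function that reads it with an AT command. SMS sent over China Telecom's CDMA network does not need an SMS center number, but for uniformity this code always includes one.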

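1.2 Processing the recipient number

The recipient number is handled the same way: drop the '+', append 'F', and swap the two digits of every pair. The result is then wrapped between the header 11000D91 and the tail 000800.
Implementation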

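int Processing_phone_number(char *phone_number)
{
    int    i;
    char   temp;
    char   head[64] = "11000D91";
    char   tail[64] = "000800";

/* 
 *  1100: fixed (11: SMS-SUBMIT with a relative validity period, 00: message reference)
 *  0D: the number of digits in the mobile phone number, not counting the + sign, in hexadecimal (0x0D = 13)
 *  91: international number format
 *  000800: 00 protocol identifier, 08 UCS-2 data coding, 00 validity period
 *
 * */

    /* Expects '+' followed by 13 digits, in a buffer with room for the 28-character result */
    if(!phone_number || phone_number[0] != '+' || strlen(phone_number) != 14)
    {
        printf("Invalid input\n");
        return -1;
    }

    /* Drop the '+' sign, append 'F', and swap the two digits of each pair.
     * memmove is used because source and destination overlap, which strcpy does not allow. */
    memmove(phone_number,&phone_number[1],strlen(phone_number));
    strcat(phone_number,"F");
    for(i = 0; i < 14;i+=2)
    {
        temp = phone_number[i];
        phone_number[i] = phone_number[i+1];
        phone_number[i+1] = temp;
    }

    /* Add header,tail */
    strcat(head,phone_number);
    strcat(head,tail);
    strcpy(phone_number,head);

    return 0;
}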

Besides strcpy and strcat, the sprintf function can also do the string concatenation, and it is more convenient.
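To illustrate that remark, here is a minimal sketch that builds the recipient field in a single call. It uses snprintf rather than sprintf so that a short buffer is detected instead of overflowed. The helper name build_recipient_field is hypothetical, and it assumes the number has already had its '+' dropped, 'F' appended and digit pairs swapped.

```c
#include <stdio.h>

/* Hypothetical variant (not part of the original module). `swapped` is the
 * recipient number after the '+' was dropped, 'F' appended and the digit
 * pairs swapped, for example "683108100005F0".
 * Returns 0 on success, -1 if `out` is too small. */
int build_recipient_field(const char *swapped, char *out, size_t out_size)
{
    /* Header 11000D91, then the number, then tail 000800, in one format string */
    int n = snprintf(out, out_size, "11000D91%s000800", swapped);

    /* snprintf returns the length it wanted to write; a value that does not
     * fit in out_size means the output was truncated */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Writing into a separate output buffer also leaves the caller's number untouched, which the in-place strcat version cannot do.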

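II. SMS content processing

2.1 UTF-8 to Unicode

ASCII is an encoding defined in the United States in the 1960s. It uses 7 bits of a byte, that is, the values 0 to 127. One byte is enough for the Latin alphabet, but as computing spread, 7 bits were nowhere near enough for the scripts of Europe and Asia; by some counts, Chinese alone has more than 100,000 characters. This led to Unicode, which assigns a number to every character of every writing system. Unicode, however, only defines the binary value of each character, not how it is stored. Storing every character in a fixed 3 or 4 bytes would waste a great deal of memory on English text, which is why UTF-8, now the most widely used encoding, was introduced.
UTF-8 is one of the ways to implement Unicode. Its defining feature is that it is a variable-length encoding: a character takes 1 to 4 bytes depending on its code point (the original design allowed up to 6 bytes, but the current standard stops at 4).

To write a conversion between UTF-8 and Unicode, you have to understand how each is encoded.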

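UTF-8 encoding rules
The rules of UTF-8 are simple. There are only two:

- For a single-byte character, the first bit is 0 and the remaining 7 bits hold the character's Unicode code point. For English letters, UTF-8 is therefore identical to ASCII.
- For an n-byte character (n > 1), the first n bits of the first byte are 1 and bit n+1 is 0; every following byte starts with 10. All the remaining bits hold the character's Unicode code point.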

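Here is how UTF-8 partitions the Unicode code space:

Unicode range (hex)      UTF-8 encoding (binary)
0000 0000 - 0000 007F    0xxxxxxx
0000 0080 - 0000 07FF    110xxxxx 10xxxxxx
0000 0800 - 0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
0001 0000 - 0010 FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

As stated above, UTF-8 and Unicode are identical in the range 0 to 127; beyond one byte, the bits are laid out according to the pattern. Consider an example.
The Unicode code point of 梦 is 68A6, which is 01101000 10100110 in binary. It falls in the third range, so its UTF-8 form is 1110xxxx 10xxxxxx 10xxxxxx. Fill the bits into the x positions in order, starting from the last x and working backward, and pad any leftover x with 0:

0110 100010 100110  ->  11100110 10100010 10100110  =  E6 A2 A6

This shows how the two forms convert into each other.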
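The bit-filling rule can be checked in code. The following is a minimal sketch of the opposite direction (code point to UTF-8) for the first three ranges only; the function name encode_utf8_bmp is hypothetical and does not appear in the original module.

```c
#include <stddef.h>

/* Hypothetical helper: encodes one code point in the range U+0000..U+FFFF
 * as UTF-8. Writes up to 3 bytes to `out` and returns how many were written,
 * or 0 if the code point is outside that range. */
size_t encode_utf8_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp > 0xFFFF)
        return 0;

    if (cp <= 0x7F) {                   /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For 0x68A6 this produces the three bytes E6 A2 A6, matching the example for 梦.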

Implementation

/* UTF-8 -> Unicode (UCS-2/UTF-16), big-endian: high byte first */
int utf8_to_unicode(char* utf8_buf,char* unic_buf)
{
    /* Work on unsigned bytes so that values >= 0x80 compare correctly even where char is signed */
    const unsigned char *in   = (const unsigned char *)utf8_buf;
    unsigned char       *temp = (unsigned char *)unic_buf;
    unsigned char        b1,b2,b3,b4;  //b1: lead byte   b2..b4: continuation bytes
    unsigned long        cp;           //code point, only needed for four-byte sequences

    if(!utf8_buf || !unic_buf)
    {
        printf("Invalid parameter\n");
        return -1;
    }

    /* unic_buf must have room for 2 bytes per input byte plus the 2-byte end marker */
    while(*in)
    {
        if(*in <= 0x7F)  //Single byte: 0xxxxxxx
        {
            *temp++ = 0;
            *temp++ = *in;
            in++;  //Next unprocessed character
        }

        else if((*in & 0xE0) == 0xC0)  //Two bytes: 110xxxxx 10xxxxxx
        {
            if((in[1] & 0xC0) != 0x80)  //Check the legality of the continuation byte
                return -1;
            b1 = in[0];
            b2 = in[1];

            *temp++ = (b1 >> 2) & 0x07;
            *temp++ = (unsigned char)((b1 << 6) | (b2 & 0x3F));
            in += 2;
        }

        else if((*in & 0xF0) == 0xE0)  //Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        {
            /* The || stops at the first bad byte, so nothing past the terminator is read */
            if( ((in[1] & 0xC0) != 0x80) || ((in[2] & 0xC0) != 0x80) )
                return -1;
            b1 = in[0];
            b2 = in[1];
            b3 = in[2];

            *temp++ = (unsigned char)((b1 << 4) | ((b2 >> 2) & 0x0F));
            *temp++ = (unsigned char)((b2 << 6) | (b3 & 0x3F));
            in += 3;
        }

        else if((*in & 0xF8) == 0xF0)  //Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        {
            if( ((in[1] & 0xC0) != 0x80) || ((in[2] & 0xC0) != 0x80) || ((in[3] & 0xC0) != 0x80) )
                return -1;
            b1 = in[0];
            b2 = in[1];
            b3 = in[2];
            b4 = in[3];

            cp = ((unsigned long)(b1 & 0x07) << 18) | ((unsigned long)(b2 & 0x3F) << 12)
               | ((unsigned long)(b3 & 0x3F) << 6)  | (unsigned long)(b4 & 0x3F);
            if(cp < 0x10000 || cp > 0x10FFFF)
                return -1;

            /* Code points above U+FFFF do not fit in 16 bits, so write a UTF-16 surrogate pair */
            cp -= 0x10000;
            *temp++ = (unsigned char)(0xD8 | (cp >> 18));
            *temp++ = (unsigned char)((cp >> 10) & 0xFF);
            *temp++ = (unsigned char)(0xDC | ((cp >> 8) & 0x03));
            *temp++ = (unsigned char)(cp & 0xFF);
            in += 4;
        }

        else
            return -1;

    }

    /* Add FFFE at the end */
    *temp++ = 0xFF;
    *temp   = 0xFE;

    return 0;
}

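The bytes 0xFF, 0xFE are appended to the array as an end marker, which makes the later concatenation easier. A plain '\0' cannot serve as the marker, because the Unicode data itself contains 0x00 bytes (the high byte of every ASCII character).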
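A minimal sketch of how such a marker can be located follows. The helper name unicode_data_length is hypothetical. It assumes every character occupies a whole number of 2-byte units, so it only tests for the marker on 2-byte boundaries.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of data bytes that precede the
 * FF FE end marker, or -1 if no marker is found within max_len bytes.
 * Only even offsets are tested, so a low byte 0xFF followed by a high
 * byte 0xFE inside the data is not mistaken for the marker. */
long unicode_data_length(const unsigned char *buf, size_t max_len)
{
    size_t i;

    for (i = 0; i + 1 < max_len; i += 2) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xFE)
            return (long)i;
    }
    return -1;
}
```

For the buffer 68 A6 00 41 FF FE (梦 followed by the letter A) the function returns 4.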

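2.2 Hex to String

The buffer sent to the serial port must be text, that is, a character stream, whereas the Unicode data we have so far is a stream of raw bytes. It cannot be sent as it is: in PDU mode each byte has to be written as two hexadecimal characters, so sending the raw bytes will fail. The byte stream therefore has to be converted to a character stream before the remaining concatenation and sending steps. The code follows.

Implementation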

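int Hex2Str( const char *sSrc,  char *sDest, int nSrcLen )
{
    int                  i;
    /* Read the source as unsigned bytes; a signed char would never compare equal to 0xFF or 0xFE */
    const unsigned char *src = (const unsigned char *)sSrc;

    if(!sSrc || !sDest || nSrcLen <= 0)
    {
        printf("Unable to transcode Hex to String,Invalid parameter.\n");
        return -1;
    }

    /* sDest must have room for 2 * nSrcLen characters plus the terminating '\0' */
    for( i = 0; i < nSrcLen; i++ )
    {
        /* 0xFF 0xFE on a 2-byte boundary is the end of Unicode */
        if( (i % 2 == 0) && (i + 1 < nSrcLen) && src[i] == 0xFF && src[i+1] == 0xFE )
            break;

        sprintf( &sDest[i * 2], "%02X", src[i] );  //two uppercase hex digits per byte
    }
    sDest[i * 2] = '\0';

    return 0;
}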

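Once the UTF-8 SMS text has been obtained, convert it to Unicode and then convert the byte stream to a character stream.
Important: finally, divide the length of the string by 2, write the result as two hexadecimal digits, and prepend it to the string. This completes the SMS content processing.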
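The length prefix step can be sketched as follows. The helper name build_user_data is hypothetical; the 140-byte limit it checks is the user-data capacity of a single SMS, which corresponds to 70 two-byte characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prepends the byte count of `hex_text` (half its
 * length in characters) as two hex digits.
 * Returns 0 on success, -1 on bad input or if `out` is too small. */
int build_user_data(const char *hex_text, char *out, size_t out_size)
{
    size_t hex_len = strlen(hex_text);
    int    n;

    /* Two hex characters per byte, and at most 140 bytes in a single SMS */
    if (hex_len % 2 != 0 || hex_len / 2 > 140)
        return -1;

    n = snprintf(out, out_size, "%02X%s", (unsigned int)(hex_len / 2), hex_text);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

For the single character 梦 the hex text is 68A6, which is 2 bytes, so the finished content field is 0268A6.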

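III. Putting it all together

3.1 Assembling the PDU

Concatenate the processed recipient number and the processed SMS content, then take half the length of the resulting string, which is the number of bytes it encodes. Remember this decimal number: the AT+CMGS command needs it when sending the message. We will call it val_cmgs.
Then prepend the processed SMS center number to that string. The result is the final PDU string.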
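The same value can be recovered from a finished PDU string, which is a convenient cross-check. The sketch below uses a hypothetical helper, cmgs_length, and relies on the fact that the first two hex digits of the PDU (08 in this article) give the number of bytes in the SMS center field that follows them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the value to pass to AT+CMGS for a complete
 * PDU string, i.e. the byte count of everything after the SMS center field,
 * or -1 if the string is malformed. */
int cmgs_length(const char *pdu)
{
    size_t       total = strlen(pdu);
    unsigned int smsc_bytes;

    if (total < 2 || total % 2 != 0)
        return -1;

    /* The first byte says how many SMS center bytes follow it */
    if (sscanf(pdu, "%2x", &smsc_bytes) != 1)
        return -1;
    if ((size_t)(1 + smsc_bytes) * 2 > total)
        return -1;

    /* Total bytes, minus the length byte itself, minus the bytes it announces */
    return (int)(total / 2 - 1 - smsc_bytes);
}
```

For example, a PDU made of the 18-character center field, a 28-character recipient field and the 6-character content 0268A6 has 52 characters, or 26 bytes, and 26 - 1 - 8 gives 17.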

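3.2 The pdu_encod() function

Finally, wrap the functions above in a single function that calls them and concatenates the results in PDU order. This gives the final function, pdu_encod().
Implementation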

/************************************************************************************ 
 *
 *     Function:  int pdu_encod(char *sms_buf,char *center_number,char *phone_number,char *pdu_buf,int *val_cmgs)
 *
 *    Parameter:  char *sms_buf          -    Unprocessed SMS
 *              
 *                char *center_number    -    The SMS Center Number of SIM Card
 *
 *                char *phone_number     -    Recipient's Mobile Number
 *
 *                char *pdu_buf          -    Used to save the PDU code after completion
 *
 *                int  *val_cmgs         -    Receives half the length of the string formed by joining the processed
 *
 *                                            phone number and the processed SMS, as a decimal number
 *
 *
 *  Description:  Use the SMS center number, recipient number, and SMS to encode the PDU,then save it in the fourth parameter,
 *                and also record the value required by cmgs
 *               
 * Return Value:  0                      -    PDU encoding success
 *                
 *                negative number        -    Failure
 *
 ************************************************************************************/
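int pdu_encod(char *sms_buf,char *center_number,char *phone_number,char *pdu_buf,int *val_cmgs)
{
    char    temp[512] = {0};
    char    str_unicode[1025] = {0};  //2 hex characters per byte of unicode_buf, plus '\0'
    char    unicode_buf[512] = {0};
    size_t  unicode_len;

    if(!sms_buf || !center_number || !phone_number || !pdu_buf || !val_cmgs)
    {
        printf("Unable to perform pdu encoding,Invalid parameter.\n");
        return -1;
    }

    /* The conversion writes at most 2 bytes per input byte, plus the 2-byte end marker */
    if(strlen(sms_buf) > (sizeof(unicode_buf) - 2) / 2)
    {
        printf("SMS is too long.\n");
        return -1;
    }

    /* UTF8 -> Unicode */
    if(utf8_to_unicode(sms_buf,unicode_buf) != 0)
    {
        printf("UTF-8 to Unicode failed,Check your input.\n");
        return -2;
    }

    /* Hex -> Str */
    if(Hex2Str(unicode_buf,str_unicode,sizeof(unicode_buf)) != 0)
    {
        printf("Hex to String failed.\n");
        return -3;
    }

    /* A single SMS carries at most 140 bytes of user data (70 two-byte characters) */
    unicode_len = strlen(str_unicode)/2;
    if(unicode_len > 140)
    {
        printf("SMS does not fit in a single message.\n");
        return -4;
    }

    /* Half of the hex string length (the byte count), written as two hex digits */
    sprintf(temp,"%02X%s",(unsigned int)unicode_len,str_unicode);

    if(Processing_center_number(center_number) != 0 || Processing_phone_number(phone_number) != 0)
        return -5;

    /* Stitching: both number buffers are modified in place, so the caller must
     * make them large enough for the strings appended here */
    strcat(phone_number,temp);
   *val_cmgs = (int)strlen(phone_number)/2; //This value is used when the AT command sends PDU format SMS

    strcat(center_number,phone_number);

    strcpy(pdu_buf,center_number);

    printf("val_cmgs:%d\npdu:%s\n",*val_cmgs,pdu_buf);

    return 0;
}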
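To show where the two outputs of pdu_encod() end up, here is a minimal sketch that prepares the two strings written to the serial port. The helper name build_cmgs_request is hypothetical, and the sketch assumes the module has already been switched to PDU mode (AT+CMGF=0). The actual serial writes are left out.

```c
#include <stdio.h>

/* Hypothetical helper: fills `cmd` with the AT+CMGS line and `payload` with
 * the PDU followed by Ctrl-Z.
 * Returns 0 on success, -1 if either buffer is too small. */
int build_cmgs_request(const char *pdu, int val_cmgs,
                       char *cmd, size_t cmd_size,
                       char *payload, size_t payload_size)
{
    int n;

    /* Step 1: announce the length in bytes, ended with a carriage return */
    n = snprintf(cmd, cmd_size, "AT+CMGS=%d\r", val_cmgs);
    if (n < 0 || (size_t)n >= cmd_size)
        return -1;

    /* Step 2: once the module answers with the "> " prompt, send the PDU
     * text terminated by Ctrl-Z (0x1A) */
    n = snprintf(payload, payload_size, "%s%c", pdu, 0x1A);
    if (n < 0 || (size_t)n >= payload_size)
        return -1;
    return 0;
}
```

The caller would write cmd to the port, wait for the prompt, and then write payload.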

IV. Module header file PDU.h

PDU.h
/********************************************************************************
 *      Copyright:  (C) 2020 LuXiaoyang<920916829@qq.com>
 *                  All rights reserved.
 *
 *       Filename:  PDU.h
 *    Description:  Header file for PDU.c
 *
 *        Version:  1.0.0(09/07/20)
 *         Author:  LuXiaoyang <920916829@qq.com>
 *      ChangeLog:  1, Release initial version on "09/07/20 010:58:23"
 *                 
 ********************************************************************************/
#ifndef  _PDU_H_
#define  _PDU_H_

#include <stdio.h>
#include <string.h>
#include <iconv.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>



/* Convert byte stream data to a string */
int Hex2Str(const char *sSrc,char *sDest,int nSrcLen);

/* UTF8 -> Unicode */
int utf8_to_unicode(char* utf8_buf,char* unic_buf);

/* Process Recipient Number */
int Processing_phone_number(char *phone_number);

/* Processing SMS Center Number */
int Processing_center_number(char *center_number);

/* PDU encoding */
int pdu_encod(char *sms_buf,char *center_number,char *phone_number,char *pdu_buf,int *val_cmgs);


#endif   /*  ----- #ifndef _PDU_H_  ----- */