How to Write Your Own Text-Parsing Code (Notes after Reading the Two Text-Parsing Articles in Game Programming Gems 2 and 5)

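This article walks through the design and implementation details of a text parser, focusing on how a state machine handles different kinds of content: plain text, comments, quoted strings, and so on.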


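Parsing text myself: I think I have gone a little crazy. With AngelScript and TinyXML available, is there any need to write my own text-parsing code? Well, as the scripting layer of an AVG engine that other people are supposed to use, obviously...... Users can write directly in AngelScript with its C++-like statements, but there should also be a simplified set of commands that people who have never learned programming can use easily. That means defining my own text format and parsing it myself.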

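One suggestion: store the text as Unicode in a fixed-width form (wide characters) rather than in a multi-byte encoding. When every character occupies the same number of code units, you avoid the extreme case where one half of a Chinese character is parsed as some special character. Note that this holds for every code point only in UTF-32; in UTF-16 it holds for characters in the Basic Multilingual Plane, which covers the commonly used Chinese characters.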
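To make the risk concrete, here is a minimal sketch. It assumes GBK-encoded input, which is my assumption; the article does not name an encoding. In GBK a two-byte character has a lead byte in 0x81..0xFE, and its trail byte may fall in the ASCII range, so a trail byte can have the same value as '{' (0x7B). Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Naive scan: treats every byte as a complete character.
std::size_t CountBracesBytewise(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if (c == '{') ++n;
    }
    return n;
}

// GBK-aware scan: a byte in 0x81..0xFE starts a two-byte character,
// so its trail byte is skipped instead of being compared with '{'.
std::size_t CountBracesGbkAware(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= 0x81 && c <= 0xFE && i + 1 < s.size()) {
            ++i;  // skip the trail byte of the two-byte character
            continue;
        }
        if (c == '{') ++n;
    }
    return n;
}
```

For the bytes 0x81 0x7B followed by a real '{', the bytewise scan reports two braces and the GBK-aware scan reports one. With fixed-width wide characters the simple per-character loop is already correct, which is the point of the suggestion above.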
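(1) The core class of the text parser: CTokenizer
  As the core of the parsing code, this class's most important member is std::vector< CToken * > m_TokenList, and its most important method is TokenizeString, which splits an input string into individual tokens and stores them in m_TokenList.

  TokenizeString works like this: an int variable curPos records the current read position in the string, and the loop runs while curPos is less than the string length.

  Each iteration looks at two characters. Yes, two, not one: the current character and the one after it, because some markers are made of two characters, such as "//".

  Inside the loop, a state machine (admittedly a fairly primitive way to implement one) is driven by a switch statement.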

 

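  while ( curPos < length of the string )
  {
        // Read the current character and peek at the next one.
        switch ( curState ) // curState is a TOKSTATE variable; TOKSTATE is an enum that defines the states, and each state handles characters differently.
        {
               case whitespace state: // the default (initial) state, listed first
               case some other state:
        }
  }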

   When the loop finishes, the whole string has been split into substrings, each converted into a CToken instance and stored in m_TokenList.
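The loop shape described above can be sketched as a small self-contained function. This is not the Game Programming Gems code: it keeps only three states, stores plain std::string tokens instead of CToken objects, and hard-codes whitespace as the delimiter set. The names State and TokenizeWords are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class State { InWhite, InText, InLineComment };

// Splits input into whitespace-separated words and skips "//" comments.
std::vector<std::string> TokenizeWords(const std::string& input) {
    std::vector<std::string> tokens;
    std::string buffer;               // characters of the token being built
    State state = State::InWhite;
    std::size_t pos = 0;

    while (pos < input.size()) {
        const char cur = input[pos++];
        // Peek at the next character; use '\0' when there is none.
        const char next = (pos < input.size()) ? input[pos] : '\0';
        const bool isSpace =
            (cur == ' ' || cur == '\t' || cur == '\r' || cur == '\n');

        switch (state) {
        case State::InWhite:
            if (isSpace) break;
            if (cur == '/' && next == '/') {
                state = State::InLineComment;
                break;
            }
            state = State::InText;
            --pos;                    // re-read this character as text
            break;
        case State::InText:
            if (isSpace) {
                tokens.push_back(buffer);
                buffer.clear();
                state = State::InWhite;
            } else {
                buffer.push_back(cur);
            }
            break;
        case State::InLineComment:
            if (cur == '\n') state = State::InWhite;
            break;
        }
    }
    if (!buffer.empty()) tokens.push_back(buffer);  // flush the last token
    return tokens;
}
```

Calling TokenizeWords("show bg1 // comment\nwait 10") yields the four tokens show, bg1, wait and 10. The lookahead is guarded so that the last iteration never reads past the end of the input.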

 

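So how many states are there? Define an enum:  enum TOKSTATE { TKS_INWHITE, TKS_INTEXT, TKS_INQUOTES, TKS_INSINGLECOMMENT, TKS_INMULTICOMMENT };

TKS_INWHITE // default (whitespace) state

TKS_INTEXT // text state

TKS_INQUOTES // quoted-string state, i.e. inside a pair of quotation marks

TKS_INSINGLECOMMENT // single-line comment state, i.e. "//"

TKS_INMULTICOMMENT // multi-line comment state, i.e. "/* */"

You can add more states to suit your own needs.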
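When the state list grows, a small name-lookup helper is handy for tracing the state machine. The sketch below repeats the enum from the text and adds a hypothetical TokStateName function, which is not part of the original code. The switch deliberately has no default label, so a compiler that warns about unhandled enumerators can flag a newly added state that was forgotten here.

```cpp
enum TOKSTATE {
    TKS_INWHITE,
    TKS_INTEXT,
    TKS_INQUOTES,
    TKS_INSINGLECOMMENT,
    TKS_INMULTICOMMENT
};

// Returns a printable name for a state (hypothetical debugging helper).
const char* TokStateName(TOKSTATE state) {
    switch (state) {
    case TKS_INWHITE:         return "TKS_INWHITE";
    case TKS_INTEXT:          return "TKS_INTEXT";
    case TKS_INQUOTES:        return "TKS_INQUOTES";
    case TKS_INSINGLECOMMENT: return "TKS_INSINGLECOMMENT";
    case TKS_INMULTICOMMENT:  return "TKS_INMULTICOMMENT";
    }
    return "unknown";  // reached only for a value outside the enum
}
```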

 

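TKS_INWHITE, the default state, is the state the machine is initialized to and the first one handled on the first pass through the loop. It does the following, in order:

(1) Check whether the current character is a delimiter (for example "\n"; you supply your own string of delimiter characters) by calling IsDelimiter, which tests the character against the delimiter set and performs some related operations. If it is, go back to the top of the loop.

(2) Check whether the current character is part of a two-character marker: on "/", check whether another "/" follows. If the marker is confirmed, switch to the matching state (here TKS_INSINGLECOMMENT, the single-line comment state) and go back to the top of the loop.

(3) Check whether it is one of the special markers. If so, store a token in m_TokenList and go back to the top of the loop.

(4) If none of the above applies, the current character is text: set the state to TKS_INTEXT and step curPos back by one so that the character is read again in the text state.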
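The four checks above can be written as one pure function that maps the current and next character to a decision. This is a sketch under my own naming (WhiteAction and ClassifyInWhite are hypothetical), not the book's code. It also differs from the full listing in one respect, by my choice: a lone '/' that starts neither "//" nor "/*" is classified as text here, whereas the listing discards it.

```cpp
#include <cstring>

enum class WhiteAction {
    Skip,               // delimiter: nothing to do
    EnterLineComment,   // saw "//"
    EnterBlockComment,  // saw "/*"
    EnterQuotes,        // saw an opening quote
    EmitOpenBrace,      // '{' becomes its own token
    EmitCloseBrace,     // '}' becomes its own token
    EnterText           // anything else starts a text token
};

WhiteAction ClassifyInWhite(char cur, char next, const char* delims) {
    // (1) Delimiters are skipped. The '\0' guard is needed because
    //     strchr also matches the string terminator.
    if (cur != '\0' && std::strchr(delims, cur) != nullptr) {
        return WhiteAction::Skip;
    }
    // (2) Two-character markers need the lookahead character.
    if (cur == '/' && next == '/') return WhiteAction::EnterLineComment;
    if (cur == '/' && next == '*') return WhiteAction::EnterBlockComment;
    // (3) Single-character special markers.
    if (cur == '"') return WhiteAction::EnterQuotes;
    if (cur == '{') return WhiteAction::EmitOpenBrace;
    if (cur == '}') return WhiteAction::EmitCloseBrace;
    // (4) Everything else is ordinary text.
    return WhiteAction::EnterText;
}
```

Keeping the decision separate from the side effects (pushing tokens, changing state) makes the order of the checks easy to test in isolation.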

 

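TKS_INTEXT, the text state, handles text (in order):

(1) Check whether the current character is a delimiter or a special character (special characters include "{", "}" and so on). If it is, the TKS_INTEXT state ends: the contents of the string buffer are stored in m_TokenList as a token and the state is reset to TKS_INWHITE. For a special character, curPos is also decremented by 1, so the character stays out of the string buffer and is read again in the TKS_INWHITE state. Then go back to the top of the loop.

(2) If it is neither a delimiter nor a special character, treat it as ordinary text and append the current character to the string buffer.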
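Here is a minimal sketch of that splitting rule in isolation: text accumulates in a buffer until a delimiter or special character ends the token. For brevity it emits a special character as its own token immediately, instead of stepping the position back and letting the default state re-read it. SplitText and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits input on delimiters; each special character becomes its own token.
std::vector<std::string> SplitText(const std::string& input,
                                   const std::string& delims,
                                   const std::string& specials) {
    std::vector<std::string> tokens;
    std::string buffer;
    for (std::size_t pos = 0; pos < input.size(); ++pos) {
        const char cur = input[pos];
        const bool isDelim = delims.find(cur) != std::string::npos;
        const bool isSpecial = specials.find(cur) != std::string::npos;
        if (!isDelim && !isSpecial) {
            buffer.push_back(cur);    // ordinary text: keep buffering
            continue;
        }
        if (!buffer.empty()) {        // the pending text token ends here
            tokens.push_back(buffer);
            buffer.clear();
        }
        if (isSpecial) tokens.push_back(std::string(1, cur));
    }
    if (!buffer.empty()) tokens.push_back(buffer);  // leftover text at the end
    return tokens;
}
```

With delimiters " \t\r\n" and specials "{}", the input "name{value }" produces name, {, value and }.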

 

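 TKS_INQUOTES, the quoted-string state (entered after an opening quote is found):

(1) If the current character is a quote, the quoted string has ended: store the buffered text in m_TokenList as a string token and set the state to TKS_INWHITE. Otherwise, append the character to the string buffer.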
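The same rule as a standalone helper, assuming the caller has already consumed the opening quote. ReadQuoted is a hypothetical name, and reporting an unterminated string through the return value is my own choice for the sketch. Like the rule in the text, it does not handle escaped quotes.

```cpp
#include <cstddef>
#include <string>

// pos must point just past the opening quote.
// On success, out holds the text between the quotes and pos points just
// past the closing quote. Returns false if no closing quote is found.
bool ReadQuoted(const std::string& input, std::size_t& pos, std::string& out) {
    out.clear();
    while (pos < input.size()) {
        const char cur = input[pos++];
        if (cur == '"') return true;  // closing quote ends the string
        out.push_back(cur);           // delimiters inside quotes are kept
    }
    return false;                     // ran out of input before the quote closed
}
```

Because std::string grows as needed, this version has no fixed-size buffer to overflow on a long quoted string.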

 

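TKS_INSINGLECOMMENT (single-line comment state):

(1) Check whether the current character is '\n'. If so, the end of the single-line comment has been reached: increment the line counter m_iCurLine and set the state to TKS_INWHITE.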

 

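TKS_INMULTICOMMENT (multi-line comment state):

(1) Check whether the current character is '/' and the previous character is '*'. If so, the end of the multi-line comment has been reached: set the state to TKS_INWHITE.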
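One edge case is worth knowing about: if the '*' of the opening "/*" is allowed to count as the "previous character", the three-character input "/*/" is treated as a complete comment. The sketch below is a variant of the check, not the book's code. It starts scanning after the opener, looks ahead for "*/" instead of looking back, and also counts the newlines it crosses. SkipBlockComment is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// pos must point just past the opening "/*".
// Advances pos just past the closing "*/" and adds the number of newlines
// crossed to line. Returns false if the comment is never closed.
bool SkipBlockComment(const std::string& input, std::size_t& pos, int& line) {
    while (pos < input.size()) {
        const char cur = input[pos++];
        if (cur == '\n') {
            ++line;                   // keep the line counter accurate
            continue;
        }
        if (cur == '*' && pos < input.size() && input[pos] == '/') {
            ++pos;                    // consume the '/' of the closing "*/"
            return true;
        }
    }
    return false;
}
```

For "/* a\n b */x" with pos starting at 2, the function returns true, advances the line counter by one and leaves pos at the 'x'. For "/*/" it returns false, because no "*/" follows the opener.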

 

 Full code (excerpted from Game Programming Gems)

 

#include <cstring>   // strlen, memset

// Tokenize this String.
bool CTokenizer::TokenizeString( const char *strString, unsigned long ulLen, unsigned long ulFlags, const char *strDelims )
{
    int iNumDelims = (int)strlen( strDelims );   // Number of delimiter characters.

    // Destroy any tokens that may have already existed, and do some initialization.
    DestroyTokens();
    m_iCurrentToken = 0;

    TOKSTATE CurrentState = TKS_INWHITE;
    CToken *pTok = NULL;
    int iCurPos = 0;
    char strTempBuff[ 512 ] = { 0 };
    int iTempBuffSize = 0;
    char byCurChar;
    char byNxtChar;
    bool bSpecialChar = false;

    /////////////////////////////////////////////////////////////////
    while ( iCurPos < (int)ulLen )
    {
        byCurChar = strString[ iCurPos++ ];
        // Peek at the next character without reading past the end of the input.
        byNxtChar = ( iCurPos < (int)ulLen ) ? strString[ iCurPos ] : '\0';

        switch ( CurrentState )
        {
            // In Whitespace.
            case TKS_INWHITE:
                if ( IsDelimiter( byCurChar, strDelims, iNumDelims ) )
                {
                    continue;
                }

                switch ( byCurChar )
                {
                    // Is this the beginning of a comment?
                    case '/':
                        // Check the next character and see if it is a comment...
                        if ( byNxtChar == '/' )
                        {
                            CurrentState = TKS_INSINGLECOMMENT;
                        }
                        // Check to see if this is a multi-line comment.
                        else if ( byNxtChar == '*' )
                        {
                            CurrentState = TKS_INMULTICOMMENT;
                        }
                        // Note: a lone '/' is silently discarded here.
                        continue;

                    // If we're quotes...
                    case '"':
                        CurrentState = TKS_INQUOTES;
                        continue;

                    // If we're open brace...
                    case '{':
                        // Make the token and throw it into the list.
                        pTok = AllocToken( TKT_OPENBRACE );
                        if ( pTok )
                        {
                            m_TokenList.push_back( pTok );
                        }
                        continue;

                    // If we're closed brace...
                    case '}':
                        // Make the token and throw it into the list.
                        pTok = AllocToken( TKT_CLOSEDBRACE );
                        if ( pTok )
                        {
                            m_TokenList.push_back( pTok );
                        }
                        continue;
                }

                // If we got here then this is not a delimiter, this is text...
                CurrentState = TKS_INTEXT;
                iCurPos--;   // Get this character in the next run.
                break;

            // In Text.
            case TKS_INTEXT:
                // Check for delimiters to stop getting text.
                bSpecialChar = false;
                if ( !!IsDelimiter( byCurChar, strDelims, iNumDelims ) || ( bSpecialChar = !!IsSpecialCharacter( byCurChar ) ) )
                {
                    // Make the state machine come back to this special character.
                    if ( bSpecialChar )
                    {
                        if ( strlen( strTempBuff ) > 0 )
                        {
                            iCurPos--;
                        }
                        else
                        {
                            strTempBuff[ 0 ] = byCurChar;
                        }
                    }

                    FinalizeToken( strTempBuff, ulFlags );
                    iTempBuffSize = 0;
                    memset( strTempBuff, 0, sizeof( strTempBuff ) );
                    CurrentState = TKS_INWHITE;

                    continue;
                }
                // Leave room for the terminating '\0' that strlen relies on.
                if ( iTempBuffSize < (int)sizeof( strTempBuff ) - 1 )
                {
                    strTempBuff[ iTempBuffSize++ ] = byCurChar;
                }
                break;

            // In Quotations.
            case TKS_INQUOTES:
                // If we found the end of the quote...
                if ( byCurChar == '"' )
                {
                    // Make the token and throw it into the list.
                    pTok = AllocToken( TKT_STRING, strTempBuff );
                    if ( pTok )
                    {
                        m_TokenList.push_back( pTok );
                    }
                    iTempBuffSize = 0;
                    memset( strTempBuff, 0, sizeof( strTempBuff ) );

                    // NOTE: See note at previous quote token note.
                    // Now make the next Quote Token.
                    /*pTok = AllocToken( TKT_QUOTE );
                    if ( pTok )
                    {
                        m_TokenList.push_back( pTok );
                    }*/

                    CurrentState = TKS_INWHITE;
                    continue;
                }
                // Same bounds check as in the text state, so a long quoted
                // string cannot overflow the buffer.
                if ( iTempBuffSize < (int)sizeof( strTempBuff ) - 1 )
                {
                    strTempBuff[ iTempBuffSize++ ] = byCurChar;
                }
                break;

            // In single-line comment.
            case TKS_INSINGLECOMMENT:
                // Check to see if the comment has ended (we've reached the end of line).
                if ( byCurChar == '\n' )
                {
                    CurrentState = TKS_INWHITE;
                    m_iCurLine++;
                }
                break;

            // In multi-line comment.
            case TKS_INMULTICOMMENT:
                // Check to see if the comment is ending.
                if ( byCurChar == '/' )
                {
                    // Check previous character.
                    if ( strString[ iCurPos - 2 ] == '*' )
                    {
                        CurrentState = TKS_INWHITE;
                    }
                }
                break;
        }
    }

    // We're done with the string but we're left over with a token.
    if ( iTempBuffSize )
    {
        FinalizeToken( strTempBuff, ulFlags );
    }

    return true;
}
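The listing calls IsDelimiter and IsSpecialCharacter without showing them. The versions below are hypothetical stand-ins written as free functions so that the control flow above is easier to follow; the real members in the book may do more (the description earlier says IsDelimiter also performs some extra operations). The choice of '{' and '}' as the special characters is taken from the examples in the text.

```cpp
// Hypothetical stand-in: true if c is one of the numDelims characters
// in delims. A linear scan is enough for a short delimiter string.
bool IsDelimiter(char c, const char* delims, int numDelims) {
    for (int i = 0; i < numDelims; ++i) {
        if (delims[i] == c) return true;
    }
    return false;
}

// Hypothetical stand-in: characters that end a text token and then
// become tokens of their own.
bool IsSpecialCharacter(char c) {
    return c == '{' || c == '}';
}
```

With these definitions, a call such as TokenizeString( text, length, flags, " \t\r\n" ) would treat spaces, tabs and line breaks as token separators.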


 
