A Fast Keyword Filtering Method in C#

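This article presents an efficient hash-based keyword filtering algorithm. It stores the keywords in a dictionary and scans the text once to match and mask them. The whole filter takes a little over 90 lines of code and is fast enough for large volumes of text.
The approach in this post is simple and, once again, based on hashing. The first character of each keyword serves as the key and the keyword itself as the value, and all keywords are hashed into a Dictionary<key, value>. Because one first character may correspond to several keywords, the value is really a collection of keywords. The filter walks through the content character by character, looks each character up in the dictionary, and masks any keyword that matches at that position. Because the idea is so simple and clear, there is little room for bugs, and the code that implements the filtering is only a little over 90 lines. Did you catch that? It is reasonably efficient too: with a keyword list and a content text of more than ten thousand characters each, both read from Notepad text files, filtering took only about 10 milliseconds.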
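
To make the bucketing step concrete, here is a minimal sketch of how keywords can be grouped by their first character. The names FirstCharIndexDemo and BuildIndex and the sample keywords are assumptions made for illustration; they are not part of the original source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original post.
public static class FirstCharIndexDemo
{
    // Groups keywords by first character, so keywords that share a
    // first character land in the same bucket.
    public static Dictionary<char, List<string>> BuildIndex(IEnumerable<string> keywords)
    {
        var index = new Dictionary<char, List<string>>();
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword))
            {
                continue; // empty entries have no first character
            }
            List<string> bucket;
            if (!index.TryGetValue(keyword[0], out bucket))
            {
                bucket = new List<string>();
                index.Add(keyword[0], bucket);
            }
            bucket.Add(keyword);
        }
        return index;
    }

    public static void Main()
    {
        // Sample keywords chosen for illustration only.
        var index = BuildIndex(new[] { "spam", "scam", "ads" });
        foreach (var pair in index)
        {
            Console.WriteLine("{0} -> {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```

With these sample keywords the 's' bucket holds "spam" and "scam" and the 'a' bucket holds "ads", so a character of the content that has no bucket can be skipped after a single dictionary lookup.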

Enough talk. Once you have read the source you will think: wow, it really is that simple. Believe it or not, the source is right here:
using System;
using System.Collections.Generic;
using System.Text;

namespace WordsFilter
{
    /// <summary>
    /// Keyword filter: indexes keywords by first character and masks matches with '*'.
    /// </summary>
    public class WordSearch
    {
        // Maps a first character to every keyword that starts with it.
        private Dictionary<char, IList<string>> keyDict;

        /// <param name="keyList">Keywords separated by '|'.</param>
        public WordSearch(string keyList)
        {
            HandleKeyWords(keyList);
        }

        private void HandleKeyWords(string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                keyDict = new Dictionary<char, IList<string>>();
            }
            else
            {
                string[] strList = text.Split('|');
                keyDict = new Dictionary<char, IList<string>>(strList.Length / 4);
                foreach (string s in strList)
                {
                    if (s == "")
                    {
                        continue;
                    }
                    if (keyDict.ContainsKey(s[0]))
                    {
                        keyDict[s[0]].Add(s);
                    }
                    else
                    {
                        keyDict.Add(s[0], new List<string> { s });
                    }
                }
            }
        }

        public string Filter(string str)
        {
            if (string.IsNullOrEmpty(str))
            {
                return string.Empty;
            }
            int len = str.Length;
            StringBuilder sb = new StringBuilder(len);
            bool isOK = true;
            for (int i = 0; i < len; i++)
            {
                if (keyDict.ContainsKey(str[i]))
                {
                    // Try each keyword that starts with the current character.
                    foreach (string s in keyDict[str[i]])
                    {
                        isOK = true;
                        int j = i;
                        foreach (char c in s)
                        {
                            if (j >= len || c != str[j++])
                            {
                                isOK = false;
                                break;
                            }
                        }
                        if (isOK)
                        {
                            // Skip past the matched keyword and mask it.
                            i += s.Length - 1;
                            sb.Append('*', s.Length);
                            break;
                        }
                    }
                    if (!isOK)
                    {
                        // No keyword matched at this position; keep the character.
                        sb.Append(str[i]);
                    }
                }
                else
                {
                    sb.Append(str[i]);
                }
            }
            return sb.ToString();
        }
    }
}
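
One property of the matching loop is worth seeing in action: within a bucket, the first keyword in list order that matches wins, which is not necessarily the longest one. The following is a compact, self-contained restatement of the same first-character lookup, written only to demonstrate that behavior. FilterOrderDemo and Mask are hypothetical names, and the sample inputs are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class, not part of the original post.
public static class FilterOrderDemo
{
    // Masks every keyword from the '|'-separated keyList found in text.
    public static string Mask(string text, string keyList)
    {
        var index = new Dictionary<char, List<string>>();
        foreach (string keyword in keyList.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            List<string> bucket;
            if (!index.TryGetValue(keyword[0], out bucket))
            {
                bucket = new List<string>();
                index.Add(keyword[0], bucket);
            }
            bucket.Add(keyword);
        }

        var sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            string hit = null;
            List<string> candidates;
            if (index.TryGetValue(text[i], out candidates))
            {
                foreach (string candidate in candidates)
                {
                    // Ordinal comparison of the candidate against text starting at i.
                    if (i + candidate.Length <= text.Length &&
                        string.CompareOrdinal(text, i, candidate, 0, candidate.Length) == 0)
                    {
                        hit = candidate; // first match in list order wins
                        break;
                    }
                }
            }
            if (hit != null)
            {
                sb.Append('*', hit.Length);
                i += hit.Length;
            }
            else
            {
                sb.Append(text[i]);
                i++;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Mask("abcd", "ab|abc")); // prints **cd
        Console.WriteLine(Mask("abcd", "abc|ab")); // prints ***d
    }
}
```

If the longest match is wanted, one option is to list longer keywords before their prefixes, or to sort each bucket by descending length after loading.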
Test screenshot: (image not included)
Author: 陈太汉
Blog: http://www.cnblogs.com/hlxs/


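I ran a test with your example, looping 1000 times, and mine is a lot faster.
Here are the test results (用时(毫秒) means "elapsed time (ms)"):
WordSearch用时(毫秒): 5824 Milliseconds (GCs=194)
TrieFilter用时(毫秒): 1497 Milliseconds (GCs=70)
FastFilter用时(毫秒): 617 Milliseconds (GCs=70)
I modified your Program to run the test: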
using System;
using System.IO;
using WordsFilter; // namespace of WordSearch above

// TrieFilter, FastFilter, IOHelper and OperationTimer are the commenter's
// own types; their source is not shown in this post.
class Program
{
    static TrieFilter tf = new TrieFilter();
    static FastFilter ff = new FastFilter();

    static void Main(string[] args)
    {
        // Load the keywords, one per line, into both filters.
        using (StreamReader reader = new StreamReader(File.OpenRead("words.txt")))
        {
            string key = reader.ReadLine();
            while (key != null)
            {
                if (key != string.Empty)
                {
                    tf.AddKey(key);
                    ff.AddKey(key);
                }
                key = reader.ReadLine();
            }
        }

        // WordSearch expects a single '|'-separated keyword string.
        string keys = IOHelper.Read("words.txt").Replace("\r\n", "|");
        WordSearch ws = new WordSearch(keys);
        string str = IOHelper.Read("content.txt");

        // Each block filters the same content 1000 times.
        using (new OperationTimer("WordSearch用时(毫秒):"))
        {
            for (int i = 0; i < 1000; i++)
            {
                string s = ws.Filter(str);
            }
        }
        using (new OperationTimer("TrieFilter用时(毫秒):"))
        {
            for (int i = 0; i < 1000; i++)
            {
                string s = tf.Replace(str);
            }
        }
        using (new OperationTimer("FastFilter用时(毫秒):"))
        {
            for (int i = 0; i < 1000; i++)
            {
                string s = ff.Replace(str);
            }
        }

        Console.Read();
    }
}
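
The OperationTimer used above is not shown in the comment. As a rough sketch of what such a helper could look like, the class below times a using block with Stopwatch and counts generation-0 garbage collections with GC.CollectionCount. This is a hypothetical reconstruction based only on the output format in the results; the commenter's actual implementation may differ.

```csharp
using System;
using System.Diagnostics;

// Hypothetical reconstruction; the real OperationTimer is not shown in the post.
public sealed class OperationTimer : IDisposable
{
    private readonly string label;
    private readonly Stopwatch stopwatch;
    private readonly int gen0CountAtStart;

    public OperationTimer(string label)
    {
        this.label = label;
        // Settle the heap first so earlier garbage is not charged to this block.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        gen0CountAtStart = GC.CollectionCount(0);
        stopwatch = Stopwatch.StartNew();
    }

    // Called at the end of the using block: stops the clock and reports.
    public void Dispose()
    {
        stopwatch.Stop();
        int gcs = GC.CollectionCount(0) - gen0CountAtStart;
        Console.WriteLine("{0} {1} Milliseconds (GCs={2})",
            label, stopwatch.ElapsedMilliseconds, gcs);
    }
}

public static class OperationTimerDemo
{
    public static void Main()
    {
        using (new OperationTimer("String building:"))
        {
            // Allocate a lot of short-lived strings to give the GC some work.
            for (int i = 0; i < 100000; i++)
            {
                string s = new string('x', 100);
            }
        }
    }
}
```

Reporting the collection count next to the elapsed time helps explain gaps such as the one in the results above, where the slowest filter also triggers the most collections.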