Simple Rule Engine


SRE

Simple Rule Engine

version 1.3

Sierra Digital Solutions Corp

http://sourceforge.net/projects/sdsre/

 

 

 

If you find this application useful, we would like to hear from you; feedback is what drives the project.

 

 

 

 

 

What is a rule engine? 
A rule engine is a software system that contains and evaluates rules on behalf of another system. Many different kinds of rules can be contained in a rule engine: business, legal, company policy, navigation, computational. For example, a website that prompts the user to fill out a form with dependencies between pages could use a rule engine to control both validation of the form and navigation between pages. As another example, a bank could have a set of rules that dictate the interest rate and fees associated with a checking account. A rule engine also provides a means to classify and manage rules in a centralized location instead of having them hardcoded throughout an application.

 

Why use a rule engine? 
Most companies' business strategies are constantly in flux due to changing internal and external environments: the CEO has a new idea for increasing sales, the consumer base is being eroded by more aggressive competition, current business practices will soon be out of compliance due to changes in regulations. How do you handle changes to a system in a way that preserves maintainability, reusability, extensibility, and ownership?

Maintainability is increased by using a rule engine because the rules are centralized. Typically, a non-rule-engine-based system containing rules of moderate complexity will implement its rules in methods of objects that encapsulate business logic (business objects). These objects refer to other business objects, typically creating an intricate web of associations. Since the objects are tightly coupled, changes to one of them can have rippling effects throughout the collection. Clarity of the rules in such a system can be quite poor when the rules cross these objects. Not only is it difficult for a developer to understand the details of such relationships, it is even more difficult to change them.

A rule-engine-based system contains the rules of another application, thus separating the rules from their controller. Typically such a system uses a proxy class that delegates calls to the rule engine. Rules implemented in a rule engine are declarative in nature: they are essentially a set of if...then statements, and as long as their conditions do not contain dependencies on other rules, they are completely decoupled from each other. Whether a rule fires or not has nothing to do with the other rules; in fact, it is as if there were no other rules. It also becomes possible to group rules, which brings clarity.

Extensibility of requirements is increased in a rule engine due to its declarative nature. In a non-rule-engine-based system, adding new requirements could mean new associations, and as mentioned above, new associations across business objects can have a rippling effect. A rule-engine-based system does not have this problem, since the rules have higher clarity and are loosely coupled. New sets of rules can be added to the system with no effect on the existing rules.

Reusability of requirements is also increased in a rule engine due to its declarative nature. Often it is a requirement that a system operate under multiple sets of requirements at once. For example, a business wants its pricing policy for new customers to change at midnight in two weeks, while existing customers remain at their current pricing policy for the remainder of their term. Using a business object model would require multiple business object hierarchies (one for new customers and another for existing customers), or multiple business objects of the same type to distinguish between new and existing customers, or switch statements inside business object methods. A rule-engine-based system would simply include the term start date in its conditions and decide for itself which ruleset(s) to execute, as sketched below.
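As a hypothetical sketch of that last point (the fact id IsNewCustomer, its derivation, and the prices below are invented for illustration; the ruleset format itself is shown later in this document), two decoupled rules could select the pricing policy with no branching in the calling code:

<Rules>
  <!-- IsNewCustomer is assumed to be a boolean fact computed from the term start date -->
  <Rule id="NewCustomerPricing" desc="new customers get the new policy">
    <Condition><![CDATA[ FACT(IsNewCustomer) == FACT(True) ]]></Condition>
    <Actions>
      <Action factId="Price"> <Expression><![CDATA[ 10 ]]></Expression> </Action>
    </Actions>
  </Rule>
  <Rule id="ExistingCustomerPricing" desc="existing customers keep the old policy">
    <Condition><![CDATA[ FACT(IsNewCustomer) == FACT(False) ]]></Condition>
    <Actions>
      <Action factId="Price"> <Expression><![CDATA[ 12 ]]></Expression> </Action>
    </Actions>
  </Rule>
</Rules>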

Ownership of rules can be shifted to business units with a rule engine, especially with SRE. Rules, being declarative in nature, are easy to understand. The learning curve that business users would have to overcome is relatively small: XML and testing. SRE was designed around its rules being contained in a simple-to-understand XML file, hence the word SIMPLE in SRE. XML should take no more than a day to learn; there are even XML for Dummies and XML in 24 Hours books. Implementing rules should have little to no learning curve: anyone who can form an if...then statement, which is just about all of us, can write and maintain rules. This is a huge gain for business groups; they now have a way to easily own their own rules, and no longer need to explain what they want and hope the developers understand.

On the downside, not every system should use a rule engine. Rule engines do add risk and complexity to a project, and the knowledge needed to successfully implement one should not be taken lightly. It is critical that a system be designed with the proper data interface to the rule engine; not doing so will severely limit its usefulness.

In conclusion, the entropy (randomness) of requirements is lowered with a rule engine: requirements can be centralized and grouped together in a manner that adds clarity, in an XML file that is simple yet powerful enough to allow even business owners to own the rules.

 

What is SRE (Simple Rule Engine)?
From a technical perspective, SRE is a comprehensive, flexible product aimed at enabling software developers to create applications that can be maintained with minimal effort and directly supported by designated business users. It leverages the flexibility and robustness of XML in a context that business users can understand and own, freeing developers to work on new functionality. It allows developers to combine rule-based and object-oriented programming methods to add business rules to new and existing applications. Hence the name of this product, Simple Rule Engine.

 

How does SRE simplify our development?
1. SRE makes it easy to create, edit, store, and organize rules.
2. SRE's XML is the simplest possible manner of describing rules.
3. Rules are declarative, not procedural, in nature, allowing problems to be broken down into discrete pieces.
4. SRE allows rules to be separated from your application, allowing flexibility for future changes.

 

Why would I want to use SRE instead of a full blown rule engine?
A full blown rule engine, like NxBRE (also open source), is far more capable than SRE, but that capability comes at a price: SRE has far better performance and is far easier to understand. Not every application needs the complexity of a full blown rule engine; sometimes a simple one will do just fine, and sometimes it won't. Choose wisely.

 

What kind of rule engines are there?
Chaining refers to the relationship between the rules; forward and backward refer to the direction in which that relationship is evaluated.

Forward chaining : often referred to as 'data driven' because the data determines whether the rules fire.
(Rule) If my car is green, then
(Action) my house is red. 
(Fact) I have a green car.
(Therefore) I have a red house.

Backward chaining : often referred to as 'goal driven' because we are trying to determine whether something holds based on the information we already have.
(Rule) If my car is green, then
(Action) my house is red. 
(Fact) My house is NOT red. 
(Therefore) I do not have a green car.

Hybrid : a combination of the above two.
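To make the forward-chaining idea concrete, here is a minimal, self-contained C# sketch (illustrative only, not SRE's actual implementation): facts are asserted, and every rule whose condition is met by the known facts fires, possibly asserting new facts, until nothing changes.

using System;
using System.Collections.Generic;

class ForwardChainingSketch
{
    static void Main()
    {
        // Each rule asserts a new fact when its condition fact is known.
        var rules = new List<(string If, string Then)>
        {
            ("my car is green", "my house is red")
        };
        var facts = new HashSet<string> { "my car is green" };

        bool changed = true;
        while (changed) // loop until no rule adds anything new (a fixpoint)
        {
            changed = false;
            foreach (var (cond, action) in rules)
                if (facts.Contains(cond) && facts.Add(action))
                    changed = true; // the rule fired and asserted a new fact
        }

        Console.WriteLine(string.Join("; ", facts));
        // prints: my car is green; my house is red
    }
}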

 

What is contained in a rule engine?
Rule engines typically support rules, facts, priority (score), mutual exclusion, preconditions, and other functions. Because SRE's goal is to be simple and not a full blown rule engine, it only supports rules, facts, and actions.

 

What kind of rule engine is SRE?
forward chaining

 

What is declarative programming?
(modified version from the Drools documentation)
Declarative programming deals with what is, as opposed to the how-to normally encountered in imperative programming languages. SRE allows rules to be decoupled and reduces unidirectional logical flow. Declarative rules typically take the following form:

if (condition)
then (action)

Declarative statements state what should occur, but do not specify the procedure for actually testing the conditions. For example, an imperative method for ensuring you have an umbrella if it is raining would be:

  1. Step outside and determine if it is raining.
  2. If it is raining, then go to the closet and get an umbrella.

The above could also be represented by two declarative rules:

If it is raining
Then you need an umbrella

If you need an umbrella
Then get one from the closet

Given declarative rules, the knowledge that it is raining could produce two courses of action:

  1. You already have an umbrella, perhaps because you always carry one, in which case you're ready.
  2. You don't have an umbrella, so you go get one from the closet.

This allows you to come by the knowledge that it's raining in more than one way, and to respond to that knowledge in the most appropriate fashion, depending on other available knowledge. For instance, if you know you're going to be away from the house and that it is likely to rain, you can add another rule to the ones specified above:

If it might rain later, and
If you'll be away from the house
Then you need an umbrella

At this point, you only need to go get an umbrella if you know you don't already have one, and either it is raining or it might rain later while you're away from the house. If you were to write all of that out in procedural code, it would quickly get tangled and complex. Declarative logic helps us decompose complex decisions into mind-sized pieces.

 

What are Facts? 
"Something demonstrated to exist or known to have existed" as defined by dictionary.com. Its a fact that today the sky is blue, its a fact that my car is red. Facts are immutable once their value has been determined. Facts can have essentially any value.

 

What are Rules? 
Rules are Facts with conditions and possibly actions. If your house is blue, then your car is green. If you're in an accident, then your insurance premium will increase. Rules actually inherit from Facts; they too are immutable once their value has been determined. Rules can only be true or false.

What are Actions? 
Actions are the causation of changes from facts. In the last rule example, your insurance company will increase your rates if you're in an accident. Actions currently can assign a value from another fact or set a fixed value; callbacks are a future enhancement.

How do Facts, Rules, and Actions work together? 
Facts contain the data from our model (incoming data). Rules take on a value the instant their dependent facts are given values. Actions are performed the instant their rules evaluate to true. Obviously these determinations have to be calculated by the computer, but from the view of our client program they are instant; the client must wait for a response.

 

What does SRE's ruleset look like?
(see unit test cases or examples) 
<RuleEngine>
  <Rules>
    <Rule id="R1" desc="expression">
      <Condition><![CDATA[ ISNULL(FACT(In)) ]]></Condition>
      <Actions>
        <Action factId="Out">
          <Expression><![CDATA[ 5 ]]></Expression>
        </Action>
        <Action factId="Out" result="false">
          <Expression><![CDATA[ "" ]]></Expression>
        </Action>
      </Actions>
    </Rule>
  </Rules>
  <Facts>
    <Fact id="True" desc="True" type="boolean"> <xpath><![CDATA[ boolean(1) ]]></xpath> </Fact>
    <Fact id="False" desc="False" type="boolean"> <xpath><![CDATA[ boolean(0) ]]></xpath> </Fact>
    <Fact id="String" desc="String" type="string"> <xpath><![CDATA[ 'string' ]]></xpath> </Fact>
    <Fact id="In" type="double"> <xpath><![CDATA[ //number1 ]]></xpath> </Fact>
    <Fact id="Out" type="double"> <xpath><![CDATA[ //number2 ]]></xpath> </Fact>
  </Facts>
</RuleEngine>

 

What are the elements in our XML Rules?
RuleEngine : the root node; it contains all the rules and facts.

Rule : declares a rule.
Condition : declares the expression that must evaluate to true for the rule's actions to fire. This expression MUST always return true or false, or an exception will occur.
Action : specifies the fact the action is applied to and the expression to evaluate; the expression's result can be of any type.

Fact : declares a fact. Facts are statements about data and get their data from the model.
type : specifies the type of the value. 'boolean', 'string', and 'double' are the supported types.
xpath : specifies the XPath used to get and set the data. If the engine is unable to set the data, an exception will be thrown. Currently only the first node's value is taken from a nodeset, or the evaluated value for non-nodesets.
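
For instance, with the sample ruleset above, the In and Out facts could bind to a model document shaped like this (a hypothetical model; the element names simply have to match the XPaths //number1 and //number2):

<a>
  <number1></number1> <!-- read by fact "In" via //number1 -->
  <number2></number2> <!-- written by fact "Out" via //number2 -->
</a>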

 

How are rules executed?
Once the rules have been bound to a model (XmlDocument), the rules are automatically fired as the XmlDocument object is modified. For example...

model["a"]["number1"].InnerText= "15"; //rules are executed because of this

could fire multiple rules due to this change. It is important to note that rules are ONLY fired if there is a change in value: if the above line were executed 20 times in a row with the same value, the rules involved would still only be executed once! To have multiple firings, clear the value and then set it again, as sketched below. See the bank account example.
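A minimal sketch of that clear-and-reset pattern (assuming the same model as in the full example below):

model["a"]["number1"].InnerText = "15"; // value changed: rules fire
model["a"]["number1"].InnerText = "15"; // same value: no firing
model["a"]["number1"].InnerText = "";   // clear the value (itself a change)
model["a"]["number1"].InnerText = "15"; // value changed again: rules re-fire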

 

How about a full example on how to execute rules?
// assumes: using System; using System.Diagnostics; using System.Xml;
// plus references to the SRE assemblies and a unit test framework providing Assert
public void xxx()
{
    // compile the rules
    ROM rom = new ROM();
    XmlDocument rules = new XmlDocument();
    string directory = AppDomain.CurrentDomain.BaseDirectory + @"\..\..\Tests60-69\TesterR60.xml";
    rules.Load(directory);
    RuleEngine.Complier.XmlComplier.Compile(rules, rom);

    // attach the model; element names are lowercase to match the facts' XPaths (XML is case sensitive)
    Debug.WriteLine("\n\nAttaching model.......");
    XmlDocument model = new XmlDocument();
    model.LoadXml("<a><number1></number1><number2></number2><result></result></a>");
    rom.Model = model;
    Debug.WriteLine("Model attached.......\n\n");

    // modify the model; rules are executed because of this change
    Debug.WriteLine("\n\nModifying model.......");
    model["a"]["number1"].InnerText = "15";
    Debug.WriteLine("Model modified.......\n\n");

    // now our other xml values should be set
    Debug.WriteLine(model.OuterXml);
    Assert.AreEqual("3", model["a"]["number2"].InnerText);
}

 

What operators can be used in expressions?
It is important to note that all operators, expressions, and fact ids are case sensitive.
ISNULL
FACT
==
!=
-
+
*
/
AND
OR
NOT
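
As an illustration, a condition expression could combine these operators (this particular expression is invented, but it uses only the operators listed above and the In fact from the sample ruleset):

<Condition><![CDATA[ NOT(ISNULL(FACT(In))) AND FACT(In) + 1 == 6 ]]></Condition>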

 

How can I know how my rules are being executed?
First, set your references to the debug DLLs. These DLLs incur a huge performance hit, BUT they dump their workings to the console and any other registered listeners. Then run the program.

 
