Two Resources on Bit Fields

This article gives a detailed introduction to bit-fields in C++ and their use in classes and structures, covering the basic syntax, how to declare bit-fields in a class, bit-field layout, and how bit-fields are accessed.

C++ Language Reference

C++ Bit Fields

Classes and structures can contain members that occupy less storage than an integral type. These members are specified as bit fields. The syntax for bit-field member-declarator specification follows:

declarator(opt) : constant-expression

The declarator is the name by which the member is accessed in the program. It must be an integral type (including enumerated types). The constant-expression specifies the number of bits the member occupies in the structure. Anonymous bit fields — that is, bit-field members with no identifier — can be used for padding.

Note   An unnamed bit field of width 0 forces alignment of the next bit field to the next type boundary, where type is the type of the member.

The following example declares a structure that contains bit fields:

// bit_fields1.cpp
struct Date
{
   unsigned short nWeekDay  : 3;    // 0..7   (3 bits)
   unsigned short nMonthDay : 6;    // 0..31  (6 bits)
   unsigned short nMonth    : 5;    // 0..12  (5 bits)
   unsigned short nYear     : 8;    // 0..100 (8 bits)
};

int main()
{
}

The conceptual memory layout of an object of type Date is shown in the following figure.

[Figure: Memory Layout of Date Object]

Note that nYear is 8 bits long and would overflow the word boundary of the declared type, unsigned short. Therefore, it begins at the start of a new unsigned short. It is not necessary that all bit fields fit in one object of the underlying type; new units of storage are allocated according to the number of bits requested in the declaration.
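To see this allocation concretely, here is a minimal sketch (object sizes are implementation-defined; under MSVC, which this reference describes, Date occupies two 16-bit storage units):

#include <cstdio>

struct Date
{
   unsigned short nWeekDay  : 3;
   unsigned short nMonthDay : 6;
   unsigned short nMonth    : 5;
   unsigned short nYear     : 8;   // does not fit in the 2 bits left in the first unit
};

int main()
{
   std::printf("sizeof(Date) = %zu\n", sizeof(Date));   // prints 4 under MSVC
}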

Microsoft Specific

The ordering of data declared as bit fields is from low to high bit, as shown in the figure above.

END Microsoft Specific
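One way to observe this low-to-high ordering is to copy the object representation into a plain integer (a minimal sketch assuming MSVC packing; Flags, a, and b are illustrative names, and the ordering is implementation-defined in general):

#include <cstdio>
#include <cstring>

struct Flags
{
   unsigned int a : 4;   // expected in bits 0..3
   unsigned int b : 4;   // expected in bits 4..7
};

int main()
{
   Flags f = {};
   f.a = 0x1;
   f.b = 0x2;
   unsigned int raw = 0;
   std::memcpy(&raw, &f, sizeof(raw));   // sizeof(Flags) == sizeof(unsigned int) here
   std::printf("raw = 0x%08X\n", raw);   // 0x00000021: a in the low bits, b just above
}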

If the declaration of a structure includes an unnamed field of length 0, as shown in the following example,

// bit_fields2.cpp
struct Date
{
   unsigned short nWeekDay  : 3;    // 0..7   (3 bits)
   unsigned short nMonthDay : 6;    // 0..31  (6 bits)
   unsigned short           : 0;    // Force alignment to next boundary.
   unsigned short nMonth    : 5;    // 0..12  (5 bits)
   unsigned short nYear     : 8;    // 0..100 (8 bits)
};

int main()
{
}

the memory layout is as shown in the following figure.

[Figure: Layout of Date Object with Zero-Length Bit Field]
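The shift is visible in the raw bytes: with the zero-width field in place, nMonth starts at bit 0 of the second unsigned short rather than at bit 9 of the first (a minimal sketch assuming the MSVC layout above and a little-endian machine):

#include <cstdio>
#include <cstring>

struct Date
{
   unsigned short nWeekDay  : 3;
   unsigned short nMonthDay : 6;
   unsigned short           : 0;    // nMonth starts a new unsigned short
   unsigned short nMonth    : 5;
   unsigned short nYear     : 8;
};

int main()
{
   Date d = {};
   d.nMonth = 0x1F;                      // set all five bits of nMonth
   unsigned char raw[sizeof(Date)] = {};
   std::memcpy(raw, &d, sizeof(d));
   for (unsigned char byte : raw)
      std::printf("%02X ", byte);        // expect: 00 00 1F 00
   std::printf("\n");
}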

The underlying type of a bit field must be an integral type, as described in Fundamental Types.
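For instance, an enumerated type qualifies as well, as noted above, provided the width can represent the enumerators (a small sketch; Color and Pixel are illustrative names):

enum Color { RED, GREEN, BLUE };

struct Pixel
{
   Color c : 2;   // 2 bits are enough for the enumerator values 0..2
};

int main()
{
   Pixel p = {};
   p.c = BLUE;
   return p.c == BLUE ? 0 : 1;   // read back like any other member
}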

© Microsoft Corporation. All rights reserved.

The above is excerpted from the MSDN Library for Visual Studio .NET 2003.

Bit-fields: a space-saving kind of member

These past couple of days I have been reading my company's code for MPEG-2 stream structures, and I noticed that my mentor had written code like the following for those structures (the excerpt here is from C++ Primer).

typedef unsigned int Bit;

class File {
public:
    Bit mode       : 2;
    Bit modified   : 1;
    Bit prot_owner : 3;
    Bit prot_group : 3;
    Bit prot_world : 3;
    // ...
};

At first it looked a little odd to me. I could roughly tell from the structure what it meant, but not precisely; perhaps there are gaps in my C/C++ knowledge and I have had too little practice. So I looked it up and learned the following:

This is a special kind of class data member called a bit-field. It can be declared to hold a specific number of bits. A bit-field must be of integral type, and it may be signed or unsigned. For example:

class File {
    // ...
    unsigned int modified : 1; // bit-field
};

The bit-field identifier is followed by a colon and then a constant expression specifying the number of bits; for example, modified is a bit-field consisting of a single bit.

Bit-fields defined adjacently in the class body are, if possible, packed into consecutive bits of the same integer, which is how the space saving comes about. For example, in the following declaration the five bit-fields are stored in a single unsigned int, which is first associated with the bit-field mode:

typedef unsigned int Bit;

class File {
public:
    Bit mode       : 2;
    Bit modified   : 1;
    Bit prot_owner : 3;
    Bit prot_group : 3;
    Bit prot_world : 3;
    // ...
};

Bit-fields are accessed in the same way as other class data members. For example, a private bit-field can be accessed only in the class's member functions and friends:

void File::write()
{
    modified = 1;
    // ...
}

void File::close()
{
    if ( modified )
        ; // ... details omitted
}

The following example shows how to use bit-fields that are more than one bit wide.

#include <iostream>
using namespace std;

enum { READ = 01, WRITE = 02 }; // file modes

// Assumes the File declaration above, with mode accessible here.
int main() {
    File myFile;
    myFile.mode |= READ;
    if ( myFile.mode & READ )
        cout << "myFile.mode is set to READ\n";
}

Typically we define a set of inline member functions to test the value of each bit-field member. For example, class File can define the members isRead() and isWrite():

inline int File::isRead() { return mode & READ; }
inline int File::isWrite() { return mode & WRITE; }

if ( myFile.isRead() ) /* ... */

With these member functions in place, the bit-fields can now be declared as private members of class File.
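Putting the pieces together, a minimal compilable version of the File example might look like this (a sketch: the constructor, setRead(), and main() are glue added here; the bit-field members follow C++ Primer):

#include <iostream>

typedef unsigned int Bit;

enum { READ = 01, WRITE = 02 }; // file modes

class File {
public:
    File() : mode(0), modified(0), prot_owner(0), prot_group(0), prot_world(0) {}

    void write()   { modified = 1; }
    void close()   { if ( modified ) { /* ... save contents ... */ } }
    int  isRead()  { return mode & READ; }
    int  isWrite() { return mode & WRITE; }
    void setRead() { mode |= READ; }   // mutator so mode can stay private

private:
    Bit mode       : 2;
    Bit modified   : 1;
    Bit prot_owner : 3;
    Bit prot_group : 3;
    Bit prot_world : 3;
};

int main() {
    File myFile;
    myFile.setRead();
    if ( myFile.isRead() )
        std::cout << "myFile.mode is set to READ\n";
}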

Because the address-of operator & cannot be applied to bit-fields, there are no pointers to class bit-fields; nor can a bit-field be a static member of a class.

The C++ standard library provides the bitset class template, which helps manipulate collections of bits; where possible, prefer it to bit-fields. (If you have no acquaintance with the standard library yet, then bit-fields are what you have to work with.)
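For comparison, similar flag handling with std::bitset might look like this (a sketch; the bit positions chosen are purely illustrative):

#include <bitset>
#include <iostream>

int main() {
    enum { MODIFIED = 0, READABLE = 1, WRITABLE = 2 };  // illustrative positions

    std::bitset<8> fileFlags;            // all eight bits start out 0
    fileFlags.set(READABLE);             // turn one bit on
    fileFlags.reset(MODIFIED);           // turn one bit off (already 0 here)
    if ( fileFlags.test(READABLE) )
        std::cout << "readable, " << fileFlags.count() << " bit(s) set\n";
}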

The bit-fields discussed so far have all appeared in classes; what about structs?

The format for defining a struct with bit-fields is:

struct <struct-name>
{
    <type-1> <field-name-1> : <width-1>;
    <type-2> <field-name-2> : <width-2>;
    ...
    <type-n> <field-name-n> : <width-n>;
};

Here, <type> must be unsigned, signed, or int; <field-name> is an identifier chosen by the user; and <width> is a positive integer giving the number of binary digits the field occupies.

For example:

struct bit8253
{
    unsigned bcd   : 1;
    unsigned m     : 3;
    unsigned rl    : 2;
    unsigned sc    : 2;
    unsigned black : 24;
};

Notes:

1) Bit-fields must be defined in order from the low bits to the high bits;

2) A bit-field of width 1 should be declared unsigned (a signed one-bit field can represent only 0 and -1);

3) Each bit-field's width should be less than the machine word length, but the total length of all the bit-fields may exceed one word; whatever exceeds the word is placed in the next storage unit (see the sketch below).
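A minimal sketch of note 3 (Wide is an illustrative name; exact sizes are implementation-defined, the values below being typical where unsigned is 32 bits):

#include <cstdio>

struct Wide
{
    unsigned a : 20;
    unsigned b : 20;   // does not fit in the 12 bits left in the first unsigned
};

int main()
{
    std::printf("sizeof(Wide) = %zu\n", sizeof(Wide));   // typically 8: two units
}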

Do these notes apply only to structs, or do they apply equally (or only partially) to bit-fields in classes? That remains to be verified. These are fine-grained details, though; if you run into an error while programming, this is a good place to look. As Sherlock Holmes roughly put it: once you have eliminated the impossible, whatever remains is the possible.

All in all, I personally find bit-fields quite useful. For example, in stream structures and in some transport-protocol structures there are fields smaller than the machine word whose combined length nevertheless comes to, say, 4 bytes. Some people argue that with today's memory sizes there is no need to be so frugal, but as someone once said: the programmer is there to serve the program, and the machine is not. The point should be clear. That is all on bit-fields; I hope you have gained something from it too.
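As an illustration of that kind of structure, here is a sketch in the spirit of the MPEG-2 transport-stream packet header (the field names and widths follow the standard, but the bit order that bit-fields produce across bytes is implementation-defined, so real parsers usually shift and mask raw bytes instead of relying on a struct like this):

struct TsHeader
{
    unsigned sync_byte                    : 8;   // always 0x47
    unsigned transport_error_indicator    : 1;
    unsigned payload_unit_start_indicator : 1;
    unsigned transport_priority           : 1;
    unsigned pid                          : 13;
    unsigned transport_scrambling_control : 2;
    unsigned adaptation_field_control     : 2;
    unsigned continuity_counter           : 4;
};  // 32 bits of fields in total: exactly 4 bytes' worth

int main()
{
    // With 32 bits of bit-fields packed into one unsigned, sizeof is typically 4.
    return sizeof(TsHeader) == 4 ? 0 : 1;
}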

 

 

References:
    C++ Primer
    Tan Haoqiang (谭浩强), C++ Programming (《C++程序设计》)

The above is excerpted from: http://blog.youkuaiyun.com/Adrian_Bu/archive/2006/08/03/1015318.aspx

wave's note: the content of that post is essentially a translation from C++ Primer; alternatively, see C++ Primer itself.

 