Python error when parsing a JSON file: 'utf8' codec can't decode byte 0xbb in position 0: invalid start byte

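This post explains why Python raises the JSON decoding error "'utf8' codec can't decode byte 0xbb in position 0: invalid start byte" and how to fix it. The error is usually caused by a BOM (byte order mark) at the start of the file, and the post shows how to remove the BOM with Sublime Text.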

Today I tried to read a JSON file whose data sits on a single line as a list of dictionaries, and it kept raising an error.

Code:

    import json
    # Python 2 used file(); open() works in both Python 2 and 3
    with open('relation.json') as f:
        d = json.load(f)
Error:

'utf8' codec can't decode byte 0xbb in position 0: invalid start byte
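The `'utf8'` spelling in this message suggests Python 2. As a minimal sketch, assuming Python 3.6+, the code below reproduces the same situation: it writes a small BOM-prefixed JSON file to a temporary path and loads it as plain UTF-8. In Python 3 the decoding step succeeds (the BOM becomes the character U+FEFF), and the problem surfaces instead as a `json.JSONDecodeError`. The function name `load_json_with_bom_demo` is hypothetical.

```python
import json
import os
import tempfile


def load_json_with_bom_demo():
    """Write a small BOM-prefixed JSON file, then load it as plain UTF-8.

    Returns the JSONDecodeError message, or None if loading succeeds.
    """
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        # The 'utf-8-sig' codec writes the BOM bytes EF BB BF before the data
        with open(path, "w", encoding="utf-8-sig") as f:
            f.write('[{"id": 1}]')
        # Plain 'utf-8' keeps the BOM as the character U+FEFF, which json rejects
        with open(path, encoding="utf-8") as f:
            try:
                json.load(f)
            except json.JSONDecodeError as e:
                return e.msg
        return None
    finally:
        os.remove(path)


print(load_json_with_bom_demo())  # Unexpected UTF-8 BOM (decode using utf-8-sig)
```

The message itself names the remedy: decode with `utf-8-sig`.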


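I run into this error often when using Python and had never figured out what caused it, so today I looked into it. Roughly, the message means the data could not be decoded because position 0 holds an invalid start byte.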
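To see why 0xbb counts as an "invalid start byte": in UTF-8, every byte from 0x80 to 0xBF is a continuation byte (bit pattern 10xxxxxx) that may only follow a lead byte and can never begin a character. The sketch below checks this, assuming Python 3.6+; the helper names are hypothetical. It also shows that the complete three-byte BOM is itself valid UTF-8, which implies the decoder in the error above was handed data that began at the 0xbb byte.

```python
def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF)."""
    return (b & 0b1100_0000) == 0b1000_0000


def decode_error_info(data: bytes):
    """Return (position, reason) of the first UTF-8 decode error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as e:
        return e.start, e.reason
    return None


print(f"{0xBB:08b}", is_continuation_byte(0xBB))  # 10111011 True
print(decode_error_info(b"\xbb{}"))                # (0, 'invalid start byte')
print(decode_error_info(b"\xef\xbb\xbf{}"))        # None: the full BOM is valid UTF-8
```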

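It turns out that a UTF-8 file can carry a BOM at its head: three bytes (0xEF 0xBB 0xBF) that mark the file as UTF-8. The 0xbb in the error message is the second of these bytes. Many programs now recognize the BOM, but some still don't. PHP, for example, does not, which is why a UTF-8 file edited in Notepad fails when it is run.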
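A quick way to check whether a file carries this BOM is to compare its first three bytes with `codecs.BOM_UTF8`. A minimal sketch; the function names are hypothetical:

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if the bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of the file and check for the UTF-8 BOM."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(3))
```

For example, `has_utf8_bom("relation.json")` would return `True` for the file in this post if it was saved with a BOM.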

Solution:

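Open Sublime Text, create a new file (paste the JSON data into it), and choose File -> Save with Encoding -> UTF-8. Saving as UTF-8 this way writes the file without a BOM, and that fixes the error.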
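If you would rather handle this in Python than in an editor, two options are sketched below, assuming Python 3 (the function names are hypothetical): open the file with the `utf-8-sig` codec, which drops a leading BOM if present and otherwise decodes like UTF-8, or strip the BOM from the file once, which has the same effect as re-saving it as plain UTF-8.

```python
import codecs
import json


def load_json_any_bom(path: str):
    """Load JSON, tolerating an optional leading UTF-8 BOM.

    'utf-8-sig' removes a BOM if one is present and otherwise acts like 'utf-8'.
    """
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def strip_utf8_bom(path: str) -> bool:
    """Rewrite the file without its leading UTF-8 BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

For example, `d = load_json_any_bom('relation.json')`. Note that `strip_utf8_bom` overwrites the file in place, so keep a copy if the data matters. In Python 3.6+, `json.loads` also accepts `bytes` and recognizes a leading UTF-8 BOM on its own.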


References:

http://www.crifan.com/fixed_problem_for_python_valueerror_no_json_object_could_be_decoded/

http://jingyan.baidu.com/article/9f63fb91d72eb5c8410f0e44.html





Reprinted from: https://i-blog.csdnimg.cn/blog_migrate/b1f6921232f24c571c0f50de299f4758.png
