Grid search error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)
E:\DLstudy\Scripts\python.exe E:/PycharmProjects/DLstudy/run/train_model.py
[INFO] tuning hyperparameters...
Traceback (most recent call last):
File "E:\PycharmProjects\DLstudy\run\train_model.py", line 22, in <module>
model.fit(trainX, trainY)
File "E:\DLstudy\lib\site-packages\sklearn\model_selection\_search.py", line 820, in fit
with parallel:
File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 725, in __enter__
self._initialize_backend()
File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 735, in _initialize_backend
n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
File "E:\DLstudy\lib\site-packages\joblib\_parallel_backends.py", line 494, in configure
self._workers = get_memmapping_executor(
File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 20, in get_memmapping_executor
return MemmappingExecutor.get_memmapping_executor(n_jobs, **kwargs)
File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 42, in get_memmapping_executor
manager = TemporaryResourcesManager(temp_folder)
File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 531, in __init__
self.set_current_context(context_id)
File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 535, in set_current_context
self.register_new_context(context_id)
File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 560, in register_new_context
self.register_folder_finalizer(new_folder_path, context_id)
File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 590, in register_folder_finalizer
resource_tracker.register(pool_subfolder, "folder")
File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 191, in register
self._send('REGISTER', name, rtype)
File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)
Process finished with exit code 1
Solution
The code that raised the error:
model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=-1)
Simply remove the n_jobs=-1 argument, so that the call becomes:
model = GridSearchCV(LogisticRegression(), params, cv=3)
A quick look at the documentation shows that this parameter controls how many jobs run in parallel:
n_jobs : int, default=None
Number of jobs to run in parallel.
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
for more details.
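With n_jobs=-1, there is a low-level step (the _send call at the bottom of the traceback) that encodes a message as ASCII, and in this setup that encoding fails every time. The message contains the name of a temporary folder being registered, which suggests that the folder path contains non-ASCII characters.
If we leave the parameter out, a single processor is used by default.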
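The failure can be reproduced without joblib or scikit-learn. The sketch below builds the same cmd:name:rtype message as the _send line in the traceback and encodes it as ASCII. The folder path is made up for illustration (a Windows user name of three Chinese characters is an assumption, the traceback does not show the real path). With this path the offending characters land at positions 18 to 20, the same positions the error message reports.

```python
# Made-up user name of three Chinese characters (an assumption for illustration).
user = '\u5f20\u4e09\u4e30'
# Hypothetical temp-folder path; the real one is not shown in the traceback.
name = 'C:\\Users\\' + user + '\\AppData\\Local\\Temp\\joblib_memmapping_folder'

# Same format string as the _send line in the traceback.
text = '{0}:{1}:{2}\n'.format('REGISTER', name, 'folder')

try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # exc.end is exclusive, so the last offending index is end - 1.
    print('non-ASCII characters at positions', error.start, 'to', error.end - 1)
    # ascii() keeps the output printable on consoles that are not UTF-8.
    print('offending text:', ascii(text[error.start:error.end]))
```

The prefix REGISTER:C:\Users\ is 18 characters long, which is why the first character of the user name sits at index 18.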
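If you really do want to use multiple processors, you need to modify the code at the path where the error was raised.
For example, in our error message: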
File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)
This shows that the error is at line 204 of E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py, in the _send method. Open that file.
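If the traceback is not at hand, the file can also be located from Python. This is a minimal sketch using only the standard library; locate_module_file is a helper name made up here, and it returns None when the package is not installed in the current interpreter.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the file path of a module, or None if it cannot be found."""
    try:
        # For a dotted name, find_spec imports the parent packages first.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return None
    return spec.origin if spec is not None else None


print(locate_module_file('joblib.externals.loky.backend.resource_tracker'))
```

Run it with the same interpreter as the training script (E:\DLstudy\Scripts\python.exe in this post), otherwise it may point at a different installation.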
Source of the _send function:
def _send(self, cmd, name, rtype):
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
    if len(name) > 512:
        # posix guarantees that writes to a pipe of less than PIPE_BUF
        # bytes are atomic, and that PIPE_BUF >= 512
        raise ValueError('name too long')
    nbytes = os.write(self._fd, msg)
    assert nbytes == len(msg)
Change
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
to
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
That is, switch the encoding to UTF-8. The modified code:
```python
def _send(self, cmd, name, rtype):
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
    if len(name) > 512:
        # posix guarantees that writes to a pipe of less than PIPE_BUF
        # bytes are atomic, and that PIPE_BUF >= 512
        raise ValueError('name too long')
    nbytes = os.write(self._fd, msg)
    assert nbytes == len(msg)
```
Then run the code again.
Not so fast: it still fails, because we have changed only the encoding side and not the decoding side.
The error message:
... (omitted)
splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
Traceback (most recent call last):
File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 253, in main
splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
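The second error is the mirror image of the first: the patched _send now writes UTF-8 bytes, but the reader still decodes them as ASCII. A small sketch of that mismatch follows, again with a made-up path. The user name is an assumption, chosen so that its first UTF-8 byte is 0xe5 as in the traceback.

```python
# Made-up user name; its first character encodes to the bytes e5 bc a0 in UTF-8.
user = '\u5f20\u4e09\u4e30'
name = 'C:\\Users\\' + user + '\\AppData\\Local\\Temp\\joblib_memmapping_folder'

# What the patched _send writes to the pipe.
line = '{0}:{1}:{2}\n'.format('REGISTER', name, 'folder').encode('utf8')

try:
    # What the unpatched reader still does.
    line.strip().decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

if decode_error is not None:
    bad_byte = line[decode_error.start]
    print('cannot decode byte', hex(bad_byte), 'in position', decode_error.start)
```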
In the same way, find the path in the error message: line 253 of E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py failed. Go to that location:
... (omitted)
with open(fd, 'rb') as f:
    while True:
        line = f.readline()
        if line == b'':  # EOF
            break
        try:
            splitted = line.strip().decode('ascii').split(':')
            # name can potentially contain separator symbols (for
            # instance folders on Windows)
            cmd, name, rtype = (
                splitted[0], ':'.join(splitted[1:-1]), splitted[-1])
... (omitted)
We only need to replace
line.strip().decode('ascii').split(':')
with
line.strip().decode('utf8').split(':')
Run the file again and it succeeds.
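With both sides using UTF-8, the message survives the round trip. The sketch below mimics the two patched lines outside of joblib. encode_message and parse_message are illustrative names, not joblib functions, and the path is made up.

```python
def encode_message(cmd, name, rtype):
    # Mirrors the patched line in _send.
    return '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')


def parse_message(line):
    # Mirrors the patched decode line and the split logic that follows it.
    splitted = line.strip().decode('utf8').split(':')
    # The name may itself contain ':' (a Windows drive letter), so everything
    # between the first and the last field is joined back together.
    return splitted[0], ':'.join(splitted[1:-1]), splitted[-1]


user = '\u5f20\u4e09\u4e30'  # made-up user name of three Chinese characters
path = 'C:\\Users\\' + user + '\\AppData\\Local\\Temp\\joblib_memmapping_folder'

cmd, name, rtype = parse_message(encode_message('REGISTER', path, 'folder'))
print(cmd, rtype, name == path)
```

Both edits live inside site-packages, so reinstalling or upgrading joblib will overwrite them and they would need to be applied again.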