Key error message:
libcufft.so.11: cannot open shared object file: No such file or directory
or a similar error about a missing shared library.
Full error message:
Traceback (most recent call last):
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcufft.so.11: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 1, in <module>
from simcse import SimCSE
File "/root/SimCSE/simcse/__init__.py", line 1, in <module>
from .tool import SimCSE
File "/root/SimCSE/simcse/tool.py", line 5, in <module>
import torch
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
_preload_cuda_deps(lib_folder, lib_name)
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
ctypes.CDLL(lib_path)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnvJitLink.so.12: cannot open shared object file: No such file or directory
Subsequent errors:
Traceback (most recent call last):
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 1, in <module>
from simcse import SimCSE
File "/root/SimCSE/simcse/__init__.py", line 1, in <module>
from .tool import SimCSE
File "/root/SimCSE/simcse/tool.py", line 5, in <module>
import torch
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
_preload_cuda_deps(lib_folder, lib_name)
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
ctypes.CDLL(lib_path)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnvJitLink.so.12: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.12: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 1, in <module>
from simcse import SimCSE
File "/root/SimCSE/simcse/__init__.py", line 1, in <module>
from .tool import SimCSE
File "/root/SimCSE/simcse/tool.py", line 5, in <module>
import torch
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
_preload_cuda_deps(lib_folder, lib_name)
File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
ctypes.CDLL(lib_path)
File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.12: cannot open shared object file: No such file or directory
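Root cause analysis:
The shared libraries were installed via pip. When the program is started with python main.py, the dynamic loader cannot find their directories through LD_LIBRARY_PATH or any other configured library search path, so the import fails at startup.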
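The failure can be reproduced without importing torch. The sketch below uses only the standard library's ctypes module, the same call that appears in the traceback, to check whether the dynamic loader can resolve a given library name. The helper name can_load is hypothetical and is not part of torch.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can resolve and load `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Same exception type as in the traceback: the loader could not
        # open the library (or one of its own dependencies).
        return False


if __name__ == "__main__":
    # Library names taken from the error messages above.
    for name in ("libcufft.so.11", "libcurand.so.10",
                 "libcublas.so.12", "libcusparse.so.12",
                 "libnvJitLink.so.12"):
        print(name, "loadable" if can_load(name) else "not loadable")
```

If a name is reported as not loadable here, import torch can be expected to fail in the same way.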
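Solution:
Add the library directories to the Linux dynamic loader's search path by creating the following file:
/etc/ld.so.conf.d/tenser.conf
In this file, list the directory of every shared library that the affected modules need, one directory per line. Adjust the paths to match your own installation. My file contains: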
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cufft_cu12-11.0.2.54-py3.8-linux-x86_64.egg/nvidia/cufft/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_curand_cu12-10.3.2.106-py3.8-linux-x86_64.egg/nvidia/curand/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_nvjitlink_cu12-12.3.101-py3.8-linux-x86_64.egg/nvidia/nvjitlink/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/
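Instead of typing these directories by hand, they can be collected with a short script. This is a minimal sketch that assumes the NVIDIA packages keep their libraries under nvidia/<component>/lib, as in the paths listed above. The function name collect_nvidia_lib_dirs is hypothetical.

```python
import sysconfig
from pathlib import Path
from typing import List


def collect_nvidia_lib_dirs(site_packages: str) -> List[str]:
    """Return every nvidia/<component>/lib directory that holds a .so file."""
    root = Path(site_packages)
    dirs = set()
    # "**" matches any depth, so egg directories and plain installs both match.
    for so_file in root.glob("**/nvidia/*/lib/*.so*"):
        if so_file.is_file():
            dirs.add(str(so_file.parent))
    return sorted(dirs)


if __name__ == "__main__":
    # Assumption: the packages live in the active interpreter's site-packages.
    for directory in collect_nvidia_lib_dirs(sysconfig.get_paths()["purelib"]):
        print(directory)
```

The printed lines can be pasted into the .conf file after checking that they match the libraries named in the error messages.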
So how do you find out where the library named in
OSError: libcusparse.so.12: cannot open shared object file: No such file or directory
is located on the system?
You can search for the file with the find command, for example:
# find / -type f -iname "libcusparse.so*"
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/libcusparse.so.12
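So the directory that contains the library is
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/
Add this path to the /etc/ld.so.conf.d/tenser.conf file.
The file name is arbitrary, but it must end with the .conf suffix, because /etc/ld.so.conf typically includes only /etc/ld.so.conf.d/*.conf.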
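The lookup described above can also be scripted: take the library name from the error message, then walk a directory tree looking for it. This is a minimal sketch. The function names missing_library and find_library_dirs are hypothetical, and unlike the find command above it matches the exact file name rather than a wildcard.

```python
import os
import re
from typing import List, Optional


def missing_library(message: str) -> Optional[str]:
    """Extract the library name from a 'cannot open shared object file' error."""
    match = re.match(r"(?:OSError: )?(\S+): cannot open shared object file", message)
    return match.group(1) if match else None


def find_library_dirs(soname: str, search_root: str) -> List[str]:
    """Return the directories under `search_root` that contain a file named `soname`."""
    hits = set()
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if soname in filenames:
            hits.add(dirpath)
    return sorted(hits)
```

For example, passing the last line of the traceback to missing_library and the result to find_library_dirs, with the conda environment directory as search_root, should return the directory to add to the .conf file. Searching from / also works but is slower.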
Finally, remember to run
sudo ldconfig
so that the paths in the configuration file take effect. After that, rerunning the machine learning code with python main.py
succeeds.
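To confirm that the loader cache picked up the new directories, the output of ldconfig -p can be inspected. The sketch below parses that output into a name-to-path mapping. It assumes the usual glibc format, in which each entry reads "name (flags) => path". The function names are hypothetical.

```python
import subprocess
from typing import Dict


def read_ldconfig_cache() -> str:
    """Return the raw text printed by `ldconfig -p`."""
    result = subprocess.run(["ldconfig", "-p"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_ldconfig_cache(text: str) -> Dict[str, str]:
    """Map each library name to the first path listed for it."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # skip the header line and blank lines
        left, path = line.split("=>", 1)
        parts = left.split()
        if parts:
            # Keep the first entry for a name; later duplicates are ignored.
            entries.setdefault(parts[0], path.strip())
    return entries
```

After sudo ldconfig has run, parse_ldconfig_cache(read_ldconfig_cache()).get("libcusparse.so.12") should return the path of the library rather than None.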