
Reproducing and fixing the Python machine learning error "libcufft.so.11: cannot open shared object file: No such file or directory"

Key error message:

libcufft.so.11: cannot open shared object file: No such file or directory

or a similar error about a missing shared library.

Full error output:

Traceback (most recent call last):
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcufft.so.11: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 1, in 
    from simcse import SimCSE
  File "/root/SimCSE/simcse/__init__.py", line 1, in 
    from .tool import SimCSE
  File "/root/SimCSE/simcse/tool.py", line 5, in 
    import torch
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in 
    _load_global_deps()
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
    ctypes.CDLL(lib_path)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvJitLink.so.12: cannot open shared object file: No such file or directory

Subsequent errors:

Traceback (most recent call last):
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 1, in 
    from simcse import SimCSE
  File "/root/SimCSE/simcse/__init__.py", line 1, in 
    from .tool import SimCSE
  File "/root/SimCSE/simcse/tool.py", line 5, in 
    import torch
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in 
    _load_global_deps()
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
    ctypes.CDLL(lib_path)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvJitLink.so.12: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 1, in 
    from simcse import SimCSE
  File "/root/SimCSE/simcse/__init__.py", line 1, in 
    from .tool import SimCSE
  File "/root/SimCSE/simcse/tool.py", line 5, in 
    import torch
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 234, in 
    _load_global_deps()
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 195, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/torch-2.1.1-py3.8-linux-x86_64.egg/torch/__init__.py", line 161, in _preload_cuda_deps
    ctypes.CDLL(lib_path)
  File "/root/miniconda3/envs/mypy38/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.12: cannot open shared object file: No such file or directory

Root cause:

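The CUDA shared libraries were installed with pip as Python packages inside the environment (the nvidia_*_cu12 packages under site-packages). When python main.py imports torch, torch loads these libraries with ctypes.CDLL (see the tracebacks above), but the dynamic linker cannot find their directories through LD_LIBRARY_PATH or the other locations it normally searches, so the import fails at startup.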
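To see which libraries the dynamic linker cannot resolve, you can reproduce the failing step on its own. The sketch below is a minimal diagnostic that assumes only the Python standard library: it calls ctypes.CDLL with RTLD_GLOBAL, the same call pattern the traceback shows inside torch, on each library name. probe_shared_libs is a hypothetical helper name, not part of torch.

```python
import ctypes
from typing import Dict, Iterable, Optional


def probe_shared_libs(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to dlopen each library by its bare soname.

    Returns a mapping of soname -> None if it loaded, or the OSError
    message if the dynamic linker could not find or load it.
    """
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # Same call pattern as torch's _load_global_deps in the traceback
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Library names taken from the error messages above
    libs = [
        "libcufft.so.11",
        "libcurand.so.10",
        "libcublas.so.12",
        "libcusparse.so.12",
        "libnvJitLink.so.12",
    ]
    for lib, err in probe_shared_libs(libs).items():
        print(f"{lib}: {'OK' if err is None else err}")
```

Every library that prints an error instead of OK needs its directory added to the search path. Because a library's own dependencies are resolved when it is loaded, the reported name can be a dependency rather than the library requested.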

Solution:

Add a shared library search path to the Linux system by creating the following file:

/etc/ld.so.conf.d/tenser.conf

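In this file, list the directories that contain the shared libraries the affected modules need, one directory per line, and adjust them to match your own installation. My file contains: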

/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cufft_cu12-11.0.2.54-py3.8-linux-x86_64.egg/nvidia/cufft/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_curand_cu12-10.3.2.106-py3.8-linux-x86_64.egg/nvidia/curand/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_nvjitlink_cu12-12.3.101-py3.8-linux-x86_64.egg/nvidia/nvjitlink/lib/
/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/

How do you find out where on the system the library named in an error such as

OSError: libcusparse.so.12: cannot open shared object file: No such file or directory

is located? You can search for the file with find, for example:

# find / -type f -iname "libcusparse.so*"

/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/libcusparse.so.12

The directory containing the library is therefore:

/root/miniconda3/envs/mypy38/lib/python3.8/site-packages/nvidia_cusparse_cu12-12.1.0.106-py3.8-linux-x86_64.egg/nvidia/cusparse/lib/
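The find-then-take-the-directory steps above can also be scripted. This is a minimal sketch using only the standard library; find_lib_dirs is a hypothetical helper that roughly mimics find ROOT -type f -iname PATTERN and returns the unique parent directories of the matches. Searching the active environment (for example sys.prefix) instead of / is an assumption made here to keep the search fast.

```python
import fnmatch
import os
from typing import List


def find_lib_dirs(root: str, pattern: str) -> List[str]:
    """Return the sorted, de-duplicated parent directories of regular files
    under `root` whose names match `pattern` case-insensitively."""
    pattern = pattern.lower()
    dirs = set()
    # os.walk silently skips directories it cannot read
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            # -iname: case-insensitive match; -type f: regular files, not symlinks
            if (
                fnmatch.fnmatchcase(fname.lower(), pattern)
                and os.path.isfile(full)
                and not os.path.islink(full)
            ):
                dirs.add(dirpath)
    return sorted(dirs)
```

For example, find_lib_dirs(sys.prefix, "libcusparse.so*") run inside the mypy38 environment should return the cusparse lib directory shown above.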

Add this directory to /etc/ld.so.conf.d/tenser.conf. The file name can be anything, but it must end in .conf.
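Writing the configuration file can be automated as well. This sketch assumes, as on typical distributions, that /etc/ld.so.conf contains include /etc/ld.so.conf.d/*.conf, which is why names without the .conf suffix are ignored. write_ld_conf is a hypothetical helper; it writes one directory per line, drops duplicates, and refuses names that do not end in .conf.

```python
from typing import Iterable, List


def write_ld_conf(lib_dirs: Iterable[str], conf_path: str) -> str:
    """Write library directories, one per line, to an ld.so.conf.d-style file.

    Returns the text written. Raises ValueError if conf_path does not end
    in .conf, since such files would not be picked up by the include rule.
    """
    if not conf_path.endswith(".conf"):
        raise ValueError(f"{conf_path!r} must end in .conf")
    unique: List[str] = []
    for d in lib_dirs:
        d = d.strip()
        if d and d not in unique:
            unique.append(d)  # keep first-seen order, drop duplicates
    content = "\n".join(unique) + "\n"
    with open(conf_path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return content
```

Writing under /etc/ld.so.conf.d/ requires root, so a call such as write_ld_conf(dirs, "/etc/ld.so.conf.d/tenser.conf") has to run with sudo.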

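Then remember to run

sudo ldconfig

to rebuild the linker cache so the paths in the new configuration file take effect. After that, re-running the machine learning code with python main.py succeeds.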
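To confirm that ldconfig actually picked up the new directories, you can check the cache it printed with ldconfig -p. The sketch below is a minimal example assuming the usual output format of one "\tNAME (FLAGS) => PATH" line per library after a header line; parse_ldconfig_cache and cached_path are hypothetical helpers.

```python
import subprocess
from typing import Dict


def parse_ldconfig_cache(text: str) -> Dict[str, str]:
    """Map soname -> path from `ldconfig -p` output."""
    cache: Dict[str, str] = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # skip the "N libs found in cache ..." header
        left, path = line.split("=>", 1)
        parts = left.split()
        if not parts:
            continue
        # First token is the soname; keep the first entry if listed twice
        cache.setdefault(parts[0], path.strip())
    return cache


def cached_path(soname: str) -> str:
    """Return the cached path for `soname`, or "" if it is not in the cache."""
    out = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    ).stdout
    return parse_ldconfig_cache(out).get(soname, "")
```

After sudo ldconfig, cached_path("libcusparse.so.12") should return the path under the directory added to tenser.conf. On some systems ldconfig lives in /sbin and may not be on a normal user's PATH.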
