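[Original by kerneltravel] Why a machine-learning paddleOCR project fails at startup after being packaged with pyinstaller, and how to fix it

Startup errors in a paddleOCR-based executable packaged with pyinstaller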

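Many image-to-text features are built on the paddleOCR project, and shipping them to users usually means relying on a packaging tool such as pyinstaller. After packaging paddleOCR into an executable with pyinstaller, many developers find that the executable fails at startup with a "module not found" error.

kerneltravel analyzed several issues together with pyinstaller's error output, identified the cause, worked out a fix, and submitted the corresponding patches to the paddleOCR project (see PR1 and PR2; PR2 is the authoritative one).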

The problem is analyzed in detail below.

Symptoms:

  1. After packaging, the paddleocr application fails at startup with error message 1:

    Traceback (most recent call last):
    File "main.py", line 5, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
    File "paddleocr\__init__.py", line 14, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
    File "paddleocr\paddleocr.py", line 33, in <module>
    File "importlib\__init__.py", line 126, in import_module
    ModuleNotFoundError: No module named 'tools'
    [11752] Failed to execute script 'main' due to unhandled exception!
  2. Startup error message 2:

    Traceback (most recent call last):
    File "yes.py", line 1, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
    File "paddleocr\__init__.py", line 14, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
    File "paddleocr\paddleocr.py", line 34, in <module>
    File "importlib\__init__.py", line 127, in import_module
    ModuleNotFoundError: No module named 'ppocr'
    [23216] Failed to execute script 'yes' due to unhandled exception!
  3. Startup error message 3:

    Traceback (most recent call last):
    File "/home/pc/Music/PD_OCR/demo.py", line 1, in <module>
    from paddleocr import PaddleOCR,tools
    File "/home/pc/.local/lib/python3.10/site-packages/paddleocr/__init__.py", line 14, in <module>
    from .paddleocr import *
    File "/home/pc/.local/lib/python3.10/site-packages/paddleocr/paddleocr.py", line 37, in <module>
    from tools.infer import predict_system
    ModuleNotFoundError: No module named 'tools.infer'

    Runtime environment:

    Ubuntu: 22.04
    paddleocr : 2.6.1.3
    paddlepaddle : 2.4.2
  4. Startup error message 4:

    Traceback (most recent call last):
    File "main.py", line 34, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "wkr\AllWorker.py", line 31, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "wkr\ItrWorker.py", line 26, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddleocr\__init__.py", line 14, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddleocr\paddleocr.py", line 21, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\__init__.py", line 62, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\distributed\__init__.py", line 15, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\distributed\spawn.py", line 24, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\distributed\utils\launch_utils.py", line 27, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\distributed\fleet\__init__.py", line 31, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\distributed\fleet\fleet.py", line 33, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\fluid\ir.py", line 28, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
    File "paddle\fluid\proto\pass_desc_pb2.py", line 16, in <module>
    ModuleNotFoundError: No module named 'framework_pb2'
  5. Startup error message 5:

    Traceback (most recent call last):
    File "main.py", line 2, in <module>
    File "PyInstaller\loader\pyimod02_importers.py", line 493, in exec_module
    File "libs\ocr.py", line 2, in <module>
    ModuleNotFoundError: No module named 'paddleocr'

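Cause

Judging from the error messages, every case comes down to the same thing: at runtime the program cannot find specific modules under the paddleocr library (multi-level modules such as paddleocr.tools, paddleocr.tools.ppocr and paddleocr.tools.infer).

To gather evidence, enable more debug output when packaging.
For example, the -d all option makes the executable print each module it loads during startup: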

pyinstaller.exe -F  -d all --add-data .\paddleocr;.\paddleocr    --add-data .\mklml.dll;. main.py

An executable built this way prints a detailed log when it runs, which shows where loading fails, for example:

# paddle.text.datasets.uci_housing not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\uci_housing.pyc'
import 'paddle.text.datasets.uci_housing' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9E8880>
# paddle.text.datasets.wmt14 not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\wmt14.pyc'
import 'paddle.text.datasets.wmt14' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9E8B50>
# paddle.text.datasets.wmt16 not found in PYZ
# code object from 'd:\\path\\to\\dist\\main\\paddle\\text\\datasets\\wmt16.pyc'
import 'paddle.text.datasets.wmt16' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9F20D0>
import 'paddle.text.datasets' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9DE1C0>
import 'paddle.text' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2EF9D0BE0>
import 'paddle' # <_frozen_importlib_external.SourcelessFileLoader object at 0x000001B2CBAF43A0>
# tools not found in PYZ
Traceback (most recent call last):
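  File "main.py", line 4, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "paddleocr\__init__.py", line 14, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "paddleocr\paddleocr.py", line 33, in <module>
  File "importlib\__init__.py", line 127, in import_module
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked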
ModuleNotFoundError: No module named 'tools'
[220960] Failed to execute script 'main' due to unhandled exception!
[220960] LOADER: OK.
[220960] LOADER: Manually flushing stdout and stderr
[220960] LOADER: Cleaning up Python interpreter.
# clear builtins._
# clear sys.path
# clear sys.argv
# clear sys.ps1
# clear sys.ps2
# clear sys.last_type
# clear sys.last_value
# clear sys.last_traceback
# destroy paddleocr.paddleocr
# destroy paddleocr
# clear sys.path_hooks
# clear sys.path_importer_cache
# clear sys.meta_path
# clear sys.__interactivehook__
# restore sys.stdin
# restore sys.stdout
# restore sys.stderr
# cleanup[2] removing sys
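
A log like the one above can run to thousands of lines, so it can help to filter it mechanically. The following is a minimal sketch, assuming the debug output has been saved to a text file and follows the line formats shown above; the function name `summarize_debug_log` is hypothetical and not part of pyinstaller. It lists the modules that were reported as "not found in PYZ" and never imported afterwards, plus the module named in the final ModuleNotFoundError.

```python
import re

# Line formats taken from the debug log shown above.
NOT_IN_PYZ = re.compile(r"^# (\S+) not found in PYZ$")
IMPORTED = re.compile(r"^import '([^']+)'")
MISSING = re.compile(r"^ModuleNotFoundError: No module named '([^']+)'")


def summarize_debug_log(text):
    """Return (unresolved, missing).

    unresolved: modules reported as not in the PYZ archive that were
                never imported later in the log (order preserved).
    missing:    module named in the last ModuleNotFoundError, or None.
    """
    not_in_pyz = []
    imported = set()
    missing = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NOT_IN_PYZ.match(line)
        if m:
            not_in_pyz.append(m.group(1))
            continue
        m = IMPORTED.match(line)
        if m:
            imported.add(m.group(1))
            continue
        m = MISSING.match(line)
        if m:
            missing = m.group(1)
    unresolved = [name for name in not_in_pyz if name not in imported]
    return unresolved, missing
```

Modules that are not in the PYZ archive but are followed by an `import` line were loaded from disk and are fine; the ones left in `unresolved` are the candidates worth passing to pyinstaller explicitly.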

Adding the --collect-all paddleocr option to pyinstaller makes all five errors above disappear, because this option copies every file of the paddleocr package under site-packages into the bundle directory.
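
The same build can also be driven from a Python script instead of the command line. This is a minimal sketch, assuming pyinstaller is installed in the build environment and that the entry script is called main.py as in the commands in this article; `build_pyinstaller_args` and `build` are hypothetical helper names. `PyInstaller.__main__.run` accepts the same arguments as the command line, given as a list.

```python
def build_pyinstaller_args(entry_script, collect_all=("paddleocr",),
                           onefile=True, debug=None):
    """Build the pyinstaller argument list for the given entry script."""
    args = []
    if onefile:
        args.append("-F")                 # single-file executable
    if debug:
        args += ["-d", debug]             # e.g. "all" for a verbose startup log
    for package in collect_all:
        # Collect submodules, data files and binaries of the package.
        args += ["--collect-all", package]
    args.append(entry_script)
    return args


def build(entry_script="main.py"):
    # Imported lazily so the helper above works without pyinstaller installed.
    import PyInstaller.__main__
    PyInstaller.__main__.run(build_pyinstaller_args(entry_script))
```

Calling `build("main.py")` is then equivalent to running `pyinstaller -F --collect-all paddleocr main.py`.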

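If you would rather verify this step by step than fix everything at once, check the following: when a module is passed via hidden-import but its submodules are not, does the error mention only the missing submodule? If so, the set of hidden-imported modules is incomplete (--collect-all pulls in the named package together with all of its submodules in one go, so it is more complete).
To test this, use the following packaging command: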

pyinstaller.exe -F --hidden-import paddleocr --hidden-import paddleocr.paddleocr --hidden-import paddleocr.ppocr --hidden-import paddleocr.ppocr.* --hidden-import paddleocr.ppstructure --hidden-import paddleocr.ppstructure.* --hidden-import paddleocr.tools --hidden-import paddleocr.tools.* -d all --add-data .\paddleocr;.\paddleocr    --add-data .\mklml.dll;. main.py

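You can remove any one of the options in the command above, for example --hidden-import paddleocr.ppstructure. If the resulting executable then reports that the paddleocr.ppstructure module is not found, but no longer reports that paddleocr itself is not found, the problem lies in the hidden imports. In other words, the --collect-all paddleocr option is needed to explicitly pull in the whole paddleocr package.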
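
The drop-one-option experiment can be made systematic. The sketch below is only an illustration: `leave_one_out` is a hypothetical helper, the module list is taken from the command above with the wildcard entries left out, and the --add-data options are omitted for brevity. It generates one argument list per hidden import, each with exactly that import removed, so every variant can be built and its startup error compared.

```python
# Module names from the packaging command above (wildcard entries omitted).
HIDDEN_IMPORTS = [
    "paddleocr",
    "paddleocr.paddleocr",
    "paddleocr.ppocr",
    "paddleocr.ppstructure",
    "paddleocr.tools",
]


def leave_one_out(hidden_imports, entry_script="main.py"):
    """Yield (dropped, args): pyinstaller args with one hidden import removed."""
    for dropped in hidden_imports:
        args = ["-F", "-d", "all"]
        for name in hidden_imports:
            if name != dropped:
                args += ["--hidden-import", name]
        args.append(entry_script)
        yield dropped, args


if __name__ == "__main__":
    for dropped, args in leave_one_out(HIDDEN_IMPORTS):
        # Print each variant; run it with pyinstaller to compare the errors.
        print(dropped, "->", "pyinstaller " + " ".join(args))
```

If the executable built without a given module complains only about that module, the observation matches the reasoning above.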

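Why does only paddleocr need an extra import option?

Because the paddleocr package ships no .pyd module (unlike the paddlepaddle package, which has the libpaddle.pyd module file). Since there is no paddleocr .pyd file at runtime, the only way around it is to bundle the full set of paddleocr files into the executable's directory with --collect-all. The tracebacks above also show paddleocr.py reaching these modules through importlib.import_module, a runtime import that pyinstaller's static analysis does not follow.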
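
Whether an installed package ships compiled extension modules can be checked directly. The following is a minimal sketch with a hypothetical helper name, `find_extension_modules`; it simply walks a package directory and lists files ending in .pyd (Windows) or .so (Linux/macOS). Note that a .so file is not necessarily a Python extension module, so the result is only a hint.

```python
import os

# File suffixes used by compiled modules on Windows and on Linux/macOS.
EXTENSION_SUFFIXES = (".pyd", ".so")


def find_extension_modules(package_dir):
    """Return sorted relative paths of .pyd/.so files under package_dir."""
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(EXTENSION_SUFFIXES):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, package_dir))
    return sorted(found)
```

Pointing it at the paddleocr and paddle directories under site-packages lets you compare the two packages; an empty list for paddleocr would be consistent with the explanation above.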

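Can paddleocr be built into .pyd files?

Unknown for now.

Solution

Package with pyinstaller using the --collect-all paddleocr option. Also make sure that your paddleOCR code already includes the changes to the two source files made in the PR https://github.com/PaddlePaddle/PaddleOCR/pull/10502.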
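
To confirm that a build actually contains what it needs, a small self-check can run at the top of the entry script. This is a sketch under assumptions: `missing_modules` is a hypothetical helper, and the module names to pass in are whatever your own error messages mention. It relies on the standard `importlib.util.find_spec`, which returns None for a module that cannot be located and, for dotted names, imports the parent package first.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of names that cannot be located by the import system."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # ImportError: a parent package of a dotted name is not importable.
            # ValueError: the module is loaded but has no usable __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

For example, `missing_modules(["paddleocr", "paddleocr.tools", "paddleocr.ppocr"])` should return an empty list in a correctly packaged executable; anything it returns is a module the bundle still lacks.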

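Summary

People have kept running into this problem on paddleOCR's GitHub page for two to three years in a row, but because many developers did not spot the pattern behind it, it was never reproduced. Through the joint efforts of kerneltravel and other users, the cause has finally been pinned down and an accurate fix provided.
Before solving this problem I also consulted many other articles online, but most of them copy one another, and some offer fixes with serious limitations (for instance, the packaged program only works on the author's own computer). One such article does mention some of the key steps, but its author used parameters such as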

pathex=['D:/python/JobRunner/venv/Lib/site-packages/paddleocr', 'D:/python/JobRunner/venv/Lib/site-packages/paddle/libs'],

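and as a result the packaged executable still cannot find the packages it depends on when run on someone else's computer.
That author did bring in dependencies such as skimage via --hidden-import=, but the packages imported that way only fit his own setup, and the article does not distill a general rule.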

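In short, I hope this article is of real help in packaging paddleOCR with pyinstaller. If you still have questions, feel free to leave a comment, or send kerneltravel a message on github.com.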
