!pip install pyspark
!pip install pyarrow==1.0.1
!pip install urllib3==1.26.0
!pip install spark-notebook==0.0.35 --no-deps jupyter_client pyzmq nest_asyncio
!pip install findspark==1.4.2
!pip install qgrid==1.3.1
!pip install cython==0.29.23
!pip install pyarrow==2.0.0
!pip install cryptography==3.3.2
!pip install boto3==1.10.2
!pip install azure-storage-blob==12.6.0
!pip install protobuf==3.19.4
!pip install mlflow==2.1.1
!pip install elasticsearch7==7.10.1
!pip install torch==1.12.0
!pip install transformers
!pip install jieba
!pip install pandas
!pip install lxml
!pip install pytz
!pip install termcolor
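Some of the commands above pin an exact version with `==` and others do not. As a rough sketch (the helper name `unpinned_packages` is made up for illustration, and it only understands the simple `pip install name==version` form used here), the unpinned entries can be listed like this:

```python
def unpinned_packages(commands):
    """Return package tokens from `pip install` lines that carry no `==` pin.

    Assumption: options such as --no-deps take no argument. Options that do
    take one (for example `-r requirements.txt`) are not handled here.
    """
    unpinned = []
    for line in commands:
        # Notebook cells prefix shell commands with "!"; drop it if present.
        tokens = line.lstrip("!").split()
        if tokens[:2] != ["pip", "install"]:
            continue
        for token in tokens[2:]:
            if token.startswith("-"):
                continue  # skip flags like --no-deps
            if "==" not in token:
                unpinned.append(token)
    return unpinned


# Example usage with a few of the lines from the list above:
# unpinned_packages(["!pip install pyspark", "!pip install pyarrow==1.0.1"])
```

Anything this returns will resolve to whatever the latest release is at build time.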
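The same code ran fine with the dependencies built in September 2023, but a fresh build in September 2024 no longer worked. After several days of debugging, the cause turned out to be that the versions were not pinned back then, so a later rebuild of the Docker image succeeded but the job failed at runtime:
torch failed to load the BERT model with: Unexpected key(s) in state_dict: "bert.embeddings.position_ids".
The error appears with the latest transformers; going back to the version from that time fixes it. I pinned the pyspark version while I was at it. Pitfalls everywhere.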
pip install transformers==4.30.0
pip install pyspark==3.0.1
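To catch this kind of drift before a job runs, the image could check its installed versions against the pins at startup. The following is only a sketch: `PINNED` and `check_pins` are hypothetical names, and the two versions are the ones from the commands above. It relies on the standard-library `importlib.metadata` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the two commands above; extend with the rest of the list as needed.
PINNED = {"transformers": "4.30.0", "pyspark": "3.0.1"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the check can be exercised without
    the real packages being present.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Example usage at the top of a job script:
# problems = check_pins(PINNED)
# if problems:
#     raise RuntimeError(f"dependency drift detected: {problems}")
```

Failing fast here gives a clear version mismatch message instead of an obscure model-loading error deep inside the job.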