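A while ago I helped someone install exawind (https://github.com/Exawind/exawind-driver), a suite of wind-energy simulation programs, and it was very hard to get working. I managed to install the exawind, nalu-wind, and amr-wind suite on two machines: an Ubuntu 20.04 guest under virt-manager with an NVIDIA P106-100, and a physical Ubuntu 24.04 machine with an NVIDIA RTX 4060Ti. All three packages have cuda support, and the latter two have openfast support. A disclaimer up front: I know nothing about this specialized software and only tinkered with it because I was asked to. Below I summarize the key points of the installation as a memo.
First, the commands that installed the suite successfully. The Ubuntu 20.04 setup was tested twice, at the end of February 2025 and again at the end of May 2025, and the software versions differed between the two runs. The first command is followed by the package sets that spack find showed after each run:
# Ubuntu 20.04 under virt-manager, NVIDIA P106-100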
spack install --keep-stage --dont-restage exawind@master%gcc@=9.4.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=61 build_type=Debug ^cuda@12.5 ^nalu-wind@master%gcc@9.4.0~boost~catalyst+cuda~fftw+fsi~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.4.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr ^openfast@master ^trilinos@master 2>&1 | tee -a install.txt
# Package set at the end of February 2025
littlebat@ub20:~$ spack find
-- linux-ubuntu20.04-haswell / gcc@9.4.0 ------------------------
amr-wind@main gettext@0.23.1 libxcrypt@4.4.35 parallel-netcdf@1.14.0
autoconf@2.72 glibc@2.31 libxml2@2.13.5 parmetis@4.0.3
automake@1.16.5 gmake@4.4.1 lz4@1.10.0 perl@5.40.0
berkeley-db@18.1.40 hdf5@1.14.5 m4@1.4.19 pigz@2.8
bison@3.8.2 hwloc@2.11.1 matio@1.5.26 pkgconf@2.3.0
bzip2@1.0.8 hypre@2.32.0 metis@5.1.0 pmix@5.0.5
c-blosc@1.21.5 kokkos@4.5.01 nalu-wind@master readline@8.2
ca-certificates-mozilla@2023-05-30 kokkos-kernels@4.5.01 nccmp@1.9.1.0 snappy@1.2.1
cgns@4.5.0 kokkos-nvcc-wrapper@4.5.01 ncurses@6.5 tar@1.35
cmake@3.31.5 krb5@1.21.3 netcdf-c@4.9.2 tioga@1.2.0
cuda@12.5 libaec@1.0.6 nghttp2@1.64.0 trilinos@master
curl@8.11.1 libedit@3.1-20240808 numactl@2.0.18 util-macros@1.20.1
diffutils@3.10 libevent@2.1.12 openblas@0.3.29 xz@5.4.6
exawind@master libiconv@1.17 openfast@master yaml-cpp@0.6.3
findutils@4.10.0 libpciaccess@0.17 openmpi@5.0.6 zlib-ng@2.2.3
gcc-runtime@9.4.0 libsigsegv@2.14 openssh@9.9p1 zstd@1.5.6
gdbm@1.23 libtool@2.4.7 openssl@3.4.0
==> 67 installed packages
# Package set at the end of May 2025
littlebat@ub20:~$ spack find
-- linux-ubuntu20.04-haswell / gcc@9.4.0 ------------------------
amr-wind@main glibc@2.31 libxml2@2.13.5 perl@5.40.0
autoconf@2.72 gmake@4.4.1 lz4@1.10.0 pigz@2.8
automake@1.16.5 h5z-zfp@1.1.1 m4@1.4.19 pkgconf@2.3.0
berkeley-db@18.1.40 hdf5@1.14.5 matio@1.5.26 pmix@5.0.5
bison@3.8.2 hwloc@2.11.1 metis@5.1.0 readline@8.2
bzip2@1.0.8 hypre@2.32.0 nalu-wind@master snappy@1.2.1
c-blosc@1.21.5 kokkos@4.6.01 nccmp@1.9.1.0 tar@1.35
ca-certificates-mozilla@2023-05-30 kokkos-kernels@4.6.01 ncurses@6.5 tioga@1.2.0
cgns@4.5.0 kokkos-nvcc-wrapper@4.5.01 netcdf-c@4.9.2 trilinos@master
cmake@3.31.5 krb5@1.21.3 nghttp2@1.64.0 util-macros@1.20.1
cuda@12.5 libaec@1.0.6 numactl@2.0.18 xz@5.4.6
curl@8.11.1 libedit@3.1-20240808 openblas@0.3.29 yaml-cpp@0.6.3
diffutils@3.10 libevent@2.1.12 openfast@master zfp@1.0.0
exawind@master libiconv@1.17 openmpi@5.0.6 zlib-ng@2.2.3
findutils@4.10.0 libpciaccess@0.17 openssh@9.9p1 zstd@1.5.6
gcc-runtime@9.4.0 libsigsegv@2.14 openssl@3.4.0
gdbm@1.23 libtool@2.4.7 parallel-netcdf@1.14.0
gettext@0.23.1 libxcrypt@4.4.35 parmetis@4.0.3
==> 69 installed packages
# Physical machine, Ubuntu 24.04, NVIDIA RTX 4060Ti
spack install --keep-stage --dont-restage exawind@master%gcc@=9.5.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=89 build_type=Release ^cuda@12.5 ^nalu-wind@master%gcc@9.5.0~boost~catalyst+cuda~fftw~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.5.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr ^openfast@master ^trilinos@master 2>&1 | tee -a exawind_install.txt
The command that worked for amr-wind+cuda+openfast3.5.3:
# Ubuntu 20.04 under virt-manager, NVIDIA P106-100
spack install --keep-stage --dont-restage amr-wind@main%gcc@=9.4.0+cuda+mpi+netcdf+openfast+tests cuda_arch=61 ^cuda@12.5 ^openfast@3.5.3+rosco%gcc@=9.4.0
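The long exawind specs above are easier to adapt to another machine when they are assembled from a few parameters. The following is only a sketch: the function name exawind_spec is made up for illustration, and the variant list mirrors the Ubuntu 20.04 command (the Ubuntu 24.04 command does not list +fsi for nalu-wind).

```shell
# Sketch: print the exawind spec used above, built from three parameters.
# exawind_spec is a hypothetical helper, not part of spack or exawind.
#   $1 = gcc version (e.g. 9.4.0), $2 = cuda_arch (e.g. 61), $3 = build_type
exawind_spec() {
    gcc="$1"; arch="$2"; build="$3"
    # printf repeats the '%s ' format once per argument, so the parts
    # come out on one line separated by spaces.
    printf '%s ' \
        "exawind@master%gcc@=${gcc}+amr_wind_gpu+nalu_wind_gpu+cuda" \
        "cuda_arch=${arch}" "build_type=${build}" \
        "^cuda@12.5" \
        "^nalu-wind@master%gcc@${gcc}~boost~catalyst+cuda~fftw+fsi~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils" \
        "^amr-wind@main%gcc@=${gcc}~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr" \
        "^openfast@master" "^trilinos@master"
    printf '\n'
}

# Usage (left unquoted on purpose so the shell splits the spec into words):
#   spack spec -I $(exawind_spec 9.4.0 61 Debug)     # preview what would be built
#   spack install --keep-stage --dont-restage $(exawind_spec 9.4.0 61 Debug) 2>&1 | tee -a install.txt
```

Running spack spec first shows how the spec concretizes without building anything, which is a cheap way to catch a version conflict before a multi-hour build.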
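Key point 1:
Although the project home page has documentation for it, do not try to build this whole suite from source; install it with the spack package manager instead. If necessary, amr-wind can be built by hand. This is the advice a developer gave when answering a question in an issue ("Should I use exawind-manager to build Nalu-Wind and how to it" #1318, https://github.com/Exawind/nalu-wind/issues/1318#issuecomment-2412425406), and it is also what I learned from my own tinkering. So where did the long spack arguments above come from? I adapted them from the build names on the project's nightly test dashboard (https://my.cdash.org/index.php?project=Exawind). The parameters in those names are incomplete, though, so you have to fill in the rest yourself; used as-is they will not get through the build. That also came from the developers' answer.
Key point 2:
The core packages above, namely exawind, nalu-wind, amr-wind, openfast, and trilinos, all use the master or main (amr-wind) version, and the developers presumably follow the same principle when developing the packages together. I advise against pinning an older version of any one package and then trying to find a mutually compatible set of versions; that never worked in my attempts. The one exception is the third command above, which installs amr-wind+cuda+openfast3.5.3 on its own and pins an older openfast version.
Key point 3:
The gcc and cuda versions above were not chosen arbitrarily. For example, in the Ubuntu 24.04 installation above, with the same configuration and command, gcc-10+cuda12.5, gcc-13+cuda12.5, and gcc-13+cuda12.8 all produced many errors, while gcc-9+cuda12.5 produced fewer.
Key point 4:
This software is very hard to deal with. I suspect it is developed by a group of wind-energy experts who are not IT professionals, so do not expect a foolproof one-click install. When you hit a problem, search the web and ask an AI. If that still does not solve it, do not keep banging your head against it; ask politely in the issues on the project page, but do not approach it as though others owe you an answer.
Key point 5:
If you see many failures when running ctest, don't panic. According to a developer's answer in an issue, the ctest tests are aimed mainly at developers; for ordinary users it is enough that the programs run normally. See: nalu-wind spack "+tests" variant isn't working #1362 https://github.com/Exawind/nalu-wind/issues/1362#issuecomment-2701071912
Below are some specific problems I ran into during the installation. They are offered only for reference and as a source of ideas.
1. ZLIB::ZLIB not found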
/home/littlebat/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.4.0/hdf5-1.14.5-c6bklqmzralkiar24qpgah23nb4orrxg/cmake/hdf5-targets.cmake:59 (set_target_properties):
The link interface of target "hdf5-static" contains:
ZLIB::ZLIB
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Fix: in the file above (hdf5-targets.cmake), replace “\$” with “/home/littlebat/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.4.0/zlib-ng-2.2.3-3fi6htb3piiq4csoebqia4g5255emtno/lib/libz.so”.
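Before hand-editing a generated file inside the spack tree, it helps to keep a pristine copy and to see exactly which lines mention the missing target. A minimal sketch, assuming GNU tools; backup_once and show_zlib_refs are hypothetical helper names, and the paths in the comments are placeholders for the ones on your machine.

```shell
# Keep one untouched copy next to the file; later calls leave it alone.
backup_once() {
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}

# Print the numbered lines that mention the ZLIB::ZLIB target.
show_zlib_refs() {
    grep -n 'ZLIB::ZLIB' "$1"
}

# Example:
#   f=/path/to/hdf5-prefix/cmake/hdf5-targets.cmake
#   backup_once "$f" && show_zlib_refs "$f"
#   ls "$(spack location -i zlib-ng)/lib/libz.so"   # the library to point at
```

spack location -i prints the install prefix of a spec, which saves copying the hashed directory name by hand.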
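2. netcdf_par.h cannot be found, even though the file actually exists. Tests also failed with: “NetCDF: Parallel operation on file opened for non-parallel access”. Both problems were solved by uninstalling the hdf5-related packages, including the dev packages, with sudo apt remove.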
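To see which system packages are candidates for removal, the installed package list can be filtered by name. This is a sketch for Debian/Ubuntu only; filter_hdf5 and list_system_hdf5 are hypothetical helper names. My guess, not something I verified, is that the system hdf5 headers and libraries were being picked up ahead of the ones spack built.

```shell
# Keep only hdf5-related names from a list with one package name per line.
# "|| true" makes an empty result a success instead of an error.
filter_hdf5() {
    grep -i 'hdf5' || true
}

# List installed system packages whose name mentions hdf5.
list_system_hdf5() {
    dpkg -l | awk '/^ii/ {print $2}' | filter_hdf5
}

# Review the list before removing anything:
#   list_system_hdf5
#   sudo apt remove $(list_system_hdf5)
```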
3. MPI::MPI_C not found
This error occurred on Ubuntu 24.04:
1 error found in build log:
36 -- Found MPI: TRUE (found version "3.1")
37 -- CMAKE_SYSTEM_NAME = Linux
38 -- CMAKE_CXX_COMPILER_ID = GNU
39 -- CMAKE_BUILD_TYPE = Release
40 -- Trilinos git commit = 73510a07
41 -- Configuring done (7.0s)
>> 42 CMake Error at /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:17 (target_link_librari
es):
43 The link interface of target "MPI::all_libs" contains:
44
45 MPI::MPI_C
46
47 but the target was not found. Possible reasons include:
48
See build log for details:
/tmp/ls/spack-stage/spack-stage-nalu-wind-master-4amikb4jc7jilxpqlwpn3qvssk56ohr7/spack-build-out.txt
The fix is to add the following lines to /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:
find_package(MPI REQUIRED)
add_library(MPI::MPI_C INTERFACE IMPORTED)
However, before the build reaches exawind, the added lines have to be removed again; otherwise the following error appears:
1 error found in build log:
23 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
24 -- Found Threads: TRUE
25 -- Enabled Kokkos devices: SERIAL;CUDA
26 -- Found MPI_C: /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/openmpi-5.0.6-neo
khhztni3czwnof43b75t54pcy2kap/bin/mpicc (found version "3.1")
27 -- Found MPI_CXX: /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/openmpi-5.0.6-n
eokhhztni3czwnof43b75t54pcy2kap/bin/mpic++ (found version "3.1")
28 -- Found MPI: TRUE (found version "3.1")
>> 29 CMake Error at /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-ut
j2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:17 (add_library):
30 add_library cannot create imported target "MPI::MPI_C" because another
31 target with the same name already exists.
32 Call Stack (most recent call first):
33 /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
3jxn53fvlg6vg2fxs/lib/cmake/TeuchosCore/TeuchosCoreConfig.cmake:152 (include)
34 /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
3jxn53fvlg6vg2fxs/lib/cmake/Teuchos/TeuchosConfig.cmake:146 (include)
35 /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
3jxn53fvlg6vg2fxs/lib/cmake/MueLu/MueLuConfig.cmake:155 (include)
See build log for details:
/tmp/ls/spack-stage/spack-stage-exawind-master-d3hzrukozo44vdyx6w5cmkbytnuswtle/spack-build-out.txt
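Adding the lines for one package and removing them for the next is easy to get wrong, so here is a sketch of two helpers that insert and remove a clearly marked block. Two things in it are my assumptions and were not part of what I actually ran: the block is placed at the top of the file, and add_library is wrapped in if(NOT TARGET MPI::MPI_C), a standard CMake guard that might make the removal step unnecessary. The helper names and marker comments are made up; GNU sed is assumed.

```shell
MARK_BEGIN='# >>> local MPI::MPI_C workaround >>>'
MARK_END='# <<< local MPI::MPI_C workaround <<<'

# Prepend the marked workaround block to a CMake file (no-op if already there).
apply_mpi_workaround() {
    file="$1"
    if grep -qF "$MARK_BEGIN" "$file"; then return 0; fi
    cp -p "$file" "$file.orig"          # keep a copy of the file as it was
    tmp="$(mktemp)"
    {
        printf '%s\n' "$MARK_BEGIN" \
            'find_package(MPI REQUIRED)' \
            'if(NOT TARGET MPI::MPI_C)' \
            '  add_library(MPI::MPI_C INTERFACE IMPORTED)' \
            'endif()' \
            "$MARK_END"
        cat "$file"
    } > "$tmp"
    cat "$tmp" > "$file"                # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Delete everything between the two marker lines, markers included.
revert_mpi_workaround() {
    sed -i "/^${MARK_BEGIN}\$/,/^${MARK_END}\$/d" "$1"
}

# Example:
#   apply_mpi_workaround /path/to/trilinos-prefix/lib/external_packages/MPI/MPIConfig.cmake
#   revert_mpi_workaround /path/to/trilinos-prefix/lib/external_packages/MPI/MPIConfig.cmake
```

If the guarded version still fails when exawind is configured, revert_mpi_workaround restores the file to its previous content, which corresponds to the manual removal described above.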
4. At the end of May 2025, the build on Ubuntu 20.04 under virt-manager failed with:
Error: InstallError: For Trilinos@[master,develop], ^kokkos version in spec must match version in Trilinos source code. Specify ^kokkos@4.6.01 for trilinos@[master,develop] instead of ^kokkos@4.5.01.
Trilinos recipe maintainers, please update the ^kokkos version range
The full output:
spack install --keep-stage --dont-restage exawind@master%gcc@=9.4.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=61 build_type=Release ^cuda@12.5 ^nalu-wind@master%gcc@9.4.0~boost~catalyst+cuda~fftw+fsi~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.4.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr ^openfast@master ^trilinos@master 2>&1 | tee -a install.txt
==> No binary for trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe found: installing from source
==> No patches needed for trilinos
==> trilinos: Executing phase: 'cmake'
==> Error: InstallError: For Trilinos@[master,develop], ^kokkos version in spec must match version in Trilinos source code. Specify ^kokkos@4.6.01 for trilinos@[master,develop] instead of ^kokkos@4.5.01.
Trilinos recipe maintainers, please update the ^kokkos version range
/home/littlebat/spack/var/spack/repos/builtin/packages/trilinos/package.py:639, in cmake_args:
636 )
637 kokkos_version_specified = spec["kokkos"].version
638 if kokkos_version_in_trilinos_source != kokkos_version_specified:
>> 639 raise InstallError(
640 "For Trilinos@[master,develop], ^kokkos version in spec must "
641 "match version in Trilinos source code. Specify ^kokkos@{0} ".format(
642 kokkos_version_in_trilinos_source
See build log for details:
/tmp/littlebat/spack-stage/spack-stage-trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe/spack-build-out.txt
==> Warning: Skipping build of nalu-wind-master-dckmaqbynd7rq7szmv4tjfie5sganq4b since trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe failed
==> Warning: Skipping build of exawind-master-ir4qp6t7osfontywtmperyvay2mndjoc since nalu-wind-master-dckmaqbynd7rq7szmv4tjfie5sganq4b failed
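The fix is to manually download and install kokkos@4.6.01 and kokkos-kernels@4.6.01 and to edit the corresponding package files with spack edit trilinos, spack edit kokkos, and spack edit kokkos-kernels; after that the installation succeeds.
The details are as follows: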
spack edit trilinos:
# External Kokkos
with when("@14.4: +kokkos"):
    depends_on("kokkos+wrapper", when="+wrapper")
    depends_on("kokkos~wrapper", when="~wrapper")
    depends_on("kokkos+cuda_relocatable_device_code~shared", when="+cuda_rdc")
    depends_on("kokkos+hip_relocatable_device_code~shared", when="+rocm_rdc")
    depends_on("kokkos-kernels~shared", when="+cuda_rdc")
    depends_on("kokkos-kernels~shared", when="+rocm_rdc")
    depends_on("kokkos~complex_align")
    depends_on("kokkos@4.6.01", when="@master:")
    depends_on("kokkos@4.3.01", when="@16")
    depends_on("kokkos@4.2.01", when="@15.1:15")
    depends_on("kokkos@4.1.00", when="@14.4:15.0")
    depends_on("kokkos-kernels@4.6.01", when="@master:")
    depends_on("kokkos-kernels@4.3.01", when="@16")
    depends_on("kokkos-kernels@4.2.01", when="@15.1:15")
    depends_on("kokkos+openmp", when="+openmp")
spack edit kokkos:
# https://github.com/kokkos/kokkos/releases/download/4.6.01/kokkos-4.6.01.tar.gz
version("4.6.01", sha256="b9d70e4653b87a06dbb48d63291bf248058c7c7db4bd91979676ad5609bb1a3a")
spack edit kokkos-kernels:
# https://github.com/kokkos/kokkos-kernels/releases/download/4.6.01/kokkos-kernels-4.6.01.tar.gz
version("4.6.01", sha256="95b9357f37ab3b9c3913c00741acb2501831c28ea8664de67818ae79c69c5908")
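The sha256 values in the version lines above have to match the downloaded tarballs. A small sketch that computes the checksum of a local tarball and prints a line in the same form; spack_version_line is a hypothetical helper name and sha256sum from GNU coreutils is assumed. spack checksum kokkos 4.6.01 should give a similar result through spack itself.

```shell
# Print a spack version(...) line for a tarball that is already on disk.
#   $1 = version string, $2 = path to the tarball
spack_version_line() {
    sum="$(sha256sum "$2" | awk '{print $1}')"
    printf 'version("%s", sha256="%s")\n' "$1" "$sum"
}

# Example, using the URL from the comment above:
#   curl -LO https://github.com/kokkos/kokkos/releases/download/4.6.01/kokkos-4.6.01.tar.gz
#   spack_version_line 4.6.01 kokkos-4.6.01.tar.gz
```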
5. Usage examples for some other important commands
spack config edit cuda
spack compiler find
spack compilers # list the compilers spack knows about
spack install zlib %gcc # install using gcc
spack config edit compilers
The contents of ~/.spack/config.yaml are as follows:
config:
  build_stage:
    - /home/littlebat/tmp/spack-stage
  test_stage: /home/littlebat/tmp/test
  keep_stage: True  # this line does not seem to take effect
After that, spack install --keep-stage packagename no longer deletes the temporary build directory.
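With the stage directories kept, the build logs of failed packages stay on disk. A sketch for listing them, assuming the layout seen in the error messages above (stage root, then one spack-stage-... directory per package, then spack-build-out.txt); list_build_logs is a hypothetical helper name.

```shell
# List every spack-build-out.txt one level below the given stage root.
list_build_logs() {
    find "$1" -maxdepth 2 -type f -name 'spack-build-out.txt' | sort
}

# Example:
#   list_build_logs /home/littlebat/tmp/spack-stage
#   less "$(list_build_logs /home/littlebat/tmp/spack-stage | tail -n 1)"
```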