Key points for installing exawind

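Some time ago I helped someone install exawind (https://github.com/Exawind/exawind-driver), a wind energy simulation suite, and it was very hard to get working. I successfully installed the exawind, nalu-wind and amr-wind packages on two machines: Ubuntu 20.04 in a virt-manager virtual machine with an NVIDIA P106-100, and a physical Ubuntu 24.04 machine with an NVIDIA RTX 4060Ti. All three packages have CUDA support, and the last two (nalu-wind and amr-wind) have OpenFAST support. A disclaimer up front: I know nothing about this specialized software; I only tinkered with it because someone asked me to. Below is a summary of the key points from the installation, kept as a memo.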

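First, here are the commands that installed the suite successfully. On Ubuntu 20.04 I tested twice, at the end of February 2025 and at the end of May 2025, with different software versions each time. The first command is followed by the two package sets that spack find showed after those two runs: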

# Ubuntu 20.04 in virt-manager, NVIDIA P106-100
spack install --keep-stage --dont-restage exawind@master%gcc@=9.4.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=61 build_type=Debug ^cuda@12.5 ^nalu-wind@master%gcc@9.4.0~boost~catalyst+cuda~fftw+fsi~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.4.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr ^openfast@master ^trilinos@master 2>&1 | tee -a install.txt 

# Package set at the end of February 2025
littlebat@ub20:~$ spack find
-- linux-ubuntu20.04-haswell / gcc@9.4.0 ------------------------
amr-wind@main                       gettext@0.23.1              libxcrypt@4.4.35  parallel-netcdf@1.14.0
autoconf@2.72                       glibc@2.31                  libxml2@2.13.5    parmetis@4.0.3
automake@1.16.5                     gmake@4.4.1                 lz4@1.10.0        perl@5.40.0
berkeley-db@18.1.40                 hdf5@1.14.5                 m4@1.4.19         pigz@2.8
bison@3.8.2                         hwloc@2.11.1                matio@1.5.26      pkgconf@2.3.0
bzip2@1.0.8                         hypre@2.32.0                metis@5.1.0       pmix@5.0.5
c-blosc@1.21.5                      kokkos@4.5.01               nalu-wind@master  readline@8.2
ca-certificates-mozilla@2023-05-30  kokkos-kernels@4.5.01       nccmp@1.9.1.0     snappy@1.2.1
cgns@4.5.0                          kokkos-nvcc-wrapper@4.5.01  ncurses@6.5       tar@1.35
cmake@3.31.5                        krb5@1.21.3                 netcdf-c@4.9.2    tioga@1.2.0
cuda@12.5                           libaec@1.0.6                nghttp2@1.64.0    trilinos@master
curl@8.11.1                         libedit@3.1-20240808        numactl@2.0.18    util-macros@1.20.1
diffutils@3.10                      libevent@2.1.12             openblas@0.3.29   xz@5.4.6
exawind@master                      libiconv@1.17               openfast@master   yaml-cpp@0.6.3
findutils@4.10.0                    libpciaccess@0.17           openmpi@5.0.6     zlib-ng@2.2.3
gcc-runtime@9.4.0                   libsigsegv@2.14             openssh@9.9p1     zstd@1.5.6
gdbm@1.23                           libtool@2.4.7               openssl@3.4.0
==> 67 installed packages

# Package set at the end of May 2025
littlebat@ub20:~$ spack find 
-- linux-ubuntu20.04-haswell / gcc@9.4.0 ------------------------
amr-wind@main                       glibc@2.31                  libxml2@2.13.5          perl@5.40.0
autoconf@2.72                       gmake@4.4.1                 lz4@1.10.0              pigz@2.8
automake@1.16.5                     h5z-zfp@1.1.1               m4@1.4.19               pkgconf@2.3.0
berkeley-db@18.1.40                 hdf5@1.14.5                 matio@1.5.26            pmix@5.0.5
bison@3.8.2                         hwloc@2.11.1                metis@5.1.0             readline@8.2
bzip2@1.0.8                         hypre@2.32.0                nalu-wind@master        snappy@1.2.1
c-blosc@1.21.5                      kokkos@4.6.01               nccmp@1.9.1.0           tar@1.35
ca-certificates-mozilla@2023-05-30  kokkos-kernels@4.6.01       ncurses@6.5             tioga@1.2.0
cgns@4.5.0                          kokkos-nvcc-wrapper@4.5.01  netcdf-c@4.9.2          trilinos@master
cmake@3.31.5                        krb5@1.21.3                 nghttp2@1.64.0          util-macros@1.20.1
cuda@12.5                           libaec@1.0.6                numactl@2.0.18          xz@5.4.6
curl@8.11.1                         libedit@3.1-20240808        openblas@0.3.29         yaml-cpp@0.6.3
diffutils@3.10                      libevent@2.1.12             openfast@master         zfp@1.0.0
exawind@master                      libiconv@1.17               openmpi@5.0.6           zlib-ng@2.2.3
findutils@4.10.0                    libpciaccess@0.17           openssh@9.9p1           zstd@1.5.6
gcc-runtime@9.4.0                   libsigsegv@2.14             openssl@3.4.0
gdbm@1.23                           libtool@2.4.7               parallel-netcdf@1.14.0
gettext@0.23.1                      libxcrypt@4.4.35            parmetis@4.0.3
==> 69 installed packages

# Physical machine: Ubuntu 24.04, NVIDIA RTX 4060Ti
spack install --keep-stage --dont-restage exawind@master%gcc@=9.5.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=89 build_type=Release ^cuda@12.5 ^nalu-wind@master%gcc@9.5.0~boost~catalyst+cuda~fftw~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.5.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr ^openfast@master ^trilinos@master 2>&1 | tee -a exawind_install.txt

The command that successfully installed amr-wind+cuda+openfast@3.5.3:

# Ubuntu 20.04 in virt-manager, NVIDIA P106-100
spack install --keep-stage --dont-restage amr-wind@main%gcc@=9.4.0+cuda+mpi+netcdf+openfast+tests cuda_arch=61 ^cuda@12.5  ^openfast@3.5.3+rosco%gcc@=9.4.0
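The spack specs above are long and easy to mistype. As a purely hypothetical convenience (not part of spack or exawind; the function names are my own), the Python sketch below assembles the same spec syntax from small pieces so that each variant sits on its own line: `+name` enables a boolean variant, `~name` disables it, `key=value` sets a valued variant such as `cuda_arch`, `%` selects the compiler, and `^` constrains a dependency. It only builds the command string; it does not run spack.

```python
# Hypothetical helper (not part of spack or exawind): assemble a spack spec
# from readable pieces instead of typing one long line by hand.
from typing import Dict, List, Optional


def node(name: str, version: str, compiler: Optional[str] = None,
         variants: Optional[Dict[str, bool]] = None,
         params: Optional[Dict[str, str]] = None) -> str:
    """Render one package as name@version+on~off key=value %compiler."""
    out = f"{name}@{version}"
    # True -> "+variant" (on), False -> "~variant" (off); dict order is preserved.
    for variant, enabled in (variants or {}).items():
        out += ("+" if enabled else "~") + variant
    # Valued settings such as cuda_arch=61 are separated by spaces.
    for key, value in (params or {}).items():
        out += f" {key}={value}"
    # The compiler constraint goes last in this sketch.
    if compiler:
        out += f" %{compiler}"
    return out


def full_spec(root: str, deps: List[str]) -> str:
    """Attach each dependency constraint to the root with the ^ sigil."""
    return " ".join([root] + ["^" + dep for dep in deps])


# Example: the amr-wind+cuda+openfast@3.5.3 command from this post.
root = node("amr-wind", "main", "gcc@=9.4.0",
            {"cuda": True, "mpi": True, "netcdf": True, "openfast": True, "tests": True},
            {"cuda_arch": "61"})
deps = [node("cuda", "12.5"),
        node("openfast", "3.5.3", "gcc@=9.4.0", {"rosco": True})]
cmd = "spack install --keep-stage --dont-restage " + full_spec(root, deps)
print(cmd)
```

The sketch always emits `%compiler` last in each package node, separated by a space; the commands above sometimes place it right after the version instead.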

Key point 1:

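Although the project home pages have documentation, do not try to build the whole suite above from source; use the spack package manager instead. If necessary, amr-wind can be built by hand. This is the advice a developer gave in an issue (Should I use exawind-manager to build Nalu-Wind and how to it #1318 https://github.com/Exawind/nalu-wind/issues/1318#issuecomment-2412425406), and it matches my own experience. So where did my long lists of spack arguments come from? I adapted them from the build names on the project's nightly test dashboard (https://my.cdash.org/index.php?project=Exawind). The parameters in those names are incomplete, though, and have to be filled in by hand; copied as-is, they will not compile. The developers said the same.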

Key point 2:

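The core packages above, including exawind, nalu-wind, amr-wind, openfast and trilinos, all use the master version (main for amr-wind), and the developers presumably keep the packages compatible with each other on that basis. I recommend not pinning older versions of individual packages and hunting for a mutually compatible set; I never succeeded at that. The only exception is the third command above, which installs amr-wind+cuda+openfast@3.5.3 on its own and pins an older OpenFAST version.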

Key point 3:

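The gcc and CUDA versions above were not chosen arbitrarily. For example, on Ubuntu 24.04, with the same configuration and command, gcc-10+cuda12.5, gcc-13+cuda12.5 and gcc-13+cuda12.8 all produced many errors, while gcc-9+cuda12.5 produced fewer.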

Key point 4:

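This suite is very hard to deal with. I suspect it is developed by wind energy experts rather than IT professionals, so don't expect a foolproof one-click install. When you hit a problem, search the web and ask an AI. If that still doesn't solve it, don't keep banging your head against it; ask politely on the project's issue tracker, but don't approach it as if others owe you an answer.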

Key point 5:

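If running ctest reports many failures, don't panic. According to a developer's reply in an issue, the ctest tests are mainly aimed at developers; for ordinary users it is enough that the programs run correctly. See: nalu-wind spack "+tests" variant isn't working #1362 https://github.com/Exawind/nalu-wind/issues/1362#issuecomment-2701071912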

Below are some specific problems I ran into during the installation, for reference and inspiration only.

1. ZLIB::ZLIB not found

/home/littlebat/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.4.0/hdf5-1.14.5-c6bklqmzralkiar24qpgah23nb4orrxg/cmake/hdf5-targets.cmake:59 (set_target_properties):
The link interface of target "hdf5-static" contains:

ZLIB::ZLIB
but the target was not found. Possible reasons include:

* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.

Fix: in the file above (hdf5-targets.cmake), replace "\$" with "/home/littlebat/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.4.0/zlib-ng-2.2.3-3fi6htb3piiq4csoebqia4g5255emtno/lib/libz.so".
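If the same edit has to be repeated (for example after reinstalling hdf5), a small script is less error-prone than editing by hand. The sketch below is hypothetical: it replaces every occurrence of a given string in a file, keeps a `.bak` copy, and refuses to write anything if the string is absent. The exact text to replace is whatever the `\$` above stands for in your copy of hdf5-targets.cmake, so inspect the file first.

```python
# Hypothetical helper: replace a link-interface entry in an installed CMake file.
import shutil
from pathlib import Path


def replace_link_item(path: Path, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in `path`.

    A copy of the original is kept next to it as <name>.bak. Raises ValueError
    (and writes nothing) when `old` does not occur, so typos are caught early.
    Returns the number of replacements made.
    """
    text = path.read_text()
    count = text.count(old)
    if count == 0:
        raise ValueError(f"{old!r} not found in {path}")
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text.replace(old, new))
    return count
```

Usage: call `replace_link_item(targets_file, old_text, libz_path)` with the hdf5-targets.cmake path and the zlib-ng libz.so path from your own spack tree.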

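2. netcdf_par.h not found, even though the file actually exists. Testing also failed with: "NetCDF: Parallel operation on file opened for non-parallel access". Both problems were solved by removing the system hdf5-related packages, including the dev packages, with sudo apt remove.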
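To see which apt packages are involved before removing anything, the hypothetical Python sketch below asks `dpkg-query` for the status and name of every package it knows about and keeps the installed ones whose names contain `hdf5` (for example `libhdf5-dev`). It only lists candidates; review the list before passing it to `sudo apt remove`, because other system software may depend on these packages.

```python
# Hypothetical helper: find installed apt packages whose names contain "hdf5".
import subprocess
from typing import List


def hdf5_packages(dpkg_output: str) -> List[str]:
    """Parse lines of "<status>\t<package>" and keep installed hdf5 packages."""
    names = set()
    for line in dpkg_output.splitlines():
        status, _, name = line.partition("\t")
        # "install ok installed" means fully installed; removed packages whose
        # config files remain show "deinstall ok config-files" instead.
        if status.endswith(" installed") and "hdf5" in name:
            names.add(name.strip())
    return sorted(names)


def installed_hdf5_packages() -> List[str]:
    """Ask dpkg-query for every known package's status and name, then filter."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Status}\t${Package}\n"],
        capture_output=True, text=True, check=True,
    )
    return hdf5_packages(result.stdout)
```

Usage: `print("sudo apt remove " + " ".join(installed_hdf5_packages()))` prints the command for review instead of running it.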

3. MPI::MPI_C not found

This error occurred on Ubuntu 24.04:

1 error found in build log:
     36    -- Found MPI: TRUE (found version "3.1")
     37    -- CMAKE_SYSTEM_NAME = Linux
     38    -- CMAKE_CXX_COMPILER_ID = GNU
     39    -- CMAKE_BUILD_TYPE = Release
     40    -- Trilinos git commit = 73510a07
     41    -- Configuring done (7.0s)
  >> 42    CMake Error at /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:17 (target_link_librari
           es):
     43      The link interface of target "MPI::all_libs" contains:
     44    
     45        MPI::MPI_C
     46    
     47      but the target was not found.  Possible reasons include:
     48    

See build log for details:
  /tmp/ls/spack-stage/spack-stage-nalu-wind-master-4amikb4jc7jilxpqlwpn3qvssk56ohr7/spack-build-out.txt

The fix was to add the following lines to /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:
find_package(MPI REQUIRED)
add_library(MPI::MPI_C INTERFACE IMPORTED)

However, these added lines must be removed again before the build reaches exawind; otherwise it fails with the following error:

1 error found in build log:
     23    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
     24    -- Found Threads: TRUE
     25    -- Enabled Kokkos devices: SERIAL;CUDA
     26    -- Found MPI_C: /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/openmpi-5.0.6-neo
           khhztni3czwnof43b75t54pcy2kap/bin/mpicc (found version "3.1")
     27    -- Found MPI_CXX: /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/openmpi-5.0.6-n
           eokhhztni3czwnof43b75t54pcy2kap/bin/mpic++ (found version "3.1")
     28    -- Found MPI: TRUE (found version "3.1")
  >> 29    CMake Error at /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-ut
           j2sx3vq3ofar63jxn53fvlg6vg2fxs/lib/external_packages/MPI/MPIConfig.cmake:17 (add_library):
     30      add_library cannot create imported target "MPI::MPI_C" because another
     31      target with the same name already exists.
     32    Call Stack (most recent call first):
     33      /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
           3jxn53fvlg6vg2fxs/lib/cmake/TeuchosCore/TeuchosCoreConfig.cmake:152 (include)
     34      /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
           3jxn53fvlg6vg2fxs/lib/cmake/Teuchos/TeuchosConfig.cmake:146 (include)
     35      /home/ls/spack/opt/spack/linux-ubuntu24.04-x86_64_v4/gcc-9.5.0/trilinos-master-utj2sx3vq3ofar6
           3jxn53fvlg6vg2fxs/lib/cmake/MueLu/MueLuConfig.cmake:155 (include)

See build log for details:
  /tmp/ls/spack-stage/spack-stage-exawind-master-d3hzrukozo44vdyx6w5cmkbytnuswtle/spack-build-out.txt
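Because the two lines must be present while nalu-wind builds and absent when exawind builds, a small toggle script makes the back-and-forth repeatable. This is a hypothetical sketch: the marker comments are my own invention, and since the post does not say where in MPIConfig.cmake the lines were added, the sketch prepends them, assuming they need to come before the `target_link_libraries` call reported at line 17.

```python
# Hypothetical toggle for the MPI::MPI_C workaround in Trilinos' MPIConfig.cmake.
# The BEGIN/END marker comments are invented here so the patch can be removed exactly.
from pathlib import Path

BEGIN = "# BEGIN local MPI::MPI_C workaround"
END = "# END local MPI::MPI_C workaround"
PATCH = "find_package(MPI REQUIRED)\nadd_library(MPI::MPI_C INTERFACE IMPORTED)\n"


def add_patch(text: str) -> str:
    """Prepend the two workaround lines, wrapped in markers, unless already present."""
    if BEGIN in text:
        return text
    return f"{BEGIN}\n{PATCH}{END}\n" + text


def remove_patch(text: str) -> str:
    """Delete the marked block (and its trailing newline); unpatched text is returned as-is."""
    start = text.find(BEGIN)
    if start == -1:
        return text
    end = text.index(END, start) + len(END)
    if text[end:end + 1] == "\n":
        end += 1
    return text[:start] + text[end:]


def toggle(path: Path, enable: bool) -> None:
    """Apply (enable=True) or revert (enable=False) the workaround in place."""
    text = path.read_text()
    path.write_text(add_patch(text) if enable else remove_patch(text))
```

Usage: call `toggle(mpi_config, True)` before the nalu-wind build and `toggle(mpi_config, False)` before exawind is built, where `mpi_config` is the MPIConfig.cmake path above. An untested alternative would be to wrap the `add_library(...)` line in CMake's `if(NOT TARGET MPI::MPI_C) ... endif()`, which skips the definition when the target already exists, as it does in the exawind build log above.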

4. Error when building in Ubuntu 20.04 under virt-manager at the end of May 2025:

Error: InstallError: For Trilinos@[master,develop], ^kokkos version in spec must match version in Trilinos source code. Specify ^kokkos@4.6.01 for trilinos@[master,develop] instead of ^kokkos@4.5.01.
Trilinos recipe maintainers, please update the ^kokkos version range

Details:

spack install --keep-stage --dont-restage exawind@master%gcc@=9.4.0+amr_wind_gpu+nalu_wind_gpu+cuda cuda_arch=61 build_type=Release ^cuda@12.5 ^nalu-wind@master%gcc@9.4.0~boost~catalyst+cuda~fftw+fsi~gpu-aware-mpi+hypre~ipo+openfast+pic~rocm~shared+tests+tioga~trilinos-solvers~umpire~wind-utils ^amr-wind@main%gcc@=9.4.0~ascent+cuda~fft~gpu-aware-mpi+hdf5~helics+hypre~ipo~masa+mpi+netcdf+openfast~openmp~rocm~shared~sycl+tests+tiny_profile~umpire~waves2amr  ^openfast@master ^trilinos@master 2>&1 | tee -a install.txt

==> No binary for trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe found: installing from source
==> No patches needed for trilinos
==> trilinos: Executing phase: 'cmake'
==> Error: InstallError: For Trilinos@[master,develop], ^kokkos version in spec must match version in Trilinos source code. Specify ^kokkos@4.6.01 for trilinos@[master,develop] instead of ^kokkos@4.5.01.
Trilinos recipe maintainers, please update the ^kokkos version range

/home/littlebat/spack/var/spack/repos/builtin/packages/trilinos/package.py:639, in cmake_args:
        636            )
        637            kokkos_version_specified = spec["kokkos"].version
        638            if kokkos_version_in_trilinos_source != kokkos_version_specified:
  >>    639                raise InstallError(
        640                    "For Trilinos@[master,develop], ^kokkos version in spec must "
        641                    "match version in Trilinos source code. Specify ^kokkos@{0} ".format(
        642                        kokkos_version_in_trilinos_source

See build log for details:
  /tmp/littlebat/spack-stage/spack-stage-trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe/spack-build-out.txt

==> Warning: Skipping build of nalu-wind-master-dckmaqbynd7rq7szmv4tjfie5sganq4b since trilinos-master-hnt573kswlz2x2c5rih7f7iidav2iyhe failed
==> Warning: Skipping build of exawind-master-ir4qp6t7osfontywtmperyvay2mndjoc since nalu-wind-master-dckmaqbynd7rq7szmv4tjfie5sganq4b failed
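The check that fails here compares the Kokkos version bundled in the Trilinos source with the `^kokkos` version in the spec. To see in advance which version a Trilinos checkout expects, the hypothetical sketch below reads it from the bundled Kokkos CMakeLists.txt. It assumes that copy lives at `packages/kokkos/CMakeLists.txt` and declares its version with `set(Kokkos_VERSION_MAJOR 4)`-style lines; the actual spack recipe may obtain the version differently.

```python
# Hypothetical pre-check: which Kokkos version does a Trilinos source tree bundle?
import re
from pathlib import Path


def bundled_kokkos_version(cmakelists_text: str) -> str:
    """Read set(Kokkos_VERSION_MAJOR/MINOR/PATCH N) and format as e.g. 4.6.01."""
    parts = {}
    for key in ("MAJOR", "MINOR", "PATCH"):
        match = re.search(rf"set\(\s*Kokkos_VERSION_{key}\s+(\d+)\s*\)", cmakelists_text)
        if match is None:
            raise ValueError(f"Kokkos_VERSION_{key} not found")
        parts[key] = int(match.group(1))
    # Kokkos release numbers write the patch level with two digits (4.6.01).
    return f"{parts['MAJOR']}.{parts['MINOR']}.{parts['PATCH']:02d}"


def kokkos_matches(trilinos_src: Path, spec_version: str) -> bool:
    """True if the ^kokkos version in the spec equals the bundled one."""
    # Assumed location of the bundled Kokkos inside the Trilinos tree.
    cmakelists = trilinos_src / "packages" / "kokkos" / "CMakeLists.txt"
    return bundled_kokkos_version(cmakelists.read_text()) == spec_version
```

Usage: for a Trilinos tree that bundles Kokkos 4.6.01, `kokkos_matches(trilinos_src, "4.5.01")` returns False, which is the situation reported in the error above.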

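The fix was to manually download and install kokkos@4.6.01 and kokkos-kernels@4.6.01 and to modify the corresponding recipe files via spack edit trilinos, spack edit kokkos and spack edit kokkos-kernels; after that the installation succeeded.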

The specific edits were:

spack edit trilinos:
    # External Kokkos
    with when("@14.4: +kokkos"):
        depends_on("kokkos+wrapper", when="+wrapper")
        depends_on("kokkos~wrapper", when="~wrapper")
        depends_on("kokkos+cuda_relocatable_device_code~shared", when="+cuda_rdc")
        depends_on("kokkos+hip_relocatable_device_code~shared", when="+rocm_rdc")
        depends_on("kokkos-kernels~shared", when="+cuda_rdc")
        depends_on("kokkos-kernels~shared", when="+rocm_rdc")
        depends_on("kokkos~complex_align")
        depends_on("kokkos@4.6.01", when="@master:")
        depends_on("kokkos@4.3.01", when="@16")
        depends_on("kokkos@4.2.01", when="@15.1:15")
        depends_on("kokkos@4.1.00", when="@14.4:15.0")
        depends_on("kokkos-kernels@4.6.01", when="@master:")
        depends_on("kokkos-kernels@4.3.01", when="@16")
        depends_on("kokkos-kernels@4.2.01", when="@15.1:15")
        depends_on("kokkos+openmp", when="+openmp")


spack edit kokkos:
# https://github.com/kokkos/kokkos/releases/download/4.6.01/kokkos-4.6.01.tar.gz
    version("4.6.01", sha256="b9d70e4653b87a06dbb48d63291bf248058c7c7db4bd91979676ad5609bb1a3a")

spack edit kokkos-kernels:
# https://github.com/kokkos/kokkos-kernels/releases/download/4.6.01/kokkos-kernels-4.6.01.tar.gz
    version("4.6.01", sha256="95b9357f37ab3b9c3913c00741acb2501831c28ea8664de67818ae79c69c5908")
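The sha256 values in the `version()` lines above must match the downloaded tarballs. spack provides the `spack checksum` command for this; as a hypothetical stand-alone alternative, the sketch below hashes a local tarball with Python's `hashlib` and formats the line in the same style as the recipe edits above.

```python
# Hypothetical helper: compute a tarball's sha256 and format a spack version() line.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def version_line(version: str, tarball: Path) -> str:
    """Indented like the version() lines inside a spack package.py class."""
    return f'    version("{version}", sha256="{sha256_of(tarball)}")'
```

Usage: `print(version_line("4.6.01", Path("kokkos-4.6.01.tar.gz")))` after downloading the tarball from the release URL shown above.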

5. Examples of some other useful commands

spack config edit cuda
spack compiler find
spack compilers                   # list the detected compilers
spack install zlib %gcc           # build with gcc
spack config edit compilers
Contents of ~/.spack/config.yaml:
config:
  build_stage:
    - /home/littlebat/tmp/spack-stage
  test_stage: /home/littlebat/tmp/test
  keep_stage: True  # this line seems to have no effect
Then run spack install --keep-stage packagename, and the temporary build directory will not be deleted.
