HYPRE: BoomerAMG Options and Optimization

Table of Contents

  • BoomerAMG Options and Optimization
    • Overview
    • AMG Algorithm
    • Options
    • Turning on BoomerAMG
    • Strong Threshold
    • Going Deeper
    • Timing
    • More Options
      • Max Levels
      • Coarsen Type
      • Aggressive Coarsening
      • Interpolation Truncation
      • Interpolation Type
      • P Max
      • Putting it All Together
    • Full List of Options

BoomerAMG Options and Optimization

Hypre / BoomerAMG


Overview

Hypre is a set of solvers/preconditioners from Lawrence Livermore National Laboratory. The main Hypre website can be found here. For MOOSE we mainly use Hypre’s algebraic multigrid (AMG) package: BoomerAMG.

AMG is a scalable, efficient algorithm for solution of PDEs that are fairly elliptic. Many different sets of PDEs fall into that category including heat conduction, solid mechanics, porous flow, species diffusion, etc.


AMG Algorithm

I hope to fill this out with some details about how AMG works - but I don’t have time right now.


Options

BoomerAMG has an incredible number of options, many of which can have a large impact on solve speed and convergence rate. The defaults, as set by PETSc, are OK for small two-dimensional problems. However, if you are solving in 3D or on more than 32 processors, you should take some time to familiarize yourself with these options. It can be daunting, so if you start to get in too deep, always turn to moose-users for help!

We specify options for Hypre using PETSc command-line option syntax. On the command line these take the form -option value. However, we also supply a way of setting them in the input file: in both the Executioner block and Preconditioner blocks you can set petsc_options_iname and petsc_options_value. These hold the parameter names and values, respectively, that you would like to set.
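MOOSE matches the two parameters up by position. The first name goes with the first value, the second name with the second value, and so on. The short Python sketch below mimics that pairing so you can see which command line a given pair of strings corresponds to. `iname_value_to_cli` is a hypothetical helper written for illustration. It is not part of MOOSE or PETSc, and it assumes names and values are separated by whitespace.

```python
def iname_value_to_cli(iname: str, value: str) -> str:
    """Pair petsc_options_iname / petsc_options_value strings by position
    and return the equivalent PETSc command-line options (hypothetical helper)."""
    names = iname.split()    # split() also swallows runs of padding spaces
    values = value.split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} option names but {len(values)} values")
    return " ".join(f"{n} {v}" for n, v in zip(names, values))


if __name__ == "__main__":
    print(iname_value_to_cli("-pc_type -pc_hypre_type", "hypre    boomeramg"))
    # prints: -pc_type hypre -pc_hypre_type boomeramg
```

If the counts do not match, the sketch raises an error instead of guessing which value belongs to which name.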


Turning on BoomerAMG

To turn on Hypre-BoomerAMG preconditioning you would use this in your input file:

[Executioner]
  petsc_options_iname = '-pc_type -pc_hypre_type'
  petsc_options_value = 'hypre    boomeramg'
[]

This is equivalent to setting -pc_type hypre -pc_hypre_type boomeramg on the command-line.

Notice that it takes two options to turn on BoomerAMG: one to select Hypre - and one to select BoomerAMG from the Hypre package. Hypre technically contains many solvers and preconditioners, but many of them overlap with what PETSc already provides.

Strong Threshold

By far, the most important option is -pc_hypre_boomeramg_strong_threshold. This option controls the primary coarsening mechanism, which decides which matrix entries are unimportant and can be ignored. What you’re setting here is a threshold: the (scaled) value, between 0 and 1, that an entry in the matrix must reach to count as a “strong” connection and be kept. Everything below the threshold is treated as weak and ignored when building the coarser problems. This means that setting it higher discards more of the matrix. Discarding more entries is generally good for iteration speed (i.e. how fast each trip through BoomerAMG is). However, it can dramatically degrade the quality of the preconditioner, so going too far leads to worse overall performance by requiring more linear iterations.
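Here is a Python sketch of the classical Ruge-Stueben strength test, which shows what “scaled value” means. Under this test, an off-diagonal entry a_ij counts as strong when -a_ij is at least theta times the largest -a_ik in that row. Treat it as an illustration under that assumption only. BoomerAMG’s actual test may differ in details such as sign handling. `strong_connections` is a made-up name.

```python
def strong_connections(row, i, theta):
    """Return the columns j that row i is strongly connected to.

    Assumed classical test: j != i is strong if
        -a_ij >= theta * max_{k != i} (-a_ik).
    `row` maps column index -> matrix entry a_ij for the single row i.
    """
    off_diag = {j: -a for j, a in row.items() if j != i}
    largest = max(off_diag.values(), default=0.0)
    if largest <= 0.0:
        return set()  # no negative off-diagonal entries: nothing counts as strong
    return {j for j, neg_a in off_diag.items() if neg_a >= theta * largest}


if __name__ == "__main__":
    # Row 0 of a small made-up matrix with one large and two smaller couplings.
    row = {0: 4.0, 1: -1.0, 2: -0.6, 3: -0.2}
    for theta in (0.25, 0.5, 0.7):
        print(theta, sorted(strong_connections(row, 0, theta)))
    # prints: 0.25 [1, 2] / 0.5 [1, 2] / 0.7 [1]
```

Raising theta from 0.25 to 0.7 drops the -0.6 coupling from the strong set. That is the “discard more of the matrix” effect described above.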

By default this is set to 0.25. That generally works fine in 2D… but it is nowhere near appropriate for 3D.

If MOOSE detects that you’re using Hypre BoomerAMG and running in 3D, it will automatically set -pc_hypre_boomeramg_strong_threshold to 0.7. The MOOSE team chose this value after reading a lot of literature and doing some small-scale optimization tests. HOWEVER: 0.7 is NOT a golden number. Depending on your problem you may need more coarsening (0.8) or less (0.6, or 0.5 to help convergence). Be warned, though: I highly recommend that you never set this below 0.5 for any 3D problem. Below that, the preconditioner blows up and takes a huge amount of time and memory.
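The rules of thumb in this section can be written down as a small helper. This is only a sketch of the guidance above. `strong_threshold_for` is hypothetical, and it is not how MOOSE implements its automatic 3D setting.

```python
import warnings
from typing import Optional

PETSC_DEFAULT = 0.25    # PETSc's default strong threshold
MOOSE_3D_DEFAULT = 0.7  # value MOOSE assigns automatically in 3D


def strong_threshold_for(dim: int, requested: Optional[float] = None) -> float:
    """Pick -pc_hypre_boomeramg_strong_threshold following the advice above."""
    if requested is None:
        return MOOSE_3D_DEFAULT if dim == 3 else PETSC_DEFAULT
    if not 0.0 <= requested <= 1.0:
        raise ValueError("the strong threshold must lie between 0 and 1")
    if dim == 3 and requested < 0.5:
        warnings.warn("below 0.5 in 3D the preconditioner may blow up in time and memory")
    return requested


if __name__ == "__main__":
    print(strong_threshold_for(2))       # 0.25
    print(strong_threshold_for(3))       # 0.7
    print(strong_threshold_for(3, 0.8))  # 0.8: more coarsening, if your problem tolerates it
```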

Going Deeper

If you’ve read this far, then you’ve probably run into a real problem. Either you’re not getting the speed/scalability you want, or you’re not getting convergence. I’ll try to put these options in order of importance (in my opinion) and give you some guidance for each one.

In general, speeding up BoomerAMG or improving scalability comes from doing more coarsening. As a reminder, the first thing to do is make sure you have -pc_hypre_boomeramg_strong_threshold set appropriately for your problem (see above). Even if you have it set to 0.25 (for 2D) or 0.7 (for 3D), you might try increasing it a bit to find the sweet spot between efficiency and effectiveness.

Timing

Before venturing further, you will definitely want to turn on the performance log (“perf log”). You do that by putting print_perf_log = true in the [Outputs] block of your input file. At the end of the solve it will print out a table of timings.

For preconditioning, what you want to pay attention to is the Total Time With Sub column. The total time during the nonlinear solve is in the solve() row, and your objective should be to reduce it. solve() is mainly a combination of three things: compute_residual(), compute_jacobian() and the preconditioner (with a little going to the linear/nonlinear solver in PETSc).

The first thing to do is look at how much of the total time compute_jacobian() and compute_residual() are taking. When you push BoomerAMG far (trying to scale a problem out to many cores), compute_jacobian() and compute_residual() will take smaller and smaller portions of the total solve() time. Whatever the scale, aim to keep compute_residual() + compute_jacobian() at around 60%-70% of the total solve() time. The remainder of the solve() time is solver/preconditioning time, with the majority of that going into BoomerAMG. If BoomerAMG is taking more than 50% of the solve time, then there are two possibilities. Either you’ve scaled your problem too far (try to keep at least 5000 DoFs per processor; 10k is even better), or you need to start adjusting BoomerAMG options.
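The arithmetic above is easy to do by hand. Here it is spelled out as a sketch that takes the Total Time With Sub values read off the perf log. One assumption to flag: BoomerAMG may not get its own row in the perf log. The sketch therefore treats everything in solve() that is not compute_residual() or compute_jacobian() as solver/preconditioner time, which slightly overstates BoomerAMG’s share. `diagnose_perf_log` is a hypothetical name.

```python
def diagnose_perf_log(solve, residual, jacobian, n_dofs, n_procs):
    """Apply the rules of thumb above to perf-log timings (hypothetical helper).

    solve, residual, jacobian: 'Total Time With Sub' for solve(),
    compute_residual() and compute_jacobian(), in seconds.
    Returns (assembly fraction, solver fraction, advice).
    """
    assembly = (residual + jacobian) / solve
    # Assumption: the rest of solve() is solver/preconditioner time, mostly BoomerAMG.
    solver = 1.0 - assembly
    dofs_per_proc = n_dofs / n_procs
    if solver <= 0.5:
        advice = "balance looks reasonable"
    elif dofs_per_proc < 5000:
        advice = f"only {dofs_per_proc:.0f} DoFs per processor: scaled out too far"
    else:
        advice = "preconditioner dominates: start adjusting BoomerAMG options"
    return assembly, solver, advice


if __name__ == "__main__":
    # Made-up timings for illustration: 45% of solve() spent in assembly.
    a, s, msg = diagnose_perf_log(100.0, 25.0, 20.0, n_dofs=2_000_000, n_procs=500)
    print(f"assembly {a:.0%}, solver {s:.0%}: {msg}")
```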

More Options

Let’s dive into some of the more advanced options.

Max Levels

The first option I want to draw your attention to is one you should probably leave alone for now. I point it out because everyone wants to mess with it. -pc_hypre_boomeramg_max_levels controls the number of “levels” in the multigrid solve, i.e. the number of coarser problems that are produced. The default for this option is 25. You might think that you could save time by making this smaller, or make the algorithm more accurate by making it larger: actually, neither is true!

One thing to understand about multigrid is that it’s trying to generate a really small “coarsest” problem that it will actually solve (usually with a direct/Gaussian elimination solver). If you artificially limit the number of levels, you prevent the algorithm from reaching that coarsest level. That means you’re doing an expensive direct solve on a larger problem… which is slower. Yes, by reducing the number of levels you can actually slow down your solve!

What about increasing the levels? Well, the problem can only get so coarse. So increasing the number of levels typically has no effect. Even a problem with millions of DoFs will typically only need ~15 levels or so. You can see how many levels BoomerAMG is using by turning on -pc_hypre_boomeramg_print_statistics.
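A back-of-the-envelope model shows why the level count saturates. If each level shrinks the problem by a roughly constant factor, the number of levels grows only logarithmically with problem size. The reduction factor and target coarsest size below are made-up illustrative values, not BoomerAMG parameters. `estimate_levels` is hypothetical, and real coarsening rates vary from level to level.

```python
def estimate_levels(n_dofs, reduction, coarsest=100, max_levels=25):
    """Return (levels, size of coarsest problem) assuming every coarsening step
    divides the number of unknowns by `reduction` (a simplifying assumption)."""
    levels, n = 1, float(n_dofs)
    while n > coarsest and levels < max_levels:
        n /= reduction
        levels += 1
    return levels, int(n)


if __name__ == "__main__":
    print(estimate_levels(1_000_000, 3.0))                # (10, 50)
    print(estimate_levels(1_000_000, 3.0, max_levels=5))  # (5, 12345): a much bigger coarse solve
```

In this toy model, capping the levels at 5 leaves a coarsest problem with about 12k unknowns for the direct solver instead of about 50.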

Coarsen Type

As mentioned before, speeding up Hypre is usually done by doing more coarsening. The main option that controls coarsening is -pc_hypre_boomeramg_coarsen_type. By default this is set to Falgout, which is a good mix of efficiency and accuracy. However, there is typically some performance to be gained by using more aggressive options. In particular, when solving a 3D problem you should try HMIS or PMIS (in that order). Both of these use very little parallel communication but do an excellent job of removing matrix entries to get to coarser problems.
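To give a feel for why PMIS needs so little communication, here is a simplified, serial Python sketch of its core idea:

1. Each point gets a measure: its number of strong connections plus a random tie-breaker.
2. Undecided points whose measure beats all of their undecided neighbours become coarse (C) points.
3. Their undecided neighbours become fine (F) points.
4. Repeat until every point is decided.

Each step only looks at immediate neighbours, which is what makes the method cheap in parallel. The sketch assumes a symmetric strength graph (point -> set of strongly connected points) and ignores the details of hypre’s implementation. Roughly speaking, HMIS adds a classical first pass on each processor before a PMIS-like step. `pmis` is a hypothetical function.

```python
import random


def pmis(adj, seed=0):
    """Simplified PMIS C/F splitting on a symmetric strength graph.

    adj maps each point to the set of points it is strongly connected to.
    Returns a dict point -> "C" (coarse) or "F" (fine).
    """
    rng = random.Random(seed)
    # Measure: number of strong neighbours plus a random tie-breaker in [0, 1);
    # the point itself breaks any remaining tie so the ordering is strict.
    measure = {i: (len(adj[i]) + rng.random(), i) for i in adj}
    # Points with no strong connections do not need a coarse representative.
    state = {i: "F" for i in adj if not adj[i]}
    undecided = {i for i in adj if i not in state}
    while undecided:
        # New C points: undecided points that beat all undecided neighbours.
        new_c = {i for i in undecided
                 if all(measure[i] > measure[j] for j in adj[i] if j in undecided)}
        for i in new_c:
            state[i] = "C"
        undecided -= new_c
        # Undecided neighbours of a C point become F points.
        for i in list(undecided):
            if any(state.get(j) == "C" for j in adj[i]):
                state[i] = "F"
                undecided.discard(i)
    return state


if __name__ == "__main__":
    chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
    print(pmis(chain))
```

Whatever the random draw, no two C points end up as neighbours, and every F point that has strong neighbours sits next to a C point.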

There are a lot more options other than Falgout, HMIS and PMIS - but I’m not going to list them here because those are really the ones you will want to use.

Aggressive Coarsening

Another option that can do a lot of coarsening is “Aggressive Coarsening”. BoomerAMG actually has many parameters surrounding this - but currently only 2 are available to us as PETSc options: -pc_hypre_boomeramg_agg_nl and -pc_hypre_boomeramg_agg_num_paths.

-pc_hypre_boomeramg_agg_nl is the number of coarsening levels to apply “aggressive coarsening” to. Aggressive coarsening does just what you think it does: it tries even harder to remove matrix entries. It does this by looking at “second-order” connections, asking whether there is a path from one important entry to another important entry through other entries. By looking at these pathways the algorithm decides whether or not to keep an entry. More aggressive coarsening results in less time spent in BoomerAMG (and a lot less communication). It also reduces the effectiveness of the preconditioner quite a lot, so it’s a balance.

-pc_hypre_boomeramg_agg_num_paths is the number of pathways that must be found to count as a connection and keep something. That means increasing this value will reduce the amount of aggressive coarsening happening in each aggressive coarsening level. In other words, a higher -pc_hypre_boomeramg_agg_num_paths will improve accuracy/effectiveness but slow things down. So it’s a balance.
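Here is a sketch of the path-counting idea on a plain strength graph (point -> set of strongly connected points). It counts paths of length 1 or 2 and keeps a long-range connection only if at least num_paths of them exist. Roughly speaking, BoomerAMG applies this kind of test among the coarse points left by a first coarsening pass. The sketch skips that step. `grid_graph`, `count_paths` and `aggressively_connected` are hypothetical names.

```python
def grid_graph(n):
    """Strength graph of an n x n grid where each point is strongly
    connected to its four nearest neighbours (an illustrative assumption)."""
    adj = {}
    for r in range(n):
        for c in range(n):
            adj[(r, c)] = {(r + dr, c + dc)
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= r + dr < n and 0 <= c + dc < n}
    return adj


def count_paths(adj, i, j):
    """Number of paths of length 1 or 2 from point i to point j."""
    direct = 1 if j in adj[i] else 0
    via = sum(1 for k in adj[i] if k != j and j in adj[k])
    return direct + via


def aggressively_connected(adj, i, j, num_paths):
    """Keep the long-range connection i -> j only if enough paths support it."""
    return i != j and count_paths(adj, i, j) >= num_paths


if __name__ == "__main__":
    adj = grid_graph(3)
    for paths in (1, 2):
        print(paths,
              aggressively_connected(adj, (0, 0), (1, 1), paths),  # diagonal
              aggressively_connected(adj, (0, 0), (0, 2), paths))  # two apart in a row
```

On the 3x3 grid, diagonal neighbours are linked by two length-2 paths, but points two apart along a row are linked by only one. Raising the requirement from 1 to 2 paths therefore drops the second connection, and fewer points are treated as connected.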

By default aggressive coarsening is off (-pc_hypre_boomeramg_agg_nl 0). To turn it on, set -pc_hypre_boomeramg_agg_nl to something greater than zero. I recommend 2 or 3 to start with, but even 4 can be OK in 3D. -pc_hypre_boomeramg_agg_num_paths defaults to 1, which is the most aggressive setting. If the aggressive coarsening levels are causing too many linear iterations, try increasing the number of paths first. Go up to about 4, 5, or 6 and see if it helps reduce the number of linear iterations. If it doesn’t, then you may need to back off on the number of aggressive coarsening levels you are doing. All a balancing act…

Interpolation Truncation

You can also drop entries during the interpolation operation. One way to do that is to set -pc_hypre_boomeramg_truncfactor. This value should be between 0 and 1 and works similarly to the strong threshold: the higher you set it, the more entries are ignored. I recommend a value around 0.3 to start with. You can adjust this up (maybe 0.4 or 0.5) for some speed, or down (0.2, etc.) for more accuracy. Balance.
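Here is a sketch of what truncation does to one row of the interpolation operator:

1. Weights smaller in magnitude than truncfactor times the row’s largest weight are dropped.
2. The surviving weights are rescaled so the row sum is unchanged.

This follows the usual description of hypre’s interpolation truncation, but it is not hypre’s code. `truncate_row` and the weights are made up for illustration.

```python
def truncate_row(weights, trunc_factor):
    """Drop interpolation weights with |w| < trunc_factor * max|w| in the row,
    then rescale the survivors so the row sum is preserved."""
    if not weights:
        return {}
    largest = max(abs(w) for w in weights.values())
    kept = {j: w for j, w in weights.items() if abs(w) >= trunc_factor * largest}
    new_sum = sum(kept.values())
    scale = sum(weights.values()) / new_sum if new_sum != 0 else 1.0
    return {j: w * scale for j, w in kept.items()}


if __name__ == "__main__":
    row = {10: 0.5, 11: 0.3, 12: 0.15, 13: 0.05}  # made-up interpolation weights
    for factor in (0.0, 0.2, 0.4):
        print(factor, truncate_row(row, factor))
```

A factor of 0.2 drops only the smallest weight, and 0.4 keeps just the two largest. The same “higher means more entries ignored” behaviour applies as with the strong threshold.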

Interpolation Type

Speaking of interpolation: it’s expensive, and how it’s done can greatly affect the accuracy and efficiency of BoomerAMG. Ideally, you should choose an interpolation operator that matches your physics. The Hypre manual and various Hypre papers discuss which interpolation operators are better for which physics. There are also some good rules of thumb.

To change it, set -pc_hypre_boomeramg_interp_type. The default is classical. This tends to be really slow and unnecessary, especially for 3D problems. I recommend starting with ext+i (yes, that’s the literal value of the option). It stands for “extended+i”. This is a good all-around option that has low communication overhead.

There are many more options here, but I’m not going to enumerate them for now.

P Max

I’m going to be honest: I don’t quite understand what -pc_hypre_boomeramg_P_max does exactly. I’ve read about it, but I still can’t quite get it. The description from PETSc is: “Max elements per row for interpolation operator”. Setting this low (~2) seems to do a good job. Setting it higher seems to make the solve less accurate. However, that goes against my intuition, which is why I don’t quite understand what’s going on. If someone knows, please email moose-users with a good explanation!
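The PETSc description, together with the “0=unlimited” note in the option list, suggests that P_max caps how many nonzeros each row of the interpolation operator may keep. The sketch below shows that idea. It assumes the largest-magnitude weights are the ones kept and that the row is rescaled afterwards, as with truncation. One plausible reason a small cap speeds things up is that a sparser interpolation operator also yields sparser coarse-level operators. The sketch does not by itself explain the accuracy behaviour described above. `cap_row` is hypothetical.

```python
def cap_row(weights, p_max):
    """Keep at most p_max interpolation weights (largest magnitude first) in a row;
    p_max = 0 means unlimited. Survivors are rescaled to preserve the row sum."""
    if p_max <= 0 or len(weights) <= p_max:
        return dict(weights)
    largest = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:p_max]
    kept = dict(largest)
    new_sum = sum(kept.values())
    scale = sum(weights.values()) / new_sum if new_sum != 0 else 1.0
    return {j: w * scale for j, w in kept.items()}


if __name__ == "__main__":
    row = {10: 0.1, 11: 0.4, 12: 0.3, 13: 0.2}  # made-up interpolation weights
    print(cap_row(row, 0))  # unchanged: no limit
    print(cap_row(row, 2))  # only columns 11 and 12 survive, rescaled
```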

Putting it All Together

So - what does an “evolved” BoomerAMG options line look like? Here’s one I’m currently using for a 3D Laplacian solve with ~6M elements on ~500 cores:

-pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7  -pc_hypre_boomeramg_agg_nl 4 -pc_hypre_boomeramg_agg_num_paths 5 -pc_hypre_boomeramg_max_levels 25 -pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_P_max 2 -pc_hypre_boomeramg_truncfactor 0.3
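To use that line from an input file instead of the command line, it has to be split into the petsc_options_iname and petsc_options_value parameters. Here is a sketch that does the split. It assumes every option is followed by exactly one value. That holds for this line, but not for flag-style options such as -pc_hypre_boomeramg_print_statistics, which can be given without a value. `cli_to_iname_value` is hypothetical.

```python
def cli_to_iname_value(cli: str):
    """Split '-option value -option value ...' into MOOSE's
    petsc_options_iname and petsc_options_value strings (hypothetical helper)."""
    tokens = cli.split()
    names, values = tokens[0::2], tokens[1::2]
    if len(names) != len(values) or not all(n.startswith("-") for n in names):
        raise ValueError("expected alternating '-option value' pairs")
    return " ".join(names), " ".join(values)


if __name__ == "__main__":
    line = ("-pc_type hypre -pc_hypre_type boomeramg "
            "-pc_hypre_boomeramg_strong_threshold 0.7 -pc_hypre_boomeramg_agg_nl 4 "
            "-pc_hypre_boomeramg_agg_num_paths 5 -pc_hypre_boomeramg_max_levels 25 "
            "-pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i "
            "-pc_hypre_boomeramg_P_max 2 -pc_hypre_boomeramg_truncfactor 0.3")
    names, values = cli_to_iname_value(line)
    print("[Executioner]")
    print(f"  petsc_options_iname = '{names}'")
    print(f"  petsc_options_value = '{values}'")
    print("[]")
```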

This is a good place to start if you’re looking for advanced usage. If it’s not “strong” enough for you (it takes too many linear iterations), reduce the number of aggressive coarsening levels (-pc_hypre_boomeramg_agg_nl). If it’s still too slow, you have three options:

  • reduce the number of aggressive coarsening paths
  • try a different coarsening type (like PMIS)
  • add a bit more truncation (maybe go to 0.4 or 0.5)

It’s all about balance.
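The advice in the previous paragraph can be read as a one-step tuning rule. Here is a sketch, with two assumptions flagged. First, the paragraph offers fewer paths, PMIS, and more truncation as alternatives, and the sketch simply tries them in that order. Second, truncation is raised in steps of 0.1, capped at 0.5. `next_settings` is a hypothetical helper.

```python
P = "-pc_hypre_boomeramg_"


def next_settings(opts: dict, too_many_iterations: bool) -> dict:
    """Return a copy of opts nudged one step following the advice above."""
    new = dict(opts)
    if too_many_iterations:
        # Not "strong" enough: do less aggressive coarsening.
        new[P + "agg_nl"] = max(0, opts[P + "agg_nl"] - 1)
    elif opts[P + "agg_num_paths"] > 1:
        # Still too slow: fewer paths means more aggressive coarsening.
        new[P + "agg_num_paths"] = opts[P + "agg_num_paths"] - 1
    elif opts[P + "coarsen_type"] != "PMIS":
        new[P + "coarsen_type"] = "PMIS"
    else:
        # Assumed step of 0.1, capped at the 0.5 mentioned above.
        new[P + "truncfactor"] = min(0.5, round(opts[P + "truncfactor"] + 0.1, 2))
    return new


if __name__ == "__main__":
    opts = {P + "agg_nl": 4, P + "agg_num_paths": 5,
            P + "coarsen_type": "HMIS", P + "truncfactor": 0.3}
    print(next_settings(opts, too_many_iterations=True))   # agg_nl 4 -> 3
    print(next_settings(opts, too_many_iterations=False))  # agg_num_paths 5 -> 4
```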

Full List of Options

Here is the full list of Hypre options present in PETSc 3.7.6 (found using -help | grep -C 5 -i hypre):

HYPRE preconditioner options
  -pc_hypre_type <boomeramg> (choose one of) pilut parasails boomeramg ams (PCHYPRESetType)
HYPRE BoomerAMG Options
  -pc_hypre_boomeramg_cycle_type <V> (choose one of) V W (None)
  -pc_hypre_boomeramg_max_levels <25>: Number of levels (of grids) allowed (None)
  -pc_hypre_boomeramg_max_iter <1>: Maximum iterations used PER hypre call (None)
  -pc_hypre_boomeramg_tol <0.>: Convergence tolerance PER hypre call (0.0 = use a fixed number of iterations) (None)
  -pc_hypre_boomeramg_truncfactor <0.>: Truncation factor for interpolation (0=no truncation) (None)
  -pc_hypre_boomeramg_P_max <0>: Max elements per row for interpolation operator (0=unlimited) (None)
  -pc_hypre_boomeramg_agg_nl <0>: Number of levels of aggressive coarsening (None)
  -pc_hypre_boomeramg_agg_num_paths <1>: Number of paths for aggressive coarsening (None)
  -pc_hypre_boomeramg_strong_threshold <0.25>: Threshold for being strongly connected (None)
  -pc_hypre_boomeramg_max_row_sum <0.9>: Maximum row sum (None)
  -pc_hypre_boomeramg_grid_sweeps_all <1>: Number of sweeps for the up and down grid levels (None)
  -pc_hypre_boomeramg_nodal_coarsen <0>: Use a nodal based coarsening 1-6 (HYPRE_BoomerAMGSetNodal)
  -pc_hypre_boomeramg_vec_interp_variant <0>: Variant of algorithm 1-3 (HYPRE_BoomerAMGSetInterpVecVariant)
  -pc_hypre_boomeramg_grid_sweeps_down <1>: Number of sweeps for the down cycles (None)
  -pc_hypre_boomeramg_grid_sweeps_up <1>: Number of sweeps for the up cycles (None)
  -pc_hypre_boomeramg_grid_sweeps_coarse <1>: Number of sweeps for the coarse level (None)
  -pc_hypre_boomeramg_smooth_type <Schwarz-smoothers> (choose one of) Schwarz-smoothers Pilut ParaSails Euclid (None)
  -pc_hypre_boomeramg_smooth_num_levels <25>: Number of levels on which more complex smoothers are used (None)
  -pc_hypre_boomeramg_eu_level <0>: Number of levels for ILU(k) in Euclid smoother (None)
  -pc_hypre_boomeramg_eu_droptolerance <0.>: Drop tolerance for ILU(k) in Euclid smoother (None)
  -pc_hypre_boomeramg_eu_bj: <FALSE> Use Block Jacobi for ILU in Euclid smoother? (None)
  -pc_hypre_boomeramg_relax_type_all <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi symmetric-SOR/Jacobi l1scaled-SOR/Jacobi Gaussian-elimination CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_down <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi symmetric-SOR/Jacobi l1scaled-SOR/Jacobi Gaussian-elimination CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_up <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi symmetric-SOR/Jacobi l1scaled-SOR/Jacobi Gaussian-elimination CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_coarse <Gaussian-elimination> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi symmetric-SOR/Jacobi l1scaled-SOR/Jacobi Gaussian-elimination CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_weight_all <1.>: Relaxation weight for all levels (0 = hypre estimates, -k = determined with k CG steps) (None)
  -pc_hypre_boomeramg_relax_weight_level <1.>: Set the relaxation weight for a particular level (weight,level) (None)
  -pc_hypre_boomeramg_outer_relax_weight_all <1.>: Outer relaxation weight for all levels (-k = determined with k CG steps) (None)
  -pc_hypre_boomeramg_outer_relax_weight_level <1.>: Set the outer relaxation weight for a particular level (weight,level) (None)
  -pc_hypre_boomeramg_no_CF: <FALSE> Do not use CF-relaxation (None)
  -pc_hypre_boomeramg_measure_type <local> (choose one of) local global (None)
  -pc_hypre_boomeramg_coarsen_type <Falgout> (choose one of) CLJP Ruge-Stueben modifiedRuge-Stueben Falgout PMIS HMIS (None)
  -pc_hypre_boomeramg_interp_type <classical> (choose one of) classical direct multipass multipass-wts ext+i ext+i-cc standard standard-wts FF FF1 (None)
  -pc_hypre_boomeramg_print_statistics: Print statistics (None)
  -pc_hypre_boomeramg_print_statistics <3>: Print statistics (None)
  -pc_hypre_boomeramg_print_debug: Print debug information (None)
  -pc_hypre_boomeramg_nodal_relaxation: <FALSE> Nodal relaxation via Schwarz (None)

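This article was submitted by an Internet user. The views expressed are solely the author’s and do not represent the position of this site. This site only provides information storage space, does not own the copyright, and bears no related legal liability. If reprinting, please credit the source: http://www.mzph.cn/bicheng/64673.shtml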
