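Introduction
JDBC has long offered support for DML statement batching. By default, all statements are sent one after the other, each in a separate network round trip. Batching allows us to send multiple statements in one shot, saving unnecessary socket stream flushing.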
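To make the idea concrete before Hibernate enters the picture, here is a minimal plain-JDBC sketch. The class name, the `insertPosts` helper and the id assignment are hypothetical and not taken from the article's code; the sketch only assumes a `Post` table with the `title`, `version` and `id` columns that appear in the logs further down.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, for illustration only.
class JdbcBatchSketch {

    // Number of executeBatch() calls needed for `rows` statements
    // when at most `batchSize` statements go into one batch.
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Inserts one Post row per title using JDBC batching and
    // returns how many batches were handed to the driver.
    static int insertPosts(Connection connection, List<String> titles, int batchSize)
            throws SQLException {
        String sql = "insert into Post (title, version, id) values (?, ?, ?)";
        int batches = 0;
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            int pending = 0;
            long id = 1; // assumption: ids are assigned by the caller
            for (String title : titles) {
                statement.setString(1, title);
                statement.setInt(2, 0);
                statement.setLong(3, id++);
                statement.addBatch(); // queued in the driver, not executed yet
                if (++pending == batchSize) {
                    statement.executeBatch(); // execute the queued statements together
                    pending = 0;
                    batches++;
                }
            }
            if (pending > 0) {
                statement.executeBatch(); // execute the remainder
                batches++;
            }
        }
        return batches;
    }
}
```

With 3 rows and a batch size of 15, `batchCount` gives a single `executeBatch()` call instead of three separate executions. Whether a batch really travels in one network round trip depends on the JDBC driver.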
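Hibernate hides the database statements behind a transactional write-behind abstraction layer. This intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.
Configuring Hibernate to support JDBC batching is not as easy as it should be, so I'm going to explain everything you need to do to make it work.
Testing time
We'll start with the following entity model:
The Post has a one-to-many association with the Comment entity: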
@OneToMany(cascade = CascadeType.ALL, mappedBy = "post", orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();
Our test scenario issues both INSERT and UPDATE statements, so we can validate whether JDBC batching is being used:
LOGGER.info("Test batch insert");
long startNanos = System.nanoTime();
doInTransaction(session -> {
    int batchSize = batchSize();
    for (int i = 0; i < itemsCount(); i++) {
        Post post = new Post(String.format("Post no. %d", i));
        int j = 0;
        post.addComment(new Comment(String.format("Post comment %d:%d", i, j++)));
        post.addComment(new Comment(String.format("Post comment %d:%d", i, j++)));
        session.persist(post);
        if (i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});
LOGGER.info("{}.testInsert took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos));

LOGGER.info("Test batch update");
startNanos = System.nanoTime();
doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments c")
    .list();
    for (Post post : posts) {
        post.title = "Blog " + post.title;
        for (Comment comment : post.comments) {
            comment.review = "Blog " + comment.review;
        }
    }
});
LOGGER.info("{}.testUpdate took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos));
This test persists a configurable number of Post entities, each one containing two Comments. For brevity, we persist 3 Posts and use the Dialect default batch size:
protected int itemsCount() {
    return 3;
}

protected int batchSize() {
    return Integer.valueOf(Dialect.DEFAULT_BATCH_SIZE);
}
Default batch support
Hibernate does not implicitly employ JDBC batching, so each INSERT and UPDATE statement is executed separately:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
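Configuration
To enable JDBC batching, we have to configure the hibernate.jdbc.batch_size property:
A non-zero value enables use of JDBC2 batch updates by Hibernate (e.g. recommended values between 5 and 30)
We'll set this property and rerun our test: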
properties.put("hibernate.jdbc.batch_size", String.valueOf(batchSize()));
This time, the Comment INSERT statements are batched, while the UPDATE statements are left untouched:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
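A JDBC batch can target only one table, so every new DML statement that targets a different table ends the current batch and starts a new one. Mixing statements for different tables is therefore undesirable when using SQL batch processing.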
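The effect of statement order on the number of batches can be sketched with a small simulation. This is a simplified model, not Hibernate's implementation: the class and method names are hypothetical, and a batch is assumed to end whenever the target table changes or the batch size is reached.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of batch boundaries, for illustration only.
class BatchBoundarySketch {

    // Counts the batches needed to execute statements in the given order.
    // Each list element is the name of the table a statement targets.
    static int countBatches(List<String> tables, int batchSize) {
        int batches = 0;
        int inCurrent = 0;
        String currentTable = null;
        for (String table : tables) {
            boolean startNew = currentTable == null
                    || !table.equals(currentTable)
                    || inCurrent == batchSize;
            if (startNew) {
                batches++;
                inCurrent = 0;
                currentTable = table;
            }
            inCurrent++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Order produced by persisting each Post followed by its two Comments
        List<String> interleaved = Arrays.asList(
                "Post", "Comment", "Comment",
                "Post", "Comment", "Comment",
                "Post", "Comment", "Comment");
        // Same statements, grouped by table
        List<String> grouped = Arrays.asList(
                "Post", "Post", "Post",
                "Comment", "Comment", "Comment",
                "Comment", "Comment", "Comment");
        System.out.println(countBatches(interleaved, 15)); // prints 6
        System.out.println(countBatches(grouped, 15));     // prints 2
    }
}
```

The interleaved order yields six batches for nine INSERT statements, which matches the six INSERT log entries above, while the same statements grouped by table would fit into two.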
Ordering statements
Hibernate can sort INSERT and UPDATE statements using the following configuration options:
properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
While the Post and Comment INSERT statements are now batched accordingly, the UPDATE statements are still executed separately:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
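Adding version data batch support
To enable UPDATE batching as well, we need to set the hibernate.jdbc.batch_versioned_data configuration property:
Set this property to true if your JDBC driver returns correct row counts from executeBatch(). It is usually safe to turn this option on. Hibernate will then use batched DML for automatically versioned data. Defaults to false.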
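The reason row counts matter is that a versioned UPDATE (`... where id=? and version=?`) signals a stale entity by matching zero rows. The sketch below is a simplified illustration of how the values returned by `executeBatch()` could be interpreted; the class, enum and method names are hypothetical, and this is not Hibernate's actual checking code. `Statement.SUCCESS_NO_INFO` and `Statement.EXECUTE_FAILED` are the standard JDBC constants.

```java
import java.sql.Statement;

// Simplified interpretation of batch update counts, for illustration only.
class UpdateCountCheckSketch {

    enum Outcome { UPDATED, STALE, UNKNOWN, FAILED }

    // Classifies one element of the int[] returned by executeBatch()
    // for a versioned UPDATE that is expected to match exactly one row.
    static Outcome classify(int updateCount) {
        if (updateCount == Statement.EXECUTE_FAILED) {
            return Outcome.FAILED;  // the driver reports that the statement failed
        }
        if (updateCount == Statement.SUCCESS_NO_INFO) {
            return Outcome.UNKNOWN; // executed, but the driver gives no row count
        }
        // Zero matched rows means the version check in the WHERE clause failed
        return updateCount == 0 ? Outcome.STALE : Outcome.UPDATED;
    }
}
```

When a driver answers with `SUCCESS_NO_INFO`, a stale version cannot be told apart from a successful update, which is why the property should only be enabled for drivers that report accurate counts.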
We'll rerun our test with this property set too:
properties.put("hibernate.jdbc.batch_versioned_data", "true");
Now both the INSERT and the UPDATE statements are properly batched:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
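For reference, the four settings used throughout this article can be collected in one place. The helper below is a hypothetical convenience, not part of the article's test code; it only builds a `java.util.Properties` object, and how that object is handed to Hibernate depends on the bootstrap mechanism in use.

```java
import java.util.Properties;

// Hypothetical helper that gathers the batching settings discussed above.
class BatchSettingsSketch {

    static Properties batchProperties(int batchSize) {
        Properties properties = new Properties();
        // Enable JDBC batching with the given batch size
        properties.put("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        // Group statements by table so batches are not cut short
        properties.put("hibernate.order_inserts", "true");
        properties.put("hibernate.order_updates", "true");
        // Allow batching for versioned entities (driver must report row counts)
        properties.put("hibernate.jdbc.batch_versioned_data", "true");
        return properties;
    }
}
```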
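Benchmark
Now that we have configured Hibernate for JDBC batching, we can benchmark the performance gain of statement grouping.
- the test case uses a PostgreSQL database installed on the same machine as the currently running JVM
- a batch size of 50 was chosen, and each test iteration increases the statement count by an order of magnitude
- all durations are expressed in milliseconds
Statement count | No-batch insert duration (ms) | No-batch update duration (ms) | Batch insert duration (ms) | Batch update duration (ms) |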
---|---|---|---|---|
30 | 218 | 178 | 191 | 144 |
300 | 311 | 327 | 208 | 217 |
3000 | 1047 | 1089 | 556 | 478 |
30000 | 5889 | 6032 | 2640 | 2301 |
300000 | 51785 | 57869 | 16052 | 20954 |
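To read the trend more easily, the ratio between the unbatched and batched durations can be computed for each row. The snippet below is a small sketch that uses only the figures from the table; the class and method names are hypothetical.

```java
// Computes speedup ratios from the benchmark table, for illustration only.
class BatchSpeedupSketch {

    // How many times faster the batched run was than the unbatched one.
    static double speedup(long unbatchedMillis, long batchedMillis) {
        return (double) unbatchedMillis / batchedMillis;
    }

    public static void main(String[] args) {
        // Columns: statements, no-batch insert, no-batch update, batch insert, batch update
        long[][] rows = {
            {30, 218, 178, 191, 144},
            {300, 311, 327, 208, 217},
            {3000, 1047, 1089, 556, 478},
            {30000, 5889, 6032, 2640, 2301},
            {300000, 51785, 57869, 16052, 20954}
        };
        for (long[] row : rows) {
            System.out.printf("%d statements: insert x%.2f, update x%.2f%n",
                    row[0], speedup(row[1], row[3]), speedup(row[2], row[4]));
        }
    }
}
```

For 30 statements the ratios stay close to 1, while for 300,000 statements they are roughly 3.2 for inserts and 2.8 for updates.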
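The more rows we INSERT or UPDATE, the more we benefit from JDBC batching. For write-mostly applications (e.g. enterprise batch processors), we should definitely enable JDBC batching, as the performance benefits can be staggering.
- Code available on GitHub.
Translated from: https://www.javacodegeeks.com/2015/03/how-to-batch-insert-and-update-statements-with-hibernate.html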