Batch updates:
A PreparedStatement does not by itself reduce the number of SQL executions: the statement runs once for every set of parameters bound to it.
Within a single transaction, Statement and PreparedStatement use the same connection, so a Statement no longer pays the old cost of acquiring a connection on every use. In that situation a Statement can actually out-optimize a PreparedStatement: by concatenating values into one dynamic SQL string (for example an IN list) it cuts down the number of statements executed. The price is that every distinct SQL text must be compiled and given an execution plan.
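The trade-off can be sketched as follows; the KEYSPEND table and its QUALITY/CPCID columns are made up for illustration:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

public class BatchTradeoffSketch {
    // PreparedStatement batching: parsed once, but the server still performs
    // one execution per parameter set, i.e. ids.size() executions.
    static void withPreparedStatement(Connection conn, List<Long> ids) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "UPDATE KEYSPEND SET QUALITY = ? WHERE CPCID = ?"); // hypothetical table
        for (Long id : ids) {
            ps.setLong(1, 5L);
            ps.setLong(2, id);
            ps.addBatch();
        }
        ps.executeBatch();
        ps.close();
    }

    // Statement with a dynamic IN list: one execution in total, but every
    // distinct SQL text is hard-parsed (compiled and given an execution plan).
    static void withStatement(Connection conn, List<Long> ids) throws Exception {
        StringBuilder in = new StringBuilder();
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) in.append(',');
            in.append(ids.get(i));
        }
        Statement st = conn.createStatement();
        st.executeUpdate("UPDATE KEYSPEND SET QUALITY = 5 WHERE CPCID IN (" + in + ")");
        st.close();
    }
}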
Example 1: updating quality scores.
During keyword auditing, table A is queried first for the keyword's historical quality score; if one exists, the quality value in table B is updated:
public void checkQuality() {
    // First fetch the keywords' historical quality scores.
    Map<String, Long> qhmap = getQualityWithAccountidAndKey(
            needCheckQualityKeys, accountid);
    // If the QualityHistory table has a score for the groupid-key, update the
    // quality value in keyspend; otherwise do nothing.
    if (qhmap != null && qhmap.size() > 0) {
        List<KeySpend> kslist = new ArrayList<KeySpend>();
        for (CpcKeyDTO key : needCheckQualityKeys) {
            if (qhmap.get(key.getKey()) != null) {
                KeySpend keyspend = new KeySpend();
                keyspend.setCpcid(key.getCpcid());
                keyspend.setQuality(qhmap.get(key.getKey()));
                kslist.add(keyspend);
            }
        }
        // Executes kslist.size() SQL statements on the server.
        batchProcessKeySpend(kslist);
    }
}
The tuned version:
if (qhmap != null && qhmap.size() > 0) {
    HashMap<Long, ArrayList<Long>> keySpendMap = new HashMap<Long, ArrayList<Long>>();
    // The quality score has only 5 possible values, so each value maps to one
    // SQL statement: at most 5 statements in total.
    for (CpcKeyDTO key : needCheckQualityKeys) {
        if (qhmap.get(key.getKey()) != null) {
            Long degree = qhmap.get(key.getKey());
            ArrayList<Long> list = keySpendMap.get(degree);
            if (list == null) {
                list = new ArrayList<Long>();
                keySpendMap.put(degree, list);
            }
            list.add(key.getCpcid());
        }
    }
    // Use IN to batch all ids that share a quality value into one UPDATE.
    for (Long degree : keySpendMap.keySet()) {
        auditPersistentService.batchProcessKeySpend(degree, keySpendMap.get(degree));
    }
}
After this tuning, the former kslist.size() SQL statements shrink to at most 5. Each of those 5 statements is still hard-parsed, though, because the IN list makes the SQL text different on every run.
In a test with 10 keys the difference was not obvious: both versions took about 16 ms. Conclusion pending.
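batchProcessKeySpend(degree, ids) itself is not shown here; presumably it boils down to a single UPDATE per quality value with the id list expanded into the IN clause. A minimal sketch, assuming a KEYSPEND table with QUALITY and CPCID columns (names not given in the source) and using Spring's NamedParameterJdbcTemplate to expand the list:
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

public class KeySpendDao {
    private NamedParameterJdbcTemplate npJdbcTemplate;

    // One UPDATE per quality value; :ids is expanded into the IN list.
    // Table and column names are assumptions, not taken from the source.
    public int batchProcessKeySpend(Long degree, List<Long> ids) {
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("degree", degree);
        params.put("ids", ids);
        return npJdbcTemplate.update(
                "UPDATE KEYSPEND SET QUALITY = :degree WHERE CPCID IN (:ids)",
                params);
    }
}
If the id list can exceed the database's IN-list cap, it has to be chunked first, much like the subList batching shown later in this note.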
Example 2: an optimization of batch insert and batch update of (idea, key) pairs in an ad system.
The batch ad insert originally used MERGE INTO, but even with the SQL as a PreparedStatement it still executed 250,000 times.
For example, the application builds a collection of 250,000 pairs and inserts them in batch through jdbcTemplate's batchUpdate, with duplicate checking required as well; the SQL used was:
MERGE INTO CPC.CPCPARTNERAUDIT cpa
USING (SELECT ? AS OPID, ? AS ACCOUNTID, ? AS GROUPID, ? AS IDEAID,
              ? AS KEYID, ? AS CHECKSTATUS, ? AS CREATEDATE, ? AS REFUSEREASON,
              ? AS ADMINUSERID, ? AS ADMINUSERNAME, ? AS AUDITREASON, ? AS BACKUPIDEAID
         FROM DUAL) cpai
ON (cpa.keyid = cpai.keyid AND cpa.ideaid = cpai.ideaid)
WHEN NOT MATCHED THEN
  INSERT
  VALUES (cpai.OPID, cpai.ACCOUNTID, cpai.GROUPID, cpai.IDEAID, cpai.KEYID,
          cpai.CHECKSTATUS, cpai.CREATEDATE, cpai.REFUSEREASON, cpai.ADMINUSERID,
          cpai.ADMINUSERNAME, cpai.AUDITREASON, cpai.BACKUPIDEAID)
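The source only says jdbcTemplate's batchUpdate drove this MERGE; the wiring below is an assumption, but it shows why the statement still executes once per pair:
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

public class PartnerAuditDao {
    private JdbcTemplate jdbcTemplate;

    // rows holds one Object[] per pair, one element per ? of the MERGE, in order.
    // Even as a JDBC batch, the server executes the MERGE rows.size() times:
    // 250,000 pairs means 250,000 executions.
    public void mergePairs(final String mergeSql, final List<Object[]> rows) {
        jdbcTemplate.batchUpdate(mergeSql, new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                Object[] row = rows.get(i);
                for (int col = 0; col < row.length; col++) {
                    ps.setObject(col + 1, row[col]);
                }
            }

            public int getBatchSize() {
                return rows.size();
            }
        });
    }
}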
To bring those 250,000 executions down while still using a PreparedStatement, the application was refactored as follows: build an ideaid-to-key-list map in the application and, for each idea, run the statement below against that idea's keys. The number of SQL executions then equals the number of ideas. (The Chinese string literals are the audit remarks written into the table: '自动审核通过' means "auto-audit passed", '自动审核' means "auto-audit".)
INSERT INTO CPC.CPCPARTNERAUDIT (OPID,
ACCOUNTID,
GROUPID,
IDEAID,
KEYID,
CHECKSTATUS,
CREATEDATE,
REFUSEREASON,
ADMINUSERID,
ADMINUSERNAME,
AUDITREASON,
BACKUPIDEAID)
(SELECT A.*
FROM (SELECT GREATEST (I.OPID, C.OPID) AS OPID,
I.ACCOUNTID,
I.CPCGRPID,
I.CPCIDEAID,
C.CPCID,
1 AS CHECKSTATUS,
SYSDATE AS CREATEDATE,
'自动审核通过' AS REFUSEREASON,
0 AS ADMINUSERID,
'自动审核' AS ADMINUSERNAME,
'自动审核通过' AS AUDITREASON,
NULL AS BACKUPIDEAID
FROM CPC.CPCIDEA I, CPC.CPC C
WHERE I.CPCGRPID = C.CPCGRPID
AND I.CPCIDEAID = ?
AND C.CPCID IN (?)
UNION ALL
SELECT GREATEST (I.OPID, C.OPID) AS OPID,
C.ACCOUNTID,
I.GROUPID,
I.AUDITIDEAID AS CPCIDEAID,
C.CPCID,
1 AS CHECKSTATUS,
SYSDATE AS CREATEDATE,
'自动审核通过' AS REFUSEREASON,
0 AS ADMINUSERID,
'自动审核' AS ADMINUSERNAME,
'自动审核通过' AS AUDITREASON,
I.IDEAID AS BACKUPIDEAID
FROM CPC.CPCIDEAMODI I, CPC.CPC C
WHERE I.GROUPID = C.CPCGRPID
AND I.AUDITIDEAID = ?
AND C.CPCID IN (?)) A
WHERE NOT EXISTS
(SELECT P.KEYID, P.IDEAID
FROM CPC.CPCPARTNERAUDIT P
WHERE A.CPCIDEAID = P.IDEAID AND A.CPCID = P.KEYID))
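On the application side, the grouping that feeds this statement might look like the sketch below. The long[]{ideaid, keyid} pair representation and the expandInList helper are illustrations, not code from the source; the point is that each IN (?) must be expanded to one placeholder per key before the statement is prepared:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairGrouping {
    // Build the ideaid -> key-list map: one INSERT...SELECT per idea instead
    // of one MERGE per (idea, key) pair. Each entry of pairs is {ideaid, keyid}.
    static Map<Long, List<Long>> groupByIdea(List<long[]> pairs) {
        Map<Long, List<Long>> keysByIdea = new HashMap<Long, List<Long>>();
        for (long[] pair : pairs) {
            List<Long> keys = keysByIdea.get(pair[0]);
            if (keys == null) {
                keys = new ArrayList<Long>();
                keysByIdea.put(pair[0], keys);
            }
            keys.add(pair[1]);
        }
        return keysByIdea;
    }

    // Expand every "IN (?)" to "IN (?,?,...)" with n placeholders.
    static String expandInList(String sql, int n) {
        StringBuilder in = new StringBuilder("IN (");
        for (int i = 0; i < n; i++) {
            in.append(i == 0 ? "?" : ",?");
        }
        in.append(")");
        return sql.replace("IN (?)", in.toString());
    }
}
The execution count thus drops from 250,000 (one per pair) to the number of distinct ideas, while the NOT EXISTS subquery still handles de-duplication.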
Two approaches to querying large data volumes:
1. Query everything into memory, then slice it with subList. The subList approach fits cases like the limit on the number of IN parameters (Oracle caps an IN expression list at 1,000 elements).
.....
int group = passedPartners.size() % IDS_PER_BATCH == 0
        ? passedPartners.size() / IDS_PER_BATCH
        : passedPartners.size() / IDS_PER_BATCH + 1;
for (int i = 0; i < group; i++) {
    final List<CpcPartnerDTO> batchPassedPartners;
    if ((i + 1) * IDS_PER_BATCH > passedPartners.size()) {
        batchPassedPartners = passedPartners.subList(
                i * IDS_PER_BATCH, passedPartners.size());
    } else {
        batchPassedPartners = passedPartners.subList(
                i * IDS_PER_BATCH, (i + 1) * IDS_PER_BATCH);
    }
    doInFacade(batchPassedPartners, AuditElement.PARTNER,
            Action.INS_AUDIT_ELEMENT);
}
......
2. Use the database's paging SQL to fetch the subsets directly instead of slicing in memory. The benefit is that each subset lands in an intermediate result that is then merged into the target result set, so two large result sets never sit in memory at the same time.
for (int pageNo = 1; pageNo <= totalPageCount; pageNo++) {
doPage(sql, pageNo, set);
}
......
private void doPage(String sql, int pageNo, Set<String> set) {
    long t1 = System.currentTimeMillis();
    final int startIndex = PageUtil.getStartOfPage(pageNo, pageSize);
    String sqlLimit = PageUtil.getLimitString(sql, true);
    Object[] obj = new Object[] { startIndex, startIndex + pageSize };
    List<String> list = jdbcTemplateCpc.query(sqlLimit, obj,
            new RowMapper() {
                @Override
                public Object mapRow(ResultSet rs, int arg1)
                        throws SQLException {
                    return "" + rs.getString(2) + rs.getString(1);
                }
            });
    set.addAll(list);
    long t2 = System.currentTimeMillis();
    // Note: (t2 - t1) / 1000 is integer division, so the log reports whole seconds.
    logger.info("page " + pageNo + " fetched " + list.size()
            + " rows; set now holds " + set.size() + " records; took "
            + ((t2 - t1) / 1000) + " s");
}
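PageUtil is a project helper whose implementation is not shown. Given the bind array { startIndex, startIndex + pageSize } built above, getLimitString presumably wraps the SQL in the classic Oracle ROWNUM pattern; the sketch below is a guess that matches those call sites, not the actual implementation:
public final class PageUtil {
    // 1-based index of the first row of a page; an assumption that fits the
    // bind values built in doPage.
    public static int getStartOfPage(int pageNo, int pageSize) {
        return (pageNo - 1) * pageSize + 1;
    }

    // Classic Oracle ROWNUM wrapper. With binds (startIndex, startIndex + pageSize)
    // it returns rows [startIndex, startIndex + pageSize).
    public static String getLimitString(String sql, boolean hasOffset) {
        if (!hasOffset) {
            return "SELECT * FROM (" + sql + ") WHERE ROWNUM <= ?";
        }
        return "SELECT * FROM (SELECT ROW_.*, ROWNUM ROWNUM_ FROM (" + sql
                + ") ROW_) WHERE ROWNUM_ >= ? AND ROWNUM_ < ?";
    }
}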