SQL Server 2005 Paging – The Holy Grail

SQLServer2005高效分页
本文探讨了SQL Server 2005中实现高效分页查询的方法,对比了多种分页技术,包括两次查询法、临时表法、COUNT(*) OVER()法等,并提出了一种接近理想状态的解决方案,该方案能在查询单页数据的同时获得总记录数而几乎不增加额外开销。

 

SQL Server 2005 Paging – The Holy Grail

By Robert Cary, 2009/03/11

Introduction

The paging and ranking functions introduced in 2005 are old news by now, but the typical ROW_NUMBER OVER() implementation only solves part of the problem.

Nearly every application that uses paging gives some indication of how many pages (or total records) are in the total result set. The challenge is to query the total number of rows, and return only the desired records with a minimum of overhead? The holy grail solution would allow you to return one page of the results and the total number of rows with no additional I/O overhead.

In this article, we're going to explore four approaches to this problem and discuss their relative strengths and weaknesses. For the purposes of comparison, we'll be using I/O as a relative benchmark.

The 'two-bites' approach

The most basic approach is the 'two-bites' approach. In this approach you, effectively, run your query twice; querying the total rows in one pass, and querying your result set in the second. The code is pretty straightforward:

DECLARE @startRow INT ; SET @startrow = 50
SELECTCOUNT(*) AS TotRows
FROM [INFORMATION_SCHEMA].columns

;WITH cols
AS
(
    SELECT table_name, column_name,
        ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq

    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDERBY seq

It gives the desired results, but this approach doubles the cost of the query because you query your underlying tables twice:

(1 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(50 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The temp table approach

The 'two-bites' approach is especially undesirable if your paged query is very expensive and complex. A common workaround is to write the superset into a temporary table, then query out the subset. This is also the most common way to implement paging pre-2005 (in this case, ROW_NUMBER is superfluous).

DECLARE @startRow INT ; SET @startrow = 50
CREATETABLE #pgeResults(
    id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    table_name VARCHAR(255),
    column_name VARCHAR(255)
)
INSERTINTO #pgeResults(Table_name, column_name)
SELECT table_name, column_name
FROM [INFORMATION_SCHEMA].columns
ORDERBY [table_name], [column_name]

SELECT@@ROWCOUNT AS TotRows
SELECT Table_Name, Column_Name
FROM #pgeResults
WHERE id between @startrow and @startrow + 49
ORDERBY id

DROPTABLE #pgeResults

Looking at the query plan, you can see that your underlying tables are queried only once but the I/O stats show us that you take an even bigger hit populating the temporary table.

Table '#pgeResults_________________________________________________________________________________________________________000000001A9F'. Scan count 0, logical reads 5599, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 39, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(2762 row(s) affected)

(1 row(s) affected)

(50 row(s) affected)
Table '#pgeResults_________________________________________________________________________________________________________000000001A9F'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In this case, it would be better to query the tables twice. Maybe some new 2005 functionality can yield a better solution.

The COUNT(*) OVER() Approach

OVER() can also be used with Aggregate Window Functions. For our purposes this means we can do a COUNT(*) without the need for a GROUP BY clause, returning the total count in our result set. The code definitely looks much cleaner and, if your application permits it, you can simply return one dataset (eliminating the overhead of writing to a temp table).

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
    SELECT table_name, column_name,
    ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
    COUNT(*) OVER() AS totrows
    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDERBY seq

Unfortunately this approach has it's own hidden overhead:

Table 'Worktable'. Scan count 3, logical reads 5724, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Where did that come from? In this case, SQL Server implements the COUNT(*) OVER() by dumping all the data into a hidden spool table, which it then aggregates and joins back to your main output. It does this to avoid re scanning the underlying tables. Although this approach looks the cleanest, it introduces the most overhead. 

I've spent most of today cleaning up and old data-paging proc that is both very inefficient and frequently called enough for me to notice it. I've explored probably a dozen other approaches to solving this problem before I came up with the solution below. For the sake of brevity—and because they rest are pretty obscure and equally inefficient­—we'll now skip to the best solution.

The Holy Grail

In theory, ROW_NUMBER() gives you all the information you need because it assigns a sequential number to every single row in your result set. It all falls down, of course, when you only return a subset of your results that don't include the highest sequential number. The solution is to return a 2nd column of sequential numbers, in the reverse order. The total number of the records will always be the sum of the two fields on any given row minus 1 (unless one of your sequences is zero-bound).

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
    SELECT table_name, column_name,
        ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
        ROW_NUMBER() OVER(ORDER BY table_name DESC, column_name desc) AS totrows
    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDERBY seq

This approach gives us our page of data and the total number of rows with zero additional overhead! (well, maybe one or two ms of CPU time, but that's it) The I/O statistics are identical to querying just the subset of records.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Compare the stats above with the stats and query below (just returning one page of data).

;WITH cols
AS
(
    SELECT table_name, column_name,
        ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDERBY seq

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Conclusion

I have found this approach to be best suited for smaller resultsets from complex queries where I/O is the primary bottleneck. Jeff Moden, Peso and others here have pointed out that with larger resultsets, the I/O cost you save is more than outweighed by the CPU cost. You definitly want to compare different approches to find the best solution for your problem.

My real goal here was to try and figure out a way to avoid unnecessary I/O overhead. I am sure that this solution is not the last word on the subject and I greatly look forward to hearing your thoughts, experiences and ideas on this topic. Thank you all for reading and for your feedback.

By Robert Cary, 2009/03/11

本项目采用C++编程语言结合ROS框架构建了完整的双机械臂控制系统,实现了Gazebo仿真环境下的协同运动模拟,并完成了两台实体UR10工业机器人的联动控制。该毕业设计在答辩环节获得98分的优异成绩,所有程序代码均通过系统性调试验证,保证可直接部署运行。 系统架构包含三个核心模块:基于ROS通信架构的双臂协调控制器、Gazebo物理引擎下的动力学仿真环境、以及真实UR10机器人的硬件接口层。在仿真验证阶段,开发了双臂碰撞检测算法和轨迹规划模块,通过ROS控制包实现了末端执行器的同步轨迹跟踪。硬件集成方面,建立了基于TCP/IP协议的实时通信链路,解决了双机数据同步和运动指令分发等关键技术问题。 本资源适用于自动化、机械电子、人工智能等专业方向的课程实践,可作为高年级课程设计、毕业课题的重要参考案例。系统采用模块化设计理念,控制核心与硬件接口分离架构便于功能扩展,具备工程实践能力的学习者可在现有框架基础上进行二次开发,例如集成视觉感知模块或优化运动规划算法。 项目文档详细记录了环境配置流程、参数调试方法和实验验证数据,特别说明了双机协同作业时的时序同步解决方案。所有功能模块均提供完整的API接口说明,便于使用者快速理解系统架构并进行定制化修改。 资源来源于网络分享,仅用于学习交流使用,请勿用于商业,如有侵权请联系我删除!
【微电网】【创新点】基于非支配排序的蜣螂优化算法NSDBO求解微电网多目标优化调度研究(Matlab代码实现)内容概要:本文围绕基于非支配排序的蜣螂优化算法(NSDBO)在微电网多目标优化调度中的应用展开研究,提出了一种改进的智能优化算法以解决微电网系统中经济性、环保性和能源效率等多重目标之间的权衡问题。通过引入非支配排序机制,NSDBO能够有效处理多目标优化中的帕累托前沿搜索,提升解的多样性和收敛性,并结合Matlab代码实现仿真验证,展示了该算法在微电网调度中的优越性能和实际可行性。研究涵盖了微电网典型结构建模、目标函数构建及约束条件处理,实现了对风、光、储能及传统机组的协同优化调度。; 适合人群:具备一定电力系统基础知识和Matlab编程能力的研究生、科研人员及从事微电网、智能优化算法应用的工程技术人员;熟悉优化算法与能源系统调度的高年级本科生亦可参考。; 使用场景及目标:①应用于微电网多目标优化调度问题的研究与仿真,如成本最小化、碳排放最低与供电可靠性最高之间的平衡;②为新型智能优化算法(如蜣螂优化算法及其改进版本)的设计与验证提供实践案例,推动其在能源系统中的推广应用;③服务于学术论文复现、课题研究或毕业设计中的算法对比与性能测试。; 阅读建议:建议读者结合文中提供的Matlab代码进行实践操作,重点关注NSDBO算法的核心实现步骤与微电网模型的构建逻辑,同时可对比其他多目标算法(如NSGA-II、MOPSO)以深入理解其优势与局限,进一步开展算法改进或应用场景拓展。
内容概要:本文详细介绍了使用ENVI与SARscape软件进行DInSAR(差分干涉合成孔径雷达)技术处理的完整流程,涵盖从数据导入、预处理、干涉图生成、相位滤波与相干性分析、相位解缠、轨道精炼与重去平,到最终相位转形变及结果可视化在内的全部关键步骤。文中以Sentinel-1数据为例,系统阐述了各环节的操作方法与参数设置,特别强调了DEM的获取与处理、基线估算、自适应滤波算法选择、解缠算法优化及轨道精炼中GCP点的应用,确保最终获得高精度的地表形变信息。同时提供了常见问题的解决方案与实用技巧,增强了流程的可操作性和可靠性。; 适合人群:具备遥感与GIS基础知识,熟悉ENVI/SARscape软件操作,从事地质灾害监测、地表形变分析等相关领域的科研人员与技术人员;适合研究生及以上学历或具有相关项目经验的专业人员; 使用场景及目标:①掌握DInSAR技术全流程处理方法,用于地表沉降、地震形变、滑坡等地质灾害监测;②提升对InSAR数据处理中关键技术环节(如相位解缠、轨道精炼)的理解与实操能力;③实现高精度形变图的生成与Google Earth可视化表达; 阅读建议:建议结合实际数据边学边练,重点关注各步骤间的逻辑衔接与参数设置依据,遇到DEM下载失败等问题时可参照文中提供的多种替代方案(如手动下载SRTM切片),并对关键结果(如相干性图、解缠图)进行质量检查以确保处理精度。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值