JOIN Rows Produced vs. Rows Read

It's common knowledge that the number of rows a JOIN can produce is the product of the matching rows for each table. A three-table JOIN with 500, 10, and 1 matching rows can produce 5,000 rows. WHERE conditions will hopefully reduce this number and cause fewer rows to be returned. While working on Optimizing Queries Case 3 I came across a situation where a plan that produced more rows caused fewer rows to be read. Rather than just feeling lucky, I wanted to see how this worked and verify I wasn't imagining it. I asked around and got no response, so I set out to prove it rationally if nothing else.

Test Data and Plan P-T-S

   Table P
      p_id =  1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12
      t_id =  1 / 2 / 7 / 4 / 7 / 7 / 8 / 1 / 8 /  2 /  4 /  1
      s_id =  1 / 1 / 3 / 7 / 3 / 3 / 5 / 1 / 5 /  5 /  7 /  3

   Table T
      t_id =  1 / 2 / 4 / 7 / 8 / 10

   Table S
      s_id =  1 / 3 / 5 / 7
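
As a reference, here is a minimal SQL sketch that builds this test data. The secondary indexes on P.t_id and P.s_id are my assumption, but the T-P-S and S-P-T plans discussed below rely on them being present.

   CREATE TABLE P (
     p_id INT NOT NULL PRIMARY KEY,
     t_id INT NOT NULL,
     s_id INT NOT NULL,
     KEY (t_id),  -- assumed; needed for the T-P-S plan
     KEY (s_id)   -- assumed; needed for the S-P-T plan
   );
   CREATE TABLE T (t_id INT NOT NULL PRIMARY KEY);
   CREATE TABLE S (s_id INT NOT NULL PRIMARY KEY);

   INSERT INTO P VALUES
     (1,1,1), (2,2,1), (3,7,3), (4,4,7),  (5,7,3),  (6,7,3),
     (7,8,5), (8,1,1), (9,8,5), (10,2,5), (11,4,7), (12,1,3);
   INSERT INTO T VALUES (1), (2), (4), (7), (8), (10);
   INSERT INTO S VALUES (1), (3), (5), (7);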
   
Rows are listed vertically (so P row 1 has values 1, 1, and 1) to save space. Each _id column is unique and is its table's PRIMARY KEY. The original JOIN order from Case 3 was P-T-S: a full table scan on P, then matching rows in T and S using their respective PRIMARY keys. For this test data that results in a join with 12, 1, and 1 matching rows. However, the number of rows read is not 12 but 36. Quoting from the MySQL manual section on EXPLAIN:
The tables are listed in the output in the order that MySQL would read them while processing the query. MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, then finds a matching row in the second table, then in the third table, and so on. When all tables are processed, it outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
Reading this quite carefully, the JOIN process for the P-T-S order reads the first row from P (p_id = 1), then finds a matching row in T (t_id = 1), then finds a matching row in S (s_id = 1). Three reads: read from P where p_id = 1, read from T where t_id = 1, and read from S where s_id = 1. Stick all these values together, output the row, and get the next row from P. In short: 3 rows read for each row in P; with 12 rows in P this equals 36 rows read. The result set of this JOIN is identical to table P (in a perfect world where every P.t_id and P.s_id actually has a matching T.t_id and S.s_id). Notice also that there are 15 duplicate reads: rows p_id = 1 and 2 both read S.s_id = 1, and rows p_id = 5 and 6 both read T.t_id = 7 and S.s_id = 3. You would think that for T and S, MySQL would read only the rows whose _id is in the distinct set of P.t_id or P.s_id values. However, the manual says this about eq_ref:
One row will be read from this table for each combination of rows from the previous tables.
The wording is not singular, like "for each row from the previous table," but plural. For T there are 12 "combinations" of rows from P. For S there are 12 combinations of rows from T and P; since p_id is unique, there are no duplicate combinations to reduce. Therefore 12 reads for P, 12 for T, and 12 for S: 36 total reads.
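
These read counts can be checked empirically. Below is a sketch of one way to do it with the session Handler counters; the SELECT list is illustrative, and the exact counter breakdown varies with storage engine and version:

   FLUSH STATUS;
   SELECT P.p_id, T.t_id, S.s_id
   FROM P
   JOIN T ON T.t_id = P.t_id
   JOIN S ON S.s_id = P.s_id;
   -- Handler_read_rnd_next roughly reflects the table scan of P;
   -- Handler_read_key roughly reflects the PRIMARY key lookups into T and S.
   SHOW SESSION STATUS LIKE 'Handler_read%';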

Plan T-P-S

After running ANALYZE on all three tables in Case 3, MySQL changed the JOIN order from P-T-S to T-P-S because it avoided a table scan and a filesort (that query uses GROUP BY, ORDER BY, and LIMIT). Although this join plan produced 10,000 more rows, it read 175,000 fewer. For this test data it results in a join with 6, 2, and 1 matching rows: 12 produced (if the test data were larger and more diverse this would produce more rows) and 30 read. The result set of the JOIN is a little different because we read one row from T, then the matching rows from P (two on average), and one row from S for each of those P rows. It looks something like (read T, P, S):
      1, 1, 1     2, 2, 1     4, 4, 7     7, 3, 3     8, 7, 5
         8, 1       10, 5       11, 7        5, 3        9, 5
        12, 3                                6, 3
   
That's actually 29 reads, with only 8 duplicate reads (all on S).
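
To compare plans on the same data, the join order can be pinned with STRAIGHT_JOIN, which makes MySQL join the tables in the order they are listed in the FROM clause. A sketch forcing T-P-S (the SELECT list is again illustrative):

   EXPLAIN
   SELECT STRAIGHT_JOIN P.p_id, T.t_id, S.s_id
   FROM T
   JOIN P ON P.t_id = T.t_id
   JOIN S ON S.s_id = P.s_id;

Wrapping the same query in the FLUSH STATUS / SHOW SESSION STATUS pair from above should reproduce the lower read counts discussed here.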

What About Plan S-P-T?

An over-simplified reason why plan T-P-S is better than P-T-S is that the initial set of keys used to find all common rows, T.t_id, is smaller than P.p_id. In other words, T.t_id is a smaller common denominator. The smallest common denominator is S.s_id. In Case 3 MySQL could not use an S-P-T join plan because it requires an index on P.s_id to be practical, and in that case there is no such index. For this test data there is such an index, which results in a join with 4, 3, and 1 matching rows: 12 produced and 28 read. This join looks like (read S, P, T):
      1, 1, 1     3, 7, 3     5, 7, 8     7, 4, 4
         2, 2        5, 7        9, 8       11, 4
         8, 1        6, 7       10, 2
                    12, 1
   
That's actually 27 reads, with 6 duplicates.
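
The same trick works for S-P-T. The only prerequisite, as noted above, is an index on P.s_id (already present in the test-data sketch; the ALTER TABLE is shown commented out for the case where it is missing):

   -- ALTER TABLE P ADD INDEX (s_id);  -- prerequisite if not already present
   EXPLAIN
   SELECT STRAIGHT_JOIN S.s_id, P.p_id, T.t_id
   FROM S
   JOIN P ON P.s_id = S.s_id
   JOIN T ON T.t_id = P.t_id;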

Summary

   Plan     Reads   Duplicates
   P-T-S       36           15
   T-P-S       29            8
   S-P-T       27            6
I tested plan S-P-T on the production server from Case 3: it reads about 12,000 fewer rows than T-P-S and 211,000 fewer than P-T-S. However, it's 86% slower than T-P-S because it causes a filesort again, though still 33% faster than P-T-S (T-P-S itself is 62% faster than P-T-S).
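
Why the slowdown despite fewer reads? The Case 3 query's actual GROUP BY, ORDER BY, and LIMIT columns aren't reproduced here, but the effect shows up in EXPLAIN's Extra column. A rough sketch with a stand-in ORDER BY (P.p_id is only a placeholder for the real ordering columns):

   EXPLAIN
   SELECT STRAIGHT_JOIN S.s_id, P.p_id, T.t_id
   FROM S
   JOIN P ON P.s_id = S.s_id
   JOIN T ON T.t_id = P.t_id
   ORDER BY P.p_id   -- placeholder for the real ORDER BY columns
   LIMIT 10;
   -- "Using filesort" (possibly with "Using temporary") in the Extra column
   -- means this join order cannot deliver rows in the requested order, so
   -- the result must be sorted after the join.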