
This article critically analyzes a popular list of MySQL query-optimization tips and proposes a more practical set of strategies: properly understanding SQL and database fundamentals, designing a sensible schema, choosing the right storage engine, and using benchmarking and profiling to find and fix bottlenecks.

10 Tips for Optimizing MySQL Queries (That don't suck)

by Jesse Farmer on Monday, April 7, 2008

Justin Silverton at Jaslabs has a supposed list of 10 tips for optimizing MySQL queries. I couldn't read this and let it stand because this list is really, really bad. Some guy named Mike noted this, too. So in this entry I'll do two things: first, I'll explain why his list is bad; second, I'll present my own list which, hopefully, is much better. Onward, intrepid readers!

Why That List Sucks

  1. He's swinging for the top of the trees

    The rule in any situation where you want to optimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. You need to understand joins vs. subselects, column indices, how to normalize data, etc. The next 35% is understanding the performance characteristics of your database of choice. COUNT(*) in MySQL, for example, can either be almost-free or painfully slow depending on which storage engine you're using. Other things to consider: under what conditions does your database invalidate caches, when does it sort on disk rather than in memory, when does it need to create temporary tables, etc. The final 5%, where few ever need to venture, is where Mr. Silverton spends most of his time. Never once in my life have I used SQL_SMALL_RESULT.

  2. Good problems, bad solutions

    There are cases when Mr. Silverton does note a good problem. MySQL will indeed use a dynamic row format if it contains variable length fields like TEXT or BLOB, which, in this case, means sorting needs to be done on disk. The solution is not to eschew these datatypes, but rather to split off such fields into an associated table. The following schema represents this idea:

    CREATE TABLE posts (
    	id int unsigned not null auto_increment,
    	author_id int unsigned not null,
    	created timestamp not null,
    	PRIMARY KEY(id)
    );
    
    CREATE TABLE posts_data (
    	post_id int unsigned not null,
    	body text,
    	PRIMARY KEY(post_id)
    );
  3. That's just...yeah

    Some of his suggestions are just mind-boggling, e.g., "remove unnecessary parentheses." It really doesn't matter whether you do SELECT * FROM posts WHERE (author_id = 5 AND published = 1) or SELECT * FROM posts WHERE author_id = 5 AND published = 1. None. Any decent DBMS is going to optimize these away. This level of detail is akin to wondering when writing a C program whether the post-increment or pre-increment operator is faster. Really, if that's where you're spending your energy, it's a surprise you've written any code at all.

My list

Let's see if I fare any better. I'm going to start from the most general.

  1. Benchmark, benchmark, benchmark!

    You're going to need numbers if you want to make a good decision. What queries are the worst? Where are the bottlenecks? Under what circumstances are you generating bad queries? Benchmarking will let you simulate high-stress situations and, with the aid of profiling tools, expose the cracks in your database configuration. Tools of the trade include super-smack, ab, and SysBench. These tools either hit your database directly (e.g., super-smack) or simulate web traffic (e.g., ab).

  2. Profile, profile, profile!

    So, you're able to generate high-stress situations, but now you need to find the cracks. This is what profiling is for. Profiling enables you to find the bottlenecks in your configuration, whether they be in memory, CPU, network, disk I/O, or, what is more likely, some combination of all of them.

    The very first thing you should do is turn on the MySQL slow query log and install mtop. This will give you access to information about the absolute worst offenders. Have a ten-second query ruining your web application? These guys will show you the query right off.
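
    As a sketch of how to turn the slow query log on, here are the relevant server variables. These names assume MySQL 5.1 or later, where the log can be toggled at runtime; on older servers you set log-slow-queries in my.cnf and restart instead, and the log file path below is just an example.

```sql
-- Assumes MySQL 5.1+, where these are dynamic server variables.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log anything slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- example path
```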

    After you've identified the slow queries you should learn about the MySQL internal tools, like EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. These will tell you what resources are being spent where, and what side effects your queries are having, e.g., whether your heinous triple-join subselect query is sorting in memory or on disk. Of course, you should also be using your usual array of command-line profiling tools like top, procinfo, vmstat, etc. to get more general system performance information.
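
    For example, prefixing a suspect query with EXPLAIN shows you the execution plan MySQL has chosen (the query below is just an illustration against a hypothetical posts table):

```sql
-- Hypothetical query; the interesting part is the plan, not the data.
EXPLAIN SELECT id, created
FROM posts
WHERE author_id = 5
ORDER BY created DESC;
-- Watch the "key" column (which index, if any, is used) and the
-- "Extra" column for red flags like "Using filesort" or "Using temporary".
```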

  3. Tighten Up Your Schema

    Before you even start writing queries you have to design a schema. Remember that the memory requirements for a table are going to be around #entries * size of a row. Unless you expect every person on the planet to register 2.8 trillion times on your website you do not in fact need to make your user_id column a BIGINT. Likewise, if a text field will always be a fixed length (e.g., a US zipcode, which always has a canonical representation of the form "XXXXX-XXXX") then a VARCHAR declaration just adds a superfluous length byte to every row; a CHAR column stores it without the overhead.
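
    To make the sizing point concrete, here is a sketch of a users table declared with those choices in mind (the table and column names are hypothetical):

```sql
-- Hypothetical illustration of column sizing.
CREATE TABLE users (
	id int unsigned not null auto_increment,  -- 4 bytes, max ~4.2 billion: plenty
	-- id bigint unsigned not null,           -- 8 bytes: overkill for user ids
	zipcode char(10) not null,                -- fixed length: no length byte
	-- zipcode varchar(10) not null,          -- adds a length byte to every row
	PRIMARY KEY(id)
);
```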

    Some people pooh-pooh database normalization, saying it produces unnecessarily complex schemas. However, proper normalization results in a minimization of redundant data. Fundamentally that means a smaller overall footprint at the cost of performance — the usual performance/memory tradeoff found everywhere in computer science. The best approach, IMO, is to normalize first and denormalize where performance demands it. Your schema will be more logical and you won't be optimizing prematurely.

  4. Partition Your Tables

    Often you have a table in which only a few columns are accessed frequently. On a blog, for example, one might display entry titles in many places (e.g., a list of recent posts) but only ever display teasers or the full post bodies once on a given page. Vertical partitioning helps here:

    CREATE TABLE posts (
    	id int unsigned not null auto_increment,
    	author_id int unsigned not null,
    	title varchar(128),
    	created timestamp not null,
    	PRIMARY KEY(id)
    );
    
    CREATE TABLE posts_data (
    	post_id int unsigned not null,
    	teaser text,
    	body text,
    	PRIMARY KEY(post_id)
    );

    The above represents a situation where one is optimizing for reading. Frequently accessed data is kept in one table while infrequently accessed data is kept in another. Since the data is now partitioned the infrequently accessed data takes up less memory. You can also optimize for writing: frequently changed data can be kept in one table, while infrequently changed data can be kept in another. This allows more efficient caching since MySQL no longer needs to expire the cache for data which probably hasn't changed.

  5. Don't Overuse Artificial Primary Keys

    Artificial primary keys are nice because they can make the schema less volatile. If we stored geography information in the US based on zip code, say, and the zip code system suddenly changed we'd be in a bit of trouble. On the other hand, many times there are perfectly fine natural keys. One example would be a join table for many-to-many relationships. What not to do:

    CREATE TABLE posts_tags (
    	relation_id int unsigned not null auto_increment,
    	post_id int unsigned not null,
    	tag_id int unsigned not null,
    	PRIMARY KEY(relation_id),
    	UNIQUE INDEX(post_id, tag_id)
    );

    Not only is the artificial key entirely redundant given the column constraints, but the number of post-tag relations is now limited by the maximum value of an integer. Instead one should do:

    CREATE TABLE posts_tags (
    	post_id int unsigned not null,
    	tag_id int unsigned not null,
    	PRIMARY KEY(post_id, tag_id)
    );
  6. Learn Your Indices

    Often your choice of indices will make or break your database. For those who haven't progressed this far in their database studies, an index is a sort of hash. If we issue the query SELECT * FROM users WHERE last_name = 'Goldstein' and last_name has no index then your DBMS must scan every row of the table and compare it to the string 'Goldstein.' An index is usually a B-tree (though there are other options) which speeds up this comparison considerably.

    You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Obviously each index requires space proportional to the number of rows in your table, so too many indices wind up taking more memory. You also incur a performance hit on write operations, since every write now requires that the corresponding index be updated. There is a balance point which you can uncover by profiling your code. This varies from system to system and implementation to implementation.
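
    As a sketch, the last_name lookup above goes from a full table scan to an index lookup with one statement (this assumes the users table from the example exists, and the index names are hypothetical):

```sql
-- Add a B-tree index on the column we select by.
CREATE INDEX idx_users_last_name ON users (last_name);

-- Composite indices cover multi-column conditions. Column order matters:
-- this index also serves queries on last_name alone, but not queries on
-- first_name alone.
CREATE INDEX idx_users_name ON users (last_name, first_name);
```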

  7. SQL is Not C

    C is the canonical procedural programming language and the greatest pitfall for a programmer looking to show off his database-fu is that he fails to realize that SQL is not procedural (nor is it functional or object-oriented, for that matter). Rather than thinking in terms of data and operations on data one must think of sets of data and relationships among those sets. This usually crops up with the improper use of a subquery:

    SELECT a.id, 
    	(SELECT MAX(created) 
    	FROM posts 
    	WHERE author_id = a.id) 
    AS latest_post
    FROM authors a

    Since this subquery is correlated, i.e., references a table in the outer query, one should convert the subquery to a join.

    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a
    INNER JOIN posts p
    	ON (a.id = p.author_id)
    GROUP BY a.id
  8. Understand your engines

    MySQL has two primary storage engines: MyISAM and InnoDB. Each has its own performance characteristics and considerations. In the broadest sense MyISAM is good for read-heavy data and InnoDB is good for write-heavy data, though there are cases where the opposite is true. The biggest gotcha is how the two differ with respect to the COUNT function.

    MyISAM keeps an internal cache of table meta-data like the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let's say we're trying to paginate a query. If you have a query SELECT * FROM users LIMIT 5,10, let's say, running SELECT COUNT(*) FROM users to get the total is essentially free with MyISAM but takes the same amount of time as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS option which tells MySQL to calculate the total number of rows as it runs the query; the count can then be retrieved by executing SELECT FOUND_ROWS(). This is very MySQL-specific, but can be necessary in certain situations, particularly if you use InnoDB for its other features (e.g., row-level locking, transactions, etc.).
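
    A sketch of the pagination pattern described above (the LIMIT offsets are illustrative, and both statements must run on the same connection):

```sql
-- Ask MySQL to compute the full, un-LIMITed row count as it runs the query.
SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 5, 10;

-- Retrieve that count immediately afterward, on the same connection.
SELECT FOUND_ROWS();
```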

  9. MySQL specific shortcuts

    MySQL provides many extensions to SQL which help performance in many common use scenarios. Among these are INSERT ... SELECT, INSERT ... ON DUPLICATE KEY UPDATE, and REPLACE.

    I rarely hesitate to use the above since they are so convenient and provide real performance benefits in many situations. MySQL has other keywords which are more dangerous, however, and should be used sparingly. These include INSERT DELAYED, which tells MySQL that it is not important to insert the data immediately (say, e.g., in a logging situation). The problem with this is that under high load the insert might be delayed indefinitely, causing the insert queue to balloon. You can also give MySQL index hints about which indices to use. MySQL gets it right most of the time, and when it doesn't it is usually because of a bad schema or a poorly written query.
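
    For instance, INSERT ... ON DUPLICATE KEY UPDATE collapses the usual SELECT-then-INSERT-or-UPDATE round trip into a single statement (the page-view counter table here is hypothetical):

```sql
-- Hypothetical page-view counter: insert the first hit,
-- or bump the count if the row already exists.
CREATE TABLE page_views (
	url varchar(255) not null,
	views int unsigned not null default 1,
	PRIMARY KEY(url)
);

INSERT INTO page_views (url, views) VALUES ('/about', 1)
	ON DUPLICATE KEY UPDATE views = views + 1;
```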

  10. And one for the road...
    Last, but not least, read Peter Zaitsev's MySQL Performance Blog if you're into the nitty-gritty of MySQL performance. He covers many of the finer aspects of database administration and performance.

Reposted from: https://my.oschina.net/jsyang888/blog/110955
