这里是一些通用的优化项
If you have control over the format of the data to be loaded, then you can use the following hints to improve load performance:
- Make logical record processing efficient.
- Use one-to-one mapping of physical records to logical records (avoid using CONTINUEIF and CONCATENATE).
- Make it easy for the software to identify physical record boundaries. Use the file processing option string "FIX nnn" or "VAR". If you use the default (stream mode), then on most platforms (for example, UNIX and NT) the loader must scan each physical record for the record terminator (newline character).
- Make field setting efficient. Field setting is the process of mapping fields in the data file to their corresponding columns in the table being loaded. The mapping function is controlled by the description of the fields in the control file. Field setting (along with data conversion) is the biggest consumer of CPU cycles for most loads.
- Avoid delimited fields; use positional fields. If you use delimited fields, then the loader must scan the input data to find the delimiters. If you use positional fields, then field setting becomes simple pointer arithmetic (very fast).
- Do not trim whitespace if you do not need to (use PRESERVE BLANKS).
- Make conversions efficient. SQL*Loader performs character set conversion and data type conversion for you. Of course, the quickest conversion is no conversion.
- Use single-byte character sets if you can.
- Avoid character set conversions if you can. SQL*Loader supports four character sets:
Client character set (NLS_LANG of the client sqlldr process)
Data file character set (usually the same as the client character set)
Database character set
Database national character set
Performance is optimized if all character sets are the same. For direct path loads, it is best if the data file character set and the database character set are the same. If the character sets are the same, then character set conversion buffers are not allocated.
- Use direct path loads.
- Use the SORTED INDEXES clause.
- Avoid unnecessary NULLIF and DEFAULTIF clauses. Each clause must be evaluated on each column that has a clause associated with it for every row loaded.
- Use parallel direct path loads and parallel index creation when you can.
- Be aware of the effect on performance when you have large values for both the CONCATENATE clause and the COLUMNARRAYROWS clause. See "Using CONCATENATE to Assemble Logical Records".
Optimizing Performance of Direct Path Loads
You can control the time and temporary storage used during direct path loads.
To minimize time:
- Preallocate storage space
- Presort the data
- Perform infrequent data saves
- Minimize use of the redo log
- Specify the number of column array rows and the size of the stream buffer
- Specify a date cache value
- Set DB_UNRECOVERABLE_SCN_TRACKING=FALSE. Unrecoverable (nologging) direct writes are tracked in the control file by periodically storing the SCN and Time of the last direct write. If these updates to the control file are adversely affecting performance, then setting the DB_UNRECOVERABLE_SCN_TRACKING parameter to FALSE may improve performance.
To minimize space:
- When sorting data before the load, sort data on the index that requires the most temporary storage space
- Avoid index maintenance during the load
- Preallocating Storage for Faster Loading
即创建导入表时使用INITIAL预分配更多的空间。也可以使用更大的extent size
要导入数据按表的索引先进行排序,这样会使用更少的临时空间
此选项只用于direct path, 表示数据已是哪个索引排序了(这里指定的索引必须在导入表中已创建),但如果实际数据并未排序则在导入后索引后仍为unusable状态,需要重建
如果是多列索引数据排序的顺序一定要按列顺序排序,如索引为(col1,col2),则排序应order by col1,col2
如果表有多个索引,你要选择数据与数据量大的索引保持一致以达到提高性能
Rows设置较小值影响性能及造成空间浪费,rows指定的行数要最少15分钟才完成。设置前你先在计算平均每分钟导入多少行,For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 15 minutes of work after an interruption, then set ROWS to be 300,000 (20,000 rows/minute * 15 minutes).
- Minimizing Use of the Redo Log
三种方式:
- 关闭归档
- 使用UNRECOVERABLE clause
An unrecoverable load does not record loaded data in the redo log file; instead, it generates invalidation redo. you may want to make a backup of the data after loading
UNRECOVERABLE
LOAD DATA
INFILE 'sample.dat'
INTO TABLE emp
(ename VARCHAR2(10), empno NUMBER(4));
- 使用nologging
Use the COLUMNARRAYROWS parameter to specify a value for the number of column array rows.
Use the STREAMSIZE parameter to specify the size for direct path stream buffers.
The optimal values for these parameters vary, depending on the system, input data types, and Oracle column data types used. When you are using optimal values for your particular configuration, the elapsed time in the SQL*Loader log file should go down.
You should monitor process paging activity, because if paging becomes excessive, then performance can be significantly degraded. You may need to lower the values for READSIZE, STREAMSIZE, and COLUMNARRAYROWS to avoid excessive paging.
It can be particularly useful to specify the number of column array rows and size of the stream buffer when you perform direct path loads on multiple-CPU systems.
- Specifying a Value for DATE_CACHE
The date cache reduces the number of date conversions done when many duplicate values are present in the input data. It enables you to specify the number of unique dates anticipated during the load.
The default date cache size is 1000 elements. If the default is used and the number of unique input values loaded exceeds 1000, then the date cache is automatically disabled for that table. This prevents excessive and unnecessary lookup times that could affect performance.
The date cache can be associated with only one table. No date cache sharing can take place across tables.
Date cache statistics are written to the log file. You can use those statistics to improve direct path load performance as follows:
- If the number of cache entries is less than the cache size and there are no cache misses, then the cache size could safely be set to a smaller value.
- If the number of cache hits (entries for which there are duplicate values) is small and the number of cache misses is large, then the cache size should be increased. Be aware that if the cache size is increased too much, then it may cause other problems, such as excessive paging or too much memory usage.
- If most of the input date values are unique, then the date cache will not enhance performance and therefore should not be used.
Date cache statistics are not written to the SQL*Loader log file if the cache was active by default and disabled because the maximum was exceeded.
If increasing the cache size does not improve performance, then revert to the default behavior or set the cache size to 0. The overall performance improvement also depends on the data types of the other columns being loaded. Improvement will be greater for cases in which the total number of date columns loaded is large compared to other types of data loaded.
- Optimizing Direct Path Loads on Multiple-CPU Systems
If you are performing direct path loads on a multiple-CPU system, then SQL*Loader uses multithreading by default. A multiple-CPU system in this case is defined as a single system that has two or more CPUs.
Multithreaded loading means that, when possible, conversion of the column arrays to stream buffers and stream buffer loading are performed in parallel. This optimization works best when:
- Column arrays are large enough to generate multiple direct path stream buffers for loads
- Data conversions are required from input field data types to Oracle column data types
- The conversions are performed in parallel with stream buffer loading.
The status of this process is recorded in the SQL*Loader log file, as shown in the following sample portion of a log:
Total stream buffers loaded by SQL*Loader main thread: 47
Total stream buffers loaded by SQL*Loader load thread: 180
Column array rows: 1000
Stream buffer bytes: 256000
In this example, the SQL*Loader load thread has offloaded the SQL*Loader main thread, allowing the main thread to build the next stream buffer while the load thread loads the current stream on the server.
The goal is to have the load thread perform as many stream buffer loads as possible. This can be accomplished by increasing the number of column array rows, decreasing the stream buffer size, or both. You can monitor the elapsed time in the SQL*Loader log file to determine whether your changes are having the desired effect.
On single-CPU systems, optimization is turned off by default. When the server is on another system, performance may improve if you manually turn on multithreading.
- Avoiding Index Maintenance
对于conventional path与direct path都会对索引进行维护,可以使用以下方式避免维护:
- Drop the indexes before beginning of the load.
- Mark selected indexes or index partitions as Index Unusable before beginning the load and use the SKIP_UNUSABLE_INDEXES parameter.
- Use the SKIP_INDEX_MAINTENANCE parameter (direct path only, use with caution).
这样做的好处:
- You can build indexes one at a time, reducing the amount of sort (temporary) segment space that would otherwise be needed for each index.
- Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.
对于导入大数少量数据时要使用conventional path
Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table. But if relatively few rows are added to a large table, then the time required to resort the indexes may be excessive. In such cases, it is usually better to use the conventional path load method, or to use the SINGLEROW parameter of SQL*Loader.
本文详细介绍了如何通过优化数据格式、字段设置、转换效率、使用直接路径加载、预分配存储、排序数据、减少重做日志使用等方法提升SQL*Loader的数据加载性能。强调了字符集匹配、避免不必要的NULLIF和DEFAULTIF子句、使用单字节字符集以及利用多线程加载在多CPU系统上的优势。同时,建议在适当情况下关闭索引维护以加速加载过程。
15万+

被折叠的 条评论
为什么被折叠?



