SolrCloud performance test

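This article presents a performance test of a SolrCloud cluster, covering environment configuration, index data preparation, test script design, and analysis of the results. The results show that running searches while indexing significantly degrades indexing performance.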

 

 

  • Environment

 

                SolrCloud servers: X.X.X.71, X.X.X.72, X.X.X.73. 72 and 73 each have 16 GB of memory and a 16-core 2.4 GHz CPU; 71 has 8 GB of memory and a 4-core 2.27 GHz CPU.

 

                ZooKeeper servers: X.X.X.22, X.X.X.23, X.X.X.24

 

                OS: Linux  x86_64 GNU/Linux

 

                Tools: JMeter 2.6, YourKit 11.0.5

  • Configure & start the services

      ZooKeeper runs with its default parameters and configuration (see "zookeeper start").

 

             For the SolrCloud configuration, refer to "solrcloud start".

 

           1. On X.X.X.71, edit $SOLRCLOUD_HOME/example/solr/conf/schema.xml so that it defines the following fields:

 

       

<fields>
   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the
       <types> section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
       Norms are omitted for primitive (non-analyzed) types by default.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector.
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This
       will increase storage costs.
     required: The field is required.  It will throw an error if the
       value does not exist
     default: a value that should be used if no value is specified
       when adding a document.
   -->

        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="ts" type="text_general" indexed="true" stored="true"/>
        <field name="name" type="text_general" indexed="true" stored="true"/>
        <field name="age" type="text_general" indexed="true" stored="true"/>
        <field name="company" type="text_general" indexed="true" stored="true"/>
        <field name="branch" type="text_general" indexed="true" stored="true"/>
        <field name="mail" type="text_general" indexed="true" stored="true"/>
        <field name="interest" type="text_general" indexed="true" stored="true"/>
        <field name="address" type="text_general" indexed="true" stored="true"/>
        <field name="text_general" type="text_general" indexed="true" stored="false" multiValued="true" />

 </fields>

           

 

             Change the Jetty thread limit: open $SOLR_HOME/example/etc/jetty.xml and set maxThreads to 10000:

<!-- =========================================================== -->
    <!-- Server Thread Pool                                          -->
    <!-- =========================================================== -->
    <Set name="ThreadPool">
      <!-- Default queued blocking threadpool -->
      <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">10000</Set>
        <Set name="detailedDump">false</Set>
      </New>
    </Set>

         

 

{note}

The thread limit caps how many concurrent requests Jetty can serve, so it affects the stress test; a large value is necessary.


{note}

          2. Change to the $SOLR_HOME/example directory.

 

           3. Start SolrCloud on X.X.X.71 with the following command:

java -Xmx6g -Xms6g -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 -DnumShards=1 -jar start.jar

           4. Start SolrCloud on 72 and 73 with the following command:

java -Xmx6g -Xms6g -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 -jar start.jar

          5. Open the Solr admin tool at http://X.X.X.71:8900/solr/ ; it shows the cluster graph.

 

 


 
           The result is a cluster with one shard and two replicas.

  • Prepare the JMeter script & index data

 

            1. The index request used to load the index data into the SolrCloud cluster looks like this:

 

POST http://X.X.X.72:8900/solr/collection1/update

POST data:
<add><doc>
<field name="id">4al6q90c-8ouj-1255-Sind-201206282rmo</field>
<field name="ts">1340859312956</field>
<field name="name">byan</field>
<field name="age">21</field>
<field name="company">Ciscobyan Systemsbyan, Incbyan</field>
<field name="branch">Cloudbyan Applicationbyan Servicesbyan</field>
<field name="mail">byan@cisco.com</field>
<field name="interest">Have intensive interest in Internet-surfingbyan,singingbyan, writingbyan and readingbyan </field>
<field name="address">abyan, Gatebyan Buidingbyan Streetbyan Provincebyan Contrybyan</field>
</doc></add>

[no cookies]

Request Headers:
Content-Length: 598
Connection: keep-alive
Content-Type: application/xml


 

      Document template used for indexing:

<add><doc>
  <field name="id">c7${id_counter}</field>
  <field name="ts">1342463467567</field>
  <field name="name">person</field>
  <field name="age">10</field>
  <field name="company">Cisco Systems, Inc.</field>
  <field name="branch">Cloud Application Services</field>
  <field name="mail">person@cisco.com</field>
  <field name="interest">Have intensive interest in Internet-surfing,singing, writing and reading.</field>
  <field name="address">address,The Golden Gate Bridge,Wall Street.</field>
</doc></add>

     Each document is about 300 bytes in size.
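As an illustration of how the template above is expanded, here is a minimal Python sketch that builds the same `<add><doc>` payload for a given counter value. It is not the JMeter script itself; the function name `build_add_doc` and the `FIELDS` list are hypothetical, and the `c7` prefix simply mirrors the `c7${id_counter}` placeholder in the template.

```python
from xml.sax.saxutils import escape

# Static field values copied from the document template in the post.
FIELDS = [
    ("ts", "1342463467567"),
    ("name", "person"),
    ("age", "10"),
    ("company", "Cisco Systems, Inc."),
    ("branch", "Cloud Application Services"),
    ("mail", "person@cisco.com"),
    ("interest", "Have intensive interest in Internet-surfing,singing, writing and reading."),
    ("address", "address,The Golden Gate Bridge,Wall Street."),
]


def build_add_doc(id_counter: int) -> str:
    """Return an <add><doc> update payload with a unique id (hypothetical helper)."""
    # The id mirrors the template's c7${id_counter} placeholder.
    fields = [("id", f"c7{id_counter}")] + FIELDS
    body = "".join(
        f'<field name="{name}">{escape(value)}</field>' for name, value in fields
    )
    return f"<add><doc>{body}</doc></add>"


if __name__ == "__main__":
    payload = build_add_doc(1)
    print(payload)
    # Size of the full XML payload, markup included.
    print(len(payload.encode("utf-8")), "bytes")
```

Escaping the values keeps the payload well-formed if a field ever contains `&` or `<`, which the static template does not guard against.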

 


             The details of this script can be found here: solr_prepare_data_cluster.jmx

 

         2. Once the index holds 15 GB of data (more than 20 million docs), we are ready to test the performance of SolrCloud.

 

         
 

  •  Test & Result

 

{note}

All tests were run with NRT (near-real-time search) enabled.


{note}

    One or more clients send requests to X.X.X.71-73, picking the target server at random. All test cases use 1 shard and two replicas.
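The random distribution of requests can be pictured with a short Python sketch. This is only an assumption about how a client might pick its target, not an extract from the JMeter scripts; `pick_update_url` and `build_update_request` are hypothetical names, and the hosts are the masked addresses from the post, so the request is built but never sent.

```python
import random
import urllib.request

# Masked addresses from the post; replace with real hosts before sending anything.
SOLR_HOSTS = ["X.X.X.71", "X.X.X.72", "X.X.X.73"]
PORT = 8900


def pick_update_url(rng: random.Random) -> str:
    """Choose one of the three Solr nodes uniformly at random."""
    host = rng.choice(SOLR_HOSTS)
    return f"http://{host}:{PORT}/solr/collection1/update"


def build_update_request(payload: str, rng: random.Random) -> urllib.request.Request:
    """Build (but do not send) an XML update POST aimed at a random node."""
    return urllib.request.Request(
        pick_update_url(rng),
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


if __name__ == "__main__":
    rng = random.Random()
    req = build_update_request("<add><doc></doc></add>", rng)
    print(req.get_method(), req.full_url)
```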
      1. Index performance test (without search). Index JMeter script: solr_index_cluster.jmx

 

        Index result:

request threads    avg response time (ms)    throughput (requests/s)
50                 10                        4606
150                32                        4612
300                65                        4445
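A quick sanity check on these numbers: with N threads each waiting for a response before sending the next request, throughput cannot exceed N divided by the average response time. The sketch below applies that bound to the index-only results, assuming the response times are in milliseconds and the throughput is in requests per second (the JMeter defaults; the post does not state the units). The function name `expected_throughput` is hypothetical.

```python
def expected_throughput(threads: int, avg_response_ms: float) -> float:
    """Upper bound on requests/s for a closed loop of `threads` clients."""
    return threads * 1000.0 / avg_response_ms


# (request threads, avg response time, measured throughput) from the index-only test.
INDEX_ONLY = [(50, 10, 4606), (150, 32, 4612), (300, 65, 4445)]

if __name__ == "__main__":
    for threads, avg_ms, measured in INDEX_ONLY:
        bound = expected_throughput(threads, avg_ms)
        print(
            f"{threads:>4} threads: bound {bound:7.0f}/s, "
            f"measured {measured}/s ({measured / bound:.0%} of bound)"
        )
```

The measured throughput sits just under the bound in every row, which suggests the cluster, not the load generator, is the limiting factor at around 4,500 requests per second.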

      2. Search performance test (without index). Search JMeter script: solr_search_cluster.jmx

request threads    avg response time (ms)    throughput (requests/s)
50                 25                        1870
150                74                        1923
300                156                       1709

      3. Index performance test (with search running at the same time)

 

     Index result:

request threads    avg response time (ms)    throughput (requests/s)
50                 26                        1854
150                74                        1920
300                151                       1856

 Search result during the same run:

request threads    avg response time (ms)    throughput (requests/s)
50                 28                        1481
150                76                        1605
300                139                       1470


  •  Summary

 

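      Indexing performance without concurrent searching is acceptable: throughput stays between 4445 and 4612 across 50 to 300 threads. Indexing performance while searches are running is poor: throughput falls to between 1854 and 1920, a drop of roughly 58 to 60 percent. Searching appears to have a strong effect on indexing performance. We are still investigating the cause.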
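The size of the drop can be reproduced directly from the two index result tables. The sketch below is plain arithmetic over the reported throughput figures; the dictionary and function names are hypothetical.

```python
# Index throughput per thread count, copied from the result tables.
INDEX_ONLY = {50: 4606, 150: 4612, 300: 4445}
INDEX_WITH_SEARCH = {50: 1854, 150: 1920, 300: 1856}


def throughput_drop(before: float, after: float) -> float:
    """Fraction of throughput lost when going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    for threads in sorted(INDEX_ONLY):
        drop = throughput_drop(INDEX_ONLY[threads], INDEX_WITH_SEARCH[threads])
        print(f"{threads:>4} threads: index throughput down {drop:.1%}")
```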

 
