Some intermediate artifacts from SolrCloud testing. Originally recorded on our internal wiki, now shared here.
Environment:
SolrCloud servers: X.X.X.251, X.X.X.252, X.X.X.253, each with 16G memory and an 8-core 2.57GHz CPU
ZooKeeper servers: X.X.X.22, X.X.X.23, X.X.X.24
OS: Linux x86_64 GNU/Linux
Tools: JMeter 2.6, YourKit 11.0.5
1. SolrCloud configuration (refer to "SolrCloud start")
a). On X.X.X.253, edit $SOLRCLOUD_HOME/example/solr/conf/schema.xml and define the fields as:
<fields>
  <!-- Valid attributes for fields:
    name: mandatory - the name for the field
    type: mandatory - the name of a previously defined type from the <types> section
    indexed: true if this field should be indexed (searchable or sortable)
    stored: true if this field should be retrievable
    multiValued: true if this field may contain multiple values per document
    omitNorms: (expert) set to true to omit the norms associated with this
      field (this disables length normalization and index-time boosting
      for the field, and saves some memory). Only full-text fields or fields
      that need an index-time boost need norms. Norms are omitted for
      primitive (non-analyzed) types by default.
    termVectors: false set to true to store the term vector for a given field.
      When using MoreLikeThis, fields used for similarity should be stored
      for best performance.
    termPositions: Store position information with the term vector.
      This will increase storage costs.
    termOffsets: Store offset information with the term vector. This
      will increase storage costs.
    required: The field is required. It will throw an error if the
      value does not exist
    default: a value that should be used if no value is specified
      when adding a document.
  -->
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="ts" type="text_general" indexed="true" stored="true" />
  <field name="name" type="text_general" indexed="true" stored="true" />
  <field name="age" type="text_general" indexed="true" stored="true" />
  <field name="company" type="text_general" indexed="true" stored="true" />
  <field name="branch" type="text_general" indexed="true" stored="true" />
  <field name="mail" type="text_general" indexed="true" stored="true" />
  <field name="interest" type="text_general" indexed="true" stored="true" />
  <field name="address" type="text_general" indexed="true" stored="true" />
  <field name="text_general" type="text_general" indexed="true" stored="false" multiValued="true" />
</fields>
b). Change the Jetty thread limit:
Open $SOLR_HOME/example/etc/jetty.xml and change maxThreads to 10000:
<!-- =========================================================== -->
<!-- Server Thread Pool                                          -->
<!-- =========================================================== -->
<Set name="ThreadPool"><!-- Default queued blocking threadpool -->
  <New>
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
Note: this thread count affects the stress test, so a large value is necessary.
c). Configure the cores in $SOLR_HOME/example/solr/solr.xml on all 3 servers.
Change the default lines at the end from:
<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}">
  <core name="collection1" instanceDir="." />
</cores>
to:
<cores defaultCoreName="collection1" adminPath="/admin/cores" hostPort="${jetty.port:}">
  <core schema="schema.xml" shard="shard1"  instanceDir="core1/"  name="core1"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard2"  instanceDir="core2/"  name="core2"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard3"  instanceDir="core3/"  name="core3"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard4"  instanceDir="core4/"  name="core4"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard5"  instanceDir="core5/"  name="core5"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard6"  instanceDir="core6/"  name="core6"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard7"  instanceDir="core7/"  name="core7"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard8"  instanceDir="core8/"  name="core8"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard9"  instanceDir="core9/"  name="core9"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard10" instanceDir="core10/" name="core10" collection="collection1" conf="solrconfig.xml" />
</cores>
2. cd to $SOLR_HOME/example
3. Start SolrCloud on X.X.X.253 first with the command:
nohup java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 \
  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
  -Xmx10g -Xms10g -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf \
  -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 \
  -jar start.jar >> solr.log 2>&1 &
The JVM parameters
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
make Solr's JMX interface available, so we can use jvisualvm.exe to connect to X.X.X.253 and view the status of Solr.
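As a hedged illustration of what these flags enable, the sketch below connects to the JMX port we opened (3000) and reads the remote heap usage programmatically; the host and port come from the command above, while the client code itself is a generic JMX example rather than part of the original test scripts.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal JMX client: read the remote Solr JVM's heap usage through port 3000.
public class SolrJmxCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://X.X.X.253:3000/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("heap used: " + memory.getHeapMemoryUsage().getUsed());
        } finally {
            connector.close();
        }
    }
}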
4. Start SolrCloud on X.X.X.251 and X.X.X.252 with the command:
nohup java -agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss \
  -Xmx10g -Xms10g -Djetty.port=8900 \
  -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 \
  -jar start.jar >> solr.log 2>&1 &
The JVM parameter
-agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss
lets YourKit connect to X.X.X.251 and X.X.X.252 for profiling. How to use YourKit?
5. Access the Solr admin tool at http://X.X.X.252:8900/solr/#/~cloud?view=graph to see the following graph:
So finally we get 10 shards, each with two replicas (3 servers × 10 cores = 30 slices in total).
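Besides the cloud graph, the core layout can also be checked programmatically. The sketch below simply fetches the CoreAdmin STATUS endpoint (the adminPath we configured in solr.xml above) from one node and dumps the response; it is a convenience check of our assumptions, not part of the original test scripts.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Dump the CoreAdmin STATUS response so the ten cores (core1..core10)
// can be verified from the command line instead of the admin UI.
public class CoreStatusCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://X.X.X.253:8900/solr/admin/cores?action=STATUS&wt=json");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}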
1. The index request used to load index data into the SolrCloud cluster looks like:
POST http://X.X.X.252:8900/solr/collection1/update
POST data:
<add><doc>
  <field name="id">4al6q90c-8ouj-1255-Sind-201206282rmo</field>
  <field name="ts">1340859312956</field>
  <field name="name">byan</field>
  <field name="age">21</field>
  <field name="company">Ciscobyan Systemsbyan, Incbyan</field>
  <field name="branch">Cloudbyan Applicationbyan Servicesbyan</field>
  <field name="mail">byan@cisco.com</field>
  <field name="interest">Have intensive interest in Internet-surfingbyan,singingbyan, writingbyan and readingbyan</field>
  <field name="address">abyan, Gatebyan Buidingbyan Streetbyan Provincebyan Contrybyan</field>
</doc></add>
no cookies
Request Headers:
Content-Length: 598
Connection: keep-alive
Content-Type: application/xml

document for indexing:
<add><doc>
  <field name="id">c7${id_counter}</field>
  <field name="ts">1342463467567</field>
  <field name="name">person</field>
  <field name="age">10</field>
  <field name="company">Cisco Systems, Inc.</field>
  <field name="branch">Cloud Application Services</field>
  <field name="mail">person@cisco.com</field>
  <field name="interest">Have intensive interest in Internet-surfing,singing, writing and reading.</field>
  <field name="address">address,The Golden Gate Bridge,Wall Street.</field>
</doc></add>
Each doc is about 300 bytes in size.
The details of this script can be found here.
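The JMeter script posts raw XML as shown above; the same document can also be indexed with SolrJ. The sketch below is a minimal, hedged equivalent using the 4.x-era HttpSolrServer API, with a literal counter standing in for JMeter's ${id_counter} variable; it is an illustration, not the actual load script.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Index one document equivalent to the JMeter POST body above via SolrJ.
public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://X.X.X.252:8900/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "c7" + 1);                   // stands in for ${id_counter}
        doc.addField("ts", System.currentTimeMillis());
        doc.addField("name", "person");
        doc.addField("age", 10);
        doc.addField("company", "Cisco Systems, Inc.");
        doc.addField("branch", "Cloud Application Services");
        doc.addField("mail", "person@cisco.com");
        doc.addField("interest", "Have intensive interest in Internet-surfing, singing, writing and reading.");
        doc.addField("address", "address, The Golden Gate Bridge, Wall Street.");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}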
2. We plan to index about 50G of data into each shard of SolrCloud for this test.
We ran 10 JMeter scripts for indexing, and about 2M of data was indexed per second. After we had indexed about 40G of data across all shards, the old generation could no longer be collected even though it was full. Some other data is posted below:
Solr Admin of X.X.X.252:
GC status:
Heap dump by YourKit:
1. class list and objects
2. object
3. the biggest objects:
4. The corresponding source code:
the class TokenizerChain
IndexSchema has two members:
private Analyzer analyzer;
private Analyzer queryAnalyzer;
a). Solr's IndexSchema constructor loads the FieldTypePluginLoader and then sets a SolrIndexAnalyzer as the IndexSchema's analyzer via refreshAnalyzers():
public void refreshAnalyzers() {
    analyzer = new SolrIndexAnalyzer();
    queryAnalyzer = new SolrQueryAnalyzer();
}
private class SolrIndexAnalyzer extends AnalyzerWrapper {
    protected final HashMap<String, Analyzer> analyzers;

    SolrIndexAnalyzer() {
        analyzers = analyzerCache();
    }

    protected HashMap<String, Analyzer> analyzerCache() {
        HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
        for (SchemaField f : getFields().values()) {
            Analyzer analyzer = f.getType().getAnalyzer();
            cache.put(f.getName(), analyzer);
        }
        return cache;
    }
    ……
}
But in AnalyzerWrapper:
public abstract class AnalyzerWrapper extends Analyzer {
    /**
     * Creates a new AnalyzerWrapper. Since the {@link Analyzer.ReuseStrategy} of
     * the wrapped Analyzers are unknown, {@link Analyzer.PerFieldReuseStrategy} is assumed
     */
    protected AnalyzerWrapper() {
        super(new PerFieldReuseStrategy());
    }
    ……
}
Note: PerFieldReuseStrategy caches analyzer components by field name rather than by field type. This policy causes redundant resources to be used.
Suggestion: change it to cache analyzer components by field type, as sketched below.
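The following is a self-contained sketch of the difference (illustrative only, not a patch against Solr): keying the cache by field name gives one entry per field, while keying it by field type collapses the nine text_general fields in our schema into a single shared entry.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Compare cache sizes when analyzers are cached per field name vs. per field type,
// using the field layout of the schema.xml above (1 string field + 9 text_general fields).
public class AnalyzerCacheSketch {
    public static void main(String[] args) {
        Map<String, String> fieldToType = new LinkedHashMap<String, String>();
        fieldToType.put("id", "string");
        for (String f : new String[] {"ts", "name", "age", "company", "branch",
                                      "mail", "interest", "address", "text_general"}) {
            fieldToType.put(f, "text_general");
        }

        Map<String, Object> byFieldName = new HashMap<String, Object>();
        Map<String, Object> byFieldType = new HashMap<String, Object>();
        for (Map.Entry<String, String> e : fieldToType.entrySet()) {
            byFieldName.put(e.getKey(), new Object());   // one analyzer entry per field
            byFieldType.put(e.getValue(), new Object()); // one analyzer entry per type
        }
        System.out.println("entries keyed by field name: " + byFieldName.size()); // 10
        System.out.println("entries keyed by field type: " + byFieldType.size()); // 2
    }
}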
We see that the class TokenStreamComponents holds most of the memory; there are more than 1000 TokenStreamComponents objects. By reviewing the code we found that the Lucene Analyzer creates these objects and caches them for reuse.
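A rough back-of-envelope sketch of why the count grows so large (the thread count here is an assumption, not a measured value): PerFieldReuseStrategy keeps one TokenStreamComponents per field per thread, and each of the ten cores in a JVM has its own SolrIndexAnalyzer.

// Rough estimate of cached TokenStreamComponents under PerFieldReuseStrategy.
public class TokenStreamComponentsEstimate {
    public static void main(String[] args) {
        int fieldsPerSchema = 10; // fields defined in schema.xml above
        int coresPerJvm     = 10; // core1..core10 from solr.xml above
        int indexingThreads = 50; // assumed concurrent Jetty worker threads, not measured

        // one TokenStreamComponents per (core, field, thread)
        long estimated = (long) fieldsPerSchema * coresPerJvm * indexingThreads;
        System.out.println("estimated TokenStreamComponents: " + estimated); // 5000 with these assumptions
    }
}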