Some intermediate artifacts from SolrCloud testing. Originally recorded on our internal wiki, now shared here.
Environment:
SolrCloud servers: X.X.X.251, X.X.X.252, X.X.X.253, each with 16G memory and an 8-core 2.57GHz CPU
ZooKeeper servers: X.X.X.22, X.X.X.23, X.X.X.24
OS: Linux x86_64 GNU/Linux
Tools: JMeter 2.6, YourKit 11.0.5
1. SolrCloud configuration (refer to "SolrCloud start")
a). On X.X.X.253, edit $SOLRCLOUD_HOME/example/solr/conf/schema.xml and define the fields as:
<fields>
  <!-- Valid attributes for fields:
    name: mandatory - the name for the field
    type: mandatory - the name of a previously defined type from the <types> section
    indexed: true if this field should be indexed (searchable or sortable)
    stored: true if this field should be retrievable
    multiValued: true if this field may contain multiple values per document
    omitNorms: (expert) set to true to omit the norms associated with this
      field (this disables length normalization and index-time boosting
      for the field, and saves some memory). Only full-text fields or fields
      that need an index-time boost need norms. Norms are omitted for
      primitive (non-analyzed) types by default.
    termVectors: false set to true to store the term vector for a given field.
      When using MoreLikeThis, fields used for similarity should be stored
      for best performance.
    termPositions: Store position information with the term vector.
      This will increase storage costs.
    termOffsets: Store offset information with the term vector. This
      will increase storage costs.
    required: The field is required. It will throw an error if the
      value does not exist
    default: a value that should be used if no value is specified
      when adding a document.
  -->
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="ts" type="text_general" indexed="true" stored="true" />
  <field name="name" type="text_general" indexed="true" stored="true" />
  <field name="age" type="text_general" indexed="true" stored="true" />
  <field name="company" type="text_general" indexed="true" stored="true" />
  <field name="branch" type="text_general" indexed="true" stored="true" />
  <field name="mail" type="text_general" indexed="true" stored="true" />
  <field name="interest" type="text_general" indexed="true" stored="true" />
  <field name="address" type="text_general" indexed="true" stored="true" />
  <field name="text_general" type="text_general" indexed="true" stored="false" multiValued="true" />
</fields>
b). Change the Jetty thread limit:
Open $SOLR_HOME/example/etc/jetty.xml and change maxThreads to 10000:
<!-- =========================================================== -->
<!-- Server Thread Pool                                          -->
<!-- =========================================================== -->
<Set name="ThreadPool"><!-- Default queued blocking threadpool -->
  <New>
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
Note: this thread count affects the stress test, so a large value is necessary.
c). Configure the cores in $SOLR_HOME/example/solr/solr.xml on all 3 servers.
Change the default lines at the end from:
<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}">
  <core name="collection1" instanceDir="." />
</cores>
to:
<cores defaultCoreName="collection1" adminPath="/admin/cores" hostPort="${jetty.port:}">
  <core schema="schema.xml" shard="shard1"  instanceDir="core1/"  name="core1"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard2"  instanceDir="core2/"  name="core2"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard3"  instanceDir="core3/"  name="core3"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard4"  instanceDir="core4/"  name="core4"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard5"  instanceDir="core5/"  name="core5"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard6"  instanceDir="core6/"  name="core6"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard7"  instanceDir="core7/"  name="core7"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard8"  instanceDir="core8/"  name="core8"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard9"  instanceDir="core9/"  name="core9"  collection="collection1" conf="solrconfig.xml" />
  <core schema="schema.xml" shard="shard10" instanceDir="core10/" name="core10" collection="collection1" conf="solrconfig.xml" />
</cores>
2. cd to $SOLR_HOME/example
3. Start SolrCloud on X.X.X.253 first with the command:
nohup java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 \
  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
  -Xmx10g -Xms10g -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf \
  -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 \
  -jar start.jar >> solr.log 2>&1 &
The JVM parameters
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
make Solr's JMX interface available, so we can use jvisualvm.exe to connect to X.X.X.253 and view the status of Solr.
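As a hedged illustration of what these flags enable, the sketch below connects to the JMX port we opened (3000) and reads the remote heap usage programmatically; the host and port come from the command above, while the client code itself is a generic JMX example rather than part of the original test scripts.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal JMX client: read the remote Solr JVM's heap usage through port 3000.
public class SolrJmxCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://X.X.X.253:3000/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("heap used: " + memory.getHeapMemoryUsage().getUsed());
        } finally {
            connector.close();
        }
    }
}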
4. Start SolrCloud on X.X.X.251 and X.X.X.252 with the command:
nohup java -agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss \
  -Xmx10g -Xms10g -Djetty.port=8900 \
  -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 \
  -jar start.jar >> solr.log 2>&1 &
The JVM parameter
-agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss
lets YourKit connect to X.X.X.251 and X.X.X.252 for profiling. How to use YourKit?
5. Access the Solr admin tool at http://X.X.X.252:8900/solr/#/~cloud?view=graph to see the following graph:
So finally we get 10 shards, each with two replicas (3 servers × 10 cores = 30 slices in total).
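Besides the cloud graph, the core layout can also be checked programmatically. The sketch below simply fetches the CoreAdmin STATUS endpoint (the adminPath we configured in solr.xml above) from one node and dumps the response; it is a convenience check of our assumptions, not part of the original test scripts.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Dump the CoreAdmin STATUS response so the ten cores (core1..core10)
// can be verified from the command line instead of the admin UI.
public class CoreStatusCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://X.X.X.253:8900/solr/admin/cores?action=STATUS&wt=json");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}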
1. The index request used to load index data into the SolrCloud cluster looks like:
POST http://X.X.X.252:8900/solr/collection1/update
POST data:
<add><doc>
  <field name="id">4al6q90c-8ouj-1255-Sind-201206282rmo</field>
  <field name="ts">1340859312956</field>
  <field name="name">byan</field>
  <field name="age">21</field>
  <field name="company">Ciscobyan Systemsbyan, Incbyan</field>
  <field name="branch">Cloudbyan Applicationbyan Servicesbyan</field>
  <field name="mail">byan@cisco.com</field>
  <field name="interest">Have intensive interest in Internet-surfingbyan,singingbyan, writingbyan and readingbyan</field>
  <field name="address">abyan, Gatebyan Buidingbyan Streetbyan Provincebyan Contrybyan</field>
</doc></add>
no cookies
Request Headers:
Content-Length: 598
Connection: keep-alive
Content-Type: application/xml

document for indexing:
<add><doc>
  <field name="id">c7${id_counter}</field>
  <field name="ts">1342463467567</field>
  <field name="name">person</field>
  <field name="age">10</field>
  <field name="company">Cisco Systems, Inc.</field>
  <field name="branch">Cloud Application Services</field>
  <field name="mail">person@cisco.com</field>
  <field name="interest">Have intensive interest in Internet-surfing,singing, writing and reading.</field>
  <field name="address">address,The Golden Gate Bridge,Wall Street.</field>
</doc></add>
Each doc is about 300 bytes in size.
The details of this script can be found here.
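The JMeter script posts raw XML as shown above; the same document can also be indexed with SolrJ. The sketch below is a minimal, hedged equivalent using the 4.x-era HttpSolrServer API, with a literal counter standing in for JMeter's ${id_counter} variable; it is an illustration, not the actual load script.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Index one document equivalent to the JMeter POST body above via SolrJ.
public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://X.X.X.252:8900/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "c7" + 1);                   // stands in for ${id_counter}
        doc.addField("ts", System.currentTimeMillis());
        doc.addField("name", "person");
        doc.addField("age", 10);
        doc.addField("company", "Cisco Systems, Inc.");
        doc.addField("branch", "Cloud Application Services");
        doc.addField("mail", "person@cisco.com");
        doc.addField("interest", "Have intensive interest in Internet-surfing, singing, writing and reading.");
        doc.addField("address", "address, The Golden Gate Bridge, Wall Street.");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}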
2. We plan to index about 50G of data into each shard of SolrCloud for this test.
We ran 10 JMeter scripts for indexing, and about 2M of data was indexed per second. After we had indexed about 40G of data across all shards, the old generation could no longer be collected even though it was full. Some other data is posted below:
Solr Admin of X.X.X.252:
GC status:
Heap dump by YourKit:
1. class list and objects
2. object
3. the biggest objects:
4. The corresponding source code:
the class TokenizerChain
IndexSchema has two members:
private Analyzer analyzer;
private Analyzer queryAnalyzer;
a). Solr's IndexSchema constructor loads the FieldTypePluginLoader and then sets a SolrIndexAnalyzer as the IndexSchema's analyzer via refreshAnalyzers():
public void refreshAnalyzers() {
    analyzer = new SolrIndexAnalyzer();
    queryAnalyzer = new SolrQueryAnalyzer();
}
private class SolrIndexAnalyzer extends AnalyzerWrapper {
    protected final HashMap<String, Analyzer> analyzers;

    SolrIndexAnalyzer() {
        analyzers = analyzerCache();
    }

    protected HashMap<String, Analyzer> analyzerCache() {
        HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
        for (SchemaField f : getFields().values()) {
            Analyzer analyzer = f.getType().getAnalyzer();
            cache.put(f.getName(), analyzer);
        }
        return cache;
    }
    ……
}
But in AnalyzerWrapper:
public abstract class AnalyzerWrapper extends Analyzer {
    /**
     * Creates a new AnalyzerWrapper. Since the {@link Analyzer.ReuseStrategy} of
     * the wrapped Analyzers are unknown, {@link Analyzer.PerFieldReuseStrategy} is assumed
     */
    protected AnalyzerWrapper() {
        super(new PerFieldReuseStrategy());
    }
    ……
}
Note: PerFieldReuseStrategy caches analyzer components by field name rather than by field type. This policy causes redundant resources to be used.
Suggestion: change it to cache analyzer components by field type, as sketched below.
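The following is a self-contained sketch of the difference (illustrative only, not a patch against Solr): keying the cache by field name gives one entry per field, while keying it by field type collapses the nine text_general fields in our schema into a single shared entry.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Compare cache sizes when analyzers are cached per field name vs. per field type,
// using the field layout of the schema.xml above (1 string field + 9 text_general fields).
public class AnalyzerCacheSketch {
    public static void main(String[] args) {
        Map<String, String> fieldToType = new LinkedHashMap<String, String>();
        fieldToType.put("id", "string");
        for (String f : new String[] {"ts", "name", "age", "company", "branch",
                                      "mail", "interest", "address", "text_general"}) {
            fieldToType.put(f, "text_general");
        }

        Map<String, Object> byFieldName = new HashMap<String, Object>();
        Map<String, Object> byFieldType = new HashMap<String, Object>();
        for (Map.Entry<String, String> e : fieldToType.entrySet()) {
            byFieldName.put(e.getKey(), new Object());   // one analyzer entry per field
            byFieldType.put(e.getValue(), new Object()); // one analyzer entry per type
        }
        System.out.println("entries keyed by field name: " + byFieldName.size()); // 10
        System.out.println("entries keyed by field type: " + byFieldType.size()); // 2
    }
}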
We see that the class TokenStreamComponents holds most of the memory; there are more than 1000 TokenStreamComponents objects. By reviewing the code we found that the Lucene Analyzer creates these objects and caches them for reuse.
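A rough back-of-envelope sketch of why the count grows so large (the thread count here is an assumption, not a measured value): PerFieldReuseStrategy keeps one TokenStreamComponents per field per thread, and each of the ten cores in a JVM has its own SolrIndexAnalyzer.

// Rough estimate of cached TokenStreamComponents under PerFieldReuseStrategy.
public class TokenStreamComponentsEstimate {
    public static void main(String[] args) {
        int fieldsPerSchema = 10; // fields defined in schema.xml above
        int coresPerJvm     = 10; // core1..core10 from solr.xml above
        int indexingThreads = 50; // assumed concurrent Jetty worker threads, not measured

        // one TokenStreamComponents per (core, field, thread)
        long estimated = (long) fieldsPerSchema * coresPerJvm * indexingThreads;
        System.out.println("estimated TokenStreamComponents: " + estimated); // 5000 with these assumptions
    }
}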