Part 2: Documents, Fields, and Schema Design
|
Putting the Pieces Together
Putting together the pieces covered above gives you a well-formed schema.xml.
Choosing Appropriate Numeric Types
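If you expect frequent range queries on numeric fields, use precisionStep="8" (which is also the default when the attribute is omitted). Range queries become faster, at the cost of a larger index.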
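To make the size trade-off concrete, here is a rough model of how many terms a Trie field indexes per value. It assumes one term is indexed per precisionStep-sized prefix of the value, and that a step of 0 (or one at least as wide as the value) leaves only the full-precision term. The function name `trie_terms_per_value` is made up for this note and is not part of Solr.

```python
import math


def trie_terms_per_value(bits: int, precision_step: int) -> int:
    """Approximate number of terms indexed for one numeric value.

    Assumption: one term per precision_step-sized prefix; a step of 0 or
    a step >= bits means only the full-precision term is indexed.
    """
    if precision_step <= 0 or precision_step >= bits:
        return 1
    return math.ceil(bits / precision_step)


if __name__ == "__main__":
    # Compare a few steps for 32-bit (int/float) and 64-bit (long/double) values.
    for step in (0, 4, 8, 16):
        print(step, trie_terms_per_value(32, step), trie_terms_per_value(64, step))
```

More terms per value means a larger index but fewer terms to visit per range query, which is the trade-off described above.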
Working With Text
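Two approaches:
Use copyField to copy every field that should be searchable into one field, and use that field as the default search field.
Use copyField to copy a single field so that the copy can be processed differently, for example one version tokenized for search and another kept for sorting.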
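Both approaches can also be set up through the Schema API instead of editing the schema file by hand. The sketch below only builds the JSON bodies for the add-copy-field command; the field names (title, author, text, title_sort) are hypothetical, and the bodies would be POSTed to the collection's /schema endpoint.

```python
import json


def add_copy_field_command(source: str, dest: str) -> str:
    """Build the JSON body of a Schema API add-copy-field command."""
    return json.dumps({"add-copy-field": {"source": source, "dest": dest}})


# Approach 1: funnel several fields into one catch-all search field.
catch_all = [add_copy_field_command(src, "text") for src in ("title", "author")]

# Approach 2: copy one field into a second field with different processing,
# e.g. an untokenized copy used only for sorting.
sortable = add_copy_field_command("title", "title_sort")

if __name__ == "__main__":
    for body in catch_all + [sortable]:
        print(body)
```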
Related Topics
SchemaXML
|
DocValues
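DocValues record field values in an internal structure that makes some operations, such as sorting and faceting, more efficient than traditional indexing.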
Why DocValues?
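Normally a document is indexed with an inverted index, which makes ordinary search fast. That structure is a poor fit for other Solr features such as sorting, faceting, and highlighting, which need to go from a document to its values. Solr used to keep this data in memory, and it could be slow to load. Lucene 4 introduced DocValues to solve this problem; they improve efficiency and reduce memory use.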
DocValue fields are now column-oriented fields with a document-to-value
mapping built at index time.
Enabling DocValues
Set docValues="true" when defining a field type or a field:
<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
Field types that can use the docValues attribute:
StrField and UUIDField. If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. If the field is multi-valued, Lucene will use the SORTED_SET type.
Any Trie* numeric fields and EnumField. If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. If the field is multi-valued, Lucene will use the SORTED_SET type.
You could choose to keep everything in memory by specifying docValuesFormat="Memory" on a field type:
<fieldType name="string_in_mem_dv" class="solr.StrField" docValues="true" docValuesFormat="Memory" />
The docValuesFormat attribute may change in future releases.
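The type-selection rules listed in this section can be written as a small lookup. This is only a restatement of those rules for the field classes named above; `doc_values_type` is a hypothetical helper, not a Solr or Lucene function.

```python
def doc_values_type(field_class: str, multi_valued: bool) -> str:
    """Return the Lucene docValues type chosen for a Solr field class."""
    string_like = field_class in ("StrField", "UUIDField")
    numeric_like = field_class.startswith("Trie") or field_class == "EnumField"
    if not (string_like or numeric_like):
        raise ValueError(f"{field_class} is not covered by these notes")
    if multi_valued:
        # Multi-valued fields of either family use SORTED_SET.
        return "SORTED_SET"
    return "SORTED" if string_like else "NUMERIC"


if __name__ == "__main__":
    for cls in ("StrField", "TrieIntField", "EnumField"):
        print(cls, doc_values_type(cls, False), doc_values_type(cls, True))
```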
Using DocValues
(My understanding of this part is still shallow; it is worth revisiting later.)
Sorting, Faceting & Functions
If docValues="true" for a field, then DocValues will automatically be used any time the field is used for sorting, faceting or Function Queries.
Retrieving DocValues During Search
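Whether a docValues field is returned in search results depends on its useDocValuesAsStored attribute and on the fl parameter. With useDocValuesAsStored="true" the field is included when fl=*; with useDocValuesAsStored="false" it is returned only when fl names it explicitly.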
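The return rule can be modelled as a small predicate. This is a simplified sketch for a field that has docValues but is not stored: it only understands a literal * in fl, not partial glob patterns, and `is_returned` is a name invented for this note.

```python
def is_returned(field_name: str, fl: str, use_doc_values_as_stored: bool) -> bool:
    """Decide whether a docValues-only field appears in the results."""
    requested = [part.strip() for part in fl.split(",") if part.strip()]
    if field_name in requested:
        # Explicitly named fields are returned either way.
        return True
    # A bare * only picks the field up when useDocValuesAsStored is true.
    return "*" in requested and use_doc_values_as_stored


if __name__ == "__main__":
    print(is_returned("manu_exact", "*", True))
    print(is_returned("manu_exact", "*", False))
    print(is_returned("manu_exact", "id,manu_exact", False))
```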
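Differences between values retrieved from docValues and values retrieved from regular stored fields:
1. Order: stored fields return values in insertion order, while docValues return them in sorted order.
2. For multi-valued fields, duplicate values are collapsed into one.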
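Both differences show up in a tiny simulation of a multi-valued field. The two functions are stand-ins written for this note, not Solr code, and the sample values are made up.

```python
def stored_view(values):
    """Stored fields: values come back in insertion order, duplicates kept."""
    return list(values)


def doc_values_view(values):
    """Multi-valued docValues: sorted order, identical entries collapsed."""
    return sorted(set(values))


if __name__ == "__main__":
    indexed = ["rock", "folk", "rock", "blues"]
    print(stored_view(indexed))
    print(doc_values_view(indexed))
```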
|
Schemaless Mode
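Schemaless mode is a set of Solr features, all controlled via solrconfig.xml, that let fields be added to the schema simply by indexing documents: a managed schema, field value class guessing, and automatic addition of schema fields based on the guessed classes.
Using the Schemaless Example
The /schema/fields Schema API lists the fields currently defined:
curl http://localhost:8983/solr/gettingstarted/schema/fields
Configuring Schemaless Mode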
Enable Managed Schema
Managed schema support is enabled by default, unless the configuration specifies that ClassicIndexSchemaFactory should be used. It can be configured explicitly as in this example:
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
Define an UpdateRequestProcessorChain
An UpdateRequestProcessorChain lets Solr guess the type of a field; you can also define a default field type yourself. An example from solrconfig.xml:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tlongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">tdoubles</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
The update processor factories used above that matter for schemaless mode:
UUIDUpdateProcessorFactory
RemoveBlankFieldUpdateProcessorFactory
FieldNameMutatingUpdateProcessorFactory
ParseBooleanFieldUpdateProcessorFactory
ParseLongFieldUpdateProcessorFactory
ParseDoubleFieldUpdateProcessorFactory
ParseDateFieldUpdateProcessorFactory
AddSchemaFieldsUpdateProcessorFactory
Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler
Configure solrconfig.xml so that the update processor chain defined above is used by the UpdateRequestHandlers:
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
Examples of Indexed Documents
Solr ships an official configuration that implements this: solr6\server\solr\configsets\data_driven_schema_configs\conf
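Before sending a real document, it can help to see how value-class guessing behaves. The sketch below imitates the order of the Parse* processors (boolean, long, double, date, then the default strings type) on the same CSV row used in the request that follows. It is a rough simulation and not Solr's implementation: the regular expressions and the three date formats are simplifying assumptions, and `guess_field_type` is a made-up name.

```python
import re
from datetime import datetime

# Small subset of the date formats configured in the chain (assumption).
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def guess_field_type(raw: str) -> str:
    """Guess a field type by trying boolean, long, double, date, then string."""
    if raw.lower() in ("true", "false"):
        return "booleans"
    if re.fullmatch(r"[+-]?\d+", raw):
        return "tlongs"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?", raw):
        return "tdoubles"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(raw, fmt)
            return "tdates"
        except ValueError:
            continue
    return "strings"  # plays the role of defaultFieldType


header = "id,Artist,Album,Released,Rating,FromDistributor,Sold".split(",")
row = "44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0".split(",")
# Note: in a real collection the id field already exists, so it is not guessed.
guessed = {name: guess_field_type(value) for name, value in zip(header, row)}

if __name__ == "__main__":
    for name, field_type in guessed.items():
        print(name, field_type)
```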
curl "http://localhost:8983/solr/gettingstarted/update?commit=true" -H "Content-type:application/csv" -d '
id,Artist,Album,Released,Rating,FromDistributor,Sold
44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0'
Listing the schema fields afterwards (the /schema/fields endpoint) shows the guessed types:
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "fields":[{
      "name":"Album",
      "type":"strings"},          // Field value guessed as String -> strings fieldType
    {
      "name":"Artist",
      "type":"strings"},          // Field value guessed as String -> strings fieldType
    {
      "name":"FromDistributor",
      "type":"tlongs"},           // Field value guessed as Long -> tlongs fieldType
    {
      "name":"Rating",
      "type":"tdoubles"},         // Field value guessed as Double -> tdoubles fieldType
    {
      "name":"Released",
      "type":"tdates"},           // Field value guessed as Date -> tdates fieldType
    {
      "name":"Sold",
      "type":"tlongs"},           // Field value guessed as Long -> tlongs fieldType
    {
      "name":"_text_", ... },
    {
      "name":"_version_", ... },
    {
      "name":"id", ... }]}
Here the fields are added by guessing their types dynamically. In practice we usually want more control over the schema than that, so this mode is not very important.
|
That wraps up a quick pass through Part 2. To recap the main points:
Documents, Fields, and Schema Design
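Solr Field Types: how field types are defined, which properties they have, and their defaults; a type may also define tokenizers, filters, and so on (these are presumably covered in detail later).
Defining Fields: how to define fields and what their properties mean.
Copying Fields: how to define copy fields and how to use them.
Dynamic Fields: the purpose, definition, and properties of dynamic fields.
Schema API: using the HTTP REST API to manipulate the schema.
Other Schema Elements: the remaining elements, namely the unique key, default operator, default search field, and scoring-related class configuration.
Putting the Pieces Together: combining everything into one schema file.
DocValues: what docValues are, how they work, and when to use them.
Schemaless Mode: adding fields dynamically, with types guessed by a configured update request processor chain.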
|
Getting Started with Solr: Reading Notes on the Official 6.0 Documentation (5), End of Part 2