How Does the SolrCloud Transaction Log Work?


amongdata


Article link: https://blog.youkuaiyun.com/porui/article/details/9255327


This content was provided by my colleague Hans Tan. Thanks to Hans for sharing it.

WHY TRANSACTION LOG?

- A transaction log records all operations performed on an index between two hard commits.
- Each hard commit starts a new transaction log, because a commit guarantees the durability of the operations performed before it.
- The transaction log enables the realtime-get feature. In some cases even NRT (near-real-time) search is not good enough, for example when we need the latest version of a document under concurrent updates.
- After a JVM crash or a "kill -9", the core can recover by replaying the transaction log.
- It also allows a peer to ask "give me the list of the last update events you know about".

IMPLEMENTATION

UpdateLog is initialized when a SolrCore starts up or is reloaded.

add(), delete() and deleteByQuery() are called each time such a request comes in, followed by finish().
preCommit(), postCommit(), preSoftCommit() and postSoftCommit() are called when a commit or soft commit is requested, or when the IndexWriter is about to be closed.
UpdateLog has 4 states:
- REPLAYING ---- The core is replaying updates from the log; this should happen before the core registers in ZooKeeper.
- BUFFERING ---- While the core is recovering from the leader, all incoming updates are buffered to be replayed later. In the BUFFERING state, every command is marked with a flag (FLAG_GAP = 0x10).
- APPLYING_BUFFERED ---- After recovery has finished the replication step, the core replays the buffered updates.
- ACTIVE ---- The core receives and handles requests normally.
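The states and the FLAG_GAP marking can be sketched in Java as below. This is a hypothetical, simplified model, not Solr's UpdateLog class: the enum constants match the state names above and FLAG_GAP = 0x10 comes from the text, but the other opcode values (ADD, DELETE, ..., OPERATION_MASK) and the transition methods are assumptions made for illustration.

```java
// Hypothetical sketch, not Solr's real UpdateLog class.
final class UpdateLogStateSketch {
    // The four states described above.
    enum State { REPLAYING, BUFFERING, APPLYING_BUFFERED, ACTIVE }

    // Assumed opcode values; only FLAG_GAP = 0x10 comes from the article.
    static final int ADD = 0x01;
    static final int DELETE = 0x02;
    static final int DELETE_BY_QUERY = 0x03;
    static final int COMMIT = 0x04;
    static final int FLAG_GAP = 0x10;
    static final int OPERATION_MASK = 0x0f; // low 4 bits: the operation itself

    State state = State.ACTIVE;

    // Opcode written to the tlog: commands logged while BUFFERING carry FLAG_GAP.
    int opcodeFor(int operation) {
        return state == State.BUFFERING ? (operation | FLAG_GAP) : operation;
    }

    // Strip the flags to recover the operation.
    static int operationOf(int opcode) { return opcode & OPERATION_MASK; }

    static boolean hasGap(int opcode) { return (opcode & FLAG_GAP) != 0; }

    // Simplified recovery path: buffer while replicating, replay the buffer, go active.
    void startBuffering()        { state = State.BUFFERING; }
    void startApplyingBuffered() { state = State.APPLYING_BUFFERED; }
    void finishRecovery()        { state = State.ACTIVE; }
}
```

Because the flag lives in the high nibble and the operation in the low nibble, a reader can mask the opcode and still tell which entries were written while the core was buffering.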

UpdateLog has 3 flush strategies (sync levels, UpdateLog.SyncLevel):
- NONE - do nothing
- FLUSH - flush the in-memory buffer of the buffered output stream, but do not sync the underlying file to disk
- FSYNC - return only after the data has been written to the storage device
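A minimal sketch of the three sync levels, assuming a tlog written through a plain java.io.BufferedOutputStream over a FileOutputStream. Solr's TransactionLog uses its own buffered stream, so the class and method names here are hypothetical. FLUSH only moves bytes from the in-memory buffer to the operating system; FSYNC additionally calls FileChannel.force() to wait for the device.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical tlog writer illustrating NONE / FLUSH / FSYNC.
final class SyncLevelSketch implements AutoCloseable {
    enum SyncLevel { NONE, FLUSH, FSYNC }

    private final FileOutputStream file;
    private final BufferedOutputStream out;

    SyncLevelSketch(Path path) throws IOException {
        file = new FileOutputStream(path.toFile());
        out = new BufferedOutputStream(file, 64 * 1024); // in-memory buffer
    }

    void write(byte[] record) throws IOException { out.write(record); }

    void finish(SyncLevel level) throws IOException {
        if (level == SyncLevel.NONE) return;   // bytes may still sit in our buffer
        out.flush();                           // FLUSH: hand buffered bytes to the OS
        if (level == SyncLevel.FSYNC) {
            file.getChannel().force(false);    // FSYNC: wait until data reaches the device
        }
    }

    @Override
    public void close() throws IOException { out.close(); }

    // Writes a 10-byte record; returns the file size after NONE, then after FLUSH.
    static long[] demo() {
        try {
            Path p = Files.createTempFile("tlog", ".bin");
            try (SyncLevelSketch w = new SyncLevelSketch(p)) {
                w.write(new byte[10]);
                w.finish(SyncLevel.NONE);
                long afterNone = Files.size(p);
                w.finish(SyncLevel.FLUSH);
                long afterFlush = Files.size(p);
                return new long[] { afterNone, afterFlush };
            } finally {
                Files.deleteIfExists(p); // runs after the writer is closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After NONE the file is still empty because the record is in the 64 KB buffer; after FLUSH the OS sees all 10 bytes, though only FSYNC would guarantee they survive a power loss.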

UpdateLog uses a LinkedList named "logs" to keep recent log files, newest first. In each postCommit(), the previous tlog is added to this list, and the oldest logs are removed from it while the remaining logs would still hold at least 100 records (numRecordsToKeep), or while the list already contains 10 log files.
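The retention rule can be sketched as follows. Each old tlog is reduced to its record count. The class name, the two constants, and the simplification that the current (still open) tlog is not counted are assumptions of this sketch, not details confirmed by the article.

```java
import java.util.LinkedList;

// Hypothetical model of the "logs" list: each entry is one old tlog's record count.
final class TlogRetention {
    static final int NUM_RECORDS_TO_KEEP = 100; // assumed default
    static final int MAX_LOGS_TO_KEEP = 10;     // assumed limit on log files

    final LinkedList<Integer> logs = new LinkedList<>(); // newest first
    private int oldRecords = 0;                          // records across all listed logs

    void addOldLog(int numRecords) {
        int total = oldRecords + numRecords;
        // Drop the oldest logs while the rest still cover NUM_RECORDS_TO_KEEP
        // records, or while the list is already at the file limit.
        while (!logs.isEmpty()) {
            int oldest = logs.peekLast();
            if (total - oldest >= NUM_RECORDS_TO_KEEP || logs.size() >= MAX_LOGS_TO_KEEP) {
                logs.removeLast();
                total -= oldest;
                continue;
            }
            break;
        }
        logs.addFirst(numRecords);
        oldRecords = total;
    }
}
```

With three logs of 60 records each, the oldest is dropped when the third arrives, because the two newest already hold 120 records.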

[Figure: log start process diagram]

WHAT HAPPENS WHEN COMMIT() IS CALLED?

1. Switch to a new map for storing log pointers (the map is used by RealTimeGetComponent), then set prevTlog = tlog and tlog = null.

2. Commit the IndexWriter, and possibly open a new searcher.

3. Write a commit record to prevTlog, and add prevTlog to the logs list.
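The three hard-commit steps can be modelled with a small single-threaded sketch. Tlog here is a hypothetical in-memory stand-in for a log file, the map values are record positions, and the IndexWriter commit is passed in as a Runnable; none of these are Solr's real classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded sketch of the three hard-commit steps.
final class HardCommitSketch {
    static final class Tlog {                         // stand-in for a tlog file
        final List<String> records = new ArrayList<>();
    }

    Map<String, Integer> map = new HashMap<>();       // id -> record index, for realtime get
    Map<String, Integer> prevMap = null;              // previous map, still readable
    Tlog tlog = null;                                 // created lazily on the first update
    Tlog prevTlog = null;
    final LinkedList<Tlog> logs = new LinkedList<>(); // old tlogs, newest first

    void add(String id) {
        if (tlog == null) tlog = new Tlog();
        map.put(id, tlog.records.size());
        tlog.records.add("ADD " + id);
    }

    void commit(Runnable indexWriterCommit) {
        // 1. New map for later updates; retire the current tlog.
        prevMap = map;
        map = new HashMap<>();
        prevTlog = tlog;
        tlog = null;
        // 2. Commit the IndexWriter (and possibly open a new searcher).
        indexWriterCommit.run();
        // 3. Mark the old tlog as committed and keep it in the logs list.
        if (prevTlog != null) {
            prevTlog.records.add("COMMIT");
            logs.addFirst(prevTlog);
            prevTlog = null;
        }
    }
}
```

Setting tlog to null before the IndexWriter commit means any update arriving during the commit goes into a fresh tlog, so everything in the retired log is known to be covered by the commit.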

WHAT HAPPENS WHEN SOFTCOMMIT() IS CALLED?

1. Switch to a new map for storing log pointers.

2. Open a new searcher.

3. Clear the old map data.
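A similar sketch for soft commit shows why the old map is kept until the new searcher is open: realtime get checks the newest map, then the previous map, then the searcher. The preSoftCommit()/postSoftCommit() names come from the text above, but the maps of version strings, the "index" map and the openSearcher() helper are hypothetical simplifications; a real tlog map holds pointers into the log.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of soft commit and realtime get.
final class SoftCommitSketch {
    final Map<String, String> index = new HashMap<>();  // everything sent to the IndexWriter
    Map<String, String> searcher = new HashMap<>();     // snapshot visible to normal search
    Map<String, String> map = new HashMap<>();          // recent updates, for realtime get
    Map<String, String> prevMap = null;                 // updates that may not be searchable yet

    void add(String id, String version) {
        index.put(id, version);
        map.put(id, version);
    }

    void preSoftCommit()  { prevMap = map; map = new HashMap<>(); } // 1. new map
    void openSearcher()   { searcher = new HashMap<>(index); }      // 2. new searcher
    void postSoftCommit() { prevMap = null; }                       // 3. clear old map data

    // Realtime get: newest map first, then the previous map, then the searcher.
    String realtimeGet(String id) {
        if (map.containsKey(id)) return map.get(id);
        if (prevMap != null && prevMap.containsKey(id)) return prevMap.get(id);
        return searcher.get(id);
    }

    String search(String id) { return searcher.get(id); }
}
```

Between steps 1 and 3 a document may be in neither the current map nor the old searcher, which is why prevMap is only dropped once the new searcher can see it.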

TRANSACTION LOG FORMAT FOR SOLRINPUTDOCUMENT

The following code from JavaBinCodec writes a SolrInputDocument to the log file:

public void writeSolrInputDocument(SolrInputDocument sdoc) throws IOException {
    // SOLRINPUTDOC = 16 is the tag; it is followed by the number of
    // fields (key-value pairs) in the Solr document
    writeTag(SOLRINPUTDOC, sdoc.size());
    writeFloat(sdoc.getDocumentBoost());          // document boost
    for (SolrInputField inputField : sdoc.values()) {
        if (inputField.getBoost() != 1.0f) {
            writeFloat(inputField.getBoost());    // field boost, only if not the default 1.0
        }
        writeExternString(inputField.getName());  // field name
        writeVal(inputField.getValue());          // field value
    }
}
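The header written by writeTag(SOLRINPUTDOC, sdoc.size()) can be sketched as below. The packing rule (tags with any of the top 3 bits set carry small sizes in their low 5 bits; other tags, such as SOLRINPUTDOC = 16, are followed by a variable-length int) and the 7-bits-per-byte VInt format are my reading of JavaBinCodec, so treat them as assumptions; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the tag/size header that precedes a SolrInputDocument.
final class TagHeaderSketch {
    static final int SOLRINPUTDOC = 16; // from the article

    // 7 bits per byte, low-order group first; the high bit means "more bytes follow".
    static void writeVInt(int i, ByteArrayOutputStream out) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Assumed packing rule, see the lead-in.
    static void writeTag(int tag, int size, ByteArrayOutputStream out) {
        if ((tag & 0xE0) != 0) {
            if (size < 0x1F) {
                out.write(tag | size);         // size fits in the low 5 bits
            } else {
                out.write(tag | 0x1F);         // 0x1F: the rest of the size follows
                writeVInt(size - 0x1F, out);
            }
        } else {
            out.write(tag);                    // plain tag byte ...
            writeVInt(size, out);              // ... followed by the size
        }
    }

    static byte[] docHeader(int numFields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(SOLRINPUTDOC, numFields, out);
        return out.toByteArray();
    }
}
```

A document with 3 fields gets the two-byte header 0x10 0x03; with 200 fields the size needs two VInt bytes (0xC8 0x01).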

writeVal() works as follows:

// If the object's type is known, write it with the type-specific method and return.
// Otherwise, if a resolver is configured, let it convert the object first.
// If neither applies, write the class name and toString() value as a string.
public void writeVal(Object val) throws IOException {
    if (writeKnownType(val)) {
        return;
    } else {
        Object tmpVal = val;
        if (resolver != null) {
            tmpVal = resolver.resolve(val, this);
            if (tmpVal == null) return; // null means the resolver took care of it fully
            if (writeKnownType(tmpVal)) return;
        }
        writeVal(val.getClass().getName() + ':' + val.toString());
    }
}

writeKnownType() handles the following known types: primitive types, SolrDocumentList, NamedList, Collection, Object[], SolrDocument, SolrInputDocument, Map, Iterator and Iterable.
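The three-step fallback in writeVal() can be illustrated with a toy encoder. It writes labelled strings instead of bytes, treats only String and Long as known types, and uses a java.util.function.Function as a hypothetical stand-in for the codec's resolver, so it shows the control flow only, not the binary format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of writeVal(): known type -> resolver -> "ClassName:toString()".
final class WriteValSketch {
    final List<String> out = new ArrayList<>();
    final Function<Object, Object> resolver; // may be null; hypothetical stand-in

    WriteValSketch(Function<Object, Object> resolver) { this.resolver = resolver; }

    // Only String and Long are "known" in this toy.
    boolean writeKnownType(Object val) {
        if (val instanceof String) { out.add("STR:" + val); return true; }
        if (val instanceof Long)   { out.add("LONG:" + val); return true; }
        return false;
    }

    void writeVal(Object val) {
        if (writeKnownType(val)) return;
        if (resolver != null) {
            Object tmp = resolver.apply(val);
            if (tmp == null) return;         // in the real codec: resolver wrote it itself
            if (writeKnownType(tmp)) return;
        }
        // Last resort: the class name and toString(), written as a String.
        writeVal(val.getClass().getName() + ':' + val);
    }
}
```

Without a resolver, a StringBuilder holding "x" ends up as the string "java.lang.StringBuilder:x"; with a resolver that turns an Integer into a Long, the Integer is written as a known Long.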

Let's use the long type as an example:

public void writeLong(long val) throws IOException {
    if ((val & 0xff00000000000000L) == 0) {
        // Values whose highest 8 bits are all 0 (0 <= val < 2^56) are written
        // as a "small long"; negative values never qualify
        int b = SLONG | ((int) val & 0x0f); // SLONG = 96 (binary 01100000); keep the lowest 4 bits of val
        if (val >= 0x0f) { // val >= 15
            b |= 0x10;
            daos.writeByte(b); // write 0111xxxx: bit 0x10 marks that more bytes follow
            // Write val >>> 4 as a variable-length long: each byte stores 7 bits
            // of the value, and its highest bit is set if another byte follows
            writeVLong(val >>> 4, daos);
        } else { // val < 15: the tag byte holds the whole value
            daos.writeByte(b);
        }
    } else { // negative or >= 2^56
        daos.writeByte(LONG); // write the tag first
        daos.writeLong(val);  // then all 8 bytes of the value (big-endian)
    }
}

public long readSmallLong(FastInputStream dis) throws IOException {
    long v = tagByte & 0x0F;
    if ((tagByte & 0x10) != 0) // bit 0x10 set: the value is >= 15 and more bytes follow
        v = (readVLong(dis) << 4) | v;
    return v;
}
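To check the small-long scheme end to end, here is a self-contained round-trip sketch. encode() follows writeLong() and decode() follows readSmallLong() plus the LONG branch. The VLong helpers use the 7-bits-per-byte format described in the comments above, ByteArrayOutputStream and ByteBuffer replace the codec's own streams, and the LONG tag value (7) is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical round-trip sketch of the small-long encoding.
final class SmallLongCodec {
    static final int SLONG = 3 << 5; // 96 = 0b01100000, as stated in the article
    static final int LONG = 7;       // assumed tag for a full 8-byte long

    static void writeVLong(long i, ByteArrayOutputStream out) {
        while ((i & ~0x7FL) != 0) {
            out.write((int) ((i & 0x7F) | 0x80)); // 7 bits + "more follows" bit
            i >>>= 7;
        }
        out.write((int) i);
    }

    static long readVLong(ByteBuffer in) {
        byte b = in.get();
        long i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            i |= (long) (b & 0x7F) << shift;
        }
        return i;
    }

    static byte[] encode(long val) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if ((val & 0xff00000000000000L) == 0) {   // 0 <= val < 2^56
            int b = SLONG | ((int) val & 0x0f);    // low 4 bits go into the tag byte
            if (val >= 0x0f) {
                out.write(b | 0x10);               // 0x10: more bytes follow
                writeVLong(val >>> 4, out);
            } else {
                out.write(b);                      // tag byte holds the whole value
            }
        } else {
            out.write(LONG);
            out.write(ByteBuffer.allocate(8).putLong(val).array(), 0, 8); // big-endian
        }
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int tagByte = in.get() & 0xFF;
        if (tagByte == LONG) return in.getLong();
        if ((tagByte >>> 5) != 3) throw new IllegalArgumentException("not a long tag");
        long v = tagByte & 0x0F;
        if ((tagByte & 0x10) != 0) v = (readVLong(in) << 4) | v;
        return v;
    }
}
```

For 13, encode() returns the single byte 0x6D; for 287 it returns 0x7F 0x11. A value of exactly 15 also takes two bytes (0x7F 0x00), since the ">= 15" branch always writes a VLong.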

Example: writing a value < 15 and a value >= 15

long val = 13; // 00001101
00001101 & 0xff00000000000000L = 0, so 13 is a small long
b = 01100000 | (00001101 & 00001111) = 01100000 | 00001101 = 01101101
val < 15, so the tag byte holds the whole value:
daos.writeByte(01101101) // 1 byte in total

long val = 287; // 0001 00011111
val & 0xff00000000000000L = 0, so 287 is a small long
b = 01100000 | (0001 00011111 & 00001111) = 01101111
val >= 15, so set bit 0x10:
b = b | 0x10 = 00010000 | 01101111 = 01111111
daos.writeByte(01111111)
writeVLong(0001 00011111 >>> 4) = writeVLong(00010001) // 17 < 128, so one byte: 00010001
Result: 2 bytes, 01111111 00010001
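The two cases can be summarized in one formula for $0 \le v < 2^{56}$. The notation is mine: $v$ is the value, $b_0$ the tag byte, $\mathrm{VLong}$ the variable-length encoding described above, and $\mathbin{|}$, $\&$ are bitwise OR and AND.

$$
b_0 = \texttt{0x60} \mathbin{|} (v \bmod 16) \mathbin{|} \begin{cases} \texttt{0x10} & v \ge 15 \\ 0 & v < 15 \end{cases},
\qquad
\text{tail} = \begin{cases} \mathrm{VLong}\left(\lfloor v / 16 \rfloor\right) & v \ge 15 \\ \text{(none)} & v < 15 \end{cases}
$$

For $v = 287 = 17 \cdot 16 + 15$: $b_0 = \texttt{0x60} \mathbin{|} \texttt{0x0F} \mathbin{|} \texttt{0x10} = \texttt{0x7F}$, and since $17 < 128$, $\mathrm{VLong}(17)$ is the single byte $\texttt{0x11}$. Decoding reverses this: $v = (\mathrm{VLong} \ll 4) \mathbin{|} (b_0 \mathbin{\&} \texttt{0x0F}) = 17 \cdot 16 + 15 = 287$.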