This content was provided by my colleague Hans Tan. Thanks to Hans for sharing the following.
WHY TRANSACTION LOG?
- A transaction log records all operations performed on an index between two hard commits.
- Each hard commit starts a new transaction log, because a commit guarantees the durability of all operations performed before it.
- The transaction log enables the realtime-get feature. In some cases NRT (near real time) search is still not fresh enough, for example when we need the latest version of a document during concurrent updates.
- Solr can recover from the transaction log after a JVM crash or a "kill -9".
- It also allows a peer to ask "give me the list of the last update events you know about".
IMPLEMENTATION
UpdateLog is initialized when a SolrCore starts up or is reloaded.
- add(), delete() and deleteByQuery() are called each time a matching request comes in, followed by finish().
- preCommit(), postCommit(), preSoftCommit() and postSoftCommit() are called on a commit or soft commit request, or when the IndexWriter is about to be closed.
- UpdateLog has 4 states:
  - REPLAYING: the core is replaying from the log. This replay must finish before the core registers in ZooKeeper.
  - BUFFERING: while the core recovers from the leader, all incoming requests are buffered so they can be replayed later. In this state every command is marked with a flag (FLAG_GAP = 0x10).
  - APPLYING_BUFFERED: once recovery has finished the replication step, the core starts replaying the buffered documents.
  - ACTIVE: the core can receive and handle requests normally.
- UpdateLog has 3 flush strategies:
  - NONE: do nothing.
  - FLUSH: flush only the in-memory buffer of the buffered stream, without forcing the underlying stream to the device.
  - FSYNC: return only once the data has been written to the device.
- UpdateLog keeps recent log files in a LinkedList named "logs", newest first. On each postCommit() the previous tlog is added to this list. Then, if the list holds more than 100 records (numRecords > 100) or more than 10 log files, the oldest log is removed from the list.
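The retention rule in the last bullet can be sketched in a few lines of Java. This is a minimal sketch under one reading of that rule, not Solr's code: the names LogRetentionSketch, OldLog and addOldLog and both constants are hypothetical, and the thresholds 100 and 10 are simply taken from the text above. Here the oldest log is dropped while the list holds more than 10 logs, or while the newer logs alone still hold at least 100 records.

```java
import java.util.LinkedList;

class LogRetentionSketch {
    // Hypothetical stand-in for a closed tlog file; only its record count matters here.
    static final class OldLog {
        final String name;
        final int numRecords;

        OldLog(String name, int numRecords) {
            this.name = name;
            this.numRecords = numRecords;
        }
    }

    static final int NUM_RECORDS_TO_KEEP = 100; // assumed: the "100" from the text
    static final int MAX_LOGS_TO_KEEP = 10;     // assumed: the "10" from the text

    // Recent logs, newest first.
    final LinkedList<OldLog> logs = new LinkedList<>();

    // Called when a commit closes the current log.
    void addOldLog(OldLog closed) {
        logs.addFirst(closed);
        int total = 0;
        for (OldLog log : logs) {
            total += log.numRecords;
        }
        // Trim from the tail (oldest) while the list is too long, or while the
        // newer logs alone still hold enough records.
        while (logs.size() > 1) {
            OldLog oldest = logs.peekLast();
            boolean tooManyLogs = logs.size() > MAX_LOGS_TO_KEEP;
            boolean restIsEnough = total - oldest.numRecords >= NUM_RECORDS_TO_KEEP;
            if (!tooManyLogs && !restIsEnough) {
                break;
            }
            logs.removeLast();
            total -= oldest.numRecords;
        }
    }

    public static void main(String[] args) {
        LogRetentionSketch sketch = new LogRetentionSketch();
        for (int i = 0; i < 12; i++) {
            sketch.addOldLog(new OldLog("tlog." + i, 5));
        }
        System.out.println(sketch.logs.size() + " logs kept, oldest is " + sketch.logs.getLast().name);
    }
}
```

With many small logs the file-count limit does the trimming. With a few large logs the record-count limit does.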
LOG START PROCESS DIAGRAM

WHAT HAPPENS ON COMMIT()?
1. Switch to a new map for storing log entries (this map is used by RealTimeGetComponent), then set prevTlog = tlog and tlog = null.
2. Commit the index writer, possibly opening a new searcher.
3. Add the commit flag to prevTlog, and add prevTlog to the logs list.
WHAT HAPPENS ON SOFTCOMMIT()?
1. Switch to a new map for storing log entries.
2. Open a new searcher.
3. Clear the old map's data.
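The map handling in the soft-commit steps above can be sketched as follows. This is a simplified sketch, not Solr's implementation: the class RealtimeGetMapSketch, its fields and its lookup method are hypothetical, documents are plain strings, and mapping the three steps onto preSoftCommit() and postSoftCommit() is an assumption based on the method names listed earlier.

```java
import java.util.HashMap;
import java.util.Map;

class RealtimeGetMapSketch {
    // id -> latest version of the document written since the last map switch.
    private Map<String, String> map = new HashMap<>();
    // Entries from before the soft commit; needed until the new searcher is open.
    private Map<String, String> prevMap = null;

    synchronized void add(String id, String doc) {
        map.put(id, doc);
    }

    // Returns the newest logged version, or null if the caller should consult the index.
    synchronized String lookup(String id) {
        String doc = map.get(id);
        if (doc == null && prevMap != null) {
            doc = prevMap.get(id);
        }
        return doc;
    }

    // Step 1: switch to a new map so later updates are tracked separately.
    synchronized void preSoftCommit() {
        prevMap = map;
        map = new HashMap<>();
    }

    // Step 3: the new searcher (step 2) now sees the older updates, so drop the old map.
    synchronized void postSoftCommit() {
        prevMap = null;
    }

    public static void main(String[] args) {
        RealtimeGetMapSketch log = new RealtimeGetMapSketch();
        log.add("doc1", "v1");
        log.preSoftCommit();
        log.add("doc1", "v2");
        System.out.println(log.lookup("doc1"));
        log.postSoftCommit();
        System.out.println(log.lookup("doc1"));
    }
}
```

The old map is kept between the two calls so that a realtime get issued while the new searcher is still opening can find documents that are in neither the new map nor the old searcher.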
TRANSACTION LOG FORMAT FOR SOLRINPUTDOCUMENT
The following code in JavaBinCodec writes a SolrInputDocument to the log file:

    public void writeSolrInputDocument(SolrInputDocument sdoc) throws IOException {
        // SOLRINPUTDOC = 16 is the tag; the value that follows it is the number of
        // key-value pairs in the Solr document
        writeTag(SOLRINPUTDOC, sdoc.size());
        writeFloat(sdoc.getDocumentBoost()); // document boost
        for (SolrInputField inputField : sdoc.values()) {
            if (inputField.getBoost() != 1.0f) {
                writeFloat(inputField.getBoost()); // field boost, if any
            }
            writeExternString(inputField.getName()); // field name
            writeVal(inputField.getValue()); // field value
        }
    }
For writeVal(), see the following code:

    // If the object's type is known, write it with the method for that type and return.
    // Otherwise, if a resolver was supplied, use it to resolve the object.
    // Failing that, write the class name and the toString() value to the log file.
    public void writeVal(Object val) throws IOException {
        if (writeKnownType(val)) {
            return;
        } else {
            Object tmpVal = val;
            if (resolver != null) {
                tmpVal = resolver.resolve(val, this);
                if (tmpVal == null) return; // null means the resolver took care of it fully
                if (writeKnownType(tmpVal)) return;
            }
        }
        writeVal(val.getClass().getName() + ':' + val.toString());
    }
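The three-level fallback in writeVal() can be tried out in isolation with a small stand-in. This is only a sketch of the control flow, not JavaBinCodec: the class WriteValSketch is hypothetical, the resolver is simplified to a java.util.function.Function, the "stream" is a list, and only a handful of types are treated as known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class WriteValSketch {
    // Stands in for the output stream: each "written" value is appended here.
    final List<Object> written = new ArrayList<>();
    // Hypothetical, simplified resolver; may be null.
    final Function<Object, Object> resolver;

    WriteValSketch(Function<Object, Object> resolver) {
        this.resolver = resolver;
    }

    // Only a few types count as "known" in this sketch.
    boolean writeKnownType(Object val) {
        if (val == null || val instanceof String || val instanceof Number || val instanceof Boolean) {
            written.add(val);
            return true;
        }
        return false;
    }

    void writeVal(Object val) {
        if (writeKnownType(val)) {
            return;
        }
        if (resolver != null) {
            Object resolved = resolver.apply(val);
            if (resolved == null) {
                return; // the resolver handled the value itself
            }
            if (writeKnownType(resolved)) {
                return;
            }
        }
        // Last resort: "<class name>:<toString()>" written as a plain string.
        writeVal(val.getClass().getName() + ':' + val.toString());
    }

    public static void main(String[] args) {
        WriteValSketch plain = new WriteValSketch(null);
        plain.writeVal(42);
        plain.writeVal(new StringBuilder("abc"));
        System.out.println(plain.written);
    }
}
```

The final recursive call always terminates, because its argument is a String and a String is a known type.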
writeKnownType() handles the following known types: primitives, SolrDocumentList, NamedList, Collection, Object[], SolrDocument, SolrInputDocument, Map, Iterator and Iterable.
Let's use the long type as an example:

    public void writeLong(long val) throws IOException {
        if ((val & 0xff00000000000000L) == 0) {
            // None of the highest 8 bits is set (the mask 0xff00000000000000L has only
            // its highest 8 bits set), so the value is treated as a small long.
            // SLONG = 96 (binary 01100000); OR it with the lowest 4 bits of val.
            int b = SLONG | ((int) val & 0x0f);
            if (val >= 0x0f) { // val >= 15
                b |= 0x10;
                // Write 01110000 | (lowest 4 bits of val). The 0x10 bit marks that more
                // data follows and the reader must keep reading.
                daos.writeByte(b);
                // Shift right by 4 bits and write the rest as a variable-length long. In this
                // encoding the highest bit of each byte marks whether another byte follows,
                // and the other 7 bits store the value.
                writeVLong(val >>> 4, daos);
            } else {
                // val < 15: write tag and value together in one byte
                daos.writeByte(b);
            }
        } else {
            // really large long value
            daos.writeByte(LONG); // write the tag first
            daos.writeLong(val);  // then write the value as 8 bytes
        }
    }

    public long readSmallLong(FastInputStream dis) throws IOException {
        long v = tagByte & 0x0F;
        if ((tagByte & 0x10) != 0) // in this case the value is >= 15
            v = (readVLong(dis) << 4) | v;
        return v;
    }
Example: how to write a value < 15 and a value >= 15

    long val = 13;   // 00001101
    00001101 & 0xff00000000000000L = 0
    b = 01100000 | (00001101 & 00001111) = 01100000 | 00001101 = 01101101
    val < 15
    daos.writeByte(01101101)

    long val = 287;  // 0001 00011111
    val & 0xff00000000000000L = 0
    b = 0110 0000 | (0001 00011111 & 00001111) = 01101111
    val >= 15
    b = b | 0x10 = 00010000 | 01101111 = 01111111
    writeByte(01111111)
    writeVLong(0001 00011111 >>> 4) = writeVLong(00010001)
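The two worked examples can be checked with a small self-contained program. This is a sketch, not JavaBinCodec itself: the class SmallLongSketch and its encode/decode helpers are hypothetical, the variable-length writer is a generic 7-bits-per-byte encoding assumed to match the description above, LONG = 7 is an assumed tag value, and decode() only distinguishes the two long tags.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SmallLongSketch {
    static final int LONG = 7;       // assumed tag for a full 8-byte long
    static final int SLONG = 3 << 5; // 96, binary 01100000

    // 7 value bits per byte, low bits first; the high bit means "another byte follows".
    static void writeVLong(long v, DataOutputStream out) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    static long readVLong(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        long v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readUnsignedByte();
            v |= (long) (b & 0x7F) << shift;
        }
        return v;
    }

    // Same branching as writeLong() in the text, but into a byte array.
    static byte[] encode(long val) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            if ((val & 0xff00000000000000L) == 0) {
                int b = SLONG | ((int) val & 0x0f);
                if (val >= 0x0f) {
                    out.writeByte(b | 0x10);
                    writeVLong(val >>> 4, out);
                } else {
                    out.writeByte(b);
                }
            } else {
                out.writeByte(LONG);
                out.writeLong(val);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static long decode(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            int tag = in.readUnsignedByte();
            if (tag == LONG) {
                return in.readLong();
            }
            long v = tag & 0x0F;
            if ((tag & 0x10) != 0) {
                v = (readVLong(in) << 4) | v;
            }
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (long val : new long[] {13L, 287L}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(val)) {
                hex.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(val + " -> " + hex.toString().trim() + " -> " + decode(encode(val)));
        }
    }
}
```

Under these assumptions encode(13) yields the single byte 0x6D (01101101), and encode(287) yields 0x7F (01111111) followed by 0x11 (00010001), which matches the hand calculation.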