摘要:
进行insert 操作的时候,其中的 ha_innobase::write_row的参数是table->record[0], 那么此时record[0]中是什么时候将各个field中的数据写入到该record[0]缓存中呢?
本文进行详细的分析。
数据从SQL层传递到innodb引擎层分析
发现很多人在问:
我在进行insert 操作的时候,其中的 ha_innobase::write_row的参数是table->record[0], 那么此时record[0]中是什么时候将各个field中的数据写入到该record[0]缓存中呢?下面是insert 操作时候的call stack
#0 row_mysql_store_col_in_innobase_format (dfield=0x7fda5c1923f8, buf=0x7fda5c192369 "\200", row_format_col=true, mysql_data=0x7fda5c19dc41 "\r", col_len=4, comp=1) at /home/lihao/workshop/stonedb-ver-2.0/storage/innobase/row/row0mysql.cc:405
#1 0x000055accf3c374a in row_mysql_convert_row_to_innobase (row=0x7fda5c1923b8, prebuilt=0x7fda5c191a98, mysql_rec=0x7fda5c19dc40 "\371\r", heap=0x7fdb186f2b20) at /home/lihao/workshop/stonedb-ver-2.0/storage/innobase/row/row0mysql.cc:622
#2 0x000055accf3c6150 in row_insert_for_mysql_using_ins_graph (mysql_rec=0x7fda5c19dc40 "\371\r", prebuilt=0x7fda5c191a98) at /home/lihao/workshop/stonedb-ver-2.0/storage/innobase/row/row0mysql.cc:1566
#3 0x000055accf3c689a in row_insert_for_mysql (mysql_rec=0x7fda5c19dc40 "\371\r", prebuilt=0x7fda5c191a98) at /home/lihao/workshop/stonedb-ver-2.0/storage/innobase/row/row0mysql.cc:1715
#4 0x000055accf15f2bc in ha_innobase::write_row (this=0x7fda5c17c0f0, record=0x7fda5c19dc40 "\371\r") at /home/lihao/workshop/stonedb-ver-2.0/storage/innobase/handler/ha_innodb.cc:9064
#5 0x000055accdc53f68 in handler::ha_write_row (this=0x7fda5c17c0f0, buf=0x7fda5c19dc40 "\371\r") at /home/lihao/workshop/stonedb-ver-2.0/sql/handler.cc:8607
#6 0x000055accdfc3c78 in write_record (thd=0x7fda5c001040, table=0x7fda5c189060, info=0x7fdb186f3320, update=0x7fdb186f33a0) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_insert.cc:2175
#7 0x000055accdfbf237 in Sql_cmd_insert_values::execute_inner (this=0x7fda5c0f76f0, thd=0x7fda5c001040) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_insert.cc:640
#8 0x000055accd8cd0b6 in Sql_cmd_dml::execute (this=0x7fda5c0f76f0, thd=0x7fda5c001040) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_select.cc:598
#9 0x000055accd8409ae in mysql_execute_command (thd=0x7fda5c001040, first_level=true) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_parse.cc:3577
#10 0x000055accd84628a in dispatch_sql_command (thd=0x7fda5c001040, parser_state=0x7fdb186f4a90) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_parse.cc:5239
#11 0x000055accd83bd09 in dispatch_command (thd=0x7fda5c001040, com_data=0x7fdb186f53e0, command=COM_QUERY) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_parse.cc:1959
#12 0x000055accd839d80 in do_command (thd=0x7fda5c001040) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_parse.cc:1362
#13 0x000055accda7666d in handle_connection (arg=0x55acd5ced050) at /home/lihao/workshop/stonedb-ver-2.0/sql/conn_handler/connection_handler_per_thread.cc:302
#14 0x000055accf99ae7c in pfs_spawn_thread (arg=0x55acd5d440f0) at /home/lihao/workshop/stonedb-ver-2.0/storage/perfschema/pfs.cc:2942
#15 0x00007fdb48e53609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007fdb48573133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
从上面可以看出,handler::ha_write_row的时候其会调用innodb的write_row来进行真正的数据写人。同时,在Innodb内部会将由每个field所表示的数据的逻辑格式转为innodb接受的物理格式。该项操作由row_mysql_store_col_in_innobase_format 函数来完成。
在fill_record_n_invoke_before_triggers函数中会变量每个Field和Item 从而将由Item所标识的插入的值保存至其对应的Field中。该项功能由fill_record函数来完成。以int类型为例
insert into xxx values (1,xxx);
此时会将1所对应的Item_long这个对象中获取到其值,int_val,并将该值保存至其Field类型的对象中。
#0 Field_long::store (this=0x7fda5c19dcc0, nr=13, unsigned_val=false) at /home/lihao/workshop/stonedb-ver-2.0/sql/field.cc:3713
#1 0x000055accdca9198 in save_int_value_in_field (field=0x7fda5c19dcc0, nr=13, null_value=false, unsigned_flag=false) at /home/lihao/workshop/stonedb-ver-2.0/sql/item.cc:6736
#2 0x000055accdca9202 in Item_int::save_in_field_inner (this=0x7fda5c0f6010, field=0x7fda5c19dcc0, no_conversions=false) at /home/lihao/workshop/stonedb-ver-2.0/sql/item.cc:6759
#3 0x000055accdca8364 in Item::save_in_field (this=0x7fda5c0f6010, field=0x7fda5c19dcc0, no_conversions=false) at /home/lihao/workshop/stonedb-ver-2.0/sql/item.cc:6551
#4 0x000055accd73c04e in fill_record (thd=0x7fda5c001040, table=0x7fda5c189060, fields=..., values=..., bitmap=0x0, insert_into_fields_bitmap=0x0, raise_autoinc_has_expl_non_null_val=true) at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_base.cc:9567
#5 0x000055accd73cbba in fill_record_n_invoke_before_triggers (thd=0x7fda5c001040, optype_info=0x7fdb186f3320, fields=..., values=..., table=0x7fda5c189060, event=TRG_EVENT_INSERT, num_fields=2, raise_autoinc_has_expl_non_null_val=true, is_row_changed=0x0)
at /home/lihao/workshop/stonedb-ver-2.0/sql/sql_base.cc:9894
而经过fill_record函数后,那么此时table->record[0]中的数据已经发生变化,变为由各个列中的数据构成了。
bool fill_record(THD *thd, TABLE *table, const mem_root_deque<Item *> &fields,
const mem_root_deque<Item *> &values, MY_BITMAP *bitmap,
MY_BITMAP *insert_into_fields_bitmap,
bool raise_autoinc_has_expl_non_null_val) {
DBUG_TRACE;
assert(CountVisibleFields(fields) == CountVisibleFields(values));
/*
In case when TABLE object comes to fill_record() from Table Cache it
should have autoinc_field_has_explicit_non_null_value flag set to false.
In case when TABLE object comes to fill_record() after processing
previous row this flag should be reset to false by caller.
Code which implements LOAD DATA is the exception to the above rule
as it calls fill_record() to handle SET clause, after values for
the columns directly coming from loaded from file are set and thus
autoinc_field_has_explicit_non_null_value possibly set to true.
*/
assert(table->autoinc_field_has_explicit_non_null_value == false ||
(raise_autoinc_has_expl_non_null_val &&
thd->lex->sql_command == SQLCOM_LOAD));
auto value_it = VisibleFields(values).begin();
for (Item *fld : VisibleFields(fields)) {
Item_field *const field = fld->field_for_view_update();
assert(field != nullptr && field->table_ref->table == table);
Field *const rfield = field->field;
Item *value = *value_it++;
/* If bitmap over wanted fields are set, skip non marked fields. */
if (bitmap && !bitmap_is_set(bitmap, rfield->field_index())) continue;
bitmap_set_bit(table->fields_set_during_insert, rfield->field_index());
if (insert_into_fields_bitmap)
bitmap_set_bit(insert_into_fields_bitmap, rfield->field_index());
/* Generated columns will be filled after all base columns are done. */
if (rfield->is_gcol()) continue;
if (raise_autoinc_has_expl_non_null_val &&
rfield == table->next_number_field)
table->autoinc_field_has_explicit_non_null_value = true;
/*
We handle errors from save_in_field() by first checking the return
value and then testing thd->is_error(). thd->is_error() can be set
even when save_in_field() does not return a negative value.
@todo save_in_field returns an enum which should never be a negative
value. We should change this test to check for correct enum value.
The below call can reset TABLE::autoinc_field_has_explicit_non_null_value
flag depending on value provided (for details please see
set_field_to_null_with_conversions()). So evaluation of this flag can't
be moved outside of fill_record(), to be done once per statement.
*/
if (value->save_in_field(rfield, false) < 0) {
my_error(ER_UNKNOWN_ERROR, MYF(0));
return true;
}
if (thd->is_error()) return true;
}
if (table->has_gcol() &&
update_generated_write_fields(bitmap ? bitmap : table->write_set, table))
return true;
/*
TABLE::autoinc_field_has_explicit_non_null_value should not be set to
true in raise_autoinc_has_expl_non_null_val == false mode.
*/
assert(table->autoinc_field_has_explicit_non_null_value == false ||
raise_autoinc_has_expl_non_null_val);
return thd->is_error();
}
由fill_record函数中,我们并未发现其有操作table->record[0]的代码,那么Field又是如何与table->record[0]发生关联呢?使得我们value->save_in_field函数中的操作可以之间写人到table->record[0]中呢?总要又一个地方使得Field和Table中的record[0]发生关联?从而使得对应Field中的ptr的操作可以直接反映到对Table中的record[0]的操作。那么这个地方在哪里呢?答案在:
int open_table_from_share(THD *thd, TABLE_SHARE *share, const char *alias,
uint db_stat, uint prgflag, uint ha_open_flags,
TABLE *outparam, bool is_create_table,
const dd::Table *table_def_param)
这个函数中。下面我们就看看其是如何发生关联的。
首先在该函数中会在outparam创建一个新的Table对象
new (outparam) TABLE();
outparam->in_use = thd;
outparam->s = share;
outparam->db_stat = db_stat;
outparam->write_row_record = nullptr;
然后对该outparam进行初始,其中一项是分配record内存并将其赋值给outparam中的record.
record = root->ArrayAlloc<uchar>(share->rec_buff_length * records +
share->null_bytes);
if (record == nullptr) goto err; /* purecov: inspected */
if (records == 0) {
/* We are probably in hard repair, and the buffers should not be used */
outparam->record[0] = outparam->record[1] = share->default_values;
has_default_values = true;
} else {
outparam->record[0] = record;
if (records > 1)
outparam->record[1] = record + share->rec_buff_length;
else
outparam->record[1] = outparam->record[0]; // Safety
}
outparam->null_flags_saved = record + (records * share->rec_buff_length);
memset(outparam->null_flags_saved, '\0', share->null_bytes);
而后,创建N+1个Field对象并将其复制给outparam中的field:
if (!(field_ptr = root->ArrayAlloc<Field *>(share->fields + 1)))
goto err; /* purecov: inspected */
outparam->field = field_ptr;
record = (uchar *)outparam->record[0] - 1; /* Fieldstart = 1 */
outparam->null_flags = (uchar *)record + 1;
然后对outparam中的field进行复制(由table cache中)mysql中的注释已经说明了一切:
/*
We will create fields by cloning TABLE_SHARE's fields; then we will need
to make all new fields' pointers point into the new TABLE's record[0], by
applying an offset to them.
Calculate the "source" offset depending on table type:
- For non-internal temporary tables, source is share->default_values
- For internal tables, source is first TABLE's record[0], which
happens to be created in same memory block as share->default_values, with
offset 2 * share->rec_buff_length (see create_tmp_table()).
*/
"make all new fields' pointers point into the new TABLE's record[0]" 这样就把每个field指向了record[0],因为可以获取到每个field的类型和数据的长度,因而在后续的fill_record函数中我们对Field的store操作,其实质上也是对record[0]的操作。当我们完成对应所有Field的store操作后,那么我们在reocd[0]中或获得了每个列的数据。
record[0] memory layout:
-----------------------------------------------------
Field1 | Field2 | ... | FieldN
-----------------------------------------------------
/* Setup copy of fields from share, but use the right alias and record */
for (i = 0; i < share->fields; i++, field_ptr++) {
Field *new_field = share->field[i]->clone(root);
*field_ptr = new_field;
if (new_field == nullptr) goto err;
new_field->init(outparam);
new_field->move_field_offset(move_offset);
/*
Initialize Field::pack_length() number of bytes for new_field->ptr
only if there are no default values for the field.
*/
if (!has_default_values) new_field->reset();
/* Check if FTS_DOC_ID column is present in the table */
if (outparam->file &&
(outparam->file->ha_table_flags() & HA_CAN_FULLTEXT_EXT) &&
!strcmp(outparam->field[i]->field_name, FTS_DOC_ID_COL_NAME))
fts_doc_id_field = new_field;
}
结语:
原文发表于 MySQL中如何讲数据添加至table->record[0]
欢迎关注公众号 DatabaseHacker, 不定时分享MySQL,PostgreSQL等数据库最新内核分析及相关知识。