A Chance Encounter with Serialization: Serializer
- MD doCumEnT: 3/13/2016 6:06:17 AM by Jimbowhy
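I actually quite understand why bearded foreign programmers love to describe that state with "f**king code": sometimes, when eyes and hands get into a coding groove together, it really does feel like that! - by Jimbowhy, 3/13/2016 7:55:10 AM
A Chance Encounter with Serialization
While playing on LeetCode, I accidentally opened the Mock Interview feature and got a 60-minute problem. Oh, serialization! It is an essential technique in programming, and one of the places where it shows its power best is remote object transfer: an object in a program running on the local machine is sent over the network to another running program. Isn't that great? Serialization is also one of the six core mechanisms of MFC, where it is used for file storage. In short, serialization and deserialization are among the technologies that excite me. Today I am going to solve a serialization problem on LeetCode. The original problem statement: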
No. 297 Serialize and Deserialize Binary Tree
Total Accepted: 15172 Total Submissions: 56198 Difficulty: Hard
Remaining time: 38 minutes, 29 seconds.
Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be reconstructed later in the same or another computer environment.
Design an algorithm to serialize and deserialize a binary tree. There is no restriction on how your serialization/deserialization algorithm should work. You just need to ensure that a binary tree can be serialized to a string and this string can be deserialized to the original tree structure.
For example, you may serialize the following tree
        1
       / \
      2   3
         / \
        4   5
as “[1,2,3,null,null,4,5]”, just the same as how LeetCode OJ serializes a binary tree. You do not necessarily need to follow this format, so please be creative and come up with different approaches yourself.
Note: Do not use class member/global/static variables to store states. Your serialize and deserialize algorithms should be stateless.
Credits: Special thanks to @Louis1992 for adding this problem and creating all test cases.
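In short, the problem provides a binary tree object and asks you to implement its serialization and deserialization. Together with the earlier wildcard matching problem, Cross Self, and other fun problems, these past few weeks on LeetCode have been thoroughly satisfying! I still have unfinished articles on Forth and on image processing with Mathematica, so after finishing this problem I will take a break from LeetCode for a while.
And then there is CSDN's Q&A section, which bans accounts at the slightest provocation. That is really unreasonable: I do not even know why I was banned!
The Coding Process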
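This problem, also rated Hard, took quite some time. I used the opportunity to get familiar with some commonly used basic C++ classes, including using string to handle binary data and using deque, a double-ended queue, for the level-by-level traversal needed later. deque, vector, and list are the three classic sequence types of the STL (strictly speaking, only list is a linked structure; vector is a contiguous array). They are essential tools. Using them well is not trivial, but once you understand the underlying data structures they are not hard either. Textbooks always like to call them "containers", which I find hard to accept: a linked-list data structure does not seem to have much to do with a container! Along the way I also tried to use stringstream to print a diagram of the binary tree, but unfortunately without success. Below is the code I used to test how string handles binary data, and it works quite well:

    void test(){
        char data[] = {'A','b',0x00,'C','D'};
        // a stops at the first NUL byte; b keeps all 5 bytes; c starts as 5 NUL bytes
        string a(data), b(data, sizeof(data)), c(5, 0x00);
        cout << "string a: " << a << "\t" << a.size() << endl;
        cout << "string b: " << b << "\t" << b.size() << endl;
        // keep addresses as pointers: casting a pointer to int fails on 64-bit targets
        const void *a1st = a.data();
        const void *a2nd = a.data() + 1;
        memcpy( &c[0], data, sizeof(data) );
        cout << "string b+c: " << b+c << "\t" << (b+c).length() << endl;
        cout << c << "\t" << c.size() << "\taddress: " << a1st << "\t" << a2nd << endl;
    }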
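The same idea extends from characters to whole integers, which is what the serializer needs. Below is a minimal sketch, assuming the data is written and read on the same machine (the byte order and the size of int are whatever the platform uses). The helper names `packInt` and `unpackInt` are hypothetical and do not appear in the original program.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: append the raw bytes of `value` to `buf`.
// std::string tracks its own length, so zero bytes inside the value are kept.
void packInt(std::string &buf, int value) {
    buf.append(reinterpret_cast<const char *>(&value), sizeof(value));
}

// Hypothetical helper: read an int back from `buf`, starting at byte offset `pos`.
// memcpy copies all sizeof(int) bytes; the caller must ensure they are in range.
int unpackInt(const std::string &buf, std::size_t pos) {
    int value = 0;
    std::memcpy(&value, buf.data() + pos, sizeof(value));
    return value;
}
```

Note that reading a single character such as `(int)buf[pos]` would recover only one byte of the value, so a full-width copy like the one in `unpackInt` is needed to get the whole integer back.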
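When I started coding I even ran into NULL being undefined. The C++ header cstdio (among others, such as cstddef) defines it roughly as shown below, so even without the header you can define the NULL pointer by hand:

    #ifndef NULL
    #ifdef __cplusplus
    #define NULL 0
    #else
    #define NULL ((void *)0)
    #endif
    #endif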
The problem supplies the following linked node structure, from which the binary tree in the input data is built:

    typedef struct TreeNode{
        int val;
        TreeNode *left;
        TreeNode *right;
        TreeNode(int x) : val(x), left( NULL ), right( NULL) {}
    } TreeNode;
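For handling the binary tree, recursion was my first idea for serialization. Although my earlier articles criticized recursion, for this kind of problem recursion is the most efficient approach; serializing a binary tree without recursion is just asking for trouble. The tree may be incomplete, so some child nodes will certainly be missing. Therefore, when defining the serialized data format, I use a magic number: a single byte is enough to record whether a node has left and right children. This magic byte and the node's value are stored together as one data unit, and recursion concatenates these units into a whole. Writing the serialization code went very smoothly; the only less familiar part was using memcpy to copy the integer value into the string. The magic number has three nonzero values (0x01, 0x02, 0x03), which are really just two bits: the lowest bit describes the right child and the second bit describes the left child. Two bits can express four states, and a set bit means the corresponding child exists:

    string serialize(TreeNode* root) {
        if( root==NULL ) return string("");
        TreeNode &rt = *root;
        const size_t msize = sizeof( rt.val );  // size of the node value in bytes
        char magic = 0x00;
        string s(msize+1, 'x');                 // placeholder: magic byte + value
        if( rt.left != NULL ){
            magic = magic | 0x02;               // bit 1: has left child
            s += serialize(rt.left);
        }
        if( rt.right != NULL ){
            magic = magic | 0x01;               // bit 0: has right child
            s += serialize(rt.right);
        }
        s[0] = magic;                           // header byte
        memcpy( &s[1], &rt.val, msize );        // raw bytes of the value
        return s;
    }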
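Because each record mixes a flag byte with raw integer bytes, the serialized string is hard to read when printed directly. The sketch below shows one way to inspect the layout as hex. It assumes the record format described above (flag byte, then value, then left records, then right records); `encode` is a compact restatement written for this example, and `hexDump` is a hypothetical helper, neither being the original code.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

// Record layout: [flag byte][raw int value][left subtree records][right subtree records].
std::string encode(const TreeNode *root) {
    if (root == NULL) return std::string();
    char flags = 0;
    if (root->left)  flags |= 0x02;   // bit 1: has left child
    if (root->right) flags |= 0x01;   // bit 0: has right child
    std::string s(1, flags);
    s.append(reinterpret_cast<const char *>(&root->val), sizeof(root->val));
    s += encode(root->left);
    s += encode(root->right);
    return s;
}

// Hypothetical helper: render every byte as two hex digits, separated by spaces.
std::string hexDump(const std::string &bytes) {
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned byte = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof(buf), "%02x", byte);
        if (i) out += ' ';
        out += buf;
    }
    return out;
}
```

On a little-endian machine with a 4-byte int, a root 'A' whose only child is a right child 'x' would dump as `01 41 00 00 00 00 78 00 00 00`: the flag 0x01, the value 0x41, then the child's record with flag 0x00. The byte order inside each value depends on the machine, so this format is only safe to read back on a compatible platform.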
The trouble started with deserialization. The function signature given by the problem is:

    // Decodes your encoded data to tree.
    TreeNode* deserialize(string data) {
        //...
    }
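It takes a single parameter, which leaves no room to maneuver. The serialized data is built recursively, with the left-hand nodes written and read before the right-hand ones, so each recursive call needs to know how far into the data the previous calls have already read. The given signature alone cannot carry that position. I wondered whether it could be done with just the given signature, but after much head-scratching I found no way. The alternative would be to deserialize level by level, as LeetCode's own format does, but the serializer is already recursive, so a non-recursive deserializer would not match it. So I defined another function to do the actual deserialization. To keep the function given by the problem, the simplest effective approach is to add an overload:

    TreeNode* deserialize( string &data, int &index ){
        int val;
        const int msize = sizeof(val);
        char magic = data[index];                       // header byte
        memcpy( &val, data.data()+index+1, msize );     // raw bytes of the value
        TreeNode *root = new TreeNode( val );
        index += msize + 1;                             // advance past this record
        if( magic & 0x02 ){
            root->left = deserialize( data, index );
        }
        if( magic & 0x01 ){
            root->right = deserialize( data, index );
        }
        return root;
    }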
With this function the requirements of the problem are essentially met. The given deserialize function only needs a few lines of setup before it can run:

    // Decodes your encoded data to tree.
    TreeNode* deserialize(string data) {
        if(data.length()==0) return NULL;
        int index = 0;
        return deserialize(data, index);
    }
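One way to gain confidence in a serializer and deserializer pair is to compare the tree structurally after a round trip. The sketch below is self-contained and uses the same record format, but `encode`, `decode`, `sameTree`, and `freeTree` are hypothetical names written for this example, not the functions submitted to LeetCode.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

// Pre-order encoding: [flag byte][raw int value][left records][right records].
std::string encode(const TreeNode *root) {
    if (root == NULL) return std::string();
    char flags = 0;
    if (root->left)  flags |= 0x02;
    if (root->right) flags |= 0x01;
    std::string s(1, flags);
    s.append(reinterpret_cast<const char *>(&root->val), sizeof(root->val));
    return s + encode(root->left) + encode(root->right);
}

// Recursive decoder: `pos` is the read cursor, advanced as each record is consumed.
TreeNode *decode(const std::string &data, std::size_t &pos) {
    char flags = data[pos];
    int val = 0;
    std::memcpy(&val, data.data() + pos + 1, sizeof(val));
    pos += 1 + sizeof(val);
    TreeNode *node = new TreeNode(val);
    if (flags & 0x02) node->left = decode(data, pos);
    if (flags & 0x01) node->right = decode(data, pos);
    return node;
}

// Entry point with the single-argument shape required by the problem.
TreeNode *decode(const std::string &data) {
    if (data.empty()) return NULL;
    std::size_t pos = 0;
    return decode(data, pos);
}

// True when both trees have the same shape and the same values.
bool sameTree(const TreeNode *a, const TreeNode *b) {
    if (a == NULL || b == NULL) return a == b;
    return a->val == b->val && sameTree(a->left, b->left)
                            && sameTree(a->right, b->right);
}

// Release every node of a heap-allocated tree.
void freeTree(TreeNode *n) {
    if (n == NULL) return;
    freeTree(n->left);
    freeTree(n->right);
    delete n;
}
```

A check then amounts to building a tree, calling `decode(encode(tree))`, and testing `sameTree` on the two roots. Keeping the cursor in a local variable that is passed by reference also respects the problem's rule against member, global, or static state.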
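While writing the code I also wanted to print the binary tree as a character drawing with lines connecting the child nodes, but several attempts failed. I settled for the next best thing: printing the nodes level by level, with left and right angle brackets indicating whether a node has a left or right child:

    void print(TreeNode root){
        deque<TreeNode*> vn;
        int n=0, max_loop = 0xffff;     // max_loop guards against endless loops
        vn.push_back(&root);
        while(--max_loop){
            n = vn.size();              // number of nodes on the current level
            if( n<=0 ) break;
            while(n--){
                TreeNode &tn = *vn.front();
                vn.pop_front();
                if( tn.left ) cout << "<";
                cout << (char)tn.val << hex << "[" << tn.val << "]";
                if( tn.right ) cout << ">";
                cout << " ";
                if( tn.left!=NULL ) vn.push_back(tn.left);
                if( tn.right!=NULL ) vn.push_back(tn.right);
            }
            cout << endl;
        }
    }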
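This method uses the deque double-ended queue. While each level of the tree is scanned and printed, the nodes of the next level are queued at the same time, so input and output are handled together, which is relatively efficient. The test output is as follows: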
Source TreeNode:
<A[41]>
<x[78]> <x[78]>
<y[79]> <y[79]> <y[79]> <y[79]>
z[7a] z[7a] z[7a] z[7a] z[7a] z[7a] z[7a] z[7a]
Serialized:
A x y z z y z z x y z z y z z
And this deserialized:
<A[41]>
<x[78]> <x[78]>
<y[79]> <y[79]> <y[79]> <y[79]>
z[7a] z[7a] z[7a] z[7a] z[7a] z[7a] z[7a] z[7a]
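Drawing connecting lines is hard because every node's column depends on the width of its subtrees. A simpler alternative that still shows the shape is to print the tree rotated 90 degrees, with the root at the left margin and deeper nodes indented further. This is a different technique from the level-order print above, and the names `renderSideways` and `sideways` are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

// Reverse in-order traversal: the right subtree is emitted first, so after
// rotating the page clockwise the right child sits above its parent.
void renderSideways(const TreeNode *node, int depth, std::ostringstream &out) {
    if (node == NULL) return;
    renderSideways(node->right, depth + 1, out);
    out << std::string(depth * 4, ' ') << node->val << '\n';
    renderSideways(node->left, depth + 1, out);
}

// Returns the rotated drawing as text, one node per line.
std::string sideways(const TreeNode *root) {
    std::ostringstream out;
    renderSideways(root, 0, out);
    return out.str();
}
```

For the example tree from the problem statement (1 with children 2 and 3, where 3 has children 4 and 5), the result is five lines reading 5, 3, 4, 1, 2 from top to bottom, indented by 8, 4, 8, 0, and 4 spaces respectively.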
Complete Program Code
/*
* Serialize & deserialize demo by Jimbowhy
* 3/13/2016 7:19:17 AM
* compile: cls && g++ -o Serializer Serializer.cpp && Serializer.exe
*/
    #include <iostream>
    #include <string>
    #include <deque>
    #include <cstdio>
    #include <cstring>   // memcpy
    using namespace std;

    /**
     * Definition for a binary tree node.
     */
    typedef struct TreeNode{
        int val;
        TreeNode *left;
        TreeNode *right;
        TreeNode(int x) : val(x), left( NULL ), right( NULL) {}
    } TreeNode;

    class Codec {
    public:
        /*
         * Encodes a tree to a single string.
         * DATA FORMAT:
         *   Byte+Value+Left+Right, applied recursively to each subtree
         * BYTE FORMAT:
         *   0x01 has right, 0x02 has left, 0x03 both left & right
         */
        string serialize(TreeNode* root) {
            if( root==NULL ) return string("");
            TreeNode &rt = *root;
            const size_t msize = sizeof( rt.val );
            char magic = 0x00;
            string s(msize+1, 'x');
            if( rt.left != NULL ){
                magic = magic | 0x02;
                s += serialize(rt.left);
            }
            if( rt.right != NULL ){
                magic = magic | 0x01;
                s += serialize(rt.right);
            }
            s[0] = magic;
            memcpy( &s[1], &rt.val, msize );
            return s;
        }

        // Decodes your encoded data to tree.
        TreeNode* deserialize(string data) {
            if(data.length()==0) return NULL;
            int index = 0;
            return deserialize(data, index);
        }

        // Recursive worker: index is the read position inside data.
        TreeNode* deserialize( string &data, int &index ){
            int val;
            const int msize = sizeof(val);
            //int val = (int)(data[index+1]); // not working: reads only one byte
            char magic = data[index];
            memcpy( &val, data.data()+index+1, msize );
            TreeNode *root = new TreeNode( val );
            index += msize + 1;
            if( magic & 0x02 ){
                root->left = deserialize( data, index );
            }
            if( magic & 0x01 ){
                root->right = deserialize( data, index );
            }
            return root;
        }

        // build tree: gives root two children with value i, recursing until i reaches e.
        TreeNode * build(int i, int e, TreeNode * root) {
            TreeNode *r = new TreeNode(i);
            TreeNode *l = new TreeNode(i);
            root->left = l;
            root->right = r;
            if(i<e){
                build(i+1, e, l);
                build(i+1, e, r);
            }
            return root;
        }

        // Prints the tree level by level; < and > mark left and right children.
        void print(TreeNode root){
            deque<TreeNode*> vn;
            int n=0, max_loop = 0xffff;
            vn.push_back(&root);
            while(--max_loop){
                n = vn.size();
                if( n<=0 ) break;
                while(n--){
                    TreeNode &tn = *vn.front();
                    vn.pop_front();
                    if( tn.left ) cout << "<";
                    cout << (char)tn.val << hex << "[" << tn.val << "]";
                    if( tn.right ) cout << ">";
                    cout << " ";
                    if( tn.left!=NULL ) vn.push_back(tn.left);
                    if( tn.right!=NULL ) vn.push_back(tn.right);
                }
                cout << endl;
            }
        }
    };

    int main(){
        TreeNode root('A');
        Codec cd;
        cd.build('x','z', &root);
        string s(cd.serialize(&root));
        cout << "Source TreeNode: \n";
        cd.print(root);
        cout << "Serialized: \n" << s << endl;
        cout << "And this deserialized: \n";
        root = *cd.deserialize(s);
        cd.print(root);
        TreeNode t1(1+'a');
        t1.left = new TreeNode(2+'a');
        t1.left->left = new TreeNode(255+256);
        string t1s( cd.serialize(&t1) );
        t1 = *cd.deserialize(t1s);
        cout << "Test t1:\n" << t1s << endl;
        cd.print(t1);
        return 0;
    }
The submission passed all 47 test cases and was accepted by LeetCode, with a run time of 36 ms and a runtime efficiency ranking of 98%.