When converting data, any kind of dirty data can show up, so do not expect the input to be as clean as you would like.
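1. When writing files, watch out for tabs and line breaks inside incoming fields
When reading a file, we call readLine and then split each line on \t to get the fields.
When writing a file, we use \n to end each record.
Problems arise when a field value itself contains a tab or a line break. In the record below, "refer" starts with \r\n: written out as-is, the record is broken across lines and the fields shift out of place during conversion.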
{"code":"CUXZJS","refer":"\r\nDV8HFI","referPid":null,"people":[],"iosPushToken":""}
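One way to guard against this is to clean each field before it is joined into a line. The following is only a minimal sketch: the class and method names (TsvWriterSketch, sanitizeField, toTsvLine) are hypothetical, and replacing the offending characters with a single space is an assumed policy, not something the original pipeline prescribes. Escaping the characters or rejecting the record may suit your data better.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TsvWriterSketch {
    // Hypothetical helper: collapse any run of tabs, carriage returns, or
    // line feeds into one space so the field cannot break the record layout.
    // Null fields are written as empty strings (an assumption).
    static String sanitizeField(String field) {
        if (field == null) {
            return "";
        }
        return field.replaceAll("[\\t\\r\\n]+", " ");
    }

    // Join the cleaned fields with tabs and end the record with \n.
    static String toTsvLine(List<String> fields) {
        return fields.stream()
                .map(TsvWriterSketch::sanitizeField)
                .collect(Collectors.joining("\t")) + "\n";
    }

    public static void main(String[] args) {
        // Values taken from the sample record above: code, refer, referPid, iosPushToken.
        String line = toTsvLine(Arrays.asList("CUXZJS", "\r\nDV8HFI", null, ""));
        System.out.print(line);
    }
}
```

With this policy the "refer" value is written as " DV8HFI" (with a leading space), so the record stays on one line and keeps its four columns.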
2. If you use Java, be careful with the split function
Suppose a record looks like this:
A\tB\tC\t\t  Note that this is five fields, A, B, C, D, E, where D and E arrive as empty strings.
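String a = "A\tB\tC\t\t";
String[] arrStrings = a.split("\t");
A plain split like this does not keep every field: trailing empty strings are discarded, so the array ends up with only three elements, [A, B, C].
To keep all the fields, use split(regex, -1) instead: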
String a = "A\tB\tC\t\t";
String[] arrStrings = a.split("\t", -1);
This time the array is [A, B, C, , ]: five elements, the last two being empty strings.
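Since dirty input is always possible, it can help to check the field count right after splitting. The sketch below assumes a hypothetical helper, parseLine, that splits with a limit of -1 and rejects any line whose column count differs from what the caller expects. Throwing an exception is just one possible reaction; logging and skipping the line would work too.

```java
class TsvLineParser {
    // Hypothetical helper: split on tabs while keeping trailing empty fields,
    // then verify that the line has exactly the expected number of columns.
    static String[] parseLine(String line, int expectedFields) {
        String[] fields = line.split("\t", -1);
        if (fields.length != expectedFields) {
            throw new IllegalArgumentException(
                    "expected " + expectedFields + " fields but got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parseLine("A\tB\tC\t\t", 5);
        System.out.println(fields.length); // prints 5
    }
}
```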
The definition in the JDK source explains why:
public String[] split(String regex, int limit)
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
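The three cases in that description can be checked against the same sample record. This is a small illustrative sketch; the class name SplitLimitDemo and its helper are made up for the example.

```java
class SplitLimitDemo {
    // Hypothetical helper: number of elements produced for a given limit.
    static int fieldCount(String line, int limit) {
        return line.split("\t", limit).length;
    }

    public static void main(String[] args) {
        String a = "A\tB\tC\t\t";

        // limit > 0: the pattern is applied at most limit - 1 times,
        // so the last element holds the unsplit remainder "B\tC\t\t".
        System.out.println(fieldCount(a, 2));  // prints 2

        // limit == 0: same as split("\t"); trailing empty strings are discarded.
        System.out.println(fieldCount(a, 0));  // prints 3

        // limit < 0: every field is kept, including trailing empty ones.
        System.out.println(fieldCount(a, -1)); // prints 5
    }
}
```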