上篇文章是实现了csv ScannableTable, 现在在那基础上增加使用FilterableTable实现
先说一下流程
- 创建MyFilterableTable, 实现FilterableTable
- 修改CsvSchema, 增加判断创建FilterableTable类型的表
- 修改my_csv_model.json的配置
1 MyFilterableTable
与Scannable代码基本一致, 不过实现的是FilterableTable, 重写的scan方法多了一个参数(这个参数暂时不用管)
public class MyFilterableTable extends AbstractTable implements FilterableTable {
private String path;
private Source source;
public MyFilterableTable(String path) throws MalformedURLException {
this.path = path;
this.source = Sources.of(new URL(path));
}
@Override
public RelDataType getRowType(RelDataTypeFactory relDataTypeFactory) {
JavaTypeFactory typeFactory = (JavaTypeFactory) relDataTypeFactory;
//字段名
List<String> names = new ArrayList<>();
//类型
List<RelDataType> types = new ArrayList<>();
try {
// 读取文件内容
BufferedReader br = new BufferedReader(new FileReader(source.file()));
// 读取第一行,规定第一行是字段信息, 如 ID:VARCHAR,NAME1:VARCHAR,NAME2:VARCHAR
String line = br.readLine();
String[] lines = line.split(",");
for (String column : lines) {
// 字段名
String name = column.split(":")[0];
// 字段类型
String type = column.split(":")[1];
names.add(name);
types.add(typeFactory.createSqlType(SqlTypeName.get(type)));
}
} catch (Exception e) {
e.printStackTrace();
}
return typeFactory.createStructType(Pair.zip(names,types));
}
@Override
public Enumerable<Object[]> scan(DataContext root, List<RexNode> list) {
JavaTypeFactory typeFactory = requireNonNull(root.getTypeFactory(), "typeFactory");
List<String> selfFilter = new ArrayList<>();
return new AbstractEnumerable<Object[]>() {
@Override
public Enumerator<Object[]> enumerator() {
return new CsvEnumerator(source);
}
};
}
}
2 CsvSchema
CsvSchema的getTableMap方法, 增加判断FILTERABLE类型
public class CsvSchema extends AbstractSchema {
private Map<String, Table> tableMap;
private Map<String, Object> operand;
public CsvSchema(Map<String, Object> operand) {
this.operand = operand;
}
@Override
protected Map<String, Table> getTableMap() {
if (tableMap == null) {
tableMap = new HashMap<>();
List<Map<String, String>> tables = (ArrayList<Map<String, String>>) operand.get("tables");
for (Map<String, String> table : tables) {
String type = table.get("type");
String path = table.get("path");
String name = table.get("name");
try {
if("SCANNABLE".equals(type)){
tableMap.put(name, new MyScannableTable(path));
}else if("FILTERABLE".equals(type)){
tableMap.put(name, new MyFilterableTable(path));
}
} catch (MalformedURLException e) {
e.printStackTrace();
}
}
}
return tableMap;
}
}
3 my_csv_model.json
把csv_user的type改成FILTERABLE
{ "version": "1.0", "defaultSchema": "csv", "schemas": [ { "name": "csv", "type": "custom", "factory": "com.cxydevelop.calcite.csv.CsvSchemaFactory", "operand": { "tables": [ { "name": "csv_user", "path": "file:///D:\\cxy\\idea_workspace\\myproject\\calcite\\src\\main\\resources\\csv_dir\\csv_user.csv", "type": "FILTERABLE" },{ "name": "csv_detail", "path": "file:///D:\\cxy\\idea_workspace\\myproject\\calcite\\src\\main\\resources\\csv_dir\\csv_detail.csv", "type": "SCANNABLE" } ] } } ] }
4 测试
@Test
public void query() throws Exception {
List<String> sqls = new ArrayList<>();
sqls.add("select * from csv.csv_user");
sqls.add("select * from csv.csv_user where (id ='1' and name = 'name1') or (id = '3' and name = 'name3')");
sqls.add("select t1.id, t1.name, t2.age, t2.desc from csv.csv_user t1 left join csv.csv_detail t2 on t1.id = t2.id");
for (String sql : sqls) {
System.out.println("执行sql:" + sql);
printResultSet(conn.createStatement().executeQuery(sql));
}
}
5 结果
执行sql:select * from csv.csv_user
[1, name1, www1]
[2, name2, eee2]
[3, name3, qwe3]
[4, name4, qwe4]
[5, name5, qwe5]
[6, name6, qwe6]
执行sql:select * from csv.csv_user where (id ='1' and name = 'name1') or (id = '3' and name = 'name3')
[1, name1, www1]
[3, name3, qwe3]
执行sql:select t1.id, t1.name, t2.age, t2.desc from csv.csv_user t1 left join csv.csv_detail t2 on t1.id = t2.id
[1, name1, 11, aaa]
[2, name2, 22, bbb]
[3, name3, 33, cccc]
[4, name4, null, null]
[5, name5, null, null]
[6, name6, null, null]
可以看到, 通过sql可以正确的查询出csv的数据.
6 问题
回到刚刚scan方法的第二个List类型的参数, 打断点可以发现, 这个参数是where的条件信息, 如果把list里面的元素删除掉, 打印结果可以发现sql中的过滤条件失效了
所以在FilterableTable中可以获取到sql中的过滤条件, 目的其实是想要让我们可以自己来处理, 把过滤条件下推到我们数据源(即我们在读取文件的地方CsvEnumerator), 效率会更高
7 官方CsvFilterableTable的源码
查看官方CsvFilterableTable的源码, 可以发现, 他也是做了一个很简单的处理 : 在scan方法中, 在过滤条件参数filters中把AND类型的条件递归提取出来存到filterValues, 然后传给CsvEnumerator
下面使用的是官方自带的csv连接器执行sql, 主要是看看源码执行过程中的细节
select * from csv.csv_user where (id ='1' and name = 'name1') or (id = '3' and name = 'name3')
现在执行的sql的条件是OR, 所以可以发现filter中的条件没有被提取出来并删掉, filterValues是空的
把sql改成如下
select * from csv.csv_user where id ='1' and name = 'name1'
代码的细节提取的条件是AND关系连接的关系或者一个EQUALS条件
比如
select * from csv.csv_user where id ='1' | 提取id = '1' ,并且会remove掉原来的 |
select * from csv.csv_user where id ='1' and name = 'name1' | 提取id ='1' 和name = 'name1',但不会remove掉原来的 |
select * from csv.csv_user where (id ='1' and name = 'name1') or (id = '3' and name = 'name3') | 不提取不会remove掉原来的 |