Thrax Tutorial

Thrax compiles grammars expressed as regular expressions and context-dependent rewrite rules into weighted finite-state transducers (WFSTs).

commands

  • thraxmakedep
    • --save_symbols=true
  • thraxcompiler
    • --input_grammar=example.grm
    • --output_far=example.far
    • --save_symbols=true|false (default: false)
  • thraxrewrite-tester
    • --far=example.far
    • --rules=TOKENIZER
  • thraxrandom-generator
    • --far=example.far
    • --rule=TOKENIZER
    • --noutput=10

general statements

  • Each statement is an assignment terminated by a semicolon.
  • Statements that start with the export keyword are written to the final output archive (FAR).
foo = "abc";
export bar = foo | "xyz";

string input

  • String FSTs are defined by text enclosed by quotes (").
  • Raw strings, such as filenames, are enclosed by single quotes (').
  • In the default parse mode, each arc of the resulting FST will correspond to a single 1-byte character.
  • In symbol table parse mode, the symbols in the string must be separated by a separator, which by default is a space.
    • A symbol table can be loaded with the SymbolTable built-in function.
symtab = SymbolTable['/path/to/bears.symtab'];
pb = "polar bear".symtab;
  • We can create temporary (generated) symbols by enclosing the symbol name in square brackets ([]) within an FST string.
  • If the bracketed name is a complete integer, that number is used directly as the arc label.
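
The two bracket rules above can be sketched as follows. This is a minimal illustration, not taken from the original post: the rule names and the tag name NOUN are hypothetical, and the second rule assumes the default byte parse mode, where label 32 is the space character.

```
# [NOUN] is not an integer, so it becomes a single generated symbol
# (one arc) rather than six separate byte arcs.
export tag_noun = "bear" ("" : "[NOUN]");

# [32] is a complete integer, so 32 is used directly as the arc label.
# In byte mode this should accept the same string as "polar bear".
export with_space = "polar[32]bear";
```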

parse mode

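Append a dot and a mode name to a string to specify its parse mode explicitly:

  • byte: parse the string byte by byte. This is the default mode.
  • utf8: use UTF-8 characters (one arc per character) for FST arcs.
a = "haha";          # byte (default)
b = "haha".byte;     # byte
c = "haha".utf8;     # utf8
d = "haha".symtab;   # symbol table (symtab is a previously loaded symbol table)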
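
For ASCII text such as "haha" the byte and utf8 modes build the same arcs, so the difference only shows up with multi-byte characters. A small sketch, assuming the grammar file is saved as UTF-8 (the rule names are made up for illustration):

```
# "é" is encoded as two bytes in UTF-8 (0xC3 0xA9).
export e_bytes = "é".byte;   # two arcs, one per byte
export e_utf8  = "é".utf8;   # one arc, labeled with the code point U+00E9
```

Because the arc labels differ, FSTs that are composed with each other generally need to be built with the same parse mode.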

function

func UnionWithTriple[fst] {
    fst3 = fst fst fst;
    result = fst | fst3;
    return result;
}

export a_or_a3 = UnionWithTriple["a"];

symbols

  • (): Group an expression to be evaluated first.
  • <>: Attach a weight to the FST.
foo = "aaa" <1>;
goo = "aaa" : "bbb" <-1>;
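
Grouping and weights are often used together to rank competing alternatives. The sketch below is an illustration only: the rule name is hypothetical, and it assumes the default (tropical) arc type, under which a lower total weight means a better path.

```
# Two competing rewrites of "a". Parentheses keep each weight attached
# to its own alternative before the union is taken.
export prefer_b = ("a" : "b" <1>) | ("a" : "c" <2>);
```

With these weights, the path producing "b" costs 1 and the path producing "c" costs 2, so a shortest-path search would pick "b".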

operations

  • Closure: repeats the argument FST.
    • fst*
    • fst+
    • fst?
    • fst{x,y}
  • Concatenation: follows the first FST with the second.
    • foo bar
  • Difference: accepts strings accepted by the first FST but not by the second.
    • foo - bar
  • Composition: composes the first FST with the second.
    • foo @ bar
  • Union: accepts either of the two FSTs.
    • foo | bar
  • Rewrite: rewrites strings matching the first to the second.
    • foo : bar
  • Determinize: Determinize[fst]
  • RmEpsilon: RmEpsilon[fst]
  • Minimize: Minimize[fst]
  • Optimize: Optimize[fst]
  • Reverse: Reverse[fst]
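
The operators above can be combined in one small grammar. This is a sketch with made-up rule names; every sub-expression is parenthesized so that it does not rely on operator precedence.

```
letter = "a" | "b" | "c";                    # union
word   = letter+;                            # closure: one or more letters

export short_word = letter{1,3};             # closure: one to three letters
export not_abc    = word - "abc";            # difference: any word except "abc"
export a_to_b     = "a" : "b";               # rewrite: maps "a" to "b"
export a_to_c     = ("a" : "b") @ ("b" : "c");  # composition: maps "a" to "c"
export tidy_word  = Optimize[word];          # function call on an FST
```

Only the exported rules end up in the output archive; letter and word are helper definitions.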

file functions

  • LoadFst: loads an FST from a file; LoadFstFromFar extracts one from a FAR.
    • LoadFst['/path/to/fst']
    • LoadFstFromFar['/path/to/far', 'fst_name']
  • StringFile: loads a file consisting of a list of strings or pairs of strings.
    • Compiles it (in byte mode) to an FST that represents the union of those strings. This is significantly more efficient for large lists.
      • StringFile['strings_file']
    • If the file contains single strings, one per line, then the resulting FST will be an acceptor.
    • If the file contains pairs of tab-separated strings, the result will be a transducer.
      • The parse modes for the left and right sides of the tab can be given as additional arguments.
        • StringFile['strings_file', 'byte', symbols]
  • SymbolTable: loads and returns the symbol table.
    • SymbolTable['/path/to/symtab']
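
To make the StringFile behaviour concrete, here is a sketch with an assumed input file. The file name plurals.tsv, its contents, and the rule names are hypothetical.

```
# Assumed contents of 'plurals.tsv', one tab-separated pair per line:
#   mouse<TAB>mice
#   goose<TAB>geese
# Each line becomes one input:output path, so the result is a transducer.
export irregular = StringFile['plurals.tsv'];

# Roughly the same mapping written out by hand as a union of rewrites:
export irregular_by_hand = ("mouse" : "mice") | ("goose" : "geese");
```

For a two-line file the hand-written union is just as good; the StringFile form is the one that scales to long lists.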