PROCESS
The PROCESS statement is used to convert one RowSet to another RowSet. Uses include:
- Transformations
- Adding/Removing columns
- Filtering data
- Expanding data
Syntax
All PROCESS statements require:
- PROCESS [input]
- PRODUCE column(s)
- USING processor
Additionally, users can supply a WHERE or HAVING clause to filter the input or the output.
Example (1)
This is taken from the Processor page.
PROCESS
PRODUCE A,B,C,D
USING MyProcessor(-a);
What this is saying is:
- Input the RowSet produced by the previous command
- Output the columns "A", "B", "C", and "D" with types determined by MyProcessor
- Use the Processor called "MyProcessor" with the argument "-a" passed in
Example (2)
This example shows how to use WHERE and HAVING to filter the input and output Row objects.
PROCESS
PRODUCE A,B,C,D
USING MyProcessor(-a)
WHERE A != "scope" //filter the output
HAVING A != "filter"; //filter the input
What this is saying is:
- Input the RowSet produced by the previous command
- Filter the input so that only rows having A not equal to "filter" will be input
- Output the columns "A", "B", "C", and "D" with types determined by MyProcessor
- Use the Processor called "MyProcessor" with the argument "-a" passed in
- Filter the output so that only rows having A not equal to "scope" will be output
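The order of the steps above can be modeled outside SCOPE with ordinary C#. The following is only a sketch under stated assumptions: rows are plain string arrays, `ProcessOrderSketch` and its members are hypothetical names, and the stand-in for MyProcessor simply rewrites "A" to "Apple". It is not how SCOPE itself implements PROCESS.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProcessOrderSketch
{
    // Number of rows that actually reached the stand-in processor.
    public static int RowsSeen;

    // Hypothetical stand-in for MyProcessor: rewrites "A" to "Apple" in column A.
    static IEnumerable<string[]> MyProcessor(IEnumerable<string[]> input)
    {
        foreach (string[] row in input)
        {
            RowsSeen++;
            string[] output = (string[])row.Clone();
            if (output[0] == "A") output[0] = "Apple";
            yield return output;
        }
    }

    public static List<string[]> Run(IEnumerable<string[]> previous)
    {
        RowsSeen = 0;
        // HAVING A != "filter": applied to the input, as described above.
        IEnumerable<string[]> filteredInput = previous.Where(r => r[0] != "filter");
        // USING MyProcessor: transforms the rows that remain.
        IEnumerable<string[]> produced = MyProcessor(filteredInput);
        // WHERE A != "scope": applied to the output, as described above.
        return produced.Where(r => r[0] != "scope").ToList();
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "filter", "1", "2", "3" },
            new[] { "scope", "4", "5", "6" },
            new[] { "A", "7", "8", "9" },
        };
        foreach (string[] row in Run(rows))
        {
            Console.WriteLine(string.Join(",", row)); // Apple,7,8,9
        }
        Console.WriteLine(RowsSeen); // 2
    }
}
```

In this model the "filter" row never reaches the processor, while the "scope" row is processed and then dropped from the output, which is why the counter reports 2 even though only one row is returned.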
Processors
Processors are used to take a RowSet and produce a RowSet. Typically, they are used for:
- Transformations (e.g., converting "A" to "Apple")
- Adding/removing columns
- Filtering (e.g., removing Row objects where the first column doesn't start with "A")
Internally, the WHERE and HAVING clauses are implemented as Processors, as are many SELECT transformations.
Syntax
Let's look at the following script:
PROCESS
PRODUCE A,B,C,D
USING MyProcessor(-a)
WHERE A != "scope"
HAVING A != "filter";
What this is saying is:
- Input the RowSet produced by the previous command
- Filter the input so that only rows having A not equal to "filter" will be further processed
- Output the columns "A", "B", "C", and "D"
- Use the Processor called "MyProcessor" with the argument "-a" passed in
- Filter the output so that only rows having A not equal to "scope" will be output
Note: The WHERE and HAVING clauses are optional, but the other clauses are required.
Writing a Processor
Syntax
The easiest thing to do is to right-click in the editor and select Implement-Processor. Here's an example of a Processor that will filter data:
#CS
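public class SampleProcessor : Processor
{
    // Declares the output schema; here it is identical to the input schema.
    public override Schema Produces(string[] requestedColumns, string[] args, Schema inputSchema)
    {
        return inputSchema.Clone();
    }

    // Streams the input rows and yields only the rows that pass the filter.
    public override IEnumerable<Row> Process(RowSet input, Row outputRow, string[] args)
    {
        foreach (Row row in input.Rows)
        {
            // Keep only rows whose first column does not start with "A"
            if (!row[0].String.StartsWith("A"))
            {
                // Copy the input row into the supplied outputRow and emit it
                row.Copy(outputRow);
                yield return outputRow;
            }
        }
    }
}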
#ENDCS
The two methods that need to be implemented are:
- Produces: returns the Schema of the output RowSet
- Process: reads the input RowSet and yields the output Row objects
Tips
- The outputRow object is passed in to Process; fill it in and yield it.
- Do not create your own Row object.
- Keep the work done per row small, since the body of the loop in Process runs for every input row.
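The pattern of filling and yielding a row supplied by the caller can be tried in plain C#, without SCOPE. In the sketch below, `SketchRow` and `ReuseSketch` are hypothetical stand-ins and not SCOPE types; the point is only to show that one object is overwritten and yielded repeatedly, so no row is allocated per iteration. Whether this is the reason behind the tips above is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a row; NOT the SCOPE Row type.
public sealed class SketchRow
{
    public string Value = "";
}

public static class ReuseSketch
{
    // Mirrors the shape of Process: fill the caller-supplied row, then yield it.
    public static IEnumerable<SketchRow> Process(IEnumerable<string> input, SketchRow outputRow)
    {
        foreach (string value in input)
        {
            if (!value.StartsWith("A", StringComparison.Ordinal))
            {
                outputRow.Value = value; // overwrite in place, no allocation per row
                yield return outputRow;  // the same object is yielded every time
            }
        }
    }

    public static void Main()
    {
        var shared = new SketchRow();
        foreach (SketchRow row in Process(new[] { "Apple", "Banana", "Cherry" }, shared))
        {
            // Each yielded row is the shared instance; read it before the next iteration.
            Console.WriteLine(row.Value + " " + ReferenceEquals(row, shared));
        }
        // Prints:
        // Banana True
        // Cherry True
    }
}
```

Because every yielded reference points at the same object, a consumer in this model has to read each row before requesting the next one; collecting the results into a list would leave every entry showing the last value written.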