Introduction
An OnStream User Defined Function (UDF) in Netezza is a piece of code compiled into binary file and called from within a SQL statement. UDFs are implemented in C++ now, but it is planned that other languages (such as Java, Python, R and so on) can be used in the future.
A Netezza box (NPS) consists of a central host (two hosts forming an NLB cluster, actually) and a bunch of SPU modules. The central host uses the RedHat operating system, while SPUs use an embedded Linux OS. A UDF runs on either a central NPS host or a SPU depending on how it is invoked. This is why it is compiled for two different target platforms and two different binaries are produced. The binaries are copied to every SPU as well as to a system location on the central hosts and registered with the database system to make them available for use in SQL statements.
Sample UDF description
I am going to show you how to implement a Netezza UDF using the following real world scenario.
It is a common practice to use natural and surrogate keys for referencing database entities and to provide entity traceability both internally and externally. A “natural key” is often a concatenation of values from several columns that come from an external system and used for uniquely identifying an entity.
For the sake of this exercise, consider a function that accepts 3 varchar columns to form a natural key (NK) using the following algorithm:
NK = upper(trim(column1)) || upper(trim(column2)) || upper(trim(column3))
Therefore, we are going to implement a UDF named generate_nk() that accepts three varchar arguments; part1, part2, and part3 and returns a natural key built in accordance with the formulae above.
How to create a UDF manually
A UDF in Netezza is a class that inherits from an abstract parent class. The base class for a UDF is Udf (what a coincidence of names!)
The base class defines several abstract functions that you as a UDF developer must implement. So let’s start by including the required headers and creating a declaration for our class:
// add base UDF support
#include "udxinc.h"
#include "udxhelpers.h"
// this is required because we are going to use a toupper() function declared in ctype.h
#include <ctype.h>
// start using right namespaces
using namespace nz::udx::dthelpers;
using namespace nz::udx;
// declare the class
class Cgenerate_nk : public Udf
{
public:
static Udf *instantiate();
virtual ReturnValue evaluate();
};
Now that we have the class skeleton, let’s have a look at the two functions that it contains:
1. static Udf instantiate();
This function is required for creating an instance of your UDF. Create a new instance of the Cgenerate_nk class and return it.
2. virtual ReturnValue evaluate();
This function does the actual calculations and returns a value to the caller.
The instantiate() function body is a simple one-liner (add the following code snippets immediately after the function declaration within the same file:
// This static function creates a new instance of the class.
Udf *Cgenerate_nk::instantiate()
{
return new Cgenerate_nk();
}
The evaluate() is a little bit more complex (it can be very complex in fact. Everything depends on what you are trying to achieve.) In our case, the evaluate() looks like this:
// Calculates and returns the function result.
ReturnValue Cgenerate_nk::evaluate()
{
// find out input string lengths
int len[3];
len[0] = isArgNull(0) ? 0 : stringArg(0)->length;
len[1] = isArgNull(1) ? 0 : stringArg(1)->length;
len[2] = isArgNull(2) ? 0 : stringArg(2)->length;
// create an array that holds input strings
char* *parts = new char*[3];
parts[0] = len[0] == 0 ? NULL : stringArg(0)->data;
parts[1] = len[1] == 0 ? NULL : stringArg(1)->data;
parts[2] = len[2] == 0 ? NULL : stringArg(2)->data;
// prepare return value object
StringReturn *pResult = stringReturnInfo();
pResult->data[0] = 0;
char*pCurrent = pResult->data;
// process arguments one by one in a loop
for (int i = 0; i < 3; i++)
{
if (len[i] > 0)
{
// trim leading spaces
int j = 0;
while (parts[i][j] == ' ')
j++;
// trim trailing spaces
int k = len[i] - 1;
while (k > j && parts[i][k] == ' ')
k--;
// copy non-space chars to the result buffer converting to upper case
while (j <= k)
{
*pCurrent = toupper(parts[i][j]);
pCurrent++;
j++;
}
}
}
// clean up
delete []parts;
// specify the output string length
pResult->size = pCurrent - pResult->data;
// done, return the generated NK
NZ_UDX_RETURN_STRING(pResult);
}
The UDF is implemented!
What is left is to compile the UDF and register it with the NPS. Here are the steps.
1. Save the file as Cgenerate_nk.cpp
2. Log on to the NPS host using SSH
3. Copy the Cgenerate_nk.cpp to the NPS. For the sake of this sample, I assume you are logged in to the Netezza as nz user so the home folder is /home/nz and the full UDF file path is /home/nz/Cgenerate_nk.cpp
4. Compile and register the UDF in the target database using the following shell command:
/nz/kit/bin/adm/nzudxcompile "/home/nz/Cgenerate_nk.cpp" -o "/home/nz/Cgenerate_nk.o" --sig "generate_nk(VARCHAR(50), VARCHAR(50), VARCHAR(50))" --return "VARCHAR(150)" --class "Cgenerate_nk" --user "admin" --pw "adminpwd" --db "DEMO_DB" --nullcall
This command compiles the CPP file into a binary file and registers a UDF that accepts 3 varchar(50) parameters and returns a varchar(150) as result. The function is registered in DEMO_DB database, and the owner is ADMIN. The function does not return NULL if any argument is null.
The function is created, compiled, registered and ready for use! A sample call is this:
select generate_nk('first', 'second', 'third')
which returns FIRSTSECONDTHIRD
Another example of the function usage is
select generate_nk(manufacturer, vendor, product_code) from admin..raw_products
This produces a natural key for every record in the raw_products table by concatenating product manufacturer, vendor and product code.
How to create a UDF using Aginity Netezza Workbench
Aginity Netezza Workbench simplifies the whole procedure described above by hiding the details from the user and allowing the user to concentrate on coding the UDF itself without being distracted by those tedious SSH operations and manual compilation. Let’s implement the same UDF using the Workbench tool.
1. Launch the Workbench.
2. Use File / New UDX / Function menu to start a New UDF Wizard.
3. Specify generate_nk as the SQL name (this is the function name as it appears in SQL statements.)
4. Go to the Arguments node and add part1, part2, and part3 arguments and specify varchar(50) as an argument data type for all of them.
5. Go to the Returns node and specify varchar(150) as the return value.
6. Click OK to close the wizard.
7. The Workbench generates for you everything except the evaluate()body and even a little bit more :) Namely:
a. Includes
b. Namespaces
c. Class declaration
d. Argument accessors and helper methods for checking if an argument is NULL
e. Fully implemented instantiate() function
f. An optional calculateSize() function that is not useful for the sake of this sample. It is used with the functions that return variable length values (for instance varchar(any) or numeric(any). This function usage is out of scope for this post.
8. What Workbench has not done for you is the logic within the evaluate() function. I would call it artificial intelligence if it did :) Let’s add the missing pieces to see the whole puzzle.
a. Add line #include <ctype.h> to the includes section of the file.
b. Copy and paste the body of the evaluate() function from the section above. Please note that the tool has generated inline helper functions for improving arguments handling. (is_arg_null_part1(),is_arg_null_part2() and is_arg_null_part3() can be used for checking the arguments for NULLs. get_arg_part1(),get_arg_part2() and get_arg_part3() provide access to argument values by name rather than by index. This way a line of code that reads
len[0] =isArgNull(0) ? 0 : stringArg(0)->length;
can be re-written in a more readable form like
len[0] = is_arg_null_part1() ? 0 : get_arg_part1()->length;
c. Switch to the Test Query tab of the UDX designer window and specify a test SQL to be used for testing the UDF:
select generate_nk(' abc ', ' 1 2 3 ', ' def ');
9. The function is created! Press the F5 key (or use the UDX / Build & Create & Run menu) to create, and register the UDF and execute a test SQL statement. It prompts for your Linux OS user credentials when the SSH connection is established. You have an option to tell the tool to remember those credentials so that you are not prompted every time. The credentials are stored securely in Windows registry encrypted by a key specific to the currently logged in user (which is safe enough.)
10. Congratulations! :)
What’s next?
I am going to provide a sample for creating a Netezza Aggregate (User Defined Aggregate - UDA) in the next article. So stay tuned!