Sunday, November 25, 2012

Creating pointers to functions in C

If you have used a programming language that has first-class functions, that is, supports assigning functions to variables and passing them to other functions, like Python or JavaScript, you may wonder why such flexibility does not exist in C.

It is actually possible, and easy, to implement this in C. In this post, we shall go through
the process of creating pointers to functions.

The first thing to do is to note the function signature, i.e. the function's arguments and return type, who's pointer we want to create. In this example, we want to create a pointer to a function that accepts two integers and returns an integer.

int fptr(int x, int y);

The construct begins like a declaration of the actual function. The next step is to wrap the function name with a pointer syntax.

int (*fptr)(int x, int y);

At this point, fptr can act like a pointer to a function, and will accept a function assigned to it. We probably want to create a type so that we can create several pointers to functions. We therefore prepend typedef to the declaration.

typedef int (*fptr)(int x, int y);

fptr will now act like a type for pointers to functions that accept two integers and return an integer. Declaring a pointer to such a function is now a straight forward affair.

fptr f1, f2;

The following program listing demonstrates how pointers to two different functions with the same  function signature can be created, and how they are invoked within a program.

#include <stdio.h>

typedef int (*fptr)(int a, int b); /*pointer type to a function*/

int add(int x, int y)
{
  return x + y;
}

int sub(int x, int y)
{
  return x - y;
}

int main()
{
  fptr op1, op2; /*create two pointers and assign functions to them*/
  op1 = add;  
  op2 = sub;

  /*invoke the functions*/
  printf("op1 on %d and %d returns %d\n", 6, 5, op1(6, 5)); 
  printf("op2 on %d and %d returns %d\n", 6, 5, op2(6, 5));
  return 0;
}

Stack vs Heap in Memory Management

There are two types of memory available to a program running in a computer, the stack memory and heap memory. The operating system allocates a fixed amount of stack memory to a running program, and the heap is extra memory that may be utilized by the program if required. This is especially important to know when using a low-level language like C.

The stack is a Last In First Out (LIFO) data structure. Items can only be inserted and removed from one end. The insert action is a push, and the remove action is a pop. To access an item, all other items on top of it have to be popped from the stack.

To examine this, let's consider the following C program.

#include <stdio.h>

int add(int a, int b)
{
  return a+b;
}

int main()
{
  int x = 1;
  int y = 2;
  int sum = add(x, y);
  printf("The sum of %d and %d is %d\n", x, y, sum);
  return 0;
}

The program above calls a function add to find the sum of two integers. The variables in the main function, that is x and y, are pushed onto the stack. When the add function is called, the address of the instruction after the function call is pushed onto the stack, and the execution jumps to the first instruction in add. Variables local to the add function, a and b are also pushed onto the stack.

When the function exits, all variables local to it are popped from the stack, and are replaced by its return value, which is then assigned to the variable sum. The execution is able to continue from where it left off in the main function, since the address of the instruction to execute which had been pushed onto the stack earlier is popped off and execution continues from this address. Here, we've only discussed a high level description of what happens, actual mechanics involved will be illustrated in a later post.

Since the stack size allocated to the program is of a fixed size, there is a limit to the size of variables, and the number of nested function calls that can occur in a program. When the stack becomes full, a condition known as stack overflow occurs and the program crashes. This may happen if one allocates a very large array, or implements a function that recursively calls itself too many times (deep recursion). The case for  deep recursion is illustrated by the following C program.

#include <stdio.h>
#define MAX 1000000000 /*maximum number of times to recurse*/

int recurse(int c)
{
  printf("level %d, ", c);
  if (c < MAX)
    recurse(++c);
}

int main()
{
  recurse(1);
  return 0;
}

By experimenting with the constant MAX, one can control the number of times that the function recurse calls itself.

When there's need to create a large variable, memory from the heap can be used. Stack memory management is automatically handled by the operating system, but memory from the heap has to be manually allocated and freed by the programmer. In C, the functions malloc and free from header file  stdlib.h are used for this, as the following snippet demonstrates.

#include <stdlib.h>
...
int *ptr = (int*)malloc(sizeof(int) * 1000000); /*allocate memory for 1000000 integers*/
...
free(ptr); /*free the previously allocated memory*/
...

The programmer, however, has to be careful to free all memory manually allocated by the heap to make it available to other programs. Otherwise a memory leak occurs, where available memory in the computer decreases to an insufficient level.

The deep recursion problem can be solved by devising an iterative program, i.e. using a loop construct, instead of calling a function recursively, for some repeated computation. The technique, known as recursion removal, will be discussed in a later post.

Sunday, November 18, 2012

On proxy servers, IP masquerading and Network Address Translation


IP Masquerading

The network uses one public IP to access the Internet. Source IP addresses in the Internet requests from internal hosts are converted to the public IP address on the MASQ server so that the requests would appear to originate from one host. The internal hosts are configured to use the MASQ server as their Internet gateway.

Proxy Servers

Internal hosts route all their Internet traffic via the proxy server. The proxy server receives requests from the the internal hosts and re-initiates them as requests from the server itself. Destination addresses in the replies are reconverted back to the originating internal host. All applications in the internal network must support proxy services and be configured with the proxy settings. In addition, proxy servers may also support caching of web pages, reducing Internet bandwidth consumed and improving speeds from the client's point of view.

On many networks, MASQ and proxy services are provided on the same servers.

Network Address Translation

The NAT server contains a pool of public IP addresses provided by the Internet Service Provider. Whenever an Internet request is received from an internal host, the internal private IP is associated with an unused public IP from the pool, via which responses are received. When an associated IP address remains unused for a predetermined amount of time, it is returned to the pool of unused public addresses.

See: http://tldp.org/HOWTO/IP-Masquerade-HOWTO/what-is-masq.html

isinstance() and type() functions in Python

One of python's great strengths is introspection, the ability to examine an object's or module's properties at run-time. In this post we'll touch on two inbuilt functions that are used for the purpose.

The type() function returns the type of the object passed to it:

type("a string") # <type 'str'>
type(['a', 'list']) # <type 'list'>

isinstance() accepts as its parameters an object and class, and returns True if the object is an instance of the class:

isinstance("a string", str)  # True
isinstance("a string", list) # False

If you're ever need to choose between the two, isinstance() is the better one because it factors in inheritance. Say, for example, you subclass the string type:

class MyString(str):
    pass

mystring = MyString()

The types module defines names for all the inbuilt types. We'll import it and use it for comparison.

import types

type("a string") == types.StringType # True
type(mystring) == types.StringType # False

mystring belongs to a subclass of the inbuilt string type (str) and in many cases you'll want treat it as a string. isinstance() will allow you to determine if you can treat it like a string without having to know its exact type.

isinstance("a string", str) # True
isinstance(mystring, str) # True

in_lowercase = mystring.lower()
...