Sunday, December 9, 2012

Object-Oriented Programming in C

As a follow up to my previous post where I demonstrated the how to create pointers to functions in C, in this post we'll look at how to implement simple object-oriented programming in C.

The basic idea in object-oriented programming is encapsulating data and operations on the data into a single structure. Since we can create pointers to functions, we can therefore create structures that contain both data variables and functions that manipulate these data variables. We shall also see a basic form of inheritance.

In the following example, we'll create a base Animal class, and from it derive a Human class and Duck class. Take the terms class, method and attribute as used below with a pinch of salt. Object-oriented programming is implemented in different ways in the major programming languages in use.

We first create the Animal class:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
  char *description;
  void (*move)(void *self, int steps);
  void (*make_sound)(void *self);
  void (*destroy)(void *self);
  int (*init)(void *self, const char *description);
} Animal;

void Animal_move(void *self, int steps)
{
  printf("Moves %d steps\n", steps);
}


void Animal_sound(void *self)
{
  printf("Makes a sound\n");
}

void Animal_destroy(void *self)
{
  Animal *this = self;
  if (this) {
    if (this->description) free(this->description);
    free(this); 
  }
}

int Animal_init(void *self, const char *description)
{
  Animal *this = self;
  this->description = strdup(description);
  return 1;
}

void *Animal_new(Animal proto, size_t size, const char *description)
{                                                                                                                 
  if (!proto.move) proto.move = Animal_move;
  if (!proto.make_sound) proto.make_sound = Animal_sound;
  if (!proto.destroy) proto.destroy = Animal_destroy;
  if (!proto.init) proto.init = Animal_init;

  Animal *animal = calloc(1, size);
  *animal = proto; 

  int initval = animal->init(animal, description);
  if (initval)
    return animal;
  else
    return NULL;
}

#define NEW(N, d) Animal_new(N##Proto, sizeof(N), d)

Lines 5-11: Animal class, with an attribute description and methods move, make_sound, destroy and init.

Lines 13-38: Function definitions for the above Animal methods.

Lines 40-55: The object creation function. Takes in an Animal prototype, assigns methods and allocates heap memory for the object.

Line 47: The memory space allocated depends on the parameter size passed to the function. This is useful if the object is a sub-class of Animal with a bigger size. More on this later.

Line 48: Copy the contents of the prototype into the allocated space.

Line 57: Macro definition to simplify object creation syntax.

We then look at the implementation of the Human class:

typedef struct {
  Animal proto;
  char* species;
} Human;

void Human_move(void *self, int steps)
{
  printf("walkes %d strides\n", steps);
}

void Human_sound(void *self)
{
  printf("Talks\n");
}

int Human_init(void *self, const char *species)
{
  Human *h = self;
  h->species = strdup(species);
  return 1;
}

Animal HumanProto = {
  .init = Human_init,
  .make_sound = Human_sound,
  .move = Human_move 
};


Lines 1-4: Human class with Animal attributes and methods (proto), and an extra attribute species.

Lines 6-25: Function definitions for Human methods move, sound and init to override the base class definitions in Animal.

Lines 23-27: Animal prototype for Human object, with overriding methods assigned. This will be passed to the object creation function.

The same implementation for the Duck class:

typedef struct {
  Animal proto;
  char *species;
} Duck;

void Duck_move(void *self, int steps)
{
  printf("wobbles %d steps\n", steps);
}

void Duck_sound(void *self)
{
  printf("Quacks");
}

int Duck_init(void *self, const char *species)
{
  Duck *d = self;
  d->species = strdup(species);
  return 1;
}

Animal DuckProto = {
  .move = Duck_move,
  .make_sound = Duck_sound,
  .init = Duck_init 
};

And finally, the main function, which creates an object for the each the three classes:

int main()
{
  Animal AnimalProto = {};
  Animal *animal = NEW(Animal, "An animal");
  Human *human = NEW(Human, "Homo sapien");
  Duck *duck = NEW(Duck, "Anas rubripes");
  printf("%s ", animal->description);
  animal->move(animal, 5);
  animal->make_sound(animal);
  printf("%s ", human->species);
  human->proto.move(human, 5);
  human->proto.make_sound(human);
  printf("%s ", duck->species);
  duck->proto.move(duck, 5);
  duck->proto.make_sound(duck);
  return 0;
}

We shall walk through creation of the Human object to get the gist of the program.

We first create an Animal prototype HumanProto  for the Human object:

Animal HumanProto = {
  .init = Human_init,
  .make_sound = Human_sound,
  .move = Human_move 
};

The prototype is initialized with the Human init, sound and move functions.

The invocation of the NEW macro

NEW(Human, "Homo sapien");

is transformed to:

Animal_new(HumanProto, sizeof(Human), "Homo sapien");

The Animal_new function is called with HumanProto which carries with it the overriding functions. However, the size passed to it is that of type Human which is bigger than that of HumanProto (type Animal).

When execution gets to the statement

Animal *animal = calloc(1, size);

in the Animal_new function, the calloc function returns a pointer to a memory space of the size of type Human.

*animal = proto;

copies the contents of the prototype of HumanProto to the allocated memory space. Since the size of allocated space is larger than the size of the prototype, a space that exactly fits the attribute species (a pointer) of the Human class  remains.

The statement:

animal->init(animal, "Homo sapien");

then initializes the Human object's species attribute with the string "Homo sapien".

Sunday, November 25, 2012

Creating pointers to functions in C

If you have used a programming language that has first-class functions, that is, supports assigning functions to variables and passing them to other functions, like Python or JavaScript, you may wonder why such flexibility does not exist in C.

It is actually possible, and easy, to implement this in C. In this post, we shall go through
the process of creating pointers to functions.

The first thing to do is to note the function signature, i.e. the function's arguments and return type, who's pointer we want to create. In this example, we want to create a pointer to a function that accepts two integers and returns an integer.

int fptr(int x, int y);

The construct begins like a declaration of the actual function. The next step is to wrap the function name with a pointer syntax.

int (*fptr)(int x, int y);

At this point, fptr can act like a pointer to a function, and will accept a function assigned to it. We probably want to create a type so that we can create several pointers to functions. We therefore prepend typedef to the declaration.

typedef int (*fptr)(int x, int y);

fptr will now act like a type for pointers to functions that accept two integers and return an integer. Declaring a pointer to such a function is now a straight forward affair.

fptr f1, f2;

The following program listing demonstrates how pointers to two different functions with the same  function signature can be created, and how they are invoked within a program.

#include <stdio.h>

typedef int (*fptr)(int a, int b); /*pointer type to a function*/

int add(int x, int y)
{
  return x + y;
}

int sub(int x, int y)
{
  return x - y;
}

int main()
{
  fptr op1, op2; /*create two pointers and assign functions to them*/
  op1 = add;  
  op2 = sub;

  /*invoke the functions*/
  printf("op1 on %d and %d returns %d\n", 6, 5, op1(6, 5)); 
  printf("op2 on %d and %d returns %d\n", 6, 5, op2(6, 5));
  return 0;
}

Stack vs Heap in Memory Management

There are two types of memory available to a program running in a computer, the stack memory and heap memory. The operating system allocates a fixed amount of stack memory to a running program, and the heap is extra memory that may be utilized by the program if required. This is especially important to know when using a low-level language like C.

The stack is a Last In First Out (LIFO) data structure. Items can only be inserted and removed from one end. The insert action is a push, and the remove action is a pop. To access an item, all other items on top of it have to be popped from the stack.

To examine this, let's consider the following C program.

#include <stdio.h>

int add(int a, int b)
{
  return a+b;
}

int main()
{
  int x = 1;
  int y = 2;
  int sum = add(x, y);
  printf("The sum of %d and %d is %d\n", x, y, sum);
  return 0;
}

The program above calls a function add to find the sum of two integers. The variables in the main function, that is x and y, are pushed onto the stack. When the add function is called, the address of the instruction after the function call is pushed onto the stack, and the execution jumps to the first instruction in add. Variables local to the add function, a and b are also pushed onto the stack.

When the function exits, all variables local to it are popped from the stack, and are replaced by its return value, which is then assigned to the variable sum. The execution is able to continue from where it left off in the main function, since the address of the instruction to execute which had been pushed onto the stack earlier is popped off and execution continues from this address. Here, we've only discussed a high level description of what happens, actual mechanics involved will be illustrated in a later post.

Since the stack size allocated to the program is of a fixed size, there is a limit to the size of variables, and the number of nested function calls that can occur in a program. When the stack becomes full, a condition known as stack overflow occurs and the program crashes. This may happen if one allocates a very large array, or implements a function that recursively calls itself too many times (deep recursion). The case for  deep recursion is illustrated by the following C program.

#include <stdio.h>
#define MAX 1000000000 /*maximum number of times to recurse*/

int recurse(int c)
{
  printf("level %d, ", c);
  if (c < MAX)
    recurse(++c);
}

int main()
{
  recurse(1);
  return 0;
}

By experimenting with the constant MAX, one can control the number of times that the function recurse calls itself.

When there's need to create a large variable, memory from the heap can be used. Stack memory management is automatically handled by the operating system, but memory from the heap has to be manually allocated and freed by the programmer. In C, the functions malloc and free from header file  stdlib.h are used for this, as the following snippet demonstrates.

#include <stdlib.h>
...
int *ptr = (int*)malloc(sizeof(int) * 1000000); /*allocate memory for 1000000 integers*/
...
free(ptr); /*free the previously allocated memory*/
...

The programmer, however, has to be careful to free all memory manually allocated by the heap to make it available to other programs. Otherwise a memory leak occurs, where available memory in the computer decreases to an insufficient level.

The deep recursion problem can be solved by devising an iterative program, i.e. using a loop construct, instead of calling a function recursively, for some repeated computation. The technique, known as recursion removal, will be discussed in a later post.

Sunday, November 18, 2012

On proxy servers, IP masquerading and Network Address Translation


IP Masquerading

The network uses one public IP to access the Internet. Source IP addresses in the Internet requests from internal hosts are converted to the public IP address on the MASQ server so that the requests would appear to originate from one host. The internal hosts are configured to use the MASQ server as their Internet gateway.

Proxy Servers

Internal hosts route all their Internet traffic via the proxy server. The proxy server receives requests from the the internal hosts and re-initiates them as requests from the server itself. Destination addresses in the replies are reconverted back to the originating internal host. All applications in the internal network must support proxy services and be configured with the proxy settings. In addition, proxy servers may also support caching of web pages, reducing Internet bandwidth consumed and improving speeds from the client's point of view.

On many networks, MASQ and proxy services are provided on the same servers.

Network Address Translation

The NAT server contains a pool of public IP addresses provided by the Internet Service Provider. Whenever an Internet request is received from an internal host, the internal private IP is associated with an unused public IP from the pool, via which responses are received. When an associated IP address remains unused for a predetermined amount of time, it is returned to the pool of unused public addresses.

See: http://tldp.org/HOWTO/IP-Masquerade-HOWTO/what-is-masq.html

isinstance() and type() functions in Python

One of python's great strengths is introspection, the ability to examine an object's or module's properties at run-time. In this post we'll touch on two inbuilt functions that are used for the purpose.

The type() function returns the type of the object passed to it:

type("a string") # <type 'str'>
type(['a', 'list']) # <type 'list'>

isinstance() accepts as its parameters an object and class, and returns True if the object is an instance of the class:

isinstance("a string", str)  # True
isinstance("a string", list) # False

If you're ever need to choose between the two, isinstance() is the better one because it factors in inheritance. Say, for example, you subclass the string type:

class MyString(str):
    pass

mystring = MyString()

The types module defines names for all the inbuilt types. We'll import it and use it for comparison.

import types

type("a string") == types.StringType # True
type(mystring) == types.StringType # False

mystring belongs to a subclass of the inbuilt string type (str) and in many cases you'll want treat it as a string. isinstance() will allow you to determine if you can treat it like a string without having to know its exact type.

isinstance("a string", str) # True
isinstance(mystring, str) # True

in_lowercase = mystring.lower()
...