Saturday, February 16, 2013

Easy Steps to Create a Bootable Debian Installer on a USB stick.

These instructions work on Ubuntu Linux. They should be easy to adapt to other Linux distributions:

  1. Acquire a hybrid Debian installation disk image. A hybrid image can be written directly to a USB stick as well as burned to a CD or DVD. Download it and save it on your machine. See this page for info on accessing Debian ISOs: http://www.debian.org/CD/
  2. Insert your USB stick into the machine. Back up any data on it first, since it will be overwritten.
  3. Execute the following command in the terminal:
    $ sudo fdisk -l
  4. Note the device file for the USB stick. If you cannot make sense of the output, run the above command both before and after inserting the USB stick and look for the entry that appears only in the second run. On my system it is
    /dev/sdb1
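  5. Execute the following command as root, replacing /dev/sdX with the whole-disk device for your USB stick (no partition number):
    $ cat /path/to/debian-iso.iso > /dev/sdX
I had to execute $ su first in order to carry out the command as root, because with sudo the redirection (>) would still be performed by my unprivileged shell. Remember to $ exit immediately after the command terminates. Replace /path/to/debian-iso.iso above with the actual path to the ISO you downloaded. My /dev/sdX was /dev/sdb, that is, the whole device rather than the /dev/sdb1 partition listed by fdisk.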

Once you are done, you can restart your machine and set it to boot from the USB stick.

How Disk Encryption Works

If you hold sensitive data on your laptop, work or home computer, you may need some form of disk encryption to keep it secure. It comes in handy if you lose your laptop, or if an attacker makes off with your hard drive (ordinary data theft, say, or some government agency).

I'll attempt to give a high-level description of disk encryption.

The whole disk is divided into equal-sized blocks. The system generates a random string of bits called a key. The key is passed to an encryption function together with the contents of each block, and the output is stored on the disk in place of the original data. The stored data therefore looks like random gibberish.

Anyone who accesses the storage device without the key cannot recover the original data.

When a block needs to be decrypted, the stored data is passed to a decryption function, together with the key that was used in the encryption process, to recover the original data. The security of the encrypted data therefore depends on the secrecy of the key.
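
To make the block-by-block idea concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the "cipher" is an XOR with a pad derived from SHA-256 of the key and the block number, which I picked because it needs nothing beyond the standard library. It is not what real disk encryption software uses and should not be used to protect anything. The names BLOCK_SIZE, split_into_blocks, encrypt_blocks and decrypt_blocks are made up for this example.

```python
import hashlib
import secrets

# Toy block size: chosen to equal the SHA-256 digest length (an assumption
# of this sketch, not a property of real disk encryption).
BLOCK_SIZE = 32


def split_into_blocks(data):
    """Pad data with zero bytes and cut it into equal-sized blocks."""
    padded_len = -(-len(data) // BLOCK_SIZE) * BLOCK_SIZE  # round up
    padded = data.ljust(padded_len, b"\0")
    return [padded[i:i + BLOCK_SIZE] for i in range(0, padded_len, BLOCK_SIZE)]


def _pad_for_block(key, block_index):
    """Derive a per-block pad from the key and the block's position."""
    return hashlib.sha256(key + block_index.to_bytes(8, "big")).digest()


def encrypt_blocks(key, blocks):
    """XOR every block with its pad; the result looks like random bytes."""
    return [
        bytes(b ^ p for b, p in zip(block, _pad_for_block(key, index)))
        for index, block in enumerate(blocks)
    ]


def decrypt_blocks(key, blocks):
    """XOR is its own inverse, so decryption repeats the same operation."""
    return encrypt_blocks(key, blocks)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # the random key generated by the system
    plain = split_into_blocks(b"sensitive data spread over several disk blocks")
    stored = encrypt_blocks(key, plain)
    print(decrypt_blocks(key, stored) == plain)
```

The point to notice is that the same key is needed in both directions: with a different key, decrypt_blocks returns different bytes and the data stays concealed.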

One way of protecting the key is to store it on an external storage device, such as a flash drive, which the owner inserts into the system whenever they want to boot up the computer. Another technique is to store it on an unencrypted part of the hard drive, protected with a passphrase that the owner enters at boot time to retrieve the key. In UNIX-like systems, this may be in the /boot partition.

In the latter case, the owner needs to select a strong passphrase.
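
The passphrase approach can be sketched as follows. This is a simplified illustration, not how any particular tool does it: I am assuming the passphrase is stretched with PBKDF2 (hashlib.pbkdf2_hmac from the Python standard library) and that the disk key is then "wrapped" by a plain XOR, which is a stand-in for a proper key-wrapping cipher. The function names, the salt length and the iteration count are choices made for this example.

```python
import hashlib
import secrets


def derive_wrapping_key(passphrase, salt, iterations=200_000):
    """Stretch the passphrase into 32 bytes; a slow derivation makes guessing costly."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def wrap_key(disk_key, passphrase):
    """Return (salt, wrapped_key); both could sit on an unencrypted partition."""
    if len(disk_key) != 32:
        raise ValueError("this sketch assumes a 32-byte disk key")
    salt = secrets.token_bytes(16)
    wrapping_key = derive_wrapping_key(passphrase, salt)
    # Toy wrapping step (assumption): XOR instead of a real key-wrap cipher.
    wrapped = bytes(a ^ b for a, b in zip(disk_key, wrapping_key))
    return salt, wrapped


def unwrap_key(salt, wrapped, passphrase):
    """Recover the disk key at boot time from the entered passphrase."""
    wrapping_key = derive_wrapping_key(passphrase, salt)
    return bytes(a ^ b for a, b in zip(wrapped, wrapping_key))


if __name__ == "__main__":
    disk_key = secrets.token_bytes(32)
    salt, wrapped = wrap_key(disk_key, "correct horse battery staple")
    print(unwrap_key(salt, wrapped, "correct horse battery staple") == disk_key)
```

Note that this toy has no integrity check, so a wrong passphrase silently yields a wrong key rather than an error. It also shows why the passphrase must be strong: everything an attacker needs to try guesses (the salt and the wrapped key) is stored in the clear.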

Once the key is available to the system, any data that is loaded into memory is decrypted on the fly, and any data being written to the disk is similarly encrypted. Thus, if an attacker gains access to the system while it is on, disk encryption may not help.

That is, hopefully, an understandable high-level description of disk encryption. In reality, the actual implementation is more complex. See the document here for details.

If you understand disk encryption deeply, feel free to correct any errors or clarify any ambiguities in the blog comments.

Sunday, February 10, 2013

Some Terms in Parallel Computing

SIMD (Single Instruction, Multiple Data) - A computer with multiple processors each of which performs the same operation on different data streams simultaneously.

MIMD (Multiple Instructions, Multiple Data) - Each processor in a multiprocessor system performs a different operation on a separate data stream simultaneously.

SPMD (Single Program, Multiple Data) - A more restrictive form of MIMD where all the processors run the same program, each on its own data.

See Flynn's taxonomy for more information on the above.

Communication Bandwidth - The maximum amount of data that can be transmitted in a unit of time.

Communication Latency - The amount of time from when a piece of data is sent, to when it is received by the target.
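
Bandwidth and latency are often combined into a simple first-order estimate of how long a message takes to arrive: the latency plus the message size divided by the bandwidth. The sketch below assumes that simple model (it ignores contention, packetization and protocol overhead), and the function name and units are my own choices.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the seconds needed to deliver a message.

    First-order model (an assumption): a fixed start-up latency, then the
    payload streams at the full bandwidth.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Small messages are dominated by latency, large ones by bandwidth.
    for size in (8, 1_000_000):
        print(size, transfer_time(size, latency_s=1e-3, bandwidth_bytes_per_s=100e6))
```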

Message Passing - A model of interaction among processors in a multiprocessor system. A message is composed by instructions on one processor and sent to another processor through the interconnecting bus(es).
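
The message-passing model can be imitated in a few lines of Python. In this sketch threads stand in for processors and queue.Queue objects stand in for the interconnect, which is an analogy rather than a real multiprocessor setup. The worker never touches the sender's variables; it only sees what arrives in its inbox. The function names are invented for the example.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers as messages, reply with their squares, stop on None."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel message: no more work
            break
        outbox.put(message * message)


def square_via_messages(values):
    """Send each value to the worker and collect the replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)
    thread.join()
    # A single worker reading a FIFO queue replies in the order it received.
    return [outbox.get() for _ in values]


if __name__ == "__main__":
    print(square_via_messages([1, 2, 3]))
```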

Shared Memory - A model of interaction where the separate processors can read and write the same memory space, and therefore access each other's data values. It can be physical, where a single memory is available to all the processors, or logical, where each processor has its own memory and a request to access a non-local memory address is converted into some form of inter-processor communication.

Aggregate Function - A model of interaction where a group of processors act together. An example is barrier synchronization, where each processor outputs a data value on reaching a barrier (a particular point in the computation process) and the communication hardware returns a value to each processor that is a function of all the values received from the processors.
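
Here is a small software imitation of that barrier example. Threads play the role of processors and threading.Barrier from the Python standard library plays the role of the barrier. One difference from the description above, stated plainly: there is no communication hardware here, so each thread computes the aggregate (a sum, chosen arbitrarily) itself once everyone has arrived. The function name barrier_sum is hypothetical.

```python
import threading


def barrier_sum(values):
    """Each participant contributes one value and receives the sum of all.

    Assumes values is non-empty, since a barrier needs at least one party.
    """
    count = len(values)
    contributions = [None] * count
    results = [None] * count
    barrier = threading.Barrier(count)

    def participant(rank):
        contributions[rank] = values[rank]  # output a value on reaching the barrier
        barrier.wait()                      # nobody continues until all have arrived
        results[rank] = sum(contributions)  # a function of all the values

    threads = [threading.Thread(target=participant, args=(r,)) for r in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(barrier_sum([1, 2, 3, 4]))
```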

SMP (Symmetric Multiprocessors) - A multiprocessor system with two or more identical processors and a single shared memory, under control of a single OS. It can be thought of as MIMD with shared memory.

Processor Affinity - The OS scheduler keeps a process on the same processor in a multiprocessor system to take advantage of locally cached data.
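
The scheduler applies this kind of affinity on its own, but on Linux a program can also request it explicitly. The sketch below uses os.sched_getaffinity and os.sched_setaffinity from the Python standard library, which are only available on some platforms (Linux among them), so the code checks for them first. The helper name pin_to_one_cpu is made up, and picking the lowest-numbered allowed CPU is an arbitrary choice.

```python
import os


def pin_to_one_cpu():
    """Restrict the calling process to a single CPU.

    Returns (previous_cpus, pinned_cpus), or None where the platform does
    not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)  # 0 means the calling process
    chosen = min(previous)              # arbitrary choice for this sketch
    os.sched_setaffinity(0, {chosen})
    return previous, os.sched_getaffinity(0)


if __name__ == "__main__":
    outcome = pin_to_one_cpu()
    if outcome is None:
        print("CPU affinity calls are not available on this platform")
    else:
        previous, pinned = outcome
        print("was allowed on", sorted(previous), "now pinned to", sorted(pinned))
        os.sched_setaffinity(0, previous)  # undo the pinning
```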

Shared Everything - All data structures are in shared memory.

Shared Something - Only a subset of the data structures (the ones that need to be shared) are in shared memory.

Atomicity - The concept of an uninterruptible and indivisible operation (sequence of instructions) on a data object.

Cache Coherence - Keeping the copies of shared memory held in different caches consistent. A change in one cache should be propagated to the other caches.

Mutual Exclusion - At most one processor or process is updating a given shared object at any given time.
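
A minimal sketch of mutual exclusion, using threads and threading.Lock from the Python standard library as stand-ins for processors and a hardware or OS lock. The shared object is a plain counter and the function name is invented for the example. Holding the lock also makes each read-modify-write on the counter behave as an atomic operation from the point of view of the other threads.

```python
import threading


def count_with_lock(num_threads, increments_per_thread):
    """Let several threads increment one shared counter, one at a time."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:        # at most one thread is inside this block
                counter += 1  # read-modify-write on the shared object

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock(4, 10_000))
```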

Gang Scheduling - Only related processes or threads are running simultaneously in a multiprocessor system at a given instant. These could be processes of one program, or a situation where the input of one process depends on the output of another running at the same time.