My Tech Blog: May 2012

Thursday, 31 May 2012

The EXEC Family

Before getting into the family structure of exec, let's first understand what exec is. exec is a shell built-in command that overlays the currently running process with a new program: as soon as we run a program using exec, the memory image of the running process is replaced by the new program. No child process is created when exec is used, and the process ID stays the same.

We all know that the Unix shell provides an environment for running processes, but the shell is itself a process. Try running exec with any command or script name from the shell prompt, and you will notice that the shell session ends as soon as that command finishes. This is because the process image of the shell has been replaced by the new program, so there is no shell left to return to. A successful exec never returns to its caller, because the caller itself has been replaced in memory; it returns only if it fails. Normally we use exec and its family in child processes, so that the parent process keeps running smoothly.

That was the exec command, which we use directly in the shell. The exec family also has some library functions and a system call. In fact, the exec command relies on this system call to get its work done.

EXEC FAMILY:

There are six members in the exec family, out of which five are library functions and one is a system call. Ultimately, all the library functions use the system call for their needs. The six members are

execl(const char *pathname, const char *arg1, ..., const char *argn, (char *)0)
execv(const char *pathname, char *const arg[])
execlp(const char *filename, const char *arg1, ..., const char *argn, (char *)0)
execvp(const char *filename, char *const arg[])
execle(const char *pathname, const char *arg1, ..., const char *argn, (char *)0, char *const env[])
execve(const char *pathname, char *const arg[], char *const env[])

Out of these six members, the last one, execve, is the system call. We can use any of these members in our programs depending on our needs. There are some slight differences among the members.

execl: The first member, execl, takes the full pathname of the script or command as its first argument, followed by the arguments to be passed to that script or command (by convention, the first of these is the program's own name), and finally a null pointer which marks the end of the arguments. The following program shows how execl can be used. Suppose we have a simple script myscript.sh

#!/bin/ksh
echo "Hello World"

Now we will write a simple C program to understand the usage of execl.

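#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main()
{
    /* On success execl() never returns: the script replaces this process. */
    execl("/home/user1/buff/myscript.sh","myscript.sh",(char *)0);
    perror("execl");    /* reached only if execl() failed */
    exit(1);
}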

Compile and run this program. The output will be Hello World.

execv: There is only a minor difference between execl and execv, which the following example makes clear.

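#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main()
{
    /* argument vector for the script, terminated by a null pointer */
    char *arg[]={"myscript.sh",(char *)0};
    execv("/home/user1/buff/myscript.sh",arg);
    perror("execv");    /* reached only if execv() failed */
    exit(1);
}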

Did you notice the difference? In execv, we pass the arguments for the script in a null-terminated array instead of passing them individually.

execlp: It is the same as execl, with the only difference that we can pass just the script name instead of the full pathname. For this to work, the directory in which the script resides must be listed in the PATH environment variable; execlp searches the PATH directories for the script automatically. (If the name contains a slash, it is treated as a pathname and PATH is not searched.) Below is the example for execlp.

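#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main()
{
    /* "myscript.sh" is looked up in the directories listed in PATH */
    execlp("myscript.sh","myscript.sh",(char *)0);
    perror("execlp");    /* reached only if execlp() failed */
    exit(1);
}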

execvp: It is the same as execv, with the only difference that it looks up the script in the directories listed in the PATH environment variable. Simply replace the individual arguments given to the execlp function with an array argument. I am not giving an example for this function, as it is very simple.

execle: Unlike the four functions above, execle takes an extra parameter: an array of environment strings for the new program. Normally, when a child is spawned, all the environment variables of the parent are passed on to the child. Sometimes we may not want to pass all of them, or we may want to pass only child-specific environment variables. For this, we use the execle function. Let's see an example for this function.

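#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main()
{
    /* environment for the new program; the array must end with a null pointer */
    char * env[]={"PATH=/home/user1/buff",(char *)0};
    execle("/home/user1/buff/myscript.sh","myscript.sh",(char *)0,env);
    perror("execle");    /* reached only if execle() failed */
    exit(1);
}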

Here we make a slight change to myscript.sh to see the effect of the env parameter. The changed script is given below.

#!/bin/ksh
echo $PATH
echo "Hello World"

The output will be

/home/user1/buff
Hello World

The PATH variable of the child script contains only the value which we passed in the env parameter.

execve: This is the last and the most important member of the exec family. It is the only system call in the family; every other member of the family uses it to get its work done. Its syntax and working are the same as execle, with the only difference that it takes the arguments for the script or command to be run in an array.

Saturday, 26 May 2012

printf() anomaly after fork()...

Actually, the title of this post is wrong: printf() doesn't show any deviation from its normal behaviour. Everything is fine with printf(), but after seeing the couple of examples given below you will probably say that something is wrong with it, until I explain the reason behind this behaviour. Now let's see the examples. 1)

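#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
int main()
{
    int pid;
    printf("Hello World");    /* no '\n': the text stays in the stdout buffer */
    if((pid=fork())==0)
    {
        printf("\nHello, I am child\n");
    }
    else if(pid>0)
    {
        sleep(1);    /* let the child print first */
        printf("\nHello, I am Parent\n");
    }
    exit(0);
}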

I wrote sleep(1) in the program only because I wanted the output of the child process to come before that of the parent; otherwise sleep(1) has no special significance here. Compile and run this program, and you will find an interesting output.

OUTPUT
Hello World
Hello, I am child
Hello World
Hello, I am Parent

Did you notice the anomaly in the output? "Hello World" is printed twice on the terminal, despite the fact that only one statement in our program prints "Hello World", and that too before fork(). So where did the second "Hello World" come from?

Let's see what actually happened. The stdout stream that printf() writes to is line buffered when it is connected to a terminal, and fully buffered when it is connected to anything else. The first printf() in the program puts its output in the buffer attached to the stdout stream instead of writing it out immediately, because it has not encountered '\n' yet. So a hidden output buffer exists in the address space of the process. Now comes the interesting part: when forking takes place, a new child process is born, and the complete address space of the parent is copied to the child. The hidden buffer attached to stdout is copied along with it. With this in mind, let's follow the flow of the program.

i) After the first printf() is executed, "Hello World" sits in the output buffer in the parent's address space. It has not been flushed to stdout, because '\n' has not been encountered yet.
ii) fork() copies this output buffer, with the data "Hello World", to the child's address space. Now the child also has an output buffer containing "Hello World", attached to the same stream, i.e. stdout.
iii) The parent process sleeps for 1 second. The child process executes the printf() statement inside the if { } block. The first '\n' is encountered, and "Hello World" is flushed from the child's output buffer to stdout. After that, "Hello, I am child" is flushed as well.
iv) The parent process wakes up after 1 second and, on encountering the first '\n', flushes "Hello World" to stdout. After this, it flushes "Hello, I am Parent".

In the next example, we redirect the stdout stream of both the child and the parent process to a file. Let's make some slight changes to our program. 2)

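#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
int main()
{
    int pid;
    printf("Hello World\n");    /* '\n' flushes the buffer only on a terminal */
    if((pid=fork())==0)
    {
        printf("Hello, I am child\n");
    }
    else if(pid>0)
    {
        sleep(1);    /* let the child print first */
        printf("Hello, I am Parent\n");
    }
    exit(0);
}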

Note the changes which we have made in our program; all of them are inside the printf() statements. Run this program. The output will be

OUTPUT
Hello World
Hello, I am child
Hello, I am Parent

which is absolutely normal, since the '\n' had already flushed the data before fork(). Now run the program as follows.

./a.out>temp

We are redirecting the stdout stream of both processes to a file, "temp". View the file with cat temp. The content of temp will be

Hello World
Hello, I am child
Hello World
Hello, I am Parent

Again, there is a reason for this behaviour. When stdout is connected to something other than a terminal, it is fully buffered. This means that even '\n' does not flush the data. "Hello World" is therefore still in the buffer when fork() copies it, and each process flushes its own copy when it terminates.

Saturday, 19 May 2012

Difference between Date, Timestamp, Timestamp with Time Zone and Timestamp with Local Time Zone

Let's create a table to understand the difference between these four. The table script will be

create table mytime(datetime1 DATE,datetime2 TIMESTAMP,datetime3 TIMESTAMP WITH TIME ZONE,datetime4 TIMESTAMP WITH LOCAL TIME ZONE);

We have created a table with four columns with the datatypes DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE. Now, let's understand what their definitions have to say about them.

DATE: Can store a date and a time resolved to seconds.

TIMESTAMP: DATE and TIMESTAMP are almost the same, with the only difference that TIMESTAMP can resolve down to a billionth of a second, i.e. it supports up to 9 decimal places of precision for the seconds.

TIMESTAMP WITH TIME ZONE: Contains all the features of a TIMESTAMP and additionally stores the time zone information with it.

TIMESTAMP WITH LOCAL TIME ZONE: This is the tricky one. Here is its definition, though I am sure it won't be very clear until we see an example. Although its name has TIME ZONE in it, it doesn't actually store the time zone. Whenever we store a date-time with time zone information in this field, the value is converted to the database time zone and then stored. After this, whenever any session fetches the value from this field, it is first converted to the session's local time zone and then presented.

We have already created the table mytime. Now let's insert values into it. Our insert script will be

insert into mytime values(to_date('20-may-2012 10:30:40','dd-Mon-yyyy hh:mi:ss'),TO_TIMESTAMP('20-may-2012 10:30:40.123456 AM','dd-Mon-yyyy hh:mi:ss.FF6 AM'),TO_TIMESTAMP_TZ('20-may-2012 10:30:40.123456 AM EST','dd-Mon-yyyy hh:mi:ss.FF6 AM TZR'),TO_TIMESTAMP_TZ('20-may-2012 10:30:40.123456 AM EST','dd-Mon-yyyy hh:mi:ss.FF6 AM TZR'));

We are inserting values into the four columns as shown above. Now, alter your session and set your time zone to -7:00 as shown below.

ALTER session SET time_zone='-7:00';

Why did we change the session time zone? We will get to know in more detail after seeing the result of the following select query.

select * from mytime;

And the output is

DATETIME1
--------------------
DATETIME2
---------------------------------------------------------------------------
DATETIME3
---------------------------------------------------------------------------
DATETIME4
---------------------------------------------------------------------------
20-may-2012 10:30:40
20-MAY-12 10.30.40.123456 AM
20-MAY-12 10.30.40.123456 AM EST
20-MAY-12 07.30.40.123456 AM

DATETIME1 reports the date and time in a simple format.
DATETIME2 reports the date and time with extra precision in the seconds.
DATETIME3 reports the date and time with the time zone.
DATETIME4 is the most interesting. It calculates and reports the date and time according to the local session time zone.

NOTE: By default the DATETIME1 column will not display the time information. You will have to alter your session as follows.

Alter session set NLS_DATE_FORMAT='dd-mon-yyyy hh:mi:ss';

Sunday, 13 May 2012

Hole in the file.

If you asked me, I would say that this hole is the only one of its kind: it is not visible to a human unless they really know file handling. A hole in anything else is always visible; no matter how small it is, it has a physical existence. A hole in a file doesn't mean that your file is perforated, i.e. it is not that you opened a file, read some data, then hit a damaged region where the data could not be read, and then started reading again after the hole. There is no loss of data due to a hole in a file. A hole also need not occupy any disk block, although this statement is not totally true: in some cases holes may occupy some disk space, and at the end of this post I will describe the scenario in which that happens. Now, how is a file with a hole created? Given below is a simple program in C which generates a file with a hole.

#include<sys/stat.h>
#include<stdio.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    char databfhole[]="abcdefghij";
    char dataafhole[]="klmnopqrst";
    int fd;
    fd=creat("filewithhole",S_IRUSR|S_IWUSR);
    write(fd,databfhole,10);      /* data before the hole */
    lseek(fd,20000,SEEK_CUR);     /* skip 20000 bytes without writing anything */
    write(fd,dataafhole,10);      /* data after the hole */
    close(fd);
    return 0;
}

In the above code we are creating a file "filewithhole". S_IRUSR and S_IWUSR give read and write permission to the owner of the file. After the file is created, the file offset points at the beginning of the file. The write() function writes the string of characters stored in the array databfhole. lseek() is then used to move 20000 positions past the current file offset. After that we write some more data, from the array dataafhole, to the file.

Note: lseek() doesn't write anything to the file; it just moves the current offset in the file table.

Compile and run the program, and filewithhole is created. Now run the following command to see the contents of the file.

cat filewithhole

And the output will be

abcdefghijklmnopqrst

But when we check the size of the file, it is 20020 bytes. So what are these other bytes, if not visible characters? Let's check with the od command. Run

od -c filewithhole

0000000 a b c d e f g h i j \0 \0 \0 \0 \0 \0
0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0047040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 k l m n o p
0047060 q r s t

We can see that all the other, hidden characters are null characters, i.e. 0.

Depending on the file system and the data block size, these holes may or may not occupy space on the disk. How? Suppose my data block size is 2K, i.e. 2048 bytes, and I have a file with a hole which has a size of 8000 bytes. Let's say the first 1000 bytes are full of data, then there is a hole, and then the last 1000 bytes are data again. In this scenario the first data block holds 1000 bytes of data and 1048 bytes with value 0, so these 1048 bytes of hole do occupy space on the disk. The next two data blocks consist entirely of hole and are discarded, i.e. they do not occupy any space on the disk. Then, starting from the fourth data block, i.e. from byte 6144, the hole again occupies disk space up to byte 6999. The last 1000 bytes are occupied by the data. So we can see that only two data blocks were used to store this file.

Tuesday, 8 May 2012

Zombie and Orphan processes in Unix.

Zombie Process

A zombie process is a process which is actually dead but is still haunting the process structure table. We know that whenever a process runs in Unix, an entry is made for it in the process structure table, which keeps track of a lot of things, for example the files that the process is using. Whenever a child process dies, its parent receives the signal SIGCHLD, and the child's entry stays in the table until the parent collects its exit status with wait(). Suppose a parent spawned a child process and then neither handled SIGCHLD nor called wait() (by default the signal is simply ignored): the dead child then becomes a zombie process. Once the parent waits for its child and reads its exit status, the child is removed from the process structure table. Zombie processes are denoted by Z in the status column when we run the command ps -le.

Note: A child that has become a zombie remains a zombie only as long as its parent is alive. Once the parent of a zombie process is dead, the zombie is adopted by the init process, which then acts as its parent. init continuously runs the wait system call to remove such zombies from the system.

Zombie processes are a big problem when their parent processes run for a long time, because the zombies then persist in the process structure table for a long time. Let's see an example which creates a zombie process. We will write a program in C to understand it.

#include<stdio.h>
#include<unistd.h>
int main()
{
    int pid;
    pid=fork();
    if(pid>0)
    {
        sleep(30);    /* parent never calls wait(): the dead child stays a zombie */
        printf("parent process id %d\n",getpid());
        printf("in parent process\n");
    }
    else if(pid==0)
    {
        printf("child process id %d\n",getpid());
        printf("in child process\n");
    }
    return 0;
}

Now let's understand what the above code is doing. With the fork() system call we create a new child process. fork() actually returns twice, once in each of the two processes that now exist: it returns 0 to the child process and the pid of the child to the parent process. Compile this program and run it in the background, because we want the shell to stay interactive so that we can check for the zombie. When the program runs, the parent process goes to sleep for 30 seconds, which is necessary for our analysis. Meanwhile the pid of the child process is printed, and the child dies after completion. Immediately run the following command to see the dead child zombie.

ps -le|grep 'Z'

This command searches for Z in the list of running processes. As we already know, a zombie remains in the process structure table until its parent reaps it or dies, so we will be able to find its details through the ps command for 30 seconds. If everything worked right, you will see the process details with status Z and the pid of the child process.

Orphan Process

An orphan process is a process whose parent is dead. Suppose a program forks a new child process and, while the child is performing some work, its parent dies. This child process then becomes an orphan. An orphan process doesn't remain an orphan for long, because after its parent's death the child is adopted by the init process, which, as we already know, is the ancestor of all processes. When the child process is adopted by the init process, its ppid value becomes 1, which is the pid of the init process.

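#include<stdio.h>
#include<unistd.h>
int main()
{
    int pid;
    pid=fork();
    if(pid>0)
    {
        /* parent prints and exits at once, leaving the child behind */
        printf("parent process id %d\n",getpid());
        printf("in parent process\n");
    }
    else if(pid==0)
    {
        /* '\n' makes the pid appear on the terminal before the sleep */
        printf("child process id %d\n",getpid());
        sleep(30);
        printf("in child process\n");
    }
    return 0;
}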

Note the pid of the child process which is printed by the program. Run the ps -el command and search for the pid of the child process. Look at the ppid (parent process id) of this process and you will notice that it is 1. This is because the orphaned child process has now been adopted by the init process. (On some newer Linux systems an orphan may instead be adopted by a designated "subreaper" process, in which case the ppid is that process's pid rather than 1.)

Friday, 4 May 2012

File descriptor manipulation in Unix

To understand file descriptor manipulation, we first have to know what file descriptors are. Whenever we run a process, Unix automatically associates some indicators with the process for accessing input and output devices. The default indicators which Unix associates with any process are

0 - Input from the user, or standard input.
1 - Output to the user, or standard output.
2 - Error output. By default it goes to the same device as standard output, i.e. standard output and error output share the terminal.

When we log in at a terminal, the Unix kernel creates a shell for us, which is a process. 0, 1 and 2 are associated with the shell as well. We can redirect these file descriptors to some other destination, for example

echo John > myfile

It actually means

echo John 1> myfile

This simply means: redirect the output, which was meant for standard output, to the file "myfile". Now the output goes to the file "myfile" instead of appearing on the screen.

We all know that in Unix everything is treated as a file. The terminal is no exception to this: /dev/tty is the file associated with the terminal connected to the Unix system. Remember, there can be one or more terminals connected to a Unix system, with a different user working on each. Whenever anyone uses /dev/tty, it refers to their own terminal.

I already told you that the error stream shares the same output device as the output stream, i.e. the terminal. We can also redirect the error stream to a file instead of showing it on the terminal.

cat zombie 2>myfile

I don't have any file "zombie" on my system, but instead of showing the error on the terminal, I am redirecting it to the file "myfile".

We can also manipulate file descriptors using the exec command. For example, suppose we run the following command in the shell.

exec 1>myfile

Now, whenever we try to print something on the terminal, it goes to the file instead. This is because we have redirected the output stream of the shell process itself to myfile. We won't be able to see the contents of myfile on the screen either, because the output of any command we run to display it also goes to myfile. We can see the contents of "myfile" once we redirect the shell's output stream back to the terminal, using

exec 1>/dev/tty

We can also create our own file descriptors and associate them with files. These file descriptors can have numbers like 3, 4, 5 and so on. Whenever we create our own file descriptor, the shell opens the file in reading or writing mode, depending on the kind of descriptor made. This happens in the background, and the user never gets to know about it. So the creation of every file descriptor involves the opening of a file or, more precisely, the opening of any file by a process leads to the creation of a file descriptor. The kernel maintains these file descriptors to keep track of the files each process has open and of the current position in each of them. For example,

exec 3>myfile

Output file descriptor 3 is now associated with "myfile". We can write commands like the following to send output to "myfile".

echo hello >&3

The above command puts the output "hello" in "myfile" instead of displaying it on the screen. We can also create an input file descriptor for a file.

exec 4<myfile

After this, the process can take input from this file.

cat <&4

We can also create a file descriptor based on some other file descriptor. For example, 4 is the input file descriptor for "myfile". We can associate another input file descriptor with "myfile" by way of file descriptor 4.

exec 5<&4

Now we can read input from myfile using either file descriptor (4 or 5). In the same way we can do it for an output file descriptor.

Note:
i) Whenever we create a file descriptor from another descriptor, both should be output file descriptors or both should be input file descriptors.
ii) Closing one file descriptor for a file doesn't close the other file descriptor that was made from it or that it was made from. In the example, if we close file descriptor 5, it won't close file descriptor 4.
iii) A file descriptor made from another file descriptor shares the same entry in the file table, and therefore the same file offset.

Closing of File Descriptor

We can close a file descriptor using the following command.

exec 4>&-

Note: The same form works irrespective of whether it is an input file descriptor or an output file descriptor; exec 4<&- closes descriptor 4 as well.