Apr 2007
Option parsing can be difficult at times, to say the least. A number of common methods and libraries exist to assist with it. This text looks at writing option and argument parsing code both from scratch and with a little help.
sh
Simple parsing is easy in the shell:
    while [ "$#" -gt 0 ]
    do
        case $1 in
            -F)
                F_FLAG=1
                ;;
            -f)
                # the next word is the option's argument
                shift
                FILE_ARGUMENT=$1
                ;;
            -u)
                usage
                exit 0
                ;;
            *)
                echo "Syntax Error"
                usage
                exit 1
                ;;
        esac
        shift
    done
Above, the argument list is iterated over, and each option either
performs an action or assigns a variable. The POSIX getopts utility
provides built-in parsing of single-character options:
    while getopts ":f:Fu" opt; do
        case $opt in
            F) F_FLAG=1;;
            f) FILE_ARGUMENT=$OPTARG;;
            u) usage;;
            *) usage
               exit 1
               ;;
        esac
    done
    # getopts tracks its own position in OPTIND, so shift once after the loop
    shift $((OPTIND - 1))
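As a minimal sketch, the getopts loop can be wrapped in a function so that it can be exercised on its own. The function name parse_opts, the usage text and the echoed summary line are assumptions made for illustration; resetting OPTIND and shifting by OPTIND - 1 after the loop is standard getopts usage.

```shell
#!/bin/sh
# Hypothetical wrapper around a getopts loop; parse_opts is an illustrative name.
parse_opts() {
    F_FLAG=0
    FILE_ARGUMENT=
    OPTIND=1    # reset so the function can be called more than once
    while getopts ":f:Fu" opt; do
        case $opt in
            F) F_FLAG=1 ;;
            f) FILE_ARGUMENT=$OPTARG ;;
            u) echo "usage: [-F] [-f file] [-u]"; return 0 ;;
            :) echo "option -$OPTARG requires an argument" >&2; return 1 ;;
            *) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))   # drop the parsed options; operands remain in "$@"
    echo "F_FLAG=$F_FLAG FILE_ARGUMENT=$FILE_ARGUMENT operands=$*"
}

parse_opts -F -f notes.txt extra
```

Because the option string starts with a colon, getopts stays silent on errors and reports them through the case patterns instead: a missing argument arrives as `:` and an unknown option as `?`, with the offending letter in OPTARG.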
A colon after an option letter indicates that it requires an argument;
the leading colon tells getopts to suppress its own error messages. The
getopts code is far more compact than the first example. What if the
script requires long options? One approach is simply to hard-code them:
    while [ "$#" -gt 0 ]
    do
        case $1 in
            -F|--setflag)
                F_FLAG=1
                ;;
            -f|--file)
                # the next word is the option's argument
                shift
                FILE_ARGUMENT=$1
                ;;
            -u|--usage)
                usage
                exit 0
                ;;
            *)
                echo "Syntax Error"
                usage
                exit 1
                ;;
        esac
        shift
    done
Setting up long options appears to be simple; however, it can
quickly get out of control using the method shown above. Instead,
writing code to handle long options that can either be sourced or
easily dropped into scripts makes far more sense. Grigoriy Strokin
has a good script, available on his website, that can either be
copied in or sourced. Following is the same code from above using
getoptex:
    . getoptx.sh
    while getoptex "F; f; u. setflag file usage." "$@"; do
        # assumption: getoptex leaves the matched option name in OPTOPT
        case $OPTOPT in
            F) F_FLAG=1;;
            f) FILE_ARGUMENT=$OPTARG;;
            u) usage;;
            *) usage
               exit 1
               ;;
        esac
    done
In the option string, each single-character option is mapped to the
long option listed after the first dot, and the second dot terminates
the list. Of course, there is an even easier method, as long as a few
rules are observed:
    while [ "$#" -gt 0 ]
    do
        opt="${1//-}"                                  # strip the dashes
        opt=$(echo "${opt}" | cut -c 1 2>/dev/null)    # keep the first character
        case $opt in
            F) F_FLAG=1;;
            f) shift;FILE_ARGUMENT=$1;;
            u) usage;;
            *) usage; exit 1;;
        esac
        shift
    done
The problem with the last method is that the long options are not
hard-coded: the first character of the option string is cut and used
as the option. In other words, --help and --heck will do the same
thing. The idea is harmless as long as no two long options share a
first letter. Generally speaking, having both --help and --heck valid
in the same script or program should be avoided if possible.
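One way around the first-letter collision, sketched here under stated assumptions, is to translate each long option through an explicit list before the case statement sees it. The helper names long_to_short and parse_args, the usage text and the echoed summary are hypothetical and only illustrate the idea; the option names are the ones used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: translate an option into its short letter, or fail.
long_to_short() {
    case $1 in
        --setflag) echo F ;;
        --file)    echo f ;;
        --usage)   echo u ;;
        --*)       return 1 ;;          # unknown long option is rejected
        -?)        echo "${1#-}" ;;     # short option: strip the dash
        *)         return 1 ;;
    esac
}

# Hypothetical parser built on the helper above.
parse_args() {
    F_FLAG=0
    FILE_ARGUMENT=
    while [ "$#" -gt 0 ]; do
        opt=$(long_to_short "$1") || { echo "Syntax Error: $1" >&2; return 1; }
        case $opt in
            F) F_FLAG=1 ;;
            f) # the file name must follow the option
               [ "$#" -ge 2 ] || { echo "Syntax Error: $1 needs a value" >&2; return 1; }
               shift
               FILE_ARGUMENT=$1 ;;
            u) echo "usage: [-F|--setflag] [-f|--file name] [-u|--usage]"; return 0 ;;
            *) echo "Syntax Error: $1" >&2; return 1 ;;
        esac
        shift
    done
    echo "F_FLAG=$F_FLAG FILE_ARGUMENT=$FILE_ARGUMENT"
}

parse_args --setflag --file notes.txt
parse_args --sillyflag || echo "rejected"
```

With the explicit list, --setflag and a mistyped --sillyflag no longer collapse into the same letter; the cost is that every new long option has to be added to long_to_short by hand.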
With no case statement built in, option parsing in Perl can be a
little tricky. Using the same example as the shell code above, a
simple options parser might look like: [1]
    # defined() keeps the loop going even for an argument such as "0"
    while ( defined( my $arg = shift @ARGV ) ) {
        if ( $arg eq '-F' ) {
            $F_FLAG = 1;
        } elsif ( $arg eq '-f' ) {
            $FILE_ARGUMENT = shift @ARGV;
        } elsif ( $arg eq '-u' ) {
            usage();
            exit 0;
        } else {
            usage();
            exit 1;
        }
    }
Relative to the shell, Perl seems a bit heavy-handed in the amount of work needed. In Perl, the ways to handle options are almost limitless: hashes (associative arrays), arrays, or just plain scalars arranged a certain way could all be used.
Of course, another great thing about Perl is how simply string operations are handled. Using a method similar to the last shell method above can simplify the code a great deal:
    for (my $argc = 0; $argc < @ARGV; $argc++) {
        my $opt = $ARGV[$argc];
        $opt =~ s/^--//;              # Get rid of 2 leading dashes
        $opt =~ s/^-//;               # Get rid of 1 leading dash
        $opt = substr($opt, 0, 1);    # cut the first char
        if ($opt eq 'F') {
            $F_FLAG = 1;
        } elsif ($opt eq 'f') {
            $FILE_ARGUMENT = $ARGV[++$argc];
        } elsif ($opt eq 'u') {
            usage();
        } else {
            usage();
            exit 1;
        }
    }
Of course, the same two problems from the shell code that cuts out the first alphanumeric exist here: no two long options can start with the same letter, and there is no verification of long options. As in the shell, a simple list can be used to verify that long options are valid. Following is an example subroutine:
    ...
    my @valid_optlongs = ("setflag", "file", "usage");
    my @valid_optshort = ("F", "f", "u");
    ...
    sub parseopt {
        my ($opt) = shift;
        $opt =~ s/^--//;    # Get rid of 2 leading dashes
        $opt =~ s/^-//;     # Get rid of 1 leading dash
        if (length($opt) > 1) {
            # long option: look it up and return the matching short option
            for (my $i = 0; $i < @valid_optlongs; $i++) {
                if ($opt eq $valid_optlongs[$i]) {
                    return $valid_optshort[$i];
                }
            }
            return;    # unknown long option
        } else {
            return $opt;
        }
    }
Essentially, instead of just trimming out the first valid alphanumeric, a long option is checked against the list of valid long options, and the single-character option it corresponds to is returned.
Ultimately, the Getopt module should be used if it is available;
why reinvent the wheel? Here is an example using Getopt::Std:
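    use Getopt::Std;
    ...
    our ($opt_f, $opt_u, $opt_F);
    getopts('f:uF');    # -f takes an argument; -u and -F are flags
    die "Usage: $0 [ -f filename -u ]\n"
        unless ( $opt_f or $opt_u );
    if ($opt_f) {
        my $filename = $opt_f;    # getopts stores the argument in $opt_f
    } elsif ($opt_u) {
        usage();
        exit 0;
    }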
Definitely shorter and more compact.
C, one of the oldest high-level programming languages still in wide use, has many different approaches a programmer can take without using libraries, not unlike Perl:
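    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        if (argc < 2) {
            printf("usage: %s number-of-execs sbrk-size job-name\n",
                argv[0]);
            exit(1);
        }
    ....
    int main (int argc, char *argv[]) {
        int c;
        /* argv[0] is the program name, so start at 1 and stop before argc */
        for (c = 1; c < argc; c++) {
            if (argv[c][0] == '-' && argv[c][1] == 'F') {
                F_FLAG = 1;
    ...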
libc offers two levels of built-in option handling: getopt() for
single-character options and getopt_long() for long options. Since
these option handling routines are available in modern
implementations, the examples will use GNU's version.
    ...
    #include <getopt.h>
    ...
    int main (int argc, char **argv)
    {
        int c;
        char *file;
        /* only -f takes an argument, so only f is followed by a colon */
        while ((c = getopt(argc, argv, "Ff:u")) != -1) {
            switch (c) {
            case 'F':
                F_FLAG = 1;
                break;
            case 'f':
                file = optarg;
                break;
            case 'u':
                usage();
                return 0;
            default:
                usage();
                return 1;
            }
        }
        ...
    }
Far more succinct than what may have happened using the previous C examples, which would have turned into spaghetti code. Long options are even more interesting. The GNU C library handles long options through an array of struct option entries, each of which maps a long name to the single character used as its key:
    ...
    #include <getopt.h>
    ...
    int main(int argc, char **argv)
    {
        int c;
        char *file;
        while (1)
        {
            static struct option long_options[] =
            {
                {"setflag", no_argument,       0, 'F' },
                {"file",    required_argument, 0, 'f' },
                {"usage",   no_argument,       0, 'u' },
                {0, 0, 0, 0}   /* an all-zero entry terminates the array */
            };
            int option_index = 0;
            /* only -f/--file takes an argument */
            c = getopt_long (argc, argv, "Ff:u", long_options, &option_index);
            if (c == -1) break;
            switch (c) {
            case 'F':
                F_FLAG = 1;
                break;
            case 'f':
                file = optarg;
                break;
            case 'u':
                usage();
                return 0;
            default:
                usage();
                return 1;
            }
        }
        ...
    }
Short, sweet and to the point.
Sometimes parsing can be extremely simple. Adding long options and flag setting to the mix can be daunting when writing from the ground up; luckily, libraries and modules exist to help along the way.