WebUtil
User manual
Copyright © 1998,2000 by Harms Software Engineering
All rights reserved.
Table of contents
1. Introduction
- 1.1. Requirements
2. WebUtil
3. Basic skills
4. Installation
5. Configuration
- 5.1. The [MAIN] block
- 5.2. The [HTTP] block
- 5.3. The [FTP] block
- 5.3.1. Advanced settings
- 5.4. Conditionally executing blocks
6. Error handling
- 6.1. Web sites and web pages [HTTP]
- 6.2. FTP sites [FTP]
- 6.3. Errorlevels
7. Registration
8. Abbreviations
1. Introduction
The Internet has changed the way that we use computers; in fact,
the Internet has changed the way we live. Because it makes
information of all kinds available to anyone anywhere in the
world, it is safe to say that we have even become dependent on
the Internet. The Internet has also caused us to change. We have
become more demanding, for example. Since we can get almost
anything in terms of information, we want to get it as fast as
possible. Search engines such as Lycos and Alta Vista have become
a commercial success as a result of our desire for information.
The Internet also has a down side. It is slow. Millions of people
are experimenting with the Internet, and for many of those people
it will become an important part of their lives. The large number
of people using the Internet makes it a success but, at the same
time, slows the Internet down, making it sometimes frustrating to
use. Just as search engines were an answer to finding information
quickly, a new concept called Web spiders was an answer to
retrieving information from the Internet automatically, so that
you do not have to do it manually, thereby speeding up the use of
the information.
A Web spider is a tool that
collects the information that you are interested in, from the
Internet, at a time when it is most convenient for you. For
example, a Web spider can pick up the news from an international
news agency in the middle of the night, while you are sleeping,
so that you can read the news when you wake up in the morning.
Stock prices, weather forecasts, new software releases, you name
it, a Web spider can get it for you, at a convenient time, and
without manual intervention!
Just as the Internet has grown
large and slow, most Web spiders are not cut out to be used
without manual intervention. They are large, have thousands of
features, and require a college degree to understand what
everything means. If you are not familiar with the technical
details of the Internet, you probably will not understand most of
the options that they offer. The whole purpose of a Web
spider is to make your life easier. It must be easy to
understand, easy to install, easy to configure, and easy to use!
WebUtil is an answer to the large
and slow Web spiders currently on the market. WebUtil has been
designed to be easy to install, easy to configure, and very easy
to use. It does not have many options, but it does have a lot of
features! The power of WebUtil lies in the fact that it offers
the functionality that you would expect without bothering you
with the technical details of the Internet that you probably do
not care to know about.
1.1 Requirements
WebUtil runs under OS/2 and 32-bit Windows environments (NT, 95,
98) and requires that you have TCP/IP support installed. We also
assume that you have an Internet browser available, such as
WebExplorer or Netscape. WebUtil will only run when you have a
connection with the Internet. It does not matter whether this
connection is via a dial-up line or via a network.
WebUtil
only requires a few kilobytes of harddrive space and it also does
not require very much system memory or CPU time. In short, it is
a very small and efficient program.
2. WebUtil
WebUtil, like
other Web spiders, collects files from the Internet. The concept
behind WebUtil is that it collects files, which can be web pages
or files from FTP sites, and you subsequently use your web
browser to view those pages. WebUtil will automatically change
all of the links in the pages that you instruct it to pick up, so
that they point to the other pages that WebUtil has picked up.
For example, if
you instruct WebUtil to pick up an entire web site, then all of
the links between the pages of the web site are translated in
such a way that you can view the entire site on your local
harddrive without having a connection to the Internet!
If you instruct WebUtil to only pick up a limited number of
pages, then only the links to the pages downloaded will be
translated. Later, when you view those pages with your favorite
Internet browser and you select a link that was not translated,
your browser will automatically try to retrieve that page from
the Internet, assuming you have an online connection at that
point. This means that, assuming you are connected to the
Internet when viewing your downloaded pages, all links will still
point to valid pages, regardless of whether they were translated
or not. If you do not have a connection to the Internet at that
moment, then your browser will generate an error message saying
that it cannot find the site.
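To sum up what we have just explained:
- Links to pages that WebUtil has picked up are translated so
that they point to the copies on your local harddrive.
- Links to pages that WebUtil has not picked up are left
unchanged, so your browser retrieves those pages from the
Internet when you are connected.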
Another very
powerful feature that WebUtil has is the ability to only pick up
web pages or files if they have changed since the last download.
WebUtil will determine if the file has changed, and in case it
has, it will pick it up. If it has not changed, it will not.
For FTP sites,
WebUtil is even more powerful! It can synchronize the contents of
a directory on your local harddrive with a directory on an FTP
site. You can specify which location is the "leading"
location. For example, you may want the FTP site to contain
exactly the same files as a directory on your local harddrive. In
another situation, you may want exactly the opposite. WebUtil is
extremely flexible. You can synchronize a specific set of files,
entire directories, or a combination of the two!
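Below is a summary of the features we have just explained in the
above paragraphs. WebUtil can:
- collect web pages, entire web sites, and files from FTP sites;
- translate the links in the pages it picks up so that you can
view them from your local harddrive;
- pick up web pages or files only if they have changed since the
last download;
- synchronize a directory on your local harddrive with a
directory on an FTP site, in either direction.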
The Windows version of WebUtil
also has the ability to automatically hang up the modem after it
has finished its task.
3. Basic skills
WebUtil consists of one executable and a configuration file. The
configuration file contains the information that instructs
WebUtil what to pick up from the Internet. WebUtil itself does
not contain any buttons or any menu items. When started up,
WebUtil will carry out the instructions in the configuration file
and then shut down. Executing WebUtil, therefore, consists of
simply entering the name WEBUTIL on the OS/2 command line or
double-clicking the WebUtil icon on the desktop.
WebUtil has only one optional command-line parameter, which can
be used to tell WebUtil to use a different configuration file.
Normally, WebUtil will look for a file named WEBUTIL.INI. If a
different name is specified on the command line, WebUtil will use
that one instead. Example usage:
WEBUTIL MYCONFIG.INI
4. Installation
As we stated in Chapter 1, WebUtil is easy to install.
To install WebUtil, simply unzip the WebUtil archive into a new
directory. Then type install.
The installation procedure will
create a new folder on your desktop with the name WebUtil. This
folder will contain icons for the program WEBUTIL.EXE,
a sample configuration file WEBUTIL.INI,
the help file WEBUTIL.HTM, and the registration file WEBUTIL.REG.
5. Configuration
We stated in Chapter 1 that WebUtil is easy to configure. The
instructions that WebUtil uses to determine what to pick up from
the Internet are stored in a text file. The name of this text
file is WEBUTIL.INI. This text file must contain three different
types of information. First, general information: the name of the
log file to use, your name, and your registration code, in case
you have registered WebUtil. The second and third types of
information are HTTP and FTP instruction blocks. Below is a
sample WEBUTIL.INI file. It may look a little complex at first,
but in the following paragraphs we will explain how it works and
you will see that it is extremely simple and intuitive.
;
; Sample WEBUTIL.INI file.
;
; Copyright (C) 1998,2000, Harms Software Engineering, all rights reserved.
;
[MAIN]
LOG=c:\webutil\webutil.log
KEY=unregistered
NAME=Harald Harms
[END]
;
[HTTP]
NAME=ALLFIX WebSite
URL=http://www.allfix.com
LEVEL=3
LOCATION=d:\webutil\allfix
IF_MOD=YES
[END]
;
[FTP]
NAME=ALLFIX FtpSite
URL=ftp.allfix.com
USER=harald
PASSWORD=test
RETRY=3
TRANSFER=c:\files\myfiles.zip,/harald/,IFMOD
[END]
;
As can be seen in the example above, each block begins with a
name enclosed in square brackets and ends with the word END, also
enclosed in square brackets. The block MAIN contains the general
information. A block of the type HTTP contains instructions for
which web sites or web pages need to be picked up, and a block of
the type FTP contains instructions for which files need to be
collected from or placed on an FTP site.
The
instructions in the blocks consist of a verb followed by an equal
sign which is in turn followed by a value. Some verbs are simple
Yes/No items. In those situations, the value of the verb is
either YES or NO, as can be seen in the HTTP block above (see
verb IF_MOD).
Lines that
start with a semicolon (;) are regarded as comments. This means
that WebUtil will ignore those lines. We suggest that you include
comments in your configuration file because it makes it easier
for you, and for others, to understand what you have done.
5.1 The [MAIN] block
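This block contains general information. It can contain the
following verbs:
LOG |
The name of the log file that WebUtil should write its actions
to. |
NAME |
Your name. |
KEY |
Your registration code, in case you have registered WebUtil. |
DEBUG_LOG |
The name of the special DEBUG log file (see chapter 6, Error
handling). |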
5.2 The [HTTP] block
This block contains information that instructs WebUtil which web
sites and web pages need to be picked up from the Internet. The
configuration file may contain up to 1000 of these blocks. Each
block contains the name of one web site or web page along with
some other information that tells WebUtil where to store the web
pages on your local harddrive, how many levels to pick up, and
more.
Web
pages contain links to other pages on the Internet. The main
page, often called INDEX.HTML, for example, may contain links to 10
other pages, which in turn contain links to many many more pages.
Each time a link is followed from one page to another, we say we
have gone a level deeper. This means that if you follow a link in
INDEX.HTML to PRODUCTS.HTML, and then to HELP.HTML,
we would say that you are currently at level 3, in the web site.
Levels are very important because you can tell WebUtil how many
levels to pick up. If you specify a level of 1, then only the
page that you include in the HTTP block will be picked up. If you
specify 5 levels, then WebUtil will follow each link, picking up
the subsequent pages, until it has arrived at level 5.
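As an illustration of levels, the two blocks below sketch how the
same site might be picked up at two different depths. The NAME
and LOCATION values are made up for this example; the URL is the
one from the sample WEBUTIL.INI file.

```ini
; Level 1: only the page named in URL is picked up.
[HTTP]
NAME=ALLFIX start page only
URL=http://www.allfix.com
LEVEL=1
LOCATION=d:\webutil\level1
[END]
;
; Level 2: the page named in URL plus the pages it links to.
[HTTP]
NAME=ALLFIX two levels deep
URL=http://www.allfix.com
LEVEL=2
LOCATION=d:\webutil\level2
[END]
```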
The
HTTP block can contain a number of different verbs. Below is a
list of the verbs and an explanation of what they mean:
NAME |
This verb lets you give an easy-to-understand
name to the web site or pages that you want
to pick up. WebUtil will display this name on the screen
while it is busy picking up these pages. WebUtil does not
use this information for any other purpose; therefore,
you are free to fill in whatever you want. |
URL |
This verb is used to identify the web site
or web page you want WebUtil to pick up. URL is the only
technical Internet term you are going to find in this
manual. It is an acronym for Uniform Resource
Locator. You may use an IP address in this field! |
PORT |
This verb, which is optional, can be used to
specify an alternative port number for this web site.
Normally, you do not need to use this verb. WebUtil will
use the standard port numbers when connecting to a
website. However, if you know that a web site uses a
different port assignment, then you can tell WebUtil to
use it, with this verb. |
LEVEL |
This verb can be used to instruct WebUtil on
how many levels to pick up. |
LOCATION |
This verb is used to identify where, on your
local harddrive, the web pages that are picked up, should
be stored. |
IF_MOD |
This verb can be used to tell WebUtil to
only download those pages that have changed since the
last time WebUtil was active. This feature makes
WebUtil much faster since it does not have to download
every page each time. The value for this verb is YES or
NO. |
TRANSLATE |
This verb tells WebUtil to translate the
links in the downloaded HTML files so that they all point
to files on your local hard disk. You should use this
feature if you want to view your pages off line. The
value for this verb is YES or NO. |
SHORTNAMES |
This verb can be used to instruct WebUtil to
convert long filenames to short DOS style filenames
(8.3). Using this feature does have one unfortunate
consequence, namely, that the IF_MOD feature does not
work very well anymore. WebUtil will not be able to
detect changes to files for which the names have been
shortened. The value for this verb is YES or NO. |
EXCLUDE |
This verb can be used to instruct WebUtil to
ignore certain files or file specifications. Example:
EXCLUDE=*.GIF;*.JPG
|
When downloading pages, the directory where they are stored can
become quite a mess. WebUtil will automatically clean up the
directories each time it is started up, unless the IF_MOD feature
has been turned on. If this feature is turned on, the directories
will not be cleaned up.
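The block below is a sketch of how the optional verbs described
in this section might be combined. The URL is the one from the
sample file and IF_MOD is spelled as in that file; the port
number, the directory, and the choice of excluded file types are
assumptions made up for illustration.

```ini
; Pick up two levels for offline viewing, skipping images.
; PORT is optional; 8080 is a made-up example of a non-standard port.
; IF_MOD=YES only downloads pages that have changed.
; TRANSLATE=YES makes the links point to the local copies.
[HTTP]
NAME=ALLFIX offline copy
URL=http://www.allfix.com
PORT=8080
LEVEL=2
LOCATION=d:\webutil\offline
IF_MOD=YES
TRANSLATE=YES
EXCLUDE=*.GIF;*.JPG
[END]
```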
5.3 The [FTP] block
This block contains information that instructs WebUtil which
files to pick up from or place on an FTP site. The configuration
file may contain up to 1000 of these blocks. Each block contains
a file specification (including wildcards!), the location where
those files can be found, and the place to put the files.
The FTP
block can contain a number of different verbs. Below is a list of
the verbs and an explanation of what they mean:
NAME |
This
verb lets you give an easy-to-understand
name to the FTP site. WebUtil will display this name on
the screen while it is busy transferring files.
WebUtil does not use this information for any other
purpose; therefore, you are free to fill in whatever you
want. |
URL |
This
verb identifies the name of the FTP site. It is important
that you do not include paths in the FTP site name. For
example, use ftp.allfix.com instead of ftp.allfix.com\pub. You may use an IP address in this
field! |
PORT |
This
verb, which is optional, can be used to specify an
alternative port number for this FTP site. Normally, you
do not need to use this verb. WebUtil will use the
standard port numbers when connecting to an FTP site.
However, if you know that an FTP site uses a different
port assignment, then you can tell WebUtil to use it
with this verb. |
USER |
This
verb identifies the user name that should be used to log
into the FTP site. If the site allows anonymous logins,
then you should enter the word "anonymous" here
and your email address as the password (see next verb). |
PASSWORD |
This
verb identifies the password that should be used to log
into the FTP site. |
RETRY |
When
something goes wrong when attempting to log into an FTP
site, it is sometimes necessary to retry the login procedure
more than once. This verb can be used to specify the
number of times that WebUtil should try to log into an
FTP site. The default value, in case this verb is omitted,
is 1. |
RETRY_DELAY |
This
verb can be used to instruct WebUtil how long it should
wait before trying to re-establish a connection. The
value entered here is the number of seconds that WebUtil
should wait. The default value is 3 seconds. |
LIST_DELAY |
This
verb can be used to instruct WebUtil how long it should
wait for directory information from the server. The
default value, which is 5 seconds, should be adequate for
most servers. But, if you notice that not all files are
being downloaded from a server, then increasing this
value may help. Please note that this option is not
required. |
TIMEOUT_DELAY |
Normally, when
downloading a file, the FTP server will acknowledge the
end of the file transfer. Sometimes this does not happen,
and WebUtil will then wait a certain period of time
before it continues. This timeout delay is normally set to 300
seconds. If this happens frequently, then you may want to
reduce the length of that delay with this
verb. The value is specified in seconds. |
TRANSFER |
This
verb identifies which files should be transferred, where
they are located, where they should be placed, and
whether or not they should be synchronized or only
transferred if they have been modified. Up to 255
TRANSFER commands can be defined per FTP block. The
following format should be used for the value entered in
this verb:
[from location][filespec],[to location][filespec],flag1|flag2|...|flagn
For
example:
c:\files\myfiles.zip,/harald/
or
/harald/myfiles.zip,c:\files\
The
two parameters may be a little confusing at first. The
first parameter indicates the location and filespec that
should be transferred to the location specified by the
second parameter. WebUtil will determine, based on the
first and second parameters, whether files should
be uploaded to the FTP site or downloaded from it.
For an explanation of the
different flags that may be used, please see section 5.3.1
Advanced settings.
|
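As a sketch of how the verbs in this section might fit together,
the block below logs in anonymously and downloads a set of files.
The directory /pub/, the local directory, the e-mail address, and
the retry values are assumptions made up for illustration.

```ini
; Anonymous login: the e-mail address is used as the password.
; Try to log in up to 5 times, waiting 10 seconds between attempts.
; Download the ZIP files in /pub/ that have been modified.
[FTP]
NAME=ALLFIX anonymous download
URL=ftp.allfix.com
USER=anonymous
PASSWORD=you@example.com
RETRY=5
RETRY_DELAY=10
TRANSFER=/pub/*.ZIP,c:\download\,IFMOD
[END]
```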
5.3.1 Advanced settings
The transfer command contains several other flag parameters that
can be used. These are normally only used in more advanced
configurations. To keep the explanation of how to use WebUtil
simple, they are discussed here in a separate section. You can
use more than one flag in a TRANSFER command, in which case you
need to separate them with a pipe symbol, "|". The different
flags are:
EXCLUDE= |
This flag can be used to tell WebUtil which
files it should not transfer. You can include complete
filenames or filespecs. Multiple filenames and filespecs
should be separated with a semicolon (;). Example: EXCLUDE=*.GIF;*.JPG
|
TREE |
This flag tells WebUtil to transfer the
entire directory tree. This flag only works when you are
transferring all of the files (i.e. *.*). |
SEM= |
This flag can be used to specify the name of
a semaphore file that WebUtil should use. WebUtil will
look for that semaphore file in the directory being
transferred to. If the semaphore file does not exist,
WebUtil will create one in that directory. After it
has finished, it will delete it again. If the semaphore
file does exist, WebUtil will not process this particular
TRANSFER command. Please note that the name of the
semaphore may not contain a path! Example: SEM=ihub.bsy
|
MAXSIZE= |
This flag can be used to specify the maximum
size of files to transfer. This can be useful if you do
not want to download files larger than a particular size.
The value entered must be in kilobytes and it may only
contain numerical characters. In the following example, the
maximum size is set to 5 megabytes (i.e. 5000 kilobytes): MAXSIZE=5000
|
SYNCH |
The TRANSFER command becomes
a little trickier when you want to use the synchronize
feature. In that case, the first and second parameters
may both contain file specs and directory names. There
are four different combinations of synchronizing files
that can be identified. These four are explained below:
- Both directories (local and remote)
must contain the same files. In this situation,
both the first and the second parameters must
indicate a directory, and should not contain any
filespecs.
Example: c:\files\,/harald/,SYNCH
- Certain files on the local harddrive
need to be synchronized with the files on the FTP
site. In this case, the first parameter should
contain a location and filespec. The second
parameter should contain only a location.
Example: c:\files\*.ZIP,/harald/,SYNCH
- Certain files on the FTP site need
to be synchronized with the files on the
harddrive. In this case, the first parameter
should only contain a location. The second
parameter should contain location and filespec.
Example: c:\files\,/harald/*.ZIP,SYNCH
- Certain files on the FTP site need
to be synchronized with files on the local
harddrive AND certain (other) files on the local
harddrive need to be synchronized with the FTP
site. In this case both the first and the second
parameters should contain location and filespecs.
Example: c:\files\*.ZIP,/harald/*.ARJ,SYNCH
|
IFMOD |
This flag tells WebUtil
to only transfer files that have been changed. |
OVERWRITE |
When transferring files,
it is possible that WebUtil encounters a file that
already exists in the destination directory. Normally,
WebUtil will not overwrite those files. This flag can be
used to tell WebUtil that it should overwrite the
existing file in these situations. |
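The lines below sketch how several flags might be combined in one
TRANSFER command, using the pipe symbol as the separator. The
directories and the semaphore file name are assumptions made up
for illustration, and each TRANSFER line would go inside an [FTP]
block.

```ini
; Download the whole /pub/ tree, but only changed files, no file
; larger than 5000 kilobytes, and no GIF or JPG files.
TRANSFER=/pub/*.*,c:\mirror\,TREE|IFMOD|MAXSIZE=5000|EXCLUDE=*.GIF;*.JPG
; Upload ZIP files, overwriting existing ones, and skip the transfer
; when the semaphore file upload.bsy already exists in /incoming/.
TRANSFER=c:\outgoing\*.ZIP,/incoming/,OVERWRITE|SEM=upload.bsy
```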
5.4 Conditionally executing blocks
Normally, WebUtil will carry out all of the
instructions in the INI file. It is possible to conditionally
carry out some or all of the blocks when a certain flag is
present. A flag can be either the name of a semaphore file, a
parameter of the WEBUTIL environment variable, or a commandline
parameter.
When a flag exists, WebUtil will carry out that
block. If it does not exist, then it will ignore that block. This
mechanism makes it possible for you to make more complex INI
files that only carry out certain functions in specific
situations. This mechanism works for both FTP and HTTP blocks.
The proper syntax for using this mechanism is:
[FTP::flag] or [HTTP::flag]
For example:
[FTP::c:\runme.now]
...
[END]
In the above example, you can see that the flag, a semaphore file
called c:\runme.now in this example, is added to the block
marker. Please note the double colons separating it from the
text "FTP". The following example shows how to use this
mechanism with an environment variable:
[FTP::DOIT.NOW]
...
[END]
In this example, WebUtil will run this particular
block if the WEBUTIL environment variable is set and it contains
the value "DOIT.NOW". Using a command line parameter is
done in the exact same manner. As a matter of fact, at runtime,
WebUtil will check to see what the flag is, whether it is a
semaphore file, an environment variable, or a command line
option.
By preceding the flag with an exclamation mark, !,
you can tell WebUtil to execute a block if the flag DOES NOT
exist. In the following example, the FTP block will only execute
when the semaphore file does NOT exist:
[FTP::!c:\runme.now]
...
[END]
As with the other examples, the ! can be used with
the other types of flags as well.
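As a further sketch, the two blocks below use the same flag once
without and once with the exclamation mark, so that exactly one
of them is carried out on any given run. The flag name FULL, the
level values, and the directories are assumptions made up for
illustration.

```ini
; Carried out only when the flag FULL is present.
[HTTP::FULL]
NAME=ALLFIX full copy
URL=http://www.allfix.com
LEVEL=5
LOCATION=d:\webutil\full
[END]
;
; Carried out only when the flag FULL is NOT present.
[HTTP::!FULL]
NAME=ALLFIX start page
URL=http://www.allfix.com
LEVEL=1
LOCATION=d:\webutil\quick
[END]
```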
6. Error handling
Every Internet user knows that things sometimes go wrong. Some
sites may be down, the Internet may be congested, resulting in
timeout errors, or the address of a web site may have changed.
WebUtil is smart enough to handle many different kinds of small
errors; however, it sometimes cannot carry out its task for one
reason or another. In order to help you find out what is going
wrong, WebUtil writes all of its actions to a log file. The name
of this log file can be specified in the configuration file (see
section 5.1). This chapter contains a list of the error messages
that the log file can contain, each with a short explanation of
the problem and suggestions on how to solve it.
In addition to the normal log file, WebUtil can also create a
special DEBUG log file, which can be specified in the
configuration file with the DEBUG_LOG verb (see section 5.1, The
[MAIN] block). This particular log file contains all text and
commands that are sent to and received from the server (HTTP or
FTP). If you experience any problems, you can use this feature to
get a better idea of what is going wrong. The items in this log
file are preceded by a letter: "C" means Connect, "R" means
Receive, and "S" means Send.
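A [MAIN] block that turns on the DEBUG log might look like the
sketch below. The file name c:\webutil\debug.log is an assumption
made up for illustration; the other values are taken from the
sample WEBUTIL.INI file.

```ini
[MAIN]
LOG=c:\webutil\webutil.log
DEBUG_LOG=c:\webutil\debug.log
KEY=unregistered
NAME=Harald Harms
[END]
```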
6.1 Web sites and web pages [HTTP]
This section contains the error messages that
WebUtil can report when picking up web sites and web pages from
the Internet.
Unable to establish connection
This error means that either the web site could
not be found or that WebUtil was not able to connect to the
web site. Possible causes could be that the web site has
moved to a different location, or that the Internet is
extremely congested at this moment which prevented WebUtil
from establishing a connection with the web site.
Unable to find <name>
This error means that the specific web page
specified, could not be found. This error can occur if the
page specified in the configuration file could not be found,
but can also be given when WebUtil tries to follow a link to
another page in order to pick that up as well. If WebUtil cannot
find a particular file referenced in a web page, then it
will also give this error.
Unable to create file on local drive <name>
This error is given when WebUtil cannot create
the file it wants to pick up on the local drive. The most
probable cause for this error is that the filename is not a
valid filename.
File not modified, skipping download
This particular message is more of a
notification than an error. It simply means that the file
that was to be downloaded has not been changed.
6.2 FTP sites [FTP]
Unable to establish connection
This error message is given when WebUtil is unable to establish a
connection with the FTP site. Possible causes include an
incorrect FTP address, a site that is (temporarily) down, or
congestion on the Internet, making it difficult to establish
a connection.
User name incorrect
This error indicates that the user name, as specified for this
site (in WEBUTIL.INI), is incorrect. In other words, there
is no user account on the FTP site with this name.
Password error
This error indicates that the password for the user account is
incorrect. Passwords are almost always case sensitive.
Therefore, a possible cause for this error could be that the
case of some of the letters in the password is incorrect.
Unable to establish data channel
Transferring data to and from an FTP site is done via a so-called
data channel. Before transferring data, WebUtil establishes a
second connection, namely, the data connection. This error
indicates that WebUtil was not able to establish such a
connection, making it impossible to upload or download files.
Unable to find file
This error indicates that the specified file was not found on the
FTP site in the specified directory.
Login unsuccessful
This error message is given when logging into the FTP site was
not successful. This error message is always preceded by one of
the first three error messages.
File not modified, skipping download
This particular message is more of a notification than an error.
It simply means that the file that was to be downloaded has
not been changed.
6.3 Errorlevels
To make WebUtil easier to use in a batch file, it will exit with
one of the following errorlevels:
0 - no files downloaded or uploaded.
1 - 1 or more files downloaded via HTTP or FTP.
2 - 1 or more files uploaded via HTTP or FTP.
3 - 1 or more files uploaded and 1 or more files downloaded via
HTTP or FTP.
* Note: only the OS/2 version of WebUtil exits with these
errorlevels.
7. Registration
WebUtil has been released under the shareware concept. This means
that you are allowed to use it for a maximum of 30 days. If you
enjoy using WebUtil and you wish to continue using it, then you
are required to register the program. By registering the program,
you receive a registration key which will make all of the
features available.
In the
un-registered version of WebUtil, the following features have
been disabled:
Upon
registering WebUtil, the above features will become available to
you.
A
registration key is valid for the current release version and for
the next two release versions. This means that if you register
WebUtil version 1.00, you will also be able to use 1.10 and 1.20
(assuming that those are the next two versions that are
released).
You can
register WebUtil by filling out the electronic registration form
on our Web site, www.allfix.com or by completing the registration form (WEBUTIL.REG). Please consult the registration form for more
information.
8. Abbreviations