Author: | Michael W. Shaffer mwshaffer@yahoo.com |
Current Version: | 1.7 |
Status: | Stable |
Release Date: | 2001-12-05 |
Source Archive: | ssync.tar.gz (includes binaries for Linux / Libc6 systems) |
Change Log: | CHANGES |
License: | GPL |
News
2001-12-05
New features in 1.7 include a --help message, a test mode (--test or -X) that runs through the entire
sync cycle but writes nothing to the destination filesystem, and a few more minor fixes. Complete
details are in CHANGES. The only major item on the TODO list at the moment is to rewrite the worklist
parsing so that it loads everything into a list and processes it in order, instead of using a hash table
and proceeding helter-skelter as it does now.
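The TODO above describes loading the work file into a list and processing it in file order. The C sketch below illustrates that idea and is not ssync's code. The names `work_item`, `work_list`, `work_list_add_line`, and `work_list_free` are hypothetical. It also assumes that `#` starts a comment and that all whitespace is dropped before the line is split at `|`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work item: one "source | destination" pair. */
struct work_item {
    char *src;
    char *dst;
    struct work_item *next;
};

/* Singly linked list with a tail pointer: O(1) append, and traversal
 * visits items in exactly the order they appeared in the work file. */
struct work_list {
    struct work_item *head;
    struct work_item *tail;
    size_t count;
};

/* Portable strdup replacement (strdup is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Parse one work-file line in place and append it to the list.
 * Assumption: '#' starts a comment and all whitespace is discarded.
 * Returns 1 if an item was added, 0 for a blank or comment line,
 * and -1 for a malformed line or an allocation failure. */
int work_list_add_line(struct work_list *list, char *line)
{
    char *hash = strchr(line, '#');
    char *w = line;
    char *bar;
    struct work_item *item;

    if (hash != NULL)
        *hash = '\0';                    /* drop the comment */
    for (const char *r = line; *r != '\0'; r++)
        if (!isspace((unsigned char)*r))
            *w++ = *r;                   /* squeeze out whitespace */
    *w = '\0';
    if (*line == '\0')
        return 0;

    bar = strchr(line, '|');
    if (bar == NULL || bar == line || bar[1] == '\0')
        return -1;                       /* need "src|dst", both non-empty */
    *bar = '\0';

    item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->src = dup_str(line);
    item->dst = dup_str(bar + 1);
    item->next = NULL;
    if (item->src == NULL || item->dst == NULL) {
        free(item->src);
        free(item->dst);
        free(item);
        return -1;
    }
    if (list->tail == NULL)
        list->head = item;               /* first item */
    else
        list->tail->next = item;         /* append after current tail */
    list->tail = item;
    list->count++;
    return 1;
}

/* Release every item and reset the list to empty. */
void work_list_free(struct work_list *list)
{
    struct work_item *item = list->head;
    while (item != NULL) {
        struct work_item *next = item->next;
        free(item->src);
        free(item->dst);
        free(item);
        item = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->count = 0;
}
```

Because appends always go to the tail, a loop from `head` to `NULL` handles entries in the order they were written. Unlike a hash table, the list therefore preserves that order.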
2001-12-05
Several new features have been implemented by request, including no-sync-data, no-sync-time,
no-sync-meta, and update-only. I am still working on a test option that would run
through a whole sync procedure without actually writing anything. The CHANGES file
details a couple of other minor modifications. After the recent improvements, I feel good enough about the features
to list them before the limitations.
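The update-only option syncs an item only when the source mtime is greater than the destination mtime. The C sketch below shows that rule under a few assumptions. The function names are hypothetical. A missing destination is always synced, which is an assumption and not documented behavior. Mtimes are read with `lstat(2)`, so a symlink's own mtime is used.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <time.h>

/* Decide whether an item should go through the normal sync path.
 * With update-only set, only a strictly newer source qualifies.
 * Assumption: a missing destination is always synced. */
int should_sync(time_t src_mtime, time_t dst_mtime, int dst_exists, int update_only)
{
    if (!dst_exists || !update_only)
        return 1;
    return src_mtime > dst_mtime;
}

/* Same rule applied to real paths via lstat(2), which does not follow
 * symlinks. Returns 1 (sync), 0 (skip), or -1 if the source cannot
 * be stat'ed. A destination that cannot be stat'ed counts as missing. */
int should_sync_paths(const char *src, const char *dst, int update_only)
{
    struct stat s, d;
    if (lstat(src, &s) != 0)
        return -1;
    if (lstat(dst, &d) != 0)
        return should_sync(s.st_mtime, 0, 0, update_only);
    return should_sync(s.st_mtime, d.st_mtime, 1, update_only);
}
```

Equal mtimes are skipped because the comparison is strict (`>`), which matches the "source mtime is > destination mtime" wording in the option table.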
2001-12-03
I've added some information on tuning ssyncd for large workloads, including some actual
config files (in the examples directory of the tarball). They show how I reduced the replication period of
our own production servers from around 80 to 100 minutes to between 3 and 20 minutes, just by running multiple
instances of the daemon and adjusting the configs for a little while one morning.
If anyone has real-world numbers on how ssyncd compares to other replication tools in terms of filesystem workload vs.
minimum replication period, feel free to pass them on. I may be wrong, but I think that ssyncd has a significant advantage
over some alternatives: its relatively light memory and CPU footprint makes it viable to run a large number of instances on one machine.
I don't think we could reasonably consider running nine parallel instances of rsync, even on our relatively
well-endowed production machines, since together they would probably consume somewhere between 500MB and 1500MB of RAM. I'm sure some
would criticize the design for not being multi-threaded or automatically forking multiple instances. However, I tend to think that simple,
efficient single-threaded designs can be just as effective, and are somewhat easier both to implement and to control in actual use, compared to
multi-threaded implementations.
Version 1.5 added a whole gang of speed and efficiency optimizations and removed a moderate amount of tracing code of
somewhat questionable utility. The new version should show significantly reduced CPU time (perhaps as much as 20%, depending on workload)
compared with earlier releases, as well as a slight reduction in the already minimal memory requirements. I haven't eliminated everything
that might be considered wasteful of cycles. I do think I've made noticeable improvements in the
tightest inner routines, which are called thousands or millions of times during each run.
Why another synchronization tool?
The name ssync is a contraction of (simple|silly|stupid) filesystem synchronizer. Which
prefix you prefer depends entirely on how well it fits your needs, I suppose. This program is not
really intended to compete with anything as highly evolved and functionally rich as rsync.
It was designed to be an extremely simple and reliable solution to a significant operational
need. On the network I manage, I recently put into production a pair of loosely coupled, highly available
Linux file servers that run Samba,
NFS, and dhttpd to serve the file sharing
needs of about 500 users, whose client machines run Windows and various UNIX platforms. I chose not to
use any of the currently available HA packages to manage these systems for various reasons:
Limitations
The basic function of ssync is simply to make the directories, files, and links on a destination
filesystem match those on a source filesystem. The default behavior is to read a list of paths to sync from
a specified file and recursively process each of them.
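Recursive processing of a source path can be pictured as the walk sketched below. This is a hypothetical C illustration, not ssync's implementation. It uses `lstat(2)` so that symlinks are counted as links rather than followed. It only counts object types, much like a `--test` pass might. A real sync would compare each object against its counterpart under the destination root at the point marked in the comments. The names `walk_tree` and `walk_stats` are assumptions.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback where limits.h does not define it */
#endif

/* Hypothetical per-run counters, the kind a test-mode run might report. */
struct walk_stats {
    unsigned long dirs, files, links, other;
};

/* Recursively visit path without following symlinks, counting object
 * types. Returns 0 on success, -1 if anything could not be read. */
int walk_tree(const char *path, struct walk_stats *st)
{
    struct stat sb;
    DIR *dir;
    struct dirent *ent;
    char child[PATH_MAX];
    int rc = 0;

    if (lstat(path, &sb) != 0)
        return -1;
    /* A real sync would compare this object with the destination here. */
    if (S_ISLNK(sb.st_mode)) {
        st->links++;           /* the link itself, never its target */
        return 0;
    }
    if (S_ISREG(sb.st_mode)) {
        st->files++;
        return 0;
    }
    if (!S_ISDIR(sb.st_mode)) {
        st->other++;           /* devices, FIFOs, sockets */
        return 0;
    }

    st->dirs++;
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child) {
            rc = -1;           /* path too long: note the error, keep going */
            continue;
        }
        if (walk_tree(child, st) != 0)
            rc = -1;
    }
    closedir(dir);
    return rc;
}
```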
Building and Installing
I have tested and deployed ssync on both Red Hat and Debian Linux. I am not aware of any Linux-specific
features that it uses, so I think it will work fine on most other UNIX-like platforms as well. There is no configure
script, since I didn't feel like writing one and don't really think one is necessary at this point. There may
be one in the future. You may need to change the makefile if you don't have gcc available or if you
want to use syslog logging instead of the default plain file based logging. Otherwise, a plain old
Configuration
All of the available configuration options are shown in the example ssyncd.conf configuration file. They can be set
either in this file (for ssyncd), in .ssyncrc (for ssync), or on the command line (for both). A
summary of config options is below. The -c option only makes sense on the command line (duh), and the interactive
version of the program only really uses the -c, -w, and -v options. You will get only a moderate
number of 'informational' messages even at the default log-level of 0. If you want to suppress everything except
warnings and errors, set log level 3 (warn). Log level 2 (info) is probably what most people want.
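The log-level rule is "lower levels are more verbose". It amounts to a single threshold comparison, sketched below in C. The level names follow the legend in the example ssyncd.conf. The functions `set_log_level`, `log_enabled`, and `log_msg` are hypothetical, and clamping out-of-range levels into 0 to 5 is an assumption. A message is written only when its level is at or above the configured threshold.

```c
#include <stdarg.h>
#include <stdio.h>

/* Level numbers and names as listed in the example ssyncd.conf. */
enum { LV_ALL, LV_TRACE, LV_INFO, LV_WARN, LV_SEVERE, LV_FATAL };

static const char *const level_names[] = {
    "ALL", "TRACE", "INFO", "WARN", "SEVERE", "FATAL"
};

/* Hypothetical global threshold; the text gives 0 as the default. */
static int log_threshold = LV_ALL;

/* Assumption: out-of-range values are clamped into 0..5. */
static int clamp_level(int level)
{
    if (level < LV_ALL)
        return LV_ALL;
    if (level > LV_FATAL)
        return LV_FATAL;
    return level;
}

void set_log_level(int level)
{
    log_threshold = clamp_level(level);
}

/* Lower numbers are more verbose: a message passes when its level
 * is at or above the configured threshold. */
int log_enabled(int level)
{
    return clamp_level(level) >= log_threshold;
}

/* Write "[LEVEL] message" to out. Returns 1 if written, 0 if filtered. */
int log_msg(FILE *out, int level, const char *fmt, ...)
{
    va_list ap;
    level = clamp_level(level);
    if (level < log_threshold)
        return 0;
    fprintf(out, "[%s] ", level_names[level]);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    return 1;
}
```

With the threshold at 3, TRACE and INFO messages are dropped, and WARN, SEVERE, and FATAL messages still appear.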
Config file | Long Option | Short Option | Comment |
- | --help | -h | display usage message and version |
conf-path | --conf-path | -c | read an alternative config file instead of the default |
interval | --interval | -i | number of seconds to sleep between completing one run and starting the next |
work-file | --work-file | -w | path for file containing work paths |
priority | --priority | -n | scheduling priority (-20 to +20), see renice(8) |
no-sync-data | --no-sync-data | -D | do not sync data (content) of files |
no-sync-time | --no-sync-time | -T | do not sync atime / mtime |
no-sync-meta | --no-sync-meta | -M | do not sync meta-data (uid / gid / mode) |
update-only | --update-only | -U | only sync items whose source mtime is newer than the destination mtime |
test | --test | -X | run the sync procedure and collect statistics without modifying anything |
pid-path | --pid-path | -p | path for pid file |
log-path | --log-path | -l | path for log file if using file based logging |
log-ident | --log-ident | -s | identification string if using syslog based logging |
log-level | --log-level | -v | logging verbosity (0 to 5); lower levels are more verbose (2 is normal, 0 may be excessive) |
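The long and short option pairs in the table map directly onto a `getopt_long(3)` option array. The sketch below assumes that approach. `getopt_long` is a GNU extension found in glibc and the BSDs, and nothing here claims that ssync itself uses it. The `settings` struct and `parse_args` are hypothetical names.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical settings holder; field names are illustrative only. */
struct settings {
    const char *conf_path, *work_file, *pid_path, *log_path, *log_ident;
    long interval;
    int priority, log_level;
    int no_sync_data, no_sync_time, no_sync_meta, update_only, test;
};

/* Long/short pairs taken from the configuration table. */
static const struct option long_opts[] = {
    { "help",         no_argument,       NULL, 'h' },
    { "conf-path",    required_argument, NULL, 'c' },
    { "interval",     required_argument, NULL, 'i' },
    { "work-file",    required_argument, NULL, 'w' },
    { "priority",     required_argument, NULL, 'n' },
    { "no-sync-data", no_argument,       NULL, 'D' },
    { "no-sync-time", no_argument,       NULL, 'T' },
    { "no-sync-meta", no_argument,       NULL, 'M' },
    { "update-only",  no_argument,       NULL, 'U' },
    { "test",         no_argument,       NULL, 'X' },
    { "pid-path",     required_argument, NULL, 'p' },
    { "log-path",     required_argument, NULL, 'l' },
    { "log-ident",    required_argument, NULL, 's' },
    { "log-level",    required_argument, NULL, 'v' },
    { NULL,           0,                 NULL, 0 }
};

/* Fill *s from argv. Returns 0 on success, 1 if help was requested,
 * and -1 on an unknown option or a missing argument. */
int parse_args(int argc, char *argv[], struct settings *s)
{
    int c;
    optind = 1;   /* restart the scan at argv[1] so repeated calls work */
    opterr = 0;   /* suppress getopt's own messages; caller reports errors */
    while ((c = getopt_long(argc, argv, "hc:i:w:n:DTMUXp:l:s:v:",
                            long_opts, NULL)) != -1) {
        switch (c) {
        case 'h': return 1;
        case 'c': s->conf_path = optarg; break;
        case 'i': s->interval = strtol(optarg, NULL, 10); break;
        case 'w': s->work_file = optarg; break;
        case 'n': s->priority = atoi(optarg); break;
        case 'D': s->no_sync_data = 1; break;
        case 'T': s->no_sync_time = 1; break;
        case 'M': s->no_sync_meta = 1; break;
        case 'U': s->update_only = 1; break;
        case 'X': s->test = 1; break;
        case 'p': s->pid_path = optarg; break;
        case 'l': s->log_path = optarg; break;
        case 's': s->log_ident = optarg; break;
        case 'v': s->log_level = atoi(optarg); break;
        default:  return -1;   /* '?': unknown option or missing argument */
        }
    }
    return 0;
}
```

For example, `ssync -X --interval 60 -w /etc/ssyncd.work` would set the test flag, a 60-second interval, and the work file path.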
Here's the example ssyncd.conf file:
    #
    # ssyncd.conf
    #
    interval: 300                   # time between sync runs in seconds
    work-file: /etc/ssyncd.work     # list of paths to synchronize
    priority: -1                    # scheduling priority (range -20 - +20)
                                    # be careful with this! and read renice(8)
                                    # if you don't know what it means
    #no-sync-data: yes              # [y|n] do not sync data (file contents)
    #no-sync-time: yes              # [y|n] do not sync atime / mtime
    #no-sync-meta: yes              # [y|n] do not sync meta-data (uid / gid / mode)
    #update-only: yes               # [y|n] update only if source mtime > dest mtime
    #test: yes                      # [y|n] test only (modify nothing in dest.)
    pid-path: /var/run/ssyncd.pid   # path for pid file
    log-path: /var/log/ssyncd.log   # path for file based logging
    log-ident: ssyncd               # id for syslog based logging
    log-level: 2                    # 0 - ALL
                                    # 1 - TRACE
                                    # 2 - INFO
                                    # 3 - WARN
                                    # 4 - SEVERE
                                    # 5 - FATAL
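A `key: value` line like those in the example above can be handled by a very small parser. The C sketch below is a hypothetical illustration and not ssync's parser. It assumes that `#` starts a comment, that all whitespace is discarded (the behavior described for both config files), and that a `[y|n]` value counts as true when it starts with `y` or `Y`. The names `parse_conf_line` and `conf_bool` are assumptions.

```c
#include <ctype.h>
#include <string.h>

/* Parse one config line in place. Returns 1 and sets *key / *value
 * (pointers into line) for a "key: value" line, 0 for a blank or
 * comment-only line, and -1 if there is no ':' or the key is empty. */
int parse_conf_line(char *line, char **key, char **value)
{
    char *hash = strchr(line, '#');
    char *w = line;
    char *colon;

    if (hash != NULL)
        *hash = '\0';                    /* strip trailing comment */
    for (const char *r = line; *r != '\0'; r++)
        if (!isspace((unsigned char)*r))
            *w++ = *r;                   /* discard all whitespace */
    *w = '\0';
    if (*line == '\0')
        return 0;

    colon = strchr(line, ':');           /* split at the first colon */
    if (colon == NULL || colon == line)
        return -1;
    *colon = '\0';
    *key = line;
    *value = colon + 1;
    return 1;
}

/* Interpret a [y|n] value (assumption: leading 'y' or 'Y' means yes). */
int conf_bool(const char *value)
{
    return value[0] == 'y' || value[0] == 'Y';
}
```

Whitespace is squeezed out before the split, so a value such as `/my dir/file` comes back as `/mydir/file`. That is exactly the limitation with whitespace in paths that the notes on the config files mention.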
The work file just contains a list of work items, one per line, in the form:
    /source/path | /destination/path

The paths can be either files or directories, and source directories will be processed recursively. There is no form of substitution or environment variable parsing, and there is no facility for excluding things. If the destination is a different type than the source (e.g. the source is a file and the destination is a directory), the program will unlink the destination object (recursively) and re-create it as the new type. This means that if you want to sync a file into a directory, you should give the full path name of the destination, including the file name. This 'feature' might also have some disastrously unexpected effects if you specify a symlink to a directory or file as the source path and a real directory or file as the destination.

The config file parsing routines are really simple-minded and just discard all whitespace in either config file, so paths containing whitespace will not be parsed correctly. If this causes a lot of issues, I may refine the behavior in the future. Here's the example ssyncd.work file:
    #
    # ssyncd.work: Example work file for ssync / ssyncd
    #
    # Each line must be of the form:
    #
    # source path | destination path
    #

    # Individual files
    /mnt/peer/etc/adduser.conf | /etc/adduser.conf
    /mnt/peer/etc/aliases | /etc/aliases
    /mnt/peer/etc/apt/apt.conf | /etc/apt/apt.conf
    /mnt/peer/etc/apt/sources.list | /etc/apt/sources.list
    /mnt/peer/etc/dante.conf | /etc/dante.conf
    /mnt/peer/etc/exim.conf | /etc/exim.conf
    /mnt/peer/etc/exports | /etc/exports
    /mnt/peer/etc/fstab | /etc/fstab
    /mnt/peer/etc/fstab.backup | /etc/fstab.backup
    /mnt/peer/etc/fstab.primary | /etc/fstab.primary
    /mnt/peer/etc/group | /etc/group
    /mnt/peer/etc/group- | /etc/group-
    /mnt/peer/etc/gshadow- | /etc/gshadow-
    /mnt/peer/etc/gshadow | /etc/gshadow
    /mnt/peer/etc/hosts | /etc/hosts
    /mnt/peer/etc/hosts.allow | /etc/hosts.allow
    /mnt/peer/etc/hosts.deny | /etc/hosts.deny
    /mnt/peer/etc/inetd.conf | /etc/inetd.conf
    /mnt/peer/etc/krb5.keytab | /etc/krb5.keytab
    /mnt/peer/etc/lilo.conf | /etc/lilo.conf
    /mnt/peer/etc/logrotate.conf | /etc/logrotate.conf
    /mnt/peer/etc/motd | /etc/motd
    /mnt/peer/etc/ntp.conf | /etc/ntp.conf
    /mnt/peer/etc/passwd | /etc/passwd
    /mnt/peer/etc/passwd- | /etc/passwd-
    /mnt/peer/etc/peerd-http.conf | /etc/peerd-http.conf
    /mnt/peer/etc/peerd-nfs.conf | /etc/peerd-nfs.conf
    /mnt/peer/etc/peerd-smb.conf | /etc/peerd-smb.conf
    /mnt/peer/etc/peerd-sync.conf | /etc/peerd-sync.conf
    /mnt/peer/etc/peerd-thing.conf | /etc/peerd-thing.conf
    /mnt/peer/etc/raidtab | /etc/raidtab
    /mnt/peer/etc/services | /etc/services
    /mnt/peer/etc/shadow- | /etc/shadow-
    /mnt/peer/etc/shadow | /etc/shadow
    /mnt/peer/etc/ssh/ssh_config | /etc/ssh/ssh_config
    /mnt/peer/etc/ssh/sshd_config | /etc/ssh/sshd_config
    /mnt/peer/etc/ssyncd.conf | /etc/ssyncd.conf
    /mnt/peer/etc/ssyncd.work | /etc/ssyncd.work
    /mnt/peer/etc/sysctl.conf | /etc/sysctl.conf
    /mnt/peer/etc/syslog.conf | /etc/syslog.conf
    /mnt/peer/usr/omni/config/cell/cell_server | /usr/omni/config/cell/cell_server

    # Directory trees
    /mnt/peer/etc/cron.d | /etc/cron.d
    /mnt/peer/etc/cron.daily | /etc/cron.daily
    /mnt/peer/etc/cron.monthly | /etc/cron.monthly
    /mnt/peer/etc/cron.weekly | /etc/cron.weekly
    /mnt/peer/etc/init.d | /etc/init.d
    /mnt/peer/etc/logrotate.d | /etc/logrotate.d
    /mnt/peer/etc/rc0.d | /etc/rc0.d
    /mnt/peer/etc/rc1.d | /etc/rc1.d
    /mnt/peer/etc/rc2.d | /etc/rc2.d
    /mnt/peer/etc/rc3.d | /etc/rc3.d
    /mnt/peer/etc/rc4.d | /etc/rc4.d
    /mnt/peer/etc/rc5.d | /etc/rc5.d
    /mnt/peer/etc/rc6.d | /etc/rc6.d
    /mnt/peer/etc/rcS.d | /etc/rcS.d
    /mnt/peer/vol00/apps | /vol00/apps
    /mnt/peer/vol00/backups | /vol00/backups
    /mnt/peer/vol00/deadaccts | /vol00/deadaccts
    /mnt/peer/vol00/groups | /vol00/groups
    /mnt/peer/vol00/home | /vol00/home
    /mnt/peer/vol00/infonet | /vol00/infonet
    /mnt/peer/vol00/nfstmp | /vol00/nfstmp
    /mnt/peer/vol00/ninstall | /vol00/ninstall
    /mnt/peer/vol00/share | /vol00/share
    /mnt/peer/vol00/var/mail | /vol00/var/mail
    /mnt/peer/vol00/var/sysnap | /vol00/var/sysnap
    /mnt/peer/vol00/var/www | /vol00/var/www
    #/mnt/peer/usr/local | /usr/local
    /mnt/peer/usr/local/bin | /usr/local/bin
    /mnt/peer/usr/local/sbin | /usr/local/sbin
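The type check behind the "unlink and re-create" behavior can be sketched as a comparison of the `S_IFMT` bits of the source and destination. The C sketch below assumes the comparison uses `lstat(2)`; ssync's actual method is not stated. Under that assumption, a symlink source never matches a real directory at the destination, which is one way the symlink warning in the work-file notes could bite. `type_mismatch` and its return codes are hypothetical, and the sketch only detects the mismatch. It deliberately does not delete anything.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Compare object types with lstat(2), so a symlink counts as a symlink
 * rather than as whatever it points to. Returns:
 *   1  both exist and their types differ (destination would be replaced)
 *   0  both exist and have the same type
 *   2  destination cannot be stat'ed (normally: it does not exist yet)
 *  -1  source cannot be stat'ed */
int type_mismatch(const char *src, const char *dst)
{
    struct stat s, d;
    if (lstat(src, &s) != 0)
        return -1;
    if (lstat(dst, &d) != 0)
        return 2;
    return (s.st_mode & S_IFMT) != (d.st_mode & S_IFMT);
}
```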