SSYNC


Author: Michael W. Shaffer mwshaffer@yahoo.com
Current Version: 1.7
Status: Stable
Release Date: 2001-12-05
Source Archive: ssync.tar.gz (includes binaries for Linux / Libc6 systems)
Change Log: CHANGES
License: GPL

News

2001-12-05

New features in 1.7 include a --help message and a test mode (--test | -X options) which runs through the entire sync cycle but writes nothing to the destination filesystem, along with a few minor fixes. Complete details are in CHANGES. The only major thing on the TODO list at the moment is to rewrite the worklist parsing to load everything into a list and process it in order, instead of using a hash table and proceeding helter-skelter as we do now.

2001-12-05

Several new features have been implemented by request, including no-sync-data, no-sync-time, no-sync-meta, and update-only. I am still working on a test option which would run through a whole sync procedure but not actually write anything. The CHANGES file details a couple of other minor mods. After the recent improvements, I actually feel good enough about the features to now list them before the limitations.

2001-12-03

I've added some information on tuning ssyncd for large workloads, including some actual config files (in the examples directory of the tarball). They show what I did to reduce the replication period of our own production servers from around 80 - 100 minutes to between 3 and 20 minutes, just by introducing multiple instances of the daemon and playing with the configs for a little while one morning.

If anyone has real world numbers on how ssyncd compares to other replication tools in terms of filesystem workload vs. minimum replication period, feel free to pass them on. I may be wrong, but I think that ssyncd has a significant advantage over some alternatives in that it is viable to run a large number of instances on one machine due to its relatively light memory and CPU footprint. I don't think we could reasonably consider running nine parallel instances of rsync even on our relatively well endowed production machines since they would probably consume somewhere between 500MB and 1500MB of RAM altogether. I'm sure some would criticize the design for not being multi-threaded or automatically forking multiple instances, but I tend to think that simple and efficient single threaded designs can be just as effective and somewhat easier both to implement and control in actual use compared to multi-threaded implementations.

Version 1.5 added a whole gang of speed and efficiency optimizations and removed a moderate amount of tracing code of somewhat questionable utility. The new version should show significantly reduced CPU time (maybe as much as 20% depending on workload) compared to the earlier releases, as well as a slight reduction in the already minimal memory requirements. I haven't eliminated all of the things which might be considered wasteful of cycles, but I think I've made some noticeable improvements in the tightest inner routines, which are called thousands or millions of times during each run.

Why another synchronization tool?

The name ssync is a contraction of (simple|silly|stupid) filesystem synchronizer. Which prefix you prefer depends entirely on how well it fits your needs, I suppose. This program is not really intended to compete with anything as highly evolved and functionally rich as rsync, but was just designed to be an extremely simple and reliable solution to a significant operational need.

On the network I manage, I recently put into production a pair of loosely coupled highly available Linux file servers which run Samba, NFS, and dhttpd to service the file sharing needs of about 500 users with client machines running Windows and various UNIX platforms. I chose not to use any of the currently available HA packages to manage these systems, for various reasons. The actual monitoring and failover features are handled by a separate daemon I created called peerd (regarding which I will post information and source code here as well in the near future).

Since the implementation does not rely on a shared disk subsystem, some means of keeping the two separate filesystems of the peer machines in relatively close synchronization was needed. Originally, the solution to this requirement was a shell script which ran various rsync commands, first using a connection to an rsync server process on the master machine and later relying on a couple of NFS filesystems exported on the master and mounted on the slave specifically for the replication. As it turned out, this solution was less than satisfactory, since rsync would randomly but fairly frequently fail to complete the synchronization of one or more directory trees by either hanging indefinitely or barfing out some bogus permission errors. The more I thought about it, the more I was convinced that what was needed was something much less complex and hopefully more reliable than rsync seemed to be in this application, and thus was born ssync / ssyncd.

I don't pretend that this program is useful for anything besides the rather narrow mission for which it was designed (and it may not even be useful for that). I do think, however, that it at least provides an alternative sync tool for certain situations, and I was unable to find any viable alternative to rsync in the open source world when I wrote this.

Features


Limitations

The basic function of ssync is simply to make the directories, files, and links on a destination filesystem match those on a source filesystem. The default behavior is to read a list of paths to sync from a specified file and recursively process each of them.
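
As a rough illustration of that idea (this is not the ssync source, which is not reproduced here), the recursive "make the destination match the source" step could be sketched in Python as below. The function names kind, remove, and sync are hypothetical. The sketch assumes that extra entries already present in a destination directory are left alone, and it replaces a destination of the wrong type as described in the Configuration section. Ownership handling is omitted.

```python
import os
import shutil


def kind(path):
    """Classify a path as 'link', 'dir', 'file', or None if it is absent."""
    if os.path.islink(path):
        return "link"
    if os.path.isdir(path):
        return "dir"
    if os.path.exists(path):
        return "file"
    return None


def remove(path):
    """Remove a file, a symlink, or a whole directory tree."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.unlink(path)


def sync(src, dst):
    """Make dst match src, recursing into directories (hypothetical sketch)."""
    src_kind = kind(src)
    if src_kind is None:
        raise FileNotFoundError(src)
    dst_kind = kind(dst)
    if dst_kind is not None and dst_kind != src_kind:
        remove(dst)  # type mismatch: drop the destination, then re-create it
        dst_kind = None
    if src_kind == "link":
        target = os.readlink(src)
        if dst_kind == "link":
            if os.readlink(dst) == target:
                return  # link already points at the same target
            os.unlink(dst)
        os.symlink(target, dst)
    elif src_kind == "dir":
        os.makedirs(dst, exist_ok=True)
        for name in sorted(os.listdir(src)):
            sync(os.path.join(src, name), os.path.join(dst, name))
    else:
        shutil.copy2(src, dst)  # copies content plus mode and timestamps
```

Calling sync("/mnt/peer/etc/cron.d", "/etc/cron.d") would then walk the whole source tree. A real tool would also compare files before copying instead of rewriting every one on each run.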

Building and Installing

I have tested and deployed ssync on both Red Hat and Debian Linux. I am not aware of any Linux-specific features which it uses, so I think it will work fine on most other UNIX-like platforms as well. There is no configure script since I just didn't feel like writing one and I don't really think one is necessary at this point. There may be one in the future. You may need to change the makefile if you don't have gcc available or you want to use syslog logging instead of the default plain file based logging. Otherwise, a plain old make clean ; make should do it.

Installation consists of copying the ssync and ssyncd executables to wherever you want them and then creating /etc/ssyncd.conf and /etc/ssyncd.work config files appropriate to your machine (examples of each are included). If you are running the interactive ssync version, it will obey whatever command line options you give as well as any configuration it might find in a file called .ssyncrc in the current directory. I have not yet gotten around to making ssync look for a .ssyncrc file in the user's home directory.

Configuration

All of the available configuration options are shown in the example ssyncd.conf configuration file and can be set either in this file (for ssyncd), in .ssyncrc (for ssync), or on the command line (for both). A summary of config options is below. The -c option only makes sense on the command line (duh), and the interactive version of the program only really uses the -c, -w, and -v options. You will only get a moderate number of 'informational' messages even at the default log-level of 0. If you want to suppress everything except errors, set log level 3 (warn). Log level 2 (info) is probably what most people want.

Config file    Long Option      Short Option   Comment
-              --help           -h             display usage message and version
conf-path      --conf-path      -c             read config file from an alternative path instead of the default
interval       --interval       -i             number of seconds to sleep between completing one run and starting the next
work-file      --work-file      -w             path for file containing work paths
priority       --priority       -n             scheduling priority (-20 - +20), see renice(8)
no-sync-data   --no-sync-data   -D             do not sync data (content) of files
no-sync-time   --no-sync-time   -T             do not sync atime / mtime
no-sync-meta   --no-sync-meta   -M             do not sync meta-data (uid / gid / mode)
update-only    --update-only    -U             only sync things if source mtime is > destination mtime
test           --test           -X             run sync procedure and collect statistics without actually modifying anything
pid-path       --pid-path       -p             path for pid file
log-path       --log-path       -l             path for log file if using file based logging
log-ident      --log-ident      -s             identification string if using syslog based logging
log-level      --log-level      -v             logging verbosity (0 - 5), lower levels are more verbose (2 is normal, 0 may be excessive)
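
To make the interplay of the sync switches a bit more concrete, here is a small Python sketch of how they might combine for a single source/destination pair. It is only an illustration: the Options class and the plan function are hypothetical names, and the way the options interact (update-only checked first, test mode suppressing all writes) is an assumption rather than something taken from the ssync source.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical container mirroring the switches in the table above."""
    no_sync_data: bool = False
    no_sync_time: bool = False
    no_sync_meta: bool = False
    update_only: bool = False
    test: bool = False


def plan(opts, src_mtime, dst_mtime):
    """Return (aspects, will_write) for one source/destination pair.

    dst_mtime is None when the destination does not exist yet.
    """
    # update-only: skip unless the source is strictly newer (assumed order)
    if opts.update_only and dst_mtime is not None and src_mtime <= dst_mtime:
        return [], False
    aspects = []
    if not opts.no_sync_data:
        aspects.append("data")   # file contents
    if not opts.no_sync_time:
        aspects.append("time")   # atime / mtime
    if not opts.no_sync_meta:
        aspects.append("meta")   # uid / gid / mode
    # test mode: work out what would be done, but write nothing
    return aspects, bool(aspects) and not opts.test
```

For example, plan(Options(no_sync_data=True, test=True), 10, 5) reports that times and meta-data would be synced but that nothing will be written.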

Here's the example ssyncd.conf file:


#
# ssyncd.conf
#

interval:		300			# time between sync runs in seconds
work-file:		/etc/ssyncd.work	# list of paths to synchronize
priority:		-1			# scheduling priority (range -20 - +20)
                                                # be careful with this! and read renice(8)
                                                # if you don't know what it means
#no-sync-data:		yes			# [y|n] do not sync data (file contents)
#no-sync-time:		yes			# [y|n] do not sync atime / mtime
#no-sync-meta:		yes			# [y|n] do not sync meta-data (uid / gid / mode)
#update-only:		yes			# [y|n] update only if source mtime > dest mtime
#test:			yes			# [y|n] test only (modify nothing in dest.)

pid-path:		/var/run/ssyncd.pid	# path for pid file
log-path:		/var/log/ssyncd.log	# path for file based logging
log-ident:		ssyncd			# id for syslog based logging
log-level:		2	# 0 - ALL
				# 1 - TRACE
				# 2 - INFO
				# 3 - WARN
				# 4 - SEVERE
				# 5 - FATAL
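
For readers who want to generate or check such a file from a script, the format can be read with a few lines of Python. This is a sketch under stated assumptions, not the parser ssyncd uses: parse_conf, LEVELS, and should_log are hypothetical names, comments are assumed to run from '#' to the end of the line, and all whitespace is discarded because the documentation says the real parser does so.

```python
# Level names in the order listed in the example file (index = numeric level)
LEVELS = ["ALL", "TRACE", "INFO", "WARN", "SEVERE", "FATAL"]


def parse_conf(text):
    """Parse 'key: value  # comment' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]    # drop a trailing comment (assumed syntax)
        line = "".join(line.split())   # discard ALL whitespace
        if not line:
            continue                   # blank or comment-only line
        key, sep, value = line.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"malformed config line: {raw!r}")
        settings[key] = value
    return settings


def should_log(message_level, configured_level):
    """Lower configured levels are more verbose: emit anything at or above it."""
    return message_level >= configured_level
```

With log-level set to 2, should_log(LEVELS.index("WARN"), 2) is true while TRACE messages are dropped. Values come back as strings, so a caller would still convert interval and priority to integers.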


The work file just contains a list of work items, one per line, in the form:

/source/path | /destination/path

The paths can be either files or directories, and source directories will be processed recursively. There is no form of substitution or environment variable parsing, and there is no facility for excluding things.

If the destination is a different type than the source (e.g. source is a file and destination is a directory), then the program will unlink the destination object (recursively) and re-create it as the new type. This means that if you want to sync a file into a directory, you should give the full path name of the destination including the file name. This 'feature' might also have some disastrously unexpected effects if you tried to specify a symlink to a directory or file as the source path and a real directory or file as the destination.

The config file parsing routines are really simple-minded and will just discard all whitespace in either config file (meaning paths with whitespace will not be parsed correctly). If it causes a lot of issues, I may refine this behavior in the future. Here's the example ssyncd.work file:

#
# ssyncd.work:   Example work file for ssync / ssyncd
#
# Each line must be of the form:
#
#   source path | destination path
#

# Individual files
/mnt/peer/etc/adduser.conf     | /etc/adduser.conf
/mnt/peer/etc/aliases          | /etc/aliases
/mnt/peer/etc/apt/apt.conf     | /etc/apt/apt.conf
/mnt/peer/etc/apt/sources.list | /etc/apt/sources.list
/mnt/peer/etc/dante.conf       | /etc/dante.conf
/mnt/peer/etc/exim.conf        | /etc/exim.conf
/mnt/peer/etc/exports          | /etc/exports
/mnt/peer/etc/fstab            | /etc/fstab
/mnt/peer/etc/fstab.backup     | /etc/fstab.backup
/mnt/peer/etc/fstab.primary    | /etc/fstab.primary
/mnt/peer/etc/group            | /etc/group
/mnt/peer/etc/group-           | /etc/group-
/mnt/peer/etc/gshadow-         | /etc/gshadow-
/mnt/peer/etc/gshadow          | /etc/gshadow
/mnt/peer/etc/hosts            | /etc/hosts
/mnt/peer/etc/hosts.allow      | /etc/hosts.allow
/mnt/peer/etc/hosts.deny       | /etc/hosts.deny
/mnt/peer/etc/inetd.conf       | /etc/inetd.conf
/mnt/peer/etc/krb5.keytab      | /etc/krb5.keytab
/mnt/peer/etc/lilo.conf        | /etc/lilo.conf
/mnt/peer/etc/logrotate.conf   | /etc/logrotate.conf
/mnt/peer/etc/motd             | /etc/motd
/mnt/peer/etc/ntp.conf         | /etc/ntp.conf
/mnt/peer/etc/passwd           | /etc/passwd
/mnt/peer/etc/passwd-          | /etc/passwd-
/mnt/peer/etc/peerd-http.conf  | /etc/peerd-http.conf
/mnt/peer/etc/peerd-nfs.conf   | /etc/peerd-nfs.conf
/mnt/peer/etc/peerd-smb.conf   | /etc/peerd-smb.conf
/mnt/peer/etc/peerd-sync.conf  | /etc/peerd-sync.conf
/mnt/peer/etc/peerd-thing.conf | /etc/peerd-thing.conf
/mnt/peer/etc/raidtab          | /etc/raidtab
/mnt/peer/etc/services         | /etc/services
/mnt/peer/etc/shadow-          | /etc/shadow-
/mnt/peer/etc/shadow           | /etc/shadow
/mnt/peer/etc/ssh/ssh_config   | /etc/ssh/ssh_config
/mnt/peer/etc/ssh/sshd_config  | /etc/ssh/sshd_config
/mnt/peer/etc/ssyncd.conf      | /etc/ssyncd.conf
/mnt/peer/etc/ssyncd.work      | /etc/ssyncd.work
/mnt/peer/etc/sysctl.conf      | /etc/sysctl.conf
/mnt/peer/etc/syslog.conf      | /etc/syslog.conf

/mnt/peer/usr/omni/config/cell/cell_server | /usr/omni/config/cell/cell_server

# Directory trees
/mnt/peer/etc/cron.d           | /etc/cron.d
/mnt/peer/etc/cron.daily       | /etc/cron.daily
/mnt/peer/etc/cron.monthly     | /etc/cron.monthly
/mnt/peer/etc/cron.weekly      | /etc/cron.weekly
/mnt/peer/etc/init.d           | /etc/init.d
/mnt/peer/etc/logrotate.d      | /etc/logrotate.d
/mnt/peer/etc/rc0.d            | /etc/rc0.d
/mnt/peer/etc/rc1.d            | /etc/rc1.d
/mnt/peer/etc/rc2.d            | /etc/rc2.d
/mnt/peer/etc/rc3.d            | /etc/rc3.d
/mnt/peer/etc/rc4.d            | /etc/rc4.d
/mnt/peer/etc/rc5.d            | /etc/rc5.d
/mnt/peer/etc/rc6.d            | /etc/rc6.d
/mnt/peer/etc/rcS.d            | /etc/rcS.d
/mnt/peer/vol00/apps           | /vol00/apps
/mnt/peer/vol00/backups        | /vol00/backups
/mnt/peer/vol00/deadaccts      | /vol00/deadaccts
/mnt/peer/vol00/groups         | /vol00/groups
/mnt/peer/vol00/home           | /vol00/home
/mnt/peer/vol00/infonet        | /vol00/infonet
/mnt/peer/vol00/nfstmp         | /vol00/nfstmp
/mnt/peer/vol00/ninstall       | /vol00/ninstall
/mnt/peer/vol00/share          | /vol00/share
/mnt/peer/vol00/var/mail       | /vol00/var/mail
/mnt/peer/vol00/var/sysnap     | /vol00/var/sysnap
/mnt/peer/vol00/var/www        | /vol00/var/www
#/mnt/peer/usr/local           | /usr/local
/mnt/peer/usr/local/bin        | /usr/local/bin
/mnt/peer/usr/local/sbin       | /usr/local/sbin
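
A work file in this form is easy to read from a script as well. The following Python sketch is an illustration only (parse_work is a hypothetical name and this is not the ssyncd parser). It assumes '#' starts a comment, it discards all whitespace as described above, and it keeps the items in a list so they can be processed in file order.

```python
def parse_work(text):
    """Parse 'source | destination' lines into an ordered list of pairs."""
    items = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]    # drop comments, including disabled items
        line = "".join(line.split())   # discard ALL whitespace
        if not line:
            continue
        src, sep, dst = line.partition("|")
        if not sep or not src or not dst:
            raise ValueError(f"malformed work item: {raw!r}")
        items.append((src, dst))
    return items
```

Because whitespace is thrown away, a line such as "/mnt/peer/my docs | /my docs" comes back as ("/mnt/peer/mydocs", "/mydocs"), which is exactly the limitation with paths containing spaces mentioned above.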
