Ransacker is still under construction, so these docs are really for developers who want to contribute or just play with the code.
Ransacker currently uses the anydbm module that comes bundled with Python. Anydbm will try to load gdbm or dbm, which you have to turn on when you compile Python. If anydbm doesn't find them, it falls back to dumbdbm, which works just fine for testing but is really slow.
*nix users should make sure Python is compiled with gdbm or dbm support turned on. (Note: the Python on SourceForge's shell accounts has it on already.) Windows users might want to check out the port of gdbm to win32 at ftp://ftp.python.org/pub/python/contrib-09-Dec-1999/Database/
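To see which backend your Python actually picks, something like the following sketch may help. It is not part of Ransacker: detect_backend is a hypothetical helper name, and the fallback import assumes you might be on Python 3, where anydbm was renamed to dbm and whichdb moved into it.

```python
import os
import shutil
import tempfile

try:
    # Python 2 names, as used by Ransacker
    import anydbm as dbm_module
    from whichdb import whichdb
except ImportError:
    # Python 3 renamed anydbm to dbm and folded whichdb into it
    import dbm as dbm_module
    from dbm import whichdb


def detect_backend():
    """Create a throwaway database and report which module wrote it.

    detect_backend is a hypothetical helper, not a Ransacker function.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, "probe")
        db = dbm_module.open(path, "c")  # "c" = create if missing
        db[b"key"] = b"value"
        db.close()
        return whichdb(path)
    finally:
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    print(detect_backend())
```

If the printed name refers to the dumb backend (dumbdbm on Python 2, dbm.dumb on Python 3), indexing will work but will be slow.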
Basically, we have an Index class that you create like so:
import ransacker
idx = ransacker.Index("mydbfile.rki")
*.rki indicates a ransacker index file. In the above example, two files will actually be created: mydbfile.rki and mydbfile.rkw. *.rkw indicates a ransacker word index. Word indexes map words to numbers to help keep the index smaller. They're stored in their own file so that they can be shared between *.rki files. To specify the word index to use, pass it as the second parameter to Index(), e.g.:
idx = ransacker.Index("index.rki", "words.rkw")
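The idea of mapping words to numbers can be sketched in a few lines. This is only an illustration of the concept, not Ransacker's actual code or file format: WordList, id_for, and encode are made-up names, and a plain dict stands in for the *.rkw file.

```python
class WordList(object):
    """Toy stand-in for a word index: assigns each new word the next number."""

    def __init__(self):
        self.ids = {}  # word -> number

    def id_for(self, word):
        # Hand out ids in order of first appearance, starting at 1
        if word not in self.ids:
            self.ids[word] = len(self.ids) + 1
        return self.ids[word]


def encode(text, wordlist):
    """Turn a piece of text into a list of word numbers."""
    return [wordlist.id_for(word) for word in text.lower().split()]


words = WordList()  # one word list shared by both texts below
print(encode("teach a man to fish", words))  # [1, 2, 3, 4, 5]
print(encode("a fish", words))               # [2, 5]
```

Because both calls share one WordList, "a" and "fish" get the same numbers each time, which is what lets several indexes reuse a single word file.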
Here's how to add stuff to the index:
idx.add("item one", "teach a man to fish")
idx.add("item two", "one fish, two fish, red fish, blue fish")
Ransacker is an incremental indexer. This means you can create the file, add pages to the index, close the file, come back later, and add more pages. You can even change the content without disrupting the index:
idx.add("change me!", "in the beginning...")
idx.add("change me!", "my, how you've changed!")
Searching is currently built into the Index class, but it'll eventually be moved to its own SearchEngine class. Here's the CURRENT syntax. This will change!
results = idx.search("fish")
search() returns a tuple. Using the index from the examples above, it should return ('item two', 'item one'). Note that item two appears first. This is because "fish" matched more times in item two.
ransacker.test.suite is a test suite I've been using. It can be ignored. If you actually want to run it, you'll need PyUnit.
Ransacker is maintained by Michal Wallace (sabren@manifestation.com).
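For anyone curious how ranking by match count and replace-on-re-add could work, here is a small in-memory sketch. It is an assumption-laden toy, not Ransacker's implementation: ToyIndex is a made-up class, it keeps everything in a dict instead of a dbm file, and its tokenizing rule (lowercase runs of letters, digits, and apostrophes) is a guess.

```python
import re
from collections import Counter


class ToyIndex(object):
    """In-memory toy that mimics the add()/search() behavior described above."""

    def __init__(self):
        self.pages = {}  # page key -> Counter of word frequencies

    def add(self, key, text):
        # Adding an existing key again simply replaces its old word counts
        words = re.findall(r"[a-z0-9']+", text.lower())
        self.pages[key] = Counter(words)

    def search(self, word):
        word = word.lower()
        hits = [(counts[word], key)
                for key, counts in self.pages.items()
                if counts[word] > 0]
        # Most matches first
        hits.sort(key=lambda hit: -hit[0])
        return tuple(key for _, key in hits)


idx = ToyIndex()
idx.add("item one", "teach a man to fish")
idx.add("item two", "one fish, two fish, red fish, blue fish")
print(idx.search("fish"))  # ('item two', 'item one')
```

Here "item two" contains "fish" four times and "item one" only once, so "item two" sorts first.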
Ransacker is free software. You can use it under the terms of the GNU GPL.