Field Indexing from Python

I’m having a bit of difficulty finding a solution to a problem; maybe you guys can help.

In my User Data Services project, I need to be able to search for items which may be files, may or may not be currently available, and which come from many different sources. I’m fairly used to Xapian and Lucene full-text indexing, and something akin to Lucene’s field indexing would be the right kind of thing.
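To make "field indexing" concrete, here is a minimal pure-Python sketch of the idea: terms are stored per field, so a search can be restricted to, say, the title or the source. This is not how Lucene or Xapian implement it; the `FieldIndex` class and the field names (`title`, `source`, `available`) are made up for illustration.

```python
from collections import defaultdict


class FieldIndex:
    """Tiny in-memory inverted index keyed by (field, term). Illustrative only."""

    def __init__(self):
        # Maps (field, term) -> set of item ids containing that term in that field.
        self._postings = defaultdict(set)

    def add(self, item_id, **fields):
        """Index each whitespace-separated, lowercased term under its field."""
        for field, text in fields.items():
            for term in str(text).lower().split():
                self._postings[(field, term)].add(item_id)

    def search(self, **criteria):
        """Return ids of items matching every field/term pair given (AND)."""
        result = None
        for field, text in criteria.items():
            for term in str(text).lower().split():
                ids = self._postings.get((field, term), set())
                result = set(ids) if result is None else result & ids
        return result or set()


index = FieldIndex()
index.add("item-1", title="Holiday photos", source="local", available="yes")
index.add("item-2", title="Holiday budget", source="remote", available="no")

# Only items whose title contains "holiday" AND whose source is "remote".
matches = index.search(title="holiday", source="remote")
```

A real indexer adds stemming, ranking and on-disk storage on top of this, which is why I’d rather reuse an existing library than grow my own.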

But Python… well, I’m not having much luck with anything that integrates into Python. Lucene is Java based, and the CLucene port doesn’t have a working Python API.

I was also thinking about Ubuntu’s existing full-text file searcher (I think there may be more than one). If it’s possible to use that system, or one of the new tagging systems, then so much the better. It would just be a matter of integration, and it would reduce external dependencies.

In fact, the tagging system is fascinating. Some sources support tagging and most do not, so integrating that is going to be interesting. But I’ll leave that for another time.

Update:

  • PyLucene wraps the Java library and as such has heavy dependencies and is unstable for deployment.
  • trackerd is the Ubuntu indexer, and I’ll investigate it since it’s D-Bus based and supports tags.
  • python-xapian is available and just needs some TLC to carry over my existing Perl Xapian knowledge.