Hi, I’m using Cayley in a new linked data project. It’s aimed at libraries that use Library of Congress data. Developers and librarians often need to clean up their data, specifically their categories. They want to make sure their local categories match those of the Library of Congress. In some cases they are using a pre-approved ID and can easily look up the new label via an API, but sometimes all they have is a string, and they need to find the best guess.
That’s where I am right now.
I am using Cayley to load all the RDF triples and allow quick lookup by subject, but now I need to tackle the less clear part, which is basically a full-text search problem.
My current thought is that I would use Bleve to build a secondary full-text index.
So, I would load all the data into Cayley, and then I would iterate over all the data in Cayley and push it into a Bleve index.
I have Cayley working and a Bleve index waiting; now I’m trying to make sure I iterate over the quads efficiently. I’m hoping to do some amount of batching. Here is what I have so far. I can make do with it, but if there is a more efficient route I would love to hear about it. I don’t need to index every quad, just the labels, so being able to select just a portion of the data would be helpful as well.
    qs := a.NewQuadstore(...)
    iterator := qs.QuadsAllIterator()
    defer iterator.Close()
    // Walk every quad in the store and pick out the SKOS labels.
    for iterator.Next(nil) {
        val := iterator.Result()
        quad := qs.Quad(val)
        if quad.Predicate.String() == "<http://www.w3.org/2004/02/skos/core#altLabel>" {
            // ...Send quad to be indexed...
        }
        if quad.Predicate.String() == "<http://www.w3.org/2004/02/skos/core#prefLabel>" {
            // Log the label and its subject, separated by a space.
            log.Printf("%s %s", quad.Object.String(), quad.Subject.String())
            // ...Send quad to be indexed...
        }
    }
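For the batching part, here is a minimal sketch of what I have in mind. It deliberately does not touch Cayley or Bleve: `Quad`, `labelPredicates`, `indexLabels`, `next`, and `flush` are all hypothetical names I made up for illustration, and `Quad` is just a plain-string stand-in for the real quad type. The idea is to keep only the label quads and hand them to a callback in fixed-size groups.

```go
package main

import "fmt"

// Quad is a hypothetical, string-only stand-in for a Cayley quad.
type Quad struct {
	Subject, Predicate, Object string
}

// labelPredicates is the set of predicates worth sending to the full-text index.
var labelPredicates = map[string]bool{
	"<http://www.w3.org/2004/02/skos/core#prefLabel>": true,
	"<http://www.w3.org/2004/02/skos/core#altLabel>":  true,
}

// indexLabels pulls quads from next until it reports false, keeps only
// label quads, and passes them to flush in batches of at most size.
func indexLabels(next func() (Quad, bool), size int, flush func([]Quad) error) error {
	if size < 1 {
		return fmt.Errorf("batch size must be at least 1, got %d", size)
	}
	batch := make([]Quad, 0, size)
	for {
		q, ok := next()
		if !ok {
			break
		}
		if !labelPredicates[q.Predicate] {
			continue // not a label, skip it
		}
		batch = append(batch, q)
		if len(batch) == size {
			if err := flush(batch); err != nil {
				return err
			}
			// Start a fresh slice so flush may safely keep the old one.
			batch = make([]Quad, 0, size)
		}
	}
	// Send whatever is left over as a final, smaller batch.
	if len(batch) > 0 {
		return flush(batch)
	}
	return nil
}

func main() {
	// Made-up sample data, only here to exercise the function.
	data := []Quad{
		{"<s1>", "<http://www.w3.org/2004/02/skos/core#prefLabel>", "\"Cats\""},
		{"<s1>", "<http://example.org/other>", "\"ignored\""},
		{"<s1>", "<http://www.w3.org/2004/02/skos/core#altLabel>", "\"Felines\""},
		{"<s2>", "<http://www.w3.org/2004/02/skos/core#prefLabel>", "\"Dogs\""},
	}
	i := 0
	next := func() (Quad, bool) {
		if i >= len(data) {
			return Quad{}, false
		}
		q := data[i]
		i++
		return q, true
	}
	err := indexLabels(next, 2, func(b []Quad) error {
		fmt.Printf("flushing %d label quads\n", len(b))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

In the real code, `next` would wrap the Cayley iterator loop above, and `flush` is where I assume the Bleve side would go (as far as I can tell, building a batch with `index.NewBatch()`, adding documents with `batch.Index(...)`, and committing with `index.Batch(batch)`). This still scans every quad, so it does not answer the question of selecting only the label quads up front.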