Most efficient way to model file system paths


#1

I’m new to graph databases and have started playing with Cayley to see whether it is the best choice for a project where I need to store filesystem path hierarchies with metadata attached at each node. I’ve started out by using the in-memory graph in Go and modeling against a file system node struct. But one thing I can’t wrap my head around is an efficient way to look up a particular path directly. I had hoped to avoid encoding the entire path at each node, because that would make branch renames very annoying.

So given /path/to/some/dir/A, I would have to start at the ‘path’ root and iterate the outs until “A”? Is this an efficient usage of a graphdb for this problem? Would this entire recursive query end up running server-side with only one roundtrip?

Edit

To be more specific about my project, I need to model something like:

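import "time"

// Template groups one or more filesystem trees under a name and date.
type Template struct {
    Name  string
    Date  time.Time
    Trees []*Node
}

// Node is a single entry in a filesystem hierarchy.
type Node struct {
    Name string // name of this entry only, not the full path
    Mode int
    Type int
}

// Path should return the full path to this node, e.g. /path/to/node.
func (n *Node) Path() string {
    // TODO: derive the path from the graph instead of storing it on the node.
    return ""
}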

There would be many templates containing 1 or more trees of filesystem hierarchies.
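As a rough sketch of how Path() could be derived without storing the full path on each node, the version below walks up to the root and joins the names. The Parent field is an assumption added for illustration; it is not part of the struct above, and in a graph store it would correspond to following the parent edge.

```go
package main

import (
	"fmt"
	"strings"
)

// Node mirrors the struct above, plus a hypothetical Parent pointer
// (an assumption for this sketch; the original struct has no such field).
type Node struct {
	Name   string
	Mode   int
	Type   int
	Parent *Node // nil for the root of a tree
}

// Path rebuilds the absolute path by walking Parent links up to the root.
func (n *Node) Path() string {
	var parts []string
	for cur := n; cur != nil; cur = cur.Parent {
		parts = append(parts, cur.Name)
	}
	// parts was collected leaf-to-root, so reverse it in place.
	for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
		parts[i], parts[j] = parts[j], parts[i]
	}
	return "/" + strings.Join(parts, "/")
}

func main() {
	root := &Node{Name: "path"}
	to := &Node{Name: "to", Parent: root}
	a := &Node{Name: "A", Parent: to}
	fmt.Println(a.Path())

	// Renaming a branch touches one node; descendants need no update.
	to.Name = "from"
	fmt.Println(a.Path())
}
```

Because each node stores only its own name, renaming a directory is a single update, at the cost of a walk proportional to the depth whenever the full path is needed.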


#2

It’s my [limited] understanding with graph databases that, yes, a filesystem would be modeled something like:

  • Directory -inside-> Directory
  • File -inside-> Directory

You might have to write code on your end to convert the path into the right query, but it should be possible to query for things like “give me the file X ‘inside’ Y ‘inside’ Z.” Neo4j’s Graph Databases book certainly uses models like this, although depending on how complex your queries get you might add additional edges for caching purposes.
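To make the "convert the path into a query" step concrete, here is a minimal sketch that uses plain Go maps as a stand-in for the graph store. It does not use Cayley's API; Graph, ID, Add and Resolve are hypothetical names for illustration. Each path segment becomes one edge lookup from the current directory.

```go
package main

import (
	"fmt"
	"strings"
)

// ID is a hypothetical opaque node identifier.
type ID int

// Root is the implicit top-level directory in this sketch.
const Root ID = 0

// Graph is a toy stand-in for a graph store (not Cayley's API).
// It indexes the "inside" edges by directory and child name.
type Graph struct {
	next     ID
	children map[ID]map[string]ID
}

func NewGraph() *Graph {
	return &Graph{next: 1, children: map[ID]map[string]ID{}}
}

// Add creates a node called name inside dir and returns its ID.
func (g *Graph) Add(dir ID, name string) ID {
	id := g.next
	g.next++
	if g.children[dir] == nil {
		g.children[dir] = map[string]ID{}
	}
	g.children[dir][name] = id
	return id
}

// Resolve follows one edge per path segment, starting at Root.
// It reports false if any segment is missing.
func (g *Graph) Resolve(path string) (ID, bool) {
	cur := Root
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		if seg == "" {
			continue // tolerate "/" and doubled slashes
		}
		next, ok := g.children[cur][seg]
		if !ok {
			return 0, false
		}
		cur = next
	}
	return cur, true
}

func main() {
	g := NewGraph()
	cur := Root
	for _, name := range []string{"path", "to", "some", "dir", "A"} {
		cur = g.Add(cur, name)
	}
	id, ok := g.Resolve("/path/to/some/dir/A")
	fmt.Println(id == cur, ok)
}
```

In this sketch the cost of a lookup is one indexed step per path segment, so it grows with depth rather than with the total number of nodes. How a real graph database executes the equivalent chained query is a separate question that this toy model does not answer.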


#3

Thanks for the reply on this. So if one were to model a filesystem the way you suggest, a query could either start from the file and follow each parent, or it could find the root parent and follow the incoming edges down to the child.

But I am still curious whether this is considered efficient in the graph database world. I don’t want to cache full paths along the way since, like I said, it would probably be annoying to keep them updated with renames.


#4

They are supposed to handle it pretty well; modeling edgewise relationships
is why they were made (and what makes them fun.)

Don’t cache the full path. Find a file and use its OID (this is what your
OS mostly does when asked to open file handles.) Paths are only for opening
the handle or telling someone where to look when they need to open one
later.
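
A small sketch of the "resolve once, then hold the OID" idea, under stated assumptions: Store, OID, Create, Open, Rename and Meta are hypothetical names, and an in-memory map stands in for the database. The path is used only to obtain the handle; metadata is then read by OID, so renaming an ancestor does not invalidate it.

```go
package main

import (
	"fmt"
	"strings"
)

// OID is a hypothetical object identifier; 0 denotes the root.
type OID int

type entry struct {
	name   string
	parent OID
	meta   string
}

// edge keys the index by (parent directory, child name).
type edge struct {
	parent OID
	name   string
}

// Store is a toy in-memory stand-in for the database.
type Store struct {
	next    OID
	entries map[OID]*entry
	index   map[edge]OID
}

func NewStore() *Store {
	return &Store{next: 1, entries: map[OID]*entry{}, index: map[edge]OID{}}
}

// Create adds an entry under parent and returns its OID.
func (s *Store) Create(parent OID, name, meta string) OID {
	id := s.next
	s.next++
	s.entries[id] = &entry{name: name, parent: parent, meta: meta}
	s.index[edge{parent, name}] = id
	return id
}

// Open resolves a path to an OID, one index lookup per segment.
func (s *Store) Open(path string) (OID, bool) {
	var cur OID
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		id, ok := s.index[edge{cur, seg}]
		if !ok {
			return 0, false
		}
		cur = id
	}
	return cur, true
}

// Rename changes one entry's name; the OIDs of its descendants are untouched.
func (s *Store) Rename(id OID, newName string) bool {
	e, ok := s.entries[id]
	if !ok {
		return false
	}
	delete(s.index, edge{e.parent, e.name})
	e.name = newName
	s.index[edge{e.parent, newName}] = id
	return true
}

// Meta reads metadata by OID, without consulting any path.
func (s *Store) Meta(id OID) string { return s.entries[id].meta }

func main() {
	s := NewStore()
	p := s.Create(0, "path", "")
	to := s.Create(p, "to", "")
	s.Create(to, "A", "mode=0644")

	h, _ := s.Open("/path/to/A") // resolve the path once
	s.Rename(to, "from")         // rename an ancestor directory
	fmt.Println(s.Meta(h))       // the handle still works

	_, ok := s.Open("/path/to/A")
	fmt.Println(ok) // the old path no longer resolves
}
```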