Hierarchical file systems allow you to say for sure where a file isn't. Every ta...

kijin · on April 6, 2018

Tags are fun when you have a few thousand items to test your MVP with. It gets much less fun when you have millions of items with thousands of tags, all on a flat hierarchy.

On the other hand, when you're stuck with a flat hierarchy anyway (e.g. thousands of pictures, all named DCIMxxxx.jpg), tags can be more useful. But only if they're automatic.

I want the best of both worlds. I want to organize my stuff into folders and use tags to search for individual items. There's no need to be a purist on either side. "Designing better file organization around tags" is a good thing. "Designing better file organization around tags, not hierarchies" is not.

enobrev · on April 6, 2018

I absolutely agree with this, except to add that, in theory, a 100% tag-based file system would never (or very rarely) present a flat list where you have to scroll through millions of files. The UX of a single flat list with millions of files named DCIMxxxx.jpg is a limitation of the current format and doesn't make sense when so much information can be generated about our files upon creation.

In this hypothetical FS, all files would have dynamically generated tags for created date/time, modified date/time, access date/time, originating application, given filename, owner, group, permissions, format, geo-data if available, originating hostname, and so on, and EXIF data should be first-class tags as prominent as the others. I imagine it working, by design, something like the Google Photos UX, which groups everything by EXIF data automatically before you ever start organizing things on your own.

Likewise, for files created manually, "tagging" in the application should be just as prominent a function as "naming" is right now.
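A minimal Python sketch of the kind of automatic tagging described above, assuming tags are derived only from what `os.stat` and the filename already expose. The `auto_tags` function and its `key:value` tag naming are hypothetical; EXIF, geo-data and originating application would need format-specific readers and are left out, as is creation time (on most Unix systems `st_ctime` is the metadata-change time, not creation).

```python
import stat
import time
from pathlib import Path


def auto_tags(path: str) -> set[str]:
    """Derive tags from metadata the OS already tracks (hypothetical scheme)."""
    p = Path(path)
    st = p.stat()
    tags = {f"name:{p.stem}", f"owner-uid:{st.st_uid}", f"group-gid:{st.st_gid}"}
    # Permissions rendered like `ls -l`, e.g. "-rw-r--r--".
    tags.add(f"mode:{stat.filemode(st.st_mode)}")
    # Modification time at two granularities so both "2017" and "2017-04" work.
    mtime = time.localtime(st.st_mtime)
    tags.add(f"modified:{mtime.tm_year}")
    tags.add(f"modified:{mtime.tm_year}-{mtime.tm_mon:02d}")
    # The extension stands in for "format"; sniffing content would be sturdier.
    if p.suffix:
        tags.add(f"format:{p.suffix[1:].lower()}")
    return tags
```

Even a camera dump of identical-looking DCIMxxxx.jpg names would then be browsable by month and format without any manual work.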

TeMPOraL · on April 6, 2018

Part of the adoption problem here would be trust. Hierarchical filesystems are something we're used to, and we can trust they're implemented correctly. That means, if I visit a folder and see some files, I know what I see is all of the files there (+/- hidden file settings); if something is missing, it's not there, period.

Tag search is a search. Can be broken. Can be optimized in a way that causes it to lie. I look at the results, and I'm not sure if they're complete. Maybe the file I'm looking for is really not there, or maybe the search gave up too early. Or the tag was slightly misformatted?

Maybe I'm too used to the old thing, but I like the notion that there's one canonical tree structure that makes all my data reachable. In case I've misplaced something, the search space of all paths through the filesystem tree (or a subtree of interest) is vastly smaller than the search space of all possible values of all relevant tags.

Izkata · on April 6, 2018

> Tag search is a search. Can be broken. Can be optimized in a way that causes it to lie. I look at the results, and I'm not sure if they're complete. Maybe the file I'm looking for is really not there, or maybe the search gave up too early. Or the tag was slightly misformatted?

Google Docs, while not exactly a tag system, wants you to search instead of using a hierarchy, and so is my go-to example: A good 90+% of the time, I can't find something I know is there and have to ask a co-worker for a link.

The missing consideration when it comes to tags is simple discoverability. You have to already know enough about what you're looking for in order to find it. A hierarchical system lets you do systematic browsing.

enobrev · on April 6, 2018

You make excellent points, although I think the issues you raise exist in our current filesystems as well. Provided the FS is indexed properly, opening a tag should show all files associated with that tag immediately. Just like opening (or listing) a directory does.

And in the case of search, it absolutely sucks on today's filesystems. Don't get me wrong, find and grep are incredible tools. I simply mean that searching for files beyond the hierarchy is not exactly known for a pleasant UX. The only way I know a grep of a whole drive or deep directory is done is because I get my blinking cursor back.

At the very least with a proper tagging system, we would be inherently familiar with the indexes available to us.
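What "indexed properly" might mean can be sketched as an inverted index from tag to files, so that opening a tag is a single lookup, much like listing a directory. Normalizing tags on the way in also speaks to the "slightly misformatted" tag worry raised earlier in the thread. `TagIndex` and `normalize_tag` are hypothetical names, and this in-memory version ignores persistence and concurrency.

```python
import unicodedata
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Canonicalize a tag so 'Receipts', ' receipts ' and 'RECEIPTS' collide."""
    return unicodedata.normalize("NFC", tag).strip().casefold()


class TagIndex:
    """In-memory inverted index: normalized tag -> set of file paths (a sketch)."""

    def __init__(self) -> None:
        self._files_by_tag: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str, *tags: str) -> None:
        for tag in tags:
            self._files_by_tag[normalize_tag(tag)].add(path)

    def files(self, tag: str) -> set[str]:
        # One dict lookup; .get avoids creating empty entries for unknown tags.
        return set(self._files_by_tag.get(normalize_tag(tag), set()))
```

Returning a copy keeps callers from mutating the index by accident; a real system would keep this structure on disk and update it on every write.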

TeMPOraL · on April 6, 2018

> And in the case of search, it absolutely sucks on today's filesystems.

It does. You mention grep, I'd even mention find - half of the time I'm wondering whether it has searched everything I wanted, or I misspelled the command. Or file search in Windows (Vista+) - I just don't trust it; I'm pretty sure it missed some data in the past for one reason or another.

Now with traditional file systems, I at least have the file tree. With tag-based systems, I'd only have search - so it better be trustworthy, both in reality and UX-wise. It needs to project the feeling of correctness and completeness of results.

> The only way I know a grep of a whole drive or deep directory is done is because I get my blinking cursor back.

The only way I know a find of a whole drive does what it's supposed to be doing is because it emits a stream of "find: `/some/path': Permission denied" messages.

> At the very least with a proper tagging system, we would be inherently familiar with the indexes available to us.

Fair enough.
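One way a search could "project completeness" is to report what it could not look at, rather than only printing errors along the way. A minimal sketch using Python's `os.walk`, whose `onerror` callback receives the `OSError` for each directory it fails to list; `find_reporting_gaps` is a hypothetical helper, not a replacement for `find`.

```python
import fnmatch
import os


def find_reporting_gaps(root: str, pattern: str) -> tuple[list[str], list[str]]:
    """Return (matching file paths, directories that could not be listed)."""
    matches: list[str] = []
    skipped: list[str] = []

    def on_error(err: OSError) -> None:
        # os.walk calls this when it cannot list a directory (e.g. permission
        # denied); returning normally lets the walk continue.
        skipped.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches, skipped
```

An empty `skipped` list is a positive statement that every directory under `root` was examined, which is the kind of signal the "Permission denied" stream only hints at.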

enobrev · on April 6, 2018

> I'd only have search - so it better be trustworthy

Absolutely agreed. It should be as reliable and as immediate as what we have now.

> I at least have the file tree

I don't know how this would work in practice, but I'm imagining something where, UX-wise, a tag-based FS could act very much like what we're already used to. Google was very much on this track in the early versions of "labels" in Gmail and Google Drive (shame they've slowly moved away from it).

Just last night I used some desktop app I found to tag a few thousand scanned documents so I could do my taxes this morning (researching my options is how I ended up finding this article). Once they were all tagged, I was able to traverse in a very familiar way.

At "root", there's too much noise, but as soon as I pick a tag, say "2017" - now I have whittled down my available tags. And then I pick "receipts". Smaller list of files and a smaller list of tags. And then "restaurants". And then "business".

That seems quite a bit like a hierarchy to me. The subset of tags that are related to the first one I chose act just like sub-directories. The UI could work exactly like what we already know and love. As we know it now, I would have ended up at ./2017/receipts/restaurants/business.

Of course with directories, that's the only way I could organize my files. But if we're working with tags, I would get the exact same results going to:

/business/receipts/2017/restaurants/

/receipts/2017/business/restaurants/

You get the idea. But, I could also potentially do something like:

/receipts/2017/client_1+client_2+client_5

or

/2017/receipts/business+!client_3

Now, still within the realm of a directory structure - even using terminal commands we're all familiar with and a bit of extra sugar - I have access to more features. I can't merge directories in a tree. Not that easily, anyway. But in this case I can `cd` into a directory of exactly what I want in a familiar way without trying to remember if what I'm looking for is in ~/Dropbox/receipts/2017 or ~/Documents/business/client_1/receipts.

It's in both. "Dropbox" and "Documents" are no longer necessary. Nor is ~/.
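The narrowing walk described above, where a path is really a set of tags and the "subdirectories" are the tags that co-occur with the current selection, can be sketched in a few lines of Python. `browse` and the `file_tags` mapping are hypothetical, and a real system would consult an index instead of scanning every file.

```python
def browse(file_tags: dict[str, set[str]], path: str) -> tuple[set[str], set[str]]:
    """Treat a slash-separated path as a set of required tags (a sketch).

    Returns (files carrying every tag in the path, tags still available to
    narrow further). The order of path components does not matter.
    """
    selected = {part for part in path.split("/") if part}
    files = {f for f, tags in file_tags.items() if selected <= tags}
    # The "subdirectories": other tags that appear on the remaining files.
    subdirs = set().union(*(file_tags[f] for f in files)) - selected
    return files, subdirs
```

For instance, `browse(file_tags, "/2017/receipts")` and `browse(file_tags, "/receipts/2017")` return the same result, since the path components are collected into a set before matching.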

nayuki · on April 7, 2018

I like the content of your answer: filtering by tags to narrow down the search results, only showing tags that belong to the current set of results, the benefits of order-insensitive path parts, and the ease of taking unions of tag results.

The path examples that you created are Boolean queries with different symbols: slash means AND (low precedence), plus means OR (medium precedence), and exclamation means NOT (high precedence). Your last example could be rendered as "2017 AND receipts AND (business OR NOT client_3)" and mean the same thing.

In any case, the illustration you made is indeed the sort of user interaction that I want to design into a future prototype.
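Under the precedence reading above (slash as AND, plus as OR, a leading exclamation mark as NOT), such a path can be evaluated per file without a general-purpose parser. A sketch in Python; `matches` is a hypothetical name, and the syntax is the one proposed in this thread, not an existing tool's.

```python
def matches(query: str, tags: set[str]) -> bool:
    """Evaluate a path query against one file's tags (a sketch).

    '/' separates AND-ed segments (lowest precedence), '+' separates OR-ed
    alternatives within a segment, and a leading '!' negates a single tag.
    """

    def term(t: str) -> bool:
        return t[1:] not in tags if t.startswith("!") else t in tags

    return all(
        any(term(alt) for alt in segment.split("+"))
        for segment in query.split("/")
        if segment  # ignore empty parts from leading/trailing slashes
    )
```

Filtering a whole collection is then `[f for f, t in file_tags.items() if matches(query, t)]`, with the same caveat as any full scan: an index would be needed at scale.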

dTal · on April 7, 2018

>if I visit a folder and see some files, I know what I see is all of the files there (+/- hidden file settings); if something is missing, it's not there, period.

Funny you should say that. Only a few weeks ago a colleague of mine was perplexed by a file which showed up in a 'save as' box, but not in Windows Explorer. It was an ordinary log file, same as a bunch of others in that folder, no reason for it to be different. Apparently he later discovered the file was visible if navigated to from C:, but not through the desktop shortcut he'd made to that folder. We could only conclude it was a Windows bug. Whatever the cause, it wasted a good deal of our time hunting for that file...

sullyj3 · on April 6, 2018

I think the parent wasn't concerned about scrolling through large numbers of files, so much as the performance issues associated with querying them.

nayuki · on April 6, 2018

I understand your concerns, and they are indeed valid. First off, I doubt that managing millions of files in a traditional hierarchical file system is fun either. You'd likely run into problems with making unique names, sharding folders, and categorizing files that logically belong in multiple places. I also have some doubts that existing file systems (say NTFS or XFS) will behave or perform well with millions of files. I believe that implementing tags is a starting point for solving the problem of managing millions of files in a sensible way.

Speaking of thousands of pictures, what I really want is to dump all my photos into one folder. Right now, tools are ill-equipped to deal with large folders, so I am forced to manually create a new folder for every thousand or so items.

cyphar · on April 6, 2018

> I also have some doubts that existing file systems (say NTFS or XFS) will behave or perform well with millions of files.

I believe the mantra for XFS is "if you have large or lots, use XFS". XFS has a lot of optimisations for metadata operations which should mean it's better than most filesystems for lots-of-files and large-files cases (Dave Chinner has given several talks about the performance characteristics of XFS with "large or lots" cases).