Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Just curious to learn: when do you need sortable primary keys?


It makes pagination much more efficient and possible to completely use an index (re: data is pre-sorted) if you have some sort of increasing id, vs sorting in-memory on request. Even having some sort of indexed "createdAt" field, and sorting on that, isn't super reliable for pagination when the dataset is changing. You usually want a stable sort to make pagination nice, so you add an ID to it, but then this gets very inefficient w.r.t index use if the PK is a UUID since it is not monotonic/sortable.


UUIDs are sortable, and you can do efficient pagination on them. (The results wouldn't be in any meaningful order, but that's orthogonal.)


Yeah sorry I had a brain fart and was thinking of stable pagination w/ compound sort+filter. It doesn't matter if id is UUID, sigh.

Where date > y or (date eq y and id > x) order by date, id

This can't be completely fulfilled by an index. It'll have to sort/merge somevresults in memory, very CPU expensive.

More examples here https://www.mixmax.com/engineering/api-paging-built-the-righ...


In sqlserver (and others?) you can have one clustered index. Typically this is the primary key (doesn’t have to be though).

The clustered index is actually the physical order of the data stored on disk, so any new guid causes tremendous amounts of churn.

Not sure how modern this concern is, but I would guess a LOT of SASS companies have to consider this.

Quick edit: sequential GUIDs are obviously a thing and I believe alleviate a lot of these concerns. I do not believe that was an option in sql server 2008 r2 (a version that lived a LONG time).


A clustered index is certainly not actually the physical order of the data stored on the disk, at best it tends to approximate it by placing data associated with similar keys in similar blocks. You would have to actually physically reorganize your entire clustered table or index to get it in physical order, and often you lose performance by doing so.

The performance impact of randomly distributed keys on any ordered index on that key is certainly real though, so I agree with you there.


Almost every database algorithm is based on ordering and binary search.


… and for which UUIDs are sortable.

OP called them "not sortable", which is a poor phrasing, as they're sortable. What he's likely getting at is that they're not ordered by generation time, which matters to some folks.


When you need to get a sorted list or perhaps a range?

When do you need primary keys?


that doesn't really make sense though. how would you know the range to sort by? presumably you'd use the criterion directly to sort.

a better reason and justification to not use UUIDs is that they take up needless space if you're space constrained. they also have worse performance


Maybe you are doing some sort of cursor for pagination or whatever. You can use one ID and fetch the next 50 records. Then use the last ID in those 50 to fetch the next 50, etc.


but you can still do pagination with uuids, your rows can be sorted on any arbitrary criteria like created date or updated date.


But that would be slow, even with an index, as compared to using a clustered primary key. Especially when your data doesn't fit in ram.


Sure, already mentioned UUID’s being slow. There are trade offs


Dates might not be unique but I can see other possibilities that would be fine.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: