Things I Hate About PostgreSQL (2013) (kupershmidt.org)
148 points by subleq on Sept 10, 2016 | 114 comments


I somewhat agree, and somewhat have the opposite view:

I love psql and tab completion; it's magic. In fact it spoiled me, and I now have a hard time working on any other DB.

Not mentioned in the article, but I love how the Postgres devs keep up with modern SQL. Again, it makes it really hard to go back to a SQL-92-compliant system.

Now, the vacuum business, that's horrible. I've run into so many cases where "vacuum full" just solves the weirdest problems. One very recently where I used Postgres as a queue (because it was easy and works across platforms/clouds) and it became slow and horrible even though I had dropped a bunch of schemas. "vacuum full;" oh wow!

The replications story is ugly. You can make it work, but it's literally a craft and you end up having to babysit it.

One the author missed: lots of weird performance issues. I got my answer from the Uber post. The rule #1 of adding indices is to only index columns that you really, really need, because otherwise you're going to slow down your writes. Well, it turns out that's not how it works on Postgres: all indices on the table you're writing to get updated! I'm pretty sure very few people were aware of that before Uber told us.

The more I use the postgres front end (psql, SQL) the more I love it, but the backend (performance, replication) is starting to taste more and more sour.


Really? Coming from PGSQL to a MySQL-only shop made me really miss PGSQL's performance. Sure, MySQL is faster on `select pk from tbl where pk=1`, but the second you get into complicated joins, including subqueries, or doing any analytics the performance is very random.

With pgsql, I can get a clear concise explain and if I disagree with what's happening (because I know exactly what my hardware is capable of) I can tune any query to be decently performant.

With mysql, you get... I mean explain is ok, but I wouldn't put it above that. "EXPLAIN ANALYZE" is my gold standard forever and ever. I even like it more than SQL Server's equivalent, which is also fantastic.


I've used PostgreSQL, SQL Server, and MySQL.

In my experience, MySQL had the worst query optimizer of the three. By A LOT. Basically, anything but a simple select could suck very unnecessarily. Even for simple selects, MySQL messes up.

I've fixed more than one performance issue by changing SELECT * FROM users WHERE key1 = 0 OR key2 = 5 into two queries and concatenating the results.
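
For what it's worth, a sketch of the kind of rewrite being described; the commenter concatenated in application code, but a UNION (or UNION ALL, to keep duplicates) does roughly the same thing in SQL and often lets the optimizer use one index per branch instead of a full scan:

    SELECT * FROM users WHERE key1 = 0
    UNION
    SELECT * FROM users WHERE key2 = 5;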


MySQL's lack of CTEs is particularly annoying. Even SQLite has supported them for some time now!

I've seen this cause MySQL query authors to use various workarounds that end up impacting the query performance negatively.
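
For readers who haven't used them, here is a rough sketch of a CTE against a hypothetical orders/customers schema; without CTE support, the usual MySQL workaround was a derived table or an explicit temporary table:

    WITH recent_orders AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE created_at >= now() - interval '30 days'
        GROUP BY customer_id
    )
    SELECT c.name, r.total
    FROM customers c
    JOIN recent_orders r ON r.customer_id = c.id
    ORDER BY r.total DESC;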


I suspect MySQL is used with more bare bones schemas, without as many SQL function and complex triggers, as it has worse support for those (IMO, as someone that used MySQL). This does have the side effect of making it very ORM friendly, and the more you rely on an ORM the less you care or even know what queries are being run under the covers.


The index issue is being worked on now after the Uber blog post.

commitfest: https://commitfest.postgresql.org/10/775/

email thread: https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK...


There's really only one thing I dislike about pgsql. It seems overly difficult to return two disjointed result sets from a stored procedure/function. This is fairly common and easy in MSSQL. It also is easy in MySQL. Something like this:

SELECT int_col_1, int_col_2 FROM table1

SELECT varchar_1, varchar_2, int_col_2 FROM table2

In pgsql if the columns are the same type and number then we can use a union, otherwise the suggested alternative seems to be to just make 2 queries.


What is a good example of when you'd want to do that? I've been doing database work on the 4 majors for about 15 years and this is the first I've ever heard of somebody wanting to return two totally different result sets from a single procedure.

I have no issue with putting logic in your database. In many cases it's the ideal solution...but your example is something I've never even heard mentioned in 15 years much less advocated for or used as criticism for not being available.


One straightforward example would be feeding some kind of dashboard. Imagine a time-consuming query using a complex series of CTEs to build up or filter or reduce a very large set of rows, then running a bunch of "final" selects to transform and carve up that set in different ways to feed a bunch of tables and graphs. Obviously this could be done in various ways, but making one request to a procedure and receiving one response with multiple result sets can be a very pragmatic (and performant) approach.


One place I worked used a disconnected data set model. The idea was a single stored procedure returned multiple result sets all relating to a single business entity; specifically, an insurance quote, along with all the drivers, vehicles, accidents, convictions, etc. The entity was locked by inserting a row in a table.

Thereafter, the application didn't communicate with the database until it was time to save the record. The disconnected data set was small enough to store in shared session store, or even in a cookie (encrypted, naturally).

This architecture had a number of interesting knock-on effects. The entire state of a client's conversation with the server was tiny and entirely encapsulated by this disconnected data set, so you could record and play back each request to recreate a bug.

It didn't have to use multiple result sets from the stored procedure, but it did save a bunch of round trips.


>... or even in a cookie (encrypted, naturally).

I won't comment on anything else because I don't know the case, but you should never put user data in cookies. Even with encryption you are exposing it unnecessarily, not to mention you have to double check data integrity... Why? Cookies are not meant for this.


One query, multiple result sets--so you can get a customer and the customer's orders in one shot without any join awkwardness. It's part of the jdbc spec and somewhat common in mssql land.


I know for a fact we have reports which do this. I didn't write them though, our old data guy did.


> What is a good example of when you'd want to do that?

When you've got a long round trip.


Or when you can do common subexpression elimination, where you can compute something such as a temporary table, then use that to produce result R, then use it to produce result S.
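
A hedged, T-SQL-style sketch of that pattern (procedure, table, and column names are hypothetical): the expensive common step is computed once into a temp table, and each trailing SELECT then becomes its own result set returned to the caller:

    CREATE PROCEDURE dashboard_summary AS
    BEGIN
        -- compute the expensive common subexpression once
        SELECT region, order_day, amount
        INTO #recent_orders
        FROM orders
        WHERE order_day >= DATEADD(day, -30, GETDATE());
        -- result set R: totals per region
        SELECT region, SUM(amount) AS total
        FROM #recent_orders GROUP BY region;
        -- result set S: order counts per day
        SELECT order_day, COUNT(*) AS order_count
        FROM #recent_orders GROUP BY order_day;
    END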


So this would be used more in BI/data mining type workflows?

That would explain why I haven't had to deal with it.


This is my number one gripe about PGSQL. It's never had "real stored procedures", a feature which has been on the TODO list for a long, long time and nobody is even working on it. See Server-Side Languages/Implement stored procedures items and associated notes/threads here - http://wiki.postgresql.org/wiki/Todo

Beyond that, my other gripes about PGSQL are all surrounding the fact that it's simply not as polished or as widely used as SQL Server. I can't just drop PG in at some business site and expect the IT staff to be able to handle maintenance.

The Postgres GUI tools are extremely lackluster compared to the standard Microsoft SQL Server tools. Even the third-party apps like Navicat are simply not as robust or as trustworthy as SQL Server Management Studio and the SQL Server Profiler. And there is literally nothing out there that compares with SQL Server Data Tools.


What's the difference between "real" stored procedures and postgres functions?


I would imagine PG doesn't place much emphasis on GUI tools, being primarily hosted on Linux/BSD... Does your criticism apply to CLI tools too?


> In pgsql if the columns are the same type and number then we can use a union, otherwise the suggested alternative seems to be to just make 2 queries.

If the two commands have nothing to do with each other then yes that's usually the best option. That does open you up to understanding transaction semantics if you want to ensure both see exactly the same snapshot.

Another option is to use JSON as the great normalizer. Any row can be converted to JSON via row_to_json(...) or manually via the more low level functions. That allows you to "stack" multiple, distinct, JSON responses atop each other. Probably a bad idea 99% of the time to use something like that but it can come in handy that one time you want to jam things through in one query.
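
A rough sketch of that JSON trick, reusing the table1/table2 columns from upthread (probably not something to reach for often, as noted above): every row becomes a json value, so the two otherwise incompatible result sets can be stacked with UNION ALL.

    SELECT row_to_json(t) AS payload
    FROM (SELECT int_col_1, int_col_2 FROM table1) AS t
    UNION ALL
    SELECT row_to_json(t)
    FROM (SELECT varchar_1, varchar_2, int_col_2 FROM table2) AS t;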


You didn't state a single reason why you think that grabbing multiple resultsets in a single trip to the database is a bad idea though...

The only real argument I've ever seen against it has to do with "Separation of Concerns". Some people don't want any logic in their database. However, there's never any reason given why.

(I bet you've UPDATEd multiple tables in one query though.)

Reasons are often given, sure...but they're almost never based on that person's experience, because PGSQL users don't have this feature. So, they'll never know just how awesome it is to have real stored procedures like TSQL has.

Anyway, it's not about "jamming" things into one query. It's about composability and PGSQL is lacking it.


> You didn't state a single reason why you think that grabbing multiple resultsets in a single trip to the database is a bad idea though...

It's a bad idea for the same reason that having a function that returns back multiple values is a bad idea. Yes it's useful sometimes but on the whole it ends up being confusing. From a result set processing perspective it's also a pain in the ass as your app code is now tied to handling multiple results in a particular order, i.e., more/tighter coupling.

> (I bet you've UPDATEd multiple tables in one query though.)

You'd win that bet. There are a lot of ways to do this and it's a legit operation because you want N things to change at once (where N isn't in the same table).

> Reasons are often given, sure...but they're almost never based on that persons experience because PGSQL users don't have this feature. So, they'll never know just how awesome it is to have real stored procedures like TSQL has.

> Anyway, it's not about "jamming" things into one query. It's about composability and PGSQL is lacking it.

What's your goto "killer example" of what this could be used for?

I've used SQL Server quite a bit[1] and while it's not a terrible database, on the whole I don't like it. Rattling off at random: lack of MVCC (sure they have it now but it's not the default and still has warts), explicit locking that drives you nuts, defaulting to case insensitive string comparison, lack of built in functions. The inane licensing options don't help either.

I'd take Postgres over SQL Server any day of the week and twice on Sundays (i.e. side projects).

[1]: Officer I swear! It was already like that when I got there!


> It's a bad idea for the same reason that having a function that returns back multiple values is a bad idea.

OK, what's the reason then? I didn't hear you give one.

Every PGSQL query already returns multiple values because you get a command status along with the result set.

Every Golang call returns 2 values. They even make it very easy for you to return even more values. So what is your reasoning?


He did give you one, you just aren't listening.

Having a function return the minimum and maximum of an array is returning one value. The value is an array of size 2. Having a function return an array of the minimum, maximum and the number of states in a given country is returning two values.

You are asking PG to return multiple unrelated data sets. The number of use cases that this is useful and good is pretty much for displaying reports. Do two queries. The problem is already solved.


"You are asking PG to return multiple unrelated data sets

Who talked about 'unrelated'?

Just as one can return min, max, average and standard deviation of column C in table T in one call, one can, for example, return that, the top ten records with highest C value and the records with the top ten most common values for C.

Yes, you can do two queries, but doing them in one go can be faster, sometimes much faster.

The risk, of course, is that one gets "one stored procedure per screen", but as long as one is aware of that, I don't see anything inherently wrong with that.


> Do two queries. The problem is already solved.

I can think of a handful of scenarios where the second result set needs to be based on something from the first, and combining them can make some sense in that case, but it's a stretch. For "long running process" or "slow query time" arguments... eh... that may be a problem, but it'd be very much an edge case for most users of the DB engine in question.

Dunno if Oracle supports this or not, but pretty sure MySQL doesn't and PG doesn't. MSSQL is the only DB engine I know that supports multiple result sets from one stored procedure. If you want to tie yourself to features of just one DB, that's great - most people do in one way or another, just don't expect everyone else to support that particular feature or syntax.


> OK, what's the reason then? I didn't hear you give one.

Read the rest of that paragraph. You only quoted the first sentence.

> Every PGSQL query already returns multiple values because you get a command status along with the result set.

And most generic database driver interfaces (ex: JDBC) return that out of band. Errors get turned into exceptions and update counts get returned back as integers. That's far from perfect (particularly with JDBC) but it's usually done to fit the programming paradigms of the driver's native language.

> Every Golang call returns 2 values. They even make it very easy for you to return even more values. So what is your reasoning?

That's because Golang doesn't have exceptions. Every function that could fail needs to indicate if there's an error. It's like errno in C but per-function.


He gave an opinion. That is not a reason.


> It's a bad idea for the same reason that having a function that returns back multiple values is a bad idea.

Every single language that lacks multiple return values has some kind of hack to compensate, like varying parameters, that breaks encapsulation and makes code inconsistent.


You could also wrap your output in a struct, or tuple if so inclined


On a tuple, yes. That's how most languages implement returning multiple values, but you need good accessors. With a struct you'll need to declare it first.


You can write a plpgsql function that returns a setof refcursor (pointers to cursors). You can then iterate over these cursors to retrieve the results from an arbitrary number of tables.
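
A minimal sketch of that refcursor approach, borrowing table1/table2 from upthread (function and cursor names are made up; the caller has to fetch from each cursor within the same transaction):

    CREATE OR REPLACE FUNCTION two_result_sets()
    RETURNS SETOF refcursor AS $$
    DECLARE
        c1 refcursor := 'ints_cur';
        c2 refcursor := 'strings_cur';
    BEGIN
        OPEN c1 FOR SELECT int_col_1, int_col_2 FROM table1;
        RETURN NEXT c1;
        OPEN c2 FOR SELECT varchar_1, varchar_2, int_col_2 FROM table2;
        RETURN NEXT c2;
    END;
    $$ LANGUAGE plpgsql;
    -- then, inside one transaction:
    BEGIN;
    SELECT two_result_sets();
    FETCH ALL IN ints_cur;
    FETCH ALL IN strings_cur;
    COMMIT;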


The slides are from mid-2013, how many points are still valid? Has there been progress in the last 3 years?


The google thing gets worse every release.

The Django project fixed it by adding canonical links to all the docs, see for example view-source:https://docs.djangoproject.com/en/1.9/ref/models/querysets/ (Has a <link rel="canonical" href="https://docs.djangoproject.com/en/1.10/ref/models/querysets/...)

The postgres project really needs to do that. It's a mess.


> The slides are from mid-2013, how many points are still valid? Has there been progress in the last 3 years?

From skimming through, I'd say the vast majority. Not sure about the PDF doc link issues as I only use the online HTML Docs but the overall generation of the docs is still a black art.


I'm just getting started with contributing to PG, and a lot of the points about the process and about the strong emphasis on backwards compatibility ring true.

But my experience is very limited, so take that with a grain of salt.


Sublime Text works well for me as a Postgres query editor. It's rock solid and super fast, unlike all the alternatives I've tried, except psql, which is solid but ugly. For me ST3 has a much nicer interface, amazing search-and-replace, attractive themes, and syntax highlighting from a pg-specific plugin. Remarkably, it even copes with queries that return millions of rows. You can control its behaviour with the same config options as psql: full feedback, error messages, line numbers, etc. It's easy to set up a build system, see http://blog.code4hire.com/2014/04/Sublime-Text-psql-build-sy...


Have you looked at DataGrip? (https://www.jetbrains.com/datagrip/) I've been using it since it was in beta, and it handles SQL Server and PostgreSQL well. (I haven't used it for other platforms yet.)


I always enjoy good critique. There was a similarly excellent talk about the differences between Python and Ruby, which discussed the strengths and weaknesses of the design choices each made. It was fascinating. Unfortunately, I can't remember the link...


Maybe http://www.wikivs.com/wiki/Python_vs_Ruby or one of the links at the end of that page?


Extra relevant link, because when you try to go to the 'main page' from that article you get a PostgreSQL error!


I know a lot of people use Postgres (myself included), and I was looking at the source recently; it seems pretty high quality. That makes it all the more surprising that there aren't really any performance tests.

Reviewing and pulling things in is always hard, though. In other software I highly recommend always breaking up big things into smaller changes if possible, but I don't know if that works well for the conservative release cycles for DBs.

Though maybe the conservative release cycle is what's wrong? Having to do a refactor to implement a feature doesn't mean that you need to do both at once.


Oddly, I don't see anything about managing user permissions, access, and authorization. It's wicked arcane compared with every other DB out there. I believe one main reason MySQL caught on early with hosting providers was because of this.


That's odd, I feel the opposite. For years the way to create a user in MySQL was this:

    grant usage on *.* to 'bob'@'localhost'
      identified by 'itsasecret';
Meanwhile, PostgreSQL had the extremely straightforward "createuser" command line tool, and the "create user" SQL statement. (MySQL got "create user" and "alter user" around 2006 or so.)
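
For comparison, a quick sketch of the Postgres versions mentioned (user name and password are placeholders):

    -- SQL:
    CREATE USER bob WITH PASSWORD 'itsasecret';
    -- or from the shell:
    -- createuser --pwprompt bob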

Having to edit pg_hba.conf to allow network connections was always an annoyance for me, but at least the error message you get if it's set up wrong tells you to edit that file, which is also very straightforward and contains documentation out of the box.

Not allowing external access by default is arguably a security practice, though I always thought distro maintainers ought to have erred on the side of usability by enabling localhost access by default.


The main thing I've hated about postgres over the years is pgadmin3. Haven't had to use it for a while but it was always easy to get it into a pickle in common situations, like losing the connection to the database.


Yeah, PGAdmin is fickle and not very intuitive. I've heard there are some decent alternatives, though, like Navicat.


DBeaver is pretty good.


Have you tried pgadmin 4? It's a complete rewrite.


It looks pretty awkward to me. I don't get why they felt it's a good idea to do it in a browser with spaghetti-bowl of jQuery full of second-long animations.


It is indeed awful, albeit functionally superior to pgAdmin III.


I haven't had any postgres projects in the last year or so, so no - it does look a lot more functional than 3.


One thing that I really wish existed was a comprehensive guide to vacuuming, transaction id wraparound and similar issues. There have been a few posts recently that discussed problems that arise, but what I haven't seen is a sort of comprehensive "here are the issues that exist, here is what you must monitor in order to be safe" type of guide. The official docs are helpful, but fall short of that kind of guide, and none of the posts I've seen are really comprehensive.

Unfortunately, I'm pretty new to postgres and can't write it myself.


I found this to be a great introductory talk that touches on all the major points to keep in mind with Postgres. It's pretty up to date too!

https://www.youtube.com/watch?v=knUitQQnpJo

It isn't quite what you asked for, but it should be enough to give you a high-level overview of the major systems, and it contains solid practical advice (on vacuuming, MVCC, replication, backups, etc…), so you know what to google for deeper understanding.


Thanks! It's too long for me to watch right now, but I've saved it for the next time I can dedicate that much time to a video.


Feel free to skip to any topic, it is quite well structured to do that.


I'd be all over pgsql if replication wasn't such a mess. Yes it works (presumably), but I can't find any useful (e.g. informative) comparison of the (many??) methods to go about it and/or the most modern way to accomplish it.

If anyone here knows of a good source, I'd love to see it :D (If you use the word "just" or link me the docs, you're dead to me).


Why do you reject the documentation? It has exactly what you're asking for:

https://www.postgresql.org/docs/current/static/different-rep...

It describes the various methods that are available, it lists implementations of each, it explains the tradeoffs involved, and so on. Table 25-1 summarizes the information.

Regardless of the database system being used, replication is just inherently complex. There isn't really a one-size-fits-all solution. The method to use depends on the requirements and context of a given implementation.


I think you might want to watch this tutorial from January 2015 by Josh Berkus: https://www.youtube.com/watch?v=GobQw9LMEaw

It starts with a comparison of the different ways available for replication.


Calling it a mess is unfair. It's actually quite straightforward, and the documentation (see sibling comment) guides you through the pros and cons.

For example, if you run with streaming replication (the recommended approach for most setups), you can decide whether or not to also enable WAL archiving. If you only want replication, and not point-in-time recovery, then running streaming replication without archiving is extremely simple and straightforward.
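
To illustrate how little the primary actually needs, a hedged sketch of 9.4/9.5-era settings (a restart is still required, the standby still needs a recovery.conf with primary_conninfo, pg_hba.conf needs a replication entry, and the role name here is made up):

    ALTER SYSTEM SET wal_level = 'hot_standby';
    ALTER SYSTEM SET max_wal_senders = 3;
    CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';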


I really like the slideshow UI. Anyone know which software was used to make this?


This is made with Reveal.js, but there is another one as well called Impress.js; they seem quite similar.

[0] https://github.com/hakimel/reveal.js/

[1] https://github.com/impress/impress.js/


10 slides in, and I realize that my back button is fucked.

Why do people do this?


If I was doing slides like this, it would be because I was going to use them in a talk. Then I would want the back button to go back one slide.

If I then put them on the internet, I may not bother editing them to make them more "internet friendly".


I personally would expect the back button to go back one slide, and would argue the opposite: that having the back button skip all the way to the previous web page is unintuitive and not "internet friendly". I mean, imagine if this slide deck were implemented using normal links: the fact that they are using some JavaScript-oriented feature to change the hash instead of changing the path shouldn't change the functionality, and implementing this without JavaScript would make it more clear that this is conceptually a page transition.


But how would you go forward in the first place? Not with the "Forward" button, because you haven't visited the page yet.

Instead, you press "Right" or "PgDn" or what have you. So I would expect the converse button ("Left" or "PgUp") to go back a slide as well (which it presumably does).

That leaves the browser's Back button to go back to the previous web page.

Everybody's happy!


>That leaves the browser's Back button to go back to the previous web page

Why approach 10+ slides as "a single page"?

If you consider them as 10 web pages, the back button makes sense to go back one page. Same for the left button, given the "presentation" use case.

And if they were actually designed as webpages, with html links taking you to slide2.html, slide3.html etc., that's exactly what you would get.

So if anything, whether this is a SPA or not, this is more in tune with how the internet works, and how it was designed to work.


This way a person can link to a particular page.


You CAN link to a particular page. Here, page 14.

http://kupershmidt.org/pg/10_Things_I_Hate_About_PostgreSQL/...


It's reveal.js http://lab.hakim.se/reveal-js/#/, a pretty easy way to put presentations together. Just so you know where to direct the blame ;)


It's not messing up the back button for me.


Just needs a [2013]


Yep, looks like people just glance at presentations rather than reading them; that's why there's 1 comment here about the content and 15 about how the presentation looks :)


I'm not even able to read it. I can't figure out how to go forward from the title page.


How about pressing space or right arrow key? If this doesn't work, make sure it's focused by clicking on a slide. Oh, and there are controls in bottom-right corner.


Since time immemorial, the way to go forward in such pages is to press the left and right arrow buttons.

This has worked in 99.9999% of presentations uploaded on the internet and posted on HN.

It's also very intuitive to try, since this is how it is in any presentation desktop app too.


> Since time immemorial, the way to go forward in such pages is to press the left and right arrow buttons.

"Time immemorial"? Damn, I remember pretty clearly what life was like before Web 2.0, and I'm still in my 20s. The first time I encountered one of these presentations I was confused as hell, at least this one has the decency to put arrows to click in the bottom right corner, a lot of presentations don't.


>I remember pretty clearly what life was like before Web 2.0, and I'm still in my 20s

Well, I'm in my late thirties, and "before web 2.0" is "time immemorial" in tech years.

>The first time I encountered one of these presentations I was confused as hell

Yeah, but that should have been like 5-10 years ago. How come people still don't get them?

Heck, I was confused as hell when I first encountered DOS, mice, GUIs, UNIX, browsers, etc. back in the day. But we learn and move on. What puzzles me is that HN is not full of "average users" but of devs and techies, and also that such presentations are posted tons of times a month, and yet someone still asks...

(Sure, I can understand that this could imply a "fundamental non-intuitiveness" of such a UI, but whether or not it's intuitive when we first meet it, it should be second nature by now.)

I believe in "idiom based design" over intuitiveness (which constraints us to UIs that we can understand at first glance, preventing designs that could need a little getting accustomed to, but be far more powerful in the long run).

And I'd argue it's not even that non-intuitive. From games to Powerpoint, all kinds of apps use the arrow keys to navigate -- why wouldn't one at least try them?


> How come people still don't get them?

The reason these comments keep appearing is because tech community is large & growing and each person encounters & is confused by their first presentation in this style at a different time. As much as I hate the relevant xkcd meme, it is pretty apt here: https://xkcd.com/1053/

> Mice

This isn't a good analogy because you are in control of whether you use a mouse or not. If it confuses you, you'll either stop using it or figure it out before participating on HN. OTOH you can surf the web for years without encountering one of these presentations, so it's jarring when someone else creates a web page that violates your expectation of how the web works

> And I'd argue it's not even that non-intutive. From games to Powerpoint, all kinds of apps use the arrow keys to navigate -- why wouldn't one at least try them?

Because people are used to navigating the web with their mouse, not their keyboard


I can't read it, because it requires JavaScript to display images and text. Apparently HTML isn't good enough for that anymore.


Arrows! How do they work‽


What an awful format for slides.


Disable CSS (in Firefox: View, Page Style, No Style) and the problem is solved.


I loved them, looks great, easy to read.

Liked the animations too. Not too much, but looks good.


Besides the fact that the font is really tiny, it's actually much nicer on phone than most of the "n things..." websites which seem to try and maximize clicks.


The font is _too_ tiny. I couldn't read the mono space parts.


Really? It all looks perfectly legible for me on FF and Chrome, on both win + lin. It'd be even more so if fullscreened like you'd expect a slideshow to be presented.


Probably, but the context here was mobile. The monospaced parts are too tiny on my iPhone 5c.


Shitty life pro tip : Get a pair of 2x glasses at the dollar store for all your phone reading needs.


What exactly is awful about them?


It seems great for presenting them but after the fact:

Pollutes browser history (or, if you click too fast, doesn't even have any), not convenient to browse at your own pace, no way to jump to a specific slide (visually).

The formatting is really good too and would probably work great as an article format but you're stuck reading tidbits, going back and forth.


>Pollutes browser history(or if you click too fast, doesn't even have any)

In what way should browser history, usually full of tons of random items and webpages one has opened, not be polluted? It's just a log, not some kind of organised, curated list of webpages.

>not convenient to browse at your own pace

How so? I can click left/right and go wherever I want at whatever pace I like.

>no way to jump to a specific slide(visually)

That, I give you.


Website is broken. UX fail.


It's amazing that in 2016 people still don't know to press left and right arrows (or spacebar) in an online presentation.

It's the way ANY slideshow works, on desktop and online form. And it's a trivial thing to try, even when not told to. Not to mention by now there have been around 1000 such slideshows posted on HN.


> It's amazing that in 2016 people still don't know to press left and right arrows (or spacebar) in an online presentation.

It's amazing that in 2016 people still don't fall back gracefully in the absence of JavaScript and/or CSS.


Sorry, but no it's not. You're in the extreme minority if you don't have JS or CSS. It's simply not even worth the time to cater to you.


> It's simply not even worth the time to cater to you.

That attitude is evil. That attitude underlies much (albeit not all) that is wrong with the Web in 2016. That attitude is wrong.

Espousing that attitude evidences a profound failure to understand the value of the Web.


It takes more time to make an unfriendly JavaScript UI than it takes to make normal pages.

It is worth the time to make your pages without JavaScript if you care about googlability, and accessibility.


To add to it, the issue is about purpose. There's nothing wrong with an application that needs JavaScript because it can't function without it. My day job is one such application, and I've written others as side-projects (anything that does calculations without a server-side component is an instance).

But this presentation is just a bunch of text with navigation. This is what the web was written for, and it should work no matter whether you have CSS or JavaScript on, and it should use progressive enhancement to add on the useful features. That means using hyperlinks for navigation and CSS that doesn't hide all the content until your JS loads (which seems to be the problem with Chrome. The page actually is readable in Lynx).


>It's amazing that in 2016 people still don't fall back gracefully in the absence of JavaScript and/or CSS.

Nothing amazing about it. At this point those are given prerequisites.

It's like how websites don't cater for a fallback for lack of internet access or electricity.


> At this point those are given prerequisites.

No, no they are not. JavaScript is a privacy-destroying technology. It is a security-impairing technology. It is also pretty neat: it can be great in self-hosted applications, and it can add nice functionality when users trust a site.

Requiring it is akin to a restaurant requiring customers to deposit their wallets at the bar; it's akin to the police requiring the ability to read one's mind (with the promise that they won't abuse the privilege). Requiring JavaScript in order to display text and images is like the guys at the carwash requiring you to give them your house key as well as your car keys.


>No, no they are not. JavaScript is a privacy-destroying technology. It is a security-impairing technology. It is also pretty neat: it can be great in self-hosted applications, and it can add nice functionality when users trust a site.

First, whether it's "privacy-destroying" or "security-impairing" is orthogonal to it being a prerequisite for the modern web. Especially since 99% of the people don't see it that way in the first place.

Second, they (government etc) can read all your mails, tap the backbones, and store your phone-calls, even track your moves through cell towers, and know everyone you spoke with. The websites you visit is probably the least interesting information about you.

>Requiring it is akin to a restaurant requiring customers to deposit their wallets at the bar; it's akin to the police requiring the ability to read one's mind

It's akin to this argument going nowhere because of hyperbole...



I still don't understand database systems, how they work internally, and what problem they really solve.

It seems that new paradigms like R and RAM-only key-value systems are just simpler, faster, and cheaper in programmer time. Loading everything into RAM and doing a search seems like a huge time saving and just works for most cases. Usually if you have more complex needs, you need to adapt your solution, and databases don't seem like a silver bullet.

Database queries seem like a solution to the problem of storing data on disk when RAM was too expensive. So today it's still used as some sort of standard, but when you can have 16GB of RAM, I think you're better off teaching yourself what sort of algorithms and data structures a database uses to be faster, using the ones you like and need, and solving your problem case by case.

The example of how reddit stores its data is pretty demonstrative that ultimately you should not let a database system do all the work. Databases are just a file format to me; the way they try to work for you at a lower level handles basic cases, but when you increase complexity it's not relevant anymore.

Especially today when you have big data and machine learning, everyone should just learn to understand data manipulation. Not saying to teach yourself C all over again, but having a decent idea of the math of what indexing is really about. Forcing yourself to use a database because the company always used it isn't appealing to me.

It's like one of those things when a programmer had an idea which is based on a constraint, everyone starts using it, several products are made, but nobody really remember the original idea of the inventor of that paradigm.


Time to wheel out pseudo-Greenspun again:

"Any sufficiently complicated NoSQL program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a decent RDMS."

Or maybe: "Those who do not learn history are doomed to repeat it"


I guess companies like Google, Facebook, Twitter, Apple, Microsoft, LinkedIn, eBay who run their entire core business on NoSQL databases don't know what they are doing.


Facebook, Twitter, Google and LinkedIn all use MySQL. A lot of it. Yes, they use other datastores, but claiming they run their entire core business on NoSQL is patently wrong.


> but claiming they run their entire core business on NoSQL is patently wrong.

Those companies are very large and thus they use many different technologies.

> Yes, they use other datastores

I guess that's what he meant by NoSQL. Overall he is saying NoSQL is for very specific, very large-scale needs.


The key word there is entire. They do not use exclusively NoSQL storage technologies. They use a mix of NoSQL and SQL. There are tradeoffs to each, and if you have as many really, really great engineers as such places employ, it can make sense to deploy whatever database technology best suits each narrow storage problem in order to meet absurdly high levels of usage. For 95% of us, who have neither the resources or the needs of a business that large, using a relational database for everything is probably the right choice.


The post I was responding to seemed to be arguing that relational databases were redundant and that was the point I was reacting to. I certainly didn't intend to suggest they were the only game in town.


>I still don't understand database systems, how they work internally, and what problem they really solve.

They use a formal, mathematical abstraction (relational algebra), even if somewhat crappily implemented by particular RDBMSs, to facilitate saving data, relationships between data, and queries, while at the same time assuring certain properties (e.g. ACID), and doing so in a platform/programming-language-neutral format with a consistent, industry-standard query interface.

NoSQLs, key-value stores and the like are not newer developments -- they predate relational databases. They were found to be a bad fit for what we wanted in enterprise use, and RDBMSs caught on.

For uses with huge data (Google scale and such) a denormalised approach might be more practical for performance reasons, in which case developers and ad-hoc programs get to re-implement all the functionality and assurances of an RDBMS in an ad-hoc way on top of rawer stores. (And all this could be alleviated with an RDBMS properly optimised for such purposes that still respects relational algebra.)

In a conventional enterprise setting on the other hand, a DB trumps NoSQL etc solutions any day of the week.

>Database queries seem like a solution to the problem of storing data on disk when RAM was too expensive. So today it's still used as some sort of standard, but when you can have 16GB of ram, I think you better teach yourself what sort of algorithm and data structures a database use to be faster, use the ones you like and need and solve your problem case by case.

The whole idea is to free the data from being tied to a particular language, data structures and algorithms.

Back in 2000-2007 it was all about XML, and how we should store data in XML format and get them back with XQUERY, XPATH and the like. A lot of people bought into the hype and the resulting products. Then it was about JSON -- and we now have JSON stores. In 10 years, it will be something else, again ad-hoc.

Meanwhile SQL has worked for the past 3+ decades, and is based on a solid mathematical abstraction (relational algebra).

>It's like one of those things when a programmer had an idea which is based on a constraint, everyone starts using it, several products are made, but nobody really remember the original idea of the inventor of that paradigm.

You'd be surprised. Or rather you're exactly right -- few "really remember the original idea of the inventor of that paradigm", and that's why we're moving in circles with ad-hoc technologies re-implemented 30+ years after they were discarded. Or why people jumped enthusiastically to Mongo to return crying back to PostgreSQL.

Read a little around here for example:

http://www.dbdebunk.com/2015/07/the-sql-and-nosql-effects-wi...

http://www.allanalytics.com/author.asp?section_id=2386&doc_i...

http://www.dbdebunk.com/2015/11/moving-in-circles-sql-for-no...


JSON is just as much rooted in math as SQL tables - it is `fix T = FiniteMap String T` plus a few primitive types, while SQL is `Collection (Set [String])` plus a few primitive types.

Of course, SQL has a nice efficiently-implementable algebra of set comprehensions, which may or may not be interesting for your particular use-case.


>JSON is just as much rooted in math as SQL tables - it is `fix T = FiniteMap String T` plus a few primitive types, while SQL is `Collection (Set [String])` plus a few primitive types.

It's not the tables that make SQL what it is, it's the relations.

Heck, it could even not have any types at all (a la dyamic languages or Sqlite), and relational algebra as an abstraction would still hold.


> In a conventional enterprise setting on the other hand, a DB trumps NoSQL etc solutions any day of the week.

For a data warehouse yes people are still using RDBMS like Teradata.

But in enterprise environments most are now ingesting this data into Hadoop based data lakes and using NoSQL systems to drive the core analytics.


16GB of RAM is fine until your database is one terabyte.

Also, RDBMSs do constraint enforcement for you, and that's absolutely critical when you work with any sufficiently big database (I work on a 100GB DB with a team of 6 people all changing things here and there, and I can't tell you how crucial the constraints are for maintaining data quality in the face of bugs, misjudgment, etc.). And no, it's not for lack of testing. When you work on millions of super complex records, you always miss some corner cases (unless you're NASA and have a huge budget to test thoroughly).
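
A small illustrative example of the kind of declarative constraints being referred to (the schema is hypothetical), which the database enforces no matter which person or program writes the rows:

    CREATE TABLE customer (
        id   bigserial PRIMARY KEY,
        name text NOT NULL
    );
    CREATE TABLE policy (
        id          bigserial PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customer(id),
        premium     numeric(12,2) NOT NULL CHECK (premium >= 0),
        starts_on   date NOT NULL,
        ends_on     date NOT NULL,
        CHECK (ends_on > starts_on),
        UNIQUE (customer_id, starts_on)
    );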



