That's probably the best way of looking at it. PostgreSQL is generally going to be the best solution for 90% of your data, short of having a subset that's write heavy to the point of streaming.


What is a good space to look at when you have write-heavy data to the point of streaming?


Depends on your application. In most cases your "write heavy" load is going to be isolated to one or two tables' worth of data (or equivalent), and for those cases some type of NoSQL solution can be a really good option. Especially since you can use a PG foreign data wrapper to allow PG to run queries against that information too.
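Roughly, a minimal sketch of the foreign data wrapper idea, driven from psycopg2. It assumes the mongo_fdw extension is installed and that a local "users" table exists; the server/table names and wrapper options here are illustrative and vary by FDW and version:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection string is illustrative
    with conn, conn.cursor() as cur:
        # Expose a MongoDB collection as a foreign table inside Postgres.
        # (Option names depend on the specific FDW you're using.)
        cur.execute("CREATE EXTENSION IF NOT EXISTS mongo_fdw")
        cur.execute("""
            CREATE SERVER mongo_srv
                FOREIGN DATA WRAPPER mongo_fdw
                OPTIONS (address '127.0.0.1', port '27017')
        """)
        cur.execute("CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_srv")
        cur.execute("""
            CREATE FOREIGN TABLE events (
                user_id int,
                payload text
            ) SERVER mongo_srv OPTIONS (database 'app', collection 'events')
        """)

        # Postgres can now join its own tables against the NoSQL data.
        cur.execute("""
            SELECT u.name, count(*)
            FROM users u JOIN events e ON e.user_id = u.id
            GROUP BY u.name
        """)
        print(cur.fetchall())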

If it's system wide and you need PostgreSQL itself to handle it, using PG's async features is one potential option, and setting up and managing a Postgres-XC cluster would be the next step; XC allows scale-out for writes. If you're at a company with a budget for that type of thing, I think I remember reading that EnterpriseDB (the PG company) is offering first-class support for PG XC.
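One concrete example of the "async features" angle is plain asynchronous commit (this is per-session Postgres behavior, not Postgres-XC). A minimal sketch, assuming psycopg2 and an illustrative "events" table:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection string is illustrative
    cur = conn.cursor()

    # Asynchronous commit: COMMIT returns before the WAL is flushed to disk,
    # trading a small window of potential data loss for higher write
    # throughput. It's per-session, so other sessions stay fully durable.
    cur.execute("SET synchronous_commit TO OFF")

    cur.execute(
        "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
        (42, '{"action": "click"}'),
    )
    conn.commit()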

In most cases though, I find that the write heavy parts of a system are so isolated that diverting them to a simple NoSQL solution tends to be easiest (Mongo, Couchbase, DynamoDB from AWS, etc).


Depending on your exact needs, Cassandra's not a bad idea for this space. O(n) scalability for write capacity (+), and even on an individual node it's pretty write-friendly: its sstable data structures stream to disk well, keeping spinning disks happy while avoiding write amplification issues on SSDs. It does help if the data is nicely shardable, of course (see the write sketch below).

That's the one I'm familiar with, anyway.

(+) Write capacity is O(n) as you add machines but an individual write's time is pretty constant and cluster-wide maintenance operations do start taking longer as you add machines and they gossip to each other. It's not magic, obviously :)
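The write sketch mentioned above, using the DataStax Python driver (cassandra-driver); the keyspace, table, and data source are illustrative assumptions:

    import time
    from cassandra.cluster import Cluster

    def incoming_batch():
        # Stand-in for the real stream of (sensor_id, timestamp, value) rows.
        now = int(time.time() * 1000)
        return [("sensor-1", now + i, float(i)) for i in range(100)]

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("metrics")  # keyspace name is illustrative

    # Prepared statements avoid re-parsing the CQL on every write.
    insert = session.prepare(
        "INSERT INTO events (sensor_id, ts, value) VALUES (?, ?, ?)"
    )

    # execute_async lets the driver pipeline many writes at once; each write
    # still lands on the replicas that own that partition key.
    futures = [session.execute_async(insert, row) for row in incoming_batch()]
    for f in futures:
        f.result()  # surfaces any write errors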


The filesystem.

Ok, that's not completely serious, but almost.


Could work: dump the incoming stream to disk in some sensible text format and have an asynchronous queue process the data into the database.
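A minimal sketch of that, assuming newline-delimited JSON as the "sensible text format", psycopg2 on the consumer side, and an illustrative "events" table (a real version would also rotate the spool file after draining it):

    import json
    import os

    import psycopg2
    from psycopg2.extras import execute_values

    SPOOL = "events.ndjson"  # illustrative path

    def append_event(event):
        # Producer side: one JSON object per line, fsync'd so the write
        # survives a crash even if the database is behind.
        with open(SPOOL, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def drain_into_postgres():
        # Consumer side: load the spooled lines in one big batch so the
        # database sees a few large transactions instead of many small ones.
        with open(SPOOL) as f:
            rows = [(e["user_id"], e["payload"]) for e in map(json.loads, f)]
        if not rows:
            return
        conn = psycopg2.connect("dbname=app")  # connection string is illustrative
        with conn, conn.cursor() as cur:
            execute_values(
                cur, "INSERT INTO events (user_id, payload) VALUES %s", rows
            )
        conn.close()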


At that point maybe something like Kafka starts looking attractive.
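A rough sketch of that shape using the kafka-python package; the topic, broker address, table, and the row-at-a-time loader are illustrative assumptions (a real loader would batch the inserts):

    import json

    import psycopg2
    from kafka import KafkaProducer, KafkaConsumer  # kafka-python package

    # Producer side: the write-heavy ingest just appends to a Kafka topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda e: json.dumps(e).encode("utf-8"),
    )
    producer.send("events", {"user_id": 42, "payload": "click"})
    producer.flush()

    # Consumer side (a separate process): drain the topic into Postgres at
    # its own pace, so bursts queue up in Kafka instead of hitting the DB.
    conn = psycopg2.connect("dbname=app")  # connection string is illustrative
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="pg-loader",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
                (message.value["user_id"], message.value["payload"]),
            )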


It depends on what your needs are. If you just have a lot of data that is coming in quickly but you aren't doing constant analysis on it, you can still use Postgres. Switch to large batch writes for getting the data into the database to reduce the transaction overhead, and look at using a master-slave replica setup. The 9.4/9.5 log replication features worked really well the last time I used them for handling streaming data. We had a write master and a read slave and optimized each accordingly. It worked pretty well once we got the log replication tuned.
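A minimal sketch of the batch-write half of that with psycopg2; the host names for the master and the standby, and the "events" table, are illustrative assumptions:

    import csv
    import io

    import psycopg2

    # Writes go to the master, reads go to the streaming-replication standby.
    write_conn = psycopg2.connect("host=pg-master dbname=app")
    read_conn = psycopg2.connect("host=pg-replica dbname=app")

    def bulk_load(rows):
        # COPY is the cheapest way to get a large batch into Postgres:
        # one command, one transaction, no per-row round trips.
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        buf.seek(0)
        with write_conn, write_conn.cursor() as cur:
            cur.copy_expert(
                "COPY events (user_id, payload) FROM STDIN WITH (FORMAT csv)",
                buf,
            )

    bulk_load([(1, "a"), (2, "b")])

    # Reporting queries hit the replica and never block ingestion
    # (they may lag slightly behind the master).
    with read_conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM events")
        print(cur.fetchone()[0])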


The kind of setup described by http://c2.com/cgi/wiki?PrevalenceLayer works quite well
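For anyone unfamiliar with the pattern at that link, a toy sketch of a prevalence layer: all state lives in memory, every change is appended to a journal first, and a restart replays the journal (the names are made up, and the periodic snapshotting you'd want in practice is left out):

    import json
    import os

    class PrevalentStore:
        def __init__(self, journal_path="journal.ndjson"):
            self.journal_path = journal_path
            self.state = {}
            # Recover in-memory state by replaying the journal.
            if os.path.exists(journal_path):
                with open(journal_path) as f:
                    for line in f:
                        self._apply(json.loads(line))

        def _apply(self, command):
            self.state[command["key"]] = command["value"]

        def set(self, key, value):
            command = {"key": key, "value": value}
            # Durability comes from the journal, not the in-memory dict.
            with open(self.journal_path, "a") as f:
                f.write(json.dumps(command) + "\n")
                f.flush()
                os.fsync(f.fileno())
            self._apply(command)

    store = PrevalentStore()
    store.set("user:42", {"name": "alice"})
    print(store.state["user:42"])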



