I am investigating replacing MySQL with MongoDB in my model layer for my next prototype.
My mind is still thinking in 3NF, though. I understand that denormalization and avoiding joins are useful from a performance standpoint. However, I am unsure when to include a foreign key, retrieve it, and perform a second query from the application layer, and when to duplicate/embed all the field data. Rather than performing one MySQL join, I'm leaning towards making two simple sequential key lookups (the second on the retrieved foreign key) instead of duplicating fields everywhere and keeping track of massively cascading changes, although I usually think in terms of minimizing round trips to the database server.
Wondering if anyone has a heuristic for this or suggested reading?
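To make the trade-off concrete, here is a minimal sketch of the two modeling choices, using plain Python dicts to stand in for MongoDB documents (no server required; all collection and field names are invented for illustration):

```python
# Option 1: reference -- store a foreign key, resolve it with a second lookup.
users = {"u1": {"_id": "u1", "name": "alice"}}
posts = {"p1": {"_id": "p1", "author_id": "u1", "body": "hello"}}

def get_post_with_author(post_id):
    """Two sequential key lookups: the post first, then its author."""
    post = posts[post_id]
    author = users[post["author_id"]]  # second round trip in a real deployment
    return post, author

# Option 2: embed -- duplicate the author fields inside the post document.
# One fetch, but every author change must cascade into every embedded copy.
posts_embedded = {
    "p1": {"_id": "p1", "author": {"name": "alice"}, "body": "hello"}
}

post, author = get_post_with_author("p1")
print(author["name"])                          # resolved via the reference
print(posts_embedded["p1"]["author"]["name"])  # resolved in a single fetch
```

A common heuristic is to embed data that is read together and rarely changes independently, and to reference data that is shared or frequently updated.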
Please ask yourself why. In reality, MongoDB has almost all of the same limitations as MySQL and PostgreSQL, but lacks their production-proven track record. In addition, MongoDB has very weak durability guarantees on a single server, poor performance for data that is not in memory, and shares some common SQL/ACID scalability pitfalls (use of arbitrary indexes and ad-hoc queries).
Outside of this, you need to switch the question you ask from "what data do I need to capture?" to "what questions do I need to answer?"
I'm looking for a way to scale hands-free and very cheaply, without vendor lock-in. It would be nice if I could simply add another machine to the cluster, and not have to generate the IDs in the application layer, use a hashing algorithm to select the correct machine, and have everything stop working when a single database goes down. It seems like this should be a solved problem by now. Investigating new tech won't set me back much time, and my MySQL queries aren't going to disappear if I don't like it. Furthermore, looking at large sites such as Flickr that massively scaled MySQL, it seems like they stopped using its relational features anyway.
It's not as "hands-free" as you'd like to believe. Check out the MongoDB sharding introduction[1]. There are some pretty big caveats. Very few people are using auto-sharding at scale in production (bit.ly and BoxedIce are all I know of).
There are other operational issues with MongoDB. MongoDB can only do a repair if there is twice as much free disk space available as the database uses, and the server must effectively be brought offline to do it. To reclaim unused disk space, you have to do a, you guessed it, compact/repair. Want to do a backup? The accepted way is to have a dedicated slave that can be write-locked for however long the backup takes. They suggest using LVM snapshots to keep this window short, but disk performance on volumes with LVM snapshots is terrible.
I would consider using MongoDB for a setup that would either be non-critical, live completely in memory with bounded growth (which itself sort of begs the question...), or involve mostly write-once data, such as feeds, analytics, and comment systems.
Well my platform is n number of $20 linodes to start. I'm clustering the python application across them using uwsgi+nginx (all I have to do is add an IP address in the config to scale), it's going to be a given that I shard the database across them as well. If you feel I should avoid Mongo would you recommend Cassandra instead?
Regardless, I think my initial question regarding when to denormalize data applies to any database including scaled MySQL, but perhaps was a better question for stackoverflow.
Cassandra has its own hurdles, but I think if we're talking about getting your mind in the right place, it might be a better answer. Cassandra definitely has a much more mature scalability implementation that isn't caveat-ridden like MongoDB's is. It's operating at scale at both Twitter and Facebook.
Cassandra has online compaction, but still requires up to 2x space for compaction. However, Cassandra does not have to do a full scan of the entire database to do compaction, and almost never actually uses the 2x space. It's also much easier to maintain a Cassandra cluster, because each instance shares equal responsibility, and replication topology is handled for you.
Despite what their fans will say, these are both beta-quality products.
Care to back this up? I think MongoDB is awesome from what I've seen so far, and I've heard nothing but good things about it, even in the performance sector. I'd love to see more info on what you're saying. Thanks.
MongoDB is designed specifically for multi-server durability. All writes are done in-place. It will support single-server durability in 1.8. Until then, 10gen strongly recommends against using a single-server setup with any data you care about[1].
I don't have any ready figures in front of me for disk performance, but MongoDB uses an in-memory layout and memory-mapped files, which make for a far-from-optimal on-disk data layout. As you might imagine, it works well with SSDs, but performance is awful on rotational disks. Foursquare's outage was caused in part by their MongoDB instance beginning to exceed available RAM. My own experience with larger-than-RAM MongoDB collections mirrors this conclusion. Under these circumstances, you're likely to see much worse performance with MongoDB than with MySQL or PostgreSQL, especially under concurrent writes[2].
As for the SQL pitfalls, and databases in general: don't think for a second that MongoDB has some magic that exempts it from the performance problems RDBMSs have. Start using any of the familiar SQL scaling "no-no" features (multiple indexes, range queries, limits/offsets, etc.) and it will start exhibiting the same performance characteristics a PostgreSQL or MySQL database would under those circumstances.
The temptation with MongoDB is to be lured into the idea that a brand-new, immature database with convenient, SQL-like features can perform significantly better than its highly-tuned RDBMS brethren. There are some reasons to choose MongoDB over MySQL and PostgreSQL, but performance should not be one of them.
All the same limitations? Most SQL databases don't support anywhere near the number of atomic operations that MongoDB provides. Also, sharding MongoDB is a lot easier than sharding MySQL or PostgreSQL.
I'm a little confused by this; how do SQL databases not support more atomic operations than MongoDB, seeing as you have full control of the transactions themselves? Incrementing, adding to a list (generally done by inserting a row in another table), etc. are all standard operations.
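For concreteness, here is a minimal sketch of the standard SQL idioms mentioned above, using Python's built-in sqlite3 module (table and column names are invented for illustration): an atomic increment, and "adding to a list" by inserting a row into a child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
cur.execute("INSERT INTO counters VALUES ('views', 0)")
cur.execute("CREATE TABLE tags (post_id INTEGER, tag TEXT)")

# Atomic increment: the read-modify-write happens inside one statement,
# so no application-level locking is needed.
cur.execute("UPDATE counters SET value = value + 1 WHERE name = 'views'")

# "Adding to a list" the relational way: insert a row into a child table.
cur.execute("INSERT INTO tags VALUES (1, 'mongodb')")
cur.execute("INSERT INTO tags VALUES (1, 'mysql')")
conn.commit()

value = cur.execute("SELECT value FROM counters WHERE name='views'").fetchone()[0]
tags = [t for (t,) in cur.execute("SELECT tag FROM tags WHERE post_id=1 ORDER BY tag")]
print(value)
print(tags)
```

MongoDB's equivalents (`$inc`, `$push`) are single-document operations; SQL additionally lets you group several such statements into one transaction.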
We used mongodb for a weekend hack/prototype cycling news site. It was an excuse to play with Mongo, not because we felt we'd need Mongo for this particular site. I am very glad we did use it, because I now understand some of the cool things you can do with Mongo.
Basically, if you are prototyping and have a few extra hours to spend playing, I would say go for it. Can't hurt to understand the tool, so you can pick it if it makes sense for what you need to do down the road.
Do HNers still watch TV?
I stopped paying for TV ~ 2 years ago and find it mildly nauseating when I visit relatives and they want to have it on in the background.
I like my push TV service (i.e. cable) for the following things:
1. sports -- there is no viable alternative at the moment.
2. Sometimes I just want to veg and channel surf (be passively entertained). There is no good way to do this with online services or recorded stuff.
3. Sometimes I need push infotainment because I don't know about the existence of something to actively pursue an interest in it. While HN and Reddit (and so on) are rapidly replacing this niche, there are some things they just don't do well; some things are just more interesting when presented as a video first, rather than as a link (cooking stuff comes to mind...).
That being said, having the TV on in the background is different than "still watch TV", and I agree, it is a bit annoying and sad.
I watch live sports events, there's no good on-line replacement for that yet. I have the minimum level of TV + Internet from Comcast, as that's cheaper than Internet alone.
We haven't had broadcast TV in our home for 10 years, and haven't had an actual TV for about 7 years. It's been Netflix and Hulu, mostly. Hulu's commercials have grown more intrusive lately, so Hulu may be on the chopping block very soon.
> wealth would be 'the ability to follow curiosities and interests, whatever they may be.'
How would you differentiate between the terms wealth, freedom, and happiness (or do you consider them interchangeable)?
I think from those 3, 'wealth' has the heaviest connotation of objective comparability and applicability to organizations in addition to individuals. (ex. a wealthy company or a wealthy country)
Wealth is the ability of an entity to create change.
Skills you know, your business network, your emotional support network, assets you can exchange.
People with great wealth can affect the world in profoundly positive or negative ways. In other words, wealth is closely tied to the notion of power.
A member of Congress with a relatively low net worth could have as much financial influence as Bill Gates. While this power is not as liquid or easy to exchange as cash, it is still exchangeable and a store of value.
[To directly answer the OP: No, we should not exclude speculation; everything is speculation to some small degree. Even holding everything in cash is a speculation on that nation's currency. We'd have to use other measures to estimate expected security/longevity, like liquidity of assets.]
> their backers were betting that Mr Obama would push through an energy bill that would force America to embrace alternative sources of energy more aggressively.
Give him a break; getting healthcare reform passed already involved flame-baiting half of Congress to the extreme. There's a thing called filibustering that massively slows down all legislation once any bill becomes flame-baitable enough. There is no way any president could have halted the greatest financial collapse since the Great Depression, passed education reform, passed healthcare reform, AND also gotten to a new energy/immigration policy in 18 months, while the filibuster exists, without assuming dictatorial power.
> They don’t need to be rewarded for risk, because they actually get utility out of risk itself. In other words, they like adventure.
Completely flawed. Every sane person seeks to reduce risk. What entrepreneurs also seek is to maximize gains. Increased risk is merely often required to do so.
More importantly, distinguishing gains from risk is only meaningful if gains are entirely decoupled from risk. I claim that many entrepreneurs seek non-monetary gains like prestige or autonomy, and pursue strategies that maximize prestige or autonomy at the expense of their risk-adjusted financial gain.
A taste for risk and/or a taste for the pure satisfaction of managing risk is a non-monetary gain. If we accept my claim that some entrepreneurs seek some non-monetary gains, how can we be certain that no entrepreneur seeks risk as its own reward?
My unprofessional opinion is that some entrepreneurs seek risk and rationalize their risk-seeking behaviour as a quest for maximal financial gains. I'd extend that risk-seeking and rationalizing behavior to explain why many people join early-stage startups. When you look at the average risk-adjusted return of being a startup employee, a great deal of rationalization is necessary to claim that you're in it for the money.
"Gains" obviously did not refer strictly to cash. Gains refers to anything you value. Cash is just a tool. I would go further than you and say the vast majority of entrepreneurs seek autonomy. Use Occam's razor when interpreting the statements of others.
I don't agree. "Sane" people introduces something which is frequently a false dichotomy; whether you are driven by your rationality, or by more visceral and deep-seated spirits. The problem is the difference between rationality and enlightened rationality: what is good for you personally in the short term, versus what is good for everybody in the longer term. These things can give contradictory advice, and both can be labeled as rational. But the second isn't intuitive to most people: it generally needs to be supplied from somewhere else, usually emotions.
Let's say you need to get drinking water from the river, but there are crocodiles down there too. If you minimize your risk, you'll get just enough water for yourself. On the other hand, if you get more than enough water, you'll be able to share your surplus, increase your social standing, find a more attractive mate, etc.
Ah! you say - but the "minimizing risk" here isn't actually minimizing risk, but increasing another risk, a risk that you'll never get anywhere in life, won't raise a family and pass on your genes, etc.
But that's an intellectual risk. It's not likely to dawn on you unless you're quite introspective, or perhaps until it's too late and you're in relative middle age.
What if there was a different mechanism? What if exploring the boundaries of your capability, your talents, was its own reward? You can't explore those boundaries without risk of failure, even where failure might include death. A simple mechanism for that could be risk homeostasis, whereby a certain manageable amount of risk becomes its own visceral reward, attracting you to those boundaries and encouraging you to expand them.
In other words, getting utility out of risk itself - "liking adventure".
> What if exploring the boundaries of your capability, your talents, was its own reward?
Then you're gaining something (discovery of new assets, freedom, happiness, sense of accomplishment, self-actualization, etc.) and seeking to maximize gains. Risk does not magically become utility. If gains are held constant, nearly everyone chooses the option with less risk because it has a higher expected value. Entrepreneurs are gain maximizers. Risk is only _utility_ in the case of masochism.
I think you're missing my point. There are first order effects and second order effects. Animals (including people) don't usually understand second order effects very well; so we have evolved mechanisms to encourage us towards desirable second-order effects, even when the first order effects may be negative.
I'm pointing to the rewards of manageable risk as a mechanism - probably an evolutionary mechanism - for exploring boundaries and thereby gaining things, even if you didn't know they existed.
It's all very well to talk about the rewards of new assets, freedom etc., but the reward from a risky venture isn't necessarily obvious; it may even be utterly unknown in the history of human kind. But if the risk itself being rewarding is a mechanism, it may encourage the discovery of such rewards.
The key problem here is the overloading of language. We have this talk of rationality, of evolutionary psychology, of emotions and drives. The key thing to understand, though, is that all of these may simply be different ways of talking about the same things.
I'm saying that both things can be true: that it's rational consideration of long-term goals that causes us to risk things; and that it's the intrinsic utility of risk itself, as encoded in the genome and proteome of the self-directed organisms we call humans. What I think is wrong is to take only a single terminology and use it to say the other terminology is mistaken.
>What I think is wrong is to take only a single terminology
Utility is a basic textbook economics term and the context of Arrington's article. Also, Arrington was an economics major. I think sticking to one terminology is highly preferable to acontextual obscurantism.
Well, even staying within the bounds of economics, there are problems: are you talking about homo economicus, or behavioural economics? The latter brings in portions of the other systems into the economic model in order to better reflect the workings of the real world organism.
Economics is about the study of choice; but we can split that up into at least two broad categories, the most efficient choices, and the actual choices made by people. You can stay strictly within a so-called rational model for the first - and you must, in order to justify the inputs to your utility function - but the second is experimental, and relies on observed inputs necessarily defined by disciplines other than economics.
The insight of behavioural economics is that it's not so much rational maximization of gains that drives us, but rather imperfect mechanisms implemented in the organism, whose outcomes have been tuned by evolution to approach rational maximization. Leaving out the behavioural aspect means your model won't correspond as well with the real world, the only thing worth talking about. And I'm asserting that seeking a certain amount of risk is just one of those mechanisms.
If you want to go to Kahneman & Tversky: yes, the different heuristics people may use for estimating risk and reward are probably biased estimators (although I hope all pirate-entrepreneurs are using a little math and obtaining feedback). That certainly does not mean risk, independent of gains, becomes intrinsic utility for the non-masochistic.
I studied cognitive psychology, not economics ;) I'm familiar with the topics you're bringing up and agree they're interesting, but I also think they're irrelevant to the issue of correlation vs. causation and the red herring of "risk" when looking at what motivates entrepreneurs.
What's the opposite of risk? Most people would say safety; but I would say boredom. Stimulation becomes repetitive if there's nothing at stake. It wasn't for the expected gain of money that I bet on games during the World Cup last summer; it was to make the games in which I had no personal stake interesting. I don't know where you are, but internet betting is legal where I am.
I don't think appetite for risk is sufficient for entrepreneurial activity; but I do think it's necessary. So I don't think it's a red herring.
> Ah! you say - but the "minimizing risk" here isn't actually minimizing risk, but increasing another risk, a risk that you'll never get anywhere in life, won't raise a family and pass on your genes, etc.
This is a useful insight, that minimising risk at one level can mean maximising it at another.
I don't seek to reduce risk all the time. Sometimes I actually enjoy taking a risk. For instance, when given the opportunity to leave the house to go shopping, I take a risk that I don't need to take. After all, I could get mugged, run over, or any one of a thousand other things could go wrong on a trip to the shopping mall. It would be much safer to mail-order everything.
And that's not counting my decision to maybe do it on a bicycle, which we all know is less safe than my car (but I enjoy being out there). And I might not even wear UV protection, risking skin cancer. And not wear a breathing mask to enjoy the not-so-fresh air.
Life is risk. Sane people (or at least I hope they are the sane ones, if not I'll be off to the funny farm tomorrow) will balance the risks they take against the upsides and will not seek to reduce risk per se.
Entrepreneurs are not unique in this respect, everybody does it, all the time.
Increased risk and knowing how to balance risk is a requirement for a normal life.
And that's not getting in to things like skydiving and bungee-jumping yet.
I think you're missing the parent's point. Your focus when going to the mall isn't "oh man, I may get killed! That's so awesome!"; rather, you're focusing on whatever you think you'll get by going to the mall. Likewise, skydiving and bungee-jumping provide an adrenaline dump. A "high".
- defining anyone who likes adventure as "not sane" in your book
- including "risk of regret over having had an unadventurous life" into your risk calculation
- just run an analogy with casinos, where the gamblers are purchasing 'fun'
Or you could explore the idea that rational choice theory isn't a set of fundamental axioms of human behavior but a (mostly successful) attempt at a descriptive theory.
That's simply not true (or else your definition of sane is very limited).
Just as a random example, the death rate for climbing Mount Everest is 10% (!). If sane people always seek to reduce risk, then only insane people would climb Everest.
You seem to be implying a mountain climber makes no gains by tackling Mt. Everest. The risk is not what is preferred, the sense of accomplishment and fulfillment is what is preferred. Without gains to be had, the mountain climber would cease to choose decreased lifespan unless they intrinsically enjoyed suffering.
I have read an analysis of the teaching profession in which highly motivated people choose not to become teachers not just because of the low salary, but because of the low salary variability. Expected teacher salaries don't get much worse even if the teacher doesn't try and isn't good at teaching.
Most people will not choose an outcome distribution which is highly concentrated on one spot, even if that has the highest expected return. People who are highly motivated are even less likely to choose it.
Most people will choose less expected return (sacrifice gains), and more risk by buying a lottery ticket.
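The lottery point can be made concrete with a back-of-the-envelope expected-value calculation (the numbers below are hypothetical, chosen only to illustrate the shape of the trade-off):

```python
# Hypothetical lottery: a $2 ticket with a 1-in-10,000,000 chance
# at a $10,000,000 prize.
ticket_price = 2.0
p_win = 1 / 10_000_000
prize = 10_000_000

# Expected net return of buying one ticket.
expected_return = p_win * prize - ticket_price
print(expected_return)  # negative: buyers accept a lower expected return
                        # in exchange for a high-variance outcome
```

So the buyer sacrifices expected return for a small chance at a large gain, which is exactly the opposite of what a pure risk minimizer would do.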
I think the notion of lower risk being good comes from the financial world where you can use leverage on a lower risk position to create a higher return one.
I think people choose outcome distributions that are quite different than a low risk one.
That is obviously not true, and it's easy to prove: betting on sports is wildly popular. And here's the thing: people don't want to win per se; they want to share in the outcome for their team, positive or negative.
Seems unnecessary, considering some of the most thoughtful discussions come from throwaway accounts doing Tell/Ask HN. It could also exacerbate other issues: http://en.wikipedia.org/wiki/Argument_from_authority
[i.e. comment upvotes co-vary with the name of the commenter instead of the content communicated]
That's why I believe there should be no real disadvantage to using an anonymous account, and no way of telling in the comments if a comment is anonymous or not. All it does is encourage people to link their real life account in a behind-the-scenes manner, which will raise the general discourse level.
On one hand you want to avoid trolls, but on the other hand you want to avoid a circle-jerk and "my boss can google everything I wrote" timidity and un-hackerish conformism. Tending toward either extreme can disincentivize thoughtful discussion and negatively impact a community. So I would be hesitant to declare that a safe assumption.
Keep track of submission karma and comment karma separately. People should not be able to downvote comments unless they have earned the required karma threshold from comments.
Example [of what to prevent]: someone submits the latest TechCrunch article first and gets upvotes from everyone else trying to submit it. They can then downvote comments without first going through a socialization period of earning upvotes for thoughtful comments.
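A sketch of the proposed rule, with the threshold value and class names invented purely for illustration:

```python
# Track submission karma and comment karma separately; gate comment
# downvotes on comment karma alone.
DOWNVOTE_THRESHOLD = 100  # hypothetical value

class User:
    def __init__(self, submission_karma=0, comment_karma=0):
        self.submission_karma = submission_karma
        self.comment_karma = comment_karma

    def can_downvote_comments(self):
        # Submission karma is deliberately ignored here, so submission
        # gold-digging earns no downvote privileges.
        return self.comment_karma >= DOWNVOTE_THRESHOLD

link_farmer = User(submission_karma=5000, comment_karma=10)
commenter = User(submission_karma=0, comment_karma=150)
print(link_farmer.can_downvote_comments())  # False
print(commenter.can_downvote_comments())    # True
```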
But then what's the incentive for good submissions? If it's gotten to the point where essentially every article from certain sites (eg Techcrunch) gets submitted by karma gold-diggers, it might make sense to just have them auto-submitted so no one gets the karma, but they can still get discussed.
EDIT:
Another thought: Only allow submissions from people above a given karma. That way you would always have to earn your stripes through commenting. Downsides: Wouldn't allow sensitive anonymous 'ask HN' posts from throwaway accounts, and we would lose all the good submissions from new accounts.
> a sort of wildly self-improving problem-solving algorithm that has no real consciousness and simply goes on an optimization rampage through the world
> I meant the human species as a collective has no single consciousness.
Neither does an AI species, but that's not the issue. The point being made here was that a danger could arise from a very efficient and powerful automaton that neither has self-awareness nor recognizes other beings with minds as relevant. From that I argued that the threat of this happening is actually low, because by its nature this kind of AI would probably lack the means to instigate an autonomous takeover of our planet.
> And also that individual humans do not have metaphysical consciousness.
Ah, I finally see where our misunderstanding comes from. Science doesn't talk about consciousness (or metaphysics) in the spiritual sense. The question whether people have metaphysical consciousness or not really depends on your definition of those terms, so arguing "for" or "against" isn't really gonna do anything besides getting you karma for oneliners.
As far as practical AI research is concerned, the definition of consciousness is the same for humans and non-humans and while there are different degrees of consciousness possible, there certainly is an agreement that the average human has one.