A Statistical Portrait of a Y Combinator Batch (statwing.com)
118 points by glaugh on Aug 20, 2012 | 38 comments


This is a really excellent example of using your own product to generate interesting content as a way to drive traffic back to your product.

Great work Statwing. Can't wait until I have some data that needs analyzing so I can use your service.


"Higher values for Number of Employees/Contractors (FTEs) are weakly associated with higher values for Average Age of Company's Founders (Rounded)"

I had a suspicion that that might be true, but I wonder why that is? Perhaps older founders tackle problems that need more domain expertise and more people? Or perhaps they can rely on savings and have been able to bootstrap a little better than high-school/college grads?

Anyway, good job on StatWing, I love playing around with numbers and graphs. Perhaps some public datasets will help people get more familiar with the app and serve as a demo.


I wonder why that is?

Let's not rush to find an explanation, now!

Looking at the data, we see that two outliers (age 30 with 10 employees, age 33 with 12 employees) are driving what is admittedly a "small" correlation. These two companies are about 4 standard deviations away from the mean of 1.4 employees/company.

They're arguably outliers, and I suspect they're skewing any effect we might be seeing.

Of course, a small, skewed sample (tech companies in YC) obviously means we can't infer a damn thing beyond YC members to begin with, but it's worth pointing out those outliers.
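
To illustrate the point about outliers, here is a minimal sketch using made-up numbers (not the Statwing dataset). The eight "typical" companies are invented so that age and employee count are unrelated; only the two outlier points (age 30 with 10 employees, age 33 with 12 employees) come from the comment above. The helper names `pearson` and `corr` are just for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up (founder age, employees) pairs with no age/employee relationship.
typical = [(22, 1), (24, 2), (26, 2), (28, 1), (30, 1), (32, 2), (34, 2), (36, 1)]
# The two outliers mentioned in the comment.
outliers = [(30, 10), (33, 12)]

def corr(pairs):
    """Correlation between the first and second elements of each pair."""
    ages, employees = zip(*pairs)
    return pearson(ages, employees)

r_without = corr(typical)
r_with = corr(typical + outliers)
print(f"without outliers: r = {r_without:.2f}")
print(f"with outliers:    r = {r_with:.2f}")
```

With these invented numbers the correlation is zero without the two outliers and roughly 0.25 with them, so a "weak" positive association can come entirely from two points. Whether that is what happens in the real data can only be checked against the real data.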


It would be interesting to know if the average hours worked per week would also be correlated with # of employees and age. My hypothesis is that older founders will not be as likely to get involved in a startup that would have them doing everything themselves and working the kind of hours thus entailed. They will tend to go for those startups where for whatever reason (early revenue streams) they can staff up sooner.


My hypothesis is that older founders will not be as likely to get involved in a startup that would have them doing everything themselves and working the kind of hours thus entailed.

I predict that marital / family status would have more to do with it than just raw age (allowing that age may somewhat correlate with being married / having children). That is, a single founder who is 39 is, as far as I can tell, not much different than a 25 year old founder. Being in that position myself, I can at least tell you that the "now or never" effect mentioned below is very motivational for us old farts. I know I work harder on Fogbeam Labs now than I would have when I was 25, and I'd be comfortable saying I work approximately as hard as most 19, 23, 27, 32 or whatever year old founders.

Now, if I were married with children at home, I probably would not be willing to put in those hours... in that regard, I kinda agree with your hypothesis.


Regarding older founders: I'm currently working with one of the founders who is a statistical anomaly at the upper range of the graph. The reality, as I perceive it, is the opposite of what's suggested above; perhaps this is why they are a statistical anomaly within YC.


Maybe older founders are more likely to have previous experience with hiring full-time or freelance employees, and are hence less likely to do everything themselves.


Thanks for posting. It would be fun to see the ages of accepted YC applicants compared with the rejected applicants. I'm not sure how easy it would be to get the data of the rejected applicants though. Maybe they would self-report their information if you posted something here on HN.


Really good idea. We should definitely do that.


I wonder, is that spike at 39 people thinking "Shit, I'm about to turn 40. It's now or never!"


As a 39 year old who is definitely feeling the "Shit, I'm about to turn 40. It's now or never!" thing, I'd say that sounds pretty likely to me. (note: I'm not in the YC batch, but am a startup founder, so I'm just referring to the general issue here, of feeling the need to go the entrepreneurial route now and not later).

Granted, it's just one anecdote, but that definitely rings true here.


Non-YC founder here as well. Please don't read results like this and think there is no way you will have success as a founder if you do not get into an incubator by this age. Rather, what should be looked at is that YC looks for a very specific profile when grooming their prospective companies. They want young, vibrant, fearless candidates whose idealism allows for a significant chunk of ownership to be forked over in exchange for connections and press. As founders get older, we tend to learn there is more than one way to skin a cat. So you'll see us learning our own ways to network, or get press, or get traction.


Non-YC founder here as well. Please don't read results like this and think there is no way you will have success as a founder if you do not get into an incubator by this age.

Not sure if that was meant for me, or just for everyone else reading this thread, but I definitely don't think that way. When I say "it's now or never," I mean "it's now or never to launch this startup, and make it work by hook or by crook." YC isn't even on our radar now, for various reasons, but we're confident we'll succeed with or without any given incubator, or anybody else, aside from the only people who matter - customers.


Yes, it's not like YC is discriminating against older founders. It's more like older founders realizing it doesn't make sense for them since they've developed a network, savings, management experience, etc.

Many of the benefits of an incubator (or even of angel investing/VC/etc in general) are the benefits of experience. No need to spend time gaining something you've already got and certainly no need to give up equity for it.


I wondered the same about the peak at 27 and the spike at 29. But this is a pretty small dataset, so we shouldn't read too much into it.


The 'spike' is the difference between 1 datapoint and 2 datapoints.


If you're going to needlessly nitpick, at least be correct.

It's 1:3.


A difference of 1 percentage point of the datapoints.


Yes. It looks like slightly less than 2% vs. slightly less than 1%. The Statwing dataset for companies shows 80 companies with an average of 2.38 founders, meaning there are about 190 founders in the batch. So I think it's three 39-year-old founders vs. one each at nearby ages.
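
The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: the two inputs are the figures quoted in the comment, and the variable names are made up for illustration.

```python
companies = 80               # figure quoted in the comment
founders_per_company = 2.38  # figure quoted in the comment

# 80 * 2.38 = 190.4, so roughly 190 founders in the batch.
founders = round(companies * founders_per_company)

# Share of all founders represented by 3 people vs. 1 person at a given age.
share_three = 3 / founders * 100
share_one = 1 / founders * 100

print(f"founders: {founders}")
print(f"3 founders at one age: {share_three:.2f}% of the batch")
print(f"1 founder at one age:  {share_one:.2f}% of the batch")
```

Three founders out of 190 is about 1.6% and one founder is about 0.5%, so a one-versus-three split is at least in the neighborhood of the bar heights read off the chart.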


It would be interesting to put this against the age distribution of Gen Y. I believe this group would actually look relatively old compared to the peak in population if we assumed only Gen Y would apply.

I am basing this on my memory that 1990 was the peak year for those born in Gen Y. (I cannot find the data set to back it up, but I bet someone else knows where to get it).

+ a few outside of Gen Y.


Good stuff. I imported and played with a regulatory dataset.

The results mostly confirm industry suspicions that enforcement differs the most based on what region an operator is in (operators with poor regulatory performance are mostly located in the same regulatory region).

What was neat was how little individual manufacturers' designs mattered. But over time, it was either hugely advantageous or hugely disadvantageous to simultaneously operate multiple types of designs. Example: in 2007 it was about 5% better to simultaneously operate multiple designs, but in 2010, it was about 17% worse.

Also confirmed that it was much, much better (from a penalty standpoint) to find and self-disclose regulatory non-compliance rather than to let the regulator find it.

Awesome work guys! Will there be an ability to play with the time dimension soon?


Awesome! That's really cool.

Time is tricky. That's among our most requested features, though. So we won't get to it in the very, very near future, but it's definitely on the roadmap.

Thanks for the comments, really appreciate it!


I'm interested in the 15% of YC startups with a single founder. I wonder if that number is higher than average compared to other classes? And if so, how much higher? I'm also curious if there is a correlation with the average age.

Anyone privy to this information and willing to share?


So I rifled through the source of the webpage and found there is no way to download the actual data from that page. All the calculations are being done on the server side and the summary results are getting sent over via HTTP.

An interesting model for sure, and one that will ultimately make technical sense but cause enterprise woes in the future. I'm not sure if businesses will want to upload the data that would most benefit from the StatWing treatment. It looks like they have realized that, though. Maybe they are aiming to cut their teeth on people who generate a lot of data and then take a stab at going enterprise via partnerships with other companies that already have a strong presence in big companies but are lacking in analytics.


Just out of curiosity, what made you think you would find the raw data on the client?


For most of the visualizations I've found on Hacker News, the data is usually available directly to the client, either through an API or a static file (.csv, .json, etc.). StatWing is already using d3 to display their work, so it is possible they were also using crossfilter to do filtering as well.

So, really, past experience with this sort of thing and seeing that they were using d3.


Any thoughts on why the ages of 26 and 27 appear to be the mode (especially for "social" startups)?


Speaking as a 27-year-old, it seems fairly reasonable to me. This is the age where you've had roughly 5 years of career experience in software or web development (if you went to college), and in the adolescence of your career you may have encountered a problem or a market that seems interesting and may also have a desire to be your own boss and avoid the tedium/politics/etc. of the places you've been so far (because, naturally, you won't make the same poor decisions!).

Not sure that I can speak to the prevalence of social startups in this age range, apart from the obvious "kids these days" take on it. Bear in mind, though, that it's not purely a representation of what 26- and 27-year-old founders are doing -- it's also reflective of YC's position.


As a member of that age cohort (but not a founder, much less one funded by YC), I feel qualified to speculate wildly about this!

We were exactly the cohort who got Facebook off the ground - I joined in March of '04, spring of my freshman year - so while we may not be "social web natives" depending on your interpretation of such a noxious term, we're definitely well-acquainted with it. Probably well enough acquainted to feel that we know what features are missing, what niches are underserved, or something along those lines, with existing social services.


Statistical data? I mean, the few graphs are interesting, but that's VERY little data being displayed at all. I was expecting much more before clicking this link. Tufte would be mad at the abuse of space for the ridiculously small amount of data actually displayed.


Statwing is doing it right: it is super easy to navigate to their home page from the blog, by clicking on their prominent logo, which brings you directly to their main page.


What exactly is a "social" company vs. a non-social one? Some companies are clearly in one bucket or another but I'm wondering what kinds of companies are near the boundary.


Sorry, that's a little confusing, especially since most companies nowadays have social components. For this dataset we categorized social based on whether or not the social component is critical to their business.


I love this product. It makes a hard concept easy, it looks pretty, and it's fast.

I can't wait to see what this team does next!


It makes a hard concept easy

I love what Statwing is doing here, but they could be providing people with enough information to be "dangerous."

The employee count vs. founder age analysis in another thread is a perfect example. Posters are trying to explain why employee counts rise with founder ages, when a glance at the plot suggests the effect results from two companies with abnormally high (~4 standard deviations from the mean) employee counts.

Statwing is definitely pretty and fast! I'm curious, however, to see how they'll work to help people with diverse backgrounds interpret results.


Can you let us download the data and run our own analyses?


Unfortunately not. While there's nothing particularly identifying about the dataset, we collected this data with the understanding that we wouldn't do that. Sorry!


Why make it harder for users to do something which is already possible using your own filter feature?

For example, data on one company:

# of Founders: 1
Founder Age: 43
Number of Months Worked: 20
Number of Employees / Contractors (FTE): 2
Social? No
Mobile? No
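
The general mechanism can be sketched as follows. This is not Statwing's actual API or data: `summary` and `reconstruct` are hypothetical names, and every row except the one quoted above is invented. The idea is that a service returning only summaries still leaks raw values whenever a filter isolates a single row.

```python
# Hypothetical stand-in for a server that only ever returns summaries.
_ROWS = [
    {"founders": 1, "age": 43, "months": 20, "employees": 2},  # record quoted above
    {"founders": 2, "age": 27, "months": 6, "employees": 0},   # made up
    {"founders": 3, "age": 26, "months": 9, "employees": 1},   # made up
    {"founders": 2, "age": 31, "months": 12, "employees": 1},  # made up
]

def summary(field, **filters):
    """Return (count, mean) of `field` over rows matching all filters."""
    matched = [r for r in _ROWS if all(r[k] == v for k, v in filters.items())]
    if not matched:
        return 0, None
    return len(matched), sum(r[field] for r in matched) / len(matched)

def reconstruct(**filters):
    """If the filters isolate one row, each 'mean' is that row's raw value."""
    fields = ["founders", "age", "months", "employees"]
    count, _ = summary(fields[0], **filters)
    if count != 1:
        return None  # more than one row (or none): the means stay aggregated
    return {f: summary(f, **filters)[1] for f in fields}

record = reconstruct(age=43)      # only one row has age 43
print(record)
print(reconstruct(founders=2))    # two rows match, so nothing is recovered
```

A common mitigation is to refuse to answer when the matched count falls below some minimum, though whether that fits a given product is a separate question.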



