Programming-language popularity by GitHub pull requests (lemire.me)
55 points by ibobev on April 7, 2023 | hide | past | favorite | 70 comments


I ... don't think this is a good metric for 'popularity' but rather a good metric for stability/volatility. I would say that from working on a JS project, it's volatile af. You can use a package that will be abandoned tomorrow and you have no idea.

Meanwhile, the languages near the "bottom" are quite stable languages to work in (as well as a pleasure). They even /almost/ noticed it themselves when mentioning golang 2.0 coming out soon, but assumed that that would make it 'popular' again.


I think this is a bad metric for stability/volatility. It’s just one way to measure public activity. I don’t think it’s a good proxy for any of the things we actually want to measure, except maybe the likelihood that we can find collaborators for our project in language X who are willing to make pull requests.


It's a good proxy for both. Both popularity and volatility have misleading statistical failure modes if measured purely by pull requests.

The author did narrow it down to what it most accurately represents, though:

> Nevertheless, in my view, the number of pull requests is an important indicator of how much people are willing and capable of contributing to your software in the open source domain.


There has to be something to contribute to first -- i.e., new features or bug fixes. Once software reaches a level of stability, there aren't new features to be built, or bugs to be fixed. That doesn't mean there aren't contributors out there, willing to contribute to something new.


Completely agree. This is like measuring productivity by lines of code written.


A ratio between the number of lines of code across all those projects and the number of pull requests could be an interesting way to measure this volatility.
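The ratio proposed above could be sketched as follows. This is purely illustrative: the function name and the per-language figures are invented placeholders, not real GitHub data.

```python
# Sketch of a "volatility" metric: pull requests per thousand lines of code.
# Higher values suggest churn; lower values suggest stable, settled code.

def volatility(pull_requests: int, lines_of_code: int) -> float:
    """Pull requests per thousand lines of code."""
    return 1000 * pull_requests / lines_of_code

# Hypothetical figures purely for illustration:
samples = {
    "javascript": (90_000, 40_000_000),
    "go": (30_000, 25_000_000),
}

for lang, (prs, loc) in sorted(samples.items(),
                               key=lambda kv: -volatility(*kv[1])):
    print(f"{lang}: {volatility(prs, loc):.2f} PRs per kLOC")
```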


Clicking through to the underlying data, it seems that 2% of all code is written in Nix. I doubt that is the current state of the industry. For example, I doubt that 2% of programming jobs are for Nix codebases.

For that reason, I am consuming a very large salt crystal alongside this information.


I'm guessing that Nix's 2% is nearly entirely driven by pull requests to the NixOS/nixpkgs repository, which has 4,053 open PRs and 190,167 closed ones at time of writing.


That makes sense and that was exactly what was going through my mind.

I guess, PR data shows you what needs to be updated frequently. Personally, I like getting code right and shipping it and never thinking about it again. But for "here's how to reproducibly install the newest possible version of foobarbaz" yeah, a lot of updates are going to be needed, simply because "the latest version of foobarbaz" is likely a moving target.


Better than TIOBE at least, which seems to have gathered a huge amount of popularity for no good reason.

Their metric is just to search for "$LANG programming" in various engines, then assign arbitrary weights depending on the search engine used. It doesn't really measure anything, and has already been gamed intentionally, since it's so simple.
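The metric described above amounts to a weighted sum of search-engine hit counts. A rough sketch of that idea, where the engine names, weights, and hit counts are all invented for illustration (TIOBE's real weights are not public in this form):

```python
# TIOBE-style score: hit counts for the query '"<lang> programming"'
# from several engines, combined with arbitrary per-engine weights.

ENGINE_WEIGHTS = {"google": 0.28, "bing": 0.14, "wikipedia": 0.09}

def tiobe_style_score(hits_by_engine: dict[str, int]) -> float:
    # Engines without an assigned weight contribute nothing.
    return sum(ENGINE_WEIGHTS.get(engine, 0) * hits
               for engine, hits in hits_by_engine.items())
```

Since the inputs are just public search results, anyone who can inflate hit counts for a phrase can move a language up the ranking, which is the gaming problem mentioned above.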

Every metric is going to have its pitfalls, but at least this one seems to be measuring something that approaches reality.


I think job listings are the least biased statistic on language popularity, as there is no dramatic difference in productivity between high-level languages, so the number of people required will be roughly proportional to the number of jobs.

According to that, the top 3 are JS, Python, and Java, in some order. I think it's a good litmus test to fail statistics that have a different top ranking (TIOBE being beyond useless, having listed Visual Basic as 6th some time ago?!)


> having listed Visual Basic as 6th some time ago?!

That actually does not surprise me. It's considered a step up from (and often used with) Excel, which is also everywhere.


My memory is hazy, but it was ahead of goddamn javascript.

Also, the three-line scripts it is used for hardly amount to much. Excel is big, but when companies grow out of it, they go for integration with "proper" applications instead.


> It doesn't really measure anything, and has already been gamed intentionally, since it's so simple.

If you replace “has already been” by “can be”, I think all of that applies to this, too.


None of the charts say they represent fraction of code in existence, right? It’s just GitHub PRs, issues, stars, etc.

Or did I miss something?


Agreed, but I’m looking for what’s next, not the current state of the industry or jobs. I’m riding the JS/TS wave for a while but I can already imagine improvements. If a language is rising, I’ll check it for those needs.

Nix has been getting attention among my friends. I’m still in “wait and see”. The next big thing after TS won’t be an incremental improvement. It will solve big problems like:

- Too many ways to do things. (Remove ambiguity. Instead, make choices for me so others’ code is predictable. Apple approach. Does Swift do this?)

- Naming suffering from backwards compatibility. (For…in vs for…of etc.)

- AI integration?

PHP is losing to JS due to 7.0 or whatever. Scala surprised me!


Scala is at 1.7% in the actual data, and it’s generally appropriate for a niche language like this (although Rust is below it, even though the HN bubble might make you think otherwise).


Lovely last sentence: believing C# to be an underrated language.

Very true; also, public pull requests are not friendly to dark-matter enterprise software :)


I recently got into C# and it seems really nice, but the "customs and traditions" of the .NET world horrify me (e.g., never write a function when you can write three classes instead).


You can actually write an entire, complex C# application without using a single class via top-level statements. Named tuples make it easy to wire together functions without invoking the full OO type system.


Very true, but there remains a large gap between what’s technically possible and customarily applied.


Which customs and traditions? I've worked for nearly two decades in the .NET space, and it seems I missed that indoctrination. What I see is a developer population moving through time, adopting technology trends as they come and go and taking the best of them for their current projects. And so did the language and platform itself. When I started, I needed Visual Studio on Windows, 3 XML files, and at least 10 classes to start up a simple app. Today, I (can) code/deploy on Linux, use Visual Studio Code (or any editor), and my project has two files, zero classes, and looks more like Node.js than anything else.

IMHO, there is a huge misconception about C# regarding its OOP enforcement. Most classes you ever write for business logic in C# are nothing more than namespaces/grouping containers. And that is for the good: there are very few genuine business-logic object hierarchies (aka the Pattern world, or OOP fantasy world). The other reason to write classes is objects that transfer data (no logic). There were the sinful years of DTOs (which are an OOP abomination), but those have been obsolete for some years already in favor of records.


C# is a follower of DDD, mostly, which is a 'standard' practice for OOP design. I highly recommend reading the book about it, but that is how you end up with many classes instead of a simple function. Also, in C#, you can't generally mock a function (or a static method? Not 100% sure on that; it's been nearly 10 years since I've written a unit test in C#).


> you can't generally mock a function

With `dynamic` and `DynamicObject`/`ExpandoObject` proxies (or even lower level System.Dynamic/System.Linq.Expression fun) you can mock anything you want to in C#.

Those tools go all the way back to the early days of Linq (and useful but somewhat broken DLR visions like IronPython). If you need to time travel even further back in the .NET stack, or if you are just allergic to/deathly afraid of the DLR as some people seem to be, System.Reflection.Emit has been there since day 1. It's awful to work with and even worse low-level experience than the DLR, but it is capable of a lot of things. If you've got an up-to-date compiler you can go the other direction and use the recent Source Generators to do all the same low-level things but this time in the context of Roslyn and at build/compile-time.

Obviously, just because you can do such things doesn't mean you should, but C# has far more powerful raw tools at its disposal than many people realize.

A lot of the boilerplate in DDD styles is simply a preference for it, with (over-)design patterns as comfort food.

It's a further aside, but hand-written "Fakes" patterns require more up-front work but often seem to me much better than automated Mocks. I've never seen a good DDD pattern focus on good "Fakes", though, and sometimes I find DDD complexity gets in the way of good "Fakes".


I'm one of those people who had no idea about all this extra functionality; thanks for opening that door. Do you have any resources on the hand-written "Fakes" pattern?


I don't have any resources directly off-hand, but the basic concept is implementing "shareable across multiple tests" versions of your dependencies that implement things relatively similar to the end product but in a way that uses fewer resources during testing and is hopefully more reproducible/unlikely to encounter transient environment bugs. (Though still overall more "fake" than "real", otherwise you are just building artisanal integration test harnesses.) Things like using in-memory or SQLite data stores instead of your production database type. Ideas like true secondary, simplified implementations of your abstractions. (There's no reason to have an interface that is only ever implemented by one class, so at least this is one reason to have a second implementation that fakes doing something useful.) In some ways I feel "the fakes pattern" really just means "the old way of writing tests before auto-mocking frameworks became popular", but testing patterns love to have names that change every couple of years.

There are obviously good reasons auto-mocking frameworks became popular as it can be too easy to fall into performance traps or to try to maintain two separate dependency stacks (and get dangerously close to all of your units tests as just baroquely complex integration tests), one of which may easily get out of date/diverge and is extremely fragile, especially if you don't have good abstractions up front. It's too easy for how easy you can build your "fake" data sources to accidentally create a lower common denominator of what you can safely test, either limiting the types of queries that you feel like you can add to production code (forcing you to avoid things that your production DB supports, but an SQLite or In Memory storage can't easily fake) or create growing missing coverage boundaries between "testable" and "production" code.

On the flip side though, the benefits of hand-written fakes should be that you better prove out your abstractions and how they are factored (if it is too hard to manually fake a dependency, then maybe that becomes a sign that the dependency needs to be refactored and/or a better abstraction found for it), and the tests overall more resemble your production code and how it operates in the wild. (Versus how I feel excessively mocked code starts to resemble "stage plays" that don't necessarily approach or model real world usage and behavior and it often remains too easy to "stage play" even when your abstractions are wrong/not helping you enough.)


The scenario I was referring to is one of many gems found in an app we outsourced.

The piece of code in question had a very straightforward task: look at some bytes in the input and produce a string label to be stored alongside the whole input value. There are 5 different labels tied to an equal number of fixed byte sequences.

I would like to think that most people would solve this problem using an if/else or a switch statement inside a function. Instead, what we got is a group of matcher classes, a mapping of matchers to enum values representing the labels, another mapping of enum values to actual strings, and a class that actually calls those matchers and does the mapping.
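For contrast, the straightforward version of that task fits in a few lines. This is a sketch with invented byte prefixes and labels (file-signature style), not the actual sequences from the app in question:

```python
# Map a handful of fixed byte prefixes to string labels with one
# plain function: no matcher classes, no enum-to-string mapping layer.

LABELS = {
    b"\x89PNG": "png",
    b"GIF8": "gif",
    b"\xff\xd8": "jpeg",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}

def label_for(data: bytes) -> str:
    for prefix, label in LABELS.items():
        if data.startswith(prefix):
            return label
    return "unknown"
```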

I really hope this is not the DDD way and instead we just managed to find a team that's prone to massively overcomplicate solutions to simple problems.


My unsolicited, socially unacceptable pet theory on this: that’s what happens when you’re building your 15th e-commerce web store. C# is a great language, but it’s a worker’s language. Its domains tend to not be the most exciting ones (e-commerce, calcified Excel replacements, Windows desktop applications, …). [1] Hence, so my theory goes, the smart but bored engineering minds wander out and conjure up complications to fill their days and minds. A common pattern in IT I’d say.

1: https://www.reddit.com/r/csharp/comments/qomcps/comment/hjo1...


> you can't generally mock a function

Yes, this is generally true, so the workaround is to put your function in a class, and use an interface + dependency injection to mock what you need. Sometimes it's a hassle.


What is DDD?


"Domain-driven design", a modeling/software architecture design approach


And to follow it up, read the actual book by Eric Evans. He tells you when, and just as importantly, when NOT to use the things in the book. I have to point that part out whenever I see people replacing CRUD with DDD.


That culture is largely shared with Java.


> also public pull request are not friendly to dark matter enterprise software

Exactly - if you included my organization's private scope alone (an org of ~10 people), there would be 90 additional repositories and ~1.5k PRs/yr worth of C# action available to pad the stats.


Yeah. The runtime has a few warts, but the language is pretty solid. I've come to appreciate it a smidge more than Java. I think it falls in the "underrated" category, as it "feels" a lot like Java and many people consider it just MSFT's attempt to avoid tying their technical future to a property "owned" by a competitor, instead of a decent language in its own right. Truth is, it's probably both.


I've never learned C#, but I'd rather kill myself before using Microsoft Java. :D (Or, to a slightly lesser extent, Oracle Java)


JS is in a different league because it solved the distribution problem. The one that everyone tried to solve, and spent big on it. Turned out you just needed a working interpreter on every device that can speak to the screen and to the internet (the browser), and the labor will appear to shape it into whatever it needs to be.

This, to me, is a lesson worth hearing: sometimes the feature you think is most important, the one your customers say is most important, is not, and it's not even close. The biggest software problem after PARC solved GUIs was distribution. The browser won 80% of apps out of the gate, and went the rest of the way with XHR.

It's really a startling turn of events when you think about it.


It solved the run-anywhere part, not the distribution one; funnily enough, JS-based builds can trivially fail even between major OSs.

Java is much better in that regard.


> Go is holding at nearly 10%: it underwent a fast rise but seems to have plateaued starting in 2018. I imagine that the imminent release of Go 2.0 could help.

Is Go 2.0 imminent? Any sources and details of this?


No: definitely not imminent, and probably never. Originally "Go 2" was the name for a real version 2.0 where the Go team would make breaking changes and fix any warts. However, over time they decided a big breaking release wouldn't be necessary (or a good idea), and decided just to keep iterating on 1.x in a backwards-compatible way, even for large features like the addition of generics in 1.18. At this point "Go 2" is a "useful moniker" for an abstract future version of Go with additional features. See https://go.dev/blog/go2-here-we-come

I think this is great for stability, and as a result Go looks to be avoiding a Python-2-to-3-style ten-year migration.


I wonder why JavaScript fell from 19% to 9% in 2022?

According to the graph here: https://madnight.github.io/githut/


Author here. I started to filter pull requests from bots like dependabot, because dependabot has massively inflated the number of pull requests in recent years, especially for JavaScript.

Unfortunately, I don't have enough BigQuery credits ($) to re-run the bot filter over the entire history; that's why you see the downward spike.
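One plausible shape for the bot filtering described above: drop pull requests whose author login looks like a known automation account. The pattern list and record layout here are assumptions for illustration, not the author's actual BigQuery filter.

```python
# Filter out PRs from common automation accounts by author login.
import re

# GitHub bot accounts typically end in "[bot]"; a few well-known
# dependency bots are matched by name prefix as a fallback.
BOT_PATTERN = re.compile(r"(\[bot\]$|^dependabot|^renovate|^greenkeeper)",
                         re.IGNORECASE)

def human_prs(prs: list[dict]) -> list[dict]:
    """Keep only pull requests whose author doesn't look like a bot."""
    return [pr for pr in prs if not BOT_PATTERN.search(pr["author"])]
```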


Interesting that C++ went up quite a bit around that time, but I have no idea how that correlates.


Perhaps a large project or two switched to (only accept new work in) TypeScript? Still seems a lot.


TypeScript did show an increase from 5% to 8% in that timeframe


You're all wrong. FORTH is the best language ever. You kids get off my lawn!

(just kidding)

JavaScript has warts, but if you drink enough, you can pretend it's Scheme. Python isn't a language but a family of languages. Guido thought it was a good idea to change the semantics AND syntax between minor revs, so... no... having to re-code my apps every two years is a deal-killer for me. Sadly, the same applies to Rust. I'm not the biggest fan of C++, but a friend pointed out that some of the more recent language additions make it a much better language. And they added them without breaking backward compatibility. Or at least to a greater degree than Python and Rust.

Also, if you're picking a language to use based mostly on a popularity contest, you get what you deserve. Start with understanding what your needs are, then pick a language that allows you to model your computational requirements in a way that meets those needs. Maybe after that, look at associated tools and libraries (I'm mostly concerned with debuggers, since I usually debug things with a debugger rather than print statements). Then maybe think about popularity; you probably do want to be able to hire people, but it always seemed to me to be a premature optimization to say "oh, Java is real popular, so we're going to use Java 'cause we can hire people who claim to know it."

But there are some niche financial engineering teams I know who are very happy with using niche OCaml derivatives. They're completely okay with hiring people with Lisp or Scheme or "popular" ML derivative experience and then giving them a couple months to learn what's going on locally.

And heck, I've been on projects where we used COBOL and it was exactly the right language for that job. Was thinking about this the other day... I sort of miss Pascal and Delphi. meh. Life moves on.


Happy to see someone down-voting a comment that suggests you examine your requirements for a language because it dares to challenge the supremacy of python and suggests that adults are perfectly capable of choosing a language appropriate for the task, even if that language is COBOL. Stay classy, HN.


I didn’t vote, but your take on Python doesn’t match my experience. Do you have more context? It seems like after the transition to version 3, the lesson was learned. That was about a decade ago now.


2.3 to 2.5 - changed the syntax and semantics of the language by adding class variables. 2.5 to 2.7 - changed semantics by changing class variable to package variables.

Certainly you're not claiming Python hasn't made breaking changes in the syntax or semantics over the years?

I mean... this still compiles with my c compiler:

  ??=include <stdio.h>

  int main( argc, argv )
    int argc;
    char *argv[];
  ??<
    printf( "hello %d %s\n", argc, argv[0] );
  ??>
  
Though... gcc makes you add the -trigraphs option.


Clojure has had pretty remarkable stability over the years. Maybe Common Lisp even more so, since it's based on a standard that hasn't been updated.


> Then you get the second tier languages: Java and Scala, C/C++, and Go. They all are in the 10% to 15% range.

I read that as Java and Scala both are second tier languages. After looking at the graph I realized he was lumping the two together. They are 11.3% and 1.7% respectively.

If Scala were a "better Java" that grouping would make sense. But some Scala projects are so different in the paradigm that the slight interoperability between the two languages is completely irrelevant.


If anything, he should've grouped Java with Kotlin and Groovy. Scala has always been its own thing, and was never really accepted as an integral part of the JVM ecosystem, unlike Kotlin and Groovy.


> I find that building and publishing Java artefacts is unnecessarily challenging, compared to JavaScript and Python.

I'm not a big fan of Python's jumble of competing build tools, but personally I stick to venv + pip and it generally works out. However, having dealt with both Maven and npm, the headaches with npm's build process are so annoying it's not even funny. Comparatively, Maven is rock-solid, even if you may not like its XML-based configuration.


There’s a link to the chart at the end. I was curious to compare C and C++. In 2022, there was a huge bump in C++, and a small bump for C.


I wonder how they count files with the .h extension. You can probably have a heuristic that is pretty close just looking for use of certain keywords, but it is possible to write a file that works in both contexts, so it's definitely a grey area.
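A keyword heuristic along the lines suggested above might look like the sketch below. The keyword list and the whole approach are guesses for illustration; real classifiers (e.g. GitHub's linguist) are considerably more involved, and as noted, a header written to compile in both languages will defeat any such heuristic.

```python
# Classify a .h file as C or C++ by scanning for C++-only constructs:
# certain keywords, the scope-resolution operator, and access specifiers.
import re

CPP_HINTS = re.compile(
    r"\b(class|template|namespace|typename)\b|::|\bpublic:|\bprivate:"
)

def guess_header_language(source: str) -> str:
    """Very rough guess; C++-compatible C headers will be misclassified."""
    return "C++" if CPP_HINTS.search(source) else "C"
```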


Great project, nice to see a more detailed view :)

See also: https://octoverse.github.com/2022/top-programming-languages (which we publish annually and has a few related stats)


Those numbers seem more believable and probably better sourced than those in the post.


I think this is not the best representation, although I appreciate the commentary of the blog. Is GitHub a valid platform for this kind of sampling? I don't know; I have hundreds of Java repos elsewhere and would never host my code on a Microsoft platform. But somehow Kotlin is barely represented even though it's the default language for Android, because mobile projects don't host publicly on GitHub. And what about PRs makes them special anyway? As one commenter mentioned, this could be more about volatility/stability.

As a side note, I'm pretty shocked that the OSS community is "just fine" with leaving their projects on github after MS took over and scanned all their code for machine learning.


Shouldn't it be: in public repositories?


Odd to see PHP and Ruby at the same level. I would have thought PHP was much, much higher, like by a factor of five to ten.

Another thing, it seems like Python developers might be apt to push/pull more often, in smaller commits compared to say, Java programmers.


Where do those numbers actually come from, as they don’t seem to match the linked page very well? Why is Java and Scala considered to be the same thing, even though they definitely aren’t and Scala is much smaller than Java?


I feel like Dependabot means JavaScript/TypeScript is overrepresented.


Glad to see C++ remains strong. I wonder if AI and graphics heavy work contribute the most. I used to architect general purpose applications in C++ but now mostly use it for number-crunching.


Definitely not proportional to HN articles which are 90% Rust.


Same as what I thought when looking at the article :) Go is higher than I expected for such a comparatively new language.


What language do you recommend purely for getting a job, Go or Java? I have a few years of experience with JavaScript.


This is very location dependent. Check jobs in your area.

I've seen places where you would never find a Go job and Java is nearly 80% of software jobs... but OTOH if you're working on stuff related to cloud tooling, Go is very dominant and you will almost never see Java.


Purely for getting a job and not FAANG or a startup?

Java is everywhere in any enterprise that does internal software development. The quality of that code is another discussion. You will also likely be working on very legacy code bases that are in the process of finally moving from Java 8 to Java 11.


What is "DM"? Does anyone have experience with "hoon" and its "urbit" platform?


Pleasantly surprised to see Hoon in the top 50.

Poke around this site for some info about Hoon and Urbit: https://developers.urbit.org/




