It's not just that they dropped the ball; they actively sabotaged whatever goodwill they had built by bundling malware with software downloads. Not only was this a massive hassle, it ruined the reputation of lots of FOSS projects with folks who just wanted to use some of the most popular consumer-ish open source software like FileZilla.
While SF was crapping where it eats, GitHub built a lot of trust and goodwill with a lot of people.
It should be noted that the malware bundling happened while SourceForge was owned by DHI Group, Inc. SourceForge changed hands years ago (to BIZX/Slashdot), and the new owners undid the bundling and are trying to run the site the way it was run before. It seems to be going well.
I would consider SF a viable GitHub alternative, but the bad reputation caused by a temporary owner just seems to stick forever.
The SourceForge UI and overall experience is still a decade or more behind the experience on GitHub and most of the other modern, maintained sites/services like Fossil, GitLab, Gitea, etc.
That ancient-feeling UI doesn't win them a lot of forgiveness. If the only notable change has been “we took away the malware” and the site otherwise remains stagnant, SourceForge will continue to feel very inferior to more modern alternatives.
SF's development also essentially stalled during that period, and the current owner has to do much more than just (proper) management to keep up with competitors. If you need proof, compare SF with Gitea/Forgejo.
Open Source = a Microsoft property... I don't want it to be true, and I will act like it isn't true for the things I have influence over - i.e. GitHub is not the first choice for hosting a repository.
Some people don't even know the difference between Git and GitHub...
Is it? I don't know. I don't use it for personal (actually personal) stuff, but if I actually want to publish something, I'd do it there. If I want to contribute to something, it's way easier to do on GitHub than most other places. It makes searching for code easier too. If Microsoft decides to abuse the monopoly, or if GitLab etc. actually became much better, I don't imagine it would be very difficult to switch. Well, yeah, sure, GitHub Actions and issue history are somewhat of a vendor lock-in, but it's not that bad, I suppose.
Maybe Copilot (made possible by the huge non-commercial codebase on GitHub) being a somewhat unfair advantage over other commercial alternatives is a bit troublesome, yeah. But otherwise I just don't see why GitHub being a de facto standard is bad. In fact, I am somewhat annoyed when a really popular project doesn't have a GitHub repository (mostly because it makes filing an issue, or even reading existing issues, much more difficult in most cases). So I'm actually glad to hear that some big projects feel pressed to migrate to GitHub. What's even the problem with that, apart maybe from GitHub Actions, which honestly suck?
(Maybe I should add: I am a git hater, and do think that Mercurial is just unquestionably better, but that battle was lost a long time ago, so I don't suppose it's the topic of this discussion.)
I love having this central location even though git is distributed, mainly because having to go to multiple Git hosts of varying quality would be a pain in the ass.
If you are confused (like me) and thought this was about PyPI (the Python package repository): it is not. It is about a project called PyPy (one can argue it's a bad name), an alternative implementation of the Python interpreter (i.e. not CPython) that relies on a JIT compiler. It is language-compatible, but if your code uses any library or method relying on C extensions, then you are out of luck (goodbye NumPy, etc.).
Edit: They have a C-API emulation layer; I don't know its limitations or current status, but you can use those libraries [1][2]
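For reference, a minimal sketch of what that compatibility looks like in practice (assuming `pypy3` is installed and pip is available in it):

```sh
# PyPy is a drop-in interpreter for pure-Python code:
pypy3 -c 'import sys; print(sys.implementation.name)'   # prints "pypy"

# C-extension packages go through the emulation layer; they may work,
# but can be slower or unsupported depending on the package:
pypy3 -m pip install numpy
```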
I've been using git happily for many years. Strangely enough, the provenance of a commit, i.e. which branch a commit originally came from, has not really mattered to me very much. Mercurial provides this, and they are using `git notes` to add this provenance metadata to each commit during the migration to git.
I would have thought I'd need this much more, but I have not. In plain git I'll just `git log` and grep for the commit in case I want to make sure a commit is available in a certain branch.
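For what it's worth, a minimal sketch of that lookup in plain git (`abc1234` and `some-branch` are placeholders):

```sh
# Show any notes (e.g. provenance metadata) attached to a commit:
git notes show abc1234

# Check which branches a commit is reachable from:
git branch --all --contains abc1234

# Or the grep-the-log approach:
git log --oneline some-branch | grep abc1234
```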
The point is giving branches a meaning (e.g. "implementation of this feature") and being able to at least keep the information that a given commit was part of that (well, at least that's why I'd want Mercurial's named branches; I'm not sure that's how this project used them).
Wouldn't good merge commit conventions work to preserve as much of this sort of information as desired? All the commits of the branch are contained in it, with the merge commit message preserving that info.
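A minimal sketch of that convention (`feature-x` and `abc1234` are placeholders): force a merge commit so the branch name lands in the message, then use first-parent history to recover it later:

```sh
# Always create a merge commit; the default message records the
# branch name ("Merge branch 'feature-x'"):
git merge --no-ff feature-x

# Later, walk only the mainline to see the merges in order:
git log --first-parent --merges --oneline main

# Find the merge that brought a given commit into main:
git log --merges --ancestry-path abc1234..main --oneline | tail -n 1
```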
When I want to see inside a piece of software I look for (1) the source code; (2) the git-blame; (3) the code review for significant commits. I have never wanted to see into the history before that point, namely how the developer drafted and polished their idea prior to the final code review approval.
What practical use case am I missing out on when these work-in-progress draft commits are lost? I can’t see one.
But 33% of PyPI packages contain the potential for extreme security flaws, and you don't know which ones until it gets you. How bad do you have to want to use Python to tolerate that?
"“When we actually examined the behavior and looked for new attack vectors, we discovered that if you download a malicious package — just download it — it will automatically run on your computer,” he told SC Media in an interview from Israel. “So we tried to understand why, because for us the word download doesn’t necessarily mean that the code will automatically run.”
But for PyPI, it does. The commands required for both processes run a script, called pip, that executes another file called setup.py, which is designed to provide a data structure for the package manager to understand how to handle the package. That script and process are also composed of Python code that runs automatically, meaning an attacker can insert and execute malicious code on the device of anyone who downloads it." https://www.scmagazine.com/analysis/a-third-of-pypi-software...
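A minimal sketch of the mechanism (the package name and payload are made up for illustration): a source distribution's setup.py is ordinary Python, and pip executes it at install time:

```sh
# Hypothetical malicious package, for illustration only.
mkdir evil-package && cd evil-package
cat > setup.py <<'EOF'
from setuptools import setup
import os

# Arbitrary code here runs whenever setup.py is executed,
# i.e. during "pip install" of the source distribution:
os.system('echo "this just ran on your machine"')

setup(name='evil-package', version='0.1')
EOF

# Installing from source executes setup.py, and the payload with it:
pip install .
```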
my bad .. handicapped by the way I audit .. point remains the same tho. Need to clean up PyPI or stop the mortals from using Python. In the meantime, maybe put your venvs into a single non-emulated VM.
There are benefits to having branches be an inherent property of a commit as opposed to the Git model of a dynamic property of the graph.
Suppose I have a branch A with three commits, and then I make another branch B on top of that with another few commits. The Git model essentially says that B consists of all commits that are ancestors of B but aren't ancestors of any other branch. But now I go and rebase A somewhere else, and as a result B suddenly grew several extra commits on its branch, because those commits are no longer on branch A. If I want to rebase B on the new A, well, those duplicated commits will cause me some amount of pain, pain that would go away if only git could remember that some of those commits are really just the old version of A.
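A minimal sketch of that scenario (assumes `main` and `other-base` exist; branch and file names are made up):

```sh
# Branch A with three commits on top of main:
git switch -c A main
for i in 1 2 3; do echo $i > a$i.txt; git add .; git commit -m "A: step $i"; done

# Branch B stacked on top of A:
git switch -c B
for i in 1 2; do echo $i > b$i.txt; git add .; git commit -m "B: step $i"; done

# Rebase A elsewhere; its old commits now belong to no branch but B:
git switch A
git rebase --onto other-base main A

# B now "contains" the three old A commits plus its own two:
git log --oneline main..B
```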
> If I want to rebase B on the new A, well, those duplicated commits will cause me some amount of pain
Not really. Git will recognize commits that produce an identical diff and skip them. Your only pain will be that for each skipped commit, you will see a notification line in the output of your `git rebase`:
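Presumably something like this, with a reasonably recent git (the hash is a placeholder):

```
warning: skipped previously applied commit abc1234
hint: use --reapply-cherry-picks to include skipped commits
```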
If I had a nasty rebase of A, then git isn't smart enough to figure out that the new A' commits are similar enough to the old ones, so it won't know to skip the old-A commits.
> There are benefits to having branches be an inherent property of a commit
And drawbacks, naturally. Advanced branching/merging workflows become extremely painful if not impossible, which makes mercurial unusable as a "true" DVCS (where everyone maintains a fork of the code and people trade PRs/merges).
> Advanced branching/merging workflows become extremely painful if not impossible
That's really, really not true. First off, I used the word "inherent", which doesn't mean "immutable"; you can retain all the benefits of mutability if you so desire. Of course, Mercurial historically focused a lot heavier on immutable commits than Git did, but hg eventually found a different path that really makes using git feel antediluvian in comparison.
The second thing to note is that there's no requirement that the 'branch' property of a commit correspond to only one head. Actually, I don't think any of the Mercurial repositories I've contributed to ever bothered with branches; there's simply no need in Mercurial to create multiple named branches the way there is in git.
Finally, mercurial solves the workflow problem in another way, by essentially realizing that there is a dichotomy between public, immutable commits and work-in-progress draft commits. The problem with PRs is that you end up in a situation where you have the unenviable choice between making updates with 'address fixes' commits that pollute history or rebases that risk making comments go into the ether (especially on GitHub). You might have extra squashes or rebases that make PRs that depend on other PRs painful. Mercurial instead makes a rebase or other history edit simply mark the old commit as dead and link to the new version, so that any other commits that depend on it can know how to be updated to the new version. And this information is spread to anyone who pulls from your repo, but need not be retained when pushed to anyone who didn't know about the old dead versions!
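A minimal sketch of that model using Mercurial with the evolve extension enabled (the revision arguments are placeholders):

```sh
# Amend a draft commit; the old version is marked obsolete, not destroyed:
hg commit --amend -m "better message"

# Descendants of the obsolete version become "orphans"; rebase them
# automatically onto the successor:
hg evolve

# Inspect the rewrite history (obsolescence markers) of a changeset:
hg obslog .
```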
> The difference between git branches and named branches is not that important in a repo with 10 branches (no matter how big). But in the case of PyPy, we have at the moment 1840 branches. Most are closed by now, of course. But we would really like to retain (both now and in the future) the ability to look at a commit from the past, and know in which branch it was made. Please make sure you understand the difference between the Git and the Mercurial branches to realize that this is not always possible with Git; we looked hard, and there is no built-in way to get this workflow.
> Still not convinced? Consider this git repo with three commits: commit #2 with parent #1 and head of git branch “A”; commit #3 with also parent #1 but head of git branch “B”. When commit #1 was made, was it in the branch “A” or “B”? (It could also be yet another branch whose head was also moved forward, or even completely deleted.)
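A minimal sketch of that three-commit repo (file names are made up); nothing in the resulting history records which branch commit #1 was made on:

```sh
git init -b A demo && cd demo

# Commit #1, made on branch A:
echo 1 > f; git add f; git commit -m "commit 1"

# Commit #2, head of branch A:
echo 2 > f; git commit -am "commit 2"

# Commit #3, head of branch B, also with commit #1 as its parent:
git switch -c B A~1
echo 3 > f; git commit -am "commit 3"

# The commit object for #1 stores tree, parents, author, and message,
# but no branch; A and B merely point at its descendants:
git cat-file -p A~1
```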
In this post they say that "git notes solves much of point (1): the difficulty of discovering provenance of commits, although not entirely"
> It can be interpreted as either 2 OR 3 unique branches depending on how you read it.
The question isn't how many branches; it's what branch the commit was on at the point in time it was created. That's not up to interpretation. It's information that was not recorded.
> Consider your same example with forking instead of branching, how would the issue be resolved?
Forked repositories don't have IDs, don't generally keep track of each other, and there's no way to even count them. So that's not solvable.
But branches do have names, and you almost always make commits onto branches. We shouldn't give up on tracking branches just because tracking forks is hard.
I mean you can't really compare them since git doesn't even _have_ branches as Mercurial understands them. git's branches would perhaps better be called twigs in comparison. git's lightweight branches better map to Mercurial's topics or bookmarks, though neither perfectly. And Mercurial has even lighter weight branches since you can just make a new head by committing without having to name anything, and it won't yell at you about a detached head like git will.
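A minimal sketch of that anonymous-head workflow (the revision number and file name are placeholders):

```sh
# Go back to an earlier changeset and just commit on top of it:
hg update -r 10
echo fix >> somefile
hg commit -m "alternative approach"   # hg reports "created new head"

# Both lines of development are now visible:
hg heads
```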
Speaking of git: for mega-monorepo performance, we're going to need synthetic FSes and SCM-integrated synthetic checkouts. Sapling (it was hg in the past, but was forked and reworked extensively) will be able to do this if EdenFS is ever released, but Git will need something similar. This will require a system agent running a caching overlay FS that can grab and cache bits on the fly. Yes, it's slightly slower than having the contents already, but there is no way to check out a 600+ GiB repo on a laptop with a 512 GiB SSD.
That already exists. It's called Scalar[1]; it has been built into Git since October 2022[2], dates back to 2020[3], and is the spiritual successor of something Microsoft was using as far back as 2017[4].
Scalar explicitly does not implement the virtualized filesystem the OP is referring to. The original Git VFS for Windows that Microsoft designed did in fact do this, but as your third link notes, Microsoft abandoned that in favor of Scalar's totally different design which explicitly was about scaling repositories without filesystem virtualization.
There's a bunch of related features they added to Git to achieve scalability without virtualization, including the Scalar daemon which does background monitoring and optimization. Those are all useful and Scalar is a welcome addition. But the need for a virtual filesystem layer for large-scale repositories is still a very real one. There are also some limitations with Git's existing solutions that aren't ideal; for example Git's partial clones are great but IIRC can only be used as a "cone" applied to the original filesystem hierarchy. More generalized designs would allow mapping arbitrary paths in the original repository to any other path in the virtual checkout, and synchronizing between them. Tools like Josh can do this today with existing Git repositories[1].
The Windows repository that was referenced isn't even that big at 300GB, either. That's well within the realm of single-machine stuff. Game studios regularly have repositories that exist at multi-terabyte size, and they have also converged on similar virtualization solutions. For example, Destiny 2 uses a "virtual file synchronization" layer called VirtualSync[2] that reduced the working size of their checkouts by over 98%, multiple terabytes of savings per person. And in a twist of fate, VirtualSync was implemented thanks to a feature called "ProjFS" that Microsoft added to Windows... which was motivated originally by the Git VFS for Windows they abandoned!
I worked on source control at Facebook/Meta for many years. On top of what aseipp said, I remember the early conversations we had with Microsoft where the performance targets for status/commit/rebase they wanted to hit were an order of magnitude behind where we wanted to be.
But most repositories are not that big so this is hardly an issue for most people. Personally, the system I'm most optimistic about in 2024 is Jujutsu. I've been using it full time with Git repos for several months and it's overall been a delight.
Every provider out there can talk the standard Git protocol, but every feature the Git protocol doesn't cover becomes a proprietary API. I think if Git (or a project like it) made a standard protocol/data format for all the features of an SCM, then all those providers could adopt it, and we could start moving away from GitHub as the center of the known universe. If we don't make a universal standard (and implementation), it'll remain the way it is today.
Codeberg is also working on federating, or maybe they already do. My experience using them was quite unpleasant, though, they're very feature-incomplete.
I used to use Mercurial as well and greatly preferred it, but for better or worse, Git won. I started using Git several years ago and haven't looked back.
No matter what people might say, I think this stuff matters for contributors and users who might be looking at your project, and git/github is the typical expectation. This is likely the right decision, as they are now ubiquitous.
Same story for us: started with Mercurial many years ago, but eventually the tooling around git and just "using the standard" was too big to ignore, and we migrated along with a bunch of other CI/CD and DevX improvements. Mercurial was cool, but the lack of support meant little things like Jenkins having to "pull" 3x instead of the 1x it does natively for git; lots of little things like that meant just using git generally saved us a bunch of work.
I've used that in the past, but it doesn't really work that well on large projects: it basically works by keeping an hg and a git version of the same repository and storing a mapping between the two, which scales really poorly with multi-million-commit repositories.
What I really want is something that will let me use the interface of hg's power tools (revsets, phases, changeset evolution) on an existing git repository.
You will probably like Jujutsu, which takes much inspiration from Mercurial, and even has a few prior Mercurial hackers working on it. It uses the Git data model underneath (so feel free to use GitHub), but has an entirely rebuilt UX and set of algorithms: https://github.com/martinvonz/jj
It isn't a 1-to-1 hg clone, either. But tools like revsets are there, along with "anonymous branching" and log templates, and cset evolution is "built in" to the design. There is no concept of phases (we might think about adding that), but there is a concept of immutable commits, so you don't overwrite public ones. The default output is designed to be succinct and beautiful, so it remains useful on high-traffic repositories with lots of work-in-progress patches and many developers.
It also has many novel features that make it stand out, like the working-copy-commit. We care a lot about performance and usability; to the extent performance is bad, some of it comes down to piggybacking on Git's data model and existing performance issues. Give it a shot. I think you might be pleasantly surprised.
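A minimal sketch with an existing Git repo (the URL and revset are placeholders):

```sh
# Clone a Git repository into a jj workspace:
jj git clone https://example.com/some/repo

# Query history with revsets, hg-style:
jj log -r 'mine() & description("wip")'

# Start a new change on top of main:
jj new main
```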
Disclosure: I am a developer of Jujutsu. I do it in my spare time.
P.S.: You might alternatively like Sapling, from Meta. It actually is a fork of Mercurial (you can see it in the UX and features) but is very different now; in particular, it also uses the Git data model for the storage layer, so it works with GitHub. It will probably feel more familiar than Jujutsu at first. And it has some absolutely amazing features, like `sl web`, that we can't match yet. https://sapling-scm.com/
Me too. It feels great to see such a clear example of building a successful business on an open source tool and ecosystem. Git is massively popular and actively used, GitHub built a huge community for discovering and interacting with open source projects, and since the acquisition, Microsoft-owned GitHub has continued improving their platform without breaking interop with the open spec.
This is a tragic, wrongheaded move, and I say that as a big Git enthusiast (but a GitHub hater, to be fair...)
I don't think PyPy gains anything from this, not even a reduction in the annoying messages that have been psychologically torturing the maintainers. If anything, you're just opening yourself up to more common and frequent low-investment pestering.
It's kind of sad that this is true.
I'm guilty myself: I contribute to projects on GitHub more often than on any other platform.
And when I search for open source projects the first page I use is GitHub.