Issue Grooming (github.com/microsoft)
81 points by goranmoomin on Oct 21, 2020 | 68 comments


I had a quick check and it looks like Microsoft doesn't have an automatic clean-up bot running which is really nice to see.

Compare it to the Istio project, which just deletes your issue if someone doesn't respond to it within 28 days, despite the issue often being quite real.

Also I wonder how these larger projects will fare once GitHub rolls out the Discussions feature.


Auto cleanup bots are one of my biggest pet peeves in recent years. As another commenter mentioned, Angular is the worst offender for me. I've seen countless _real_ issues get closed over the years because of inactivity by core maintainers.

The only reason I knew those GitHub issues existed in the first place was because I wasted hours (sometimes days) of my life looking into an issue only to eventually stumble upon them and realize it's a bug and I'm not "holding it wrong". Good luck to the next person that goes spelunking on the internet for help on those issues.


If only there was a way you could mark an issue as "old, but still possibly relevant." Perhaps by adding a label, or something.... Doesn't all issue tracking software support either labels, tags, or custom statuses beyond "open, todo, in progress, closed"?

I think having a bot tag, but not close, issues over a certain age makes a lot of sense.
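To make the "tag, but don't close" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Issue` record, the `stale` label name and the 180-day threshold are made up, and it works on plain in-memory data rather than any real issue tracker API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

STALE_LABEL = "stale"              # hypothetical label name
STALE_AFTER = timedelta(days=180)  # hypothetical threshold


@dataclass
class Issue:
    number: int
    last_activity: datetime
    labels: set = field(default_factory=set)
    state: str = "open"


def tag_stale(issues, now):
    """Label quiet open issues as stale; never change their state."""
    newly_tagged = []
    for issue in issues:
        if issue.state != "open":
            continue  # closed issues are left alone
        if now - issue.last_activity >= STALE_AFTER:
            if STALE_LABEL not in issue.labels:
                issue.labels.add(STALE_LABEL)
                newly_tagged.append(issue.number)
        else:
            # Activity resumed, so the label no longer applies.
            issue.labels.discard(STALE_LABEL)
    return newly_tagged
```

The point of the design is that the issue stays open and searchable; the label only gives maintainers something to filter on.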


The "age" of an issue should be the most recent version number of the software in which it reproduces, and not the date when the ticket was created.
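A rough sketch of what measuring age by version could look like, assuming plain dotted numeric version strings. The helper names and the release numbers in the usage note are hypothetical.

```python
def parse_version(version):
    """Turn a dotted version such as '1.50.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def releases_behind(last_reproduced, releases):
    """Count how many releases shipped after the one the issue last reproduced in."""
    ordered = sorted(releases, key=parse_version)
    return len(ordered) - 1 - ordered.index(last_reproduced)
```

With releases `["1.48.0", "1.49.0", "1.50.0"]`, an issue last reproduced in `1.50.0` is zero releases behind no matter when it was filed, while one last reproduced in `1.48.0` is two behind.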


Why? If you want to know how old it is, just look at the filing date.


>Good luck to the next person that goes spelunking on the internet for help on those issues.

Reminds me of: https://xkcd.com/979/


The @angular repositories do that too... The worst are the "lock bots" which prevent any new information from being added to the issue... Which is super annoying.... But it must be hard for a maintainer to deal with the notifications.


Yeah, but sweeping the issue under the rug is not the way to go.

Sure, you have closed the issue because you did not pay attention to it for the last year. But the problem is still there. So the next person who stumbles upon the old, locked issue will now open a new one. Now you have two issues.


Doesn't this just cause more noise and notifications, not less? A stale issue can be ignored, a closed issue may be duplicated.

Whenever I came across these issues, I'd just open up a new issue and link to the old one and politely ask if they planned on addressing it or not.


Seems they do have automatic cleanup but it's not documented. Seems if the label "needs more info" was added more than a week ago, they close the issue.

The quest for the perfect issue grooming continues.

Example: https://github.com/microsoft/vscode/issues/86409#issuecommen...
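The rule as I understand it can be sketched in a few lines. This is a guess at the behaviour, not the project's actual bot: the function name and the one-week constant are assumptions, and the check for a reporter reply after the label was applied is my own addition.

```python
from datetime import datetime, timedelta
from typing import Optional

NEEDS_INFO_LABEL = "needs more info"
GRACE_PERIOD = timedelta(days=7)  # assumed from "more than a week ago"


def should_auto_close(labels: dict, last_reporter_reply: Optional[datetime],
                      now: datetime) -> bool:
    """Decide whether an issue times out.

    `labels` maps a label name to the time it was applied.
    """
    labelled_at = labels.get(NEEDS_INFO_LABEL)
    if labelled_at is None:
        return False  # nobody asked for more information
    if last_reporter_reply is not None and last_reporter_reply > labelled_at:
        return False  # assumption: a reply after the request stops the clock
    return now - labelled_at > GRACE_PERIOD
```

Note that this only ever closes issues where a maintainer explicitly asked for something, which is a much narrower rule than closing on plain inactivity.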


I think this is perfectly acceptable. If the user hasn’t demonstrated a reproducible problem and more information is required, the user has a week to submit said information. If not, the issue is cleaned up automatically. I don’t see why the maintainers should have to manually revisit each ticket if the OP fails to respond.

And auto-closing doesn’t mean the issue can’t be reopened. If OP posts additional information after the grace period, the maintainers will be notified and the ticket can be reopened.


Yeah, guess we all have different processes for dealing with reports/GitHub Issues. For me, a closed issue signals a closure of some sort, either that it's not accurate, been fixed, or any other reason. Simply that time has passed does not count as "resolution" for me.

But in the end, whatever works for you!


> For me, a closed issue signals a closure of some sort, either that it's not accurate...

In this case, wouldn't it fall under being "not accurate" if the issuer doesn't supply requested information showing that it's accurate?


No, as I wouldn't have the full picture of the issue, I can't say if it's accurate or not.


Maybe another status "stale" would be the solution here. That way, maintainers would be able to focus on "active" issues while the people who reported the issue could still add information to the issue later on to move it from "stale" back to "open" / "active".


I'd reverse it and say you should have a status for when something is done being discussed and the implementation path is clear: "Ready for development" or something like that. Then people who are looking for things to develop can filter by that. Anything that is not "Ready for development" either needs triaging, or another type of closure.


An automatic clean up bot could help but is usually not thought through (as numerous examples in this thread describe).

There are two common cases where some automation can help:

- User reports bug, dev explains that "it's not a bug, here's why, xxx" (or otherwise provides a satisfactory response) and user, even if satisfied, fails to return to close bug. So instead devs have to remember to close the bug which they usually do immediately after responding, which comes off as abrupt, even though it's not meant that way.

- User reports bug, dev responds with request for more info ("I can't duplicate; could you provide a better test case?"), user never responds.

In both cases it's OK to time out these old reports (which are essentially clutter). You could provide a way for the dev to flag this as one not to be auto-cleaned up.

Another improvement, while I'm at it: provide a Godbolt-like way to submit a test case that can be auto-run (i.e. a CI case). Then the bug could be auto-closed if it's in the current release but already fixed in the dev sources. Not all bugs or applications could take advantage of this of course.


Many projects have auto-closing issue bots. More than I realized.

I also don’t like them, but it does force the reporter to stay engaged and bump issues they really care about.

Issue management seems like a major surface for GitHub to improve on.

I have a work category for each of my projects named simply “ticket laundry” and I know it eats up a lot of time for maintainers and reporters.


I've not heard about the Discussions feature!

Here's a nice write-up I found: https://dzhavat.github.io/2020/04/04/my-thoughts-on-github-d...


Microsoft's automatically closed tickets are a running gag with my development team. They are pathetic - problems don't go away because you ignore them.


They don’t. They do however have a policy of closing down tickets which are unpopular after beating customers down.

Like the whole CLR telemetry thing.


I remember when the issue tracker for TypeScript or VSCode was still relatively new, the issue quality was much better. Now if you look at their issues, it's like trying to wade through spam. I hope they have specialized people who don't get demoralized from reading the issue list.

For example, just one issue from the last hour: "Slow performance" and basically a list of extensions and OS info. To track that one down, they would have to test each and every extension listed, and still might not have a good understanding of what is going on.


The bigger your open-source project gets, the more you have a need for help curating and triaging issues; you need a team of customer support, basically. And patience. So much patience.


I would close the issue because the OP didn't search enough. An MCVE should be provided with steps to reproduce. OP should do the work of testing each extension one by one.


This is one area where I feel GitHub is really letting us down.

Everything ends up in a giant "bag", where you end up with 1000+ "issues". While you can go into issues then sort by a given label, this has to be done actively, and constantly kept up to date.

I'd prefer something more like an 'inbox', where issues can be filed into different categories which moves them out of the 'new' section. This is what most large projects are doing anyway with labels.


A large part of this comes down to the venue. The early part of GitHub really exemplified the "move fast and break things" mentality. GitHub broke the software world's ability to grapple with bugs, both in affordances i.e. the tools that it exposes for managing them, and, it would seem, cognitively. The way that GitHub approached bug tracking is one of the most frustrating examples of throwing away all progress just to start over from scratch and ignore everything that came before.

A majority of GitHub's users' first experience interacting with a bug tracker was probably on GitHub, so they never really knew any better. The rest seem to be experiencing some collective amnesia about how to effectively file and otherwise triage/manage bugs. Basically none of the issues I ever stumble upon start with a clear and anodyne set of steps to reproduce. Every "issue" is a conversation. (And this seems to be not only tolerated but encouraged. It's madness.) About half are support requests, maybe more. Even when GitHub is used "correctly" to file bona fide bugs, on the whole, project maintainers seem to treat it as a general intake area for everyone else to file issues, and the project authors themselves hardly use it as their own database for known bugs. They're all jotting down vague descriptions in a text file that stays on their local machine or something, who knows.

The entire phenomenon and attitude has to be one of the top 5 most annoying things about software development in 2020.


True. Comparing issue dependency handling in Bugzilla and GitHub is like comparing a healthy person's lungs to a smoker's lungs.



You are right, but I'd prefer myself something folder based. You could use the same argument to say mail clients don't need explicit folders, but most people seem to want them :)


Gmail doesn't have folders. Even the Inbox is just a label.


You can do this with GitHub Projects (the Trello-like interface). For my personal projects, I use GitHub Issues as a dumping ground for ideas, which automatically go into a "Triage" (inbox) column, which I go through and assign labels and priorities every week or month depending on the project.

The downside with GitHub Projects is that you can't automate based on labels, so the issues need to be organised into columns manually if you have more than the simple one-board setup with To-Do/In-Progress/Done columns. Though search and filters help slightly with that.


This might be helpful for you! https://github.com/philschatz/project-bot


> I'd prefer something more like an 'inbox', where issues can be filed into different categories which moves them out of the 'new' section. This is what most large projects are doing anyway with labels.

I think they use the backlog milestone for this, which acts as the folder of issues they will eventually work on. https://github.com/microsoft/vscode/milestone/8


I definitely prefer this over trackers that for some inexplicable reason have different flows for bugs, requests, API comments, features and what not. The "inbox" you speak of already exists: unlabelled issues are your inbox. :)


Well I think GitHub Issues is the best issue tracker out there. No need to complicate things, one list is all you need.

It has labels but most people just use the powerful search abilities... It's not that different from Gmail in a lot of ways...


Sounds an awful lot like email, if you ask me.


I can understand why so many issues get created. I've had to read through quite a few issues for node.js debugging issues because they've made changes to the feature.

Before I could just do `node --inspect index.js` and get the debugger to auto attach with my node version set using NVM. There's a few different flags to try. I ended up wasting a few hours to just get debugging working again. Now I have to actually set a launch profile (I've got one which works) but wading through all the issues because they've changed, to me, a fundamental feature was just plain frustrating.


I'm solidly in 'that' crowd.

There is _so_ much churn and so many worthless changes in the JS ecosystem. AFAIK it's the only ecosystem that has this much churn, which proves that it's not necessary.

My best example of this is the fact that I can now no longer yarn install <package>. Instead, yarn informs me, the command is now yarn add <package>. For god's sake.

Every time I upgrade or build a project more than a week old I'm virtually guaranteed to have a deprecation warning somewhere in my stack. That's infuriating too.


For what it's worth, "yarn add" has been the "correct" way to add new dependencies from the beginning.

https://engineering.fb.com/web/yarn-a-new-package-manager-fo...


>AFAIK it's the only ecosystem that has this much churn, which proves that it's not necessary.

Ah, it's funny what people qualify as "proof" these days. JavaScript is in the unique position of being the only language integrated into browsers. Surely you recognize that this is a big factor in the ecosystem?


That doesn't explain churn in the tooling though. This isn't some new, pre 1.0 ecosystem; node has been around for 11 years now.


It pays well, at least, right?


I always hate explaining these kinds of issues to whomever is paying me. I always feel guilty when I have to log "4 hours debugging tooling".

And it takes me out of my flow. I'm solving difficult problems and these sorts of things, constant and unpredictable, make that task harder.


What do "dev question" and "user question" mean? Are these support questions that get asked using a GitHub Issue as the medium, rather than bug reports?


Yup, people do this all the time.


And GitHub kinda endorses this by having a stock "question" label in newly created repos.


The VSCode team recommends this as a way to get help. They used to have multiple channels, e.g. UserVoice, but closed them all in favour of just using GitHub Issues. I can't see anything obviously wrong with that.


Finding duplicate issues should be an O(N^2) problem, I wonder how they manage to do it with 5.3k issues (28 million pairs)?
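For reference, the arithmetic behind that figure, as a quick sketch (the function name is made up). The 28 million number corresponds to N^2, which counts each pair twice and pairs every issue with itself; the number of distinct pairs is about half of that.

```python
from math import comb


def pair_counts(n):
    """Return (ordered pairs including self-pairs, distinct unordered pairs)."""
    return n * n, comb(n, 2)


print(pair_counts(5300))  # (28090000, 14042350)
```

Either way the growth is quadratic, so the point of the question stands.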


Institutional memory. If you work on a project long enough, you tend to remember these things. You may not recall the exact issue id, obviously, but you know you've seen something like this and there's a limited number of keywords you have to search to get there.


Nobody does this in an N^2 way. See simhash, minhash, etc

Trivial faster-than-N^2 algorithm:

1. Compute simhash of each issue - O(1) in N, the number of issues.

2. Sort the issues by hamming distance of simhash (N log N in the number of issues)

3. Pick the K issues before or after the current issue in the results (O(1) in the number of issues)

You can precompute/incrementally update any of these steps as issues are added. This would make finding duplicates itself O(1) because it would just be step 3.

If you put this in terms of size of text in the issues instead of number (which is what you used) it changes the time bounds, but, for example, it's not any worse than sorting/comparing the issues as text strings.
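A minimal sketch of the three steps, under some loudly stated assumptions: tokens are lowercase words, the per-token hash is the first 8 bytes of SHA-256, and step 2 is simplified to sorting by the raw fingerprint value, which only brings together fingerprints that agree in their high bits. Production systems usually repeat the sort over several bit permutations. The names `candidate_duplicates`, `k` and `max_distance` are hypothetical.

```python
import hashlib
import re

BITS = 64


def simhash(text):
    """Step 1: a 64-bit SimHash fingerprint over lowercase word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def candidate_duplicates(issues, k=2, max_distance=16):
    """Steps 2 and 3: sort by fingerprint, then compare each issue with its
    next k neighbours. `issues` maps an issue id to its text."""
    ordered = sorted((simhash(text), issue_id) for issue_id, text in issues.items())
    found = []
    for i, (fp_a, id_a) in enumerate(ordered):
        for fp_b, id_b in ordered[i + 1:i + 1 + k]:
            distance = hamming(fp_a, fp_b)
            if distance <= max_distance:
                found.append((id_a, id_b, distance))
    return found
```

The sort is the N log N part; each lookup afterwards only touches `k` neighbours. The results are candidates for a human or a stricter comparison to confirm, not verdicts.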


You're not finding duplicates within open issues, you must find them among ALL issues. There are 20 times more, making that 400 times more pairs (10 billion).


Hopefully they can edit or label issues in such a way that helps with searching for duplicates; older issues become a knowledge base / documentation. IIRC it's what Stack Overflow tried to do, turn questions into wiki pages / references.


The problem is that projects like VSCode are racking up these bugs and letting them snowball. It's not that they can't sort them out with 30+ full time professional QAs at Microsoft.

Once upon a time, GNOME had 300 bugs in the Evolution Bugzilla, and it felt like "the end of the world."

What they need to do is just like GNOME projects a decade+ ago did: make a really hard, like HAAAARD! feature freeze, and keep doing long series of "housekeeping" releases as long as needed before unfreezing work on features.


Wow, is the MS PC police asleep? We have not been allowed to use the term grooming for like 2 years now.


The wiki is a great reference, but searches for the word "grooming" on Bing [1], DuckDuckGo [2] and Google [3] all lead to references to pedophilia.

The definition of that word is changing and not accepting that fact looks dated. Might just be my opinion, feel free to comment if you disagree.

[1] https://www.bing.com/search?q=grooming

[2] https://duckduckgo.com/?q=grooming

[3] https://www.google.com/search?q=grooming


Just because others are using a word for a different meaning doesn’t mean you should abandon its original meaning and just give up on it.

Grooming is not an irregular word. The act of grooming oneself is hopefully a regular activity. Brushing your teeth, trimming your nails, getting hair cuts, etc. The word is also regularly used with pets.

IMO this is like when a while back all the news sites were talking about how the OK hand sign was a white supremacist symbol. If everyone had stopped using it, then perhaps it would still be seen as a symbol for that. But everyone thought that was dumb and kept using it the way it always has been, and that new association was lost.


> The definition of that word is changing

No, it just has multiple definitions, like many words do.


This is nothing new.

Dozens and dozens of terms in our industry have "other" meanings.

For example it is common to talk about "pegging" a server -- a reference to how physical speedometers and other such gauges used to work. "Pegging" has a more recent sexual meaning as well.

And then of course there is "penetration" testing in the security sphere.

And so on and so on. It's fine. There is no conflict.


My favorite: cryptography frequently uses a nonce.


Ah. I was never even aware of the slang definition here: "prison slang a rapist or child molester; a sexual offender" Thank you for the knowledge!

https://www.thefreedictionary.com/nonce

(I see that it is primarily a UK term)


Now I want to get a job in the server room. Boot strapping the master sounds like fun suddenly.


Huh, I thought 'pegging' a server came directly from the sexual meaning. TIL.


Etymology is never really an exact science, but here's a pretty good discussion: https://english.stackexchange.com/questions/202318/etymology...

Anecdotally...

I remember hearing and reading "pegged" in an IT context back in the 1990s, at least 10-15 years before the sex term seemed to gain widespread usage. It was definitely used in entirely work-safe contexts where one would never dream of making anal sex references.

Of course, now that you mention it, I think your belief is likely common enough to make me reconsider its use! I think a lot of folks hearing "pegging" in 2020 probably think of the sex practice before they think of speedometers -- a lot of people don't have cars, and many that do never notice the little pegs.

However, I do think "pegging" (a somewhat obscure IT jargon term, with obscure etymology) is in a different situation than "grooming," an incredibly common everyday word. Retiring a bit of jargon is entirely different than retiring an everyday word.


Nobody tell them about dog groomers...


The number two also means poop. Should we stop using number two?


[flagged]


UK here. It’s really not sensitive. 99% of us are quite happy with words having different meanings based on context.

The other 1% seem to worry about this and welcome an age of newspeak and book burnings.

I know who I’m afraid of.


And the problem is that we do cater to the 1%. See GitHub trying to replace the master branch with a main branch, or Python replacing the master-slave terminology, and so forth. These are just two examples, there are billions. There is a guy (or a bot) who has opened over 3k issues to projects about the master-slave terminology change[1]. GitHub is doing nothing against it, and yes, they know about it. You could replace GitHub with loads of other major companies and whatnot: they are not only allowing it but encouraging it, they are speaking up with the 1%, and you can clearly see the effects.

[1] https://github.com/bopopescu (check out his "contribution" activity: "Joined GitHub", and then "Opened 3,226 other pull requests in 3,208 repositories")


Yes github actually fucked me up with the whole master/main thing the other day. I was trying to push a local repo I had been working on for a few days to a new github repo. Cost me an hour of head scratching until I realised what the deal was. I've defaulted all my repos back to "master" in line with git CLI which is the single source of truth.

Github is on my personal shitlist for a number of reasons though and this is nowhere near #1.



