It is common practice in psychometrics to use two levels in a forced choice and to model the responses with a logistic regression, which is what's done here. Adding an N/A option turns the thing into an ordered logistic regression with unknown thresholds between the levels, which is tricky to fit, but it's possible. Having done a lot of psychophysics, I've found that more options generally don't make the task easier.
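For anyone unfamiliar with the two-level setup, here is a minimal sketch of fitting a logistic psychometric function to forced-choice answers. It assumes hue (in degrees) is the only stimulus variable and that the probability of answering "blue" rises with hue. The function name `fit_psychometric`, the data, and the use of plain Newton-Raphson instead of a stats library are all illustrative choices, not what the site actually does.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_psychometric(hues, n_trials, n_blue, iters=25):
    """Fit P("blue" | hue) = logistic(a + b * (hue - c)) by Newton-Raphson on
    the binomial log-likelihood, where c is the mean hue (centering only).

    Returns (mu, b): mu is the hue where "blue" and "green" are equally
    likely (the 50% point, i.e. the estimated boundary), b is the slope.
    Assumes the data are not perfectly separable, otherwise b diverges.
    """
    c = sum(hues) / len(hues)
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for hue, n, k in zip(hues, n_trials, n_blue):
            x = hue - c
            p = logistic(a + b * x)
            w = n * p * (1.0 - p)
            g0 += k - n * p          # gradient of the log-likelihood
            g1 += (k - n * p) * x
            h00 += w                 # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) @ delta = gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    mu = c - a / b                   # solves a + b * (mu - c) = 0
    return mu, b


# Hypothetical data: 10 trials at each hue (degrees), counts of "blue" answers.
hues = [150, 160, 170, 180, 190, 200, 210]
n_blue = [0, 1, 2, 5, 8, 9, 10]
mu, slope = fit_psychometric(hues, [10] * len(hues), n_blue)
print(f"estimated blue/green boundary: {mu:.1f} degrees, slope {slope:.3f}")
```

Because these made-up counts are symmetric around 180 degrees, the fitted boundary lands at (very nearly) 180. The only thing the model can report is that single 50% point; it has no way to say "neither".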
Sounds like psychometrics is unsuitable for modeling this problem, according to what you're saying. When you have a hammer everything looks like a nail.
The way XKCD did it is the best: you ask people to give a name to each color, so the responses are entirely natural and unprompted.
I don’t think forced choice can give accurate results if a substantial number of people perceive green and blue as non-adjacent, i.e. if there is a color between green and blue (turquoise/cyan/teal).
In that case it’s like asking people whether a color is red or yellow when it’s clearly a shade of orange.
Are you sure it is common practice, for a problem that has three valid answers A, B and C, to allow people to answer only A or C?
Your website is not talking about "levels" of colour.
It's asking "is this blue or green", not "is this closer to blue or closer to green".
The question (1) "is this blue or green" has three valid answers: blue, green or neither.
The question (2) "is this closer to blue or green" only has two valid answers.
I would assume that with these types of surveys, the first thing to do is to pin down which kind of question is actually being asked.
Sorry to say, but to me it seems that almost all of the confusion in the discussion here is because you're asking question (1) (which has three valid answers) but expecting an answer to (2) (which indeed has two valid answers).
But teal isn't a single point, it's a range. You can have teals that are more blue or more green than each other; they can't all be zero. Whichever one you choose to be the true transition point between blue and green, there will be teals that are more blue or green than that one.
Then by that framing, the test is asking you to decide what hue value is the "zero" between the positive/negative blue/green. Is the wording imperfect? Sure, but the intent was still entirely clear.
Saying it’s a subrange implies you can perceive differences in tone within it. In that case, reframe the question as “is this shade of teal closer to the blue or green end of the subrange” if you like.
Maybe if I'm given two colors inside that range, I can say which is bluer and which is greener.
Given just one color, I simply cannot say that it's green or blue, or even if it's more green than blue or vice versa.
I stopped at the 3rd or 4th one because I couldn't give an honest answer. That makes the test useless.
I can't complete it with correct answers, and if I give incorrect answers, the conclusion is useless.
It's a well-known fact that people are unable to distinguish colours that are too close together.
You could even have a smooth gradient from colour 'a' through colour 'b' to colour 'c', where it's possible to distinguish 'a' from 'c' but not to distinguish 'b' from either 'a' or 'c'.
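The gradient argument can be made concrete with a simple threshold model: two hues look identical unless they differ by at least one just-noticeable difference (JND). The `JND` value and the use of raw hue degrees are hypothetical simplifications (hue degrees are not perceptually uniform, and real discrimination is probabilistic rather than a hard cutoff), but the model is enough to show that "indistinguishable from" is not transitive.

```python
# Hypothetical just-noticeable difference (JND) in hue degrees. Real JNDs vary
# with hue, lightness and viewing conditions; this is a simplification.
JND = 6.0


def distinguishable(h1, h2, jnd=JND):
    """Threshold model: two hues can be told apart only if they differ by >= one JND."""
    return abs(h1 - h2) >= jnd


def chain_is_indistinguishable(hues, jnd=JND):
    """True if every adjacent pair in the sequence falls below the threshold."""
    return all(not distinguishable(x, y, jnd) for x, y in zip(hues, hues[1:]))


a, b, c = 170.0, 175.0, 180.0
print(distinguishable(a, b), distinguishable(b, c), distinguishable(a, c))  # False False True

# A 1-degree-step gradient from green (120) to blue (240): no single step is
# visible, yet the endpoints are obviously different.
gradient = [120.0 + i for i in range(121)]
print(chain_is_indistinguishable(gradient), distinguishable(gradient[0], gradient[-1]))  # True True
```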
I think the main point of this test was to determine the position of teal in your case, as your definition of teal is the midpoint (or a range around it) between blue and green. (For me it's more blue, though.)
I mean, a good test would be able to detect that neither-blue-nor-green range and its approximate midpoint as well, and it would be fair to say that midpoint is indeed the threshold between blue and green. (I don't think the current version of the test can do this, though.)
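One way a test could find that neither-blue-nor-green range is to add the third answer and fit an ordered (proportional-odds) logistic model with two boundaries: h1 (green/neither) and h2 (neither/blue), with the midpoint at (h1 + h2) / 2. A minimal sketch, assuming a known, fixed slope `s` and a whole-degree grid search in place of a proper optimizer; all names and counts below are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(hue, h1, h2, s):
    """(P(green), P(neither), P(blue)) at a hue, for boundaries h1 <= h2.

    Ordered logistic model with hue increasing from green towards blue:
    h1 is the green/neither boundary, h2 the neither/blue boundary, and
    s (assumed known here) controls how gradual each transition is.
    """
    p_green = logistic((h1 - hue) / s)
    p_blue = logistic((hue - h2) / s)
    # Written as a difference of two logistics so it can never go negative.
    p_neither = logistic((hue - h1) / s) - logistic((hue - h2) / s)
    return p_green, p_neither, p_blue


def log_likelihood(h1, h2, s, data):
    """Multinomial log-likelihood of (green, neither, blue) counts per hue."""
    total = 0.0
    for hue, counts in data:
        for n, p in zip(counts, category_probs(hue, h1, h2, s)):
            if n > 0:
                if p <= 0.0:
                    return float("-inf")
                total += n * math.log(p)
    return total


def fit_bands(data, s, lo=120, hi=240):
    """Grid search over whole-degree boundaries lo <= h1 <= h2 <= hi."""
    best, best_ll = None, float("-inf")
    for h1 in range(lo, hi + 1):
        for h2 in range(h1, hi + 1):
            ll = log_likelihood(h1, h2, s, data)
            if ll > best_ll:
                best, best_ll = (h1, h2), ll
    return best


# Hypothetical counts of (green, neither, blue) answers, 20 trials per hue.
data = [(150, (19, 1, 0)), (165, (11, 9, 0)), (180, (1, 17, 2)),
        (195, (0, 8, 12)), (210, (0, 1, 19))]
h1, h2 = fit_bands(data, s=5.0)
print("neither band:", h1, "to", h2, "- midpoint:", (h1 + h2) / 2)
```

If nobody ever answers "neither", the fitted h1 and h2 collapse towards each other and the model reduces to the single-boundary case, so the same test can tell the two kinds of respondent apart. Fitting `s` too would be the obvious next step.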
I actually checked that at the end of the test (when it shows the gradient image with the response overlay).
There were two distinct points, one for blue and one for green, where my mind would place the transition to the colour in between.
(And yes, one end is bluer and the other greener, but, much like a shade of orange is neither red nor yellow, the colours are still neither green nor blue.)
No, I'm saying that the sliver of a chasm between the colour in isolation and what I subconsciously imagine the midpoint to be is so damned thin that, were I to look at the colours side by side, I could not distinguish one from t'other.
And (even if I could) a bluish teal would no more be a blue than a reddish orange a red.
It's not about how an RGB monitor produces the color, it's about how it's perceived. #00ffff ("Cyan" or "Aqua" [1]) looks more blue than green to me, while #008080 ("Teal") looks significantly greener, despite both colors using equal amounts of blue and green in RGB.
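The "equal amounts of blue and green" part is easy to check numerically. The sketch below uses Python's standard-library `colorsys` plus the usual sRGB relative-luminance formula (the helper names are made up for illustration). It shows that both colors have the same HSV hue of 180 degrees and differ only in brightness, so the perceived difference described above is not captured by the RGB hue angle. It does not model perception itself.

```python
import colorsys


def hex_to_rgb01(hex_code):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance Y of an sRGB color (Rec. 709 primaries, D65 white)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


for name, code in [("cyan/aqua", "#00ffff"), ("teal", "#008080")]:
    rgb = hex_to_rgb01(code)
    hue, sat, val = colorsys.rgb_to_hsv(*rgb)
    print(f"{name:10} {code}: hue {hue * 360:.0f} deg, saturation {sat:.2f}, "
          f"value {val:.2f}, relative luminance {relative_luminance(rgb):.3f}")
```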
With the third colour, I just thought "no, that's teal", and my decision was (as you suggested) semi-arbitrary.