> He immediately gets angry at Facebook messing up his name. The company changes...

teekert · on Oct 13, 2020

Yeah, in the Netherlands we also don't use the "de" and "van" for sorting, so it was confusing to find my badge under the "V" at American conferences the first time. Ah well, what can you do? A friend of mine had his FB account blocked for failing to provide a real name (his last name is "Fun"); ironically, when he gave FB a false name, they accepted it.

hnarn · on Oct 13, 2020

> A friend of mine had his FB account blocked for failing to provide a real name (his last name is "Fun")

Don't they have a process where you can submit a government ID and have your name accepted? Honestly I don't blame the minimum wage person responsible for name screening for flagging a name that is both uncommon and one of the most common adjectives in the English language, if the policy is that fake names aren't allowed.

randallsquared · on Oct 13, 2020

> I don't blame the minimum wage person responsible for name screening

My expectation is that the person who wrote the code responsible for rejecting my name in the mid-to-late 2000s was paid somewhat more than minimum wage.

oh_sigh · on Oct 13, 2020

I think OP was talking about the support staff using the system that the highly paid engineers wrote.

feoren · on Oct 13, 2020

I think GP was implying there was no support staff using that system -- it's entirely automated. That's why the engineer was so highly paid to begin with.

BuffaloBagel · on Oct 13, 2020

I know a guy from Africa named Test. That's a no-go on Facebook's platform so he is known by Tesst there.

OkGoDoIt · on Oct 13, 2020

They do have a process for providing copies of your government ID to have your account unblocked, but I have 2 friends who did that and neither one ever heard anything back or successfully got their accounts unblocked. Normal American names, nothing funky. As far as any of us can figure out, they hadn’t done anything wrong in the first place, just randomly unluckily somehow displeased The Algorithm.

Facebook is fine until you get caught in the machine at no fault and with no recourse. I would have closed my account by now but I work in low-budget theater and Facebook events and Facebook advertising are unfortunately required to make it in the industry.

teekert · on Oct 13, 2020

I don't blame them, but I would prefer to provide an accepted fake name myself over sending a document.

dwighttk · on Oct 13, 2020

>minimum wage person

You mean algorithm?

MeinBlutIstBlau · on Oct 13, 2020

That's kind of interesting. I grew up in a heavily Dutch town (by lineage only) in the US and we always organized last names with "van" in the V's.

teekert · on Oct 13, 2020

Yeah Belgians also do that (afaik), so they have big D and V categories.

input_sh · on Oct 13, 2020

One example I remember from high school is a person whose first name is Admin. Granted, it's an uncommon name, but he's unable to use his real name in many, many online services (Facebook being one of them of course).

ClikeX · on Oct 13, 2020

Sites used to refuse the "van der " in my last name all the time. I had to remove the spaces to get it to work.

Also, sites put my last name under V, which isn't how names are sorted here.

"van der Name" is usually written as "Name, van der" in printed lists to make checking for names easier.

And many Dutch IT systems have a separate field for this. We call it a "tussenvoegsel", which would roughly translate to a "middle addition". It's not a middle/second name either, because those are processed separately here as well.
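A minimal sketch of the separate-field approach, assuming a hypothetical `DutchName` record (not taken from any real Dutch system): the sort key uses the main surname and ignores the tussenvoegsel, and the printed-list form moves the prefix after a comma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DutchName:
    # Hypothetical record; field names are illustrative only.
    given: str
    tussenvoegsel: str  # e.g. "van der", "de", or "" when there is none
    surname: str

    def display(self) -> str:
        """Natural reading order, e.g. "Hans de Zwart"."""
        return " ".join(p for p in (self.given, self.tussenvoegsel, self.surname) if p)

    def list_form(self) -> str:
        """Printed-list order, e.g. "Zwart, de"."""
        if self.tussenvoegsel:
            return f"{self.surname}, {self.tussenvoegsel}"
        return self.surname

    def sort_key(self) -> tuple:
        """Dutch convention: sort on the surname, ignoring the tussenvoegsel."""
        return (self.surname.casefold(), self.given.casefold())


names = [
    DutchName("Hans", "de", "Zwart"),
    DutchName("Anna", "van der", "Berg"),
    DutchName("Piet", "", "Jansen"),
]
for n in sorted(names, key=DutchName.sort_key):
    print(n.list_form())
```

A Belgian-style ordering (big D and V sections) would instead key on the prefix plus surname, so the key function is the one place that needs to change per convention.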

dontchooseanick · on Oct 13, 2020

And of course don't forget Pablo Picasso :)

Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso (https://wikipedia.org/wiki/Pablo_Picasso)

pteraspidomorph · on Oct 13, 2020

...of Ulm ;)

Name forms really should just be:

Your full legal name: [accepts anything, modifies nothing]

Shortened name of your choice for our UI: [can have restrictions, esp. length]
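The two-field form could look roughly like this. It is a sketch assuming a hypothetical 32-character limit for the UI name; note that `len()` counts code points, not user-perceived characters, so a real limit would want a grapheme-aware count.

```python
from dataclasses import dataclass

MAX_DISPLAY_LEN = 32  # hypothetical UI limit, not taken from any real service


def is_valid_display_name(display_name: str) -> bool:
    """The short UI name is the only field that gets restrictions."""
    stripped = display_name.strip()
    return 0 < len(stripped) <= MAX_DISPLAY_LEN


@dataclass(frozen=True)
class NameFields:
    legal_name: str    # stored exactly as typed: no trimming, re-casing or splitting
    display_name: str  # checked with is_valid_display_name before saving


picasso = NameFields(
    legal_name=(
        "Pablo Diego José Francisco de Paula Juan Nepomuceno María de los "
        "Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso"
    ),
    display_name="Picasso",
)
```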

iudqnolq · on Oct 13, 2020

I feel like the pattern of names in the UI comes from wanting to justify collecting the name in the first place. What is it actually good for?

tomp · on Oct 13, 2020

Indeed. I consider "gender" even sillier - e.g. Facebook adding 100s of different genders to their website - how about just removing the field altogether? Same about "legal sex" - why would the government care in the first place?! (Doctors might, but the rest of the government, not really.)

ponker · on Oct 13, 2020

This is for Facebook's ad targeting. They want advertisers to be able to advertise to 21-24 year old women in California.

jfengel · on Oct 13, 2020

They can probably infer it as accurately as asking people. The inference isn't always accurate, but people don't always tell the truth, either.

ecnahc515 · on Oct 13, 2020

They can do both. Someone lying or their inference being wrong about their gender is yet another new data point.

wang_li · on Oct 13, 2020

Different sexes have different civic obligations, e.g. Selective Service, a.k.a. the draft, in the US.

jfk13 · on Oct 13, 2020

Sounds like it's ripe for a discrimination lawsuit.

dragonwriter · on Oct 13, 2020

> Sounds like it's ripe for a discrimination lawsuit.

Such a lawsuit has occurred:

https://en.wikipedia.org/wiki/National_Coalition_for_Men_v._...

InitialLastName · on Oct 13, 2020

I can see wanting to know what pronouns someone uses so you can autogenerate reasonable sentences about that person (e.g. "Tomp marked himself safe from the rabid bears in Honolulu")

will4274 · on Oct 13, 2020

Phone, email, messenger, basically any collaboration software? My university assigned email addresses by initials, e.g. if your name was John Anderson Smith and you were the 167th person with the initials JAS, you'd be jas167@example.edu. Which is easier to find, search, and read in a contact list: John Smith or jas167?
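A sketch of the initials-plus-counter scheme described above, assuming a simple running number per set of initials (the university's actual allocation rules aren't stated). It also shows the readability problem: nothing in `jas2` tells you who it is.

```python
class UsernameAllocator:
    """Hypothetical allocator for IDs like "jas167" (initials + running number)."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def allocate(self, full_name: str) -> str:
        # First character of each space-separated name part, lowercased.
        initials = "".join(part[0] for part in full_name.split()).lower()
        # Running number per set of initials: the nth "jas" gets "jas<n>".
        self._counts[initials] = self._counts.get(initials, 0) + 1
        return f"{initials}{self._counts[initials]}"


alloc = UsernameAllocator()
print(alloc.allocate("John Anderson Smith"))
print(alloc.allocate("Jane Alice Sutton"))
```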

toast0 · on Oct 13, 2020

A "Hello X" near the content you're looking for confirms that you're logged in, without you having to look at the header (which is usually more explicit). Using a friendly name there is an attempt to use less space.

ponker · on Oct 13, 2020

When I'm looking at my contacts in Google Docs or Slack or whatever it's critical to have the names there. If it was bigchungus12@gmail.com and db23423@exxon.com it would be very annoying.

brnt · on Oct 13, 2020

Another thing is overlooked, which is strange since these companies hire so many statisticians and much about their work is about understanding populations and individual preferences.

When you have huge populations (2B), your outliers are going to be similarly huge. Means and medians over large and/or disparate populations lose their meaning and usefulness. There are tons of distinguishable subpopulations (Dutch) that it's pointless to lump in with Americans. Etc etc etc. You'd think someone would be familiar with subpopulations and the limitations of treating 2B users as a normal distribution, but that seems yet to be discovered.

Ridiculous.

m4lvin · on Oct 13, 2020

I am still surprised that the two-character icons which Zoom uses for accounts without profile pictures become "A�" for anyone called "Alice Ørland" or similar.

It is even more surprising that Ö, Š, etc. become �, but 文军 shows up fine.

capableweb · on Oct 13, 2020

Not so weird when you consider the context. Zoom is a North American company with most (if not all) of its product/engineering development happening in China. Going by public information, it seems they have no development teams in Northern or Eastern Europe, so it's only natural that some character sets end up better supported than others.

_-___________-_ · on Oct 13, 2020

It's likely that all the characters are being transferred and stored in the same charset (probably a Unicode one) regardless of what character is entered. The replacement glyph (the question mark) might be caused by their use of a font with incomplete Unicode coverage, or (more likely) by a buggy "take the first character of each word in their name" algorithm, for example using the first byte and then special-casing Chinese under the mistaken impression that nothing else needs more than one byte.
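The first-byte hypothesis is easy to reproduce. Zoom's code isn't public, so `initials_buggy` below is a hypothetical reconstruction of the guess above, not their implementation: "Ø" is two bytes in UTF-8 (C3 98), its first byte alone is invalid UTF-8, and decoding it yields U+FFFD, the � replacement character.

```python
def initials_buggy(name: str) -> str:
    # Hypothetical bug: take the first *byte* of each word's UTF-8 encoding.
    return "".join(
        word.encode("utf-8")[:1].decode("utf-8", errors="replace")
        for word in name.split()
    )


def initials_fixed(name: str) -> str:
    # Take the first code point of each word instead. A fuller fix would take
    # the first grapheme cluster, since one visible letter can span several
    # code points (e.g. "O" followed by a combining mark).
    return "".join(word[0] for word in name.split())


print(initials_buggy("Alice Ørland"))  # A�
print(initials_fixed("Alice Ørland"))  # AØ
```

Under `initials_buggy`, 文军 would break too (文 is three bytes in UTF-8), which fits the suggestion that Chinese names were special-cased rather than handled by the generic path.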

capableweb · on Oct 13, 2020

From someone who doesn't know a lot in that area of software, thanks for the (possible) correction, learned something new today!

raverbashing · on Oct 13, 2020

I agree, probably mishandling of combining characters

Xelbair · on Oct 13, 2020

Yeah, that would work as an excuse back in ANSI days.

We have UTF-8 as a standard nowadays.

jdmichal · on Oct 13, 2020

Unicode doesn't magically make this stuff go away, much less any specific encoding of it. A glyph can consist of multiple codepoints, and those can sometimes be normalized to other codepoints. For instance, Unicode has codepoint U+00C4 for an A with a diaeresis (aka umlaut). But it also has the sequence U+0041 U+0308 for an A followed by a combining diaeresis, which is canonically equivalent to U+00C4 and should render identically.
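Python's standard `unicodedata` module shows the difference: the two spellings are different code point sequences but compare equal once both are normalized to NFC. A minimal sketch:

```python
import unicodedata

precomposed = "\u00c4"  # Ä as one code point
decomposed = "A\u0308"  # A + COMBINING DIAERESIS


def same_text(a: str, b: str) -> bool:
    """Compare strings by canonical equivalence rather than raw code points."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)           # False: raw comparison
print(same_text(precomposed, decomposed))  # True: canonically equivalent
print(len(decomposed), len(unicodedata.normalize("NFC", decomposed)))  # 2 1
```

NFC only composes sequences that have a precomposed form, so code that slices "the first letter" still needs grapheme-cluster segmentation for the general case.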

Xelbair · on Oct 13, 2020

And both cases should be handled properly, as the Unicode standard specifies.

microcolonel · on Oct 13, 2020

Specifically for that example: Zoom is written and maintained in China, so I would expect the developers wrote special code for segmenting Chinese names but not non-English Latin ones.

rusk · on Oct 13, 2020

Are these characters strictly speaking "Latin"? Would they be categorised as "Nordic" or "Cyrillic"?

Standard "Latin" characters to me are those that are supported on a Latin keyboard and provided for by the Latin code page.

pedrosorio · on Oct 13, 2020

They are all Latin according to Unicode 1.0

https://en.m.wikipedia.org/wiki/Latin_script_in_Unicode

Bayart · on Oct 13, 2020

They're Latin characters with diacritics. They're not usually treated as separate letters.

skocznymroczny · on Oct 13, 2020

I always worry with plane tickets because I have a middle name on my passport but many buying services don't take middle names into account.

Also the name of my town has a "Ą" letter in it, which also is problematic in online forms and I often just write A instead just to be on the safe side.

doikor · on Oct 13, 2020

By definition there are no legal middle names here in Finland (all of the 1 to 4 names given to you are called "first names" in the law). Thus all of your names will be in the passport too.

Filling in forms on some foreign (or poorly built/ported local) IT systems can be a bit of guesswork.

Also, as a bonus, the space " " is a legal character in a name. Both "Jukka Pekka" and "Jukka-Pekka" are valid names (and you could also have two names, "Jukka" and "Pekka").

detaro · on Oct 13, 2020

> "Jukka Pekka" [...] also you could have 2 names "Jukka" and "Pekka"

How is that distinguished in legal documents/while registering the name/...?

doikor · on Oct 13, 2020

In most important official documents, the name is not the only identifier, as you also add the national identification number. I guess in most other use cases you just trust humans to get it right.

As for how to register such a name correctly for a baby, I have no clue. Also, you don't have to give the baby a name at birth (you have 60 days), but the national identification number is given to the baby at birth (it is just the date of birth, a sequence number and a checksum character). All I know is that I have a colleague with such a name, and I have seen people before with a space instead of a hyphen in a two-part name.

edit: How we usually do forms for this stuff is just 2 fields: one for all of your first names and a second for your family name(s) (you can have multiple, for example both of your parents' or a foreign one with de/von/etc.). Validation is mostly "check that they are not empty".
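For reference, the checksum part of the Finnish personal identity code (henkilötunnus) works like this: the code is DDMMYY, a century sign, a three-digit individual number and a check character, and the nine digits read as one integer, modulo 31, index into a fixed 31-symbol alphabet. A sketch that verifies only the checksum (it does not validate the date or the set of allowed century signs):

```python
import re

# 31 symbols: the digits plus letters, skipping G, I, O, Q and Z.
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"

# DDMMYY, any century sign, three-digit individual number, check character.
HETU_RE = re.compile(r"([0-9]{6}).([0-9]{3})([0-9A-Y])")


def hetu_check_char(ddmmyy: str, individual: str) -> str:
    return CHECK_CHARS[int(ddmmyy + individual) % 31]


def hetu_checksum_ok(code: str) -> bool:
    m = HETU_RE.fullmatch(code.upper())
    if m is None:
        return False
    ddmmyy, individual, check = m.groups()
    return hetu_check_char(ddmmyy, individual) == check


print(hetu_checksum_ok("131052-308T"))  # True
```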

PeterisP · on Oct 13, 2020

Ah, it's so interesting to contrast different approaches.

For example, in Latvia, the legal treatment is that if you have two first names (two is the legal limit) then they are space delimited and if you have two surnames (it's becoming popular to join the surnames after marriage instead of changing the surname of one spouse) then they are hyphenated.

So if you see "Alpha Beta Gamma" then that means Alpha and Beta as given names and Gamma as the surname; and "Alpha Beta-Gamma" means that Alpha is the given name and Beta-Gamma is the surname, so the name in any official documents can be unambiguously parsed.
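Because that convention makes the full name unambiguous, parsing it is trivial: the last space-separated token is the whole surname. A sketch that holds only under the Latvian rule described above (applied to a Dutch "Hans de Zwart", whose surname contains a space, it would mis-split):

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    given_names: list[str]
    surname: str


def parse_latvian_name(full_name: str) -> ParsedName:
    # Given names are space-separated; a double surname is hyphenated, so the
    # last space-separated token is always the complete surname.
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least one given name and a surname")
    return ParsedName(given_names=parts[:-1], surname=parts[-1])


print(parse_latvian_name("Alpha Beta Gamma"))
print(parse_latvian_name("Alpha Beta-Gamma"))
```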

mcv · on Oct 13, 2020

> "all of the 1 to 4 names given to you are called "first name" in the law"

Is 4 a hard maximum? Because in Dutch it's not unheard of to have more. Not common (most people have 1 or 2), but at least one famous politician had 5 first names.

doikor · on Oct 13, 2020

The current law only allows 4 but allows an exception for foreigners (but then the name has to fulfill that other country's naming conventions). There is also an overriding law that roughly says "the name must not intentionally bring harm to the child", meaning they can block you from giving a weird/stupid name to a child (so a name like Elon Musk's youngest child's "X Æ A-12" would never be allowed to be given to a baby here).

brnt · on Oct 13, 2020

I have three first names, and round these parts they are space separated. Turns out that in a neighbouring country I should be comma separating them, or they are treated as a single first name (that happens to have spaces).

thiagocsf · on Oct 13, 2020

It’s incredibly ironic that the author also misspelled Hans de Zwart’s name as De Zwart, just like Facebook did.

Ballas · on Oct 13, 2020

It is capitalized when used without a first name.

(and the author is Hans de Zwart)

mcv · on Oct 13, 2020

More precisely: the first letter of a name is always capitalised.

So in "Hans de Zwart", the 'H' is the first letter, and therefore capitalised. But in "De Zwart", the 'D' is the first letter, and therefore capitalised, even if it normally wouldn't be.

There might be an exception to that if that letter is not part of a whole word. "De" is a word, but I once knew a guy whose last name was 't Zet. The "'t" is not a word (it's short for "het", the neutral version of "the", whereas "de" is gendered[0]), so probably wouldn't be capitalised[1]. Now imagine 200 countries and languages with exceptions like that and imagine having to write software to handle all of that correctly. This stuff was never a problem before the internet, but I expect the next century is going to see a lot of simplifications in language.

[0] Yeah, in Dutch, there are two articles: "de" which is gendered, and "het" which isn't. Compare to French "le" and "la", which are both gendered, one male and one female, and there's no neutral article. In Dutch there is, but the gendered article doesn't care whether it's male or female; it works for both and doesn't actually care about gender, just that it's there. So it's gendered in a neutral way. Wrap your head around that.

[1] This is also true at the start of a sentence: always capitalised, except when it's not a complete word. If you start a sentence with 's avonds ("in the evening"), you capitalise the 'A', not the 's', which is actually the last letter of the archaic possessive 'des'.
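The rule above, including the hedged "probably" for elided words, can be sketched as follows. Treating any leading word that starts with an apostrophe as elided is an assumption of this sketch, not an established rule.

```python
APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophe


def capitalize_initial(text: str) -> str:
    """Capitalize the first real word of a standalone surname or sentence.

    "de Zwart" -> "De Zwart"; "'s avonds" -> "'s Avonds" (the elided
    "'s" stays lowercase and the following word is capitalized instead).
    """
    words = text.split(" ")
    # Skip over a leading elided word such as 't or 's.
    i = 1 if len(words) > 1 and words[0].startswith(APOSTROPHES) else 0
    words[i] = words[i][:1].upper() + words[i][1:]
    return " ".join(words)
```

This only covers the standalone case: in the full name "Hans de Zwart" the 'd' stays lowercase, so full names must not be passed through it.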

kmm · on Oct 13, 2020

> [0] Yeah, in Dutch, there are two articles: "de" which is gendered, and "het" which isn't. Compare to French "le" and "la", which are both gendered, one male and one female, and there's no neutral article. In Dutch there is, but the gendered article doesn't care whether it's male or female; it works for both and doesn't actually care about gender, just that it's there. So it's gendered in a neutral way. Wrap your head around that.

That's a curious way to present it. "het" is gendered, it corresponds to the neuter gender. There is nothing neutral about it, that's just a (bad) name. In Russian for example, they call it the middle gender. Dutch nouns take one of two genders (three in some Belgian dialects), just like in French.

Both Latin and Proto-Germanic had three genders, which correspond to what we would now call male, female and neuter. Over the span of centuries Latin merged male and neuter into one, leaving French with male/female, whereas in the Germanic languages usually male and female merged into a common gender, leaving common/neuter. But apart from that, it's exactly the same phenomenon.

Vinnl · on Oct 13, 2020

Since we're bringing up fun facts about Dutch capitalisation: when the first two letters are "IJ", which is pronounced as a single letter (a vowel) in Dutch, both are capitalised.
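Generic helpers get this wrong: Python's `str.capitalize()` turns "ijsselmeer" into "Ijsselmeer" (and lowercases the rest of the string). A minimal Dutch-aware sketch that treats a leading "ij" as one letter:

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a Dutch word, treating a leading "ij" as a single letter."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


print("ijsselmeer".capitalize())       # Ijsselmeer (generic, wrong for Dutch)
print(capitalize_dutch("ijsselmeer"))  # IJsselmeer
```

Unicode also has a single-code-point ligature, ĳ (U+0133), whose uppercase is Ĳ (U+0132), but most Dutch text uses the two separate letters, which is why the special case is needed.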

mcv · on Oct 13, 2020

There are (a lot of) people who consider the 'ij' to be a single letter. When asked whether that means Dutch has 27 letters in the alphabet or the 'y' is not a Dutch letter, the discussion becomes very confused. The best explanation is probably that the 'ij' is a letter that's not in the alphabet, or it is, but shares the 25th spot in the alphabet with the 'y'. But it is still a different letter, because "symbool" and "royaal" are also valid Dutch words. The situation isn't helped by the fact that some names and words that currently contain an 'ij' used to contain a 'y'.

(Personally I think it's two letters, but there are very serious sources, including a major encyclopedia as well as primary schools, that disagree. In games and puzzles it's also usually considered to be a single letter.)

https://en.wikipedia.org/wiki/IJ_%28digraph%29

robertlagrant · on Oct 13, 2020

According to Medium. At the top of the article another name is given.

eythian · on Oct 13, 2020

At the bottom it says:

> This article was written by Reinier Kist and originally appeared in Dutch in NRC on August 3rd, 2020. It was translated into English by Hans de Zwart.

In the original source: https://archive.is/g6H0F it is also "De Zwart" when without the first name, and I would reasonably trust NRC to get it right.

naniwaduni · on Oct 13, 2020

I would hope it would be even more reasonable to expect Hans de Zwart to get it right!

barrkel · on Oct 13, 2020

The Dutch language version of the article was written by Reinier Kist but the English version was translated by Hans de Zwart.

barkingcat · on Oct 13, 2020

Note that this sentence (and the whole article) contains the same mis-capitalization the article complains Facebook is doing. It is most likely an aggressive final pass of spellcheck or autocorrect that wasn't re-incorrected before publishing, but it's really ironic all the same.

superjan · on Oct 13, 2020

No, if you only write the last name in isolation then you should capitalize the ‘D’. So it’s Mr De Zwart, but Hans de Zwart.

a1369209993 · on Oct 13, 2020

Honestly, this sounds less like "falsehoods programmers believe about names", and more like "natural language processing is terrible". A real name policy is unconscionable, of course, but "Found 1 sheeps."-isms on the display side are only a serious problem (rather than a nuisance) to the extent that they trigger (possibly latent) serious problems in something else.

fantod · on Oct 13, 2020

I'm not a web developer but whenever this kind of thing happens I just wonder why there isn't a single standard library in every common web language to deal with this and if there is, why it's not being used more often.

enriquto · on Oct 13, 2020

Do you really need a library for that? Why can't you leave the name as it was entered by the user, without making further assumptions?

thelean12 · on Oct 13, 2020

Because people will abuse everything.

Whether it's curse words or porn websites or whatever.

enriquto · on Oct 14, 2020

> Whether it's curse words

I guess most plain English words would be curse words in other languages. Conversely, many plain words and names in other languages will be incorrectly interpreted as English curse words. There's no way to avoid this. Forbidding English curse words would be extremely offensive to people whose names coincide with those words.

thelean12 · on Oct 14, 2020

And these are discussions to be had when developing, including what to do when someone does abuse the lack of curse word filters. Because people WILL abuse it.

The same discussion will also have to talk about whether Mr. SomepornwebsiteDOTcom or Mrs. Hitlerdidnothingwrong are real people and what to do about it.

minitech · on Oct 13, 2020

Making assumptions about name formatting doesn't stop name fields from being abused in those ways.

mytailorisrich · on Oct 13, 2020

Names are too complicated and too personal.

The issue is that many developers have an urge to do some 'clever' processing on them when, really, my conclusion is that they should be left alone. Just sanitise them for security purposes and that's it. This is a typical 'less is more' scenario.

The best person to write a specific name as it should be is the person the name belongs to, so just let them do it.
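One reading of "just sanitise them" that leaves names intact is to escape at output rather than strip characters on input. A minimal sketch for HTML output using the standard library's `html.escape` (the `render_name` helper and its markup are hypothetical):

```python
import html


def render_name(name: str) -> str:
    # Store the name exactly as typed; escape only when inserting it into HTML,
    # so characters like < > & ' " are displayed rather than interpreted.
    return f'<span class="name">{html.escape(name)}</span>'


print(render_name("'t Zet"))
print(render_name("<script>alert(1)</script>"))
```

Stripping apostrophes on input would mangle a name like 't Zet; escaping per output context (HTML here, parameterized queries for SQL) keeps the stored name unchanged.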

dmurray · on Oct 13, 2020

> The best person to write a specific name as it should be is the person the name belongs to, so just let them do it.

It's inconvenient to have the person you are writing about copyedit all of your writing. How would that even work?

And if you don't allow them to do that, how do you account for the situation here where Mr de Zwart's name is capitalized differently depending on context? I'm not even sure if I have it right here.

cryvate1284 · on Oct 13, 2020

In this case, the problem arises because they are using Mr de Zwart [sic], but can this not be avoided by simply using the full name? Mr Hans de Zwart?

Previously I wasn't a big fan of this style, but it has grown on me, and I now see it as similar to using "they" instead of he/she for an unspecified person. Why assume anything about the name, when you can reuse it verbatim when needed?

dmurray · on Oct 13, 2020

This seems like a reasonable try but there are some problems here too. What if you're quoting someone else who called him (in speech) Mr de Zwart? Or when you find out that it's correct to use the Emperor of Japan's full name in some circumstances, but extremely insulting in another?

I think at some point you abandon looking for a general solution, and employ an expert editor who can advise you on the right style. If you're writing for a small town newspaper far away from the Netherlands and get this wrong because you don't have that expert editor, that's OK, you tried. But if you're Facebook and you have 10 million Dutch customers and you get this wrong setting the policy for the most prominent few words on the whole site, you can afford to have better standards than that.

rusticpenn · on Oct 13, 2020

Why do you need a library? A name is an open input field. As long as the inputs are sanitized, you should just accept whatever is typed there.

kace91 · on Oct 13, 2020

Depends on the language and what you'll use the name for.

Some names change capitalisation when the last name is used by itself vs full name for example.

In some languages a name affects the words around it in a sentence. (To pick a random example, the Spanish "o" ("or") becomes "u" when the next word starts with an "o" sound: "Carlos o Maria", but "Carlos u Oscar".)

In general, for a sufficiently large and well-localised application you will need to modify or parse the name at some point. Not sure that a library can do that properly though.
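A simplified sketch of the Spanish example, assuming the rule is applied on spelling: "o" becomes "u" before a word starting with "o" or "ho" (the h is silent). Accents are folded first so "Óscar" is caught too; real text has further edge cases this doesn't handle.

```python
import unicodedata


def fold_accents(word: str) -> str:
    # "Óscar" -> "oscar": decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def spanish_or(left: str, right: str) -> str:
    """Join two words with Spanish "or", using "u" before an /o/ sound."""
    conj = "u" if fold_accents(right).startswith(("o", "ho")) else "o"
    return f"{left} {conj} {right}"


print(spanish_or("Carlos", "Maria"))  # Carlos o Maria
print(spanish_or("Carlos", "Oscar"))  # Carlos u Oscar
```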

rusticpenn · on Oct 13, 2020

I agree with you, however these are small issues compared to having arbitrary requirements for names which create real life problems.

hutzlibu · on Oct 13, 2020

There is no need for a library, there is just need for a consistent standard.

Meaning, if all programs used UTF-8, fine. But they don't. And character encoding is somewhat complicated, because there are a lot of languages out there, some of which even need more than 2 bytes per character, so you get special cases which break on another system, etc.

moksly · on Oct 13, 2020

Being in the public sector, doing our 3rd party integration and being from Denmark gets me the joy of having to deal with a lot of relatively different APIs of very varied quality and æøå in names.

Sometimes when people think they are using UTF-8, they aren’t. I’m not sure if that’s because they are incompetent or if their tooling lies to them, but I’ve gone through soooooo many weird encodings over the years trying to match what was specified as standard UTF-8.

z3t4 · on Oct 13, 2020

Most programs use something that is compatible with UTF-8. And UTF-8 is also backwards compatible with ASCII, which is why A-Z usually works. I actually think UTF-16 is the most common. The issues start to arise when three or more code points/bytes are combined into one character/glyph.
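To make the byte counts concrete: UTF-8 uses 1 to 4 bytes per code point and is byte-for-byte identical to ASCII for ASCII text, while UTF-16 uses one or two 16-bit code units. A quick check with Python's built-in codecs:

```python
def encoded_sizes(ch: str) -> tuple:
    """(UTF-8 bytes, UTF-16 code units) for a single code point."""
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2  # -le: no byte-order mark
    return utf8_bytes, utf16_units


for ch in ["A", "Ø", "文", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

These are sizes per code point; a single visible character can still be several code points, which is the separate combining-character problem.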

numpad0 · on Oct 13, 2020

UTF-8 exists; its necessity just isn't recognized where it's needed ʕ´•ᴥ•`ʔ

rusticpenn · on Oct 13, 2020

I had this problem with several payment processors, as the allowed name and the name on my credit/debit cards never match. After a few years, I changed (modified) my name.

fer · on Oct 13, 2020

> which is especially ironic as you are usually explicitly requested to provide the name as stated in the passport.

Mine either doesn't go through, or goes through and is silently trimmed down to whatever amount of chars they support.

namdnay · on Oct 13, 2020

There is actually a whole series of algorithms used to help match the name on your passport with the name in the airline reservation system, due to the absolute mayhem when dealing with Asian names, for example.
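The actual airline matching algorithms aren't described here, so the following is only a hypothetical sketch of the general idea: reduce both spellings to a coarse key (accents dropped, uppercased, punctuation and extra spaces collapsed) before comparing. Real systems use dedicated transliteration rules; passport machine-readable zones, for instance, follow ICAO Doc 9303.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Hypothetical coarse key for loosely matching two spellings of a name."""
    # Decompose accented letters and drop the combining marks: "é" -> "e".
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Uppercase, then turn every run of non-alphanumerics into one space.
    return re.sub(r"[\W_]+", " ", no_marks.upper()).strip()


print(match_key("José  María"))  # JOSE MARIA
print(match_key("Jukka-Pekka"))  # JUKKA PEKKA
```

Letters with no Unicode decomposition, such as Ø, pass through unchanged ("ØRLAND"), which is exactly where hand-written transliteration tables come in.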

scott31 · on Oct 13, 2020

People have already started putting emojis in their names. Having a hyphen is just a minor annoyance, and I'm willing to bet it being dropped doesn't change anything for you.