
> He immediately gets angry at Facebook messing up his name. The company changes his name to ‘Hans De Zwart’, with a capitalized D. A small annoyance, but for De Zwart it signifies something bigger: “It is the arrogance of a giant American corporation which considers the correct spelling of the names of millions of Dutch people an edge case.”

This reminds me of the excellent "Falsehoods Programmers Believe About Names" [1].

As a person with a hyphen in his first name I also get regularly mistreated by all kinds of web forms, worst of all flight tickets, which is especially ironic as you are usually explicitly requested to provide the name as stated in the passport.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...



Yeah, in the Netherlands we also don't use the "de" and "van" for sorting, so it was confusing to find my badge under the "v" at American conferences the first time. Ah well, what can you do? A friend of mine had his FB account blocked for failing to provide a real name (his last name is "Fun"); ironically, after he gave FB a false name, they did accept it.


> A friend of mine had his FB account blocked for failing to provide a real name (his last name is "Fun")

Don't they have a process where you can submit a government ID and have your name accepted? Honestly I don't blame the minimum wage person responsible for name screening for flagging a name that is both uncommon and one of the most common adjectives in the English language, if the policy is that fake names aren't allowed.


> I don't blame the minimum wage person responsible for name screening

My expectation is that the person who wrote the code responsible for rejecting my name in the mid-to-late 2000s was paid somewhat more than minimum wage.


I think OP was talking about the support staff using the system that the highly paid engineers wrote.


I think GP was implying there was no support staff using that system -- it's entirely automated. That's why the engineer was so highly paid to begin with.


I know a guy from Africa named Test. That's a no-go on Facebook's platform so he is known by Tesst there.


They do have a process for providing copies of your government ID to have your account unblocked, but I have 2 friends who did that and neither one ever heard anything back or successfully got their accounts unblocked. Normal American names, nothing funky. As far as any of us can figure out, they hadn’t done anything wrong in the first place, just randomly unluckily somehow displeased The Algorithm.

Facebook is fine until you get caught in the machine at no fault and with no recourse. I would have closed my account by now but I work in low-budget theater and Facebook events and Facebook advertising are unfortunately required to make it in the industry.


I don't blame them, but I would prefer to provide an accepted fake name myself over sending a document.


>minimum wage person

You mean algorithm?


That's kind of interesting. I grew up in a heavily Dutch town (lineage only) in the US and we always organized last names with "van" in the V's.


Yeah Belgians also do that (afaik), so they have big D and V categories.


One example I remember from high school is a person whose first name is Admin. Granted, it's an uncommon name, but he's unable to use his real name in many, many online services (Facebook being one of them of course).


Sites used to refuse the "van der " in my last name all the time. Had to remove the whitespaces to get it to work.

Also, sites put my last name under v, which isn't how names are sorted here.

"van der Name" is usually written as "Name, van der" in printed lists to make checking for names easier.

And many Dutch IT systems have a separate field for this. We call it a "tussenvoegsel", which would roughly translate to a "middle addition". It's not a middle/second name either, because those are processed separately here as well.
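To make the separate-field idea concrete, here is a minimal Python sketch. The class name, field names, and the two extra sample people are made up for illustration; it only assumes what is described above: the tussenvoegsel is stored on its own, ignored as the primary sort key, and moved behind the surname in printed lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DutchName:
    # Hypothetical record layout: three separate fields.
    given: str
    tussenvoegsel: str  # e.g. "van der" or "de"; empty string if none
    surname: str        # the main part, e.g. "Zwart"

    def full(self) -> str:
        # Join the non-empty parts with single spaces.
        parts = (self.given, self.tussenvoegsel, self.surname)
        return " ".join(p for p in parts if p)

    def listed(self) -> str:
        # Printed-list form, e.g. "Berg, van der".
        if self.tussenvoegsel:
            return f"{self.surname}, {self.tussenvoegsel}"
        return self.surname

    def sort_key(self) -> tuple:
        # Sort on the main surname; the prefix is only a tie-breaker.
        return (
            self.surname.casefold(),
            self.tussenvoegsel.casefold(),
            self.given.casefold(),
        )


names = [
    DutchName("Hans", "de", "Zwart"),
    DutchName("Anna", "van der", "Berg"),  # made-up sample
    DutchName("Piet", "", "Mulder"),       # made-up sample
]
ordered = sorted(names, key=DutchName.sort_key)
```

With this layout nobody ends up under "d" or "v", and the name is never re-cased or squashed to fit a single surname field.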


And of course don't forget Pablo Picasso :)

Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso (https://wikipedia.org/wiki/Pablo_Picasso)


...of Ulm ;)

Name forms really should just be:

Your full legal name: [accepts anything, modifies nothing]

Shortened name of your choice for our UI: [can have restrictions, esp. length]
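A rough Python sketch of that two-field form, purely as an illustration. The function name, the error strings, and the 30-character limit are invented here; the only rules taken from the suggestion above are that the legal name is accepted and stored untouched while the display name may be restricted.

```python
MAX_DISPLAY_LEN = 30  # hypothetical UI limit, not taken from any real site


def validate_signup(legal_name: str, display_name: str) -> list[str]:
    """Return a list of problems; an empty list means the form is accepted."""
    errors = []
    # Legal name: only check that something was entered. No re-casing,
    # no character whitelist, no trimming of the stored value.
    if not legal_name.strip():
        errors.append("legal name is required")
    # Display name: this is where UI-driven restrictions belong.
    if not display_name.strip():
        errors.append("display name is required")
    elif len(display_name) > MAX_DISPLAY_LEN:
        errors.append(f"display name is longer than {MAX_DISPLAY_LEN} characters")
    return errors
```

The long Picasso name above would go through unchanged as a legal name; only the short UI name has to fit the layout.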


I feel like the pattern of names in the UI comes from wanting to justify collecting the name in the first place. What is it actually good for?


Indeed. I consider "gender" even sillier - e.g. Facebook adding 100s of different genders to their website - how about just removing the field altogether? Same about "legal sex" - why would the government care in the first place?! (Doctors might, but the rest of the government, not really.)


This is for Facebook's ad targeting. They want advertisers to be able to advertise to 21-24 year old women in California.


They can probably infer it as accurately as asking people. The inference isn't always accurate, but people don't always tell the truth, either.


They can do both. Someone lying or their inference being wrong about their gender is yet another new data point.


Different sexes have different civic obligations. E.g. Selective Service, a.k.a the draft, in the US.


Sounds like it's ripe for a discrimination lawsuit.


> Sounds like it's ripe for a discrimination lawsuit.

Such a lawsuit has occurred:

https://en.wikipedia.org/wiki/National_Coalition_for_Men_v._...


I can see wanting to know what pronouns someone uses so you can autogenerate reasonable sentences about that person (i.e. "Tomp marked himself safe from the rabid bears in Honolulu")


Phone, email, messenger, basically any collaboration software? My university assigned email addresses by initials - e.g. if your name was John Anderson Smith and you were the 167th person with the initials JAS, you'd be jas167@example.edu. Which is easier to find, search, and read in a contact list - John Smith or jas167?


Hello X near the content you're looking for is a confirmation that you're logged in; without having to look at the header (which is usually more explicit). Having a friendly name there is an attempt to use less space.


When I'm looking at my contacts in Google Docs or Slack or whatever it's critical to have the names there. If it was bigchungus12@gmail.com and db23423@exxon.com it would be very annoying.


Another thing is overlooked, which is strange since these companies hire so many statisticians and much about their work is about understanding populations and individual preferences.

When you have a huge population (2B), your outliers are going to be similarly huge in number. Means and medians over large and/or disparate populations lose their meaning and usefulness. There are tons of distinguishable subpopulations (the Dutch, for one) which it is pointless to lump in with Americans. Etc etc etc. You'd think someone would be familiar with subpopulations and the limitations of treating 2B users as a normal distribution, yet that seems not to have been discovered.

Ridiculous.


I am still surprised that the two-character icons which Zoom uses for accounts without profile pictures become "A�" for anyone called "Alice Ørland" or similar.

It is even more surprising that Ö, Š, etc. become �, but 文军 shows up fine.


Not so weird when you consider the context. Zoom is a North American company with most (if not all) of its product/engineering development happening in China. Just reading public information, it seems they have no development teams in Northern or Eastern Europe, so it's only natural that some character sets end up better supported than others.


It's likely that all the characters are being transferred and stored in the same charset (probably a Unicode one) regardless of what character is entered. The replacement glyph (the question mark) might be caused by their use of a font with incomplete Unicode coverage, or (more likely) by a buggy "take the first character of each word in their name" algorithm, for example using the first byte and then special-casing Chinese under the mistaken impression that nothing else needs more than one byte.
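To illustrate the second guess, here is a small Python sketch of how a "first byte of each word" shortcut would produce exactly that replacement glyph. This is not Zoom's code, just a reconstruction of the hypothesized bug; both function names are made up.

```python
def initials_by_byte(name: str) -> str:
    # Buggy on purpose: takes the first *byte* of each word's UTF-8 encoding.
    # That only works when the first character is plain ASCII.
    raw = b"".join(word.encode("utf-8")[:1] for word in name.split())
    # The dangling lead byte of a multi-byte character cannot be decoded,
    # so it is shown as U+FFFD, the replacement character.
    return raw.decode("utf-8", errors="replace")


def initials_by_code_point(name: str) -> str:
    # Takes the first code point of each word instead.
    return "".join(word[0] for word in name.split())
```

For "Alice Ørland" the byte version yields "A" followed by U+FFFD, while the code point version yields "AØ". Even the second version is not fully correct: a name typed with combining characters would lose its accent, which is where grapheme segmentation comes in.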


From someone who doesn't know a lot in that area of software, thanks for the (possible) correction, learned something new today!


I agree, probably mishandling of combining characters


Yeah, that would work as an excuse back in ANSI days.

We have UTF-8 as a standard nowadays.


Unicode doesn't magically make this stuff go away, much less any specific encoding of it. A glyph can consist of multiple codepoints, and then those can sometimes be standardized to other codepoints. For instance, Unicode has codepoint U+00C4 for an A with a diaeresis (aka umlaut). But it also has codepoints U+0041 U+0308 for an A with a combining diaeresis, which should then map to the combined U+00C4 for font rendering.
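A quick way to see this for yourself, sketched in Python with the standard `unicodedata` module. The variable names are arbitrary; NFC and NFD are the standard normalization forms that compose and decompose such sequences.

```python
import unicodedata

composed = "\u00C4"          # single code point: A with diaeresis
decomposed = "\u0041\u0308"  # "A" followed by a combining diaeresis

# Both render the same, but they are different strings.
same_without_normalizing = composed == decomposed

# NFC composes where possible; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
```

Comparing or taking "the first character" of names without normalizing first is one way the two spellings end up being treated differently.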


And both cases should be handled properly, as the Unicode standard specifies.


Specifically for that example, Zoom is written and maintained in China, so I would expect the developers made special code for segmenting Chinese names but not non-English Latin ones.


Are these characters strictly speaking "Latin"? Would they be categorised as "Nordic" or "Cyrillic"?

Standard "Latin" characters to me are those that are supported on a Latin keyboard, and provided for by the latin codepage.


They are all Latin according to Unicode 1.0

https://en.m.wikipedia.org/wiki/Latin_script_in_Unicode


They're Latin characters with diacritics. They're not usually treated as separate letters.


I always worry with plane tickets because I have a middle name on my passport but many buying services don't take middle names into account.

Also the name of my town has a "Ą" letter in it, which also is problematic in online forms and I often just write A instead just to be on the safe side.


By definition there are no legal middle names here in Finland (all of the 1 to 4 names given to you are called "first name" in the law). And thus all of your names will be in the passport too.

Filling in forms in some foreign (or poorly done/ported local) IT system can be a bit of guesswork.

Also as a bonus space " " is a legal character in a name. Both "Jukka Pekka" and "Jukka-Pekka" are valid names (also you could have 2 names "Jukka" and "Pekka")


> "Jukka Pekka" [...] also you could have 2 names "Jukka" and "Pekka"

How is that distinguished in legal documents/while registering the name/...?


In most important official documents name is not the only identifier as you also add the national identification number. I guess in most other use cases you just trust humans to get it right.

As for how to register such a name correctly for a baby I have no clue. Also you don't have to give the baby a name at birth (you have 60 days) but the national identification number is given to the baby at birth (it is just date of birth, sequence number and a checksum character). All I know is that I have a colleague with such a name and have seen people before with a space instead of - in a 2 part name

edit: How we usually do forms for this stuff is just 2 fields. One for all of your first names and the second for your family name(s) (you can have multiple for example both of your parents or some foreign with de/von/etc). Validation is mostly "check that they are not empty"
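On the checksum character mentioned above: as far as I understand the publicly documented scheme, it is the nine digits (date of birth plus sequence number) taken modulo 31 and looked up in a 31-symbol alphabet. A minimal Python sketch under that assumption; the function name is made up, and the century separator and any newer format details are deliberately left out.

```python
# 31 symbols; assumed alphabet (letters G, I, O, Q and Z are skipped).
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"


def check_char(ddmmyy: str, sequence: str) -> str:
    """Compute the check character from the date part and sequence number."""
    digits = ddmmyy + sequence
    if len(ddmmyy) != 6 or len(sequence) != 3 or not digits.isdigit():
        raise ValueError("expected 6 date digits and a 3-digit sequence number")
    # Remainder of the nine-digit number divided by 31 indexes the alphabet.
    return CHECK_CHARS[int(digits) % 31]
```

The point being that the identifier carries the uniqueness, so the name field itself can stay a free-form string.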


Ah, it's so interesting to contrast different approaches.

For example, in Latvia, the legal treatment is that if you have two first names (two is the legal limit) then they are space delimited and if you have two surnames (it's becoming popular to join the surnames after marriage instead of changing the surname of one spouse) then they are hyphenated.

So if you see "Alpha Beta Gamma" then that means Alpha and Beta as given names and Gamma as the surname; and "Alpha Beta-Gamma" means that Alpha is the given name and Beta-Gamma is the surname, so the name in any official documents can be unambiguously parsed.
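Those two rules are simple enough to sketch in Python. This is only an illustration of the convention as described above (space-delimited given names, at most two, and hyphenated surnames), not any official parser; the function name is made up.

```python
def parse_latvian_name(full_name: str) -> tuple[list[str], str]:
    """Split a full name into (given names, surname)."""
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least one given name and a surname")
    # The last space-delimited token is the surname; a double surname
    # stays in one piece because it is hyphenated, not space-delimited.
    *given, surname = parts
    if len(given) > 2:
        raise ValueError("at most two given names are expected")
    return given, surname
```

So "Alpha Beta Gamma" gives two given names and the surname Gamma, while "Alpha Beta-Gamma" gives one given name and the surname Beta-Gamma. It obviously breaks as soon as you feed it names from a country with different rules, like the Finnish case above.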


> "all of the 1 to 4 names given to you are called "first name" in the law"

Is 4 a hard maximum? Because in Dutch it's not unheard of to have more. Not common (most people have 1 or 2), but at least one famous politician had 5 first names.


Current law only allows 4 but allows an exception for foreigners (but then the name has to fulfill that other country's naming conventions). Though there is a superseding law that roughly says "the name must not intentionally bring harm to the child", meaning they can block you from giving a weird/stupid name to a child (so a name like Elon Musk's youngest, "X Æ A-12", would never be allowed to be given to a baby here)


I have three first names, and round these parts they are space separated. Turns out that in a neighbouring country I should be comma separating them, or they are treated as a single first name (that happens to have spaces).


It’s incredibly ironic that the author also misspelled Hans de Zwart’s name as De Zwart, just like Facebook did.


It is capitalized when used without a first name.

(and the author is Hans de Zwart)


More precisely: the first letter of a name is always capitalised.

So in "Hans de Zwart", the 'H' is the first letter, and therefore capitalised. But in "De Zwart", the 'D' is the first letter, and therefore capitalised, even if it normally wouldn't be.

There might be an exception to that if that letter is not part of a whole word. "De" is a word, but I once knew a guy whose last name was 't Zet. The "'t" is not a word (it's short for "het", the neutral version of "the", whereas "de" is gendered[0]), so probably wouldn't be capitalised[1]. Now imagine 200 countries and languages with exceptions like that and imagine having to write software to handle all of that correctly. This stuff was never a problem before the internet, but I expect the next century is going to see a lot of simplifications in language.

[0] Yeah, in Dutch, there are two articles: "de" which is gendered, and "het" which isn't. Compare to French "le" and "la", which are both gendered, one male and one female, and there's no neutral article. In Dutch there is, but the gendered article doesn't care whether it's male or female; it works for both and doesn't actually care about gender, just that it's there. So it's gendered in a neutral way. Wrap your head around that.

[1] This is also true at the start of a sentence: always capitalised, except when it's not a complete word. If you start a sentence with 's avonds ("in the evening"), you capitalise the 'A', not the 's', which is actually the last letter of the archaic possessive 'des'.
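For what it's worth, the rule as stated above fits in a few lines of Python, assuming the prefix is stored separately from the main surname. The function name is made up, and the apostrophe branch encodes the guess from the comment above ("probably wouldn't be capitalised"), not a checked rule.

```python
def surname_alone(tussenvoegsel: str, surname: str) -> str:
    """Render a surname when it is used without the first name."""
    if not tussenvoegsel:
        return surname
    if tussenvoegsel[0].isalpha():
        # First letter of the whole name gets capitalised: "de" -> "De".
        prefix = tussenvoegsel[0].upper() + tussenvoegsel[1:]
    else:
        # Prefix starts with an apostrophe, e.g. "'t": leave it alone.
        prefix = tussenvoegsel
    return f"{prefix} {surname}"
```

That gives "De Zwart" for ("de", "Zwart") and leaves "'t Zet" untouched. The hard part is not this function but knowing which of the 200 sets of rules applies to a given user.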


> [0] Yeah, in Dutch, there are two articles: "de" which is gendered, and "het" which isn't. Compare to French "le" and "la", which are both gendered, one male and one female, and there's no neutral article. In Dutch there is, but the gendered article doesn't care whether it's male or female; it works for both and doesn't actually care about gender, just that it's there. So it's gendered in a neutral way. Wrap your head around that.

That's a curious way to present it. "het" is gendered, it corresponds to the neuter gender. There is nothing neutral about it, that's just a (bad) name. In Russian for example, they call it the middle gender. Dutch nouns take one of two genders (three in some Belgian dialects), just like in French.

Both Latin and Proto-Germanic had three genders, which correspond to what we would now call male, female and neuter. Over the span of centuries Latin merged male and neuter into one, leaving French with male/female, whereas in the Germanic languages usually male and female merged into a common gender, leaving common/neuter. But apart from that, it's exactly the same phenomenon.


Since we're bringing up fun facts about Dutch capitalisation, when the first two letters are "IJ", which is pronounced as a single letter (a vowel) in Dutch, then both are capitalised.
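This is a nice example of where a generic capitalise function goes wrong. A small Python sketch, with a made-up function name, that special-cases a leading "ij" and otherwise just upper-cases the first letter:

```python
def capitalize_dutch(word: str) -> str:
    """Capitalise the start of a Dutch word, treating leading 'ij' as a unit."""
    if word[:2].lower() == "ij":
        # Both letters of the digraph are capitalised together.
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


# Python's built-in str.capitalize() only upper-cases the first letter
# (and lower-cases the rest), so it gets the digraph wrong.
builtin_result = "ijsselmeer".capitalize()
dutch_result = capitalize_dutch("ijsselmeer")
```

The built-in gives "Ijsselmeer", the sketch gives "IJsselmeer". It is a sketch only: it does not try to handle the dedicated U+0132/U+0133 ligature code points.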


There are (a lot of) people who consider the 'ij' to be a single letter. When asked whether that means Dutch has 27 letters in the alphabet or the 'y' is not a Dutch letter, the discussion becomes very confused. Best explanation is probably that the 'ij' is a letter that's not in the alphabet, or it is, but shares the 25th spot in the alphabet with the 'y'. But it is still a different letter, because "symbool" and "royaal" are also valid Dutch words. The situation isn't helped by the fact that some names and words that currently contain an 'ij' used to contain an 'y'.

(Personally I think it's two letters, but there are very serious sources, including a major encyclopedia as well as primary schools, that disagree. In games and puzzles it's also usually considered to be a single letter.)

https://en.wikipedia.org/wiki/IJ_%28digraph%29


According to Medium. At the top of the article another name is given.


At the bottom it says:

> This article was written by Reinier Kist and originally appeared in Dutch in NRC on August 3rd, 2020. It was translated into English by Hans de Zwart.

In the original source: https://archive.is/g6H0F it is also "De Zwart" when without the first name, and I would reasonably trust NRC to get it right.


I would hope it would be even more reasonable to expect Hans de Zwart to get it right!


The Dutch language version of the article was written by Reinier Kist but the English version was translated by Hans de Zwart.


Note that this sentence (and the whole article) contains the same mis-capitalization the article complains Facebook is doing. It is most likely an aggressive final pass of spellcheck or auto-correct that wasn't re-incorrected before publishing, but really ironic all the same.


No, if you only write the last name in isolation then you should capitalize the ‘D’. So it’s Mr De Zwart, but Hans de Zwart.


Honestly, this sounds less like "falsehoods programmers believe about names", and more like "natural language processing is terrible". A real name policy is unconscionable, of course, but "Found 1 sheeps."-isms on the display side are only a serious problem (rather than a nuisance) to the extent that they trigger (possibly latent) serious problems in something else.


I'm not a web developer but whenever this kind of thing happens I just wonder why there isn't a single standard library in every common web language to deal with this and if there is, why it's not being used more often.


Do you really need a library for that? Why can't you leave the name as it was entered by the user, without making further assumptions?


Because people will abuse everything.

Whether it's curse words or porn websites or whatever.


> Whether it's curse words

I guess most plain English words would be curse words in other languages. Conversely, many plain words and names in other languages will be incorrectly interpreted as English curse words. There's no way to avoid this. Forbidding English curse words would be extremely offensive to people whose name coincides with those words.


And these are discussions to be had when developing, including what to do when someone does abuse the lack of curse word filters. Because people WILL abuse it.

The same discussion will also have to talk about whether Mr. SomepornwebsiteDOTcom or Mrs. Hitlerdidnothingwrong are real people and what to do about it.


Making assumptions about name formatting doesn't stop name fields from being abused in those ways.


Names are too complicated and too personal.

The issue is that many developers have an urge to do some 'clever' processing on them when, really, my conclusion is that they should be left alone. Just sanitise them for security purposes and that's it. This is a typical 'less is more' scenario.

The best person to write a specific name as it should be is the person the name belongs to, so just let them do it.


> The best person to write a specific name as it should be is the person the name belongs to, so just let them do it.

It's inconvenient to have the person you are writing about copyedit all of your writing. How would that even work?

And if you don't allow them to do that, how do you account for the situation here where Mr de Zwart's name is capitalized differently depending on context? I'm not even sure if I have it right here.


In this case, the problem arises because they are using Mr de Zwart [sic], but can this not be avoided by simply using the full name? Mr Thomas de Zwart?

Previously I wasn't a big fan of this style, but it has grown on me, and I see it now as similar to using "they" instead of he/she for an unspecified person. Why assume anything about the name, when you can reuse it verbatim when needed?


This seems like a reasonable try but there are some problems here too. What if you're quoting someone else who called him (in speech) Mr de Zwart? Or when you find out that it's correct to use the Emperor of Japan's full name in some circumstances, but extremely insulting in another?

I think at some point you abandon looking for a general solution, and employ an expert editor who can advise you on the right style. If you're writing for a small town newspaper far away from the Netherlands and get this wrong because you don't have that expert editor, that's OK, you tried. But if you're Facebook and you have 10 million Dutch customers and you get this wrong setting the policy for the most prominent few words on the whole site, you can afford to have better standards than that.


Why do you need a library? Name is an open input field. As long as the inputs are sanitized, you should just accept whatever is typed there.


Depends on the language and what you'll use the name for.

Some names change capitalisation when the last name is used by itself vs full name for example.

In some languages a name affects the words around it in a sentence. For a random example, "o" ("or") in Spanish becomes "u" when the next word starts with the "o" sound: "Carlos o Maria", but "Carlos u Oscar".

In general, for a sufficiently large and well-localised application you will need to modify or parse the name at some point. Not sure that a library can do that properly though.
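The Spanish case can be sketched in Python like this. It is a toy with a made-up function name: it approximates "starts with the o sound" by spelling (a leading "o" or a silent "h" followed by "o", with or without an accent), which is exactly the kind of shortcut that will be wrong for some names.

```python
def spanish_or(left: str, right: str) -> str:
    """Join two names with the Spanish word for 'or'."""
    start = right.lstrip().casefold()
    # "o" becomes "u" before a word that begins with the "o" sound.
    # Spelling-based approximation: "o", "ó", or silent "h" plus "o"/"ó".
    conjunction = "u" if start.startswith(("o", "ó", "ho", "hó")) else "o"
    return f"{left} {conjunction} {right}"
```

`spanish_or("Carlos", "Maria")` gives "Carlos o Maria" and `spanish_or("Carlos", "Oscar")` gives "Carlos u Oscar". A foreign name that is spelled with an "O" but pronounced differently would already break it, which is why I doubt a library can do this properly.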


I agree with you, however these are small issues compared to having arbitrary requirements for names which create real life problems.


There is no need for a library; there is just a need for a consistent standard.

Meaning, if all programs used UTF-8, fine. But they don't. And character encoding is somewhat complicated, because there are a lot of languages out there, some of which even need more than 2 bytes per character, so you get special cases which break on another system, etc.


Being in the public sector, doing our 3rd party integration and being from Denmark gets me the joy of having to deal with a lot of relatively different APIs of very varied quality and æøå in names.

Sometimes when people think they are using UTF-8, they aren’t. I’m not sure if that’s because they are incompetent or if their tooling lies to them, but I’ve gone through soooooo many weird encodings over the years trying to match what was specified as standard UTF-8.
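The most common flavour of this, in my experience, is UTF-8 bytes being read back as Latin-1 (or the reverse). A short Python sketch of both directions; the function names are made up, and Latin-1 is just one plausible culprit, not a claim about any particular API.

```python
def utf8_read_as_latin1(text: str) -> str:
    # The sender really did write UTF-8, but the receiver decodes the
    # bytes as Latin-1: every two-byte character turns into two glyphs.
    return text.encode("utf-8").decode("latin-1")


def latin1_read_as_utf8(text: str) -> str:
    # The sender claimed UTF-8 but wrote Latin-1 bytes. Those bytes are
    # not valid UTF-8, so a lenient decoder inserts U+FFFD.
    return text.encode("latin-1").decode("utf-8", errors="replace")


garbled = utf8_read_as_latin1("æøå")
replaced = latin1_read_as_utf8("æøå")
```

The first mistake turns "æøå" into "Ã¦Ã¸Ã¥", the second into replacement characters. Seeing which of the two patterns shows up in the data is usually the fastest way to work out what encoding the other side is really using.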


Most programs use something that is compatible with UTF-8. And UTF-8 is also backwards compatible with ASCII which is why A-z usually works. I actually think UTF-16 is the most common. The issues start to arise when three or more code points/bytes are combined into one character/glyph.
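The byte counts are easy to check in Python. A small sketch (the helper name is made up) that reports how many bytes a single character takes in UTF-8 and in UTF-16:

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


samples = ["A", "Ø", "文", "😀"]
sizes = {ch: encoded_sizes(ch) for ch in samples}

# ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_compatible = "A".encode("ascii") == "A".encode("utf-8")
```

"A" is 1 byte in UTF-8, "Ø" is 2, "文" is 3 and the emoji is 4, while in UTF-16 the first three take 2 bytes and the emoji takes 4. So code that assumes "one character is one byte" or "one character is two bytes" works for some alphabets and silently breaks for others.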


UTF-8 exists; its necessity just isn't recognized where it’s needed ʕ´•ᴥ•`ʔ


I got this problem with several payment processors, as the allowed name and the name on my credit/debit cards never match. After a few years, I changed (modified) my name.


> which is especially ironic as you are usually explicitly requested to provide the name as stated in the passport.

Mine either doesn't go through, or goes through and is silently trimmed down to whatever amount of chars they support.


There is actually a whole series of algorithms used to help match the name on your passport with the name in the airline reservation system, due to absolute mayhem when dealing with Asian names, for example.


People already started having emojis in their name, having hyphen is just a minor annoyance, and I'm willing to bet it being dropped does not change anything for you



