Going byte by byte is useless. You can't do anything with a single byte of a uni... | Hacker News


dahfizz on March 26, 2021 | on: String length functions for single emoji character...

Going byte by byte is useless. You can't do anything with a single byte of a unicode codepoint (unless, by luck, the codepoint is encoded in a single byte). Codepoint is the smallest useful unit of a unicode string. It is a character, and you can do all the character things with it. If you wanted to implement a toUpper() function for example, you would want to iterate over all the codepoints.
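
To illustrate the byte-versus-codepoint distinction above, here is a minimal Python sketch. The sample string and the helper name `decodes_alone` are assumptions chosen for illustration, not something from the thread; it only shows that the individual UTF-8 bytes of a multi-byte codepoint do not decode on their own.

```python
# Illustration only: the sample text is an assumption, not from the thread.
text = "\u00e9\u20ac"  # U+00E9 (2 bytes in UTF-8) and U+20AC (3 bytes in UTF-8)

encoded = text.encode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in text]


def decodes_alone(byte_value: int) -> bool:
    """Return True if this single byte is valid UTF-8 by itself."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# One flag per byte of the encoded string.
standalone = [decodes_alone(b) for b in encoded]

print(len(text), len(encoded))  # number of codepoints vs. number of bytes
print(code_points)
print(standalone)
```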

masklinn on March 26, 2021

> If you wanted to implement a toUpper() function for example, you would want to iterate over all the codepoints.

Nope. To handle special casings you have to look across multiple codepoints, at which point it's no more work to operate on whatever the code units are.
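
A small Python sketch of what such special casings look like, using Python's built-in `str.upper()` and `str.lower()`. The sample words are assumptions picked for illustration. The first two show one codepoint uppercasing to several; the last uses lowercasing to show the same kind of context sensitivity (Greek final sigma), where mapping each codepoint in isolation gives a different answer than mapping the whole string.

```python
# Illustration only: sample words are assumptions, not from the thread.

# One codepoint can uppercase to several codepoints.
sharp_s = "stra\u00dfe"   # contains U+00DF LATIN SMALL LETTER SHARP S
ligature = "\ufb01n"      # starts with U+FB01 LATIN SMALL LIGATURE FI

for word in (sharp_s, ligature):
    upper = word.upper()
    print(word, len(word), "->", upper, len(upper))

# Context matters too: a capital sigma at the end of a word lowercases
# to the final form U+03C2, otherwise to U+03C3.
greek = "\u039f\u0394\u039f\u03a3"
whole_string = greek.lower()
per_code_point = "".join(ch.lower() for ch in greek)

print(whole_string == per_code_point)
```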
