
Going byte by byte is useless. You can't do anything with a single byte of an encoded Unicode codepoint (unless, by luck, the codepoint is encoded in a single byte).
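To make that concrete, here is a small Python sketch, assuming UTF-8 as the encoding (the comment doesn't name one). The sample characters are arbitrary picks for illustration.

```python
# Illustrative only: how a few codepoints map to UTF-8 bytes.
byte_lengths = {}
for ch in ["A", "\u00e9", "\u20ac"]:  # 'A', e-acute, euro sign
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {hex_bytes}")

# The first byte of a multi-byte sequence cannot be decoded on its own.
first_byte = "\u00e9".encode("utf-8")[:1]
try:
    first_byte.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print("lone lead byte decodable:", decodable)
```

Only the ASCII 'A' fits in one byte here; the other two need two and three bytes.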

A codepoint is the smallest useful unit of a Unicode string. It is a character, and you can do all the character things with it.

If you wanted to implement a toUpper() function for example, you would want to iterate over all the codepoints.



> If you wanted to implement a toUpper() function for example, you would want to iterate over all the codepoints.

Nope. To handle special casings you have to look at spans of multiple codepoints, at which point it's no more work to operate on whatever the code units are.
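A rough Python sketch of the kind of special casing meant here, using Python's built-in str.upper() and str.lower() as a stand-in for a full case-mapping implementation. naive_upper is a hypothetical helper, written only to show what a strict one-codepoint-in, one-codepoint-out mapping would do.

```python
def naive_upper(s):
    """Hypothetical 1:1 mapper: uppercases a codepoint only when the
    result is still a single codepoint, otherwise leaves it unchanged."""
    out = []
    for c in s:
        u = c.upper()
        out.append(u if len(u) == 1 else c)
    return "".join(out)

word = "stra\u00dfe"          # German "strasse" spelled with sharp s
full = word.upper()           # full mapping: sharp s expands to "SS"
naive = naive_upper(word)     # 1:1 mapping cannot express the expansion

ligature = "\ufb01".upper()   # the "fi" ligature expands to two letters

# Context-sensitive case: capital sigma lowercases differently at word end.
greek = "\u039f\u0394\u039f\u03a3".lower()
lone_sigma = "\u03a3".lower()

print(len(word), len(full), len(naive))
```

The output length can differ from the input length, and the final sigma case depends on the neighbouring codepoints, so a loop that looks at one codepoint at a time is not enough.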



