
Beating up on this example some more:

Multilingual support seems really hard, especially in six minutes. I would think most people would need to consult technical (i.e., Unicode) and linguistic references to get it right.

Should the ligature 'ﬂ' match only itself, or also the ASCII constituents 'f' and 'l'? How about combining vs. precomposed characters? Some Chinese characters show up in other languages (Japanese, Korean) and are sometimes split between Hong Kong/Taiwan/Mainland language tags too. In fact, there's a mess of work devoted to this ("Unihan": https://www.unicode.org/versions/Unicode13.0.0/ch18.pdf). Having figured out what you can do, you then need to decide what you ought to do. Not being a Chinese speaker, I have no idea which options would seem natural....
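To make the ligature and combining-character questions concrete, here is a minimal Java sketch. It assumes one possible policy: fold both strings to NFKC with the standard java.text.Normalizer before comparing. The class and the helpers foldForSearch and containsFolded are hypothetical names for illustration, and whether folding is what users actually want is exactly the open question. It does nothing for the Unihan issues.

```java
import java.text.Normalizer;

class NormalizeSketch {
    // Hypothetical helper: fold a string to NFKC. This maps compatibility
    // characters such as the fl ligature to their constituents and composes
    // base letter + combining mark sequences into precomposed characters.
    static String foldForSearch(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Hypothetical helper: substring test after folding both sides.
    static boolean containsFolded(String text, String needle) {
        return foldForSearch(text).contains(foldForSearch(needle));
    }

    public static void main(String[] args) {
        String floor = "\uFB02oor";     // starts with U+FB02 LATIN SMALL LIGATURE FL
        String precomposed = "\u00E9";  // e with acute as one code point
        String combining = "e\u0301";   // e followed by U+0301 COMBINING ACUTE ACCENT

        System.out.println(floor.contains("fl"));             // false: raw comparison
        System.out.println(containsFolded(floor, "fl"));      // true: ligature folded
        System.out.println(precomposed.equals(combining));    // false: raw comparison
        System.out.println(foldForSearch(precomposed)
                .equals(foldForSearch(combining)));           // true: both become U+00E9
    }
}
```

NFKC is the aggressive choice here; NFC would unify the two spellings of the accented letter but leave the ligature distinct, which may be closer to what some users expect.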

In fact, having written this all out, there's no way someone "solved" it from scratch in six minutes. It would be a great discussion question though....



    for (int i = 0; i <= text.length() - substring.length(); i++) {
      boolean found = true;
      for (int j = 0; j < substring.length(); j++) {
        // Compare at offset i + j so the whole window is checked.
        if (text.charAt(i + j) != substring.charAt(j)) {
          found = false; break;
        }
      }
      if (found) return true;
    }
    return false;

We’re not talking rocket science here. This code already handles surrogates and Chinese characters properly. The question of characters that can be written in two different ways should only be raised as a second step, once the first implementation is done.
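A quick way to check the surrogate claim is to wrap the loop in a method and feed it a character outside the Basic Multilingual Plane. The sketch below is one runnable version of that loop, comparing at text.charAt(i + j); the class name ContainsSketch is made up for illustration, and it assumes both strings are well-formed UTF-16 (a lone surrogate as the needle could match half of a pair).

```java
class ContainsSketch {
    // Naive substring search over UTF-16 code units (Java char values).
    static boolean contains(String text, String substring) {
        for (int i = 0; i <= text.length() - substring.length(); i++) {
            boolean found = true;
            for (int j = 0; j < substring.length(); j++) {
                if (text.charAt(i + j) != substring.charAt(j)) {
                    found = false;
                    break;
                }
            }
            if (found) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String extB = "\uD840\uDC00";  // U+20000, a CJK character stored as a surrogate pair
        System.out.println(contains("a" + extB + "b", extB));    // true
        System.out.println(contains("\u4E2D\u6587", "\u6587"));  // true: BMP Chinese characters
        System.out.println(contains("ab", "abc"));               // false: needle longer than text
        System.out.println(contains("e\u0301", "\u00E9"));       // false: the "second step" case
    }
}
```

The last line is the case the code does not handle: the two spellings of the same accented letter are different code unit sequences, so a plain comparison never matches them.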



