
Lack of a real string type. String manipulation is a pain and you have to allocate everything by yourself, which results in inefficient and dangerous code.


Any code can be inefficient if performance wasn't a concern while writing it. I'd say in that case C string manipulation is more efficient by design because you see how many times you are copying something around. When things are abstracted away in something like std::string then you can wind up with code that creates multiple unnecessary copies by accident (e.g. forgetting a '&' to take a reference). Yes, it is more dangerous, but that can be a tradeoff with performance. For example, assume you have a std::hash_map<std::string, int>. There is no way to insert into this without one string copy (C++11 changes this to make zero-copy possible).


You're imagining a false dichotomy. Having a proper string type doesn't stop you from representing strings as pointers to null-terminated buffers when you need that flexibility. Meanwhile, more or less encouraging all software written in the language to use an unsafe idiom (so unsafe that the standard library itself includes functions that should never be used, like gets()) is definitely responsible for a huge amount of harm.


Well, what are your thoughts on the example I gave for hash_map? There is always a tradeoff hidden away somewhere. I guess I'd like to see an example of a concrete string type that doesn't incur performance penalties when used in more complex structures. C++11 move semantics and emplace() on containers will fix some of the performance issues with std::string, but support for that everywhere is a ways off. But in C, you'd be left with something like glib's GString, which isn't much more than an API over a struct.


I think your example is irrelevant to the general case, which is where a string type is useful. If you need to avoid copies that badly in a specific case, don't use std::string or anything like it.

For example, I know of one compiler architecture which scanned strings from the source, all the while calculating a hash, and basically interned the string (turning it into an index per unique string) without ever actually copying it. Thereafter, the program used the index (an integer) to represent the string, making for fast lookups and comparisons.


I guess I just take issue with saying the "general case" usage of strings does not have to avoid copies. That is how we generally wind up with slow, bloated software.


I don't think most software is particularly slow or bloated. It's been quite some time since I thought to myself, "gee, this software could do with being a lot faster", outside of games and video transcoders (and perhaps iTunes on Windows). On the other hand, a lot of software has buffer overflow vulnerabilities; I see a lot of crashes when input data is fuzzed or corrupted slightly.


Fast enough is a very recent development. Until 2005 and dual core, computers were not fast enough, especially computers running Objective-C.


Most C++03 std::strings (including g++'s) are copy on write, so inserting into a hash_map will not copy the string, only update some reference counts.

(Pedantry: this assumes a few things, like that you don't have any references or pointers into the string.)


Yes, and it just so happens the STL implementation I use switched from a ref-counted one to a "short string"-optimized one where tiny strings are kept on the stack. Again, it was done for large-scale tradeoffs in the app as a whole once memory usage was analyzed. So without the C++11 enhancements, which allow any string implementation to perform as well as possible, apps must be aware of how their particular STL works under the hood (or at least my apps do :)).


Shouldn't string manipulation be more efficient if you have to do everything by hand? Besides, using a recursive memory allocator takes away most of the pain, and it's more efficient than a garbage collector.


It's quite easy to overuse strlen or strcat or otherwise turn an O(n) algorithm into O(n^2). But the efficiency I'd be more concerned with is programmer efficiency finding and fixing security bugs caused by off-by-one buffer lengths and input length checking, and the like (e.g. copying k characters into a char buf[k]).


Surely every time you re-invent an algorithm you make it more efficient? Part of the problem is that there isn't a standard set of "maximized efficiency" algorithms for programmers to use, so anything they come up with could be better or worse, for no real reason...



