
This behavior is documented here: http://php.net/manual/en/language.operators.comparison.php

  If you compare a number with a string or the comparison
  involves numerical strings, then each string is converted
  to a number and the comparison performed numerically.


Ah, an obscure point of absurdity which utterly kills my pending interest in the language. If this sort of thing exists under the hood, revealed only by a detailed analysis of the specification, what other nonsense is there? Going so far as analyzing a string to determine whether it consists entirely of numbers for the non-sequitur process of then and only then converting it to what it isn't for logical evaluation is working pretty hard to do something counter-intuitive; might be tolerable if it actually preserved all digits, but not only does it work hard to convert a string to an integer, it then converts large integers into floating-point values - not just one, but two layers of explicitly undesired and unnecessary and unreasonable typecasting.

I'm currently working with barcodes: numerical strings from 6 to 55 digits. In no way can I risk having one barcode be evaluated as equal to a literally different barcode just because the symbols in that string just happen to exhibit a passing resemblance to data of a different type.

Again, it's not just that it has loose typing. It's that it's taking what is OBVIOUSLY a string, converting it to an integer, THEN converting it to yet another data type which imposes data loss.
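To make the barcode worry concrete, here is a rough sketch in Python (not PHP) of the rule as it is described in this thread: digit strings become integers, and integers too large for 64 bits become doubles. The names `to_number` and `loose_eq` are hypothetical, and the 64-bit limit is an assumption; this models the described behaviour and is not PHP's actual implementation.

```python
import re

INT_MAX = 2**63 - 1  # assumption: 64-bit platform integers

def to_number(digits):
    """Hypothetical helper: keep an int if it fits in 64 bits, else fall back to a float."""
    n = int(digits)
    return n if n <= INT_MAX else float(n)

def loose_eq(a, b):
    """Hypothetical model of the loose comparison described in this thread."""
    if re.fullmatch(r"[0-9]+", a) and re.fullmatch(r"[0-9]+", b):
        x, y = to_number(a), to_number(b)
        if isinstance(x, float) or isinstance(y, float):
            return float(x) == float(y)  # mixed int/float compares as floats
        return x == y
    return a == b  # non-numeric strings: plain string comparison

barcode_a = "9223372036854775807123"
barcode_b = "9223372036854775807124"

print(barcode_a == barcode_b)          # False: the strings differ
print(loose_eq(barcode_a, barcode_b))  # True: both collapse to the same double
```

Two barcodes that differ in their last digit compare equal once both have been squeezed into a double, which is exactly the failure I can't risk.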

Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you...

ETA: Oh, I'd love to know the justification for the downvoting.


> Going so far as analyzing a string to determine whether it consists entirely of numbers

I was about to give an outraged reply that, if PHP is like Perl, then it doesn't scan the string afresh, just keeps a flag indicating whether or not it thinks a string is numeric. However, it turns out that's not true at all. `Perl_looks_like_number`, defined in `sv.c`, calls `Perl_grok_number`, defined beginning on l. 577 (as of v5.14.2) in `numeric.c`, which (after some book-keeping) does this:

    if (s == send) {
      return 0;
    } else if (*s == '-') {
      s++;
      numtype = IS_NUMBER_NEG;
    }
    else if (*s == '+')
      s++;

    if (s == send)
      return 0;

    if (isDIGIT(*s)) {
      UV value = *s - '0';
      if (++s < send) {
        int digit = *s - '0';
        if (digit >= 0 && digit <= 9) {
          value = value * 10 + digit;
          if (++s < send) {
            digit = *s - '0';
            if (digit >= 0 && digit <= 9) {
              value = value * 10 + digit;
              if (++s < send) {
                digit = *s - '0';
                if (digit >= 0 && digit <= 9) {
                  value = value * 10 + digit;
and goes on and on and on and on in the same vein. Sheesh! (I didn't forget to close that last brace; the next line is de-dented, but that seems to be a mistake.)


Perl does cache whether an SV contains something usable as an integer (the IOK flag) or a floating point number (the NOK flag). That's why you almost never see `looks_like_number` on its own; it is almost always called after one of the appropriate flag-checking macros.


"Intolerable for real-world use. A toy language."

Seems like a silly thing to say. I wouldn't write avionics software with it, but there are a billion websites demonstrating that it's pretty decent for real-world use. At least as good as any other language, I'd guess.


You could use strcmp like you're supposed to, I suppose, or ===.


The language could not throw away data on an obscure whim.


It does not throw it away on an obscure whim. It's clearly documented, and well known. === is pretty much the standard.

Your lack of knowledge does not make it obscure.


> It does not throw it away on an obscure whim.

So you could have predicted this yesterday? Just because it's codified somewhere, it doesn't make it clear, or anything other than a whim, or a product of circumstances, at best. That's not how languages should be defined, even if PHP clearly demonstrates that they can end up that way by chance.

Rasmus' lack of foresight does not make it reasonable.


Yes, you could have predicted this yesterday.

In fact, this behavior has been documented explicitly for a year and a half: http://web.archive.org/web/20100808122711/http://www.php.net...

Earlier versions have said the comparison converts the numbers to integers though, which may be incorrect, and misleading if it was. Did PHP not convert float-like strings to floats in, eg, 2009? http://web.archive.org/web/20091024233139/http://www.php.net...


Why, it does indeed say, a link or two down from there, that strings will be coerced to float: http://web.archive.org/web/20091024234517/http://www.php.net...

The point, however, is that you shouldn't pepper your language with operations which have consequences as hard to foresee as this with no good reason, and I really don't think that saving yourself some type conversions here and there would do.


"no good reason" is entirely subjective, though. If your purpose is to make the language simpler to newcomers, implicit conversions everywhere are a great way to get things done. And the popularity of PHP (especially for new-to-programming people) heavily supports that they made the correct decision to work well for that market.

The same kind of logic is used to make `false == ""` true. Or any 'falsy' language. If you want strictly typed behavior, yes, it's stupid to do that. If you don't, then it makes some things simpler, at the expense of more edge cases that are unlikely to happen - note that this bug was reported in 2011, and people are acting like it's a new thing. Because it comes up so rarely that, while it technically exists, many people never encounter it.


> "no good reason" is entirely subjective, though. If your purpose is to make the language simpler to newcomers, implicit conversions everywhere are a great way to get things done. And the popularity of PHP (especially for new-to-programming people) heavily supports that they made the correct decision to work well for that market.

You are right, in a way. Sure, it may attract and retain more newcomers, but that's like saying that tobacco is "teenager friendly". I think it's not beginner friendly at all if you must have years of experience to avoid the innumerable pitfalls which PHP lays for you all over the place, learning, e.g. the range of Integers in PHP, which defines when a string will be either a float or an int, or that you should actually use strcmp.

In Python, Ruby, or heck, Haskell, you'd just have to do == and there would be no surprises.


I agree entirely, but we're thinking like programmers. Grab someone who's never programmed at all and ask them if `123 is equal to "123"`.

This essentially breaks down to the top-down vs bottom-up education style debate. You can learn the gritty details and get caught up in minor details that may not matter in other languages, or learn how to do something, and get tripped up by the details in other languages. Similarly, we could teach kids abstract algebra, or basic +-*/ and then over-simplify when they try to divide by zero.

Neither is ideal, both have useful traits and problems, so we have to pick one. Or try to come up with something radically different.

edit: to ask it another way: if PHP is a massively-popular gateway drug to the world of programming, but it gives some people horrible flashbacks for the rest of their lives, do you want to make it illegal and close the door to a huge number of people?


You've never used floats, have you?


Yes, I have, thank you. How's the health?

Oh, and "everyone knows that PHP lights the upper-rightmost pixel in your screen purple and will crash if there's no screen" would not, in fact, justify such a thing.


> Yes, I have, thank you.

You must hate programming then.

    9223372036854775807.0 == 9223372036854775808


Alright, I'll spell it out for you: the behaviour may be what you'd expect from floating-point comparisons, but it doesn't have to be a floating point comparison in the first place.


No, it doesn't. Language designers make lots of decisions that end up being silly. But they make them. In PHP, if it looks like a number, it will get treated like a number when being compared via ==. It's a simple, well-established, fundamental rule.


Sadly, while '===' is the quick fix, you then have to litter your code with type casting operators if you're comparing numbers, particularly those sourced from, say, a database, where everything is returned as a string. Or from GET or POST data, where everything is a string.

This tripped me up when I was trying to compare two numbers, one of which was the result of a COUNT query via PDO. Of course, that COUNT result was a string.

I suppose if you worked entirely with strings it's alright. Or it wouldn't be so bad if you could make the reasonable assumption that functions returned appropriately typed data.


It does what it's intended to do. Use the methods you are supposed to use. You're just arguing for the sake of arguing now, or you just don't know what you are talking about at all.


> Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you...

I'm sorry, I would like that to be true, but programmers rarely are half as smart as they think they are. We have many more years of PHP and its resulting insanity ahead of us.


Downvoting because you're railing against a language without understanding it. That, and your C++ comment above, make you sound like the programmer version of internet tough guy. Whatever your real life skills may be, it certainly sounds like a lot of posturing.


Just do the /bin/sh thing and prepend "x" on the strings before comparing them. :)


"Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you..."

Have you been out on the internet the past decade? Do you have a tendency to make extremes of things and try to stand the needle on its tip?


It throws away data on an obtuse, obscurely-documented whim.

Making extremes? My medical application would fail FDA approval in minutes if ported to PHP precisely because of this issue.


All languages have their idiosyncrasies. You can pick out some obscure aspect of any language and say $LANG sucks.

Btw, PHP's behavior doesn't totally make sense to me either. But I'm willing to assume that its users and designers have thought this through and it makes sense for PHP's intended use cases, because I don't know PHP.

Javascript (which most people on HN seem to like) also has similar issues (null vs. undefined, == and === etc). It got so bad that "Javascript: the good parts" had to be written to define a de-facto sane subset of the language. People are actually writing in Coffeescript (in part) to avoid Javascript's pitfalls.

YMMV.


Your medical application should probably be using === for comparisons. I'm not defending PHP's language design, but I think it's pretty well known among professional PHP developers that you should almost always use === and avoid implicit type conversions.


How does one ensure that they do not accidentally use ==?


Got me. How do old-school C programmers ensure they don't accidentally use = when they mean == ?

There are actually some decent commercial PHP IDEs, believe it or not. I wouldn't be surprised if some of them are able to warn on loose equality comparisons. I don't have much direct experience with them though.


> How do old-school C programmers ensure they don't accidentally use = when they mean == ?

By putting the constant on the left-hand side of the comparison: a constant is not an lvalue, so an accidental = fails to compile. But they don't have to do this anymore; gcc warns when you accidentally use = instead of == now.


"obscurely-documented"? That seems pretty clear to me, given that it's a fundamental feature of the language, and documented (floating point problems too) in an obvious location.


Would your medical application fail FDA approval if you used a language like C or C++ that contained strncmp? Because that function throws away data, too.


There's a difference between strncmp existing precisely so you can specify how many characters to compare, vs. throwing away trailing characters in a string just because, by sheer chance, it contains only numerics.


Who ever thought of this? I can understand "don't use == for strings", or implicit conversion when one of the arguments is a number, but this is extremely sneaky as it will only behave this way with two numeric-looking strings. Ouch. Why does it even do that check, wasting cycles beyond a normal string comparison? It looks like an elaborate and cruel trap for novice programmers.

Edit: I know == is not a string comparison. But you'd expect it to fail in a predictable way when passed strings that are not parseable as numbers, instead of trying to fall back on a string comparison so that people get the wrong idea.


PHP gets input from various places, and one might want "1.00" from a form or URL to equal "1.0" read from a cookie or via database adapter that stringifies everything.


If only we had some sort of way of explicitly telling our computers that we wanted to convert a sequence of characters into a number.
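For what it's worth, a minimal sketch of that explicit approach, written in Python rather than PHP. The variable names are made up for illustration; the point is only that the caller chooses the conversion, so nothing is coerced behind their back.

```python
from decimal import Decimal

form_value = "1.00"    # hypothetical value from a form or URL
cookie_value = "1.0"   # hypothetical value from a cookie

# As strings they differ; as explicitly parsed decimals they are equal.
print(form_value == cookie_value)                    # False
print(Decimal(form_value) == Decimal(cookie_value))  # True

# Explicit integer parsing keeps every digit, so large values stay distinct.
a, b = "9223372036854775807", "9223372036854775808"
print(int(a) == int(b))                              # False
```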


What I don't understand is why all these weak-typed languages don't optionally allow one to strong type a variable.


Weak typing does not require bonkers coercion of types.


That's like saying why doesn't someone create a combination hammer and screwdriver. A proper tool for every job.


And as much as I love python and javascript, I'd really like to use a screwdriver now and then rather than leave hammer marks everywhere, especially if they are being used for something beyond formatting HTML.

This hamdriver you speak of: tell me more.


This thing already exists, it is called "strong typing" (in C++, Go, Java etc).


Yes, that was the joke.


Not defending the bad design decision, but string comparison in PHP is strcmp, not ==.


IMO this conversion should fail if the number represented is not valid, or fall back to arbitrary precision math (GMP library for instance), instead of silently making such a questionable conversion.

I generally avoid exceptions/error_levels in all languages but this is probably a good cause for them, in order to keep the rest backwards compatible.


How do you test that it isn't valid? I think you may be underestimating the difficulty in predicting whether a particular decimal number can be accurately represented as a floating point type. It may be non-obvious, but the representation of precise numbers changes depending on the number base. For example, in base 10, we can't precisely represent 1/3. In base 3, we can (0.1). In base 2, we can't precisely represent 0.1, or 1/10. A simple number such as 0.1 has no precise representation in base 2.
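A small illustration of the base-2 point, using Python's standard `decimal` and `fractions` modules to expose the exact value a double actually stores. This is only a sketch of the general floating-point behaviour, not anything specific to PHP.

```python
from decimal import Decimal
from fractions import Fraction

# The double nearest to 0.1 is not 0.1; Decimal(float) shows its exact value.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# 1/10 has no finite binary expansion, so the stored value differs from it.
print(Fraction(0.1) == Fraction(1, 10))  # False

# 1/4 is a power of two, so it is stored exactly.
print(Fraction(0.25) == Fraction(1, 4))  # True
```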

In this case in php, the truncation happens due to loss of precision in the mantissa of the double precision float. But there are so many other ways to lose precision, I don't think it's reasonable to ask a language to attempt to account for them.

This is why languages should have clear rules about when type conversion occurs, and allow the user to prevent it when it isn't desirable.

edit: in fact, amusingly, php seems to be doing some non-standard stuff with its floats. I was going to make a point about how you can't determine if a double is a "correct" representation of a string decimal, but in mocking an example I discovered something odd. Check this out:

This is what one should expect:

    $ ruby -e'puts "%5.25f" % 0.1'
    0.1000000000000000055511151

    $ perl -wle'printf "%5.25f\n", 0.1'
    0.1000000000000000055511151

But in php:

    $ php -r'printf("%5.25f\n", 0.1);'
    0.1000000000000000000000000

    $ php -r'printf("%5.25f\n", "0.1");'
    0.1000000000000000000000000

Is php changing the type conversion? Or not using double precision at all?


The idea of casting everything to float is just wrong; a string of digits without a dot should be converted to a (big) integer, without any loss of precision. Anyway I just can't fathom how anyone could think weak typing is a good idea; it might make some superficial things "easier", but you'll soon shoot yourself in the foot with it.


Precision in floats is accepted as a fact of life. Converting exceedingly big INT string literals to bigger float types is a hack to win some naive benchmarks against languages doing native proper arbitrary precision. This shouldn't have happened in the first place, but since it's there and backwards compatibility is important, it could be shown in a warning error_level[1] that the conversion happened, so the user could at least check that and hack a solution together.

[1] This doesn't really happen in PHP, but you have $php_errormsg that can be set without stopping execution (as happens with some errors/warnings when error_level is not set to E_STRICT, and below that depending on the error). These errors could be triggered in a new level, let's say "E_PEDANTIC".


You just reiterated my point. This is precisely why your original suggestion of failing an "invalid" conversion is untenable. ALL conversions lack precision -- there is no such thing as an "invalid" conversion.


Nope, there are conventions.

We have two distinct problems here:

- strings converting to numbers without there being any number on any side. "Peculiar" of PHP but easy to circumvent using string comparison. IMO belongs in PHP4 but not at all in PHP5, which is an attempt at a "general-purpose" language. To be frank, I thought PHP4 made more sense because it was 1st of class at what it did, while PHP5 falls short to a number of languages in basically everything.

- automatic integer-to-float comparison to accommodate bigger integers. A horrible hack to squeeze a little extra performance in naive benchmarks in computers with no native 64 bit integer support. This really makes no sense whatsoever now and may have had some partial justification in the early 90s, prior to PHP4 even.

Both ideas are terrible and pretty much unique to PHP of all popular languages.

This is not a philosophical debate about typing styles or the existence of perfect type conversions. PHP's problems in this regard are relics from a dubious past.


> - automatic integer-to-float comparison to accommodate bigger integers. A horrible hack to squeeze a little extra performance in naive benchmarks in computers with no native 64 bit integer support. This really makes no sense whatsoever now and may have had some partial justification in the early 90s, prior to PHP4 even.

No, this is not unique to php. Many popular, comparable languages perform an int -> float conversion. For example, Perl:

    $ perl -wle'print "20938410923849012834092834" + 0 if "20938410923849012834092834" == "20938410923849012834092835"'
    2.0938410923849e+25

> This is not a philosophical debate about typing styles or the existence of perfect type conversions. PHP's problems in this regard are relics from a dubious past.

Conversion from string -> number, and loose numeric types which auto-convert to float are near universal in loosely typed languages, out of necessity -- if such a scheme doesn't work consistently it can't be used at all. This brings me back to my point. You said "IMO this conversion should fail if the number represented is not valid, or fall back to arbitrary precision math". My response is that you cannot provide such a rule on the basis of "is it valid" because there is no such thing as a "valid" type conversion -- ALL have precision loss. It is inherent in the datatype. When I said "you may be underestimating the difficulty in predicting whether a particular decimal number can be accurately represented as a floating point type" you should perhaps read that as "you cannot do this, it is not possible".

Instead you might suggest that no loose conversion, no loose typing be permitted in a language design -- and I would agree wholeheartedly. But your suggestion that this be handled on a case-by-case basis depending on the numeric value is fundamentally unworkable. Big integers are not the only area this type of problem presents.


Perl5 is old enough so this behaviour has a niche. Possibly even PHP4 is old enough for that. But PHP5 was born when both 64 bit ints and good, open arbitrary precision libraries were available and very fast.

This, now, where it's being used, is absurd. There are no two ways about that. And this doesn't happen elsewhere to this extent.

I will leave you the last word though. Cheerio.


Testing validity should be pretty easy. Remove insignificant zeroes from both ends, then only accept the conversion when it is precisely correct. This can be done by simply converting to double and then back and seeing if you get the same thing. If there's any difference, it's not sufficiently accurate, bail out.
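Roughly what I have in mind, sketched in Python. The function name `converts_exactly` is hypothetical, and using `Decimal` to read back the exact value of the double is my own choice for the "and then back" step; comparing as `Decimal` also takes care of the insignificant zeroes.

```python
from decimal import Decimal

def converts_exactly(text):
    """Hypothetical check: does this decimal string survive a trip through a double?"""
    as_double = float(text)
    # Decimal(float) yields the double's exact value, so this comparison is exact.
    return Decimal(as_double) == Decimal(text)

for s in ["0.25", "001", "9223372036854775808", "9223372036854775807", "0.1"]:
    print(s, converts_exactly(s))
# 0.25 True
# 001 True
# 9223372036854775808 True
# 9223372036854775807 False
# 0.1 False
```

Note that under this strict test even "0.1" bails out, since it has no exact double representation.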


You're missing the point -- there is no such thing as a precisely correct conversion to a floating point number.

Your suggestion would make type conversion utterly unusable as it would fail seemingly randomly -- for example on simple numbers such as "0.1"


Sure there is such a thing. 0.25 can be precisely converted to float. You're right that such a thing would fail a lot, but that doesn't mean the goal is impossible, merely that achieving it is not very useful.


I did not say it was "impossible," I said it would be "utterly unusable."

I am happy to see you agree with me.


What's the low-level cost to determine whether a given string is "numerical" or not? Also, would "001" be considered a numerical string?

This reminds me of Excel-like programs that by default, automatically detect (and convert) fields that appear to be dates/strings...often to catastrophic effect.


You just hit upon by far my least favorite bug in Excel. Any integers around, I think, 40000, which are quite easy to come upon in various datasets, are automatically "detected" as a date. It makes Excel very dangerous for reading CSV files.



