Thursday, December 23, 2010

Problems with PHP

PHP is a nice language for some tasks. Lots of good software uses it. No other language makes it so convenient to mix code and HTML, which is great for lone web developers who are also programmers. I've found it pretty useful for running my site, mainly because I can so easily put code in the middle of my content and keep the overall per-page authoring overhead down. However, from a pure programming or information theory standpoint, it's got some serious problems:
  1. Namespaces didn't exist at all before PHP 5.3 (this is similar to keeping all your files in one directory). When they were finally added, the separator chosen was the backslash, \, because "there isn't any other character left"...
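  2. Exceptions didn't exist until PHP 5, and even now they aren't used in a useful "deep" fashion: most built-in functions still report failure through return values and warnings rather than by throwing exceptions.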
  3. Built-in and library APIs are a disorganized mess.
    1. There are thousands of symbols in PHP's global namespace. Cleaner languages have only a few dozen built-ins. "Everything is built in" just means it has way too many functions in its core, especially since many are minor variations of each other.
    2. No consistent naming convention is used. Some functions are verb_noun() and others are noun_verb(). Some are underscore_separated, while others are CamelCase or runtogether. Some are prefixed_byModuleName, and others use a module_suffix_scheme. Some use "to" and others use "2" (compare strtotime() with bin2hex()). Compare strpos() with str_replace(), or htmlentities() with html_entity_decode(). If you take a random set of ten library functions, chances are half a dozen different conventions will be included.
    3. PHP tends to provide many similar functions instead of one powerful one. For example, PHP has sort(), arsort(), asort(), ksort(), natsort(), natcasesort(), rsort(), usort(), array_multisort(), and uksort(). For comparison, Python covers the functionality of all of those with list.sort() and its key and reverse arguments (the built-in sorted() takes the same arguments).
    4. PHP includes lots of cruft or bloat. Do we really need a built-in str_rot13() function? Also, a lot of other built-ins are just trivial combinations of each other. Users don't really need case-insensitive variants of every string function (stripos(), str_ireplace(), strcasecmp(), and so on), since there is already strtolower().
    5. Many parts of PHP either deviate from standards, or otherwise don't do what users would expect.
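      For example, exec() returns only the last line of text output from a program (the full output and the exit status are available only through optional by-reference arguments). Why not return the program's return value, like most other languages do? And when would it ever be useful to get only the last line of output?
      Another example: PHP's date() uses its own non-standard format characters ("Y-m-d H:i:s") instead of the strftime() conventions ("%Y-%m-%d %H:%M:%S") shared by C, Perl, and Python.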
  4. The language was generally thrown together without any coherent design, and has accreted features in a messy and complex fashion.
  5. Functions...
    1. Functions cannot be redefined. If I want a set of includes which all use the same interface, I can only use one of them per page load -- there's no way to include a, call a.display(), then include b and call b.display(). I also cannot transparently wrap existing functions by renaming or replacing them.
    2. Functions cannot be nested. (Actually, they can, but a nested function is still declared in the global scope once the outer function runs, so the effect is the same as if it were not nested. All functions are global, period.)
    3. Anonymous functions (lambda) didn't exist before the closures added in PHP 5.3. create_function() is not the same thing. Given two strings, it compiles them into code, binds the code to a new global function, and returns the new function name as a string.
      $foo = create_function('$x', 'echo "hello $x!";');
      $bar = "\0lambda_1";
      $bar("bar"); // sometimes prints "hello bar!", sometimes fails
      Note that the number after "\0lambda_" is not predictable. It starts at one and increments each time create_function is called. The number keeps incrementing as long as the web server process is running, and the counter is different in each server process. The memory for these new global functions is not freed, either, so you can easily run out of memory if you try to make lambdas in a loop.
    4. Function names are case-insensitive.
  6. No "doc strings". Documentation must either be maintained separately from the code or be generated by (rather finicky) third-party tools that parse specially formatted comments.
  7. The documentation...
    • ... is often incorrect or incomplete, and getting the details the documentation left out tends to require reading pages and pages of disorganized user-contributed notes (which are incorrect even more often). Sometimes really important details are left out, such as "this function is deprecated -- use foo() instead".
    • ... is (as of PHP 5.1.2) not included with the source, nor typically installed along with the binary packages. Downloadable documentation is available, but does not match the docs on PHP.net. Specifically, it leaves out all the user-contributed notes, which are important because of reasons mentioned above.
    • ... is not built in. You can't just point an introspection tool at a PHP module and get usage information from it.
    These issues are important because it's not very feasible to use PHP without referring to the documentation frequently. There is very little internal consistency, and even less consistency between modules, so you'll probably spend a lot of time looking through the docs. Simply guessing how things work, based on conventions, usually doesn't work in PHP.
  8. Defaults to pass-by-value. (PHP 5 passes objects by handle rather than by true reference: a function receives a copy of the object identifier, so it can modify the object but cannot rebind the caller's variable.)
  9. Default error behavior is to send cryptic messages to the browser, mid-page, instead of logging a traceback for the developer to investigate.
  10. Many errors are silent.
    For example, accessing a nonexistent variable simply returns NULL (at most it raises an E_NOTICE, which PHP's built-in default error_reporting level hides). Whether this is a Bad Thing is debatable (I believe it's bad), but it can nevertheless interact badly with other aspects of PHP -- such as the inconsistent case sensitivity (variable names are case-sensitive, but function names are not):
    function FUNC() { return 3; }
    $VAR = 3;
    print func(); // produces "3"
    print $var; // produces nothing
  11. The combination list/hash "array" type causes problems by oversimplifying, often resulting in unexpected/unintuitive behavior.

    For example, PHP's weak type system interferes with hash keys:
    Code:
    $a = array("1" => "foo", 1 => "bar");
    echo $a[1], " ", $a["1"], "<br />\n";
    print_r($a);
    Result:
    bar bar
    Array ( [1] => bar )
    After a little experimentation, I see that hash keys cannot be functions, classes, floats (they are silently truncated to integers), or strings that look like integers (they are converted to integers). There are likely other invalid types as well. The only usable key types I've found so far are integers and strings that do not parse as integers. (Note that the parsing used here is different from the automatic string-to-integer coercion used for the "+" operator.)
  12. Awkward, overlapping names can exist: foo (a function or constant) and $foo (a variable) are completely unrelated.
  13. Magic quotes (and related mis-features) make data input needlessly complex and error-prone. Instead of fixing vulnerabilities (such as malformed SQL query exploits), PHP tries to mangle your data to avoid triggering known flaws.
  14. The server-wide settings in PHP's configuration (magic_quotes_gpc, register_globals, safe_mode, and so on) add a lot of complexity to app code, requiring all sorts of checks and workarounds. Instead of simplifying or shortening code (which these features are supposed to do), they actually make it longer and more complex, since it must check that each setting has the expected value and handle situations where it doesn't.
  15. PHP's database libraries are among the worst in any language. This is partially due to a lack of any consistent API for different databases, but mostly because the database interaction model in PHP is broken. The SQL injection issues in PHP deserve particular attention.

    One objection: "How can it be that hard for web developers to check data before it is submitted? I wouldn't imagine trusting the data that an anonymous user can enter into my website. So maybe I'm just trained to check data. Of course, I'm also glad I use MySQL with PHP, where a simple mysql_real_escape_string can prevent any popular SQL Injection attempt."


    You're glad that you use pretty much the only language where escaping is not done automatically for you, but which instead forces you to use a function with a name like mysql_real_escape_string()? And which also has a similarly named function without the "_real_" that doesn't do the job right? "Just kidding with that other one, here's the real one!"
  16. The performance is crippled for commercial reasons (Zend). Free optimizers are available, but aren't default or standard.
  17. Bad recursion support. Browse bug 1901 for an example and some details. BTW, ever heard of tail recursion? They might have mentioned it in the "Intro to Computer Science" course.
  18. Not thread safe.
  19. No native Unicode support; strings are plain byte sequences. It's planned for PHP 6, but that could be a long time away.
  20. Vague and unintuitive automatic coercion; "==" is unpredictable, and "===" does not solve all the problems caused by "==". According to the manual, "==" returns true if the operands are equal, and "===" returns true if the operands are equal and of the same type. But that's not entirely true. For example:
    Two different strings are equal... sometimes.
    "1e1" == "10" => True
    "1e1.0" == "10" => False
    So, they're "equal and of the same type", right?
    "1e1" === "10" => False

    Unexpected results:
    "1 two 3" == 1 => True
    1.0 === 1 => False
    "11111111111111111117" == "11111111111111111118" => True

    Equality is (apparently) not transitive:
    $a = "foo"; $b = 0; $c = "bar";
    $a == $b => True
    $b == $c => True
    $a == $c => False
    Further, the coercion rules change depending on what you're doing. The behavior of "==" is not the same as that used for "+" or for making hash keys:
    "22 cream puffs" == "22 bullfrogs" => False
    "12 zombies" + "10 young ladies" + "bourbon" == "22 cream puffs" => True
    Even though math asserts that, if A minus B equals zero, then A must equal B, PHP disagrees:
    "bourbon" - "scotch" => 0
    "bourbon" == "scotch" => False
  21. Variable scoping is strange, inconsistent, and inconvenient -- particularly the notably unusual "global" scope (a function cannot see a global variable unless it imports it with the global keyword), which gave rise to kludges like "superglobal" or "autoglobal" variables as workarounds.

    Further, there are only two variable scopes, global and function-local; there is no block scope.
  22. The mixture of PHP code with HTML markup tends to make code difficult to read. Readability is important.
  23. Various "features" cause very unusual behavior and add complexity. This tends to cause bugs for programmers who expect it to behave like other languages.
    For example, this will fail sporadically: Open a file. Write to it. Close it. Open it. Read from the file. To make this actually work, the programmer must A) know it will fail, B) have some clue why it fails (PHP caches the results of stat() calls, such as those behind filesize() and file_exists()), and C) call the correct function (clearstatcache()) before re-opening the file. Note that the online docs aren't much help -- searching for "cache" takes the viewer to the docs for cosh(), but returns nothing at all related to files or caches.
  24. It provides no way to log errors verbosely while displaying only critical errors to the user. Further, some of the most critical errors (such as running out of memory) give absolutely no response to the user -- not even a blank page.
  25. Poor security, and poor response to security issues. This is a large and detailed topic, but regardless of whether it's caused by inexperienced programmers or by PHP itself, the number of PHP-related exploits is rather high. And according to a PHP security insider, the effort is futile.
  26. Its object model is (still) very lacking, compared to other systems.
  27. Most of the development since v3 seems to be devoted to damage control, and dealing with earlier mistakes... not a good sign.
  28. In general, it has a tendency to create more problems than it solves.
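Point 3.3 says Python covers all of PHP's sort variants with list.sort(). Here is a minimal Python sketch of that claim, using list.sort() and the built-in sorted(), which take the same key and reverse arguments. The PHP-to-Python pairings in the comments are approximate. natural_key() is a hypothetical helper written for this example, not a standard library function.

```python
import re
from functools import cmp_to_key


def natural_key(s):
    """Hypothetical helper: turn "img12.png" into ["img", 12, ".png"] so that
    runs of digits compare as numbers, ignoring case (roughly natcasesort())."""
    parts = re.split(r"(\d+)", s)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ["img12.png", "IMG10.png", "img2.png", "img1.png"]

plain = sorted(files)                      # roughly sort()
reverse = sorted(files, reverse=True)      # roughly rsort()
natural = sorted(files, key=natural_key)   # roughly natcasesort()

prices = {"banana": 3, "apple": 1, "cherry": 2}
by_value = sorted(prices.items(), key=lambda kv: kv[1])  # roughly asort()
by_key_desc = sorted(prices.items(), reverse=True)       # roughly krsort()


def by_length(a, b):
    # An old-style comparison callback, the kind usort() expects.
    return len(a) - len(b)


by_len = sorted(["ccc", "a", "bb"], key=cmp_to_key(by_length))  # roughly usort()

# list.sort() does the same job in place.
files.sort(key=natural_key)
```

One entry point with a key function replaces most of the variants. A key function is also usually simpler to write than a comparison callback.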
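Point 3.5 contrasts PHP's exec() with what other languages give back. As a hedged illustration, Python's subprocess.run() returns the exit status and the complete output together in one result object. The child command here is a throwaway Python one-liner, chosen only so the example runs anywhere Python does.

```python
import subprocess
import sys

# Run a child process that prints two lines and exits with status 3.
result = subprocess.run(
    [sys.executable, "-c", "print('first'); print('last'); raise SystemExit(3)"],
    capture_output=True,
    text=True,
)

exit_status = result.returncode          # the program's return value
all_lines = result.stdout.splitlines()   # every line of output, not just the last
```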
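Point 5 lists things PHP functions cannot do. For contrast, this short Python sketch shows the three missing pieces:

- a nested function that closes over its enclosing scope;
- an anonymous function that is a real object rather than a generated global name;
- an existing function wrapped by rebinding its name.

The names make_greeter, display, and shout are hypothetical, made up for the example.

```python
def make_greeter(greeting):
    # greet() is local to make_greeter and remembers `greeting` (a closure).
    def greet(name):
        return f"{greeting} {name}!"
    return greet


hello = make_greeter("hello")
message = hello("bar")  # the same result on every call

# An anonymous function is an ordinary object. Nothing is added to a global
# symbol table, and it is garbage-collected like any other value.
squares = list(map(lambda x: x * x, range(5)))


def display():
    return "plain"


# Wrap an existing function by rebinding its name to a new function that
# calls the original one.
def shout(func):
    def wrapper():
        return func().upper()
    return wrapper


display = shout(display)
wrapped = display()
```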
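Points 6 and 7 complain that PHP has no doc strings and no way to pull usage information out of a module with an introspection tool. Python works differently. The first string literal in a function body becomes its documentation, and the standard inspect module can read it back along with the call signature. The function rect_area() is a made-up example.

```python
import inspect


def rect_area(width, height=1):
    """Return the area of a width-by-height rectangle."""
    return width * height


doc = inspect.getdoc(rect_area)                 # the docstring above
signature = str(inspect.signature(rect_area))   # "(width, height=1)"
# help(rect_area) prints the same information interactively.
```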
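Points 9 and 24 describe what the author would rather see: a full traceback in the developer's log and only a short message for the user. Below is a minimal Python sketch of that pattern using the standard logging module. handle_request() and the in-memory log buffer are hypothetical stand-ins for a real request handler and a real log file.

```python
import io
import logging

# Stand-in for a log file: collect log output in memory.
log_buffer = io.StringIO()
logger = logging.getLogger("webapp")
logger.addHandler(logging.StreamHandler(log_buffer))
logger.propagate = False  # keep this output away from the root logger


def handle_request():
    try:
        return str(1 / 0)  # a bug in the page code
    except ZeroDivisionError:
        # logger.exception() logs at ERROR level and appends the traceback.
        logger.exception("request failed")
        return "Sorry, something went wrong."


page = handle_request()           # the user sees only the short message
log_text = log_buffer.getvalue()  # the developer gets the full traceback
```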
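Point 10's example relies on PHP silently returning nothing for an undefined variable while matching function names case-insensitively. Below is a rough Python translation of the same snippet, offered for comparison rather than as a claim about PHP. Both wrong-case lookups raise NameError, because Python treats function and variable names alike: case-sensitively, with an error for anything undefined.

```python
def FUNC():
    return 3


VAR = 3
errors = []

try:
    func()  # deliberately the wrong case
except NameError as exc:
    errors.append(str(exc))

try:
    print(var)  # deliberately the wrong case
except NameError as exc:
    errors.append(str(exc))
```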
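Point 11's array example collapses the keys "1" and 1 into a single entry. Python's dict, offered here only as a point of comparison, keeps them distinct. It accepts any hashable value as a key, including floats, tuples, and functions. It has its own, narrower wrinkle: keys that compare equal, such as 1, 1.0, and True, share one slot.

```python
a = {"1": "foo", 1: "bar"}  # two distinct keys: a str and an int
a[2.5] = "a float key"      # not truncated to 2
a[(3, 4)] = "a tuple key"
a[len] = "a function key"

# 1, 1.0 and True compare equal, so they end up as a single entry.
same_slot = {1: "int", 1.0: "float", True: "bool"}
```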
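Point 15 argues that the real problem is PHP's database interaction model, not the escaping function. The usual alternative is the parameterized query. The SQL text and the data travel separately, so there is nothing to escape. Below is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The users table and the attack string are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A classic injection attempt, typed into a form field by an anonymous user.
evil = "nobody' OR '1'='1"

# The ? placeholder sends `evil` as data, never as SQL, so its quote
# characters cannot change the meaning of the query.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()

# The same input is stored and matched as an ordinary value.
conn.execute("INSERT INTO users (name) VALUES (?)", (evil,))
stored = conn.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
conn.close()
```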
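For contrast with point 20, here are similar comparisons in Python, which never converts strings to numbers implicitly:

- == between a string and a number is simply False.
- Arithmetic on non-numeric strings raises TypeError.
- A numeric comparison has to be asked for with an explicit conversion.

This illustrates one alternative design and says nothing about how PHP behaves.

```python
same_text = "1e1" == "10"                  # False: two different strings
same_number = float("1e1") == float("10")  # True: conversion was requested
str_vs_int = "1 two 3" == 1                # False: a str never equals an int
int_vs_float = 1.0 == 1                    # True: numbers compare by value
# Python ints have arbitrary precision, so these stay distinct.
big_ints = int("11111111111111111117") == int("11111111111111111118")
foo_vs_zero = "foo" == 0                   # False, so no transitivity trap

try:
    "bourbon" - "scotch"
    subtraction = "allowed"
except TypeError:
    subtraction = "TypeError"  # strings cannot be subtracted at all
```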
I would not recommend using PHP, except as a template language for HTML. It's very good at that, so long as you keep the complexity of related code down. It's more powerful and (IMHO) more convenient than strict template languages like TAL, but cannot compete with "normal" scripting languages like Python, Perl, Ruby, and Lisp. PHP is a language optimized for a purpose, at the expense of all other uses. It's very good at what it was originally designed for, but has become stretched way too far since then.

This is waxing philosophical, but in my experience, PHP has an uncomfortably low ceiling. Programming isn't just about putting one instruction after another; it's about building abstractions to better represent and solve problems. The more complex the problem, the higher the level of abstraction needed to solve it cleanly. With PHP, I often hit my head on its low ceiling of abstraction, and it seems to require a great deal more effort and discipline (than in other languages) to avoid ducking down into the details of implementation when I should be focusing on the upper-level design.
