Code Like a Pythonista: Idiomatic Python (python.net)
237 points by ashishgandhi on Oct 22, 2011 | 56 comments


One thing that has always bugged me about PEP8 is the advice about trying to keep line length to <80 characters. It seems a little dated.

I basically just completely ignore this part of the guidance, even though I know that in theory it makes it hard to edit code via a terminal. But this is not something we ever do in practice where I work - code on servers gets there only by being checked out of SVN.

I personally find it incredibly irritating when I see people who've split a function call over a couple of lines just to try and meet this criterion, e.g.:

  def __init__(self, first, second, third,
               fourth, fifth, sixth):
Note that if you're creating functions with very long signatures you're probably doing something wrong.

About the only time I make any kind of concession to the line length guidance is when I initialise a big dictionary, e.g.:

  my_dict = {
      ham: eggs,
      bar: quux,
      # Snip 10 lines of code
      rhubarb: custard
  }
(There are often better ways to do this too, but sometimes it's necessary for clarity to build a dictionary all at once like this.)

I'm perfectly happy to initialise dictionaries with a small number of keys just like this:

  my_dict = { ham: eggs, bar: quux, rhubarb: custard }


It's easier to read something quickly if your eyes don't have to travel far. This is why newspapers (remember them?) are written in multiple columns across the page.

This may be less of an issue with code, due to its structure, but I still find it much easier to read 80 column code than code that stretches across the screen.

If I'm reading code in an 80 column terminal I prefer the line breaks to come from the human that wrote it, to show and reveal structure, rather than having it arbitrarily broken at 80 columns by the editor.

Another reason is that most of your code won't stretch across the screen, but you need your window sized for the small amount of code that does, losing a bunch of screen estate to empty space.

But I don't worry too much about it.


Code is not prose. The layout rules that apply to a written article do not (necessarily) apply.

I prefer code that can be quickly scanned, and hard wrapping at 80 columns (IMO) doesn't facilitate this.

If someone doesn't like it, they can turn on text wrap.


"I prefer code that can be quickly scanned, and hard wrapping at 80 columns (IMO) doesn't facilitate this."

Then you shouldn't hard wrap at 80. Subject to any agreements with your team members. :)

I'm just speculating on why some people recommend breaking at 80, which was originally the width of a punch card and so has nothing much to do with readability in and of itself. Fundamentally, 80 is probably just the most likely common denominator (MLCD).


This is an interesting bit of information. I sometimes find that an equation is harder to read after breaking it in many parts due to the 80 columns limit.

It seems there is a compromise to make between the readability of single statements versus the readability of the code as a whole.
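
As a rough illustration of that trade-off (a sketch in Python 3 syntax; the function and its parameter names are made up for the example, not taken from the article), naming the sub-expressions can keep each line short without chopping the equation at arbitrary points:

```python
import math


def quadratic_roots(a, b, c):
    """Return the two real roots of a*x**2 + b*x + c = 0 (assumes they exist)."""
    # One long line would also work, but naming the pieces keeps each
    # statement readable and well under 80 columns.
    discriminant = b * b - 4 * a * c
    sqrt_disc = math.sqrt(discriminant)
    denominator = 2 * a
    # Parentheses allow the continuation without a backslash.
    return ((-b + sqrt_disc) / denominator,
            (-b - sqrt_disc) / denominator)


roots = quadratic_roots(1, -3, 2)
```

Whether this reads better than a single 100-column line is still a judgment call; the point is only that the break falls where the author chose it.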


Personal preference can confound any rule of thumb.

Cherry picked from here: https://duckduckgo.com/?q=readability+%22column+width%22

https://en.wikipedia.org/wiki/Column_%28typography%29

"For best legibility, typographic manuals suggest that columns should contain roughly 60 characters per line.[1] One formula suggests multiplying the point size of the font by 2 to reach how wide a column should be in picas[2] — in effect a column width of 24 ems. Following these guidelines usually results in multiple narrow columns being favored over a single wide column.[3]"

http://psychology.wikia.com/wiki/Readability

"Ease-of-reading is the result of the interaction between the text and the reader. In the reader, those features affecting readability are 1. prior knowledge, 2. reading skill, 3. interest, and 4. motivation. In the text, those features are 1. content, 2. style, 3. design, and 4. structure[1]. The design can include the medium, layout, illustrations, reading and navigation aids, typeface, and color. Correct use of type size, line spacing, column width, text-color-background contrast and white space make text easy to read. "


People seem to dismiss this rule as "dated" without actually providing much argument! The reason to keep it is that with a larger screen you can fit more than one file on the screen at once, and the usability of that massively trumps any minor line-breaking issues.


Except if you're over 40 and have to use a large font. My font is set so that 80 characters is full-width; those who assume 132 characters and wrap mostly around 100 make for an incredibly horrible code-reading experience.

I also print out complex code fragments or read them on a tablet. In this format 80 characters or less is also quite excellent.

Finally, I try to limit lines to 76 characters. That lets you quote code with "> > " in e-mail conversations without having it auto-wrap in mutt.


i love the 80char limit. it makes things nice in a terminal, it lets me tile lots of windows across my screen, and it prevents excessively clever but entirely unreadable one-liners.


The original reason may be dated, but there are other places where limiting line length is still helpful.

For example, I'll often have my editor open and docs open in a web browser right next to it.

It also helps when viewing side by side diffs.


Some people like to split windows vertically, and shorter lines help them, but splitting lines breaks greppability, and I am always wondering how to indent the continuation lines properly.


I used to disregard this for the same reasons you are disregarding it. But at some point I realized that

(a) I prefer big font sizes (~13 pt regardless of with/without glasses)

(b) I like to have two columns of code next to each other.

(c) IDEs love to cramp up my work space with vertical side bars.

All that conspires to put horizontal space at a premium. I usually go for 80 chars now. Some languages are more verbose than others though. Cocoa at 80 chars just looks wrong, so in that case, I will take 100 chars.

Addendum: (d) Vim sucks at horizontal scrolling.


I dislike the 80 character limit, too. But having no limit at all would be a disaster for readability.

I believe a 100 or 120 character limit is a nice compromise.


The 80-char limit recommendation is a thing I regularly sin against; most of the time caused by a comment behind a statement.


I have just skimmed this so far, but it looks very good.

A lot of the Python scripts that I see at work look like either C programs or glorified batch files. I'll definitely point people to this when they are ready and willing to move on to the next level.

Well done.

Suggestion: one thing I didn't see mentioned is switching from "if s.find(c) == -1: ..." to "if c not in s: ...". I see people use the find form a lot.


You just made my weekend. I've been doing Python for a while and am well acquainted with the "item in list" idiom but did not know it could be used for find. That str.find syntax has always bothered me.

Still, I had to test just to be sure:

    test_cases = [
        # haystack, needle, expect
        ('abcdefg', 'a', True),
        ('abcdefg', 'b', True),
        ('abcdefg', 'bcd', True),
        ('abcdefg', 'h', False),
    ]
    
    for haystack,needle,expect in test_cases:
        # find version
        is_found = haystack.find(needle) != -1
    
        # in version
        is_in = needle in haystack
    
        # confirm
        print haystack, needle, expect, '-->', (is_found, is_in)
        assert is_found == expect
        assert is_found == is_in
Passed! Thanks.


"one thing I didn't see mentioned is switching from "if s.find(c) == -1: ..." to "if c not in s: ...". I see people use the find form a lot."

I think this is covered implicitly by the two "Use in where possible" sections. Although, as Tim Peters's The Zen of Python states (it is also quoted in the tutorial), "explicit is better than implicit." If people are doing that despite knowing about the in keyword, then I suppose that just proves Tim's point.


I don't know. I'd argue that

  if c not in s: ...
is more explicit (and readable) than

  if s.find(c) == -1: ...


Read from the perspective of someone very much still learning to code, this was excellent. I particularly valued the idiomatic comparisons to 'naive' implementations (the kind I would initially reach for). One of the later sections on lazy generation also opened up some new doors for me, and gave me a nice solution to a problem I've been having with a piece of code.


Is it really idiomatic to use intrinsic truth values (eg, using "if x" instead of "if x != 0")?

Here are my arguments against it:

- "Explicit is better than implicit."

- If there's need for a table as reference for what is intrinsically true and what is intrinsically false, then it's clearly not "elegant" enough for people to just understand from looking at the code.

- Conventions vary across languages. You're basically forcing anyone who looks at your code who's not used tons of Python (and who would otherwise completely understand the code) to look up that table.


- But readability counts. Having written a lot of Java and Obj-C lately, I honestly prefer the ability to write "if (foo && foo->bar && foo->bar->baz)" in the latter.

- Frankly, it's not a very complicated table. "Zero, empty, or none" summarizes it pretty well. http://docs.python.org/release/2.5.2/lib/truth.html

- Of course conventions vary across languages. If you're familiar with perl, awk, or ruby, some of these conventions will be familiar to you. If you're more familiar with Java, they'll feel odd and you'll feel an uncertainty using them.
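
To make the "zero, empty, or none" summary concrete, here is a small sketch in Python 3 syntax (an assumption; the thread predates it). The sample values and the helper `describe` are hypothetical, chosen only for illustration:

```python
# Values that are false in a boolean context: zero, empty containers, None.
falsy_values = [0, 0.0, "", [], (), {}, set(), None, False]

# Anything else is true, including non-empty containers of falsy things.
truthy_values = [1, -1, "0", [0], (None,), {"k": None}, True]

all_falsy = all(not value for value in falsy_values)
all_truthy = all(bool(value) for value in truthy_values)


def describe(items):
    # "if items" reads as "if there is anything in items".
    if items:
        return "%d item(s)" % len(items)
    return "empty"
```

One caveat: `if x` cannot tell `0` from `None`, so when that distinction matters, `if x is not None` is the explicit form.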


In case there are Pythonistas here who will indulge a question: one thing I've never figured out how to do elegantly ("idiomatically", I suppose) is create a list of predetermined size initialized with a constant (usually zeros). I always end up with something kludgey like:

    x = [0 for i in xrange(100)]
which seems so roundabout it can't possibly be the right solution.


    x = [0] * 100


Just make sure that when you do this, you're not initializing it with mutable objects, e.g. lists.

    x = [[0]] * 2
    print x[0], x[1] # prints [0] [0]
    x[0].append(1)
    print x[0], x[1] # prints [0, 1] [0, 1]
That's because the objects inside the repeated list are simply references to the same object internally.
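
A minimal sketch of one common way around this, in Python 3 syntax (an assumption; the snippet above is Python 2): build the inner lists with a list comprehension so each slot gets its own object. The variable names are made up for the example.

```python
# Repetition copies references, so both slots point at the same inner list.
shared = [[0]] * 2
shared[0].append(1)

# A list comprehension evaluates [0] once per iteration,
# so every slot gets its own inner list.
independent = [[0] for _ in range(2)]
independent[0].append(1)
```

Here `shared` ends up as two views of one list, while only the first element of `independent` changes.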


Thanks! At first I wondered about efficiency but it seems like I can allocate lists with tens of millions of elements this way without any spike in CPU so it is obviously implemented efficiently. I knew there would be a better way :-)


This is much better than my own comment - I never even knew about that syntax!

Is it efficient though?


My guess is that it's optimized, but I don't know. Semantics-wise, it's consistent with list addition.

    >>> [0] + [0] + [0]
    [0, 0, 0]
    >>> [0] * 3
    [0, 0, 0]


I just read, that syntax is sugar for the list.extend() method so it's still making copies of the list - I wouldn't use it for a list containing complex objects; but that is very idiomatic for simple use cases (like this one?).


It isn't syntax - help(list.__mul__)
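
In other words, `*` on a list is ordinary operator dispatch rather than special syntax. A short sketch in Python 3 syntax (variable names are hypothetical) showing a few equivalent spellings:

```python
import operator

zeros = [0] * 3
same_via_method = [0].__mul__(3)          # the method the * operator dispatches to
same_via_operator = operator.mul([0], 3)  # function form of the same operator
reflected = 3 * [0]                       # int * list falls back to list.__rmul__
```

All four names end up bound to equal lists.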


Yeah, that is kludgy (you're creating two lists now!).

First thing I would do is ask why I need that; I've never come across a case where I needed a pre-built list of items all initialized to a common or empty value - if you're doing that, I would explore some sort of custom generative process (building the items of your list as you need them). But, just because I can't think of a good use case for this (where I couldn't use generative recursion or iteration) doesn't mean you haven't. Weigh what you are doing.

I would do this one of two ways. The first is to create a function that will produce an object that generatively builds the list (like xrange, but taking an "initializer" value, like myxrange(0, 100, initialize=0)). See the source for Python's xrange to do this; it should be quite easy.

Or do this in a while loop (flat, easy to understand, and you aren't being too naughty by producing more than one list):

    ls  = []
    cnt = 0
    
    while cnt < 100:   # < rather than <= so the list gets exactly 100 items
        ls.append(0)
        cnt += 1
    
Alternatively you can do the above with a for loop but you would have to iterate over a sequence (which means creating a list to do it); the while loop is going to be the most efficacious way of doing what you want I think, excluding building a custom xrange style implementation.


Actually [x for x in xrange(..)] will produce only one list. In Python 2.X, range() returns a list while xrange() an iterable.

If you just need an iterable, itertools.repeat(0, 100) will do the trick.
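
A quick sketch of how that behaves, in Python 3 syntax (an assumption; variable names are made up for the example):

```python
from itertools import repeat

lazy_zeros = repeat(0, 100)          # an iterator; no list is built yet
total = sum(lazy_zeros)              # consumes the iterator
leftover = list(lazy_zeros)          # nothing left after consumption

materialized = list(repeat(0, 100))  # build a real list only if one is needed
```

Note that the iterator is single-use, which is the main practical difference from `[0] * 100`.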


I suppose I didn't say it correctly, you're still maintaining two objects though (xrange the iterable, and the new list being built by the comprehension). The "optimization" I suppose is a moot point because Python's handling of lists is pretty solid and this isn't the 90's anymore, but at least with the while loop you've only got one list object and an integer (which is faster, I would think, to increment than it is to produce the next sequence in the iterable object?).

Anywho, splitting hairs. Goladus had the most helpful comment (I even learned something).


This is way premature optimization, and it's based on a lot of assumptions. Did you know that, on CPython, your integer increment creates a new object each time (outside of the range -5 to 256 or so, IIRC)? Now, xrange might do its own increment, so let's call it even on object creation.

If either xrange or list comprehensions are implemented in C instead of Python, do you still think your version will run quicker? What do you think the likelihood of either or both of these being the case is on your Python implementation? How many name lookups and function calls do you think each version does? Do you think you should find out before writing longer and more complicated code to attempt to out-perform it?

On my machine, your implementation performed ~3 times slower than the naive idiomatic "[0 for _ in xrange(100)]" and closer to 4 times slower when I bumped the list size up to 20000. And your version was ~32 times slower than "[0] * 100" and around 60 times slower when I bumped the list size up to 20000.

So please, don't optimize without measuring and instead just write idiomatic code the first time.

The code, for reference:

    def mk_list_1(size):
        ls  = []
        cnt = 0

        while cnt < size:  # < so the list has exactly size items
            ls.append(0)
            cnt += 1

        return ls


    def mk_list_2(size):
        return [0 for i in xrange(size)]

    def mk_list_3(size):
        return [0] * size

    from timeit import timeit
    args = {
        "number":1000000,
        "setup":"from __main__ import mk_list_1, mk_list_2, mk_list_3"
    }
    print "Executing %i runs:" % args["number"]
    print "mk_list_1 took %i s" % timeit('mk_list_1(100)', **args)
    print "mk_list_2 took %i s" % timeit('mk_list_2(100)', **args)
    print "mk_list_3 took %i s" % timeit('mk_list_3(100)', **args)
Output on my machine:

  Executing 1000000 runs:
  mk_list_1 took 32 s
  mk_list_2 took 10 s
  mk_list_3 took 1 s
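
For anyone who wants to repeat the measurement on a current interpreter, here is a rough Python 3 port (an assumption; the original is Python 2). The function names are renamed for clarity and are not from the comment above, the run count is reduced so it finishes quickly, and the absolute numbers will differ from the ones quoted:

```python
from timeit import timeit


def mk_list_while(size):
    ls = []
    cnt = 0
    while cnt < size:
        ls.append(0)
        cnt += 1
    return ls


def mk_list_comprehension(size):
    return [0 for _ in range(size)]


def mk_list_repeat(size):
    return [0] * size


candidates = [mk_list_while, mk_list_comprehension, mk_list_repeat]
timings = {}
for func in candidates:
    # Passing a callable avoids the "from __main__ import ..." setup string.
    timings[func.__name__] = timeit(lambda: func(100), number=10000)

for name, seconds in timings.items():
    print("%s took %.4f s" % (name, seconds))
```

The ranking, not the exact ratio, is the thing to look at; measure on your own machine before drawing conclusions.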


Fair enough, thanks for the comment.


Just a note: the optparse module is deprecated; in Python 2.7+, argparse is preferred.


Doesn't really matter though, because using argparse means an additional dependency on Python versions before 2.7. In practice nobody really uses argparse atm.


It's worth knowing given the intent of the original post.
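
For readers who haven't used it, a minimal argparse sketch in Python 3 syntax (an assumption). The command-line interface here, a `name` argument and a `--count` option, is hypothetical and only for illustration:

```python
import argparse

# Hypothetical command-line interface, only for illustration.
parser = argparse.ArgumentParser(description="Greet someone.")
parser.add_argument("name", help="who to greet")
parser.add_argument("-c", "--count", type=int, default=1,
                    help="how many times to greet (default: 1)")

# Passing a list instead of reading sys.argv keeps the example self-contained.
args = parser.parse_args(["World", "--count", "2"])
greeting = " ".join(["Hello, %s!" % args.name] * args.count)
```

In a real script you would call `parser.parse_args()` with no arguments so that it reads the actual command line.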


He had me up until the part where he wrote an algorithm to specifically leave out the Oxford Comma. :)

Kidding aside, this was a great read.


As he remarks, he doesn't take the corner cases into account. Surprisingly, or maybe not, his concatenation statement is similar to what I regularly use:

  def commalist(f, endword="and"):
      "Concatenate a list of strings in the form of 'Red, Yellow, Green and Blue'."
      if len(f) == 0:
          return ""
      elif len(f) == 1:
          return f[0]
      else:
          return ", ".join(f[:-1]) + " " + endword + " " + f[-1]
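
For those who do want the Oxford comma, a possible variant is sketched below in Python 3 syntax. The function name `commalist_oxford` and the `oxford` flag are hypothetical additions, not part of the article or the snippet above:

```python
def commalist_oxford(items, endword="and", oxford=True):
    """Join strings like 'Red, Yellow, Green, and Blue'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        # Two items never take a comma: 'Red and Blue'.
        return "%s %s %s" % (items[0], endword, items[1])
    # With three or more items, the flag decides whether a comma
    # precedes the final conjunction.
    separator = ", " if oxford else " "
    return ", ".join(items[:-1]) + separator + endword + " " + items[-1]
```

With `oxford=False` it falls back to the comma-free form before the end word.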


This might be completely trivial, but can someone explain to me how one Python idiom is not to mix tabs and spaces, and another is to align arguments over several lines when needed like this:

  def __init__(self, first, second, third,
               fourth, fifth, sixth):
I see similar examples elsewhere, and I don't see how you can achieve the alignment without mixing in spaces - at least in some cases. I, too, prefer to align arguments, but it looks bad in places like GitHub and Pastebin, when my Sublime Text-written code gets uploaded.

Am I missing something here? The two just seem mutually exclusive in most cases (where the alignment position's required spaces % 4 != 0). Is the indentation done with just spaces then?


Yes, indentation is done just with spaces. Tabs are not used. Typically you set your text editor to have the tab key print 4 spaces.


Ah, there lies the rub. Thanks a bunch.


EDIT: Emacs is pretty intelligent when it comes to indenting Python code; it does not merely insert 4 spaces for a TAB, but it automatically aligns your code to the Python expression in your previous line!

I don't know which one is the "right" way, but I certainly prefer Emacs's default behaviour. Maybe I just didn't configure my Vim properly :-)

Example:

    In Emacs:
    def _five(one,two,
    ....,,,,..three,four):


    In Vim, with 4 space Tabs:
    def _five(one,two,
    ....,,,,|hree,four):

    | => cursor position when you press ENTER.


I really liked the part about the for ... else statement: the body of the else clause runs after the loop finishes, but not if a break statement was executed.

I know about this construct, but whenever I have a problem that requires it, I always forget and use the old:

    if counter -1 == len(obj):
        ...
ugh.
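
For comparison, a minimal for ... else sketch in Python 3 syntax (an assumption). The function `find_first_negative` is a made-up example, not code from the article:

```python
def find_first_negative(numbers):
    for index, value in enumerate(numbers):
        if value < 0:
            result = "first negative at index %d" % index
            break
    else:
        # Runs only when the loop finished without hitting break,
        # including the case where the sequence is empty.
        result = "no negatives"
    return result
```

No counter and no after-the-loop comparison are needed; the else clause is the "nothing was found" branch.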


I wrote an article about the same subject, albeit much shorter and from a slightly different perspective: http://www.michielovertoom.com/python/ugly-python/


Can anyone suggest any other materials or resources in the same vein as this?



Thanks for the reply -- I should have been more specific. I'm after something that similarly contrasts naive and idiomatic styles.


You mean Python specifically or for other programming languages?


Python, specifically.


  * Idioms and Anti-Idioms in Python (http://docs.python.org/howto/doanddont.html)
  * Python Idioms and Efficiency  (http://jaynes.colorado.edu/PythonIdioms.html)
  * Python Style Guide (http://www.python.org/dev/peps/pep-0008/)
  * Google Python Style Guide (http://google-styleguide.googlecode.com/svn/trunk/pyguide.html)
  * Pocoo Style Guide (http://www.pocoo.org/internal/styleguide/)


In the Java ecosystem, they call it best practices. But of course using the phrase "best practices" these days is considered taboo. Thus the use of the word "idiomatic"?


A "best practice" is an optimal way to do something in a given context. An "idiom" is a commonly accepted way to express the thing you want to do. Best practices and idioms overlap, but are not identical.


A "best practice" is essentially the same thing as "idiomatic python" in my opinion - I could pick apart different meanings from each but often times, it really is used to say the same thing.

The phrase "best practice" isn't taboo in my experience - I'll often ask what the "best practice would be for implementing XYZ". If you rephrase that, "what is an idiomatic way to implement XYZ", it's saying the same thing.

Python's community probably leans toward this diction because the culture of python revolves around the nebulous concepts of simplicity, obviousness, and "only one way", that in order to make those concepts more hylic and less "abstract and subjective" there is a lot of education (in the form of docs, PEP8, the very doc linked to in this post) to set the stage for what the expected idiom is.


http://dictionary.reference.com/browse/idiom

An idiom is rather a preferred practice.

If you take some time to read through some Python source, you'll notice that, although the code uses the exact same constructs that you are used to in Java, it does so in a manner you might find unfamiliar or unpredictable.

One example is the use of try/except instead of if/else (beg for forgiveness rather than ask permission). Another is Duck-typing rather than Interface (if it quacks like a duck...)
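
A small sketch of both idioms, in Python 3 syntax (an assumption). Every name here (`get_port_lbyl`, `get_port_eafp`, `Duck`, `Robot`, `make_it_speak`, and the default port) is hypothetical and chosen only for illustration:

```python
def get_port_lbyl(config):
    # "Look before you leap": check first, then act.
    if "port" in config:
        return int(config["port"])
    return 8080


def get_port_eafp(config):
    # "Easier to ask forgiveness than permission": act, then handle failure.
    try:
        return int(config["port"])
    except KeyError:
        return 8080


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No interface or isinstance check: anything with speak() will do.
    return thing.speak()
```

Both `get_port_*` functions return the same results; the difference is only in which style a Python reader expects to see.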

Coming from Java and looking at some Python code without forewarning or context, you might just be tempted to label the whole thing as really really bad practice, when in truth, it's just Idiomatic Python.


To be fair, though, some of the things on that list really do seem more like best practices, like advice not to use:

    from module import *



