
For Jax I believe this is false.

Jax is composable. In fact it’s a core design goal. Jax arrays implement the Numpy API. I routinely drop Jax arrays into other python libraries designed for Numpy. It works quite well. It’s not effortless 100% of the time but no library interop is (including Julia multiple dispatch).

I can introspect Jax. Until you wrap your function with jit(foo) it’s as introspectable as any other Python code, at least if I’m understanding what you mean by introspection.

Jax has implemented most of the Numpy functions, certainly most of the ones anyone needs to use on a regular basis. I rarely find anything missing. And if it is, I can write it myself, in python, and have it work seamlessly with the rest of Jax (autodiff, jit, etc)
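For example, this kind of composition works out of the box (a minimal sketch; the loss function and names are my own illustration):

```python
import jax
import jax.numpy as jnp

# A function written against the NumPy-style API; nothing JAX-specific inside.
def logistic_loss(w, x, y):
    p = 1.0 / (1.0 + jnp.exp(-x @ w))
    return -jnp.mean(y * jnp.log(p) + (1 - y) * jnp.log(1 - p))

# The same plain-Python function composes with autodiff and jit unchanged.
grad_fn = jax.jit(jax.grad(logistic_loss))

w = jnp.zeros(3)
x = jnp.array([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
y = jnp.array([1.0, 0.0])
g = grad_fn(w, x, y)
print(g.shape)  # gradient has the same shape as w
```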


Jax is awesome. But supporting most of numpy isn't enough, because numpy isn't composable. You want to add banded-block-banded matrices to numpy? Then you need to fork numpy (or in this case fork jax); in julia this is a package, and it works with everything. You want to add names to your array dimensions like PyTorch recently did? Then, like PyTorch, you need to fork numpy; again, in julia this is a package. You want to do both? You have to merge those two forks into each other. In julia this isn't even a package; it's just using the aforementioned two packages together.

You want to work with units or track measurement error (or both)? Basically the same story, except better in some ways and worse in others. Better because you don't have to fork numpy; it is extensible enough to allow that, and packages exist that use that extensibility for exactly this. Worse because those are scalar types: why are you having to write code to deal with array support at all? Again, two julia packages, and they don't even mention arrays internally.

The problem's not Jax. The problem is numpy. Or rather, the problem is that this level of composability is really hard in most languages (and the python + C combo makes it especially hard).

It's true that this is not trivial 100% of the time with julia's multiple dispatch, but it is truer there than anywhere else I have seen.


How does Jax lose composability or introspection?


E.g. jax does not autodifferentiate anything that is not jax (scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries). Compare that to Zygote.jl


It is true that Jax cannot differentiate through C code. But it can differentiate through python code that was written to accept Numpy.
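Concretely (a sketch; the function here is my own example of "plain python written for numpy"):

```python
import jax

# Plain-Python Horner evaluation, written with NumPy users in mind:
# only arithmetic operators, no JAX imports needed by its author.
def horner(coeffs, x):
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# JAX differentiates straight through it via duck typing.
dfdx = jax.grad(horner, argnums=1)
print(dfdx([1.0, 0.0, 0.0], 2.0))  # d/dx of x^2 at x=2 -> 4.0
```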


Which is extremely limited compared to Zygote, which can handle custom types, dicts, custom arrays, a richer type system, multiple dispatch, etc.


Try reading the docs before making sweeping negative comments about what a piece of software can and cannot do.

https://jax.readthedocs.io/en/latest/notebooks/autodiff_cook...


Are you talking about "Differentiating with respect to nested lists, tuples, and dicts" from that page? The comment to which you are responding covers quite a bit more. The jax documentation specifically says "standard Python containers". Zygote.jl and other less stable Julia auto-diff libraries go far beyond the built-ins and can work with structures defined by packages never designed to be used with automatic differentiation. Of course, there are limitations, but quite a bit less severe than the one in jax (and again, I am saying this while being a big jax fan).
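To be concrete about what jax's "standard Python containers" support looks like (a minimal sketch, my own toy example):

```python
import jax
import jax.numpy as jnp

# Parameters stored in a nested dict -- a standard Python "pytree".
params = {"linear": {"w": jnp.array([1.0, -2.0]), "b": jnp.array(0.5)}}

def loss(p, x):
    return jnp.sum((x @ p["linear"]["w"] + p["linear"]["b"]) ** 2)

x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
grads = jax.grad(loss)(params, x)
# The gradient mirrors the structure of the input pytree.
print(grads["linear"]["w"].shape)  # (2,)
```

This is genuinely convenient, but it is built-in containers plus explicitly registered types, which is the distinction being drawn with Julia here.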


As the document I linked to says, Jax autograd supports custom data types and custom gradients.
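For instance, a hand-defined gradient via jax.custom_vjp (sketch; the stabilised softplus is just an illustration):

```python
import jax
import jax.numpy as jnp

# A log1p(exp(x)) with a hand-written gradient, registered via jax.custom_vjp.
@jax.custom_vjp
def softplus(x):
    return jnp.log1p(jnp.exp(x))

def softplus_fwd(x):
    return softplus(x), x  # save x as the residual for the backward pass

def softplus_bwd(x, g):
    return (g * jax.nn.sigmoid(x),)  # hand-defined derivative

softplus.defvjp(softplus_fwd, softplus_bwd)

print(jax.grad(softplus)(0.0))  # sigmoid(0) = 0.5
```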

It’s honestly exhausting arguing with all you Julia boosters. You can down vote me to hell, I don’t care. I’m done engaging with this community.

You all are not winning over any market share from Python with your dismissive, arrogant, closed-minded culture.


I understand you are frustrated, however, please remember

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

https://news.ycombinator.com/newsguidelines.html


I am confused why you assume I am a "Julia booster" or use such combative language. I love Python and Jax and use it for much of my research work, I just also like learning of other approaches. Please try to honestly address the sibling comments. We have repeatedly claimed that tools like Zygote.jl can autodifferentiate efficiently things that Jax can not (without a lot of extra special code and hand-defined backprop methods), e.g., an array of structs with scalar and vector properties over which a scalar cost is defined. Just give examples, so that we can both learn something new about these wonderful tools instead of using such offensive language. It is hard to not take your own comments as the ones being dismissive.

Also, look from where this conversation started. My claim was that jax does not work with "(scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries)". A julia library does not need to know of Zygote.jl to be autodifferentiable. A python library needs to be pure-python numpy-based library to work with jax.

In order to try to contribute to the discussion: I think this paper describes relatively well what is so special about the Julia autodiff tools: https://arxiv.org/abs/1810.07951

For a separate approach, which is also very original, check out https://github.com/jrevels/Cassette.jl


I don't see anything there about a better type system or multiple dispatch. Try being less salty.

In what language are you defining these custom arrays or types? Certainly not in python, or they'll be too slow to be worthwhile.


Indeed, but "python code written to accept numpy" is a pretty restrictive subset (comparatively; I do still enjoy using python). It does not even cover most of scipy, let alone the domain specific libraries, which frequently end up using cython or C for their tightest loops.



What is going on here?


Lengthy, nuanced discussion about benchmarking between Turing devs and Stan devs.


When I look at google trends or redmonk rankings, Julia appears stable, not accelerating.


I don't understand google trends. Julia's progress doesn't look like much there, but even python has had pretty modest progress there, which doesn't make sense.


Why minizinc instead of Google OR-Tools? Seems like Google OR-Tools beat minizinc at its own contest?

https://www.minizinc.org/challenge2020/results2020.html

Is it more customizable? Or expressive (in terms of modeling DSL)?


As another commenter mentioned, MiniZinc is a DSL for modelling combinatorial optimization problems, and Google or-tools is a solver that can solve such problems. So it is not really comparable in that sense.

The benefit of using MiniZinc instead of a specific solver directly, is that the user can try different solvers easily to find the one that works best for their problem. The drawback is naturally that it may be harder to use solver specific features.

The MiniZinc challenge is a competition that pits these solvers against each other. Note that the group that runs the challenge also develops solvers (which are ineligible to win prizes in the challenge, but are participating), and that these solvers are sometimes better than Google or-tools (no solver dominates all other solvers). See https://www.minizinc.org/challenge2020/results2020.html to get the full results.

Finally, while solving speed is of course important, it is not always the most important issue when developing a combinatorial optimization solution. Choice of general tech stack, documentation, support, ease of integration, local expertise, and so on are important issues, as they are for every large dependency choice.


> MiniZinc is a DSL for modelling combinatorial optimization problems, and Google or-tools is a solver that can solve such problems.

Their DSL looks very similar to a lot of the specific DSLs of other tools in the field.

As opposed to OR-Tools, where you can directly use a language you know (like Python) to express your constraints and objectives.

Having worked with both approaches, I like the latter better


Personally, I tend to use MiniZinc for prototyping and experimenting and for solving simple problems.

For building larger solutions, I would also probably use a solver directly. Reasons for that would be customization, integration, and data handling issues. For me the go-to system would be Gecode, but there are a lot of very interesting and varied solvers available.

Worth noting is that MiniZinc is starting to get more and more support for integration into large systems, for example with the python interface (https://pypi.org/project/minizinc/).


Basically this for me too; MiniZinc is great whilst you don't really have the full problem nailed down, when it's harder to pick what you'd like to focus on solver-wise.


I use Python to generate DPLL and likewise with MiniZinc. It's no different from SQL: there's something of an impedance mismatch that's already felt in e.g. SymPy.


Minizinc is just a DSL for describing different models which is then compiled to flatzinc. Flatzinc is then accepted as input to a wide range of solvers, including Google OR.


> 2. I don't entirely follow this point. Perhaps using PyArrow's parser would be faster than what is timed here, but is that what the typical Python data science user would do?

I am a Python data science user. If data gets big enough such that loading time is a bottleneck, I use parquet files instead of CSV, and PyArrow to load them into pandas. It’s a one line change. The creator of Pandas is now leading the Arrow project. It’s very seamless. Don’t know if I’m typical but that’s me.


Perhaps not directly relevant to your point here, but thought it would be interesting to anyone following along.

Jacob Quinn (karbacca) also has a Julia package for integrating Julia into the Arrow ecosystem: https://github.com/JuliaData/Arrow.jl


Thanks Viral. To be clear, I'm a python user who's cheering for Julia, because I live the problems of python and do see the potential of Julia as a better path. But unfortunately I'm not prepared to be the early adopter (at least in my day job), and will wait until other, braver users have sanded off the rough edges. Godspeed and good luck.


That's a completely reasonable viewpoint. Many users of Julia and contributors start out experimenting with it and then end up bringing it into their work when they feel comfortable with it. I hope you will have the same experience one day.


Let me second GP’s sentiment. I find Julia really slow for my purposes. I don’t know his reasoning, but I will explain mine. None of this is surprising and is oft discussed.

Julia (at least by default) is constantly recompiling everything. This is a huge pain in a REPL style setup where you want to tweak one thing and see the changes, again and again. I know the Julia ecosystem is working on better caching etc to fix this problem but it’s a problem.

Also, despite the marketing claims around the language, expertly crafted C usually beats Julia in performance. So if your “python” program is spending most of its time in Numpy/PyTorch/etc, it will beat Julia, unless you’re writing a fancy “put a differential equation in a neural network in a Bayesian Monte Carlo” program that benefits from cross compiling across specialized libraries.

Finally, the Julia libraries are just not as mature as python’s. Armies of developers and larger armies of users have battle tested and perfected python’s crown jewel libraries over many years. Often when someone posts a bad benchmark to the Julia forums they can “fix” it in the library implementation, proving the correctness of the theoretical case for Julia. But in reality many such problems remain to be fixed.

Julia is really cool and does have many inherent advantages over python. But it’s not the silver bullet many of its proponents suggest it to be. At least not yet. Every few years I check out Julia and I hope one day it does become that perfect language. I think it will. I just fear it will take longer than many others hope.


I appreciate your well-balanced critique, thanks.

> Julia (at least by default) is constantly recompiling everything. This is a huge pain in a REPL style setup where you want to tweak one thing and see the changes, again and again. I know the Julia ecosystem is working on better caching etc to fix this problem but it’s a problem.

Maybe try Revise.jl? There are a few changes it can't handle, but you can do a lot of development without ever restarting. (Disclaimer: I'm its main author.)

> expertly crafted C usually beats Julia in performance

This isn't generically true, and there are now quite a few examples of the converse. I linked to it above as well, but check out the benchmarks in LoopVectorization's documentation (https://chriselrod.github.io/LoopVectorization.jl/latest/exa...) for examples of beating MKL, one of the most carefully-engineered libraries in existence.

I think an exciting area of growth for Julia will be exploiting the fact that Julia's compiler, written mostly in Julia, is more "morphable" than most and may develop its own plug-in architecture. This seems likely to provide performance opportunities that many fields seem hungry for.

> the Julia libraries are just not as mature as python’s

On balance I agree. While there are already many examples where Julia makes things easier than Python, as of today there are many more examples to the contrary. Julia's libraries are advancing rapidly, but I expect it will take a few more years of development until it's no longer so one-sided.


My thoughts exactly.

I would just add, I feel Python is stagnating as a scientific programming _language_. The libraries, ecosystem etc are still great, and Python is still a great language, but these days the development focus seems to be on type hints and unicode support.

I wouldn’t be surprised if Julia takes over, simply because it actually focuses on scientific programming. To me, personally, that would be a shame; good for Julia, but I still find Python a better language overall.


Preach, brother.

I’m cautiously optimistic that JAX (or something like JAX) can save the python programming language from stagnation by essentially building a feature-complete reimplementation of the language with JIT and autograd baked into the core. I’m praying that Google diverts like 10% of TF’s budget to JAX.

That way I don’t have to learn to love a bunch of unnecessary semicolons and “end”s littering up my beautiful zero-indexed code ;-)


Julia code almost never has semicolons. Semicolons can be used at the REPL to suppress printing, but actual Julia code does not normally use semicolons.

I personally like the "end"s because I like the symmetry and they're prettier than curly braces. Also, there are some syntax color themes that color the "end"s in a darker color in order to de-emphasize the "end"s, which can be nice depending on your taste.


That was mostly meant as a joke, thus the “;-)”

I don’t really care much about syntax choices, but my small complaint about “end” is that it takes up a line which reduces the amount of business-logic code I can fit on one screen, especially if you ever get into lots of nested loops and conditionals.


To each their own, but I usually try to refactor when I hit too many nested loops or branched statements - it's usually a sign of missing some abstraction or trying to be too clever.


@Sukera

Fair, but, if I break up all the loops and if statements into functions, those functions still have “end”s


For the record, I'm a fan of both Python and Julia (though, I believe that the latter is not yet ready for general programming mainstream industrial use). As for Python "stagnating as a scientific programming _language_", it is totally understandable. It tries to be the language of choice for everyone. However, as we know, "you can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time".


My point is, Python is no longer _trying_ to please scientific programming circles. It is developing, just not in these directions.


Understood, fair enough.


> but these days the development focus seems to be on type hints and unicode support.

It's a result of the language maturing. The tradeoff is between groundbreaking innovation and being a stable language with a large user base. You can't have it both ways.


There are features Python is missing as a scientific language that are not a focus: proper multicore support, removing the GIL, any static safety (which type hints sort of address, but not really).

It was fine in Python 2; it was cheap and cheerful. It feels like Python 3 is running out of ideas for improvement, and yet these are not at all in the scope of work.

Cf Julia that treats all these features as first class problems.


I am wondering if this is really a problem? I just parallelized some numerical code, and instead of threads (GIL problem) I use processes. As far as I understand, the only drawback is that the parallel workers cannot share memory, but how often do you actually do that? I find it hard to reason about correctness in those cases.


This works a bit. But sometimes sharing memory is useful. Sometimes you may want to parallelise a local function, or a callable class instance - except you can't.

And when I say "it works", that's only on Linux and Mac. On Windows, multiprocessing is severely stunted by the lack of forking.

Meanwhile, probably even my watch supports threads.


This might interest you: you can now turn off compiler optimizations at the module level, using a macro. For some people this speeds up the development cycle, as it skips most of the time-consuming compilation activity.


oh that is interesting. I'll give it a spin again this weekend. Thanks.


Caching has improved a ton in the last 3 minor releases.


Thanks. I watched the JuliaCon state of Julia presentation. As I wrote in my original post, I appreciate the investments the Julia core developers are making, that have improved but not eliminated this problem. I wish them luck.


FWIW I agree with you. I’ve always found cython easier than Numba. And more performant.

I think Numba has a lot of potential and will improve as they fill out remaining language coverage and finalize the API. The idea of a LLVM JIT compiler for python makes a ton of sense.


Honest question from a heavy python user who would switch if it made sense:

Are there any comprehensive benchmarks that show Julia outperforming Pandas or PyTorch or SciKit?

Obviously pure Python is terrible. But the library algorithms written in C seem fairly competitive.

I’m a fairly boring user who doesn’t do new science, and is fine just composing existing boring algorithms to solve problems in my subject matter domain.


The argument is: if you need to do something slightly different but straightforward (and this doesn't even have to be in your algorithm, it could be in your data prep), you either have to accept a slow loop or go hunt through the documentation and come up with some obscure API or some very clever combination of APIs.

To my knowledge, if you stick to things that call optimized C and Fortran code, it's a draw between the compiled code and Julia.

But even boring problems end up doing things that are easily expressed as a loop, but end up being a hard-to-read chain of pandas calls.


In the uncommon event I need to write a loop from scratch, and I need it to be really fast, I just rewrite that one jupyter cell in cython or numba. But that is a small piece of my codebase.

I agree that Julia code is aesthetically superior to a long chain of Pandas code. But at this point I’m used to reading a bunch of chained pandas code. Often I think of myself as more of a Pandas programmer than a Python programmer.


I am in similar boat. Python centric data scientist. Very tempted to try to learn Rust so I can accelerate certain ETL tasks.

Question for Rust experts: On what ETL tasks would you expect Rust to outperform Numpy, Numba, and Cython? What are the characteristics of a workload that sees order-of-magnitude speed ups from switching to Rust?


I'm far from an expert, but I would not expect hand-written Rust code to outperform Numpy. Not because it's Rust and Numpy is written in C, but because Numpy has been deeply optimized over many years by many people and your custom code would not have been. When it comes to performance Rust is generally comparable to C++, as a baseline. It's not going to give you some dramatic advantage that offsets the code-maturity factor.

Now, if you're doing lots of computation in Python itself - not within the confines of Numpy - that's where you might see a significant speed boost. Again, I don't know precisely how Rust and Cython would compare, but I would very much expect Rust to be significantly faster, just as I would very much expect C++ to be significantly faster.
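The gap between "computation in Python itself" and "within the confines of Numpy" is easy to see (an illustrative micro-benchmark; absolute timings vary by machine):

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
s_loop = 0.0
for v in x:  # interpreted per-element loop
    s_loop += v * v
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
s_np = float(np.dot(x, x))  # one optimized C call
t_np = time.perf_counter() - t0

print(t_loop / t_np)  # typically a couple of orders of magnitude at this size
```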


I deal with a lot of ragged data that is hard to vectorize, and currently write cython kernels when the inner loops take too long. Sounds like Rust might be faster than cython? Thanks for the feedback.


Also it might take 20x less RAM compared to using Python objects like sets and dicts. In Rust there's no garbage collection, and you can lay out memory by hand exactly as you want.


Most likely, yes


Julia might be a better fit for this use case.

That way you leverage a more developed data ecosystem, can call python when necessary and avoid writing low level code.

Depends on the task of course.


I’m fascinated by Julia and have test driven it before but it didn’t click for me. Maybe I was doing it wrong and/or the ecosystem has matured since I last looked.

I guess I generally do like the pythonic paradigm of an interpreted glue language orchestrating precompiled functions written in other languages. I don’t need or want to compile the entire pipeline end to end after every edit, that slows my development iteration cycle times.

I just want to write my own fast compiled functions to insert into the pipeline on the rare occasions I need something bespoke that doesn’t already exist in the extended python ecosystem. It seems like a lower level language would be optimal for that?


If the dev cycle feels slow in julia, you can make it snappier with a tool like Revise.jl, it is quite handy.

If you just need to fill a small and slow gap maybe something like numba is also a good option to stay within python.

Going all the way to a low level language would require the compilation, the glue code and expertise in both languages. Probably that slows down the development pipeline more than the JIT compilation from julia or numba.

Anyway, any opportunity to learn/practice some rust is also great!


One thing that may help with the glue-code aspect would be a crate like pyo3[0], which can generate a lot of the details for you.

[0] https://crates.io/crates/pyo3


Column-wide map-reduce over large dataframes usually gives you a 1000x or so speedup.

With Rust you can stream each record and leverage the insane parallelism and async-IO libs (rayon, crossbeam, tokio) with a very small memory footprint. Sure, you have asyncio in Python, but that's nowhere near the speed of tokio.


Thanks for the pointers, those crates seem great. The flaky multithreading libs are my least favorite part of python, and rust’s strength in this area seems very appealing.

