Rust-style resource management in OCaml

spraak · on March 16, 2018

Tangentially related: has anyone been exploring ReasonML? I am enjoying it so far, but still hesitant because I'm not sure if its momentum (in the community with packages, etc) will sustain, and the interop into JS feels still somewhat clunky (though that is partly BuckleScript)

parley · on March 16, 2018

I have, and really enjoy it. My previous go-to for frontend was TypeScript, but it wore me down. I am ridiculously productive in ReasonML with proper ADTs, (exhaustive!) pattern matching, immutability and other niceties. The superb type inference is also great for prototyping, and I could go on and on. Rust for backend and ReasonML for frontend is making me very happy these days. I think the community will only keep growing. I feel like I’m an extra in the filming of Revenge Of The MLs, and it feels great.

virtualwhys · on March 16, 2018

> My previous go-to for frontend was TypeScript, but it wore me down

Yes, coming from Scala.js it feels like TypeScript constantly tells me little lies (i.e. that the runtime type of the typed type is not in fact the type in question, but something else entirely). A powerful language with plenty of interesting features, but there are better options out there if strict typing is of interest.

No choice in current project, Angular essentially requires TypeScript. Editor support for TypeScript (via VS Code) and JS interop are, however, excellent.

spraak · on March 16, 2018

Can you share more about how you're enjoying Rust alongside ReasonML? I don't have any experience with Rust but it sounds very conflicting.

parley · on March 16, 2018

Sure. I would say that Rust and ReasonML are very alike and very different at the same time, and I'll explain why I feel that. ReasonML's own docs give Rust a notable mention: "Close cousin of ours! Not garbage collected, focused on speed & safety."

They both

- have powerful static type systems with good inference.

- default to immutability, with optional mutability.

- encourage functional constructs over procedural/imperative ones.

- ADTs and exhaustive pattern matching with great ergonomics which can be used for everything from rigorous error management to unambiguous program state representation.

- have escape hatches to write "I know what I'm doing and need more wiggle room"-code, but those are clearly visible for linting/auditing/testing, as opposed to languages that allow foot guns to invisibly permeate entire code bases because the language has no clearly discernible rigorous subset that one easily and naturally can keep to.

The list of features that make them alike can be made as long as your arm, and it's basically a laundry list of features that (once internalized in the developers mind) helps one build robust, correct, maintainable software. Sure, they're not carbon copies of each other but what it comes down to is enabling and encouraging the same semantic constructs.

So where are they not alike? I would say that the one constraint that the main differences result from is the fact that ReasonML's automatic memory management is a run-time solution (garbage collection) whereas Rusts is a compile-time solution (using static analysis), coupled with Rusts focus on raw performance. Everything else about the Rust language has (IMHO) been designed with the same sound underlying values as many other languages which encourage correctness. The differences that one notes when learning Rust are really mainly just the concessions that were necessary to make to achieve compile-time automatic memory management and raw performance.

Did that help? And as always, if I'm mistaken about anything, please correct me. I'm not a PLT person.

empath75 · on March 16, 2018

Are you planning on also looking at Rust on the front end?

parley · on March 16, 2018

Absolutely. I've been a Rust lurker (and later user/advocate) since ~2012/2013, I think, and the visible WASM support strides being taken right now coupled with my trust in the Rust community gives me great hope for Rust on the front end.

It remains to be seen to what extent and which problems it will be suitable for, but just as Rust despite the fears of some is (again, IMHO) turning out to be ergonomic enough to write all kinds of end user apps in (although some areas are still lacking, but give it time), so too I think it might surprise people with how widely applicable it may turn out to be on the front end.

And I'm hopeful and optimistic. I feel like we're seeing some tides turn with respect to how important software correctness/robustness really is perceived to be and -- importantly -- what we're prepared to pay for it. We're still a "young" industry compared to a lot of engineering disciplines, so it's not surprising that we're still maturing with periodic waves of change. Of course, many will disagree with the direction, but for me it can't come fast enough.

Sorry, I hadn't ranted in a while, and it is Friday afternoon. =o)

Rotareti · on March 16, 2018

So far I've been using Python/JS/TypeScript mostly and I'm striving for a better Type System and more functional capabilities. ReasonML offers a lot of the things I was looking for:

* Familiar syntax (coming from Python/Go/JS/TS etc.)

* A lot of functional capabilities.

* A nice flexible type system.

* High (predictable) performance.

* Fast compile times.

* Compiles to byte/native/JS

* A mature compiler.

I'm still having an eye on three things:

1) What's the concurrency model going to be.

2) What's the solution for unicode going to be.

3) At what pace is the ecosystem going to grow.

If things turn out well, I can imagine OCaml/ReasonML becoming my general purpose language of choice for most of the stuff I'm currently using Python/Go/JS/TS for.

I recently started with a frontend using bucklescript-tea [0] and it's a great experience so far!

0: https://github.com/OvermindDL1/bucklescript-tea

rs86 · on March 17, 2018

Ocaml has concurrency support for async io but no multi threading. There are experiments on concurrent ML dialects.

steve-chavez · on March 17, 2018

Haskell has also an ongoing implementation of linear types https://github.com/ghc-proposals/ghc-proposals/pull/111.

Anyone know what are the major differences between these two implementations? Will linear types work better for OCaml than for Haskell?

Ono-Sendai · on March 16, 2018

Why would you want to burden a GC'd functional language with the syntactical and semantic requirements of Rust's resource management?

catnaroek · on March 16, 2018

Because Rust-style resource management is not only about performance, but also about correctness. And the ML crowd cares a lot about the latter - rightly so.

I'd kill for a language with ordinary ML types for values (integers, strings, etc.), and substructural types for objects (file handles, database connections, etc.).

zenhack · on March 16, 2018

Worth pointing out, the resource management strategy isn't even necessarily a net positive for performance. In typical GC'd languages you have a much richer design space for your memory system. You can move stuff around, you have flexibility as to when you do collection, you often have allocation as a built-in primitive, which means the compiler knows about it and can help optimize.

The end result is that good GC-based memory systems tend to perform better than malloc/free style APIs (with the call to free() possibly being implicit). to get good performance in C/C++/Rust, the programmer needs to be concientious about allocation. You have more control, but also more responsibility. In OCaml, allocation is bumping a pointer -- go nuts.

dpc_pw · on March 16, 2018

> The end result is that good GC-based memory systems tend to perform better than malloc/free style APIs

Citation needed.

GCs promise wonders, and yet in practice they are going to eat your memory, trash caches, and slow everything down. Performance benefits are typical "in some applications, in some use cases, etc." not "tend to"

saurik · on March 16, 2018

> Citation needed.

Oh come on: this is a well-known property of garbage collectors and should have been covered in an entry-level CS class. Did you even try to find this before pulling out the "citation needed" trope? Hacker News needs an auto-responses to any comment that has that phrase in it with "have you tried Google yet?" :/.

With just ten seconds of Google searching (so surely less time than it likely took you to type your "citation needed") I was able to find a meta-reference (a Stack Overflow answer with links to papers looking at this result). The results were "similar or up to 4% faster" and "much faster with lots of ram".

https://stackoverflow.com/questions/755878/any-hard-data-on-...

(At least one of those links is dead, but it is to the same paper that thesz linked to in a sibling to this comment... a comment which provided a citation and which someone downvoted, because people care more about opinion than actually finding citations. I upvoted his comment back to being rendered in black text :/.)

What is so strange about this being controversial is that it is truly an obvious result: a real heap allocation is really really slow, and a copying collector never has to allocate anything (it just bumps a pointer). So only when you have memory pressure does it eventually prune objects, and until then it runs lightning fast: faster than the malloc/free code could possibly dream of.

The advantage of affine types (as in Rust) isn't actually that it is avoiding GC: it is that it makes it possible to avoid allocation itself by making it safe to allocate things on the stack (where you get the speed of bumping a pointer again instead of the painfully slow malloc). That is the kind of analysis that is often attempted by languages like Java ("escape analysis"), but is only available in limited circumstances.

dpc_pw · on March 16, 2018

> Oh come on: this is a well-known property of garbage collectors and should have been covered in an entry-level CS class.

There's plenty of BS tought in academia. In CS that would include love for modeling tools (I had classes about IBM Rational tools, bleh), OOP and other forms of sophisticated complexity. The practicioners tend to dislike GCs, but again it's not a good argument.

> a real heap allocation is really really slow

There is nothing preventing a heap allocation to be almost as fast, at the cost of memory fragmentation/overutilization and/or slower deallocation.

And that's where the whole tradeoff is. Memory allocation has to track data explicitily, GC tracks data implicitly. Typically this tradeoff is expressed in a memory over-consmpation of GC to amortize the GC deallocation slowness. And it's all dandy on paper, because the argument is "oh, just take more memory and you're fast again". Which is not such an easy thing to do. Such memory could be used as a IO caches or to run other processes. And over and over companies rewrite their memory-hogging GC-ing services into explicit-memory allocation languages and they run faster and what's more important, utilize less memory and thus make the whole system faster. They can run on much smaller VM instances etc.

So the whole "GC is as fast or even faster" is dubious in general sense, IMO. And most of the papers trying to prove it is of form "in this specific circumstances GC can run the same or even outperform mangaged memory lanagues".

Veedrac · on March 16, 2018

Thinking GC is slow is an easily mistake because GC encourages languages to allocate pathologically, which is slow.

zenhack · on March 16, 2018

Okay, yeah, I should temper that statement. This stuff is super hard to actually study rigorously, and I certainly don't have a reference to support as broad and vague a claim as "tend to perform better."

And of course "perform" is a very muilti-dimentional thing, not a single value.

The truth of the matter is that any generic memory management strategy is going to fall on it's face in certain scenarios; most folks have seen first hand the raii failure mode where some application takes forever just to exit, because it's pointlessly calling god-knows how many destructors.

It's late, and I should generally try to be more clear with myself about what my point is before I post. The thing in there that I think is more salient is, with manual memory management you're opting into control, not magic everything-is-faster-now. I don't think the op was suggesting it, but people often act like using a lower level language is going to always make things faster, ans it just isn't true. Idiomatic, decently performing OCaml is going to make for some very slow Rust.

pjmlp · on March 16, 2018

Depends on the GC implementation, not all of them "eat your memory, trash caches, and slow everything down" the same way.

And even those that do, if they do it within the time and memory budget bounds for the task goals, then that is still a win.

I don't care that manually managed memory would solve the task with 10ms and 20KB, the GC with 100ms and 200KB, if my budget is 1s and 1MB.

thesz · on March 16, 2018

You asked for citation, here it is: https://www.sciencedirect.com/science/article/pii/0020019087...

Garbage collection can be faster than stack allocation when your live set is much smaller than memory.

catnaroek · on March 16, 2018

As I said before, resource management is first and foremost a matter of correctness, so performance is neither here nor there. While you don't need manual resource management for data structures containing values, you do need it for various types of handles (files, network connections, you name it), which a GC cannot guarantee it will collect at the right moment. There is no reason why you couldn't have GC when it makes sense, and deterministic reclamation where it is required.

pjmlp · on March 16, 2018

> There is no reason why you couldn't have GC when it makes sense, and deterministic reclamation where it is required.

Which is actually available in quite a few GC enabled languages.

I think lack of exposure to multiple programming languages has created this urban myth that there is only one way of doing resource management in languages with a GC.

That is what happens when professional schools adopt "one language to rule them all" teaching.

catnaroek · on March 16, 2018

> Which is actually available in quite a few GC enabled languages.

In existing implementations, not with Rust's static safety guarantees about dynamically allocated resources, sorry. “Just provide some way to disable the garbage collector” raises some obvious (at least to me) questions that are nevertheless not addressed by the alternatives you usually propose:

(0) When does it make sense to disable the GC?

(1) How do we know we are using resources correctly by ourselves?

(2) How do we know we are cleaning up resources correctly by ourselves?

(3) How would you rigorously prove all of the above? (Yes, unlike most other programmers, I do prove my programs correct.)

pjmlp · on March 16, 2018

I have not written anywhere “Just provide some way to disable the garbage collector”, rather that they offer other ways of managing resources.

In Modula-3 or D, as possible examples, a solution might be destructors or scope attributes.

The only way to actually be 100% safe would be if everyone would be programming with formal logic, which still requires a lot of research to make it approachable by the average developer.

catnaroek · on March 16, 2018

> In Modula-3 or D, as as possible examples, a solution might be destructors or scope attributes.

These still allow you to mismanage resources.

> The only way to actually be 100% safe would be if everyone would be programming with formal logic,

Indeed. In order to construct a correct program, you also need to construct the logical argument that establishes its correctness. So, yes, you have to use formal logic.

> which still requires a lot of research to make it approachable by the average developer.

s/research/education/

You raise the craftsman to the level the craft requires, not lower the craft to the level of an apprentice.

pjmlp · on March 17, 2018

> These still allow you to mismanage resources.

True, I didn't say the solution was perfect, but it is good enough for most devs.

> You raise the craftsman to the level the craft requires, not lower the craft to the level of an apprentice.

Nice goal, but I doubt you would manage to achieve that to anyone doing enterprise CRUD apps, mobile apps, games, ... to bother one second to learn TLA+, Idris, Coq or similar, unless their jobs depend on it.

catnaroek · on March 17, 2018

The original post to which you replied begins with:

> As I said before, resource management is first and foremost a matter of correctness, so performance is neither here nor there.

Your proposed alternatives, namely:

> In Modula-3 or D, as possible examples, a solution might be destructors or scope attributes.

merely offer automation, not verification. Automation without verification simply makes screw-ups harder to undo.

nickpsecurity · on March 16, 2018

Replying here since your other comment may be gone. When you phrased it as trees and graphs, it made a lot of since given I looked previously at languages like term-rewriting that did stuff like that. I'll pay you back with one of my most recent finds on handling trees with dedicated language:

https://www.microsoft.com/en-us/research/wp-content/uploads/...

I agree with you on your point that you can mix GC and manual management selectively. As pjmlp, there's quite a few things in between full manual and GC, though, to add to that claim. Ada also has techniques with contracts recently added. Any language with contracts then adds even more if one can express the property in the contract logic for manual or automated methods.

Far as performance, it matters so much that the biggest ecosystems either got there due to better cost/performance ratio or marketing moves by big companies. Those are arguably more important if one wants mass adoption with all the library and funding benefits that come with it. Both Go and Rust are nailing it in their respective niches by balancing performance and safety benefits. Even Haskell developers invested lots of effort into compiler performance with some commercial uptake happening as a result. I tell language authors to ignore performance or ecosystem compatibility at their language's peril.

catnaroek · on March 16, 2018

> Replying here since your other comment may be gone.

I deleted it because zenhack's reply was better.

> Those are arguably more important if one wants mass adoption with all the library and funding benefits that come with it.

I couldn't care less about “mass adoption”. Also, working on the Right Thing (tm) doesn't require a lot of money. It requires an intrinsic scientific motivation, and patience to polish programs until they are flawless.

> I tell language authors to ignore performance or ecosystem compatibility at their language's peril.

Performance is important, but only after correctness has been established. It doesn't matter how fast a program is if it doesn't work.

As for “ecosystems”, again I couldn't care less.

mratsim · on March 16, 2018

I think Nim or Ada would fit your needs:

- very strongly typed, you can even make sure you don't add Miles and Meters together, and that Meters * Meters give you square-meters

- destructors

thesz · on March 16, 2018

They are not all that well typed. You can't have constrained types, for example (available with type classes in Haskell).

I spent long time with VHDL, which is Ada with third leg. It is not nice language to work with.

pjmlp · on March 16, 2018

I also spend some time with stored procedure programming, mostly PL/SQL, most of them are also Ada based, and I find them quite productive, actually.

mratsim · on March 16, 2018

When you say constrained types do you mean generics? You have generics and concepts (similar to C++17) in Nim, not too sure about Ada.

thesz · on March 18, 2018

I meant types with constraints, like this one: sum :: Num a => [a] -> a The type variable a is constrained to have implementation of interface Num (integer constant (0), addition, subtraction, etc).

These constraints can be much deeper. For example, I can define a function that allow me to add two somethings if their "regions" (a property I know how to compute) is equal and the result will be produced in "region" of first argument: add :: (Region a ~ Region b, Region a ~ Region c) => a -> b -> c

One use of something similar is addition of integer values represented by bit vectors in Verilog - you should not add bit vectors from different clock domains, otherwise you get unpredictable results.

catnaroek · on March 16, 2018

Algebraic data types and ML-style modules are non-negotiable.

pjmlp · on March 16, 2018

In Ada, you can get similar concepts via tagged records and generic packages.

catnaroek · on March 16, 2018

Tagged records are one possible in-memory representation of sum values, perhaps the most sensible one. But I don't want in-memory representations of sum values. I want sum values themselves.

ML modules are much more than generic packages. ML's module system allows you to take an existing module, hide some of its components, make some of its concrete types abstract, and get a new module. My methodology of enforcing only one invariant in each module is impracticable without ML modules.

nickpsecurity · on March 16, 2018

"I'd kill for a language with ordinary ML types for values (integers, strings, etc.), and substructural types for objects (file handles, database connections, etc.)."

Could you explain what that means and how it benefits compared to common languages discussed here? Those of us from imperative backgrounds mainly read these threads to learn ideas that we might move into such languages.

zenhack · on March 16, 2018

So, from an executution model standpoint Rust is pretty close to C++ -- manual memory management, with RAII as the usual strategy for managing things. The big novelty is that the type system actually enforces this, so you can't have dangling pointers, or forget to close a file, or...

But most languages have had a solution to the resource problem at least as far as memory is concerned for a long time: garbage collection. It has huge advantages over Rust's approach, in that you can basically not think about RAII or ownership at all; as long as you're not actively holding on to an object, the memory will get reclaimed for you.

But this doesn't solve the problem for other resources like files, since when a file is closed actually matters semantically. You don't want to just let the garbage collector decide when to shut down a tcp connection.

So what the gp is suggesting is, it would be nice to have a language that uses GC where it makes sense, as it's generally easier to work with, but also provides sane mechanisms for releasing resources like files and sockets.

pjmlp · on March 16, 2018

> But this doesn't solve the problem for other resources like files, since when a file is closed actually matters semantically. You don't want to just let the garbage collector decide when to shut down a tcp connection.

At least in some languages this is not an issue if one is able to use higher order functions, with bonus points if they allow for trailing lambdas.

So if it is possible to organize the application architecture as regions where the resources are supposed to be valid, then one can manually get rid of such resources.

Of course, this might not always be possible, and there is the caveat that one needs to remember to apply such patterns.

Which is easier if it can be imposed via the type system.

catnaroek · on March 16, 2018

> At least in some languages this is not an issue if one is able to use higher order functions, with bonus points if they allow for trailing lambdas.

I am afraid you are talking nonsense. Higher-order functions make verification harder, not easier, because they hide the point at which control flow is transferred from one module to another. This is backwards, because this point should be prominent, in big neon letters, precisely so that you can tell exactly when a resource stops being available, or when an invariant goes from being “your responsibility” to “someone else's problem”.

zenhack · on March 16, 2018

I think pjmpl is referencing apis like this:

   with_file "hello.txt" (fun fd -> (* do stuff *) )

where you've basically created a c++ destructor style api with a lambda.

It's certainly not "verification" in any formal sense, and the type system can't help you there any more than it can in c++.

But this is a perfectly reasonable pattern.

catnaroek · on March 16, 2018

> where you've basically created a c++ destructor style api with a lambda.

Except for the part where destructors are meant to be called no more than once per object, at the end of its lifetime. All that you can guarantee is that `with_file` doesn't call the destructor more than once. But that's not terribly interesting.

zenhack · on March 16, 2018

You can screw up destructors in exactly the same ways. And it also serves to make sure you close the file at all.

But again hence the value of having the lifetime of the object enforced by the type system.

catnaroek · on March 16, 2018

In the presence of reference-counted mutable objects, destructors no longer guarantee that cleanup will happen. But that much is okay. This is a liveness property, and, as far as I can tell, nobody really knows of any non-annoying way to enforce liveness properties with types. On the other hand, “cleanup is the last operation that can be performed on an object” is a safety property, and types are excellent tools for verifying properties of this kind.

pjmlp · on March 17, 2018

Yep, that was it. Thanks for the example.

anilmujagic · on March 16, 2018

Take a look at F# units of measure: https://fsharpforfunandprofit.com/posts/units-of-measure/

pjmlp · on March 16, 2018

Because you can enjoy the productivity of working in a GC functional language, and only use affine types when critical to do so.

Just like GC enabled systems programming languages do provide ways of manually managing resources in performance critical sections.

sitkack · on March 16, 2018

To escape the GC and have determinism. Ownership annotations will eventually make it into all languages. They orthogonal to types. Optional type annotations will eventually make it into all languages. And at that point, we will have some sort of common semantic substrate to root all of current languages on.

rs86 · on March 17, 2018

Ownership in rust is determined from the type system using linear types.

Rusky · on March 16, 2018

To provide stronger types in some cases? To opt into in performance-sensitive programs? To see if it can be made less burdensome if different tradeoffs are made?

abiox · on March 16, 2018

what do you mean by "stronger types"?

testestx · on March 16, 2018

Affine and linear types and variants (ownership types, uniqueness types) can model predictable release of resources, like a compiler-enforced C++'s RAII.

As another commenter put,

> The paper explains it's because they want deterministic destructors for things like file handles.

masklinn · on March 16, 2018

1. deterministic resource management (locks, files, …)

2. affine & linear types are actually useful modelling tools e.g. to make state machines statically non-rollbackable/duplicable (this is related to resource management as those are generally the STMs you want to lock down)

skybrian · on March 16, 2018

The paper explains it's because they want deterministic destructors for things like file handles.

thesz · on March 16, 2018

You may employ non-trivial optimizations to a program which are not available in the Rust compiler and then output to, say, Rust. Or C.

Also, you can easier combine different parts of a program, creating algebraic structures not available in Rust. Rust does not support higher-order functions nicely, OCaml does, for one example.

rcaught · on March 16, 2018

Karma?