Hacker News

Audio, video, and image codecs written in Rust seem like a fantastic early opportunity to investigate Rust's utility for closing down related security vulnerabilities.

Is anyone aware of work on fuzz testing an interesting Rust-based "attack surface"[1]? I'm very interested to see what kinds of issues are/aren't turned up in Rust vs. the usual C/C++ code for these libraries.

[1] By "attack surface", I'm thinking of traditional bits of code with high exposure to untrusted data: codecs, HTTP parsers, etc.



I know we've run the Rust URL parser through AFL. IIRC, it found one panic (brackets in URLs) and nothing else. Nothing security-sensitive was found.

(Don't take the statement about security too strongly, however; AFL will not detect logic problems that could result in security issues, such as interactions with TLS hostname validation or whatnot.)


I don't think that production-quality codecs will be written in Rust. As far as I'm aware, codecs are usually very complex pieces of software that employ a lot of hand-written assembly and very low-level C. Rust just doesn't offer anything valuable in this area.


A very complex piece of software is exactly the sort of thing you want to avoid writing in C or assembly. The last person I know who wrote a codec implementation did it by machine-translating the spec into a functional program that output assembler, C, Python, Verilog, etc. as targets. That did require handcrafting "leaf nodes" for things like matrix multiplication, but they were small enough to verify.


Yes, I think of Rust as a promising target for parser generators. Thinking about text parsing, if Bison generated Rust, then you'd have a memory-safe parser that should be about as efficient as C. Something like this probably already exists or is being worked on. Ideally existing code generators could target Rust also to get memory safety.
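As a toy illustration of the shape such generated parser code could take (the function and its grammar are invented here, not from any real generator): every input access goes through safe slice operations, so even a buggy grammar can't read out of bounds.

```rust
// Hypothetical sketch of a generated parser fragment: consume a run of
// ASCII digits from the front of the input, returning the parsed value
// and the remaining input, or None on no match.
fn parse_unsigned(input: &str) -> Option<(u64, &str)> {
    // Index of the first non-digit character (or end of input).
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no digits at the front: the rule fails cleanly
    }
    // Slicing is bounds-checked; `parse` rejects overflow instead of
    // silently wrapping.
    input[..end].parse().ok().map(|n| (n, &input[end..]))
}
```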

There are some SIMD optimizations in parsers though that I don't know how easily you could express in Rust. The quintessential example of this for text parsing is Clang's optimization that uses SSE to skip over C++ comments 16 bytes at a time:

https://github.com/llvm-mirror/clang/blob/61f6bf2c8a8e94c4fa...


I am far from an expert in this area, but you can do SIMD with Rust: https://github.com/huonw/simd


This looks like a good start (though experimental). If it supported something like AltiVec's vec_any_eq() intrinsic on its u8x16 type, that would do the trick. vec_any_eq() takes a vector and a value and returns true if any element of the vector equals the value.

On x86 with SSE, this could generate a sequence of two instructions: pcmpeqb (do 16 byte-wise compares) followed by pmovmskb (collect the 16 comparison results into a single byte). Then you'd get the same efficiency as what Clang does (Clang searches 8 bytes at a time for a '/' character when skipping over comments).
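The pcmpeqb + pmovmskb sequence described above can be sketched in today's Rust with the stable std::arch intrinsics (which postdate the experimental crate linked above); this is an illustrative sketch with a scalar fallback for other architectures, not production code:

```rust
// Find the first occurrence of `needle`, scanning 16 bytes at a time
// with SSE2 on x86_64 (pcmpeqb + pmovmskb), then a scalar tail.
#[cfg(target_arch = "x86_64")]
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    use std::arch::x86_64::*;
    let mut i = 0;
    unsafe {
        let needles = _mm_set1_epi8(needle as i8);
        while i + 16 <= haystack.len() {
            let chunk = _mm_loadu_si128(haystack.as_ptr().add(i) as *const __m128i);
            let eq = _mm_cmpeq_epi8(chunk, needles); // 16 byte-wise compares
            let mask = _mm_movemask_epi8(eq);        // collect results into 16 bits
            if mask != 0 {
                return Some(i + mask.trailing_zeros() as usize);
            }
            i += 16;
        }
    }
    // Fewer than 16 bytes remain: plain safe scan.
    haystack[i..].iter().position(|&b| b == needle).map(|p| i + p)
}

// Portable fallback so the sketch builds everywhere.
#[cfg(not(target_arch = "x86_64"))]
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```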


Huon is working on SIMD full time at Mozilla this summer as far as I know. We should see more mature support materialize in the next couple of months for sure.


I have been working with https://github.com/kevinmehall/rust-peg for a while. Also there is https://github.com/Geal/nom


A while back there was a blog post (I think by Dark Shikari) about the use of assembly in x264. I couldn't find it by searching, but basically the very best C versions of essential image processing algorithms were orders of magnitude slower than the hand-crafted assembly versions, especially with SIMD instructions.

To create a viable codec, a functional spec compiler would have to close that performance gap. Edit: it might be possible to create a set of SIMD algorithmic building blocks (e.g. like liboil[0]) that could be verified for correctness then incorporated into the compiler.

[0] https://wiki.freedesktop.org/liboil/


There has been work on typed assembly that could also apply: http://research.microsoft.com/en-us/projects/talproj/


Conveniently computers are getting orders of magnitude faster and cheaper, while the software on them - by virtue of complexity - has an increasing attack surface.

I will happily take a video player that uses 4% of my CPU power and has fewer potential vulnerabilities over one that uses 0.4%.


When it comes to video encoding, the tradeoff is often between encoding live video and not being able to encode live video at all.

I think there's also an important tradeoff to be made in how much of a carbon footprint we dedicate to software. I really don't think we should be trying to make software expand to consume all available resources. We have to be both efficient and secure.


Even when you are on a mobile device constrained by battery?


Very interesting. How can we get more info on this? Is there any public code or product out there?


The resulting commercial product is not the codec itself, but a set of test signals for codec verification for use by hardware vendors.

http://www.argondesign.com/products/argon-streams-hevc/


Thanks! I'm contributing to an open-source project where we do a lot of standardization (including MP4). We're trying to improve the parser generation by using a model from the specification. We even have some funding for this. I'd be happy if we could discuss this (contact@gpac.io). Better standards means a better world for everyone :)


Can you imagine writing a Bluetooth spec (and its profiles) in a high-level language and letting Rust (or something else) produce the native driver code?


Yes, that's exactly what it is about! Unfortunately, our current results show that it still takes a lot of time to model the spec correctly. Any help is appreciated; my contact is available in the message above :)


We'll see. For inline assembly, sure, it doesn't, but you can do some pretty intense stuff with Rust's type system to guarantee certain static properties. For example, earlier today this comment appeared on Reddit, talking about how to use the type system to guarantee that you're not doing out-of-bound array accesses: http://www.reddit.com/r/rust/comments/3aahl1/outside_of_clos...

It is true that the lower you go, the less we currently offer, but over time I expect library authors will use this and other future features to check even more properties without runtime overhead.
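For a small library-level flavor of this (a hypothetical sketch, not the approach from the linked Reddit comment): chunks_exact encodes the "every chunk has exactly two elements" invariant in the API, so the per-chunk indexing below can never go out of bounds.

```rust
// Sum adjacent pairs of a slice. `chunks_exact(2)` yields only complete
// 2-element chunks, so `p[0]` and `p[1]` are always in bounds; a
// trailing odd element is simply not visited.
fn sum_pairs(xs: &[u32]) -> u32 {
    xs.chunks_exact(2).map(|p| p[0] + p[1]).sum()
}
```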


A more modern language (modules, closures, pattern matching, package management, type inference) and memory safety are nice. If I were writing a new codec, I'd seriously consider writing the lion's share of it in Rust, to save time and to avoid security problems.


I think the sentiment of the parent was that we should avoid (re)writing things in pet language of the week.

Sure, Rust has a great following, but so does Go. Which is the "right" choice for a new codec? I don't think there is a clear answer.


Rust does not require a runtime / garbage collector and offers a programming model that interacts very precisely with external threading requirements. It supports being called from the native platform ABI without initialization, and it supports writing libraries to match an existing, native ("C") ABI and function as a drop-in replacement. Go does not do these things.
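As a minimal sketch of that last point (the function name and the YUV example are invented for illustration): a Rust function exported with the platform's C ABI, callable from C with no runtime setup.

```rust
// Hypothetical sketch: exporting a C-callable symbol from Rust.
// `extern "C"` fixes the calling convention; `#[no_mangle]` keeps the
// symbol name so a C header can declare it directly.
#[no_mangle]
pub extern "C" fn yuv420_frame_bytes(width: u32, height: u32) -> u64 {
    // Size in bytes of an 8-bit YUV 4:2:0 frame: width * height * 3 / 2,
    // widened to u64 first so the multiply cannot overflow u32.
    (width as u64) * (height as u64) * 3 / 2
}
```

The matching C declaration would be `uint64_t yuv420_frame_bytes(uint32_t, uint32_t);`.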

The suggestion to use Rust here is because this is specifically a thing Rust was designed to do, not because it has a huge following. Go is a great language, but it is designed for different things (and has a following of people who want it to do those different things). If those two languages are the two choices for replacing a codec that functions as a library, there is an objectively correct answer.


For a codec there is probably a clearer answer than for some other applications- codecs are very performance-sensitive (probably don't want a GC) and also very security-sensitive (you're decoding completely untrusted data). Previously they've always been written in C or C++ because of the performance requirements, and Rust keeps the same performance while adding some more safety checks. Go doesn't.


> Sure, Rust has a great following, but so does Go. Which is the "right" choice for a new codec? I don't think there is a clear answer.

Sure there is. Go would be an absolutely horrible choice for writing a video codec -- and wasn't even designed for that kind of work in the first place.


> Sure, Rust has a great following, but so does Go. Which is the "right" choice for a new codec? I don't think there is a clear answer.

Actually, there is quite a clear right answer for a hardware codec.

You have 3 choices: C, C++, or Rust.

While C or C++ is probably better for the core of the codec, that's not where most of your security and concurrency issues are likely to be.

Most of your issues are in unpacking headers, validating packets, getting decode parameters correct, etc. Rust is WAY better for this task than C/C++ from a security or concurrency point of view.
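A hypothetical sketch of what that header-unpacking code looks like in Rust (the packet format here is invented): every access is bounds-checked, so a truncated packet, or one whose length field lies, yields None instead of a C-style out-of-bounds read.

```rust
// An invented 3-byte header: 1-byte version, 2-byte big-endian payload
// length, followed by the payload itself.
struct Header {
    version: u8,
    payload_len: u16,
}

fn parse_header(buf: &[u8]) -> Option<Header> {
    // `get` returns None past the end of the slice; `?` propagates it.
    let version = *buf.get(0)?;
    if version != 1 {
        return None; // unsupported version
    }
    let payload_len = u16::from_be_bytes([*buf.get(1)?, *buf.get(2)?]);
    // Reject packets that claim more payload than the buffer holds.
    if buf.len() < 3 + payload_len as usize {
        return None;
    }
    Some(Header { version, payload_len })
}
```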


There is also Ada. :)


How easy is it to interlink C and Ada?



As codec operations move into dedicated hardware blocks on silicon, their code will be more about pointing registers at blocks of data and less about performing the mathematics on CPU.


Rust code can incorporate assembly just like C can, and low-level Rust should be just as fast as low-level C, so it sounds like Rust will be a great language for writing production-quality codecs.
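Today's Rust does have stable inline assembly via the asm! macro; a trivial x86_64 sketch (with a plain-Rust fallback so it builds on other targets):

```rust
// Hypothetical sketch: a safe wrapper around a single inline-assembly
// instruction. On x86_64, compute a + b with one `lea`.
#[cfg(target_arch = "x86_64")]
fn add(a: u64, b: u64) -> u64 {
    let out: u64;
    unsafe {
        std::arch::asm!(
            "lea {out}, [{a} + {b}]",
            a = in(reg) a,
            b = in(reg) b,
            out = lateout(reg) out,
        );
    }
    out
}

// Fallback for non-x86_64 architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add(a: u64, b: u64) -> u64 {
    a + b
}
```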


I use downvotes pretty sparingly on HN, but I'm afraid you're completely wrong. This is the kind of thing that Rust excels at.


What about (de)muxers?


Container formats are much, much, much simpler and demand much less CPU than sample-data formats. One could (and I have) write a perfectly usable ISO parser in Emacs Lisp, of all things. The reason they tend to be written in C is that a container parser is not super useful outside the context of handling the sample data; the overwhelming majority of usages of code that understands containers is to pump out pointers to each individual sample. Doing this across process boundaries would be ... unpleasant.
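For instance, the fixed part of an ISO base-media (MP4) box header is just a 4-byte big-endian size plus a 4-byte type code, and a safe Rust sketch of reading it is short (the special size values 0 and 1, and everything past the header, are skipped here):

```rust
// Read one MP4 box header: returns (declared size, 4-byte type code),
// or None if the buffer is truncated or the size is nonsense.
fn parse_box_header(buf: &[u8]) -> Option<(u32, [u8; 4])> {
    if buf.len() < 8 {
        return None; // truncated header
    }
    let size = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let kind = [buf[4], buf[5], buf[6], buf[7]];
    // A box must be at least big enough to hold its own 8-byte header.
    // (size == 0 and size == 1 have special meanings, ignored here.)
    if size < 8 {
        return None;
    }
    Some((size, kind))
}
```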



