Week 15: Performance Work continued

Last week, a lot of optimization work was done. While the week before was mostly about making NQP faster, this week (or should I call it “the last week”?) shifted the focus to optimizing pieces of Rakudo instead.

  • The spesh branch of MoarVM, which contains the bytecode specializer, has been merged into the master branch.
  • a long-standing inconsistency concerning how protos have been skipped or not depending on whether or not they have been compile-time-inlined has been fixed
  • on MoarVM, this previous bit also enables an optimization to make multiple dispatch cheaper. This will come to the JVM at some point, too.
  • Hash assignment used to take a very indirect route to get values into the actual hash, which caused lots of unnecessary work in the most general case. There is now an “assign_key” method that works much faster.
  • Array assignment got a similar improvement involving an “assign_pos” method.
  • Using some subs on lists, like push and unshift, used to come with a lot of extra overhead, as it created a slurpy list even if you only passed a single parameter. This rather common case is now much cheaper.
  • Rakudo’s optimizer now removes some instances of $*DISPATCHER, $_, $/, and $! if they are not used.
  • … and many more little improvements all over the place

I’ve actually posted a benchmark run with perl5, nqp-moarvm and the 2014.03 releases of rakudo-parrot, rakudo-moar and rakudo-jvm to compare them against the master branch at that time. You can find it here. It’s a bit of a mess, so I’ll point out a few things:

  • The graphs have log-log scales. One step to the right means twice the amount of work, one step up means half the time taken.
  • When hovering over a data point, it will display “$foo x times slower than fastest”. The “fastest” it refers to is the highest data point anywhere in the graph. Thus, if the graph has a little spike on the far left, or the amount of work doesn’t scale linearly with the “scaling factor” (like in the 2d visit tests, or the man-or-boy tests), you can’t rely on it. You’ll have to count grid lines instead (or … you know … submit a pull request to the perl5 script that generates the benchmark plots).
  • Clicking on a name in the legend will toggle visibility of the corresponding line. With so many lines, you can easily lose track of what’s going on.

There’s of course not only performance improvements to be found:

  • The implementation and semantics of “winner” have been refined further by lizmat
  • Mouq has fixed both the samespace method for Str and the ss/// regex form
  • lizmat has started implementing the “is cached” routine trait, which would automatically cache return values based on all parameters.
  • SetHash, BagHash and MixHash now have minpairs and maxpairs methods that give you the values with the fewest or most occurrences together with the number of occurrences.
  • Using threads on MoarVM used to cause a messy shutdown, as one thread was going ahead to destroy the VM object and other threads still tried to access it. Now it properly shuts down and just lets the OS free all the resources.
  • This has been the case for a somewhat longer time, but I forgot to mention it: linenoise is now in use on MoarVM again, so the REPL actually has some nice line editing.
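
The “is cached” idea mentioned above boils down to memoization keyed on the full argument list. Here is a minimal sketch of that idea in Python. It is not how Rakudo implements the trait; `is_cached`, `slow_square` and `calls` are hypothetical names made up for illustration.

```python
import functools

def is_cached(func):
    """Hypothetical decorator: remember return values keyed on all arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The key covers every positional and every named argument.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # exposed only so the sketch can be inspected
    return wrapper

calls = []  # records every real invocation of the wrapped routine

@is_cached
def slow_square(n):
    calls.append(n)
    return n * n

first = slow_square(4)
second = slow_square(4)  # answered from the cache, the body does not run again
third = slow_square(5)
```

Note that this only makes sense for routines without side effects, since the body is skipped on a cache hit.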

In the spectest repository, the following things have happened:

  • dwarring has added more and more tests based on the advent calendar blog.
  • moritz has fudged more failing tests and created RT tickets to go with them.
  • Mouq has added a couple of tests for RT tickets.

And here’s a thing I forgot to mention in last week’s post:

  • retupmoca created a module named “LibraryMake”, that greatly eases shipping native C code with your Perl 6 module. It comes with an example Makefile.in that you can adapt to your needs and it will fill in all the necessary flags and values to build a compatible library. After that, it helps you find exactly where the library was installed.

For more information about what’s been happening with MoarVM recently, you can also check out jnthn’s blog post about the topic.

As you can see in his post, jnthn will be ensuring we’ll finally get a multi-backend Rakudo * release this month. Color me excited!

Something for you to try

Currently, Rakudo sorts lists by generating a list of indices, sorting this index list based on the original values and then using list subscript splicing to re-order the objects into the correct order. This was once needed so that we could use Parrot’s built-in sort function, but on MoarVM and JVM this is quite a lot of wasted effort. If you (yes you!) would like to contribute a little something to Rakudo, find us on the IRC channel and ask for directions.
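
To make the difference a bit more tangible, here is a rough Python sketch of the two strategies described above. It only illustrates the shape of the algorithm and is not Rakudo's actual code; the function names are made up.

```python
def sort_via_indices(values):
    """Sort an index list by the original values, then re-read through it."""
    indices = list(range(len(values)))
    # Step 1: order the indices according to the values they point at.
    indices.sort(key=lambda i: values[i])
    # Step 2: build the result by looking every value up through its index.
    return [values[i] for i in indices]

def sort_directly(values):
    """Sort the values themselves, without the intermediate index list."""
    return sorted(values)

data = ["pear", "apple", "fig", "apple"]
indirect = sort_via_indices(data)
direct = sort_directly(data)
```

Both give the same result; the indirect version just allocates an extra list and does one extra lookup per element.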

2014.14: Speshial things

Hey there,

if you belong to the many people who are annoyed by the somewhat mediocre performance characteristics of rakudo, last week’s changes may get you quite excited :)

Let’s dig into the performance changes first:

  • Rakudo used to box a fresh instance of 1 for each ++ or -- you put in your code. Now it just uses the ones found in the constant pool.
  • In the new and shiny “spesh” branch of MoarVM, jnthn has published his work for creating a Control Flow Graph and Single Static Assignment representation of MoarVM bytecode that can then be optimized based on a “facts database”. In particular, the following optimizations for this “bytecode specializer” have been created so far:
    • method calls on known types will – if possible – get resolved at “specialize-time”. if the type is not known, we add a little cache for the cases where the type turns out to be the same many times in a row.
    • the hllize operation (which is used to ensure things like boxed strings or integers, or arrays and such have been transferred into the right High Level Language, for example from NQP to Perl 6) gets turned into a set operation instead if the type is known and it’s already in the right HLL.
    • the decont operation can likewise turn into a set operation instead if the objects that come in are known to already be decontainered.
    • the operations to get arguments that were passed to the sub or method usually check for the proper number of arguments beforehand, but at the time we specialize, we already know exactly how many arguments got passed, so these operations all get replaced with quicker, specialized operations.
    • likewise, the operations to get optional parameters are all conditional jumps in order to set default values if values were not passed; these jumps are now turned unconditional at specialize-time and the code that turns unreachable gets removed from the specialized bytecode
    • operations that belong to the “if” or “unless” category that operate on known values will now be turned into unconditional jumps (or removed completely) at specialize-time.
    • if the type of something is known at specialize-time, the “istype” operation will turn into a “load a constant 0 or 1 into the register” operation, if possible.
    • a bunch of operations that usually would have to go indirectly through the REPR of the given type can now be inlined (when the type is known) and generate code that is faster and has less indirection. Currently, creating objects, getting and setting attributes, boxing and unboxing values, and finding out the number of elements an object has can be optimized by each REPR, though it has to be implemented for each REPR + operation individually. In particular, no boxing/unboxing or getattr speshes are implemented yet, but they likely take only a few minutes each to write
  • There are a few things still missing. For example, information gathered from turning operations into “constants” (like istype) is not yet aggressively propagated. Also, the specializer will currently specialize anything that gets called more than 10 times, so it will waste a lot of time on things that are not actually “hot”, but it does help find out early if the specializer does anything wrong.
  • The spesh branch will likely be merged this week in order for it to show up in the next MoarVM release. It currently regresses no spectests in Rakudo’s test suite, so that’s a good sign already!
  • After I had tried – unsuccessfully – for a long time to get this particular optimization off the ground, jnthn implemented a much cleaner approach to it in an evening. The optimization in question is turning lexical variables into locals if they are not used in nested blocks and then turning nested blocks into in-lined blocks if they don’t define any lexical variables in them. These two optimizations harmonize perfectly and since the specializer doesn’t know how to operate on lexicals yet, it’s worth twice as much. Sadly, this optimization is only possible in NQP so far, as the analysis and care needed to make the same thing work in Perl 6 are much more complicated. On the other hand, every piece of compilation you do now is a bit faster.
  • The work done in the spesh branch will serve as the basis for the JIT project that has been proposed for the GSoC.
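
The “little cache” for method calls on types that are not known at specialize-time can be pictured as a one-entry cache per call site. The following Python sketch only illustrates that general technique; it says nothing about how MoarVM actually lays this out, and `CallSite` and its attributes are hypothetical.

```python
class CallSite:
    """Hypothetical call site with a one-entry method cache."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None
        self.lookups = 0  # counts how often the slow path had to run

    def invoke(self, obj, *args):
        obj_type = type(obj)
        if obj_type is not self.cached_type:
            # Slow path: resolve the method and remember it for this type.
            self.cached_method = getattr(obj_type, self.method_name)
            self.cached_type = obj_type
            self.lookups += 1
        # Fast path: same type as last time, reuse the resolved method.
        return self.cached_method(obj, *args)

site = CallSite("upper")
results = [site.invoke(s) for s in ["a", "b", "c"]]
lookups_after_strings = site.lookups
bytes_result = site.invoke(b"d")  # a different type forces a new lookup
```

As long as the same type shows up many times in a row, only the first call pays for the method resolution.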

And here’s the non-performance related changes:

  • FROGGS introduced a variable $*EXECUTABLE which is an IO::Path object that points to the executable used in the given process
  • FROGGS also worked on a tool to build actual executables that carry MoarVM bytecode and commandline parameters inside them. These can then be statically linked to libmoar and be completely stand-alone.
  • lizmat worked on “winner” quite a lot and through her work found out that the construct is in desperate need of a re-think and re-spec.
  • dwarring has constantly been creating test cases for the advent calendar posts of the past years
  • retupmoca has merged patches to rakudo and panda to support handling multiple state files, for example if you install some modules system-wide and other modules in your home directory.
  • vendethiel helped me flesh out the tests for the “orelse” operator. Then we found out that I have had a wrong understanding of how orelse is supposed to handle exceptions. Oh well, lesson learned!
  • last week I pointed out that the I/O subsystem of MoarVM was lacking locks. This has been addressed and the I/O stuff should now be in very good shape for concurrency.

2014.13: Beginnings, In-Betweens and presentations

A few nice things have been merged last week, but many things are still kind of in-between. Let’s see what we’ve got …

  • arnsholt has merged the “jastcompiler” branch for our JVM backend. We used to have a parrot-backed NQP generate a text representation of the bytecode that needed to end up on a JVM-backed process when we were still bootstrapping the JVM backend. That was necessary, because we were using the ASM Java library to generate class files. Nowadays, we have an NQP or even Rakudo running directly on the JVM, which means we don’t have to pass serialized (textified, really) data between two processes, so turning the bytecode into text and then parsing that text directly afterwards is no longer necessary. This branch removes that round trip. It should give a noticeable improvement on compile times, but I haven’t seen anyone post any precise numbers.
  • jnthn felt it was the right time to merge the moar-conc branches into NQP and Rakudo. While they are not yet stable enough for production use, it is now easier for everyone to jump in and hunt bugs. Yes, this means you can now use Promises, Channels, Supplies, low-level Threads, … on MoarVM! However, the I/O subsystem has almost no locks in it, so doing I/O from multiple threads simultaneously is likely going to produce “interesting” results. The implementation of stdout has already got a bit of locking applied to it, so it shouldn’t break spectacularly to “say” multiple times in parallel, but other things still need looking at.
  • The German Perl Workshop happened in Hannover and there were a few presentations on Perl 6. I don’t think videos have been uploaded anywhere yet, though. Additionally, Damian Conway held a presentation on Perl 6. On top of that, lizmat and wendy are giving talks in Cluj.PM. And Larry Wall is currently touring China. I’m not sure if he’s doing talks on Perl 6; he probably is.
  • Newcomer simula67 has been improving the test suite with regards to sockets. I haven’t done much with Sockets in Rakudo, so I can’t judge the current state. But improvements in test coverage and robustness are always very nice to have!
  • Moritz has turned a lot of failing tests into Todo’d tests and corresponding tickets in RT. This way we’ll get a state of the test suite where all known-to-work tests will pass and every improvement or regression will be very easy to spot.

On a less positive note:

  • Rakudo * for the three backends is not likely going to happen this month; it would require point releases of NQP and Rakudo and we’re kind of not too fond of that idea. This month will see a regular Rakudo * with the parrot backend and next month we’ll try again – likely with much greater success.
  • I spent all my time before the game development competition deadline trying to get PNG images loaded properly just to realize in the final moments that I’ve been linking SDL_image rather than SDL2_image. D’oh!

And some things that are coming up:

  • jnthn said he’ll have something showable for MoarVM’s type specializing subsystem this week.
  • CPAN, PAUSE and Panda are moving towards a compatible position. There’s some exciting stuff coming up in this area!
  • At the time of writing this post, arnsholt is working to refactor the way operators are defined in the grammar of NQP and Rakudo. It may give a slight improvement to compile times, but it’s mostly crufty legacy code being replaced by much saner code – and less code overall!

I wish you a pleasant week! You can play with MoarVM’s very latest development code and let us know what oddities you encounter! :)

2014.12: Release Week, GSoC, and Game Development, oh my!

Last week, there was a Rakudo release (the Rakudo * releases happen later in the month), which gives you most of the changes I’ve talked about in the recent weeklies. Sadly, there was a bug in the way GC threads pass work to each other in the MoarVM Concurrency branch. Therefore, that branch hasn’t been merged for the release.

We didn’t bump the minimum parrot revision to the 6.2.0 release yet, which would have given us some optimizations (I’ve yet to run the benchmarks), but next month’s parrot release will likely bring handling environment variables back to the qx quoting construct, so I’ll bug the release manager to bump parrot in any case :)

arnsholt has been working on the JAST compiler – an intermediate stage of compiling NQP or Rakudo QAST to JVM .class files. So far, it has been creating a string representation of a tree, which was then parsed by some Java code. With the new branch, the Java code will directly get passed proper data structures, thus eliminating redundant parsing.

brrt has pushed his GSoC application out to hopefully be accepted to work on a JIT compiler for MoarVM. jnthn will be his mentor and until the GSoC starts, he’ll have made public his work towards control flow graph generation and bytecode specialization. I’m really looking forward to that!

dwarring has added a few spectests for the p6advent 2013. It’s always good to immediately see when a change to the implementation or spec causes these texts to become outdated. Though I don’t know if we have decided yet how exactly to handle outdated posts from years ago.

Spectest-wise, Rakudo-moar is still at the 100% spot with Rakudo-jvm at 99.27% and Rakudo-parrot is now down to 99.00% relative to Rakudo-moar. I imagine that is due to the good Unicode DB support we have in MoarVM. When the concurrency branch of MoarVM is merged to master, that lead will increase even more. Oh my!

I kept the best thing for last: tadzik has been working on a tiny game development framework on top of SDL2 and has made an example game with that. On top of that, he’s called out a game development contest that’ll go until Sunday. Find all the details on his blog.

I wish you all a pleasant week! :)

Week 11 of 2014

This week has seen changes spread across many different pieces of Perl 6. Let’s have a look!

  • rurban has gotten the -O2 pbc optimization enabled for parrot and tomorrow’s parrot release will feature that. He measured some 3-5% improvement, but hasn’t benchmarked Rakudo thus far. If nobody beats me to it, I may run some benchmarks this week.
  • rurban has also set up a bunch of buildbots, among others for parrot, NQP and Rakudo.
  • I’ve missed it last week, but lizmat has been working diligently and effectively on both S11 (modules) and S22 (package format) during these last two weeks!
  • At the same time, FROGGS has been moving the integration of PAUSE (the Perl Authors Upload Server) with Perl 6 forward and it sounded like there’ll be a big announcement this or next week regarding that whole complex :)
  • Another thing lizmat did (the week before this one) was improving the error messages for when you typo a routine or try to call one that doesn’t exist. There are also helpful messages now when you try to use the .length and .bytes methods on stringy things; these are banned from Perl 6 because length and bytes are very ambiguous when applied to a string with regards to Unicode and combining characters and stuff.
  • jnthn has advanced NativeCall support on MoarVM to the point where it passes all tests! These changes are already in the master branch of MoarVM, NQP, Rakudo and zavolaj (aka NativeCall)
  • jnthn has started the “tristar” branch in the rakudo/star repository to hopefully get a three-backend Rakudo Star distribution this month. Mouq and I helped him a bit. None of the three of us is really good at doing build system stuff, though …
  • just as I was writing this, rurban started fixing a nasty regression that made changes to %*ENV not propagate to qx (execute shell commands and return the output as a string) on Rakudo for Parrot.

And here’s a few things people have in the pipe for the near future:

  • as mentioned above, FROGGS is still working on ecosystem/cpan/PAUSE/versioning/installation things
  • lizmat is likely going to continue work on the S11 and S22 synopses
  • I’ve found out that during the CORE.setting compilation, a few strings occur in MoarVM’s generation two more than a few thousand times each. By introducing string interning in the right place, this might bring memory usage down by a significant amount, but my first attempt at string interning just caused mysterious compilation failures.
  • there’s quite a bit of momentum towards having tristar finished for this month’s release, so I’m very much looking forward to that
  • MoarVM concurrency support is probably going to land after this month’s release, so that it can be tested and debugged sufficiently before it gets unleashed on the world.
  • rurban is going to work on a JIT compiler for parrot at some point in the near future
  • Mouq has begun a pure-perl GIF decoder as well as a library to read tar files
  • hoelzro is likely going to have more time for Perl 6 in the future, so I’m hoping for a complete highlighter for Kate (and thus QtCreator) in a relatively short amount of time :)
    • sadly, there’s a billion incompatible syntax highlighting definition schemes … it would be fantastic if we could generate many highlighters from a common format, but I fear that’ll remain a pipe dream.
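
For readers who haven’t come across the string interning mentioned in the list above: the idea is to keep one canonical copy of every distinct string and hand that copy out whenever an equal string shows up. A minimal Python sketch of the technique follows; it is not MoarVM code, and `InternTable` is a hypothetical name.

```python
class InternTable:
    """Hypothetical intern table: one canonical object per distinct string."""

    def __init__(self):
        self._table = {}

    def intern(self, s):
        # setdefault stores s on first sight and otherwise returns the
        # object that was stored earlier for an equal string.
        return self._table.setdefault(s, s)

    def __len__(self):
        return len(self._table)

table = InternTable()
parts = ["core", "setting"]
a = table.intern("-".join(parts))  # built at runtime
b = table.intern("-".join(parts))  # equal contents, built a second time
```

After interning, `a` and `b` refer to the very same object, so a string that occurs a few thousand times only needs to be stored once.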

Not bad progress, considering a big portion of the Perl 6 developers have been slowed down by 2048!

Well, I probably forgot to mention something cool again. I’ll probably put it into next week’s post :)

Changes for week 10 of 2014

Let’s make this short and sweet:

  • The somewhat long-standing precompilation related bug in MoarVM has been fixed. This brings us quite a bit closer to Rakudo * on MoarVM.
  • A few memory leaks inside MoarVM have been fixed. MoarVM doesn’t rely exclusively on its own garbage-collected heap. It also uses malloc and free for things like C strings and it also has reference counting semantics for frames. These were a little bit buggy. During the core setting compilation, about 100 megabytes of ram are saved by this.
  • MoarVM got some improvements to its unicode database and now offers operations to query it directly. In combination with a few spec changes to the Unicode Synopsis and the corresponding tests, MoarVM is now the leading implementation in number of passing spectests.
  • jnthn has just implemented nqp::queuepoll on MoarVM and except for a bit of design work left for the scheduler, there’s not much keeping the moar-conc branches – containing concurrency support for MoarVM – from being merged.
  • Mouq has continued pushing more fixes and improvements to Pod6 parsing and rendering to rakudo and Pod::To::HTML.
  • Coke has recently started porting Mojolicious to Perl 6. So far it’s only the Mojo::Util package, but it’s surely something to keep an eye on for perl5 users.

What’s cooking?

pmurias has resurfaced and is currently preparing nqp-js for a merge into the main NQP repository.

rurban has started working on parrot performance. Just today he got NQP to compile on a parrot that uses -O2 (not gcc’s -O flag. This one is about the parrot imcc). He’ll probably be improving parrot’s performance for Rakudo and NQP in the near future. He has also said before that parrot’s concurrency model is vastly superior to MoarVM’s. The MoarVM developers don’t agree on that, but I’d love to be surprised.

jnthn said he’ll focus on unblocking Rakudo * on MoarVM and the JVM after the next bits of concurrency work, so that we will hopefully have a triple-backend Rakudo * release this month. This includes NativeCall for MoarVM, which will thankfully be a lot simpler than it was for the JVM. MoarVM and parrot rely on the same library to do this, so there may be a lot of copypesto to be had.

I finally got the tip I needed for how to properly implement cascading block inlining for the NQP optimizer, so I’ll try to get that up and running this week. It ought to improve performance and memory usage on all of our backends.

Low hanging fruit

If you’d like to help out, here’s a little nugget to get you started:

Our evalbot on the IRC server currently outputs the versions of the backends used whenever someone asks it to eval some code. Since we’ve got three backends at the moment, these lines can become quite long:

   moritz | r: 1
 +camelia | rakudo-parrot 1aeb7c, rakudo-jvm 1aeb7c, rakudo-moar 1aeb7c: ( no output )

It would be nice to have it output

rakudo-{parrot,jvm,moar} 1aeb7c:

instead. Camelia is written in perl5 and the source code can be found on github. I’ve started a naive implementation of this in Perl 6, but only perl5 code will be accepted into the evalbot at this point. I’m sure there’s an algorithmically nicer implementation that can be done, though.
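
To illustrate the algorithm (and only the algorithm, since a real patch would have to be perl5), here is one possible approach sketched in Python: group the backends by revision and factor out the common name prefix of each group. The function name and the second revision in the example are made up, and this is just one way to do it.

```python
import os

def compact_versions(pairs):
    """Collapse (backend name, revision) pairs into brace notation."""
    # Group backend names by revision, keeping first-seen order.
    by_revision = {}
    for name, revision in pairs:
        by_revision.setdefault(revision, []).append(name)

    chunks = []
    for revision, names in by_revision.items():
        if len(names) == 1:
            chunks.append(f"{names[0]} {revision}")
            continue
        # commonprefix compares the strings character by character.
        prefix = os.path.commonprefix(names)
        suffixes = ",".join(name[len(prefix):] for name in names)
        chunks.append(f"{prefix}{{{suffixes}}} {revision}")
    return ", ".join(chunks)

same = compact_versions([
    ("rakudo-parrot", "1aeb7c"),
    ("rakudo-jvm", "1aeb7c"),
    ("rakudo-moar", "1aeb7c"),
])
mixed = compact_versions([
    ("rakudo-parrot", "1aeb7c"),
    ("rakudo-jvm", "1aeb7c"),
    ("rakudo-moar", "2bfc8d"),  # made-up revision for the example
])
```

Grouping by revision first means backends that were built from different commits still get their own entry instead of being lumped together.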

Something else that could be done soon-ish even if you don’t have a lot of experience yet is implementing the “locallifetime” op on MoarVM to allow temporary locals to be released by the register allocator earlier.

Something that’s always appreciated is helping out with ecosystem modules. You can try one or two out, report and perhaps golf bugs, weigh in on design decisions, … Or you can port your favourite perl5 (or even python/ruby/…) module over.

If any of those tasks seem interesting to you, feel free to visit us on the IRC. We are on the freenode network in #perl6. I hope you’ll have a wonderful week, my esteemed readers :)

Changes during Week 9 of 2014

Last week didn’t pack quite as many individual changes, but there are a couple really nice bugfixes in there. Let’s dig in!

  • Fewer methods on List will force eager evaluation of said list (some of them used List.end indirectly, which forced eagerness erroneously)
  • The combinations method of List will now also contain the empty List in the full Powerset.
  • The X and Z metaops (“Cross” – the cartesian product – and “Zip” respectively) will now correctly refrain from flattening itemized things, such as Arrays.
  • The reduction metaoperator now recognizes operators with list infix precedence like the X and Z metaops. This makes things like [Z+] work as expected on lists of lists.
  • Unary hyperops will now only distribute among the outermost level of nested list structures. Additionally, there are now “deepmap” and “duckmap” methods in addition to “flatmap”. “duckmap” first tries to apply the operator directly to any object in the list. If that fails, it’ll fall back to descending into substructures.
  • A whole bunch of spectest failures on Rakudo MoarVM that related to unicode properties have been fixed and we’ve now passed Rakudo Parrot in number of passing test cases. Rakudo Parrot is now at approximately 99.8%, Rakudo Moar is at 99.83% and Rakudo JVM is still the baseline of 100% against which we compare. As soon as the moar-conc branches get merged into their respective master branches, we expect the number of spectest passes to go up yet another bit.
  • On the topic of Rakudo MoarVM: the “moar-support” branch of panda has landed and you can now install modules for Rakudo MoarVM using panda. Unfortunately there is at least one known pre-compilation bug that prevents URI, among others, from being installed. You can install panda for MoarVM by running the “bootstrap.pl” or “rebootstrap.pl” scripts with perl6-m.
  • On the topic of module installation, FROGGS has been working hard on pushing the “eleven” branches forwards. Those aim to implement the much improved module installation, versioning and loading handling as specified in Synopsis S11.
  • I’ve built a fast path into the “string flattening” routine of MoarVM that causes string concatenation, joining and repetition to become quite a bit faster. Ideally, the flattening will be gone completely soon, but the speed boost is nice to have until then. These operations still suffer from very bad asymptotic behavior.
  • Mouq and lue have been continuing their work on Pod6 and documentation.
  • Mouq also added shorthands @<foo> for @($<foo>) and %<foo> for %($<foo>) and @1 for @($1) as well as %1 for %($1). These are now useful, since $<foo> will – just like the sigil suggests – be itemized.
  • raydiak and smls have been spending time prettying up the style of the rakudo documentation website some more.
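
The difference between “deepmap” and “duckmap” from the list above is easier to see in code. The following Python sketch mimics the behavior as described in this post; it is not the Rakudo implementation. Leaving an element unchanged when the function neither applies to it nor can descend into it is an assumption of this sketch.

```python
def deepmap(func, items):
    """Apply func to every leaf, keeping the nesting structure intact."""
    return [deepmap(func, x) if isinstance(x, list) else func(x)
            for x in items]

def duckmap(func, items):
    """Try func on each element first; descend only if that fails."""
    result = []
    for x in items:
        try:
            result.append(func(x))
        except (TypeError, AttributeError):
            if isinstance(x, list):
                # func did not apply to the list itself, so look inside.
                result.append(duckmap(func, x))
            else:
                # Assumption: elements func cannot handle are passed through.
                result.append(x)
    return result

doubled = deepmap(lambda n: n * 2, [1, [2, [3, 4]], 5])
shouted = duckmap(str.upper, ["a", ["b", 3]])
```

Here `str.upper` applies to `"a"` directly, fails on the inner list (so `duckmap` descends into it) and fails on `3`, which is passed through as-is.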

An interesting thing to point out is that the first 6 items on this list have been done by TimToady, who has left his self-imposed exile from the implementor’s side of Perl 6. Jnthn didn’t have much time during the week, so I have no big achievements to report for the multithreading support on MoarVM.