debuggable

 

RightJS 1.5: 6-8 times faster than jQuery

Posted on 9/12/09 by Felix Geisendörfer

My journey of mastering procrastination has led me to an interesting article on Hacker News today:

RightJS 1.5: 6-8 times faster than jQuery

(The title has been updated to "RightJS Version 1.5.0 Is Out" since I started writing this.)

Wow, I thought! This sounds like an excellent example of cargo cult science, one of my favourite subjects.

I mean I love innovation in this field just like everybody else. But seriously, jQuery is not exactly known for being slow & heavy. So anybody claiming a 6-8x speed improvement must have achieved an unbelievable breakthrough. Either that, or he must be using the cargo cult method.

Applying the cargo cult method to performance testing is rather simple, which probably explains its popularity. You pick a random series of tests that can be run against the various implementations you want to compete with. Then you spend a few hours hacking away at your implementation until it is the clear winner. Don't give up if it becomes too hard, just tweak the test cases to slightly favor your implementation. It's really as easy as that.

I can totally understand why people are doing that. The opposite would mean that you have to apply the scientific method, which is really cumbersome. First you have to collect data, lots of it. In our case that means performing very detailed analysis and profiling of a large enough set of real world JavaScript applications. Using this data set, you should be able to answer questions like: What are the most common selectors people use? What DOM operations are popular? Which of those are actually relevant to the performance of the analyzed applications? With those answers you can attempt to come up with a series of tests that will rank the various implementations according to their performance. But actually writing those tests will be very hard. Should you use the most distinct and sexy way in each implementation? Or should you use the most effective techniques people have come up with?

Luckily there is a third option. It is called specialized benchmarking. You start by admitting that the things you are going to test are purely based on your curiosity about them, possibly because they are related to the particular problem YOU care about. Make it very clear that the outcome of those tests should in no way be seen as an indicator for overall performance and try to hide them from people who don't know what that means.

Specialized benchmarking will possibly not answer anybody's questions other than your own, but it beats the hell out of the cargo cult method.
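To make that a bit more concrete, here is a minimal sketch of what a specialized benchmark could look like. The benchmark() helper and the string concatenation workload are made up for illustration and come from neither RightJS nor jQuery. The point is only that it times one narrowly scoped operation several times and reports the spread instead of a single flattering number:

```javascript
// Hypothetical helper: run one narrowly scoped operation `runs` times
// and report min / median / max instead of a single sample.
function benchmark(label, fn, runs) {
  var samples = [];
  for (var i = 0; i < runs; i++) {
    var start = Date.now(); // millisecond resolution, good enough for a sketch
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort(function(a, b) { return a - b; });
  return {
    label: label,
    runs: runs,
    min: samples[0],
    // Middle sample; for an even number of runs this is the upper middle.
    median: samples[Math.floor(samples.length / 2)],
    max: samples[samples.length - 1]
  };
}

// The one thing I happen to be curious about: building an HTML string.
var result = benchmark('concat 10k <li> strings', function() {
  var s = '';
  for (var i = 0; i < 10000; i++) { s += '<li>' + i + '</li>'; }
  return s;
}, 15);

console.log(result.label + ': median ' + result.median + 'ms (min ' +
  result.min + 'ms, max ' + result.max + 'ms, ' + result.runs + ' runs)');
```

Whatever numbers this prints only say something about string concatenation on the machine it ran on, which is exactly the kind of claim a specialized benchmark is allowed to make.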

Let's examine why the RightJS performance tests get it wrong and what they could do about it. From this point on I will only refer to material on their page; the post on Hacker News was just how I heard about them.

First of all, they claim that their performance benchmarks are there "To give you some ideas about the project quality and abilities". I think that should be changed to: "We are especially fast for the following operations (...), those however are not proven to be good indicators for general performance in JS projects.". It's kind of like a weight loss advertisement. You can show pictures of people who lost 50 pounds, but you gotta put that "* Results are not typical" note there. This way people can pause for a second, remember that there are no magic bullets for weight loss, and consider their purchase with that in mind.

After that, they could start to decide whether some of their tests are worth keeping, and if so, make sure that they are as scientific as possible. I'll just use their test #1 as an example, but check the test suite for yourself, to see that this pattern is repeated throughout the entire thing.

Testing jQuery DOM building (343ms*):

"make": function(){
  for(var i = 0; i<250; i++){
    $("<ul id='setid" + i + "' class='fromcode'></ul>")
      .append("<li>one</li>")
      .append("<li>two</li>")
      .append("<li>three</li>")
      .appendTo("body");
  }
  return $("ul.fromcode").length;
}

Testing RightJS DOM building (80ms*):

"make" : function(){
  for (var i = 0; i < 250; i++) {
    document.body.appendChild(
      new Element('ul', {
        'class': 'fromcode', id: 'setid'+i
      }).insert([
        new Element('li', {html: 'one'}),
        new Element('li', {html: 'two'}),
        new Element('li', {html: 'three'})
      ])
    );
  }

  return $$('ul.fromcode').length;
}

I smell cargo! First of all, why is RightJS using a native DOM method, document.body.appendChild, while jQuery has to use .appendTo('body')? Those are two radically different operations, and just to see how radical, let's make the following change:

Optimized jQuery DOM building I (194ms*):

"make": function(){
  for(var i = 0; i<250; i++){
    document.body.appendChild(
      $("<ul id='setid" + i + "' class='fromcode'></ul>")
        .append("<li>one</li>")
        .append("<li>two</li>")
        .append("<li>three</li>")[0]
    );
  }
  return $("ul.fromcode").length;
}

Ouch, an error rate of 43% against jQuery. Let's try harder:

Optimized jQuery DOM building II (72ms*):

"make": function(){
  for(var i = 0; i<250; i++){
    document.body.appendChild(
      $(
        "<ul id='setid" + i + "' class='fromcode'>"+
        "<li>one</li>"+
        "<li>two</li>"+
        "<li>three</li>"+
        "</ul>"
      )[0]
    );
  }
  return $("ul.fromcode").length;
}
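For the record, the 43% above is simply the share of the originally reported 343ms that goes away once the test is rewritten. Here is a quick sketch of that arithmetic, using the sample timings quoted in this post (the function name is invented for illustration):

```javascript
// Share of the reported time that was an artifact of how the test was
// written rather than of the library. shareExplained is an illustrative name.
function shareExplained(reportedMs, rewrittenMs) {
  return (reportedMs - rewrittenMs) / reportedMs;
}

var original = 343;    // jQuery test as published
var optimizedI = 194;  // same test, native appendChild at the end
var optimizedII = 72;  // one HTML string per <ul>

console.log(Math.round(shareExplained(original, optimizedI) * 100) + '%');  // 43%
console.log(Math.round(shareExplained(original, optimizedII) * 100) + '%'); // 79%
```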

If this was a presentation I would have an LOLCat saying "jQuery rulez" right now. But luckily this isn't and I'll try to reason scientifically about this.

jQuery is NOT faster in this example. Don't believe the numbers you see. They have been meaningless all along. The reason for that is simple: While initially it looked like we were performing the same test with jQuery as we were with RightJS, we never actually did! The jQuery example, from the beginning, was creating DOM elements from HTML strings, while RightJS was wrapping the document.createElement API. This is not the same thing and you cannot learn anything from comparing apples to oranges.

The truth as far as this test case is concerned? Well, jQuery simply does not have a document.createElement wrapper. Thus you cannot compare it to implementations that do. And why should you? DOM building like this is largely useless, given excellent alternatives such as John's Micro-Templating engine.

Just to show how useless this test was from the beginning, here is my not-so-paradoxical implementation that outperforms the pure DOM test:

Testing Pure DOM building (37ms*):

"make": function(){
    for(var
        body = document.body,
        ul = document.createElement("ul"),
        li = document.createElement("li"),
        i = 0,
        fromcode;
        i < 250; ++i
    ){
        fromcode    = ul.cloneNode(true);
        fromcode.id = "setid" + i;
        fromcode.className = "fromcode";
        fromcode.appendChild(li.cloneNode(true)).appendChild(document.createTextNode("one"));
        fromcode.appendChild(li.cloneNode(true)).appendChild(document.createTextNode("two"));
        fromcode.appendChild(li.cloneNode(true)).appendChild(document.createTextNode("three"));
        body.appendChild(fromcode);
    };
    return  utility.getSimple.call(body, "ul.fromcode").length;
}

Optimized jQuery DOM building III (36ms*):

"make": function(){
  var elements = '<div>';
  for(var i = 0; i<250; i++){
    elements = elements+
        "<ul id='setid" + i + "' class='fromcode'>"+
        "<li>one</li>"+
        "<li>two</li>"+
        "<li>three</li>"+
        "</ul>";
  }
  $(elements+'</div>')
    .children()
    .each(function() {
      document.body.appendChild(this);
    });

  return $("ul.fromcode").length;
}

As you can see, the cargo cult method is quite powerful : ).

Anyway, I don't want to discourage the development of RightJS in any way. I think it's awesome that there are libraries that are trying to compete with jQuery.

It is really hard to do meaningful performance testing and infinitely easy for some random punk like me to come along and point out all the flaws. To me, even trying to do a general purpose performance test against 6 (!) implementations, that is pure bravery. So in case you decide to do something similar, just admit the odds you are up against and people will be very forgiving and engaged.

Comments, hate mail & suggestions are welcome!

-- Felix Geisendörfer aka the_undefined

* Results not typical - Some recent version of Firefox on my Laptop, picking random samples from runs that looked good!

 

Parsing form data with node.js

Posted on 24/11/09 by Felix Geisendörfer

Many people asked about form parsing in #node.js after the initial buzz-wave yesterday.

Right now node does not include a parser for regular form data (application/x-www-form-urlencoded). However, you can use the http multipart parser to achieve the same thing.

Here is a bare-bones example for that:

var http = require('http');
var multipart = require('multipart');
var sys = require('sys');

var server = http.createServer(function(req, res) {
  switch (req.uri.path) {
    case '/':
      res.sendHeader(200, {'Content-Type': 'text/html'});
      res.sendBody(
        '<form action="/myaction" method="post" enctype="multipart/form-data">'+
        '<input type="text" name="field1">'+
        '<input type="text" name="field2">'+
        '<input type="submit" value="Submit">'+
        '</form>'
      );
      res.finish();
      break;
    case '/myaction':
      multipart.parse(req).addCallback(function(parts) {
        res.sendHeader(200, {'Content-Type': 'text/plain'});
        res.sendBody(sys.inspect(parts));
        res.finish();
      });
      break;
  }
});
server.listen(8000);

Run this code and point your browser to http://localhost:8000/. You will be presented with a form, and when you submit it, you will see the contents of the POST as JSON. For more information check:

The important part is specifying the enctype of your form as "multipart/form-data".

If you need to parse regular form data, have a look at sixtus' www-forms module. Chances are good that a module like this, with a similar API to the multipart parser, will make it into the core at some point (patches are welcome).
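In case you are wondering what parsing regular form data actually involves, here is a minimal sketch. Note that parseUrlEncoded is a name I made up for illustration. It is not the www-forms API, and it works on a body string you have already collected, so it does not touch any node API at all:

```javascript
// Minimal sketch of an application/x-www-form-urlencoded parser.
// parseUrlEncoded is a hypothetical name, not part of node or www-forms.
function parseUrlEncoded(body) {
  var fields = {};
  if (!body) return fields;

  var pairs = body.split('&');
  for (var i = 0; i < pairs.length; i++) {
    if (pairs[i] === '') continue; // tolerate "a=1&&b=2"

    var eq = pairs[i].indexOf('=');
    var rawKey = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    var rawValue = eq === -1 ? '' : pairs[i].slice(eq + 1);

    // "+" means space in this encoding; replace it before percent-decoding.
    // decodeURIComponent throws on malformed input, which this sketch
    // does not try to handle.
    var key = decodeURIComponent(rawKey.replace(/\+/g, ' '));
    var value = decodeURIComponent(rawValue.replace(/\+/g, ' '));

    // Repeated field names are collected into an array.
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      fields[key] = [].concat(fields[key], value);
    } else {
      fields[key] = value;
    }
  }
  return fields;
}

console.log(parseUrlEncoded('field1=hello+world&field2=a%26b'));
```

A real module would also have to buffer the request body, enforce a size limit and deal with character sets, which is why you probably want to use somebody else's code for this.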

HTH,
-- Felix Geisendörfer aka the_undefined

 

Simon Willison: Node.js is genuinely exciting

Posted on 23/11/09 by Felix Geisendörfer

Simon Willison of Django fame has just published an awesome introduction to node.js: Node.js is genuinely exciting. Enjoy!

-- Felix Geisendörfer aka the_undefined

 

Git remote hates you

Posted on 17/11/09 by Felix Geisendörfer

No, you didn't do anything wrong. Git sometimes is like your best friend who secretly hates you. Let's say you start a fresh new project:

mkdir new-project
cd new-project
git init
touch README
git add README
git commit -m 'first commit'
git remote add origin git@github.com:felixge/new-project.git
git push origin master

So far so good. But if, like any self-respecting geek, you juggle a million git repositories on your machine, you will soon have forgotten whether you started or cloned this particular repository. If you are unlucky, that means you will run into this:

% git push
fatal: The current branch master is not tracking anything.

Not helpful. "git pull" seems to try to make up for it by giving you way too much information:

% git pull
You asked me to pull without telling me which branch you
want to merge with, and 'branch.master.merge' in
your configuration file does not tell me either.  Please
specify which branch you want to merge on the command line and
try again (e.g. 'git pull <repository> <refspec>').
See git-pull(1) for details.

If you often merge with the same branch, you may want to
configure the following variables in your configuration
file:

    branch.master.remote = <nickname>
    branch.master.merge = <remote-ref>
    remote.<nickname>.url = <url>
    remote.<nickname>.fetch = <refspec>

See git-config(1) for details.

Ok, so what exactly do I have to do to fix this? Right, you ignore all those blobs in your repository making fun of you, and type:

git pull origin master

Using git and feeling like one has a certain overlap in the beginning. On your way to git enlightenment, or to the madhouse, you may eventually discover the fix:

git config branch.master.merge refs/heads/master
git config branch.master.remote origin
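For reference, all those two commands do is write a section like the following into the .git/config file of your repository, so you can also check it (or type it) by hand:

```
[branch "master"]
	merge = refs/heads/master
	remote = origin
```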

And no, do not, not even for a second, assume you could skip "branch.master.remote". Git remote will be very clear about how much it hates you if you do:

% git pull
You asked me to pull without telling me which branch you
...
# The 2s delay made you suspicious, turn off wifi
% git pull
ssh: Could not resolve hostname github.com: nodename nor servname provided, or not known
fatal: The remote end hung up unexpectedly

What in the name of the kernel? Git clearly knows the remote you are talking about, it's merely teasing you, possibly corrupting your repo by turning some blobs into LOLcats. For added frustration, here is my output for "git push":

% git push
Bus error

This is probably unique to my install. In case git does not hate you equally, you can try to complain about it in #git. That of course will only result in people telling you that you are being unreasonable. To a kernel hacker, the idea of git remote making some smart assumptions when adding the first remote to your fresh repository that only has a single branch, that is like talking healthcare reform with a right-wing hardliner.

Disclaimer: I love git, but some parts of it seem to purely back up the name. But relax. I'll save talking about git submodules, tracking empty folders and checking out partial trees for another time ...

-- Felix Geisendörfer aka the_undefined

 

FFMPEG multiple thumbnails

Posted on 21/10/09 by Felix Geisendörfer

I'm currently implementing /video/thumbnail functionality for transload.it and did some research on how to implement it.

First I set out to create a single thumbnail which was very easy:

ffmpeg -i intro.mov -vframes 1 -s 320x240 -ss 10 thumb.jpg

This takes a video file (-i intro.mov), extracts a single frame (-vframes 1) at 320x240px (-s 320x240) from an offset of 10 seconds (-ss 10), and saves it as thumb.jpg.

So far so good. But we actually want to offer the ability to take multiple thumbnails (like 8) in 1/8th increments of the video playtime. My initial idea was that there had to be a more efficient way than calling ffmpeg 8 times, and indeed I found one:

ffmpeg -i intro.mov -r 1/10 -s 320x240 thumb_%03d.jpg

This command works the same as the one above, except that it tries to set the frame rate to 1/10 (-r 1/10 = 1 frame every 10 seconds) and saves the results as thumb_000.jpg, thumb_001.jpg, thumb_002.jpg etc. Unfortunately I could not get it to produce the exact results I wanted. I would always end up with the first frame being captured twice, and the frame rate I set would be off by 2-3 seconds.

So I hopped to IRC and asked #ffmpeg for help. Dark_Shikari (one of the crazy people who build the best video codec in the world) was kind enough to help me.

It turns out that the offset parameter (-ss) needs to be set before the input parameter (-i). That will cause ffmpeg to seek to that position in the stream *without* decoding it and in fact skipping anything but key frames! This is pretty significant as performance improved from ~20 seconds for 4 thumbnails to about 2-3 seconds.

So my final setup is pretty much like this. First I find out the video duration by running:

midentify intro.mov

This gives me all kinds of useful information including the length of the video:

ID_AUDIO_ID=0
ID_VIDEO_ID=1
ID_FILENAME=intro.mov
ID_DEMUXER=mov
ID_VIDEO_FORMAT=avc1
ID_VIDEO_BITRATE=0
ID_VIDEO_WIDTH=630
ID_VIDEO_HEIGHT=360
ID_VIDEO_FPS=25.000
ID_VIDEO_ASPECT=0.0000
ID_AUDIO_FORMAT=sowt
ID_AUDIO_BITRATE=0
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_LENGTH=81.32
ID_VIDEO_CODEC=ffh264
ID_AUDIO_BITRATE=1536000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_AUDIO_CODEC=pcm

I then divide the length by the number of thumbnails I need, and run a loop like this:

ffmpeg -ss $i*$interval -i intro.mov -vframes 1 -s 320x240 thumb_$i.jpg

Where $i is the number of the thumb I'm extracting and $interval is the duration of the video divided by the number of thumbs.
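Spelled out as a rough JavaScript sketch, the loop could look like the following. The function names parseLength and thumbnailCommands are made up for illustration, and the sketch only builds the ffmpeg command lines from the midentify output instead of executing them:

```javascript
// Pull ID_LENGTH (in seconds) out of midentify's KEY=VALUE output.
function parseLength(midentifyOutput) {
  var match = /^ID_LENGTH=([0-9.]+)/m.exec(midentifyOutput);
  if (!match) throw new Error('ID_LENGTH not found in midentify output');
  return parseFloat(match[1]);
}

// Build one ffmpeg argument list per thumbnail. -ss goes before -i so
// that ffmpeg seeks to the offset instead of decoding up to it.
function thumbnailCommands(file, duration, count) {
  var interval = duration / count;
  var commands = [];
  for (var i = 0; i < count; i++) {
    commands.push([
      'ffmpeg', '-ss', (i * interval).toFixed(2), '-i', file,
      '-vframes', '1', '-s', '320x240', 'thumb_' + i + '.jpg'
    ]);
  }
  return commands;
}

var duration = parseLength(
  'ID_VIDEO_FPS=25.000\nID_LENGTH=81.32\nID_VIDEO_CODEC=ffh264'
);
thumbnailCommands('intro.mov', duration, 8).forEach(function(args) {
  console.log(args.join(' '));
});
```

Since $i starts at 0 here, the first thumbnail is taken at the very beginning of the video. Add an offset if your videos tend to start with a black frame.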

Works like a charm!

-- Felix Geisendörfer aka the_undefined

 