JSON value extraction benchmark

dbohdan 2017-02-03: This benchmark compares how quickly various JSON libraries for Tcl can extract a deeply nested string value from a large JSON blob (circa 18 MiB). The comparison may help you choose among the growing number of similar JSON-parsing libraries for Tcl, but, like any microbenchmark, it has a limited scope. Libraries may be faster or slower under different circumstances.

How to run the benchmark

Install Git, curl (or wget), unzip, jq, Tcl 8.6, Tcllib and (optionally) wiki-reaper. Install the Tcl extensions rl_json, SQLite with JSON1, tcl-duktape and yajl-tcl. Then run the POSIX shell commands below.

mkdir jsonbench
cd jsonbench
git clone https://github.com/dbohdan/jimhttp
curl -sf -o AllSets.json.zip https://mtgjson.com/json/AllSets.json.zip
# or wget https://mtgjson.com/json/AllSets.json.zip
unzip AllSets.json.zip
rm AllSets.json.zip
# Instead of running the next command you can manually copy the code from the
# "Code" section of this wiki page and save it as jsonbench.tcl.
wiki-reaper 48500 3 | tee jsonbench.tcl
tclsh jsonbench.tcl
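
Before running the benchmark it can help to confirm that your tclsh can find every package the script loads. The following is a minimal sketch and not part of the benchmark itself. The checkPackages helper is a hypothetical name, and the package list is copied from the package require lines in the "Code" section.

```tcl
#! /usr/bin/env tclsh
# Hypothetical helper (not part of jsonbench.tcl): report which of the
# packages the benchmark loads can be found by this tclsh.
proc checkPackages {names} {
    set status {}
    foreach name $names {
        # [package require] returns the version on success and raises an
        # error when the package cannot be found or loaded.
        if {[catch {package require $name} version]} {
            dict set status $name missing
        } else {
            dict set status $name $version
        }
    }
    return $status
}

# Package names taken from the benchmark script.
set required {fileutil json duktape duktape::oo rl_json sqlite3 yajltcl}
dict for {name version} [checkPackages $required] {
    puts [format {%-12s  --  %s} $name $version]
}
```

Any package reported as missing would make jsonbench.tcl stop at the corresponding package require line. jq and the jimhttp checkout are not Tcl packages, so this check does not cover them.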

Sample results

These results are from running the benchmark on WSL (Ubuntu 16.04) on a Phenom II X4 955 CPU with no CPU-intensive tasks running in the background. The benchmark used version 3.8.1 (Jan 23, 2017) of the MTG JSON data set. The jq version was 1.5.

5 iterations

Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 5 iterations for each library
jq:              1062 ms
tcl-duktape:     1670 ms
jimhttp JSON:   37941 ms
rl_json:           99 ms
Tcllib JSON:    25261 ms
SQLite JSON1:      58 ms
yajl-tcl:         284 ms

20 iterations

Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 20 iterations for each library
jq:              1051 ms
tcl-duktape:     1634 ms
jimhttp JSON:   38772 ms
rl_json:           28 ms
Tcllib JSON:    26073 ms
SQLite JSON1:      57 ms
yajl-tcl:         284 ms
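
A note on reading these figures: the script feeds the output of Tcl's time command, which reports the average number of microseconds per iteration, through its ms procedure, so each number above is an average per iteration in milliseconds rather than a total. The sketch below repeats that conversion on its own. The timed command and the sample string are made up for illustration.

```tcl
# Same conversion as the benchmark's ms procedure: take the first word of
# the [time] result (microseconds per iteration) and round it to whole ms.
proc ms timeResult {
    return [expr {round([lindex $timeResult 0] / 1000.0)}]
}

# [time] returns a string of the form "N microseconds per iteration".
# The command timed here is an arbitrary stand-in for a JSON extraction.
set result [time {string length [string repeat x 100000]} 20]
puts $result

# A made-up sample value to show the rounding.
puts "[ms {1062345.0 microseconds per iteration}] ms"
```

Because the figure is an average, any cost paid only on the first iteration is spread over more runs at 20 iterations than at 5, which may be relevant when comparing the two result sets.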

Code

#! /usr/bin/env tclsh
# version 0.4.0
package require fileutil
puts {Package versions:}
puts "Tcl           --  [package require Tcl]"
puts "Tcllib json   --  [package require json]"
puts "duktape       --  [package require duktape]"
package require duktape::oo
source jimhttp/json.tcl
puts "jimhttp json  --  $::json::version"
puts "rl_json       --  [package require rl_json]"
puts "sqlite3       --  [package require sqlite3]"
puts "yajltcl       --  [package require yajltcl]"
puts {}

proc ms timeResult {
    return [expr {round([lindex $timeResult 0] / 1000.0)}]
}

proc benchmark {command data times result} {
    return [ms [time {
        set actualResult [$command $data]
        if {$actualResult ne $result} {error "bad result: \"$actualResult\""}
    } $times]]
}

proc jq data {
    return [exec jq -r {.INV.cards[68].flavor} << $data]
}

proc duktape data {
    set j [::duktape::oo::JSON new $::duk $data]
    set result [$j get INV cards 68 flavor]
    $j destroy
    return $result
}

proc jimhttp-json data {
    return [dict get [::json::parse $data] INV cards 68 flavor]
}

proc rl_json data {
    return [::rl_json::json get $data INV cards 68 flavor]
}

proc tcllib-json data {
    set parsed [::json::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc sqlite3-json1 data {
    return [lindex [::sq3 eval {
        select json_extract($data, '$.INV.cards[68].flavor')
    }] 0]
}

proc yajltcl data {
    set parsed [::yajl::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc report {displayName n} {
    puts [format {%-13s  %6u ms} ${displayName}:  $n]
}

proc main {} {
    set times 20
    set value {Children claim no two feathers are exactly the same color,\
            then eagerly gather them for proof.}

    set sets [::fileutil::cat AllSets.json]

    puts "Running the benchmark with $times iterations for each library"
    report jq             [benchmark jq $sets $times $value]
    set ::duk [::duktape::oo::Duktape new]
    report tcl-duktape    [benchmark duktape $sets $times $value]
    $::duk destroy
    report {jimhttp JSON} [benchmark jimhttp-json $sets $times $value]
    report rl_json        [benchmark rl_json $sets $times $value]
    report {Tcllib JSON}  [benchmark tcllib-json $sets $times $value]
    sqlite3 ::sq3 :memory:
    ::sq3 enable_load_extension 1
    report {SQLite JSON1} [benchmark sqlite3-json1 $sets $times $value]
    ::sq3 close
    report yajl-tcl       [benchmark yajltcl $sets $times $value]
}

main

Discussion

ak - 2017-05-01 23:44:31

Note, while basic Tcllib is pure Tcl, you can use critcl and make tcllibc to generate a C-based accelerator for various parts of Tcllib, including the json package. It might be interesting to see how much this accelerator helps the json extractor.
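
A quick way to see whether the accelerator is installed is to try loading the tcllibc package. This is only a sketch: haveTcllibc is a hypothetical helper name, and it checks for the presence of tcllibc in general, not for which json implementation ends up being used.

```tcl
# Hypothetical helper: return 1 if the tcllibc accelerator package can be
# loaded by this interpreter, 0 otherwise.
proc haveTcllibc {} {
    return [expr {![catch {package require tcllibc}]}]
}

if {[haveTcllibc]} {
    puts "tcllibc [package provide tcllibc] is available"
} else {
    puts "tcllibc is not available"
}
```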


dbohdan 2018-03-31: It is worth noting that the jimhttp JSON parser is a lot slower in Tcl 8.6 than it is in Jim Tcl. The following benchmark shows the difference. To run the benchmark script I used a Tcl 8.6.8 Tclkit built with KitCreator and a Jim Tcl v0.77 binary built locally with GCC 5.4.0 with the default CFLAGS given by its configure script: -g -O2 -fno-unwind-tables -fno-asynchronous-unwind-tables.

$ du -h AllSets.json  # v3.14 Mar 7, 2018 -- much larger than v3.8.1 above
29M AllSets.json

$ cat bench.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source json.tcl
set i 0
puts [time {::json::parse $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
148498381.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
25020679 microseconds per iteration

That's roughly a 6x difference (about 148 s vs. 25 s per iteration). The same does not happen with Tcllib JSON hacked up to work in Jim Tcl.

$ diff -r /usr/share/tcltk/tcllib1.17/json/json.tcl /tmp/tcllib-json/json.tcl
9d8
< package require Tcl 8.4
32c31
<       if {![package vsatisfies [package provide Tcl] 8.4]} {return 0}
---
>       return 0
diff -r /usr/share/tcltk/tcllib1.17/json/json_tcl.tcl /tmp/tcllib-json/json_tcl.tcl
12,14d11
< if {![package vsatisfies [package provide Tcl] 8.5]} {
<     package require dict
< }

$ cat bench2.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source /tmp/tcllib-json/json.tcl
set i 0
puts [time {::json::json2dict $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench2.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
27883525.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
29755740 microseconds per iteration
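
The two pairs of timings above can be compared with a little arithmetic. The sketch below only divides the numbers printed by the two benchmark runs. The ratio helper is a hypothetical name introduced for illustration.

```tcl
# Hypothetical helper: how many times slower a is than b.
proc ratio {a b} {
    return [expr {double($a) / $b}]
}

# Microseconds per iteration, copied from the output above.
set jimhttpTcl86 148498381.0
set jimhttpJim   25020679
set tcllibTcl86  27883525.0
set tcllibJim    29755740

puts [format {jimhttp JSON, Tcl 8.6 vs. Jim Tcl: %.1fx} \
        [ratio $jimhttpTcl86 $jimhttpJim]]
puts [format {Tcllib JSON,  Tcl 8.6 vs. Jim Tcl: %.2fx} \
        [ratio $tcllibTcl86 $tcllibJim]]
```

This prints 5.9x for jimhttp JSON and 0.94x for Tcllib JSON, that is, the Tcllib parser ran slightly faster in Tcl 8.6 than in Jim Tcl in this run.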

dbohdan 2019-10-28: The comparison between jimhttp's json.tcl in Jim Tcl and Tcl 8.6 above is outdated. UTF-8 builds of Jim Tcl's master branch run the benchmark more than an order of magnitude slower than Tcl 8.6. In return they have gained full Unicode support in regexp and hence the important ability to decode JSON with UTF-8 strings that aren't escaped. For faster JSON parsing in UTF-8 Jim Tcl you can use SQLite with JSON1, jq, or the jsmn binary extension. Non-UTF-8 builds retain the old performance profile.

dbohdan 2019-12-08: The --full configuration of Jim Tcl 0.79 comes with a native json::decode command.