Tumdum's homepage https://tilde.cat/feed.atom Sat, 18 Apr 2026 09:48:49 +0000 Cost of includes <p>For each <code>src/.cc</code> file there is a detailed log in <code>logs/.cc.txt</code> with:</p> <ul> <li>strace summary of syscall times and number of executions </li> <li>wall clock time it took to compile </li> <li>number of header locations which were tried </li> <li>number of opened (not unique) headers </li> <li>number of headers according to gcc -M</li> </ul> <p>Compilation was done with <code>-Os</code> using <code>g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2</code></p> <pre><code> Summary: src/all.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 69.13 0.280109 7 39941 6178 lstat 17.69 0.071678 9 7798 6200 open 4.00 0.016206 10 1650 read 2.81 0.011378 7 1598 close real 4.93 tried to open 7700 headers opened 1555 headers depends on 968 headers src/bind.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 61.35 0.024457 7 3545 517 lstat 16.09 0.006413 9 696 539 open 4.96 0.001977 13 156 read 2.91 0.001160 7 157 close real 0.31 tried to open 617 headers opened 118 headers depends on 108 headers src/empty.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 14.51 0.000942 10 90 mmap 12.88 0.000836 13 66 29 open 12.14 0.000788 22 36 read 10.09 0.000655 11 60 29 stat real 0.05 tried to open 8 headers opened 1 headers depends on 1 headers src/map.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 32.02 0.004875 11 460 42 lstat 13.78 0.002098 15 136 64 open 10.20 0.001553 22 71 read 8.43 0.001283 13 96 mmap real 0.13 tried to open 78 headers opened 36 headers depends on 36 headers src/mpl_apply_and_vector.cc % time seconds usecs/call calls 
errors syscall ------ ----------- ----------- --------- --------- ---------------- 70.61 0.079509 7 11277 1736 lstat 17.13 0.019287 9 2093 1758 open 2.93 0.003298 10 334 read 2.04 0.002296 7 335 close real 0.69 tried to open 2014 headers opened 296 headers depends on 256 headers src/optional.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 64.42 0.071699 7 10565 1576 lstat 16.86 0.018767 9 2060 1598 open 4.76 0.005293 5 997 mmap 4.72 0.005257 11 462 read real 1.06 tried to open 1974 headers opened 422 headers depends on 299 headers src/shared_ptr.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 59.20 0.029393 7 4198 576 lstat 16.67 0.008275 10 833 598 open 6.72 0.003337 14 235 read 3.62 0.001797 8 235 close real 0.48 tried to open 747 headers opened 195 headers depends on 162 headers src/sstream.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 47.04 0.011771 7 1710 202 lstat 15.54 0.003888 11 368 224 open 9.16 0.002293 16 143 read 5.39 0.001349 13 103 mmap real 0.29 tried to open 303 headers opened 107 headers depends on 94 headers src/string.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 44.86 0.009815 6 1540 193 lstat 15.52 0.003396 10 336 215 open 9.18 0.002008 17 120 read 5.79 0.001266 13 100 mmap real 0.20 tried to open 271 headers opened 84 headers depends on 73 headers src/thread.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 67.58 0.208776 7 30597 4689 lstat 17.40 0.053756 9 6037 4711 open 4.39 0.013553 10 1378 read 2.93 0.009059 7 1326 close real 4.23 tried to open 5939 headers opened 1283 headers depends on 782 headers src/variant.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- 
--------- --------- ---------------- 69.27 0.148890 7 21371 3300 lstat 17.52 0.037660 9 4111 3322 open 3.49 0.007495 9 789 read 2.93 0.006300 6 1007 mmap real 1.80 tried to open 4025 headers opened 749 headers depends on 514 headers src/vector.cc % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 24.69 0.003097 7 466 42 lstat 14.58 0.001829 13 137 64 open 11.10 0.001392 19 72 read 9.65 0.001210 12 97 mmap real 0.13 tried to open 79 headers opened 37 headers depends on 37 headers </code></pre> <p><a href="https://tilde.cat/static/compilations.tar.xz">Details</a></p> https://tilde.cat/posts/cost_of-includes Thu, 02 Oct 2014 00:00:00 +0000 Cat in different unices <p>As per sloccount:</p> <pre><code>gnu_cat.c Total Physical Source Lines of Code (SLOC) = 486 Total Estimated Cost to Develop = $ 12,665 freebsd_cat.c Total Physical Source Lines of Code (SLOC) = 270 Total Estimated Cost to Develop = $ 6,832 netbsd_cat.c Total Physical Source Lines of Code (SLOC) = 260 Total Estimated Cost to Develop = $ 6,567 apple_cat.c Total Physical Source Lines of Code (SLOC) = 247 Total Estimated Cost to Develop = $ 6,223 openbsd_cat.c Total Physical Source Lines of Code (SLOC) = 190 Total Estimated Cost to Develop = $ 4,724 plan9_cat.c Total Physical Source Lines of Code (SLOC) = 31 Total Estimated Cost to Develop = $ 704 </code></pre> <p><a href="https://tilde.cat/static/cat.tar.xz">sources</a></p> https://tilde.cat/posts/cat_in_different_unices Thu, 09 Oct 2014 00:00:00 +0000 λ vs ≫= code size <p>A comparison of 3 different ways of binding functions with parameters in C++. What is of interest here is the vast difference (&gt;2x) between lambdas and binds in the default (RTTI + exceptions) case, and an even bigger one when both RTTI and exceptions are disabled (~3.8x).</p> <p>See the <a href="https://tilde.cat/static/lambda_vs_bind.tar.xz">listing</a> for files needed to reproduce this locally. 
Running generate.sh will produce every file seen in the <code>stats_*</code> directories. All results have been generated using <code>g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2</code>.</p> <p>Things of note:</p> <ul> <li>the boost-based solution increases in size when compiled without RTTI, because boost rolls its own RTTI replacement</li> <li>but decreases (slightly) when also compiled without exceptions</li> <li>the best-case scenario for boost is worse than the worst-case scenario of either of the other solutions</li> <li>the gcc std solution responds very nicely to disabling RTTI, and shrinks some more when exceptions are removed</li> <li>the best case for lambdas is more than 4 times smaller than the worst case for boost, and always significantly better than gcc std</li> </ul> <p>For detailed statistics please see the <code>stats_gcc</code> directory in the listing archive from the beginning of the post.</p> https://tilde.cat/posts/lambda_vs_bind_code_size Tue, 17 Feb 2015 00:00:00 +0000 Joy of c++ <pre><code>#include &lt;type_traits&gt; decltype(auto) foo1() { int x; return x; } decltype(auto) foo2() { int x; return (x); } int main() { static_assert(std::is_same&lt;decltype(foo1()),int&gt;::value, &quot;&quot;); static_assert(std::is_same&lt;decltype(foo2()),int&amp;&gt;::value, &quot;&quot;); } </code></pre> https://tilde.cat/posts/joy_of_cpp Fri, 06 Mar 2015 00:00:00 +0000 Go Caves! <p>Quite recently <a href="http://www.gridsagegames.com/blog/2015/05/cogmind-alpha-access-launched/">Cogmind</a> alpha was released. As it happens I am also playing <a href="https://sites.google.com/site/broguegame/">Brogue</a> from time to time for the last couple of months. 
This once again made me want to write my own <a href="http://www.roguebasin.com/index.php?title=What_a_roguelike_is">roguelike</a>, which reminded me that I’ve already seen two nice tutorials on how to write one:</p> <ul> <li><a href="http://trystans.blogspot.com/2011/08/roguelike-tutorial-what-and-why.html">Original</a> by Trystan</li> <li><a href="http://stevelosh.com/blog/2012/07/caves-of-clojure-01/">Caves of Clojure</a>, a port of Trystan’s tutorial to Clojure by Steve Losh</li> </ul> <p>Adding another one might be a fun experience, and in the process I might finally end up with the small game that I always wanted to write. As for the language, I’m going to use <a href="http://golang.org/">Go</a>. If you want to follow along and have never used it, I can recommend the short introduction available at <a href="http://tour.golang.org/welcome/1">A Tour of Go</a>.</p> <h2>Project setup</h2> <p>In Go there is a well defined standard structure for code (for a detailed description of it see <a href="http://golang.org/doc/code.html">How to Write Go Code</a>). In short, I’m going to start with the main package located at <code>~/src/bitbucket.org/$USER/gocaves</code>. In the future I might need to extract some packages, but for now this will suffice. If you want to follow along you will have to create a folder structure that mimics your own repository URL.</p> <h2>Termbox</h2> <p>For interacting with the terminal I’m going to use the excellent <a href="https://github.com/nsf/termbox-go">termbox-go</a> library. In contrast to <a href="https://github.com/trystan/AsciiPanel">AsciiPanel</a> used by Trystan and <a href="https://github.com/mabe02/lanterna">Lanterna</a> used by Steve, termbox is a pure terminal based solution, without any GUI rendering of its own. I’m a little bit worried that updating full large terminals might get slow. 
If that becomes a problem I will deal with it then; for now this will be more than enough for my modest needs.</p> <p>To install termbox for Go you should use the go tool:</p> <pre><code>go get -u github.com/nsf/termbox-go </code></pre> <p>Apart from setting up the Go workspace this is all there is to do before starting.</p> <h2>gocaves.go</h2> <p>For now I’m going to stick with just one file called <code>gocaves.go</code>.</p> <h2>The Main</h2> <p>Since termbox is a much smaller library, I will have to do just a little more than Steve. Starting from the entry point, this is my main function:</p> <pre><code>func main() { start() defer stop() events := make(chan termbox.Event) go pollEvents(events) for running { select { case ev := &lt;-events: processEvent(ev) case &lt;-time.After(100 * time.Millisecond): renderGame() termbox.Flush() } } } </code></pre> <p>The start and stop functions are there to set up and tear down some global services; I will explain them in a moment.</p> <p>Next comes the first somewhat Go specific construct. For interaction with the outside world (meaning the keyboard) I will use a goroutine. It’s like a thread, but much less resource consuming. To communicate which events have been generated by the keyboard I will use a channel, which is just a thread safe bounded queue. So the pollEvents function will be running in the background, collecting keyboard events and pushing them to the events channel.</p> <h2>The Game Loop</h2> <p>The rest of main is the so called <a href="http://gameprogrammingpatterns.com/game-loop.html">main game loop</a>. It needs to do two things. First it needs to handle all of the player’s input; second it needs to render the current game view for the player.</p> <p>Since I’m using a channel to provide the queue of input events, I will need to somehow access those events. Reading values from a channel can be done in one of two ways. You can either read in a blocking way or a nonblocking way. 
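</p>

<p>The difference between the two can be sketched in a few standalone lines (illustrative only, not part of the game code):</p>

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 1)
	ch <- 1

	// Blocking read: waits until a value is available on the channel.
	v := <-ch
	fmt.Println("blocking read got", v)

	// Nonblocking read: a select with a default case returns immediately
	// when no value is ready.
	select {
	case v := <-ch:
		fmt.Println("nonblocking read got", v)
	default:
		fmt.Println("no value ready")
	}
}
```

<p>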
In this case this needs to be nonblocking, so that if the user stops pressing buttons the game will have a chance to render in between key presses. This might sound strange if you think about it from the classical roguelike point of view. After all, roguelikes are turn based and any changes in the game happen in reaction to player actions. With that simple model I could just block the main loop for as long as there is no input from the player. But this also means that if I want to animate anything on the screen I won’t be able to. So to make animations possible I need to maintain a constant framerate regardless of player input.</p> <p>This is achieved with the select statement. It blocks on many sources of input and unblocks the first case that is ready for execution. In this case this means that whenever the player presses any button, the event loop will process it immediately, but every 0.1s the game will be rendered anew.</p> <h2>start()</h2> <p>The function:</p> <pre><code>func start() { runtime.GOMAXPROCS(runtime.NumCPU()) f, err := os.Create(&quot;gocaves.log&quot;) if err != nil { log.Fatalln(err) } logger = log.New(f, &quot;&quot;, log.LstdFlags|log.Lshortfile|log.Lmicroseconds) if err := termbox.Init(); err != nil { logger.Fatalln(err) } termbox.SetInputMode(termbox.InputMouse | termbox.InputEsc) logger.Println(&quot;Game started.&quot;) } </code></pre> <p>This could have been shorter. The only required section is the call to termbox.Init, which starts and initializes any termbox specific services. I’ve added three things:</p> <ul> <li>Logger - this will make it easy to log from any part of the program and examine that log after the game ends, or even while it’s running.</li> <li>Extended input mode - termbox allows intercepting mouse clicks on the terminal, and enabling this is so easy that I could not resist.</li> <li>By default Go uses only one OS thread. 
Calling GOMAXPROCS with NumCPU tells the Go runtime to exploit all hardware threads in the CPU the game is running on.</li> </ul> <h2>stop()</h2> <pre><code>func stop() { termbox.Close() logger.Println(&quot;Game ended.&quot;) } </code></pre> <p>Little explanation is needed: termbox requires Close to be called before the program exits, and that’s all there is to do here.</p> <h2>pollEvents</h2> <pre><code>func pollEvents(events chan termbox.Event) { for { events &lt;- termbox.PollEvent() } } </code></pre> <p>Once again a simple function: an infinite loop in which we wait for any user input and put it on the channel.</p> <h2>Rendering</h2> <pre><code>func setString(x, y int, s string, fg, bg termbox.Attribute) { for i, ch := range s { termbox.SetCell(x+i, y, ch, fg, bg) } } func renderGame() { setString(10, 4, &quot;Welcome to GoCaves!&quot;, termbox.ColorRed|termbox.AttrBold, termbox.ColorBlack) setString(10, 5, &quot;Press ESC key to exit.&quot;, termbox.ColorWhite, termbox.ColorBlack) } </code></pre> <p>Before rendering any text there needs to exist a function which is capable of that. Since termbox provides only basic building blocks, I need to make one. The <code>setString</code> function iterates over all characters in a string and puts them in the same row, one after the other.</p> <p>Which brings us to <code>renderGame</code> itself. All it needs to do is call <code>setString</code> to show our player a welcoming message and instruct her how to exit this exciting game.</p> <h2>Final result</h2> <p>The final result can be downloaded from Bitbucket. This specific entry is tagged as entry01. 
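</p>

<p>One caveat worth noting about <code>setString</code>: ranging over a Go string yields byte offsets, not character counts, so a string containing multi-byte UTF-8 runes would have its cells misplaced. This does not matter for the ASCII messages above, but a standalone sketch shows the effect:</p>

```go
package main

import "fmt"

func main() {
	s := "héllo" // 'é' occupies two bytes in UTF-8
	for i, ch := range s {
		// i is the byte offset of each rune: 0, 1, 3, 4, 5.
		// Using these offsets as screen columns would leave a gap
		// after every multi-byte rune.
		fmt.Printf("%d:%c ", i, ch)
	}
	fmt.Println()
}
```

<p>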
Finally it can be built and run using the go tool:</p> <pre><code>go build &amp;&amp; ./gocaves </code></pre> <p>Which looks like this:</p> <p><img alt="screen shot of the game" src="/static/entry01.png" /></p> https://tilde.cat/posts/go_caves Tue, 04 Aug 2015 00:00:00 +0000 Strong types and testing <p>This is a reimplementation of the Haskell code from <a href="http://bitemyapp.com/posts/2014-11-18-strong-types-and-testing.html">bitemyapp</a>, which itself was inspired by <a href="https://web.archive.org/web/20150206054038/http://levinotik.com/strong-types-and-their-impact-on-testing/">levinotik</a>. Obviously Go has a much simpler type system, and some of the constructs in Haskell are not possible to express in it. The most important of those are sum types and purity. Yet it is still possible to express quite a lot in Go.</p> <p>bitemyapp starts with declaring the simplest structure to express an email, and immediately notices that having all fields be of type <code>string</code> is not the best approach. In Go such a struct would be:</p> <pre><code>type Email struct { toAddress string fromAddress string emailBody string recipientName string } </code></pre> <p>Having separate types for each field will make it hard to make a mistake when constructing an email. 
Just like in Haskell, it is easy in Go:</p> <pre><code>type ( ToAddress string FromAddress string EmailBody string RecipientName string ) type Email struct { toAddress ToAddress fromAddress FromAddress emailBody EmailBody recipientName RecipientName } </code></pre> <p>bitemyapp tests his code in the repl, which we don’t have in Go, so let’s write one unit test:</p> <pre><code>func TestInitialization(t *testing.T) { to := ToAddress(&quot;levi@startup.com&quot;) from := FromAddress(&quot;chris@website.org&quot;) body := EmailBody(&quot;hi!&quot;) name := RecipientName(&quot;Levi&quot;) /* email := Email{ ToAddress: from, FromAddress: to, EmailBody: body, RecipientName: name, } Error: ./email_test.go:12: cannot use from (type FromAddress) as type ToAddress in field value ./email_test.go:13: cannot use to (type ToAddress) as type FromAddress in field value */ _ = Email{ To: to, From: from, Body: body, Recipient: name, } } </code></pre> <p>This is still basically the same as in Haskell.</p> <p>The next thing bitemyapp mentions is making this email type abstract. In Haskell that means exporting only the type, without any way to access its data or construct it from outside of the module. Hiding the constructor in Go is easy: it’s enough to change <code>Email</code> to <code>email</code>, which makes it private. But in contrast to Haskell, in Go it would still be possible to access members of values of that type. 
To have complete encapsulation (control over both construction and data access) we will export only an interface which email will satisfy:</p> <pre><code>type email struct { To ToAddress From FromAddress Body EmailBody Recipient RecipientName } func (e email) ToAddress() ToAddress { return e.To } func (e email) FromAddress() FromAddress { return e.From } func (e email) EmailBody() EmailBody { return e.Body } func (e email) RecipientName() RecipientName { return e.Recipient } type Email interface { ToAddress() ToAddress FromAddress() FromAddress EmailBody() EmailBody RecipientName() RecipientName } </code></pre> <p>Next comes the smart constructor. To validate email addresses we will use <code>net/mail</code>:</p> <pre><code>type ( ErrToAddressDidntParse struct{ reason error } ErrFromAddressDidntParse struct{ reason error } ) func (e ErrToAddressDidntParse) Error() string { return &quot;'To' address didn't parse: &quot; + e.reason.Error() } func (e ErrFromAddressDidntParse) Error() string { return &quot;'From' address didn't parse: &quot; + e.reason.Error() } func NewEmail(to ToAddress, from FromAddress, body EmailBody, name RecipientName) (Email, []error) { errors := []error{} if err := validateAddress(string(to)); err != nil { errors = append(errors, ErrToAddressDidntParse{err}) } if err := validateAddress(string(from)); err != nil { errors = append(errors, ErrFromAddressDidntParse{err}) } if len(errors) &gt; 0 { return nil, errors } return &amp;email{To: to, From: from, Body: body, Recipient: name}, errors } func validateAddress(address string) error { _, err := mail.ParseAddress(address) return err } </code></pre> <p>Here we start to diverge from Haskell, at least in terms of succinctness. Thanks to the way the Applicative instance for Maybe is <a href="http://hackage.haskell.org/package/base-4.8.2.0/docs/src/GHC.Base.html#line-633">defined</a>, it is possible to express the smart constructor in a much shorter way. 
Additionally, since Go has no sum types, we lose type information in the return value by returning a list of plain errors. And finally, to make this as close to Haskell as possible, we diverge from the Go idiom of returning a single error value and return a list of errors. Since we still have no repl, here is a test for the smart constructor:</p> <pre><code>func TestSmartConstructor(t *testing.T) { if _, err := NewEmail(ToAddress(&quot;PLAID&quot;), FromAddress(&quot;TROLOLOL&quot;), body, name); len(err) == 0 { t.Fatal(&quot;Malformed email in 'To' field was not detected&quot;) } else { log.Println(err) } if _, err := NewEmail(to, from, body, name); len(err) &gt; 0 { t.Fatalf(&quot;Correct invocation failed: '%v'&quot;, err) } } </code></pre> <p>Lastly we add a way to parse an email out of JSON. This is done with <code>encoding/json</code>, and the whole code is available <a href="http://bitbucket.org/tumdum/email">here</a>.</p> https://tilde.cat/posts/strong_types_and_testing Mon, 08 Feb 2016 00:00:00 +0000 Type level programming <p>While watching <a href="https://youtu.be/_-J4YRI1rAw">Typelevel Programming 101: The Subspace of Scala</a> I was reminded of Peano numbers. It’s a way to encode natural numbers which is derived from the <a href="https://en.wikipedia.org/wiki/Peano_axioms">Peano axioms</a>. Which got me thinking: for I don’t know how long, I’ve had a feeling that there is an almost mechanical translation from a subset of Haskell programs to C++ metaprograms. Specifically I had in mind programs which use only natural numbers and lists. So let’s find out if such a relationship is real.</p> <p>In the rest of the post I will always start with the Haskell version, followed by the C++ version. </p> <h2>Encoding natural numbers</h2> <p>To encode natural numbers let’s start with Peano axioms 1 and 6. Expressed as a Haskell program we have:</p> <pre><code>data Number = Zero | Next Number deriving (Show, Eq) </code></pre> <p>This declares a new type <code>Number</code> with two constructors. 
One is niladic, represents the number 0, and is called <code>Zero</code>. The second one is the successor function and takes one argument, also of type Number.</p> <p>So how do we do this as a type level program in C++? This will require one type definition and one class template definition.</p> <pre><code>struct Zero { enum { value = 0 }; }; template &lt;typename Nat&gt; struct Next { enum { value = 1 + Nat::value }; }; </code></pre> <p>To make it easier to test these number representations, let’s define a helper function to easily convert integers to Peano naturals. First Haskell:</p> <pre><code>v :: Int -&gt; Number v 0 = Zero v n = Next (v (n - 1)) </code></pre> <p>and similarly C++:</p> <pre><code>template &lt;int i&gt; struct _V { using type = Next&lt;typename _V&lt;i-1&gt;::type&gt;; }; template &lt;&gt; struct _V&lt;0&gt; { using type = Zero; }; template &lt;int i&gt; using V = typename _V&lt;i&gt;::type; </code></pre> <p>In both languages there is a special case for zero, and a general recurrence for all other values/types. In the case of C++ I’ve added a helper alias to make it easier to use.</p> <p>We can test the Haskell part in ghci:</p> <pre><code>&gt; let one = Next Zero &gt; let two = Next one &gt; let five = Next (Next (Next two)) &gt; let six = v 6 </code></pre> <p>And the C++ versions can be declared as:</p> <pre><code>using One = Next&lt;Zero&gt;; using Two = Next&lt;One&gt;; using Five = Next&lt;Next&lt;Next&lt;Two&gt;&gt;&gt;; using Six = V&lt;6&gt;; </code></pre> <h2>Arithmetic</h2> <p>Now that we have numbers, we can define basic <a href="https://en.wikipedia.org/wiki/Peano_axioms#Arithmetic">arithmetic</a> operations.</p> <p>Starting with <a href="https://en.wikipedia.org/wiki/Peano_axioms#Addition">addition</a>, we just follow the definition from Wikipedia:</p> <pre><code>add :: Number -&gt; Number -&gt; Number add Zero rhs = rhs add (Next lhs) rhs = Next (add lhs rhs) </code></pre> <p>This is very short in Haskell and very straightforward. 
In type-level C++ it’s only slightly longer:</p> <pre><code>template &lt;typename Lhs, typename Rhs&gt; struct _Add; template &lt;typename Rhs&gt; struct _Add&lt;Zero, Rhs&gt; { using type = Rhs; }; template &lt;typename Lhs, typename Rhs&gt; struct _Add&lt;Next&lt;Lhs&gt;, Rhs&gt; { using type = Next&lt;typename _Add&lt;Lhs, Rhs&gt;::type&gt;; }; template &lt;typename LHS, typename RHS&gt; using Add = typename _Add&lt;LHS, RHS&gt;::type; </code></pre> <p>Testing in the Haskell repl shows that it seems to be working:</p> <pre><code>&gt; let two = v 2 &gt; let five = v 5 &gt; five == two False &gt; let four = v 4 &gt; (add four four) == (add five (add two (Next Zero))) True </code></pre> <p>In C++ we can express similar tests as:</p> <pre><code>static_assert(!std::is_same&lt;Five, Two&gt;::value, &quot;&quot;); static_assert(std::is_same&lt;Add&lt;Four, Four&gt;, Add&lt;Five, Add&lt;Two, One&gt;&gt;&gt;::value, &quot;&quot;); </code></pre> <p>Now that we have addition, <a href="https://en.wikipedia.org/wiki/Peano_axioms#Multiplication">multiplication</a> should be easy. 
First Haskell:</p> <pre><code>mul :: Number -&gt; Number -&gt; Number mul Zero _ = Zero mul (Next lhs) rhs = add rhs (mul lhs rhs) </code></pre> <p>and C++:</p> <pre><code>template &lt;typename Lhs, typename Rhs&gt; struct _Mul; template &lt;typename Rhs&gt; struct _Mul&lt;Zero, Rhs&gt; { using type = Zero; }; template &lt;typename Lhs, typename Rhs&gt; struct _Mul&lt;Next&lt;Lhs&gt;, Rhs&gt; { using tmp = typename _Mul&lt;Lhs, Rhs&gt;::type; using type = typename _Add&lt;Rhs, tmp&gt;::type; }; template &lt;typename LHS, typename RHS&gt; using Mul = typename _Mul&lt;LHS, RHS&gt;::type; </code></pre> <p>Testing in the Haskell repl:</p> <pre><code>&gt; let six = v 6 &gt; let seven = v 7 &gt; (v 42) == (mul six seven) True </code></pre> <p>And in C++:</p> <pre><code>using Six = V&lt;6&gt;; using Seven = V&lt;7&gt;; static_assert(std::is_same&lt;V&lt;42&gt;, Mul&lt;Six, Seven&gt;&gt;::value, &quot;&quot;); </code></pre> <p>The last thing we need for a usable definition of numbers is an order relation. Obviously zero is less than any non-zero natural number. Additionally, any two numbers are in the same order as their successors. 
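</p>

<p>As a value-level cross-check of that case analysis (an illustrative Go analogue, not part of the Haskell/C++ sources in this post), the same order relation can be written over runtime Peano numbers:</p>

```go
package main

import "fmt"

// Value-level Peano numbers: nil represents Zero, next(n) is the
// successor of n. Names here are illustrative.
type number struct{ pred *number }

func next(n *number) *number { return &number{pred: n} }

// less mirrors the case analysis in the text: Zero is less than any
// successor, nothing is less than Zero, and otherwise both numbers
// compare the same way as their predecessors.
func less(a, b *number) bool {
	switch {
	case b == nil:
		return false // nothing is less than Zero
	case a == nil:
		return true // Zero is less than any successor
	default:
		return less(a.pred, b.pred)
	}
}

// v converts an int, like the v / V helpers earlier in the post.
func v(i int) *number {
	var n *number
	for ; i > 0; i-- {
		n = next(n)
	}
	return n
}

func main() {
	fmt.Println(less(v(7), v(8)))
	fmt.Println(less(v(9), v(8)))
	fmt.Println(less(v(9), v(9)))
}
```

<p>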
Given this, we can derive a clear Haskell implementation:</p> <pre><code>less :: Number -&gt; Number -&gt; Bool less Zero (Next _) = True less (Next _) Zero = False less Zero Zero = False less (Next lhs) (Next rhs) = less lhs rhs </code></pre> <p>And the corresponding C++ version is somewhat more involved:</p> <pre><code>template &lt;typename LHS, typename RHS&gt; struct _Less; template &lt;typename RHS&gt; struct _Less&lt;Zero, Next&lt;RHS&gt;&gt; { using type = std::true_type; }; template &lt;typename LHS&gt; struct _Less&lt;Next&lt;LHS&gt;, Zero&gt; { using type = std::false_type; }; template &lt;&gt; struct _Less&lt;Zero, Zero&gt; { using type = std::false_type; }; template &lt;typename LHS, typename RHS&gt; struct _Less&lt;Next&lt;LHS&gt;, Next&lt;RHS&gt;&gt; { using type = typename _Less&lt;LHS, RHS&gt;::type; }; template &lt;typename LHS, typename RHS&gt; using Less = typename _Less&lt;LHS, RHS&gt;::type; </code></pre> <p>Testing in the Haskell repl:</p> <pre><code>&gt; less (v 7) (v 8) True &gt; less (v 9) (v 8) False &gt; less (v 9) (v 9) False </code></pre> <p>Similar type level tests in C++:</p> <pre><code>static_assert(std::is_same&lt;std::true_type, Less&lt;V&lt;7&gt;, V&lt;8&gt;&gt;&gt;::value, &quot;&quot;); static_assert(!std::is_same&lt;std::true_type, Less&lt;V&lt;9&gt;, V&lt;8&gt;&gt;&gt;::value, &quot;&quot;); static_assert(!std::is_same&lt;std::true_type, Less&lt;V&lt;9&gt;, V&lt;9&gt;&gt;&gt;::value, &quot;&quot;); </code></pre> <p>That is already quite a lot of code, so let’s stop here and continue in the next post with lists, folds and sorting. For the impatient, the full source is on <a href="https://bitbucket.org/tumdum/typelevel101/">bitbucket</a>.</p> https://tilde.cat/posts/type_level_programming Fri, 11 Mar 2016 00:00:00 +0000 My introduction to Common Lisp <p>So I want to finally learn some Lisp. 
Today there are many languages claiming to be Lisp, but as far as I can tell there are only 5 really active dialects.</p> <p>The oldest one is <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">Scheme</a>, which started its life in 1975. The nice thing about Scheme is that it's a <strong>very</strong> small language. The next one is <a href="https://en.wikipedia.org/wiki/Common_Lisp">Common Lisp</a>, which started almost 10 years later, in 1984. The nice thing about Common Lisp is that it is a <strong>very</strong> big language. After the next 10 years (in 1995) we got <a href="https://racket-lang.org/">Racket</a> (back then called DrScheme). The nice thing about Racket is the quality of its documentation and its macro system. Finally, after 12 more years (in 2007), we got <a href="https://en.wikipedia.org/wiki/Clojure">Clojure</a>. The nice thing about Clojure is that it runs on the JVM and has a good solution to state management in multithreaded applications. So, those are (as far as I can tell) the 4 major dialects active today<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>; where is the fifth one? It's <a href="https://en.wikipedia.org/wiki/Emacs_Lisp">Emacs Lisp</a> (from 1985). The nice thing about Elisp is that it's used as the scripting language in Emacs.</p> <p>So which one to choose? I think all of them are good choices. I've dabbled in all of them except Elisp. In the end I've decided to learn Common Lisp, using the <a href="http://www.sbcl.org/">SBCL</a> implementation.</p> <p>Now that we have our language chosen, how should we learn it? There is a great <a href="http://stevelosh.com/blog/2018/08/a-road-to-common-lisp">post by Steve Losh</a> about this. I will follow it to some extent, but since I know <em>something</em> about CL<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup> I will sometimes make detours or change some things. So the point of this document(s?) 
will be to document my journey, starting with me as an almost complete CL beginner.</p> <h2>Hello Euler</h2> <p>So my first program will be one solving the first problem from Project Euler<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>. We are to find the sum of all numbers that are multiples of 3 or 5 and are below 1000. To do that we have to know how to define functions in CL. For our purposes it's enough to know that we first need to use the <em>macro</em> <a href="http://clhs.lisp.se/Body/m_defun.htm">defun</a>, followed by the name of the function, its arguments as a list, and finally its body. So a function that returns t when a number is a multiple of 3 is one which checks the remainder of division by 3. If it's 0 then we have a multiple of 3; otherwise it's not a number we are looking for:</p> <pre><code>(defun d3 (x) (= 0 (mod x 3))) </code></pre> <p>This declares a function d3 with one parameter x. Its body first computes the remainder of dividing x by 3, and then compares that remainder to 0. Now that we know that, it should be easy to write a function that checks divisibility by 5:</p> <pre><code>(defun d5 (x) (= 0 (mod x 5))) </code></pre> <p>Since our numbers are to be divisible by 3 or 5, we need to combine those functions:</p> <pre><code>(defun d (x) (or (d3 x) (d5 x))) </code></pre> <p>Now we can detect our numbers. What is left is to find all such numbers below 1000. What seems to be simplest is to generate all numbers from 1 up to (but not including) 1000, remove the ones for which d returns nil, and finally sum what remains.</p> <p>We can solve this in multiple ways. First we will try to write as much code as we can, so that later we can refactor it to use built-in functions.</p> <p>So how to generate this list of numbers? We will write a function that tracks what is the next number to generate. 
If it is less than the maximum we want, it will cons<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup> this number with the result of a recursive call to itself with the number incremented by one:</p> <pre><code>(defun gen (min max) (if (&gt;= min max) nil (cons min (gen (+ 1 min) max)))) </code></pre> <p>Using <code>trace</code> you can see how it is working:</p> <pre><code>CL-USER&gt; (trace gen) (GEN) CL-USER&gt; (gen 5 10) 0: (GEN 5 10) 1: (GEN 6 10) 2: (GEN 7 10) 3: (GEN 8 10) 4: (GEN 9 10) 5: (GEN 10 10) 5: GEN returned NIL 4: GEN returned (9) 3: GEN returned (8 9) 2: GEN returned (7 8 9) 1: GEN returned (6 7 8 9) 0: GEN returned (5 6 7 8 9) (5 6 7 8 9) </code></pre> <p>Now we need to figure out how to remove the elements from that list that do not satisfy our predicate d. To do that we will iterate over the list and apply our predicate to each element, ignoring the ones which do not satisfy it and keeping (consing) those that do. For that we will need to know how to apply any predicate to a value without hardcoding it. In CL we can do that with the <a href="http://clhs.lisp.se/Body/f_funcal.htm">funcall</a> function.</p> <pre><code>(defun filter (pred list) (cond ((null list) nil) ((funcall pred (car list)) (cons (car list) (filter pred (cdr list)))) (t (filter pred (cdr list))))) </code></pre> <p>The final task is to sum together all the filtered numbers:</p> <pre><code>(defun sum (l) (if (null l) 0 (+ (car l) (sum (cdr l))))) </code></pre> <p>We can use the functions we just wrote to arrive at a solution:</p> <pre><code>(defun solve1 (&amp;optional (max 1000)) (sum (filter 'd (gen 1 max)))) </code></pre> <p>And we can see that it is indeed working:</p> <pre><code>CL-USER&gt; (untrace gen) T CL-USER&gt; (solve1 1000) 233168 CL-USER&gt; (solve1) 233168 </code></pre> <p>But CL is a big language and we didn't have to write all of that by hand. 
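</p>

<p>As a quick cross-check of the result in another language, the same problem fits in a few lines of Go (not part of the CL exercise):</p>

```go
package main

import "fmt"

func main() {
	// Sum of all numbers below 1000 divisible by 3 or 5,
	// mirroring (solve1 1000) from the text.
	sum := 0
	for n := 1; n < 1000; n++ {
		if n%3 == 0 || n%5 == 0 {
			sum += n
		}
	}
	fmt.Println(sum)
}
```

<p>It prints 233168, matching the repl output above. 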
In fact we can replace our <code>filter</code> function with <code>remove-if-not</code> and <code>sum</code> with <code>reduce</code>:</p> <pre><code>(defun solve2 (&amp;optional (max 1000)) (reduce '+ (remove-if-not 'd (gen 1 max)))) </code></pre> <p>Finally there is one huge macro that can replace all of our code, called loop:</p> <pre><code>(defun solve (&amp;optional (x 1000)) (loop for n from 1 below x by 1 when (d n) sum n)) </code></pre> <h2>Performance optimization</h2> <p>CL is supposed to be fast, so let's measure how fast our functions are. To do that we will use the <a href="http://clhs.lisp.se/Body/m_time.htm">time</a> macro:</p> <pre><code>CL-USER&gt; (loop for f in '(solve1 solve2 solve) collect (time (funcall f 10000))) Evaluation took: 0.001 seconds of real time 0.001666 seconds of total run time (0.001666 user, 0.000000 system) 200.00% CPU 5,657,736 processor cycles 229,376 bytes consed Evaluation took: 0.002 seconds of real time 0.001398 seconds of total run time (0.001398 user, 0.000000 system) 50.00% CPU 4,744,446 processor cycles 229,376 bytes consed Evaluation took: 0.001 seconds of real time 0.000921 seconds of total run time (0.000921 user, 0.000000 system) 100.00% CPU 3,128,434 processor cycles 0 bytes consed (23331668 23331668 23331668) </code></pre> <p>We see that the last function we wrote seems to be the fastest, but the times are all very small - can we increase the problem size? In the case of our own functions, not really - gen suffers from a serious issue. It can't be optimized into a loop via <a href="https://en.wikipedia.org/wiki/Tail_call">tail call optimization</a>:</p> <pre><code>CL-USER&gt; (solve1 1000) 233168 CL-USER&gt; (solve1 10000) 23331668 CL-USER&gt; (solve1 100000) Control stack guard page temporarily disabled: proceed with caution ; Evaluation aborted on #&lt;SB-KERNEL::CONTROL-STACK-EXHAUSTED {10021CD643}&gt;. 
CL-USER&gt; (solve2 1000) 233168 CL-USER&gt; (solve2 10000) 23331668 CL-USER&gt; (solve2 100000) Control stack guard page temporarily disabled: proceed with caution ; Evaluation aborted on #&lt;SB-KERNEL::CONTROL-STACK-EXHAUSTED {10023AE083}&gt;. </code></pre> <p>Instead of fixing this problem with gen, let's see what micro-optimizations we can apply<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup> to the final version of solve. To do that let's first establish some baseline performance:</p> <pre><code>CL-USER&gt; (time (solve 100000000)) 3.767 seconds of real time 3.766975 seconds of total run time (3.766975 user, 0.000000 system) 100.00% CPU 12,808,428,054 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>Now we can ask our compiler to generate fast code for our function using the <a href="http://clhs.lisp.se/Body/d_optimi.htm#speed">optimize</a> declaration:</p> <pre><code>(defun solve-opt (&amp;optional (x 1000)) (declare (optimize (speed 3))) (loop for n from 1 below x by 1 when (d n) sum n)) </code></pre> <p>Compiling that function generates 4 notes from the compiler:</p> <pre><code>; in: DEFUN SOLVE-OPT ; (LOOP FOR N FROM 1 BELOW X BY 1 ; WHEN (D N) ; SUM ...) ; --&gt; BLOCK LET SB-LOOP::WITH-SUM-COUNT LET TAGBODY WHEN IF &gt;= OR LET ; --&gt; IF = IF ; ==&gt; ; (= SB-C::X SB-C::Y) ; ; note: unable to open code because: The operands might not be the same type. ; --&gt; BLOCK LET SB-LOOP::WITH-SUM-COUNT LET TAGBODY WHEN IF &gt;= OR LET &gt; ; --&gt; IF ; ==&gt; ; (&gt; SB-C::X SB-C::Y) ; ; note: forced to do GENERIC-&gt; (cost 10) ; unable to do inline fixnum comparison (cost 4) because: ; The first argument is a (INTEGER 1), not a FIXNUM. ; The second argument is a REAL, not a FIXNUM. ; --&gt; BLOCK LET SB-LOOP::WITH-SUM-COUNT LET TAGBODY IF SETQ THE ; ==&gt; ; (+ #:LOOP-SUM-1 N) ; ; note: forced to do GENERIC-+ (cost 10) ; unable to do inline fixnum arithmetic (cost 2) because: ; The first argument is a UNSIGNED-BYTE, not a FIXNUM. 
; The second argument is a (INTEGER 1), not a FIXNUM. ; The result is a (VALUES (INTEGER 1) &amp;OPTIONAL), not a (VALUES FIXNUM ; &amp;REST T). ; unable to do inline (signed-byte 64) arithmetic (cost 5) because: ; The first argument is a UNSIGNED-BYTE, not a (SIGNED-BYTE 64). ; The second argument is a (INTEGER 1), not a (SIGNED-BYTE 64). ; The result is a (VALUES (INTEGER 1) &amp;OPTIONAL), not a (VALUES ; (SIGNED-BYTE 64) ; &amp;REST T). ; etc. ; --&gt; BLOCK LET SB-LOOP::WITH-SUM-COUNT LET TAGBODY ; --&gt; SB-LOOP::LOOP-DESETQ SETQ THE 1+ ; ==&gt; ; (+ N 1) ; ; note: forced to do GENERIC-+ (cost 10) ; unable to do inline fixnum arithmetic (cost 1) because: ; The first argument is a (INTEGER 1), not a FIXNUM. ; The result is a (VALUES (INTEGER 2) &amp;OPTIONAL), not a (VALUES FIXNUM ; &amp;REST T). ; unable to do inline fixnum arithmetic (cost 2) because: ; The first argument is a (INTEGER 1), not a FIXNUM. ; The result is a (VALUES (INTEGER 2) &amp;OPTIONAL), not a (VALUES FIXNUM ; &amp;REST T). ; etc. ; ; compilation unit finished ; printed 4 notes </code></pre> <p>It seems that there are some problems with our function that prevent the compiler from generating fast code. But did those problems stop the compiler completely? Let's see:</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 3.775 seconds of real time 3.774962 seconds of total run time (3.774962 user, 0.000000 system) 100.00% CPU 12,834,910,795 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>No change - it seems that the compiler wasn't able to optimize our code. So let's look at those notes one more time. It looks like the addition somewhere in the loop macro can't assume that we are adding integers (<code>GENERIC-+</code>). A similar issue affects the &gt; operator (<code>GENERIC-&gt;</code>). 
We should be able to fix that by specifying types:</p> <pre><code>(defun solve-opt (&amp;optional (x 1000)) (declare (optimize (speed 3))) (loop for n fixnum from 1 below x by 1 when (d n) sum n)) </code></pre> <p>This change makes our code a little bit faster (from 3.775s down to 3.392s):</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 3.392 seconds of real time 3.392041 seconds of total run time (3.392041 user, 0.000000 system) 100.00% CPU 11,533,339,425 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>But when compiling we still get 1 note:</p> <pre><code>; note: forced to do GENERIC-+ (cost 10) ; unable to do inline fixnum arithmetic (cost 2) because: ; The first argument is a UNSIGNED-BYTE, not a FIXNUM. ; The result is a (VALUES (INTEGER 1) &amp;OPTIONAL), not a (VALUES FIXNUM ; &amp;REST T). ; unable to do inline (signed-byte 64) arithmetic (cost 5) because: ; The first argument is a UNSIGNED-BYTE, not a (SIGNED-BYTE 64). ; The result is a (VALUES (INTEGER 1) &amp;OPTIONAL), not a (VALUES ; (SIGNED-BYTE 64) ; &amp;REST T). ; etc. </code></pre> <p>Is this related to the missing type specifier on the function argument x? No - I tried declaring x to be a fixnum but that did not remove the note. The problem is that the <code>sum</code> section<sup id="fnref:6"><a class="footnote-ref" href="#fn:6">6</a></sup> of loop needs to know the type in which it should store the result. So let's specify that:</p> <pre><code>(defun solve-opt (&amp;optional (x 1000)) (declare (optimize (speed 3))) (loop for n fixnum from 1 below x by 1 when (d n) sum n fixnum)) </code></pre> <p>Did it change anything? 
Not really:</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 3.357 seconds of real time 3.356738 seconds of total run time (3.356738 user, 0.000000 system) 100.00% CPU 11,413,433,639 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>Let's add optimization declarations to d3, d5 and d:</p> <pre><code>(defun d3 (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (= 0 (mod x 3))) (defun d5 (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (= 0 (mod x 5))) (defun d (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (or (d3 x) (d5 x))) </code></pre> <p>This still produces 2 notes and I don't know enough about CL to fix them, but it is still a big win (from 3.357s to 2.421s):</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 2.421 seconds of real time 2.420994 seconds of total run time (2.420969 user, 0.000025 system) 100.00% CPU 8,231,452,745 processor cycles 0 bytes consed 2333333316666668 </code></pre> <h2>Profiling</h2> <p>So is this the end? Maybe not - we can try <a href="http://www.sbcl.org/manual/#Statistical-Profiler">profiling</a> our code instead of optimizing blindly:</p> <pre><code>CL-USER&gt; (require :sb-sprof) (&quot;SB-SPROF&quot;) CL-USER&gt; (sb-sprof:with-profiling (:report :flat) (solve-opt 100000000)) Number of samples: 242 Sample interval: 0.01 seconds Total sampling time: 2.4199998 seconds Number of cycles: 0 Sampled threads: #&lt;SB-THREAD:THREAD &quot;repl-thread&quot; RUNNING {1003DA83E3}&gt; Self Total Cumul Nr Count % Count % Count % Calls Function ------------------------------------------------------------------------ 1 120 49.6 120 49.6 120 49.6 - D3 2 80 33.1 80 33.1 200 82.6 - D5 3 21 8.7 138 57.0 221 91.3 - D 4 18 7.4 244 100.8 239 98.8 - SOLVE-OPT 5 0 0.0 242 100.0 239 98.8 - &quot;Unknown component: #x52C45920&quot; ... </code></pre> <p>We can see that the majority of the time is spent in the d3 and d5 functions. 
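As an aside, a sketch of my own rather than something measured in this post: since the profile attributes most of the time to the two small helpers, one could also fold both divisibility tests into a single function so the hot loop makes one call instead of three. The name d35 is hypothetical:

```lisp
;; Hypothetical merged predicate (not from the original post): one call
;; per number instead of the d -> d3/d5 chain.
(defun d35 (x)
  (declare (optimize (speed 3))
           (type fixnum x))
  (or (zerop (mod x 3))
      (zerop (mod x 5))))
```

Inlining, discussed next, reaches a similar effect while keeping the original structure.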
Maybe we can <a href="http://clhs.lisp.se/Body/d_inline.htm">inline</a> them? Let's do that:</p> <pre><code>(declaim (inline d3)) (defun d3 (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (= 0 (mod x 3))) (declaim (inline d5)) (defun d5 (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (= 0 (mod x 5))) </code></pre> <p>This gives us another big performance win (from 2.421s down to 1.807s):</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 1.807 seconds of real time 1.807050 seconds of total run time (1.807050 user, 0.000000 system) 100.00% CPU 6,143,959,987 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>Profiling once again shows that d is the major time sink:</p> <pre><code>CL-USER&gt; (sb-sprof:with-profiling (:report :flat) (solve-opt 100000000)) Number of samples: 181 Sample interval: 0.01 seconds Total sampling time: 1.81 seconds Number of cycles: 0 Sampled threads: #&lt;SB-THREAD:THREAD &quot;repl-thread&quot; RUNNING {1003DA83E3}&gt; Self Total Cumul Nr Count % Count % Count % Calls Function ------------------------------------------------------------------------ 1 168 92.8 168 92.8 168 92.8 - D 2 12 6.6 183 101.1 180 99.4 - SOLVE-OPT 3 0 0.0 181 100.0 180 99.4 - &quot;Unknown component: #x52C28DD0&quot; </code></pre> <p>So once again we add an inlining declaration:</p> <pre><code>(declaim (inline d)) (defun d (x) (declare (optimize (speed 3))) (declare (type fixnum x)) (or (d3 x) (d5 x))) </code></pre> <p>This brings a very big performance improvement (from 1.807s down to 0.243s):</p> <pre><code>CL-USER&gt; (time (solve-opt 100000000)) Evaluation took: 0.243 seconds of real time 0.243403 seconds of total run time (0.243403 user, 0.000000 system) 100.00% CPU 827,563,685 processor cycles 0 bytes consed 2333333316666668 </code></pre> <p>Here my CL knowledge ends. Profiling now shows 100% of the time spent in solve-opt. 
I know that it's easy in CL to get the assembly generated for a function (via the <a href="http://clhs.lisp.se/Body/f_disass.htm">disassemble</a> function) but it's above my pay grade to know what to do with that.</p> <p>In the end we still got a very nice improvement, from 3.767s down to 0.243s.</p> <div class="footnote"> <hr /> <ol> <li id="fn:1"> <p>It's been 11 years since the last major new Lisp dialect; should we expect a new one soon? Will it be Fennel, Hy, Carp or maybe one I've never heard of?&#160;<a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">&#8617;</a></p> </li> <li id="fn:2"> <p>I will refer to Common Lisp as CL.&#160;<a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">&#8617;</a></p> </li> <li id="fn:3"> <p>I warned you that Steve's post will not be followed exactly.&#160;<a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">&#8617;</a></p> </li> <li id="fn:4"> <p>Consing is the primary way to allocate in CL; it creates a pair of values. Those pairs are (among other things) the building blocks of CL lists.&#160;<a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">&#8617;</a></p> </li> <li id="fn:5"> <p>Just for fun and education.&#160;<a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">&#8617;</a></p> </li> <li id="fn:6"> <p>What is the proper way to address this part of the code?&#160;<a class="footnote-backref" href="#fnref:6" title="Jump back to footnote 6 in the text">&#8617;</a></p> </li> </ol> </div> https://tilde.cat/posts/common_lisp_intro Sun, 18 Nov 2018 00:00:00 +0000 Cost of using unbuffered io <p>It's easy to think that when using compiled languages like Rust or C++ we get high performance almost for free. But sometimes it's easy to lose all of that speed for trivial reasons.</p> <p>One such example is using various convenience functions when doing I/O. 
Let's take as an example <a href="https://gist.github.com/tumdum/b912c7941a48d9ba5d3b0334d804dbd8">parsing a JSON file</a> using <code>serde_json</code>:</p> <pre><code>fn main() -&gt; Result&lt;()&gt; { let opt = Opt::from_args(); let file = File::open(opt.path)?; let content: Value = match (opt.buffer, opt.buffer_size) { (false, _) =&gt; serde_json::from_reader(file)?, (true, None) =&gt; serde_json::from_reader(BufReader::new(file))?, (true, Some(n)) =&gt; serde_json::from_reader(BufReader::with_capacity(n, file))?, }; dbg!(content.is_object()); Ok(()) } </code></pre> <p>When we run this program under <code>hyperfine</code> we see a significant increase in speed in the buffered case:</p> <pre><code>$ hyperfine './target/release/unbuffered ./test.json' './target/release/unbuffered ./test.json --buffer' Finished release [optimized] target(s) in 0.02s Benchmark 1: ./target/release/unbuffered ./test.json Time (mean ± σ): 19.737 s ± 0.131 s [User: 6.891 s, System: 12.843 s] Range (min … max): 19.596 s … 20.040 s 10 runs Benchmark 2: ./target/release/unbuffered ./test.json --buffer Time (mean ± σ): 550.4 ms ± 5.5 ms [User: 494.0 ms, System: 56.2 ms] Range (min … max): 538.8 ms … 558.2 ms 10 runs Summary './target/release/unbuffered ./test.json --buffer' ran 35.86 ± 0.43 times faster than './target/release/unbuffered ./test.json' </code></pre> <p>Since buffering is so much faster, let's see how many <code>read</code> syscalls each version emits:</p> <pre><code>$ strace -e read ./target/release/unbuffered test.json 2&gt;&amp;1 | wc -l 26141351 $ strace -e read ./target/release/unbuffered test.json --buffer 2&gt;&amp;1 | wc -l 3200 </code></pre> <p>The number of <code>read</code> syscalls in the unbuffered case is very close to the size of <code>test.json</code> in bytes - 26141343. 
We can easily check the first few <code>read</code> syscalls:</p> <pre><code>$ strace -e read ./target/release/unbuffered test.json 2&gt;&amp;1 | head read(3, &quot;\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0&gt;\0\1\0\0\0P\237\2\0\0\0\0\0&quot;..., 832) = 832 read(3, &quot;\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0&gt;\0\1\0\0\0\0\0\0\0\0\0\0\0&quot;..., 832) = 832 read(3, &quot;55c65f639000-55c65f645000 r--p 0&quot;..., 1024) = 1024 read(3, &quot;15000-7f6b53019000 r--p 00214000&quot;..., 1024) = 1024 read(3, &quot;57 /usr/lib/x8&quot;..., 1024) = 873 read(3, &quot;[&quot;, 1) = 1 read(3, &quot;{&quot;, 1) = 1 read(3, &quot;\&quot;&quot;, 1) = 1 read(3, &quot;i&quot;, 1) = 1 read(3, &quot;d&quot;, 1) = 1 </code></pre> <p>We can see that the JSON file is being read starting from the 6th <code>read</code> syscall, and that each syscall reads only one byte. This is in stark contrast to the buffered case, where each <code>read</code> syscall reads 8192 bytes:</p> <pre><code>$ strace -e read ./target/release/unbuffered test.json --buffer 2&gt;&amp;1 | head read(3, &quot;\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0&gt;\0\1\0\0\0P\237\2\0\0\0\0\0&quot;..., 832) = 832 read(3, &quot;\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0&gt;\0\1\0\0\0\0\0\0\0\0\0\0\0&quot;..., 832) = 832 read(3, &quot;56509c94a000-56509c956000 r--p 0&quot;..., 1024) = 1024 read(3, &quot;15000-7f8dacc19000 r--p 00214000&quot;..., 1024) = 1024 read(3, &quot;57 /usr/lib/x8&quot;..., 1024) = 873 read(3, &quot;[{\&quot;id\&quot;:\&quot;2489651045\&quot;,\&quot;type\&quot;:\&quot;Crea&quot;..., 8192) = 8192 read(3, &quot;,\&quot;target_commitish\&quot;:\&quot;master\&quot;,\&quot;na&quot;..., 8192) = 8192 read(3, &quot;l\&quot;:\&quot;https://api.github.com/users&quot;..., 8192) = 8192 read(3, &quot;ownloads_url\&quot;:\&quot;https://api.githu&quot;..., 8192) = 8192 read(3, &quot;/api.github.com/repos/edx/edx-pl&quot;..., 8192) = 8192 </code></pre> <p>Would increasing the buffer size improve the speed even more? 
Not really:</p> <pre><code>$ hyperfine './target/release/unbuffered --buffer --buffer-size 65536 ./test.json' './target/release/unbuffered ./test.json --buffer' Benchmark 1: ./target/release/unbuffered --buffer --buffer-size 65536 ./test.json Time (mean ± σ): 540.2 ms ± 8.2 ms [User: 489.8 ms, System: 49.7 ms] Range (min … max): 529.9 ms … 557.7 ms 10 runs Benchmark 2: ./target/release/unbuffered ./test.json --buffer Time (mean ± σ): 541.2 ms ± 5.5 ms [User: 481.1 ms, System: 59.2 ms] Range (min … max): 531.4 ms … 546.4 ms 10 runs Summary './target/release/unbuffered --buffer --buffer-size 65536 ./test.json' ran 1.00 ± 0.02 times faster than './target/release/unbuffered ./test.json --buffer' </code></pre> https://tilde.cat/posts/cost_of_using_unbuffered_io Fri, 17 Mar 2023 00:00:00 +0000 Advent Of Code 2018 <p>As usual at this time of year I'm attending Advent of Code. Since I'm learning CL, this time I will try to solve all tasks in CL. In this post I will document my findings and failures, and you can always check the code here.</p> <h2>Day 1 - Chronal Calibration</h2> <p>The first part of day one essentially asks us to sum a list of integers. If we know how to parse the input (you will see in a moment that I failed to do that correctly) it's just a matter of applying + to that list:</p> <pre><code>(defun solve-day1-part1 (input) (apply #'+ input)) </code></pre> <p>So that's the first part solved. In the second part we need to find the first frequency that repeats. To solve that I maintain the current frequency and a set of seen frequencies:</p> <pre><code>(defun solve-day1-part2 (input) (let ((seen (make-hash-table)) (current 0)) (setf (gethash current seen) t) (loop do (loop for e in input do (setf current (+ current e)) (if (gethash current seen) (return-from solve-day1-part2 current)) (setf (gethash current seen) t))))) </code></pre> <p>It seemed correct, but the result was wrong. What did I do wrong? 
I had incorrectly parsed the input - my input reading function returned the frequency changes in reverse order:</p> <pre><code>(defun read-lines (path) (let ((ret nil)) (with-open-file (stream path) (do ((line (read-line stream nil) (read-line stream nil))) ((null line)) (setf ret (cons line ret)))) ret)) (defun read-day1-input (path) (mapcar #'parse-integer (read-lines path))) </code></pre> <p>The problem is in read-lines and the solution is as simple as adding reverse:</p> <pre><code>(defun read-lines (path) (let ((ret nil)) (with-open-file (stream path) (do ((line (read-line stream nil) (read-line stream nil))) ((null line)) (setf ret (cons line ret)))) (reverse ret))) </code></pre> <p>This didn't break part 1 since addition is commutative. Fixing it gave the correct solution for part 2.</p> <p>One interesting thing is that initially I tried to use <a href="https://cl-containers.common-lisp.dev/documentation/metabang.cl-containers-package/class-set--container.html">cl-containers:set-container</a>. It works, but is very slow. With a hash-table from frequency to t the solution runs in under 0.1s:</p> <pre><code>AOC18&gt; (time (solve-day1-part2 *input*)) Evaluation took: 0.018 seconds of real time 0.018372 seconds of total run time (0.018372 user, 0.000000 system) 100.00% CPU 62,442,578 processor cycles 20,937,616 bytes consed 312 </code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day1.lisp">Full solution for day 1</a></p> <p>Part 1 time: 0.000 seconds of real time</p> <p>Part 2 time: 0.016 seconds of real time</p> <h2>Awk?</h2> <p>As a bonus here are my day 1 solutions in awk:</p> <pre><code>awk '{sum+=int($1)} END {print sum}' day1.input # part1 awk '{c+=int($1); if (c in seen) { print c; exit 0} else seen[c]=1}' &lt;(while cat day1.input; do :; done) # part2 </code></pre> <h2>Day 2 - Inventory Management System</h2> <p>In the first part we basically need to find out how many words contain the same letter exactly 2 or 3 times. 
First we will map each word to a hash-table from letters to the number of times they appear in the word:</p> <pre><code>(defun letter-frequencies (word) (let ((f (make-hash-table))) (loop for l across word do (incf (gethash l f 0))) f)) </code></pre> <p>Now that we have a way to calculate letter frequencies we need a way to detect hash tables with values of 2 or 3:</p> <pre><code>(defun has-duplicate (hash &amp;optional (count 2)) (loop for v being the hash-values of hash do (if (= count v) (return t)))) </code></pre> <p>Having those building blocks we can write the final solution for part 1:</p> <pre><code>(defun solve-day2-part1 (words) (loop for w in words for f = (letter-frequencies w) counting (has-duplicate f) into two counting (has-duplicate f 3) into three finally (return (* two three)))) </code></pre> <p>In the second part we need to find the pair of words that differ in only one place and return the common substring of those words. There might be smarter ways to do this, but my solution is a brute-force one with O(n^2) complexity - given the size of the input it is still fast enough.</p> <p>First we need a way to check if two words differ in only one place - we iterate over the letters and count how many differ:</p> <pre><code>(defun correct-words (w1 w2) (loop for c1 across w1 for c2 across w2 counting (not (eql c1 c2)) into diff finally (return (= 1 diff)))) </code></pre> <p>Next we need a way to find the substring of letters that are the same in both words:</p> <pre><code>(defun common-letters (w1 w2) (coerce (loop for c1 across w1 for c2 across w2 when (eql c1 c2) collecting c1) 'string)) </code></pre> <p>Finally we check every possible pair of words, and for the pair which differs in only one place we compute the common substring:</p> <pre><code>(defun solve-day2-part2 (words) (dolist (w1 words) (dolist (w2 words) (when (and (string-not-equal w1 w2) (correct-words w1 w2)) (return-from solve-day2-part2 (common-letters w1 w2)))))) </code></pre> <p>One thing that surprised me was that it seems it's 
not possible to iterate over the same sequence twice using one loop form. That's why I used nested dolist. I've also learned that CL has some hard-coded conversions available via <a href="http://clhs.lisp.se/Body/f_coerce.htm">coerce</a> - it seems you can't extend this mechanism to your own types.</p> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day2.lisp">Full solution for day 2</a></p> <p>Part 1 time: 0.002 seconds of real time</p> <p>Part 2 time: 0.004 seconds of real time</p> <h2>Day 3 - No Matter How You Slice It</h2> <p>In part 1 we are given a set of rectangles and are asked to find how many fields are covered by more than one rectangle. Each rectangle is called a claim in this task and we can model it as a struct:</p> <pre><code>(defstruct claim (id 0 :type fixnum) (x 0 :type fixnum) (y 0 :type fixnum) (w 0 :type fixnum) (h 0 :type fixnum)) </code></pre> <p>We are going to model the surface (called fabric in the task) on which claims are made with a 2d array of fixnums. The value in the array will indicate the number of claims made for a given position:</p> <pre><code>(defun make-fabric (w h) (make-array (list w h) :initial-element 0 :element-type 'fixnum)) </code></pre> <p>Given those building blocks we can write a function to claim parts of the fabric - we will increment each field under the claim:</p> <pre><code>(defun claim-fabric (f c) (loop for y from (claim-y c) below (+ (claim-y c) (claim-h c)) do (loop for x from (claim-x c) below (+ (claim-x c) (claim-w c)) do (incf (aref f x y))))) </code></pre> <p>Finally we need to count the fields that were claimed more than once:</p> <pre><code>(defun count-overclaims (f) (let* ((d (array-dimensions f)) (w (car d)) (h (cadr d)) (ret 0)) (loop for y from 0 below h do (loop for x from 0 below w do (when (&gt; (aref f x y) 1) (incf ret)))) ret)) </code></pre> <p>This lets us write the solution for part 1, in which we apply each claim and count the overclaimed fields:</p> <pre><code>(defun solve-day3-part1 (input) (let ((fabric 
(make-fabric 2000 2000))) (loop for claim in input do (claim-fabric fabric claim)) (values (count-overclaims fabric) fabric))) </code></pre> <p>Next, in part 2 we need to find the claim that does not overlap with any other claim. To do that we need to observe that a claim overlaps when parts of the fabric under it have values greater than 1. So let's first write a predicate that detects whether a claim overlaps:</p> <pre><code>(defun claim-overlap-p (f c) (loop for y from (claim-y c) below (+ (claim-y c) (claim-h c)) do (loop for x from (claim-x c) below (+ (claim-x c) (claim-w c)) do (when (&gt; (aref f x y) 1) (return-from claim-overlap-p t))))) </code></pre> <p>This together with part 1 is enough to solve part 2. We will execute the part 1 solution and then check each claim against the 2d array with all claims applied:</p> <pre><code>(defun solve-day3-part2 (input) (multiple-value-bind (ignore f) (solve-day3-part1 input) (loop for claim in input do (when (not (claim-overlap-p f claim)) (return-from solve-day3-part2 (claim-id claim)))))) </code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day3.lisp">Full solution for day 3</a></p> <p>Part 1 time: 0.091 seconds of real time</p> <p>Part 2 time: 0.079 seconds of real time</p> <h2>Day 4 - Repose Record</h2> <p>In part one we need to find the guard that sleeps the most. The first problem is parsing the input. 
We will represent each entry as a pair of date and message:</p> <pre><code>(defstruct log-entry (date nil) (event nil)) </code></pre> <p>The date will be represented as a number of minutes:</p> <pre><code>(defun parse-date (in) (cl-ppcre:register-groups-bind ((#'parse-integer year month day hour minute)) (&quot;(.*)-0*(.*)-0*(.*) 0?(.*):0?(.*)&quot; in) (+ (* year (* 366 31 1440)) (* month (* 31 1440)) (* day 1440) (* hour 60) minute))) (defun parse-log-entry (in) (cl-ppcre:register-groups-bind ((#'parse-date date) event) (&quot;\\[(.*)\\] (.*)&quot; in) (make-log-entry :date date :event event))) </code></pre> <p>I really like how we can bind regex groups to variables and even map a function over the values before storing them in those variables.</p> <p>Now we can convert the input to a hash map from guard id to a list of intervals representing the start and end of a nap, with the interval length as the head:</p> <pre><code>(defun new-guard-p (log) (cl-ppcre:register-groups-bind ((#'parse-integer id)) (&quot;Guard #(.*) begins shift&quot; (log-entry-event log)) id)) (defun collect-guards-times (input) (let ((gtimes (make-hash-table)) (current-guard nil) (current-start nil)) (loop for log in input do (cond ((new-guard-p log) (setf current-guard (new-guard-p log) current-start nil)) ((not current-start) (setf current-start (log-entry-date log))) (t (let ((diff (- (log-entry-date log) current-start)) (l (gethash current-guard gtimes '()))) (setf (gethash current-guard gtimes) (cons (list diff current-start (log-entry-date log)) l) current-start nil))))) gtimes)) </code></pre> <p>Having this structure we can compute the total nap time for each guard and find the longest-napping guard:</p> <pre><code>(defun total-guard-time (intervals) (apply #'+ (mapcar #'car intervals))) (defun find-best-guard (h) (let ((best-id 0) (best-total 0) (best-intervals 0)) (loop for id being the hash-keys of h using (hash-value interval) do (when (&gt; (total-guard-time interval) best-total) (setf best-total 
(total-guard-time interval) best-id id best-intervals interval))) (list best-id (mapcar #'cdr best-intervals)))) </code></pre> <p>Finally we need to find the minute at which that guard most often slept:</p> <pre><code>(defun sleep-minutes (intervals) (let ((table (make-array 60 :initial-element 0 :element-type 'fixnum))) (loop for interval in intervals do (loop for i from (nth 0 interval) below (nth 1 interval) do (incf (aref table (mod i 60))))) table)) (defun most-asleep-minute-from-table (table) (let ((best-min 0) (best-min-val)) (loop for i from 0 below 60 do (when (&gt; (aref table i) (aref table best-min)) (setf best-min i best-min-val (aref table i)))) best-min)) (defun most-asleep-minute (intervals) (most-asleep-minute-from-table (sleep-minutes intervals))) </code></pre> <p>This lets us find the answer:</p> <pre><code>(defun solve-day4-part1 (input) (let* ((times (collect-guards-times input)) (best (find-best-guard times)) (best-intervals (nth 1 best)) (best-min (most-asleep-minute best-intervals))) (* (car best) best-min))) </code></pre> <p>In part 2 we need to find the guard which is most frequently asleep on the same minute. We have all the needed building blocks:</p> <pre><code>(defun solve-day4-part2 (input) (let ((best-min 0) (best-id 0) (best-val 0)) (loop for id being the hash-key of (collect-guards-times input) using (hash-value int) do (let* ((intervals (mapcar #'cdr int)) (table (sleep-minutes intervals)) (min (most-asleep-minute-from-table table)) (val (aref table min))) (when (&gt; val best-val) (setf best-min min best-id id best-val val)))) (* best-min best-id))) </code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day4.lisp">Full solution for day 4</a></p> <p>Part 1 time: 0.004 seconds of real time</p> <p>Part 2 time: 0.010 seconds of real time</p> <h2>Day 5 - Alchemical Reduction</h2> <p>I really like this day even if my solution is not that fast. 
In the first part we need to find the string which is the result of applying mutation rules until it can't be mutated further - this is essentially the problem of finding a <a href="https://en.wikipedia.org/wiki/Fixed_point_(mathematics)">fix point</a>. The mutation rules are simple - if two neighbouring characters have different case but otherwise are the same, they should be removed.</p> <p>We can express one mutation this way:</p> <pre><code>(defun unit-react-p (a b) (and (not (eql a b)) (eql (char-upcase a) (char-upcase b)))) (defun polymer-react-once (in) (loop for i from 0 below (- (length in) 1) do (when (unit-react-p (aref in i) (aref in (+ 1 i))) (return-from polymer-react-once (cl-strings:replace-all in (format nil &quot;~a~a&quot; (aref in i) (aref in (+ 1 i))) &quot;&quot;)))) in) </code></pre> <p>Now we need to find the fix point. This means we have to find an input x such that (equal x (polymer-react-once x)). So let's write a function that finds such a fix point for a given function and starting point:</p> <pre><code>(defun find-fix-point (f start &amp;optional (eq #'equal)) (let ((result (funcall f start))) (if (funcall eq start result) start (find-fix-point f result eq)))) </code></pre> <p>So the solution is to find the fix point and return its length:</p> <pre><code>(defun solve-day5-part1 (in) (let ((poly (find-fix-point #'polymer-react-once in))) (values (length poly) poly))) </code></pre> <p>In part 2 we need to find the shortest fix point given the starting input with one kind of letter removed. 
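Before tackling that, a quick sanity check of find-fix-point on something simpler than polymers (a toy example of mine, not from the original solution): repeated integer halving converges to 0.

```lisp
;; Toy illustration: iterate floor-halving until the value stops changing.
;; find-fix-point is the function defined above.
(defun half (n) (floor n 2))

;; (find-fix-point #'half 1000 #'=) => 0
```

The numeric comparison #'= is passed explicitly here, since the default test #'equal is meant for the string case above.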
We can work on the result of the part 1 solution, since pairs of letters removed there would also be removed after removing one kind of letter:</p> <pre><code>(defun remove-unit (p u) (remove (char-upcase u) (remove u p))) (defun solve-day5-part2 (p) (let ((preprocessed (nth-value 1 (solve-day5-part1 p)))) (apply #'min (lparallel:pmapcar (lambda (u) (solve-day5-part1 (remove-unit preprocessed u))) *alphabet*)))) </code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day5.lisp">Full solution for day 5</a></p> <p>Part 1 time: 0.359 seconds of real time</p> <p>Part 2 time: 0.865 seconds of real time</p> <h2>Day 6 - Chronal Coordinates</h2> <p>On this day we will be working with 2d regions. We will parse the input and convert it to a map from region name to its starting point:</p> <pre><code>(defun read-day6-input (path &amp;optional (offset 0)) (mapcar #'(lambda (l) (mapcar #'(lambda (p) (+ p offset)) (mapcar #'parse-integer (cl-ppcre:split &quot;, &quot; l )))) (read-lines path))) (defstruct cord (x 0 :type fixnum) (y 0 :type fixnum)) (defun give-names (points) (let ((mapping (make-hash-table))) (loop for x from 0 below (length points) do (setf (gethash x mapping nil) (nth x points))) mapping)) (defun read-input-as-named-cords (path) (give-names (loop for p in (read-day6-input path) collecting (make-cord :x (car p) :y (cadr p))))) </code></pre> <p>In the first part we need to find the size of the largest finite area. But before we can do that we need to get rid of infinite regions. In this task, an infinite region is one which is closest to one of the points on the bounding box encompassing all starting points. 
So let's first find the points on the border of our bounding box:</p> <pre><code>(defun border-cords (named)
  (let* ((x-cords (loop for c being the hash-value of named collecting (cord-x c)))
         (y-cords (loop for c being the hash-value of named collecting (cord-y c)))
         (min-x (1- (apply #'min x-cords)))
         (min-y (1- (apply #'min y-cords)))
         (max-x (1+ (apply #'max x-cords)))
         (max-y (1+ (apply #'max y-cords)))
         (cords nil))
    (loop for x from min-x to max-x
          do (loop for y from min-y to max-y
                   do (when (or (= x min-x) (= x max-x) (= y min-y) (= y max-y))
                        (push (make-cord :x x :y y) cords))))
    cords))
</code></pre> <p>We will use the <a href="https://en.wiktionary.org/wiki/Manhattan_distance">Manhattan distance</a> to measure the distance between points:</p> <pre><code>(defun cord-m-dist (c1 c2)
  (+ (abs (- (cord-x c1) (cord-x c2)))
     (abs (- (cord-y c1) (cord-y c2)))))
</code></pre> <p>We can now find the closest region for a given coordinate:</p> <pre><code>(defun closest-cord-name (named c)
  (let ((best-name nil)
        (best-dist most-positive-fixnum))
    (loop for name being the hash-key of named using (hash-value cord)
          do (let ((dist (cord-m-dist cord c)))
               (cond ((&lt; dist best-dist) (setf best-name name best-dist dist))
                     ((= dist best-dist) (setf best-name nil)))))
    best-name))

(defun closest-to-p (named c expected)
  (equal expected (closest-cord-name named c)))
</code></pre> <p>Having this, we can find all infinite regions:</p> <pre><code>(defun find-infinite-cords (named)
  (remove-duplicates
   (remove nil
           (mapcar #'(lambda (x) (closest-cord-name named x))
                   (border-cords named)))))
</code></pre> <p>What is left is a function to calculate the size of a finite region. 
This can be done with either DFS or BFS:</p> <pre><code>(defun move-coord (pos dir)
  (make-cord :x (+ (cord-x pos) (cord-x dir))
             :y (+ (cord-y pos) (cord-y dir))))

(defparameter *dirs*
  (mapcar #'(lambda (p) (make-cord :x (car p) :y (cadr p)))
          '((0 1) (0 -1) (1 0) (-1 0) (1 1) (1 -1) (-1 -1) (-1 1))))

(defun neighbours (pos)
  (mapcar #'(lambda (p) (move-coord pos p)) *dirs*))

(defun region-starting-from (start pred)
  (assert (funcall pred start))
  (let ((to-visit (list start))
        (seen (make-hash-table :test #'equalp))
        (region 0))
    (loop while (not (empty-p to-visit))
          do (let* ((next (pop to-visit))
                    (candidates (neighbours next)))
               (incf region)
               (setf (gethash next seen) t)
               (loop for cand in candidates
                     do (when (and (funcall pred cand) (null (gethash cand seen)))
                          (setf (gethash cand seen) t)
                          (push cand to-visit)))))
    region))
</code></pre> <p>And now we have a solution:</p> <pre><code>(defun solve-day6-part1 (named)
  (let* ((infinite (find-infinite-cords named))
         (finite (set-difference
                  (loop for n being the hash-key of named collecting n)
                  infinite)))
    (apply #'max
           (mapcar #'(lambda (c)
                       (region-starting-from
                        (gethash c named)
                        #'(lambda (cand) (closest-to-p named cand c))))
                   finite))))
</code></pre> <p>In part 2 we need to find the size of a new region which contains the points that are close enough to all starting points. Since we wrote our region-finding function generically, we can just change the predicate and be done with it:</p> <pre><code>(defun is-safe-cord-p (named cord &amp;optional (max 10000))
  (&lt; (loop for p being the hash-value of named
           summing (cord-m-dist cord p))
     max))

(defun solve-day6-part2 (named)
  (region-starting-from (middle-cord named)
                        #'(lambda (cand) (is-safe-cord-p named cand))))
</code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day6.lisp">Full solution for day 6</a></p> <p>Part 1 time: 0.275 seconds of real time</p> <p>Part 2 time: 0.236 seconds of real time</p> <h2>Day 7 - The Sum of Its Parts</h2> <p>This day is a directed graph problem. 
We are given a list of edges, each describing that one node is a requirement of another node. The task is to find the order in which the nodes will be completed, assuming that a node becomes completed right after all of its dependencies are completed (and nodes that become available at the same time are taken in lexicographic order).</p> <p>We will use a hash table from node to list of nodes to represent our input:</p> <pre><code>(defstruct edge
  (from nil)
  (to nil))

(defun parse-edge (line)
  (cl-ppcre:register-groups-bind (from to)
      (&quot;Step (.*) must be finished before step (.*) can begin.&quot; line)
    (make-edge :from from :to to)))

(defun parse-day7-input (path)
  (let ((map (make-hash-table :test #'equal))
        (starts nil)
        (edges (mapcar #'parse-edge (read-lines path))))
    (loop for edge in edges
          do (push (edge-to edge) (gethash (edge-from edge) map nil)))
    (setf starts (set-difference
                  (remove-duplicates (mapcar #'edge-from edges) :test #'equal)
                  (remove-duplicates (mapcar #'edge-to edges) :test #'equal)
                  :test #'equal))
    (values map (sort starts #'string-lessp))))
</code></pre> <p>It will also be useful to have a map from each node to its requirements:</p> <pre><code>(defun invert-rels (g)
  (let ((map (make-hash-table :test #'equal)))
    (loop for from being the hash-key of g using (hash-value tos)
          do (loop for to in tos
                   do (push from (gethash to map))))
    map))
</code></pre> <p>The core issue will be to get the next ready node from the list of currently waiting nodes. We can solve it this way:</p> <pre><code>(defun next-ready (cand reqs done)
  (find-if #'(lambda (c)
               (empty-p (set-difference (gethash c reqs) done :test #'equal)))
           cand))
</code></pre> <p>Finally, we can solve part 1. We will maintain a list of nodes waiting for completion (initially filled with the nodes without any requirements). 
Each time, we remove from that list one element that has no unfinished requirements, and add to the waiting list every node for which it was a requirement:</p> <pre><code>(defun solve-day7-part1 (g starts)
  (let ((visited nil)
        (reqs (invert-rels g)))
    (loop while (not (empty-p starts))
          do (let* ((next (next-ready starts reqs visited))
                    (cand (sort (gethash next g) #'string-lessp)))
               (setf starts (remove next starts :test #'equal))
               (push next visited)
               (when cand
                 (setf starts (sort (append starts cand) #'string-lessp)))))
    (apply #'concatenate 'string (reverse visited))))
</code></pre> <p>In the second part we get to simulate a multithreaded system. Each task has a cost associated with it, which represents how much time it takes to complete said task. Additionally, we are to simulate a pool of workers executing those tasks. Eventually we need to find out how much time completing all the tasks will take.</p> <p>We start with the calculation of a task's cost:</p> <pre><code>(defun time-needed (task)
  (apply #'+ (mapcar #'(lambda (c) (- c 4))
                     (mapcar #'char-code (coerce task 'list)))))

(defun timed-task (current-time task)
  (cons (+ current-time (time-needed task)) task))
</code></pre> <p>Next we need a way to get the list of tasks that can fill up the free slots in the worker pool:</p> <pre><code>(defun next-ready-many (cand reqs done max)
  (let ((ret nil))
    (loop for c in cand
          do (let ((req (gethash c reqs)))
               (when (null (set-difference req done :test #'equal))
                 (push c ret))))
    (let* ((sorted (sort ret #'string-lessp))
           (end (min (length sorted) max)))
      (subseq sorted 0 end))))
</code></pre> <p>And a way to put those tasks into the pool:</p> <pre><code>(defun insert-timed-tasks (tasks pool current-time)
  (let ((timed (mapcar #'(lambda (x) (timed-task current-time x)) tasks)))
    (loop for i from 0 below (length pool)
          do (when (not (aref pool i))
               (setf (aref pool i) (pop timed))))))
</code></pre> <p>Our pool should be viewed as a priority queue with tasks ordered by completion time, so let's write a function that will find out 
the earliest time at which the pool will produce a completed task:</p> <pre><code>(defun find-next-time (pool)
  (let* ((best most-positive-fixnum))
    (loop for timed-task across pool
          do (when timed-task
               (setf best (min best (car timed-task)))))
    (if (= most-positive-fixnum best)
        0
        best)))
</code></pre> <p>If we know the next maturation time, we can remove from the pool all the tasks that mature at that time:</p> <pre><code>(defun remove-matured (pool time)
  (let ((matured '()))
    (loop for i from 0 below (length pool)
          do (let ((element (elt pool i)))
               (when (and element (= time (car element)))
                 (push (cdr element) matured)
                 (setf (elt pool i) nil))))
    matured))
</code></pre> <p>Now we are ready to solve part 2. We will run a loop in which we extract the already matured tasks and replace them with the next set of ready tasks (observing the dependency graph from the input) until all tasks are done:</p> <pre><code>(defun empty-pool-p (pool)
  (= (loop for task across pool when task counting t) 0))

(defun solve-day7-part2 (g starts &amp;optional (workers 2))
  (let* ((pool (make-array workers :initial-element nil))
         (possible starts)
         (current-time 0)
         (visited nil)
         (reqs (invert-rels g)))
    (loop while (or (not (empty-p possible)) (not (empty-pool-p pool)))
          do (let* ((next-time (find-next-time pool))
                    (matured (remove-matured pool next-time)))
               (setf visited (append visited matured)
                     current-time next-time)
               (loop for m in matured
                     do (setf possible
                              (sort (remove-duplicates
                                     (append possible (gethash m g))
                                     :test #'equal)
                                    #'string-lessp)))
               (let* ((free-slots (count nil pool))
                      (cands (next-ready-many possible reqs visited free-slots)))
                 (insert-timed-tasks cands pool current-time)
                 (loop for c in cands
                       do (setf possible (remove c possible :test #'equal))))))
    current-time))
</code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day7.lisp">Full solution for day 7</a></p> <p>Part 1 time: 0.002 seconds of real time</p> <p>Part 2 time: 0.002 seconds of real time</p> <h2>Day 8 - Memory Maneuver</h2> <p>In the first part we 
are given a list of integers that encodes a tree. This is somewhat like parsing s-exps, but each expression is preceded by information about how many subexpressions it contains. Let's first parse that tree. Our parsing function takes the number of nodes to be parsed plus the input, and returns a list of parsed nodes together with the rest of the input that wasn't consumed by parsing:</p> <pre><code>(defstruct node
  (children nil)
  (meta nil))

(defun read-header (in)
  (list (car in) (cadr in)))

(defun parse-nodes (in nodes)
  (let ((current in)
        (children nil))
    (loop for x from 0 below nodes
          do (let* ((h (read-header current))
                    (n (car h))
                    (m (cadr h))
                    (rest (cddr current))
                    (subchildren nil))
               (when (&gt; n 0)
                 (multiple-value-bind (c r) (parse-nodes rest n)
                   (setf subchildren c
                         rest r)))
               (push (make-node :children subchildren :meta (subseq rest 0 m))
                     children)
               (setf current (subseq rest m))))
    (values (reverse children) current)))
</code></pre> <p>We need to evaluate this tree. The value of a node is the sum of its metadata added to the sum of the values of its subnodes:</p> <pre><code>(defun sum-metadata (root)
  (+ (apply #'+ (node-meta root))
     (loop for n in (node-children root)
           summing (sum-metadata n))))

(defun solve-day8-part1 (input)
  (sum-metadata (car (parse-nodes input 1))))
</code></pre> <p>Part 2 also asks us to evaluate this tree, but this time the metadata elements are indexes into each node's list of subnodes:</p> <pre><code>(defun sum-node (root)
  (cond ((null root) 0)
        ((empty-p (node-children root)) (apply #'+ (node-meta root)))
        (t (sum-nodes (mapcar #'1- (node-meta root)) (node-children root)))))

(defun sum-nodes (indexes nodes)
  (loop for i in indexes
        summing (sum-node (nth i nodes))))

(defun solve-day8-part2 (input)
  (sum-node (car (parse-nodes input 1))))
</code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day8.lisp">Full solution for day 8</a></p> <p>Part 1 time: 0.085 seconds of real time</p> <p>Part 2 time: 0.081 seconds of real time</p> <h2>Day 9 - Marble Mania</h2> <p>This was a fun day, 
mostly thanks to my stubbornness in sticking with a solution that could barely solve part 1. Instead of improving it, I should have immediately rewritten it. So let's first see that final solution.</p> <p>In both parts of day 9 we are asked to simulate a game. Each turn a marble is added to the circle, but on every 23rd turn the marble 7 positions before the current one is removed instead (as usual, see the task description for details).</p> <p>The best way to approach this is a doubly linked list, since we have to move both backward and forward and erase elements at visited positions. To avoid bounds checking we will use a circular buffer based on a doubly linked list. As far as I know there is no such data structure in the CL standard. The standard lists are singly linked, so moving back would be very costly - you would essentially need to start over from the beginning. So let's write our doubly linked list:</p> <pre><code>(defstruct dlnode
  (prev nil)
  (next nil)
  (value 0))

(defun add-dlnode (after value)
  (let ((node (make-dlnode :prev after :value value :next (dlnode-next after)))
        (next (dlnode-next after)))
    (setf (dlnode-next after) node)
    (setf (dlnode-prev next) node)
    node))

(defun remove-dlnode (node)
  (let ((prev (dlnode-prev node))
        (next (dlnode-next node)))
    (setf (dlnode-next prev) next
          (dlnode-prev next) prev)
    prev))

(defun go-back-dlnode (start steps)
  (let ((current start))
    (loop for i fixnum from 0 below steps
          do (setf current (dlnode-prev current)))
    current))
</code></pre> <p>That is the list; let's now make it circular by joining the next field of the last dlnode with the first dlnode (and the prev of the first with the last):</p> <pre><code>(defun dlnode-from-list (init)
  (let* ((root (make-dlnode :value (car init)))
         (current root))
    (loop for e in (cdr init)
          do (let ((next (make-dlnode :prev current :value e)))
               (setf (dlnode-next current) next)
               (setf current next)))
    (setf (dlnode-next current) root)
    (setf (dlnode-prev root) current)
    root))
</code></pre> <p>Having those, we can easily encode the rules of the game:</p> 
<pre><code>(defun day9 (start marbles players)
  (let ((player 1)
        (circle (dlnode-from-list start))
        (scores (make-hash-table)))
    (loop for marble fixnum from 1 to marbles
          do (if (= 0 (mod marble 23))
                 (progn
                   (setf circle (go-back-dlnode circle 8))
                   (let ((points (gethash player scores 0)))
                     (setf (gethash player scores)
                           (+ points marble (dlnode-value circle))))
                   (setf circle (dlnode-next (dlnode-next (remove-dlnode circle)))))
                 (setf circle (dlnode-next (add-dlnode circle marble))))
             (setf player (1+ (mod player players))))
    scores))

(defun best-marble-score (h)
  (loop for score being the hash-value of h maximizing score))
</code></pre> <p><a href="https://git.sr.ht/~ttt/aoc18/tree/master/day9.lisp">Full solution for day 9</a></p> <p>Part 1 time: 0.004 seconds of real time</p> <p>Part 2 time: 0.839 seconds of real time</p> <h2>Quadratic solution</h2> <p>Instead of doing that from the start, I implemented a solution based on a simple list. I was hoping that it would be fast enough, and I was worried that I wouldn't know how to write a doubly linked list in CL. 
Here is the first version:</p> <pre><code>(defun play (list steps players)
  (declare (optimize (speed 3) (safety 1))
           (type fixnum players))
  (setf players (1+ players))
  (let ((len (length list))
        (current 0)
        (player 0)
        (scores (make-hash-table)))
    (loop for step fixnum from 0 below steps
          do (if (and (= 0 (mod (1+ step) 23)) (&gt; step 0))
                 (let* ((to-remove (mod (+ len (- current 7)) len))
                        (removed (nth to-remove list))
                        (pscore (gethash player scores 0)))
                   (declare (type fixnum len pscore removed))
                   (setf (gethash player scores 0) (+ pscore (1+ step) removed))
                   (setf current to-remove
                         list (delete removed list :count 1 :start to-remove)
                         len (1- len)))
                 (let* ((one (mod (1+ current) len))
                        (left (subseq list 0 (1+ one)))
                        (right (subseq list (1+ one)))
                        (marble (1+ step)))
                   (declare (type fixnum len one))
                   (setf list (append left (cons marble right))
                         current (mod (1+ one) (1+ len))
                         len (1+ len))))
             (setf player (mod (1+ player) players))
             (when (= player 0) (incf player)))
    scores))
</code></pre> <p>It works, but it is very slow - it took 23.406 seconds of real time to solve part 1, and it was clearly not capable of solving the problem 100 times larger. 
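</p>

<p>As an aside - in languages with a built-in rotating double-ended queue, the circular structure comes almost for free. A Python sketch of the same game (my own illustration, not part of the original solution):</p>

```python
from collections import deque

def play_marbles(players: int, last_marble: int) -> int:
    """Marble game on a rotating deque; the current marble is circle[-1]."""
    circle = deque([0])
    scores = [0] * players
    for marble in range(1, last_marble + 1):
        if marble % 23 == 0:
            circle.rotate(7)   # marble 7 positions counter-clockwise becomes current
            scores[(marble - 1) % players] += marble + circle.pop()
            circle.rotate(-1)  # marble clockwise of the removed one becomes current
        else:
            circle.rotate(-1)  # step one marble clockwise
            circle.append(marble)
    return max(scores)
```

<p>Each <code>rotate</code> is O(shift), so the whole run is linear in the number of marbles, just like the doubly linked version.</p>

<p>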
So I tried to improve it by avoiding copying lists as much as I could:</p> <pre><code>(defun play2 (list steps players)
  (declare (optimize (speed 3) (safety 1))
           (type fixnum players)
           (type list list))
  (setf players (1+ players))
  (let ((len (length list))
        (current 0)
        (player 0)
        (scores (make-hash-table)))
    (declare (type fixnum len current player))
    (loop for step fixnum from 0 below steps
          do (if (and (= 0 (mod (1+ step) 23)) (&gt; step 0))
                 (let* ((to-remove (mod (+ len (- current 7)) len))
                        (removed (nth to-remove list))
                        (pscore (gethash player scores 0)))
                   (declare (type fixnum len pscore removed))
                   (setf (gethash player scores 0) (+ pscore (1+ step) removed))
                   (setf current to-remove
                         list (delete removed list :count 1 :start to-remove)
                         len (1- len)))
                 (let* ((one (mod (1+ current) len))
                        (two (1+ one))
                        (next-len (1+ len))
                        (marble (1+ step)))
                   (declare (type fixnum len one two marble))
                   (setf list (destructive-insert list two marble)
                         current (mod two next-len)
                         len next-len)))
             (setf player (mod (1+ player) players))
             (when (= player 0) (incf player)))
    scores))

(declaim (inline destructive-insert))
(defun destructive-insert (list index element)
  (declare (optimize (speed 3) (safety 1))
           (type fixnum index))
  (decf index)
  (when (= -1 index)
    (return-from destructive-insert (cons element list)))
  (let ((cell list))
    (loop for p fixnum from 0 below index
          do (setf cell (cdr cell)))
    (setf (cdr cell) (cons element (cdr cell))))
  list)
</code></pre> <p>That was 10 times faster - 2.739 seconds of real time - but still too slow. 
So I tried to avoid copying even more by tracking the current cons cell and mutating it in place wherever I could:</p> <pre><code>(defun play3 (list steps players)
  (declare (optimize (speed 3) (safety 1))
           (type fixnum players)
           (type list list))
  (setf players (1+ players))
  (let ((len (length list))
        (current 0)
        (player 0)
        (scores (make-hash-table))
        (current-cons list))
    (declare (type fixnum len current player))
    (loop for step fixnum from 0 below steps
          do (if (and (= 0 (mod (1+ step) 23)) (&gt; step 0))
                 (let* ((to-remove (mod (+ len (- current 7)) len))
                        (removed (nth to-remove list))
                        (pscore (gethash player scores 0)))
                   (declare (type fixnum len pscore removed))
                   (setf (gethash player scores 0) (+ pscore (1+ step) removed))
                   (setf current-cons (last list (1+ (- len to-remove))))
                   (setf current to-remove
                         list (delete removed list :count 1 :start to-remove)
                         len (1- len))
                   (setf current-cons (last list (1- (- len to-remove))))
                   (when (null current-cons) (setf current-cons list)))
                 (let* ((one (mod (1+ current) len))
                        (two (1+ one))
                        (next-len (1+ len))
                        (marble (1+ step)))
                   (declare (type fixnum len one two marble))
                   (setf (cdr current-cons) (cons marble (cdr current-cons)))
                   (setf current-cons
                         (if (&lt; one current)
                             (last list (- len one 1))
                             (cdr (cdr current-cons))))
                   (when (null current-cons) (setf current-cons list))
                   (setf current (mod two next-len)
                         len next-len)))
             (setf player (mod (1+ player) players))
             (when (= player 0) (incf player)))
    scores))
</code></pre> <p>That was once again an order of magnitude faster on the part 1 input - 0.721 seconds of real time - but it was still quadratic, so the full solution would take at least 10000 times longer - around 7000s. By that time I knew I needed to rewrite it with a doubly linked list, but just out of curiosity I left it running in the background. 
It took 7596.982 seconds of real time, so almost exactly as predicted.</p> https://tilde.cat/posts/aoc_2018 Fri, 28 Dec 2018 00:00:00 +0000 Analyzing multi-gigabyte JSON files locally with serde <p>Recently I read a <a href="https://thenybble.de/posts/json-analysis/">very interesting article</a> about running queries over large json documents in an almost interactive way. As is often the case, I immediately thought about doing the same in Rust (RIIR syndrome?). I was curious how fast (or slow) the 'default' way of doing it in Rust would be. This is obviously quite pointless, since the approach in the article is nice and easy. Read on only if you want to waste some time on a pointless exercise in Rust!</p> <p>Before we start, keep in mind that you can find everything I show here in <a href="https://github.com/tumdum/serde_json_mmaped_value">this</a> repository.</p> <p>The problem starts with a multi-gigabyte json document where each line is a separate json object. Sadly, the author can't share that file with us, but he provides one such object as an example. I've duplicated that object 22214784 times to get an ~18G test input.</p> <aside> I was really surprised that <code>zstd</code> was able to compress that file so well. When compressed with the default options it resulted in a 1.7M file, which is a big improvement over the default <code>gzip</code> invocation, which produces an 86M archive. </aside> <h2>serde_json::Value</h2> <p>Now that we have some input, we can start writing some Rust. When I mentioned the 'default' Rust way earlier, I was thinking about <a href="https://serde.rs/">serde</a>, and in our case <a href="https://github.com/serde-rs/json">serde_json</a>. We will use the most general type, <a href="https://docs.rs/serde_json/latest/serde_json/enum.Value.html">Value</a> - it can represent arbitrary json values. To read that huge json file, we will use <code>mmap</code> and let the OS worry about which pages are needed and which can be discarded. 
Finally, we will use <code>rayon</code> to make use of all the cores/threads. This solution lacks a few features from the original post - no easy-to-use query language, and no output (since with our test input, the output would be all 18G of json). But we don't have to worry about block size, and we can easily run on machines with very little RAM. The core of this solution is this snippet:</p> <pre><code>fn main() -&gt; Result&lt;()&gt; {
    // ...
    let f = File::open(opt.input)?;
    let mmap = unsafe { MmapOptions::new().map(&amp;f)? };
    let bytes: &amp;[u8] = mmap.as_ref();
    let n = bytes
        .lines()
        .par_bridge()
        .flat_map(from_slice::&lt;serde_json::Value&gt;)
        .filter(pred)
        .count()
    ...
}

fn pred(v: &amp;impl Queryable) -&gt; bool {
    v.get_all(&quot;subArts&quot;)
        .into_iter()
        .flat_map(|v| v.get_all(&quot;subSubArts&quot;))
        .flat_map(|v| v.get_all(&quot;size&quot;))
        .any(|v| v.contains(&quot;snug&quot;))
}

impl Queryable for serde_json::Value {
    fn get_all(&amp;self, key: &amp;'static str) -&gt; Vec&lt;&amp;serde_json::Value&gt; {
        match self {
            serde_json::Value::Object(v) =&gt;
                v.iter().filter(|(k, _)| **k == key).map(|(_, v)| v).collect(),
            serde_json::Value::Array(v) =&gt;
                v.iter().flat_map(|e| e.get_all(key)).collect(),
            _ =&gt; vec![],
        }
    }

    fn contains(&amp;self, arg: &amp;str) -&gt; bool {
        match self {
            serde_json::Value::String(v) =&gt; v.contains(arg),
            _ =&gt; false,
        }
    }
}
</code></pre> <p>Is this already fast enough? The original solution with <code>parallel</code> and <code>jq</code> was able to process a similarly sized file on an 8-core/16-thread Ryzen CPU in around 30s. This Rust approach on an 8-core/16-thread i7-1260P runs in around half that time:</p> <pre><code>Benchmark 2: ./target/release/json-big --input 18_7_G.json --value-type serde
  Time (mean ± σ):     16.270 s ±  0.101 s    [User: 228.987 s, System: 16.478 s]
  Range (min … max):   16.194 s … 16.536 s    10 runs
</code></pre> <h2>String interning</h2> <p>Can we make it faster without leaving <code>serde</code>? Yes, we can! 
We can limit the number of allocations for the string keys/values that are repeated across all those json objects by using some sort of string interning. Since we are running on many threads, it is much faster (ask me how I know this...) to use a per-thread cache instead of one global one - this limits the amount of synchronization. The cache itself will be a <code>HashSet</code> of pointers to strings. To keep it simple, we will never deallocate those strings and rely on them always being present:</p> <pre><code>thread_local! {
    static CACHE: RefCell&lt;Cache&gt; = RefCell::new(Cache::default());
}

#[derive(Debug, Default)]
struct Cache {
    strings: FxHashSet&lt;Box&lt;str&gt;&gt;,
}

impl Cache {
    fn get(&amp;mut self, s: &amp;str) -&gt; &amp;'static str {
        if let Some(ptr) = self.strings.get(s) {
            unsafe { transmute(&amp;**ptr) }
        } else {
            let ptr: Box&lt;str&gt; = s.into();
            let ret: &amp;'static str = unsafe { transmute(&amp;*ptr) };
            self.strings.insert(ptr);
            ret
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct InternedString(&amp;'static str);

impl InternedString {
    pub fn new(s: &amp;str) -&gt; Self {
        Self(CACHE.with(|cache| cache.borrow_mut().get(s)))
    }
}

impl Deref for InternedString {
    type Target = str;
    fn deref(&amp;self) -&gt; &amp;Self::Target {
        &amp;*self.0
    }
}
</code></pre> <p>Having such an interned string, we can write our own version of the <code>Value</code> type:</p> <pre><code>#[derive(Debug, PartialEq, Eq)]
pub enum ValueIntern {
    // Null,       // not present in tested input
    // Bool(bool), // not present in tested input
    Number(i64), // int to skip Eq problems
    String(InternedString),
    Array(Vec&lt;ValueIntern&gt;),
    Map(Vec&lt;(ValueIntern, ValueIntern)&gt;),
}
</code></pre> <p>To get it to work with <code>serde</code> we need to implement <a href="https://docs.rs/serde/latest/serde/de/trait.Deserialize.html">Deserialize</a> for it:</p> <pre><code>impl&lt;'de&gt; Deserialize&lt;'de&gt; for ValueIntern {
    fn deserialize&lt;D: Deserializer&lt;'de&gt;&gt;(deserializer: D) 
-&gt; Result&lt;Self, D::Error&gt; {
        deserializer.deserialize_any(ValueVisitor {})
    }
}

struct ValueVisitor {}

impl&lt;'de&gt; Visitor&lt;'de&gt; for ValueVisitor {
    type Value = ValueIntern;

    fn expecting(&amp;self, formatter: &amp;mut fmt::Formatter) -&gt; fmt::Result {
        write!(formatter, &quot;a json like value&quot;)
    }

    fn visit_map&lt;A&gt;(self, mut map: A) -&gt; std::result::Result&lt;Self::Value, A::Error&gt;
    where
        A: MapAccess&lt;'de&gt;,
    {
        let mut out = vec![];
        while let Some((k, v)) = map.next_entry::&lt;ValueIntern, ValueIntern&gt;()? {
            out.push((k, v));
        }
        Ok(ValueIntern::Map(out))
    }

    fn visit_seq&lt;A&gt;(self, mut seq: A) -&gt; Result&lt;Self::Value, A::Error&gt;
    where
        A: SeqAccess&lt;'de&gt;,
    {
        let mut v = vec![];
        while let Some(e) = seq.next_element()? {
            v.push(e);
        }
        Ok(ValueIntern::Array(v))
    }

    fn visit_borrowed_str&lt;E: serde::de::Error&gt;(
        self,
        v: &amp;'de str,
    ) -&gt; std::result::Result&lt;Self::Value, E&gt; {
        Ok(ValueIntern::String(InternedString::new(v)))
    }

    fn visit_i64&lt;E: serde::de::Error&gt;(self, v: i64) -&gt; std::result::Result&lt;Self::Value, E&gt; {
        Ok(ValueIntern::Number(v))
    }

    fn visit_u64&lt;E: serde::de::Error&gt;(self, v: u64) -&gt; std::result::Result&lt;Self::Value, E&gt; {
        Ok(ValueIntern::Number(v as i64))
    }
}
</code></pre> <p>Nothing fancy, just the simplest possible implementation. How fast is it? It's actually slightly faster than plain <code>serde</code> - we gain 2s:</p> <pre><code>Benchmark 3: ./target/release/json-big --input 18_7_G.json --value-type intern
  Time (mean ± σ):     14.414 s ±  0.047 s    [User: 189.484 s, System: 24.711 s]
  Range (min … max):   14.345 s … 14.493 s    10 runs
</code></pre> <h2>Borrowed strings</h2> <p>Can we make it still faster without abandoning <code>serde</code>? Yes! In fact, we can easily modify the interned version to get the speedup. 
The new version of the <code>Value</code> type now gets a lifetime, and stores non-static string references directly:</p> <pre><code>pub enum ValueBorrow&lt;'a&gt; {
    Number(i64), // int to skip Eq problems
    String(&amp;'a str),
    Array(Vec&lt;ValueBorrow&lt;'a&gt;&gt;),
    Map(Vec&lt;(ValueBorrow&lt;'a&gt;, ValueBorrow&lt;'a&gt;)&gt;),
}
</code></pre> <p>The only difference in the <code>Deserialize</code> implementation is in the string handling, where we store the passed string directly instead of interning it first:</p> <pre><code>impl&lt;'de&gt; Deserialize&lt;'de&gt; for ValueBorrow&lt;'de&gt; {
    fn deserialize&lt;D: Deserializer&lt;'de&gt;&gt;(deserializer: D) -&gt; Result&lt;Self, D::Error&gt; {
        deserializer.deserialize_any(ValueVisitor {})
    }
}

struct ValueVisitor {}

impl&lt;'de&gt; Visitor&lt;'de&gt; for ValueVisitor {
    type Value = ValueBorrow&lt;'de&gt;;

    // ...

    fn visit_borrowed_str&lt;E: serde::de::Error&gt;(
        self,
        v: &amp;'de str,
    ) -&gt; std::result::Result&lt;Self::Value, E&gt; {
        Ok(ValueBorrow::String(v))
    }

    // ...
}
</code></pre> <p>If we rerun the benchmark, we see another 3s improvement, which brings us down to about 11s:</p> <pre><code>Benchmark 1: ./target/release/json-big --input 18_7_G.json --value-type borrow
  Time (mean ± σ):     11.264 s ±  1.088 s    [User: 119.407 s, System: 35.480 s]
  Range (min … max):    9.935 s … 13.314 s    10 runs
</code></pre> <p>And that's it - no point in wasting more time.</p> <aside> One more thing! I mentioned that this solution scales nicely down to lower-end systems, and I meant it. I've tested it on a 10-year-old i5-3340M (2 cores/4 threads) with 8 gigs of RAM and an old SSD. The fastest solution with borrowed strings runs in ~74s, while the slowest with plain <code>serde_json::Value</code> runs in ~103s. Pretty good for such an old system, and without any changes to the source code, I would say! </aside> https://tilde.cat/posts/analyzing-multi-gigabyte-json-files-locally-with-serde Mon, 20 Mar 2023 00:00:00 +0000