00:04 elcaro left 00:05 elcaro joined
timotimo m: say 4605054973 * 100 / 4594950376 00:11
camelia 100.21990655335
timotimo m: say 100 * (4605054973 R/ 4594950376)
camelia 99.78057597446
01:30 leont left, leont joined 03:04 leont left 04:04 linkable6 left, evalable6 left 04:06 evalable6 joined 04:07 linkable6 joined
nwc10 jnthn: oops, yes, was supposed to be marked ready for review. It is now 06:35
MasterDuke: nice. Particularlly the LLi miss improvement
that's actually bigger than my guess. If we can find more things like that, it will all add up. 06:36
08:07 Altai-man joined 08:12 sena_kun joined 08:14 Altai-man left
MasterDuke yeah, not sure how noticeable it was in time, but lots of little optimizations add up 08:56
09:03 zakharyas joined 09:05 Altai-man joined 09:07 sena_kun left 09:09 domidumont joined 10:35 frost-lab joined
timotimo if we could cheaply sort worklists by repr ... 10:54
jnthn The number of reprs is fixed, so you could have per-repr worklists...but why? :) 10:58
10:58 Kaiepi left, Kaiepi joined
timotimo keep the instruction cache hot by running the same repr's gc_mark over and over 10:59
jnthn Wonder if that's a measurable improvement... 11:00
timotimo in cachegrind, probably, in wallclock, probably not
nine I fear the only way to know is to try
jnthn I guess the downside is it would scatter object graphs more
timotimo ah, since new objects are allocated in the nursery again 11:01
jnthn Just that we move objects as we encounter them in the list, and so objects that reference each other are sometimes adjacent in the worklist and so end up copied into the other semispace or into gen2 one after the other, although it's less true of gen2 given the free list 11:02
So you *might* lose some memory cache hits
timotimo it's also less true of gen2 given that different reprs are likely to have slightly different sizes 11:03
jnthn But I've no idea how these two effects would play off each other
Or if either is even going to be significant
timotimo yeah
nwc10 it sounds like quite a bit of work (or am I wrong on that part?) and complexity (or wrong?) for potentially marginal gain. 11:04
timotimo yeah
i wouldn't mind a factor of 2 speedup for the gc, but this is not how we get that 11:05
nwc10 usually that sort of speed up is "better algorithm" but that's never easy, even if it's possible. 11:06
timotimo well, there is a whole research field for "better algorithm" in GC
but many of those better algorithms are not trivial to adopt to a whole system 11:07
lizmat there's even a book about it: www.bookdepository.com/The-Garbage...gKUB_D_BwE
oops
timotimo like when you have to add not only write barriers but also read barriers to all your gc-object-using C code when you change to a concurrent GC
was concurrent the word for when the gc runs while mutators also run?
lizmat www.bookdepository.com/The-Garbage...1420082791 # better link 11:08
timotimo 1kg of book for almost a hundred bucks
nwc10 jnthn pointed me at this a few weeks ago: sqlite.1065341.n5.nabble.com/50-fas...78082.html -- ... is 50% faster than the 3.7.17 release 11:09
from 16 months ago. That is to say, it does 50% more work using the same
jnthn I read that book (well, most of it) before working on the MoarVM GC :)
nwc10 number of CPU cycles.
a lot of small wins can add up. 11:10
timotimo right
nwc10 as to read barriers, my fear would be "and then the code is even more complex, and fewer people understand it, and more time is lost to bugs than was gained from speedup"
timotimo yes, absolutely
getting more and more code ported from C to nqp or similar would be a way to get this smoothed out 11:11
that's also not easy, either
lizmat fwiw, I'm considering moving the shaped array code to Raku land 11:15
to get more flexibility
the code basically predates Christmas and has been untouched basically since then
and we have now better HLL optimizing 11:16
and it would make it easier to port shaped arrays to new VM's
(think .NET :-)
MasterDuke are you talking about code that's currently in moarvm or nqp>
?
jnthn Except that the CLR and JVM both natively provide shaped arrays too 11:17
lizmat yes
jnthn And they are more efficient there than resizable ones
Heck, at the VM level that's probably true in MoarVM also; it doesn't have to do any resize check logic
lizmat fact is, that shaped arrays are still at least 5x as slow as unshaped arrays 11:18
in Raku land :-(
jnthn Yes, but that appears to be related to type checking and method resolution issues.
lizmat well, yes
but that doesn't matter to people wanting to use it :-)
jnthn Anyway, big -12 11:19
uh, -1
Unless we find we somehow can't fix the type/method issues 11:20
lizmat well, it's my intent to document all of the related nqp ops first
and grok how that part of Rakudo actually works
my prototype atm is about 5x as fast the current shaped array performance 11:21
and might well get merged with the current backend implementation if we can fix the type checking / resolution issues 11:22
jnthn Well, does the new thing use the multidim repr? That's the key part to the VM having a clue what to do with it 11:25
MasterDuke running `my $a; for 1..5 -> $x { for 1..5 -> $y { $a = $x gcd $y } }; say now - INIT now; say $a` with MVM_SPESH_DISABLE=1, why in the world would end up in this branch three times? github.com/MoarVM/MoarVM/blob/mast...ops.c#L453
jnthn (And getting the compact memory layout)
MasterDuke: Maybe the numbers produced by `now` are big enough? 11:27
m: say now.WHAT
camelia (Instant)
jnthn m: say now.^mro
camelia ((Instant) (Cool) (Any) (Mu))
jnthn I forget how Instant is represented though
MasterDuke but i'm gcd'ing `$x` and `$y`, not `now` 11:28
jnthn If now is involving rational arithmetic anywhere that uses gcd internally 11:29
lizmat jnthn: it would use a single array, with a single index internally, so it would be compact
timotimo can always bt and mvm_dump_backtrace
jnthn Maybe breakpoint it and...what timo said
lizmat jnthn: in any case, I'm exploring this in module space :-) 11:31
jnthn ok 11:32
Geth MoarVM/update-docs: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files
Update some docs to Raku era

Unless they're specifically historically inclined.
11:54
MoarVM: lizmat++ created pull request #1394:
Update some docs to Raku era
11:55
MasterDuke interesting. i ran Daniel Lemire's benchmark code of a bunch of different gcd implementations. there is a version that takes half the time as moarvm's implementation. but if i stick it in moarvm, my example gets 1s slower (1.7s -> 2.7s) 12:14
m: my $a; for 1..5_000 -> int $x { for 1..5_000 -> int $y { $a = $x gcd $y } }; say now - INIT now; say $a
camelia 3.8432893
5000
MasterDuke hm. with spesh disabled current is still about 1s faster, but the absolute times have increased (10.4s -> 11.9s) 12:16
oh wait, i might be reading his benchmark results backwards 12:19
12:22 zakharyas left
MasterDuke would using __builtin_ctz() be a portability problem for moarvm? 12:50
looks like it's not available in visual studio 12:58
but it's only 1s faster when doing 100_000_000 gcds, this probably isn't worth it 13:04
lizmat how is memory doing ? 13:05
that could be another reason ?
I mean gcd gets used a lot for Rats
MasterDuke memory should be identical 13:06
13:07 sena_kun joined 13:09 Altai-man left
jnthn MasterDuke: typical way with unportable things is a probe to see if it's available, and a fallback approach if not 13:27
MasterDuke yeah, looks like there's a _BitScanReverse that can be used instead. but all told it doesn't seem worth the trouble right now 13:30
13:44 Geth left 13:45 Geth joined 14:03 lucasb joined 14:06 bartolin left, bartolin joined 14:09 zakharyas joined 14:25 leont joined 14:27 frost-lab left
lizmat some unexpected timings: github.com/Raku/nqp/issues/685 15:33
jnthn Can you try using $a? 15:38
Or declaring it outside of the loop?
(I suspect spesh will be dropping the atpos entirely)
lizmat try using $a ? 15:39
jnthn Yes, at the moment it's an unused variable, and atpos is a pure operation
lizmat .670 vs .1458 15:40
.670 vs 1.458
so no change really, the slow one being a little faster ? 15:41
not even that
jnthn OK, was curious how much that would be part of it
I suspect it's the extra allocations
lizmat are there other side effects to nqp::shift?
jnthn Well, the point of shift is to have an effect :) 15:42
lizmat I found one case in nqp where an iterator is used to iterate over a list just for the number of elements in the list
so not actually using the nqp::shift($iter) value
jnthn Huh, in a place it could just use nqp::elems?
lizmat yes
jnthn oops
lizmat vm/moar/QAST/QASTRegexCompilerMAST.nqp line 303 15:43
if nqp::iterator / nqp::shift would be faster than manual indexing, a lot of Rakudo internals could also benefit from that, fwiw 15:47
also: changing the nqp::list to a nqp::list_i, makes it worse 15:48
aah... oops 15:49
jnthn I wondered how much it could be GC overhead of allocating iterator objects, but it's not
lizmat hmmm... looks like it does make things way worse 15:50
jnthn So yeah, certainly room for improvement
lizmat $ time nqp -e 'my $l := nqp::list_i(1,2,3,4,5,6,7,8,9,10); my int $j := 10000000; while $j-- { my $iter := nqp::iterator($l); nqp::while($iter, my int $a := nqp::shift($iter)) }' # 2.830
jnthn Yes, because shift returns an object
So it's doing a box/unbox every element
lizmat ah...
jnthn Anyway, given it's not GC overhead, then it's the shift/boolification that wants a look 15:51
lizmat looks like... 15:52
would be nice if an nqp::iterator / nqp::shift combo would be faster
jnthn Probably can be 15:53
lizmat should I tackle that one case where nqp::shift() is not being used ?
jnthn Yeah, go for it
lizmat will do
github.com/Raku/nqp/commit/829f1d42f9 15:57
nine lizmat: using nqp::shift_i instead of nqp::shift in your example is 60 % faster 15:59
lizmat ahhh.... so feels like effectively, nqp::iterator($list) is about the same as nqp::clone($list) ? 16:00
jnthn No, it just creates an object with an index and a pointer to the list 16:01
lizmat fwiw: I was working on documenting nqp ops, and found nqp::iterator listed under list ops, rather than hash ops 16:03
jnthn It's both, I guess
lizmat yes, and then I remembered why I wasn't using it for lists
because it was slower
I hadn't realized how much slower 16:04
jnthn But still being used enough to make it worth speeding up? 16:05
lizmat afaik, nqp::iterator is *not* used in the core on lists because it was slower
however, that means that a lot of the NQP code is doing manual indexing, which is more error prone from a maintenance point of view 16:06
basically anything that runs over an IterationBuffer or a $!reified in Rakudo
nine FWIW I don't see anything that's obviously slow about nqp::iterator 16:07
lizmat and there's quite a lot of that
but shouldn't we look at nqp::shift() ?
nine Oh, now I do
nqp::atpos can be devirtualized by the JIT. nqp::shift on an nqp::iterator however always goes through REPR(target)->pos_funcs.at_pos 16:08
So the shift on the iterator is devirtualized, but not the following at_pos
jnthn That plus the integer addition and comparison are JIT straight into assembly code, whereas the boolification is maybe still a C function call 16:09
nine So keeping track of the iteration position and using nqp::atpos in HLL is actually a perfect example of how using smaller, less powerful operations leads to better optimization opportunities for the VM
jnthn That plus another example of how things change when you have a JIT rather than interpret everything 16:10
lizmat so, would that be easily fixable? 16:11
nine And spesh which takes more credit for devirtualization
lizmat or shall we deprecate support for nqp::iterator(List)
nine It's certainly possible to extend spesh and the JIT to do deep devirtualization and get rid of the boolification slow down. But then just not using nqp::iterator would get us to the same place much easier. 16:13
lizmat yeah, feels like a lot of work to get to a point we already are in most cases 16:14
otoh, those where the explicit cases of using nqp::iterator
when we say "for @array { }" would that not codegen to a nqp::iterator thing?
jnthn In NQP, yes 16:15
lizmat that happens a lot more in NQP
jnthn I wonder how we can move that to an iterator object in NQP code
Should be quite possible 16:16
And then rely on inlining to make it cheaper
Should probably check that it really does come out just as well
lizmat not quite following what should be quite possible
nine To implement an iterator object in pure NQP 16:17
jnthn Replacing the use of nqp::iterator in NQP for array iterations
So we can drop VM-levels support for nqp::iterator(List) 16:18
*level
lizmat class ListIterator { has $!list; has int $index = -1; method shift() ... } ? 16:19
nine That + has $limit = nqp::elems($!list); 16:21
lizmat I seem to recall that there is no point in doing that, as the nqp::elems() gets optimized pretty quickly
nine It's necessary for keeping the same semantics though. 16:23
Now if we even want to keep those semantics is another question
lizmat huh? why would that be needed for keeing the same semantiics ? 16:24
*keeping
MasterDuke fwiw, it looks like there (at least) a couple cases of `nqp::iterator(@...)` in the rakudo core 16:25
lizmat MasterDuke: there are?
hmmm.
nine It's a difference when the array gets changed during the loop (push or pop)
lizmat aah... ok, and nqp::shift() doesn't follow that currently?
then yes
MasterDuke looks like 9 where it's explicitly an '@'-sigiled variable, and a couple more where it's probably a list even if not '@' 16:30
lizmat yeah... looking at them now 16:32
17:00 rypervenche left 17:03 rypervenche joined 17:06 Altai-man joined 17:08 sena_kun left 17:35 patrickb joined 18:01 domidumont left 18:58 zakharyas left 20:49 zakharyas joined 21:07 sena_kun joined 21:09 Altai-man left 21:26 patrickb left
Geth MoarVM: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files
Update some docs to Raku era

Unless they're specifically historically inclined.
21:46
MoarVM: a595d9ddc4 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 8 files
Merge pull request #1394 from MoarVM/update-docs

Update some docs to Raku era
21:55 zakharyas left, sena_kun left 21:56 sena_kun joined 22:24 sena_kun left, sena_kun joined 22:35 sena_kun left