Thursday, April 1, 2010

Perl bitwise string operators

I have had a long-standing and confusing problem in the mod_perl intranet site I develop. An integer intended to be used as a bitmask, and retrieved from a DB_File database via MLDBM, seemed to be intermittently fluctuating in value. I had been focusing my efforts on what was going on way down inside the DB layer, but the problem was like circuit debugging, when the touch of the oscilloscope probe stops the very oscillation you're trying to observe. Adding instrumentation code sometimes changed nothing, sometimes made the problem go away, and sometimes changed the observed bad behavior.


Finally, getting nowhere with that approach, but still believing that the problem was in the DB layer (or MLDBM), I decided just to replace that section with DBM::Deep. I had been wanting to start using it anyway. Luckily it was easy because I wrote that interface code after I learned to modularize, so I only had to change a couple of functions deep in my library.


Of course the problem survived the code transplant, so I started looking at the few bits of suspect code left, when I came across this in something I wrote long long ago (variable names have been changed to protect the innocent; I don't really use names like that in my code :-P ):

$hashref->{key} |= $fluctuating_val;


$hashref was also tied to MLDBM, which is why I had been concentrating on that subsystem. In any case, I started following $fluctuating_val around using Devel::Peek instead of just printing the value itself. Voilà! $fluctuating_val was coming from MLDBM as a PV (string value), and so was $hashref->{key}, and the bitwise operation wasn't giving the expected result. But I found that the bitwise operation sometimes succeeded when I added debugging print statements. This started to make sense to me when looking with Devel::Peek, because one can follow the internal state of a Perl variable and see it accumulate different kinds of values as it is used in different contexts. One-liner demo:


% perl -e 'use Devel::Peek; my $i="1234"; printf "%s\n",$i; Dump($i); printf "%d\n",$i; Dump($i);'
1234
SV = PV(0x8154b00) at 0x8154714
 REFCNT = 1
 FLAGS = (PADBUSY,PADMY,POK,pPOK)
 PV = 0x8169758 "1234"\0
 CUR = 4
 LEN = 8
1234
SV = PVIV(0x8155b10) at 0x8154714
 REFCNT = 1
 FLAGS = (PADBUSY,PADMY,IOK,POK,pIOK,pPOK)
 IV = 1234
 PV = 0x8169758 "1234"\0
 CUR = 4
 LEN = 8


$i acquires an integer value (IV) when it is accessed as an integer.  That is the way Perl variables are supposed to work.  But what if we access the variable with a bitwise operator?

% perl -e 'use Devel::Peek; my $i="1234"; $i|="5678"; printf "%s\n", $i; Dump($i);'
567<
SV = PV(0x8154b00) at 0x8154714
 REFCNT = 1
 FLAGS = (PADBUSY,PADMY,POK,pPOK)
 PV = 0x8169748 "567<"\0
 CUR = 4
 LEN = 8

The result of the operation between two PVs is another PV, and the value is not 1234|5678 = 0x4D2|0x162E = 0x16FE = 5886, which is the value I expected. But what if one operand has a numeric value?


% perl -e 'use Devel::Peek; my $i="1234"; $i|=5678; printf "%s\n", $i; Dump($i);'
5886
SV = PVIV(0x8155b10) at 0x8154714
 REFCNT = 1
 FLAGS = (PADBUSY,PADMY,IOK,POK,pIOK,pPOK)
 IV = 5886
 PV = 0x8169748 "5886"\0
 CUR = 4
 LEN = 8


A PVIV! It behaves differently! And in the way that I want! I changed my problem code to


$hashref->{key} |= 1*$fluctuating_val;

and voilà! again. My problem disappeared, because multiplying by 1 gave the variable an internal numerical value, making the bitwise operator reach the answer I was expecting.
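Here is a minimal sketch of the whole effect in one script, for anyone who wants to reproduce it without a database. The variable names are made up for illustration, and the plain string assignments only stand in for values arriving from MLDBM. It shows the same variable giving a string OR before it has been read as a number and a numeric OR afterwards, which is presumably why my instrumentation code kept changing the symptoms.

```perl
use strict;
use warnings;

# Stand-ins for values that arrive from the database as plain strings (PV only).
my $mask = "5678";
my $val  = "1234";

# Both operands are strings, so | works character by character.
my $before = $val | $mask;

# Any numeric read of $val (here an addition) caches an integer inside it.
my $peek = $val + 0;

# $val now carries a numeric value as well, so the same expression is a numeric OR.
my $after = $val | $mask;

# The multiply-by-one fix applied to a fresh, string-only copy.
my $fresh = "1234";
my $fixed = (1 * $fresh) | $mask;

print "before the numeric read: $before\n";
print "after the numeric read:  $after\n";
print "with 1* on a fresh copy: $fixed\n";
```

The first line printed should be the garbled string result and the other two the numeric one.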

But why? I started Google searching: http://www.google.com/search?hl=en&q=perl+bitwise+pv+iv, which led me to a Stack Overflow post entitled "How does Perl decide to treat a scalar as a string or a number?", and a comment made by Leon Timmermans inside it:

Perl [remembers] when a variable is both a valid integer, float or string when either of those is used. However this does not affect the semantics of the variable (except in two cases, bitwise operators and syscall). – Leon Timmermans Dec 1 '08 at 16:08

Bingo. But how are bitwise operators different? I emailed him to ask, and he helpfully pointed me to the perlop manpage (duh!), specifically a section near the bottom called Bitwise String Operators, which says:

If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: If an operand is a number, that will imply a numeric bitwise operation. You may explicitly show which type of operation you intend by using "" or 0+.

Turns out 1* works too. This is the first time I've thought about additive and multiplicative identities in who knows how long.
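To see both of perlop's suggestions side by side, here is a small sketch (the variable names are mine, chosen for illustration). It forces a numeric OR with 0+, forces a string OR with "", and then rebuilds the string result by hand by ORing the character codes pairwise, which is what the string form of the operator does for two strings of equal length.

```perl
use strict;
use warnings;

my $x = "1234";
my $y = "5678";

# 0+ makes each operand a number, so this is a numeric OR.
my $numeric = (0 + $x) | (0 + $y);

# $x and $y have now been read as numbers, so even the bare expression is numeric.
my $bare = $x | $y;

# Interpolating into "" produces fresh strings, so this is a string OR again.
my $string = "$x" | "$y";

# The string result rebuilt by hand: OR the character codes position by position.
my $by_hand = join '',
    map { chr( ord(substr($x, $_, 1)) | ord(substr($y, $_, 1)) ) }
    0 .. length($x) - 1;

print "numeric (0+): $numeric\n";
print "bare:         $bare\n";
print "string (\"\"):  $string\n";
print "by hand:      $by_hand\n";
```

For example, '4' is 0x34 and '8' is 0x38, and 0x34|0x38 = 0x3C, which is the '<' at the end of the string result.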


Leon offered this comment in his reply to my email:

This is generally considered dubious behavior, and in Perl 6 string and integral bit-operators will be split, just like all other string and numeric operators. For now, we'll have to live with it though.

Big thanks to Leon (http://search.cpan.org/~leont/).
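A footnote to Leon's comment, and not something that was available when this was written: Perl 5 itself later gained an optional split of the two kinds of operator through the bitwise feature, added as experimental in Perl 5.22 and made stable in 5.28. The sketch below assumes Perl 5.28 or later. Under the feature, | always treats its operands as numbers and |. always treats them as strings, whatever their internal flags.

```perl
use strict;
use warnings;
use feature 'bitwise';   # assumes Perl 5.28+; experimental in 5.22 to 5.26

my $x = "1234";
my $y = "5678";

# Under the feature, | is always a numeric OR, even on string-only operands.
my $numeric = $x | $y;

# |. is always a string OR, even though $x and $y have just been read as numbers.
my $string = $x |. $y;

print "numeric |:  $numeric\n";
print "string  |.: $string\n";
```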