Thursday, April 1, 2010

Perl bitwise string operators

I have had a long-standing and confusing problem in the mod_perl intranet site I develop. An integer intended to be used as a bitmask, and retrieved from a DB_File database via MLDBM, seemed to be intermittently fluctuating in value. I had been focusing my efforts on what was going on way down inside the DB layer, but the problem was like in-circuit debugging, when the touch of the oscilloscope probe stops the very oscillation you're trying to observe.  Adding instrumentation code sometimes changed nothing, sometimes made the problem go away, and sometimes changed the observed bad behavior.

Finally, getting nowhere with that approach, but still believing that the problem was in the DB layer (or MLDBM), I decided just to replace that section with DBM::Deep. I had been wanting to start using it anyway. Luckily it was easy because I wrote that interface code after I learned to modularize, so I only had to change a couple of functions deep in my library.

Of course the problem survived the code transplant, so I started looking at the few bits of suspect code left, when I came across this in something I wrote long long ago (variable names have been changed to protect the innocent; I don't really use names like that in my code :-P ):

$hashref->{key} |= $fluctuating_val;

$hashref was also tied to MLDBM, which is why I had been concentrating on that subsystem. In any case, I started following $fluctuating_val around using Devel::Peek instead of just printing the value itself. Voilà! $fluctuating_val was coming from MLDBM as a PV (string value), and so was $hashref->{key}, and the bitwise operation wasn't giving the expected result. But I found that the bitwise operation sometimes succeeded when I added debugging print statements. This started to make sense to me when looking with Devel::Peek, because one can follow the internal state of a Perl variable and see it accumulate different kinds of values as it is used in different contexts. One-liner demo:

% perl -e 'use Devel::Peek; my $i="1234"; printf "%s\n",$i; Dump($i); printf "%d\n",$i; Dump($i);'
SV = PV(0x8154b00) at 0x8154714
 PV = 0x8169758 "1234"\0
 CUR = 4
 LEN = 8
SV = PVIV(0x8155b10) at 0x8154714
 IV = 1234
 PV = 0x8169758 "1234"\0
 CUR = 4
 LEN = 8

$i acquires an integer value (IV) when it is accessed as an integer.  That is the way Perl variables are supposed to work.  But what if we access the variable with a bitwise operator?

% perl -e 'use Devel::Peek; my $i="1234"; $i|="5678"; printf "%s\n", $i; Dump($i);'

SV = PV(0x8154b00) at 0x8154714  REFCNT = 1
 PV = 0x8169748 "567<"\0
 CUR = 4
 LEN = 8

The result of the operation between two PVs is another PV, and the value is not 1234|5678 = 0x4D2|0x162E = 0x16FE = 5886, which is the value I expected. But what if one operand has a numeric value?

% perl -e 'use Devel::Peek; my $i="1234"; $i|=5678; printf "%s\n", $i; Dump($i);'
SV = PVIV(0x8155b10) at 0x8154714
 IV = 5886
 PV = 0x8169748 "5886"\0
 CUR = 4
 LEN = 8

A PVIV! It behaves differently! And in the way that I want! I changed my problem code to

$hashref->{key} |= 1*$fluctuating_val;

and voilà! again. My problem disappeared, because multiplying by 1 gave the variable an internal numerical value, making the bitwise operator reach the answer I was expecting.
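To make the "debugging print changes the result" effect concrete, here is a minimal sketch. The variable names are made up for illustration, and it assumes a Perl running without the newer `bitwise` feature, so that `|` still picks string or numeric behavior from its operands:

```perl
use strict;
use warnings;

# Plain strings, standing in for values that came back from the tied hash.
my $mask = "1234";
my $bits = "5678";

# Neither operand has a numeric value yet, so this is a bytewise string OR.
my $before = $mask | $bits;

# A "harmless" debugging line: formatting $mask with %d reads it as an
# integer, which caches an IV inside $mask (the PV -> PVIV change above).
my $debug = sprintf "%d", $mask;

# Same expression as before, but $mask now carries a numeric value,
# so the operator does a numeric OR instead.
my $after = $mask | $bits;

print "before: $before\n";   # 567<
print "after:  $after\n";    # 5886
```

Nothing was assigned to either variable between the two OR operations; only the way `$mask` was read changed.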

But why? I started Google searching, which led me to a stackoverflow post entitled "How does Perl decide to treat a scalar as a string or a number?", and a comment made by Leon Timmermans inside it:
Perl [remembers] when a variable is both a valid integer, float or string when either of those is used. However this does not affect the semantics of the variable (except in two cases, bitwise operators and syscall). – Leon Timmermans Dec 1 '08 at 16:08
Bingo.  But how are bitwise operators different? I emailed him to ask, and he helpfully pointed me to the perlop manpage (duh!), specifically a section near the bottom called Bitwise String Operators, which says
If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: If an operand is a number, that will imply a numeric bitwise operation. You may explicitly show which type of operation you intend by using "" or 0+ .
Turns out 1* works too. This is the first time I've thought about additive and multiplicative identities in who knows how long.
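A small sketch of what that perlop advice looks like in practice. The helper names numeric_or and string_or are my own invention for illustration, not anything from perlop or a module:

```perl
use strict;
use warnings;

# Always OR as integers, whatever the operands look like internally.
# 0+ forces a fresh numeric value for each operand.
sub numeric_or {
    my ($x, $y) = @_;
    return( (0 + $x) | (0 + $y) );
}

# Always OR as byte strings. Interpolating into "" makes fresh strings
# that carry no numeric value.
sub string_or {
    my ($x, $y) = @_;
    return( "$x" | "$y" );
}

my $from_strings = numeric_or("1234", "5678");   # 5886
my $from_numbers = string_or(1234, 5678);        # 567<
my $times_one    = (1 * "1234") | "5678";        # 5886: one numeric operand is enough

print "$from_strings $from_numbers $times_one\n";
```

Newer Perls (5.22 and later, as far as I know) also offer `use feature 'bitwise'`, which makes `|` always numeric and adds separate string operators such as `|.`; the sketch above deliberately sticks to the older idioms discussed in this post.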

Leon offered this comment in his reply to my email:
This is generally considered dubious behavior, and in Perl 6 string and integral bit-operators will be split, just like all other string and numeric operators. For now, we'll have to live with it though.
Big thanks to Leon.

Monday, October 5, 2009

rooting my G1

I had been resisting rooting my T-Mobile G1 Android phone, but finally tiring of the sluggish performance of the hardware, and encouraged by Twitter friend @cym0n and this article, I decided to do it. It took a while to collect all the information I needed and get it done, but I managed.  I chose to use CyanogenMod 4.0.4.  I think it's no longer available because of the Google C&D issue, but I'll summarize anyway.  I basically followed RyeBrye's article "Android Rooting in 1-click".
  • backed up the contents of the SD card
  • recorded all the settings I wanted to reapply (WiFi passwords, notification ringtones, etc.)
  • installed Recovery Flasher (I got it from, which may or may not still work)
  • ran Recovery Flasher, "reboot to recovery mode"
  • partition the (8GB) SD card following I chose to use a 1GB ext4 partition for the apps2sd section, so my partition table looks like this (via parted run from adb shell):

    Number  Start   End     Size    Type     File system     Flags
     1      32.3kB  6893MB  6893MB  primary  fat32           lba
     2      6893MB  7916MB  1023MB  primary  ext4
     3      7916MB  7948MB  32.2MB  primary  linux-swap(v1)

  • ran Recovery Flasher again, downloaded CyanogenMod 4.0.4 image, "Back up Recovery image", and rebooted to recovery mode
  • at recovery screen, run "nandroid backup"
  • run "wipe data"
  • run "Apply any zip from SD", select CyanogenMod image
  • reboot
  • download and copy its contents to the root directory of the SD card
  • reinstalled apps from the market
  • got the MyFaves.apk file and installed it (don't quite remember where I got it; maybe here?)
  • installed "Overclock Widget" from the market and set it for 528 (screen on) and 128 (screen off)
And that pretty much got it going.  It didn't go as smoothly as this summary indicates; I had to retry a number of things until I got it figured out.

If I were doing this again, one thing I would do differently would be to use ASTRO or something like that to back up all the apps to the SD card, which would make reinstalling them a lot easier.  Especially now that I have learned to use adb from the Android SDK.

The phone has been much more fun to use since rooting.

Monday, September 7, 2009

culinary reference books

I like to keep a lot of reference books in my house. Perhaps this goes back to the pre-world-wide-web days, but even with the existence of Wikipedia, having a shelf full of answers is still satisfying. The largest sub-category in my reference library is Food & Drink, constituting a separate section of the kitchen bookshelf from the cookbooks.

The New Food Lover's Companion
This is a fantastic book and makes a great gift for the food-inclined. My copy is actually the 2nd edition, entitled just "The Food Lover's Companion". There is so much useful information in here it's hard to even summarize it.

On Food and Cooking
Harold McGee's classic reference on the science and history of all kinds of food and cooking techniques. Want to know why it's helpful to keep pastry dough cold? This is the place to check.

Cheese Primer
Steven Jenkins explains it all.

Windows on the World Complete Wine Course
I always go to Kevin Zraly with my amateurish wine questions. My copy is the original edition from 1985! Time for an update?

Somehow I find myself without a reference book on beer, despite its place in my life as my favorite beverage. Can you recommend an addition to my bookshelf to fill this gaping hole?

Thursday, August 13, 2009

debugging a memory leak in a Perl module

I needed to add a (small) wiki to the intranet web site I develop (in Perl) for work. Some customization and integration with the existing site was required, so I couldn't just drop in a standalone wiki package. But I also didn't want to roll my own from scratch. I settled on using Wiki::Toolkit from CPAN because it takes care of all the low-level details and includes an interface to SQLite, which I'm already using for a number of other purposes in the site.

A crucial requirement for this wiki is full-text searching. Wiki::Toolkit provides interfaces to three different search backends:
  • DBIx::FullTextSearch - Uses MySQL to index. I chose not to use this because I don't have a MySQL installation and because this backend doesn't provide fuzzy searching, which I would like to use.
  • Search::InvertedIndex - Can use different databases, including SQLite, but doesn't provide phrase searching, which I definitely need.
  • Plucene - A Perl interface to the Lucene search engine. Provides both fuzzy searching and phrase searching. I decided to use this.
I have since learned that this might not have been the best idea, due to performance concerns. Perhaps I should have tried KinoSearch. But I still would have needed to write a plugin for Wiki::Toolkit to incorporate it. And it turns out my wiki database is small enough that the Plucene performance hit isn't noticeable, so Plucene it is.

Except for one complication. Following is a condensed version of the learning process I went through in trying to resolve the complication. I've intentionally written this a bit pedantically to remind myself about the tools and concepts I learned about along the way.

Once I got the search function running, I began getting errors in the form of "Too many files open". It turned out there was a filehandle leak somewhere in the Plucene modules. The filehandles were being opened in Plucene::Store::InputStream, then never closed. Tracing the Plucene::Store::InputStream objects, I found they were never getting destroyed, hence the leak. Thus commenced a brute-force examination of the Plucene modules, figuring out which object contained which object, so I could eventually track down which object wasn't going out of scope. This approach didn't get me very far.

Then, a breakthrough! Playing with the system, I realized that the leak only occurred when I was searching for multiple terms. I started overriding various Plucene library methods to produce stack dumps at various helpful places. This way I determined that Plucene constructs queries for single terms using Plucene::Search::TermQuery, but when there are multiple terms connected with AND / OR, it uses Plucene::Search::BooleanQuery. This narrowed down the list of suspects quite dramatically, and I confirmed the diagnosis using Adam Kennedy's Devel::Leak::Object to examine the objects remaining when my test program (written to demonstrate the problem) exits:

Plucene::Index::FieldInfo 720
Plucene::Index::FieldInfos 120
Plucene::Index::FieldsReader 120
Plucene::Index::Norm 600
Plucene::Index::SegmentReader 120
Plucene::Index::SegmentTermDocs 240
Plucene::Index::SegmentTermEnum 120
Plucene::Index::SegmentsTermDocs 16
Plucene::Index::Term 3152
Plucene::Index::TermInfo 3016
Plucene::Index::TermInfosReader 120
Plucene::Search::BooleanScorer 8
Plucene::Search::BucketCollector 16
Plucene::Search::BucketTable 8
Plucene::Search::TermScorer 16
Plucene::Store::InputStream 960

Of particular interest is the fact that the test program executes the search in a fixed length loop; in the case that produced this output there were 8 iterations, and there are (rather suggestively) 8 each of the Plucene::Search::{BooleanScorer,BucketTable} objects. Looking at the code I found that the Plucene::Search::BooleanQuery object contains the BooleanScorer object, which contains the BucketTable object, which points back to the BooleanScorer object! Sure enough, the circular references are revealed using Lincoln Stein's Devel::Cycle:

Cycle (1):
=> \%Plucene::Search::BucketTable::B
=> \%Plucene::Search::BooleanScorer::A
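For anyone who hasn't run into this before, here is a stripped-down sketch of the same shape of problem, using only core Perl. Demo::Scorer and Demo::Table are hypothetical stand-ins, not the real Plucene classes, and the field names are invented:

```perl
use strict;
use warnings;

my %destroyed;   # class name => number of DESTROY calls seen so far

package Demo::Table {
    sub new {
        my ($class, $scorer) = @_;
        return bless { scorer => $scorer }, $class;   # points back at its owner
    }
    sub DESTROY { my ($self) = @_; $destroyed{ ref $self }++; }
}

package Demo::Scorer {
    sub new {
        my ($class) = @_;
        my $self = bless {}, $class;
        $self->{table} = Demo::Table->new($self);     # scorer -> table -> scorer
        return $self;
    }
    # Break the cycle by dropping the reference to the table.
    sub release { my ($self) = @_; delete $self->{table}; return; }
    sub DESTROY { my ($self) = @_; $destroyed{ ref $self }++; }
}

package main;

# Leaky: the last outside reference goes away, but the two objects
# still hold each other, so neither refcount ever reaches zero.
{ my $leaky = Demo::Scorer->new; }
my $after_leak = $destroyed{'Demo::Scorer'} // 0;

# Fixed: break the cycle before the last outside reference goes away.
{ my $scorer = Demo::Scorer->new; $scorer->release; }
my $after_release = $destroyed{'Demo::Scorer'} // 0;

print "destroyed after leaky scope:    $after_leak\n";     # 0
print "destroyed after released scope: $after_release\n";  # 1
```

In a one-shot script the leaked pair is cleaned up at exit, but in a persistent mod_perl process each leaked object (and any filehandle it owns) stays around for the life of the server.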

Knowing this, how do I fix it? I played with it a bit, attempting to figure out when the objects are supposed to be destroyed, and manually undefining them. But then I found this: Object::Destroyer (Adam Kennedy again!) to the rescue!

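# Plucene::Search::BooleanScorer has circular references that cause a
# memory leak in this persistent mod_perl setting. Override the
# constructor to add Object::Destroyer which will break the circular
# references
require Plucene::Search::BooleanScorer;
use Object::Destroyer;

# Called by Object::Destroyer when the wrapper goes out of scope.
# NOTE: the body of this sub was lost from this listing. Emptying the
# object's hash is an assumed stand-in that drops every reference it holds.
*Plucene::Search::BooleanScorer::release = sub {
    my $self = shift;
    %$self = ();
};

# Keep a reference to the original constructor, then wrap it.
my $old_PSBnew = \&Plucene::Search::BooleanScorer::new;
*Plucene::Search::BooleanScorer::new = sub {
    my $result = $old_PSBnew->(@_);
    return Object::Destroyer->new($result, 'release');
};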

It is so convenient to be able to modify / insert Perl library methods this way so one doesn't have to edit the source or use local copies. Learning about Perl symbol tables has served me well.
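The symbol table trick in isolation, as a minimal sketch. Demo::Greeter is a hypothetical class standing in for a library module, and the call log is just something visible to wrap around the constructor:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class that lives in someone else's module.
package Demo::Greeter {
    sub new   { my ($class, %args) = @_; return bless {%args}, $class; }
    sub greet { my ($self) = @_; return "hello, $self->{name}"; }
}

package main;

my @calls;   # one entry per constructor call that went through the wrapper

{
    no warnings 'redefine';                 # we are replacing the sub on purpose
    my $orig_new = \&Demo::Greeter::new;    # keep the original reachable
    *Demo::Greeter::new = sub {
        my $obj = $orig_new->(@_);          # delegate to the original constructor
        push @calls, ref $obj;              # then add our own behavior
        return $obj;
    };
}

my $greeter  = Demo::Greeter->new(name => 'world');
my $greeting = $greeter->greet;

print "$greeting\n";                                       # hello, world
print "constructor calls seen: ", scalar(@calls), "\n";    # 1
```

Capturing the original code reference before assigning to the glob is the important part; without it the wrapper would have nothing to call.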

I filed a bug report for Plucene, with an example patch to Plucene/Search/ I don't know how useful it will be, but I figured I should share what I've learned in case it's useful to someone.

Wednesday, July 29, 2009

Edimax EW-7318USg USB WiFi adapter working with Linux

I wanted to experiment with a WiFi adapter on my desktop Linux box (Kubuntu 8.04), and I chose the Edimax EW-7318USg USB stick which is reported all over the web to work well under Linux. Of course it didn't for me.

I followed the procedure described in an Ubuntu forum thread to build the rt73 driver. That worked fine, but when I plugged in the adapter, it seemed that the system failed to recognize it as a network adapter:

Jul 29 12:35:31 encona kernel: [80244.903574] usb 5-8: new high speed USB device using ehci_hcd and address 12
Jul 29 12:35:32 encona kernel: [80245.174467] usb 5-8: configuration #1 chosen from 1 choice
I ended up learning a bunch about udev and sysfs, and found that the adapter didn't seem to be reporting its MAC address:

looking at parent device '/devices/pci0000:00/0000:00:1d.7/usb5/5-8':
ATTRS{bNumInterfaces}==" 1"
ATTRS{version}==" 2.00"
ATTRS{product}=="802.11 bg WLAN"
So I thought, aha! maybe there is something wrong with the adapter! How can the system know that it's a network adapter without a MAC address? There should be an ATTRS{address} entry!

I tried installing this thing on my Macbook, with the same problem. I was really starting to think that there was a hardware problem, since it failed to work on two different operating systems.

But my guess was wrong. In continued research, I happened across a post on Electric Shaman describing exactly this problem: Edimax EW-7318USg with the RT73 Enhanced Driver.

Duh. The driver didn't know the vendor id and product id. I added that in like Jeff suggested and rebuilt the driver. It's working now. Thanks Jeff.

I believe the reason it didn't work on my Macbook is that I installed the driver off the CD, which perhaps also doesn't know the (presumably) new IDs.

Monday, July 13, 2009

kwin memory leak confirmed?

I think it's pretty much confirmed that my problem is indeed due to a memory leak in kwin. I've been running a cron job every minute to gather memory usage information on kwin. The result:

(this is from the RSS field generated by the command "ps v --no-heading -C kwin"*)

Each instance where the trace maxes out then drops down again is a time when I started getting the problem and restarted kwin. The flat spots are nights and weekends when I'm not using the computer.
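The sampling itself can be sketched in a few lines of Perl. This is an illustration rather than the script I actually ran: the function name is invented, the sample line is made up, and the column order is what I assume procps prints for "ps v" (PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND):

```perl
use strict;
use warnings;

# Pull the RSS column out of one line of `ps v --no-heading` output.
# Assumed column order: PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
sub rss_from_ps_v_line {
    my ($line) = @_;
    my @fields = split ' ', $line;   # split on whitespace, ignoring leading blanks
    return unless @fields >= 10;     # not a line we understand
    return $fields[7];               # RSS is the eighth column
}

# Made-up sample line, for illustration only. A cron job would instead read
# the real thing, e.g. with: my @lines = qx(ps v --no-heading -C kwin);
my $sample = ' 5432 ?        S      3:21     12  1234 56789 43210  2.1 kwin';

my $rss = rss_from_ps_v_line($sample);

# One "timestamp rss" pair per run is enough to plot later.
print time(), " $rss\n";
```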

Now, what to do about it? I guess I should report a bug. This is Kubuntu 8.04, so it's a little out of date, but it's an LTS release.

*I am amused by having mixed all three styles of command-line arguments in that ps command.