I spent part of the last two days solving a fun billing problem (if such a thing exists). We had a voice customer on the wrong rate plan, and had to correct the billing using his call data records (CDRs) in CSV format. I had the correct price list, also in CSV format. Enter Super Python Man!
Calls are matched against rate by their “destination”, which is the most specific match against the first digits of the number. The right data structure for this job is a Prefix Trie (http://en.wikipedia.org/wiki/Trie). I implemented a simple one using nested Python dictionaries. This is not space efficient, but it’s fine for processing 40,000 destinations no more than 7 nested levels deep on a modern PC. Once I had a working implementation, processing creating the Trie and processing the 110,000 calls took 1.4 seconds.
The code, alas, is not ready for publication, even here. I will do so once I’ve cleaned it into a generic, space-efficient implementation for number prefixes.