[DB-SIG] dbf files and compact indices

Sat Sep 18 13:44:06 EDT 2010

Carl Karsten wrote:
> On Sat, Sep 18, 2010 at 11:16 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> 
>>Carl Karsten wrote:
>>
>>>On Sat, Sep 18, 2010 at 1:11 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>>
>>>
>>>>Does anybody have any pointers, tips, web pages, already-written
>>>>routines, etc., on parsing *.cdx files?  I have found the pages on MS's
>>>>site for FoxPro, but they neglect to describe the compaction algorithm
>>>>used, and my Google-fu has failed to find any sites with that
>>>>information.
>>>>
>>>>Any and all help greatly appreciated!
>>>>
>>>
>>>
>>>"Compound Index File Structure (.cdx)"
>>>
>>>http://msdn.microsoft.com/en-us/library/k35b9hs2%28v=VS.80%29.aspx
>>>
>>>which basically links to:
>>>http://msdn.microsoft.com/en-us/library/s8tb8f47%28v=VS.80%29.aspx
>>>
>>>Is that what you need?
>>
>>Thanks for the link; unfortunately, I am already familiar with the page.
>> What I need help with is the first sentence of the note at the bottom:
>>
>>Each entry consists of the record number, duplicate byte count and
>>trailing byte count, all compacted. The key text is placed at the
>>logical end of the node, working backwards, allowing for previous key
>>entries.
>>
>>Here's a dump of the last interior node:
>>
>>-----
>>node type: 2
>>number of keys: 57
>>free space: 1 (or 256) (and is this bits, bytes, keys, what?)
>>--
>>record number mask: c8 0e 40 b0
>>duplicate byte count mask: 28
>>trailing byte count mask: 00
>>--
>>bits used for record number: 178
>>bits used for duplicate count: 29
>>bits used for trail count: 64
>>bytes used for rec num, dup count, trail count: 192
>>-----
>>12 00 ff 3f 00 00 1f 1f 0e 05 05 03 01 00 c8 0e 40 b0 28 00
>>b2 1d 40 c0 29 00 d0 42 40 d0 54 80 c0 43 40 a8 14 40 b8 40
>>40 c8 02 40 d0 08 00 b0 4c 80 b0 3a 40 a0 50 80 d0 3b 40 a8
>>09 40 b8 0a 80 88 3c 80 c0 2a 00 d8 21 c0 c0 3d 40 c0 4a 80
>>b0 26 40 b8 2b 40 c0 2c 00 c0 41 40 b8 4d 80 c8 37 00 c0 04
>>40 c8 44 80 c0 1b 40 c8 15 80 c8 27 40 c8 16 00 a8 2d c0 c8
>>51 80 b8 2e 40 c0 1e 00 b0 17 40 b8 46 40 b0 2f 80 c8 4f 80
>>a8 13 00 c8 59 00 c8 31 00 c8 1f 00 a8 3e 40 c0 22 40 a8 07
>>00 c8 23 80 d0 32 80 b0 52 80 c0 34 80 b0 20 40 b0 24 40 c0
>>47 80 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>00 4e 44 45 4e 49 44 53 4f 4e 43 43 41 4d 4d 4f 4e 54 54 48
>>45 57 53 53 4c 45 4e 52 54 49 4e 45 5a 4e 4e 4d 41 47 45 45
>>49 45 42 45 52 4d 41 4e 45 57 49 4e 53 4c 41 56 45 4e 42 45
>>52 47 4b 41 56 41 4e 4a 4f 4e 45 53 49 52 49 53 48 53 54 45
>>54 4c 45 52 52 41 4e 4f 4c 53 54 45 49 4e 45 41 44 4c 45 59
>>48 41 54 48 41 57 41 59 52 49 4d 45 53 45 41 53 4f 4e 53 53
>>47 4c 41 44 53 54 4f 4e 45 55 52 52 59 4f 53 54 52 49 4e 4b
>>52 42 45 53 4f 4c 45 59 46 49 4c 45 4e 45 4e 49 53 4e 47 4c
>>55 4e 44 45 42 45 52 4c 45 4f 44 53 4f 4e 49 4e 47 4c 45 52
>>4d 41 52 45 53 54 45 43 4b 45 52 54 4f 4e 44 41 59 57 47 45
>>52 52 4e 45 49 4c 2d 53 55 4e 44 54 4f 4f 4b 53 45 59 4c 45
>>4e 44 45 4e 49 4e 55 4e 48 49 41 50 50 45 54 54 41 52 4e 41
>>48 41 4e 43 41 4c 44 57 45 4c 4c 55 54 54 52 55 43 45 4f 43
>>41 52 44 45 4c 4f 4f 4d 42 45 52 47 4e 53 45 4c 45 45 52 42
>>41 43 48 55 47 55 53 54 4e 44 45 52 53 4f 4e 41 4c 4c 41 4e
>>-----
>>
>>The last half (roughly) consists of last names compressed together,
>>while the first half consists of 57 (in this case) entries of the record
>>number, duplicate byte count and trailing byte count, all compacted --
>>how do I uncompact them?
>>
> 
> 
> Huh, I see what you mean.
> 
> What are you working on?
> 
> I know a few people that may have the answer, but it would help to
> explain why it is being worked on.
> 
> 

I have a pure-Python module to read db3 and VFP 6 dbf files, and I find
that I need to read (and write) the idx and cdx index files that FoxPro
generates.  We are in the process of switching from homegrown FoxPro
apps to homegrown Python apps, but I have to support the FoxPro file
formats until the switch is complete.  Once I have the index files down,
I'll publish another release of it (an older version can be found on PyPI).
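Working the MSDN note against the dump above suggests how the entries
unpack. If the hex dump is taken to start at offset 12 of the node (the
free-space field in the layout on the second MSDN page), its first
twelve bytes read as 18 bytes of free space (matching the run of 18 zero
bytes between the last entry and the key text), a record-number mask of
0x3fff, duplicate and trailing masks of 0x1f, field widths of 14, 5 and
5 bits, and 3 bytes per entry. The labelled values in the dump appear to
have been read twelve bytes too far in: 178, 29, 64 and 192 are 0xb2,
0x1d, 0x40 and 0xc0, bytes 20-23 of the dump. Each 3-byte entry then
looks like a little-endian integer with the record number in the low 14
bits, the duplicate count above it and the trailing count on top, while
the key text is consumed from the end of the node backwards. The sketch
below is a minimal decoder under those assumptions, not a definitive
implementation. decode_leaf_node and its parameters are hypothetical
names. The 30-byte key length is inferred from the data (it would
normally come from the tag's own header). Node type 2 is assumed to be
the leaf (exterior) node, and the trailing bytes are assumed to be
blanks, as they would be for character keys like these.

```python
import struct
from typing import List, Tuple

NODE_SIZE = 512   # every CDX node is 512 bytes
HEADER_SIZE = 24  # fixed part of an exterior (leaf) node; entries start here


def decode_leaf_node(node: bytes, key_len: int, pad: bytes = b" ") -> List[Tuple[int, bytes]]:
    """Return (record_number, key) pairs from one exterior (leaf) CDX node.

    Hypothetical helper, not an existing API. key_len is assumed to come
    from the tag's index header; pad is assumed to be the byte counted by
    the trailing count (a blank for character keys).
    """
    if len(node) != NODE_SIZE:
        raise ValueError("expected a 512-byte node")
    # Offsets 0-23, little-endian: attributes, key count, left and right
    # node pointers, free bytes, record-number mask, duplicate-count mask,
    # trailing-count mask, bits for each of the three fields, and the
    # number of bytes one packed entry occupies.
    (attributes, key_count, _left, _right, _free, rec_mask, dup_mask,
     trail_mask, rec_bits, dup_bits, _trail_bits, entry_size) = struct.unpack_from(
        "<HHiiHIBBBBBB", node, 0)
    if not attributes & 2:  # assumed: bit value 2 marks a leaf node
        raise ValueError("not an exterior (leaf) node")

    entries = []
    key_end = NODE_SIZE        # key text is packed backwards from the node's end
    previous = pad * key_len   # full (padded) text of the previous key
    for i in range(key_count):
        start = HEADER_SIZE + i * entry_size
        # One entry is a little-endian integer holding three bit fields:
        # record number (low bits), duplicate count, then trailing count.
        packed = int.from_bytes(node[start:start + entry_size], "little")
        record = packed & rec_mask
        dup = (packed >> rec_bits) & dup_mask
        trail = (packed >> (rec_bits + dup_bits)) & trail_mask
        # Only the bytes that are neither shared with the previous key nor
        # trailing padding are actually stored in the node.
        text_len = key_len - dup - trail
        if text_len < 0:
            raise ValueError(f"entry {i}: dup + trail exceeds key length")
        key_start = key_end - text_len
        text = node[key_start:key_end]
        key_end = key_start
        key = previous[:dup] + text + pad * trail
        entries.append((record, key))
        previous = key
    return entries


if __name__ == "__main__":
    # Bytes 12-23 and the first four entries are copied from the dump
    # (assuming it starts at node offset 12); the attribute, key-count and
    # pointer bytes are made up, with the key count cut to 4.
    head = bytes.fromhex("0200 0400 ffffffff ffffffff 1200 ff3f0000 1f 1f 0e 05 05 03")
    packed_entries = bytes.fromhex("0100c8 0e40b0 2800b2 1d40c0")
    key_text = b"UGUSTNDERSONALLAN"  # the last 17 bytes of the dump
    filler = bytes(NODE_SIZE - len(head) - len(packed_entries) - len(key_text))
    node = head + packed_entries + filler + key_text
    for record, key in decode_leaf_node(node, key_len=30):
        print(record, key.rstrip().decode("ascii"))
    # 1 ALLAN
    # 14 ANDERSON
    # 40 ANDERSON
    # 29 AUGUST
```

Read this way, the third entry (28 00 b2) carries no key text at all,
because its duplicate count (8) plus trailing count (22) already cover
all 30 bytes; the fifth entry (29 00 d0) would give BACH for record 41
in the same way.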

Thanks for your help!

--
~Ethan~