[DB-SIG] dbf files and compact indices

Ethan Furman ethan at stoneleaf.us
Sat Sep 18 12:16:12 EDT 2010


Carl Karsten wrote:
> On Sat, Sep 18, 2010 at 1:11 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> 
>>Does anybody have any pointers, tips, web pages, already-written routines,
>>etc., on parsing *.cdx files?  I have found the pages on MS's site for
>>FoxPro, but they neglect to describe the compaction algorithm used, and my
>>Google-fu has failed to find any sites with that information.
>>
>>Any and all help greatly appreciated!
>>
> 
> 
> "Compound Index File Structure (.cdx)"
> 
> http://msdn.microsoft.com/en-us/library/k35b9hs2%28v=VS.80%29.aspx
> 
> which basically links to:
> http://msdn.microsoft.com/en-us/library/s8tb8f47%28v=VS.80%29.aspx
> 
> Is that what you need?

Thanks for the link; unfortunately, I am already familiar with that page.
What I need help with is the first sentence of the note at the bottom:

Each entry consists of the record number, duplicate byte count and
trailing byte count, all compacted. The key text is placed at the
logical end of the node, working backwards, allowing for previous key
entries.
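
For illustration, here is a minimal Python 3 sketch of one thing "all
compacted" could mean: the three fields packed into a single little-endian
integer, with the record number in the lowest bits, then the duplicate
count, then the trailing count. The field order, the byte order, and the
14/5/5 bit widths used in the example call are assumptions, not something
the note itself states; unpack_entry is a hypothetical helper name.

```python
def unpack_entry(raw, rec_bits, dup_bits, trail_bits):
    """Split one compacted entry into (recno, dup_count, trail_count).

    Assumption: the bytes form a little-endian integer with the record
    number in the lowest bits, followed by the duplicate count and then
    the trailing count.
    """
    value = int.from_bytes(raw, "little")
    recno = value & ((1 << rec_bits) - 1)
    value >>= rec_bits
    dup = value & ((1 << dup_bits) - 1)
    value >>= dup_bits
    trail = value & ((1 << trail_bits) - 1)
    return recno, dup, trail


# Illustrative 3-byte entry, with assumed widths of 14, 5 and 5 bits.
print(unpack_entry(bytes.fromhex("0e40b0"), 14, 5, 5))  # (14, 1, 22)
```

The widths are passed in rather than hard-coded because the node header
carries its own bit counts, so they may differ from node to node.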

Here's a dump of the last interior node:

-----
node type: 2
number of keys: 57
free space: 1 (or 256) (and is this bits, bytes, keys, what?)
--
record number mask: c8 0e 40 b0
duplicate byte count mask: 28
trailing byte count mask: 00
--
bits used for record number: 178
bits used for duplicate count: 29
bits used for trail count: 64
bytes used for rec num, dup count, trail count: 192
-----
12 00 ff 3f 00 00 1f 1f 0e 05 05 03 01 00 c8 0e 40 b0 28 00
b2 1d 40 c0 29 00 d0 42 40 d0 54 80 c0 43 40 a8 14 40 b8 40
40 c8 02 40 d0 08 00 b0 4c 80 b0 3a 40 a0 50 80 d0 3b 40 a8
09 40 b8 0a 80 88 3c 80 c0 2a 00 d8 21 c0 c0 3d 40 c0 4a 80
b0 26 40 b8 2b 40 c0 2c 00 c0 41 40 b8 4d 80 c8 37 00 c0 04
40 c8 44 80 c0 1b 40 c8 15 80 c8 27 40 c8 16 00 a8 2d c0 c8
51 80 b8 2e 40 c0 1e 00 b0 17 40 b8 46 40 b0 2f 80 c8 4f 80
a8 13 00 c8 59 00 c8 31 00 c8 1f 00 a8 3e 40 c0 22 40 a8 07
00 c8 23 80 d0 32 80 b0 52 80 c0 34 80 b0 20 40 b0 24 40 c0
47 80 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 4e 44 45 4e 49 44 53 4f 4e 43 43 41 4d 4d 4f 4e 54 54 48
45 57 53 53 4c 45 4e 52 54 49 4e 45 5a 4e 4e 4d 41 47 45 45
49 45 42 45 52 4d 41 4e 45 57 49 4e 53 4c 41 56 45 4e 42 45
52 47 4b 41 56 41 4e 4a 4f 4e 45 53 49 52 49 53 48 53 54 45
54 4c 45 52 52 41 4e 4f 4c 53 54 45 49 4e 45 41 44 4c 45 59
48 41 54 48 41 57 41 59 52 49 4d 45 53 45 41 53 4f 4e 53 53
47 4c 41 44 53 54 4f 4e 45 55 52 52 59 4f 53 54 52 49 4e 4b
52 42 45 53 4f 4c 45 59 46 49 4c 45 4e 45 4e 49 53 4e 47 4c
55 4e 44 45 42 45 52 4c 45 4f 44 53 4f 4e 49 4e 47 4c 45 52
4d 41 52 45 53 54 45 43 4b 45 52 54 4f 4e 44 41 59 57 47 45
52 52 4e 45 49 4c 2d 53 55 4e 44 54 4f 4f 4b 53 45 59 4c 45
4e 44 45 4e 49 4e 55 4e 48 49 41 50 50 45 54 54 41 52 4e 41
48 41 4e 43 41 4c 44 57 45 4c 4c 55 54 54 52 55 43 45 4f 43
41 52 44 45 4c 4f 4f 4d 42 45 52 47 4e 53 45 4c 45 45 52 42
41 43 48 55 47 55 53 54 4e 44 45 52 53 4f 4e 41 4c 4c 41 4e
-----
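
One way to sanity-check the header values is to decode the first bytes of
the hex dump with struct. The sketch below assumes (this is a guess, not a
confirmed answer) that the dump begins at the free-space field, which the
MSDN layout places at byte 12 of the node, and that every field is
little-endian. HEADER and parse_header are hypothetical names.

```python
import struct

# Assumed layout from the free-space field onward: 2-byte free space,
# 4-byte record number mask, then six 1-byte fields.
HEADER = struct.Struct("<HIBBBBBB")

FIELDS = ("free_space", "recno_mask", "dup_mask", "trail_mask",
          "recno_bits", "dup_bits", "trail_bits", "entry_bytes")


def parse_header(data):
    """Return the assumed header fields as a dict."""
    return dict(zip(FIELDS, HEADER.unpack_from(data)))


# First 12 bytes of the hex dump above.
first_bytes = bytes.fromhex("1200ff3f00001f1f0e050503")
for name, value in parse_header(first_bytes).items():
    print(f"{name}: {value:#x}")
```

Under that reading the entries would be 3 bytes wide, with a 14-bit record
number and 5-bit counts, and the free space would be 0x12 bytes. That
would agree with the run of 18 zero bytes that sits between the entries
and the key text in the dump, and with 57 entries of 3 bytes each ending
exactly where the zeros begin.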

The last half (roughly) consists of last names compressed together,
while the first half consists of 57 (in this case) entries of the record
number, duplicate byte count and trailing byte count, all compacted --
how do I uncompact them?
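
As a sketch of the "working backwards" part: if each entry yields a
duplicate count and a trailing count, a key could be rebuilt by keeping
dup bytes from the previous key, taking the new bytes from the end of the
node, and padding with trail blanks. The fixed key length of 30, the blank
padding, and the (dup, trail) pairs below (what the first four entries
would give under an assumed 14/5/5-bit split of 3-byte entries) are all
assumptions; rebuild_keys is a hypothetical helper.

```python
def rebuild_keys(counts, key_text, key_len, pad=b" "):
    """Rebuild keys from (dup, trail) pairs and the packed key text.

    Assumption: key_text ends at the logical end of the node, and the
    first key's new bytes are stored last.
    """
    keys = []
    prev = b""
    end = len(key_text)
    for dup, trail in counts:
        new_len = key_len - dup - trail      # bytes stored for this key
        start = end - new_len
        key = prev[:dup] + key_text[start:end] + pad * trail
        keys.append(key)
        prev = key
        end = start                          # next key sits just before
    return keys


# Last 17 bytes of the dump above, as text.
tail = b"UGUSTNDERSONALLAN"
counts = [(0, 25), (1, 22), (8, 22), (1, 24)]
for key in rebuild_keys(counts, tail, 30):
    print(key.rstrip().decode("ascii"))
```

With these inputs the sketch prints ALLAN, ANDERSON, ANDERSON and AUGUST,
which at least looks like a sorted list of last names.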

--
~Ethan~



