[Tutor] positional output

Fri Nov 11 15:01:39 CET 2011

On Fri, Nov 11, 2011 at 1:59 PM, Cranky Frankie
<cranky.frankie at gmail.com> wrote:
> Thank you for your help on this. Now for "the rest of the story."
>
> I'm trying to build a script to parse IBM AIX DB2 DDL to line up the
> data types (it drives me crazy when the column data types are not
> lined up). For example, typical create table DDL might be hundreds of
> lines long but will look like this:
>
> --
> -- table create DDL
> --
> CREATE TABLE FRANK.TEST (
> COLUMN1     DECIMAL(8),
> COLUMN2           CHAR(20),
> COLUMN3                    TIMESTAMP,
> COLUMN4     INTEGER,
> COLUMN5          DATE NOT NULL WITH DEFAULT,
> -- repeat for hundreds of columns
> );
> COMMENT ON TABLE FRANK.TEST IS 'TEST TABLE';
>
> This is just a small sample, there are many other possible lines, but
> I'm only concerned about the column lines like COLUMN1 through COLUMN5
> above.
>
> I have a script on Windows that reads in the DDL file and writes out
> each line to a new file. What I'm doing is using the split() function
> to test for the presence of any DB2 standard data type, like CHAR,
> INTEGER, SMALINT, etc. If I find one I want to use positional output
> to make each like look like:
>
> COLUMN1                        DECIMAL(8),
> COLUMN2                        CHAR(20),
> COLUMN3                        TIMESTAMP,
> COLUMN4                         INTEGER,
> COLUMN5                         DATE NOT NULL WITH DEFAULT,
>
> where all the COLUMNs would be in column 1 of the output file, the
> data types would be in column 40, and the comma would be next.
>
> The problem is handling lines that have NOT NULL WITH DEFAULT at the
> end. The thing is there could be other valid DDL in that position, and
> there may or may not be a comma after the data type. What I want to do
> is just take anything after the datatype, which would be element(1) in
> the split() output, and just write it out. I thought I could use
> rsplit() to do this, but you can't put the output of split() in
> rsplit() - I tried.
>
> I need to do something like, after verifying that element(1) is a
> valid DB2 datatype, just take everything else on the line after it,
> which may be a single comma, or NOT NULL WITH DEFAULT, or something
> else, and place it on the output line to be written out.
>
> So, to reiterate: I'm trying to build a script to line up the data
> types in a create table DDL file. Splitting each line into individual
> space separated elements, then checking for a valid data type, the
> rebuilding the line positionally seems to be the way to go. If there's
> an easyier way to do it I'm all ears.
>
> I don't have my script available at the moment but I could send it if
> it would be helpful.
>

I would suggest using str.split(None, 1) to limit the amount of splits
done to only 1, which means you get a list like ['COLUMN', 'DATA TYPE
ETC'].

Then you use the str.startswith function to check the second entry for
data types, something like so (note startswith accepts a tuple of
strings to look for:

if splitted_line[1].startswith(data_types):
    print "%s %s" % (splitted_line[0].ljust(39), splitted_line[1])

HTH,
Hugo