[1756489 views]

[]

Odi's astoundingly incomplete notes

New entries | Code

External tables in Oracle

Oracle can use plain-text files and provide them to the DB engine like read-only tables. That's great for master data imports.
The syntax to declare those tables is borrowed from SQLLoader. SQLLoader as we know is very powerful. It can read practically any type of CSV-like file. The most common problems with plain text files are usually:
People tend to make considerable efforts to convert both character encodings and line endings when they process flat files. Of course that's easy on Linux with the iconv and dos2unix/unix2dos tools.

With external tables (or SQLLoader) you can leave this job to Oracle - and it's even damn good at it. Oracle knows practically every oh so weird character encoding used on this planet. And it can also deal with explicit or platform specific line endings if you tell it. An external table definition looks like this:
CREATE TABLE EXT_MYTABLE
(
F1 VARCHAR2(100 BYTE),
F2 VARCHAR2(100 BYTE),
F3 VARCHAR2(100 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY EXT_MYTABLES
ACCESS PARAMETERS
( RECORDS DELIMITED BY '\r\n' CHARACTERSET AL32UTF8
BADFILE 'mytable.bad'
DISCARDFILE 'mytable.dis'
LOGFILE 'mytable.log'
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
)
LOCATION (EXT_MYTABLES:'mytable.csv')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;
The important part is the RECORDS DELIMITED BY and CHARACTERSET parameters. Just specify it correctly and the data will appear correctly in the DB. If you are not sure about the character encoding you may want to use the Linux file command. Used with a representative file it can guess the encoding using sophisticated heuristics:
bash-3.2$ file my.csv
my.csv: ISO-8859 text
Similarly you can determine the line endings with hexdump (0D is \r, 0A is \n):
bash-3.2$ hexdump -C my.csv| head
00000000 4d fc 6e 63 68 65 6e 3b 31 30 30 3b 34 37 0a 42
00000010 e4 72 6e 3b 32 38 32 3b 32 39 0a




posted on 2007-02-16 10:50 UTC in Code | 0 comments | permalink