Sep 26, 2013

Oracle Database Charactersets

For a non-dba this a pretty confusing topic. Still gone are the days when only DBA's were installing Oracle. A lot of people now who have very limited knowledge of Oracle are today running databases on their laptops.

First of all what is this all about. This is about loading multiple languages into your application and how to manage it with Oracle. Let's verify your current settings

select parameter,
value from v$nls_parameters
WHERE parameter IN ( 'NLS_CHARACTERSET', 'NLS_LANGUAGE', 'NLS_TERRITORY','NLS_NCHAR_CHARACTERSET','NLS_LENGTH_SEMANTICS');

Here is the output of the query:
PARAMETER    VALUE
NLS_LANGUAGE    AMERICAN
NLS_TERRITORY    AMERICA
NLS_CHARACTERSET    AL32UTF8
NLS_NCHAR_CHARACTERSET    AL16UTF16
NLS_LENGTH_SEMANTICS    BYTE

First question - what is the oracle default Characterset for my database?
The answer to that is NLS_CHARACTERSET

What is NLS_NCHAR_CHARACTERSET - in layman terms if you are not using NCHAR, NVARCHAR or NCLOB forget about it. It is not important for you.

What is a unicode?
Unicode data sets allow databases to store asian, eastern european data in your alpha numeric and other columns. In other words, this provides a handle to implement multi language support.

What is the difference between AL32UTF8 and AL16UTF16 character sets?
If I store the letter 'a' in a NVARCHAR2 column, Oarcle would allocate 2 bytes of storage if the NLS_NCHAR_CHARACTERSET is AL16UTF16. If I store the letter 'a' in a NVARCHAR2 column, Oracle would allocate 1 byte of storage if the NLS_NCHAR_CHARACTERSET is AL32UTF8

If you are storing primarily English data, AL16UTF16 will cause Oracle to consume nearly twice as much space on disk and in RAM for that data as would be required if you used an AL32UTF8 character set.
This hold true even for the NLS_CHARACTERSET

What is NLS_LENGTH_SEMANTICS?
While creating tables Oracle needs to know how much space is to be allocated.
For example when you define COLUMNONE which has data type of VARCHAR2 20. Oracle needs to know how much 20 means - is it 20 Bytes or is it 20chars.
Oracle uses the NLS_LENGTH_SEMANTICS setting and the value for the parameter is used to translate 20

Why is the Byte and Char difference important?
Well if you are primarily talking about English language (and hence Latin Characterset) every char is one byte. However Eastern European languages and even Russian (Cyrillic) have 2 bytes to represent one char, asian languages have sometimes 3 and there are some others with four bytes.

My Database is unicode but I am not able to store asian languages?
Check your table scripts - the field that you are trying to insert data into is it defined as CHAR or BYTE?
Query  ALL_TAB_COLUMNS and check for the value in CHAR_USED - if it is B it means Byte, if it is C it means Char.
For example in my case I was trying to insert 16 chars of Russian into the VARCHAR2 (30) field on my database table LOCATIONS in the LOCATION_NAME column and it was throwing me and error.
The NLS_LENGTH_SEMANTICS was set as Byte and 16 russian chars meant 32 bytes which was beyiond the size.

ALTER TABLE LOCATIONS
MODIFY(LOCATION_NAME VARCHAR2(30 CHAR));
After the above change the same 16 chars were loaded successfully.

What is the fix for your application?
I hope your application supports the muti byte chars and all you are looking for in that case is alter statements. The beauty is since you are going to expand the columns Oracle will never create a problem.

Firstly change your DB setting by logging into your sys account on database
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=CHAR;

And if you are using SQLLDR it needs to support multiple languages
CHARACTERSET UTF8

Now you need to restart your database. You may do so by logging into sqlplus as sysdba and ussing "shutdown immediate" and "startup" one after the other. Please make sure you understand these and have taken precautions required.

Then login to your schema and run the following query
SELECT 'ALTER TABLE '|| TABLE_NAME ||' MODIFY( ' || COLUMN_NAME||'  VARCHAR2('||  DATA_LENGTH || ' CHAR));' QUERY
FROM user_TAB_COLUMNS WHERE  DATA_TYPE in ('VARCHAR2','CHAR')
and table_name not in (select view_name from user_views);


You may spool the results and run the output as a sql script. Basically for every CHAR and VARCHAR2 datatype in your database it will change the definition from BYTE to CHAR.


Want to read more - here is a beautiful article.

http://orathings.blogspot.com/2012/08/happy-three-friends-varchar2-unicode.html

No comments: