I have Java code that trims a UTF-8 string to the byte size of my Oracle (11.2.0.4.0) column, but the insert still throws an error because Java and Oracle disagree on the string's byte length. I've verified that the NLS_CHARACTERSET parameter in Oracle is 'UTF8'.
I wrote a test, shown below, that illustrates the issue using the Unicode chipmunk emoji (🐿️, U+1F43F followed by the variation selector U+FE0F):
public void test() throws SQLException {
    // U+1F43F (chipmunk) is a surrogate pair in Java, followed by U+FE0F (variation selector-16)
    String squirrel = "\uD83D\uDC3F\uFE0F";
    int squirrelByteLength = squirrel.getBytes(StandardCharsets.UTF_8).length; // 7: 4 bytes for the emoji + 3 for the selector
    try (Connection connection = dataSource.getConnection()) {
        connection.prepareStatement("drop table temp").execute();
        connection.prepareStatement("create table temp (foo varchar2(" + squirrelByteLength + "))").execute();
        try (PreparedStatement statement = connection.prepareStatement("insert into temp (foo) values (?)")) {
            statement.setString(1, squirrel);
            statement.executeUpdate(); // throws ORA-12899
        }
    }
}
This fails on the executeUpdate() call with the following message:
ORA-12899: value too large for column
"MYSCHEMA"."TEMP"."FOO" (actual: 9, maximum: 7)
The NLS_LENGTH_SEMANTICS setting is BYTE. Unfortunately, I cannot change this because it is a legacy system. I'm not interested in increasing the column size, just in reliably predicting the byte length Oracle will compute for a given string.
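My unconfirmed working theory is that the database's 'UTF8' character set encodes each UTF-16 code unit separately (CESU-8 style), so the surrogate pair costs 3 + 3 = 6 bytes instead of UTF-8's 4, which would account for the 9 versus 7. Under that assumption only, a predictor could look like the sketch below; cesu8Length is a hypothetical helper, not a verified rule for what Oracle does.

    // Hypothetical predictor, assuming each UTF-16 code unit is encoded
    // independently (CESU-8): supplementary characters cost 6 bytes, not 4.
    static int cesu8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // one-byte range (ASCII)
            } else if (c < 0x800) {
                bytes += 2; // two-byte range
            } else {
                bytes += 3; // three-byte range, including each surrogate half
            }
        }
        return bytes; // "\uD83D\uDC3F\uFE0F" -> 3 + 3 + 3 = 9
    }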