Java: looking for the fastest way to check String for presence of Unicode chars in certain range

- June 15, 2012

I need to implement a crude language identification algorithm. In my world, there are only two languages: English and not-English. I have an ArrayList of strings and need to determine whether each string is in English or in another language that has its Unicode chars in a certain range. So I want to check each string against this range using some type of "presence" test. If it passes the test, the string is not English; otherwise it's English. I want to try two types of tests:

test-any: if any char in the string falls within the range, the string passes the test
test-all: if all chars in the string fall within the range, the string passes the test

Since the array might be long, I need to implement this efficiently. What is the fastest way of doing this in Java?

thx

Update: I am checking for non-English by looking at a specific range of Unicode code points rather than checking whether the characters are ASCII, in part to take care of the "resume" problem mentioned below. What I am trying to figure out is whether Java provides any classes/methods that implement test-any or test-all (or a similar test) as efficiently as possible. In other words, I am trying to avoid reinventing the wheel, especially if the wheel was invented before me and is better anyway.
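On the built-in question, here is a minimal sketch of a few standard-library facilities that can express both tests without a hand-written loop. It assumes Java 8 or later for `String.codePoints()`. The class name `BuiltInRangeTests` and its method names are made up for illustration, and Cyrillic is only an arbitrary example block. Nothing here is claimed to be the fastest option; that would need measuring against a plain loop on real data.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class BuiltInRangeTests {
    // Precompiled once; \p{InCyrillic} matches any code point in the Cyrillic block.
    private static final Pattern ANY_CYRILLIC = Pattern.compile("\\p{InCyrillic}");

    private BuiltInRangeTests() {}

    // test-any via streams: stops at the first code point inside [low, high].
    static boolean anyInRange(String s, int low, int high) {
        return s.codePoints().anyMatch(cp -> cp >= low && cp <= high);
    }

    // test-all via streams: stops at the first code point outside [low, high].
    static boolean allInRange(String s, int low, int high) {
        return s.codePoints().allMatch(cp -> cp >= low && cp <= high);
    }

    // test-any against a named Unicode block instead of numeric bounds.
    static boolean anyInBlock(String s, Character.UnicodeBlock block) {
        return s.codePoints().anyMatch(cp -> block.equals(Character.UnicodeBlock.of(cp)));
    }

    // test-any using a regular expression with a Unicode block property.
    static boolean anyCyrillicByRegex(String s) {
        return ANY_CYRILLIC.matcher(s).find();
    }
}
```

A call such as `BuiltInRangeTests.anyInBlock(word, Character.UnicodeBlock.CYRILLIC)` avoids looking up numeric bounds by hand when the target range coincides with a named block.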

Here's how I ended up implementing test-any:

// test-any
String str = "wordToTest";
int uRangeLow = 1234;  // can be any range, e.g. from http://www.utf8-chartable.de/unicode-utf8-table.pl
int uRangeHigh = 2345;
for (int iLetter = 0; iLetter < str.length(); iLetter++) {
    int cp = str.codePointAt(iLetter);
    if (cp >= uRangeLow && cp <= uRangeHigh) {
        // word is not English
        return;
    }
}
// word is English
return;
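The same loop can be wrapped into reusable methods, with a test-all counterpart alongside it. This is only a sketch: the class name `RangeTests`, the method names, and the inclusive bounds are assumptions for illustration. It advances by `Character.charCount(cp)` rather than by one, so a character outside the Basic Multilingual Plane is examined once as a whole code point rather than a second time as a lone surrogate. How test-all should treat an empty string is not specified in the question; here it passes vacuously.

```java
// Hypothetical helper class, for illustration only.
final class RangeTests {
    private RangeTests() {}

    // test-any: true if at least one code point lies in [low, high].
    static boolean anyInRange(String s, int low, int high) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= low && cp <= high) {
                return true; // early exit on the first hit
            }
            i += Character.charCount(cp); // 1 for BMP chars, 2 for surrogate pairs
        }
        return false;
    }

    // test-all: true if every code point lies in [low, high].
    static boolean allInRange(String s, int low, int high) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < low || cp > high) {
                return false; // early exit on the first miss
            }
            i += Character.charCount(cp);
        }
        return true; // assumption: an empty string passes vacuously
    }
}
```

Applied to the list, each string is scanned at most once and scanning stops at the first deciding character, so the total work is at most linear in the combined length of all strings.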
