Friday, February 7, 2020

Difference between StringTokenizer and Split method in Java?

There are multiple ways to split a String in Java, but two of the most common are the StringTokenizer class and the split() method of the String class. You can use either one, but when should you use which? This short article explains the differences between StringTokenizer and split() in Java so you can decide which one to use.

1) StringTokenizer is a legacy class, kept for compatibility, and its Javadoc discourages its use in new code. Prefer split(), which is more likely to get performance improvements, as happened in Java 7.

2) StringTokenizer doesn't support regular expressions, while split() does. However, be careful: for a general regular expression, each call to split() compiles the expression into a new Pattern object. If you apply the same pattern to many different inputs, consider compiling it once with Pattern.compile() and calling Pattern.split() instead, because compiling a pattern usually costs more than applying an already compiled pattern to a string.
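
A minimal sketch of the compile-once approach, assuming a made-up separator (commas or semicolons with optional surrounding whitespace); the class name PatternReuse and the helper splitLine are hypothetical, only Pattern.compile() and Pattern.split() are real JDK APIs:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternReuse {
    // Compiled once, when the class is loaded, and reused for every input.
    // The separator regex itself is only an illustrative assumption.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*[,;]+\\s*");

    // Hypothetical helper: Pattern.split(CharSequence) reuses the compiled
    // pattern, so no new Pattern object is built per call.
    static String[] splitLine(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        String[] lines = { "a, b;c", "x ;y,, z" };
        for (String line : lines) {
            System.out.println(Arrays.toString(splitLine(line)));
        }
    }
}
```

Both sample lines yield three tokens, [a, b, c] and [x, y, z], while the regex is compiled only once.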

3) String.split() returns an array (String[]), while StringTokenizer returns one token at a time. Because split() returns an array, it's easy to iterate over the result with a for-each loop:

for (String token : input.split("\\s+")) { ... }
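
A small runnable sketch contrasting the two iteration styles; the class name IterationStyles and the sample input (with a double space and a tab on purpose) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class IterationStyles {
    // split() hands back the whole String[] at once, so a for-each loop works directly.
    static List<String> withSplit(String input) {
        List<String> out = new ArrayList<>();
        for (String token : input.split("\\s+")) {
            out.add(token);
        }
        return out;
    }

    // StringTokenizer hands out one token per nextToken() call.
    static List<String> withTokenizer(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input); // default delimiters: " \t\n\r\f"
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "this  is\ta test";
        System.out.println(withSplit(input));     // [this, is, a, test]
        System.out.println(withTokenizer(input)); // [this, is, a, test]
    }
}
```

One difference this sketch does not show: with leading whitespace, " a".split("\\s+") returns a leading empty string, whereas StringTokenizer simply skips the leading delimiter.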

4) StringTokenizer doesn't handle empty tokens well, but split() does. Suppose you need to parse empty tokens, for example in a comma-separated line like

one,,three,,,six

where the field values are "one", "", "three", "", "" and "six", and the three empty strings are indicated by commas with nothing between them. That's a lot more work with a StringTokenizer.

By default, StringTokenizer gives you just "one", "three", "six" and skips the empty fields. You can use a special constructor that takes a boolean to tell StringTokenizer to return the delimiters as tokens, but that gets complicated too. I'll skip the details. It's much easier to use split(","), which immediately returns {"one", "", "three", "", "", "six"}, exactly what you want.
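
A minimal sketch of the comparison on the sample line; the class name EmptyTokens and its two helpers are hypothetical. It also shows one caveat worth knowing: the one-argument split() drops trailing empty strings, while a negative limit, as in split(",", -1), keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyTokens {
    // StringTokenizer treats consecutive delimiters as a single separator,
    // so the empty fields disappear.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // split(",") keeps the empty fields between commas.
    static String[] withSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "one,,three,,,six";
        System.out.println(withTokenizer(line));              // [one, three, six]
        System.out.println(Arrays.toString(withSplit(line))); // [one, , three, , , six]

        // Trailing empty strings are removed unless the limit is negative.
        System.out.println(Arrays.toString("a,b,,".split(",")));     // [a, b]
        System.out.println(Arrays.toString("a,b,,".split(",", -1))); // [a, b, , ]
    }
}
```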

5) I think the most significant difference is this: with StringTokenizer, each delimiter is exactly one character long. You supply a list of characters that count as delimiters, but each character in that list is a separate delimiter. With split(), the delimiter is a regular expression, which is much more powerful (and more complicated to understand), and it can match text of any length. Regular expressions may be harder to understand at first, but once you learn how to use them, they're much more useful.
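
To make the difference concrete, here is a small sketch, assuming a made-up input that uses "::" as a two-character separator; the class name MultiCharDelimiter and its helpers are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class MultiCharDelimiter {
    // StringTokenizer reads "::" as a SET of delimiter characters,
    // i.e. just ':' listed twice, not as one two-character delimiter.
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, "::");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // split() treats "::" as a regular expression matching exactly two colons,
    // so a lone ':' stays inside the token.
    static String[] withSplit(String s) {
        return s.split("::");
    }

    public static void main(String[] args) {
        String s = "a::b:c::d";
        System.out.println(withTokenizer(s));              // [a, b, c, d]
        System.out.println(Arrays.toString(withSplit(s))); // [a, b:c, d]
    }
}
```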

6) Actually, String.split() doesn't always compile a pattern. If you look at the Java 1.7 source, you will see a fast path: when the regex is a single character that is not a regex metacharacter (or a backslash followed by one character that is not a letter or digit), split() splits the string without using the regex engine, so it should be quite fast.

7) When that fast path does not apply, String.split() builds a new Pattern on every call, so it is likely to be slower than StringTokenizer. If you have a lot of strings to process, compiling the Pattern once and calling its split() method is the way to go for maximum speed.

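8) StringTokenizer has a constructor, StringTokenizer(String str, String delim), that lets you specify the set of delimiter characters. Without it, the default delimiters are the whitespace characters: space, tab, newline, carriage return, and form feed.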
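
A short sketch using the three-argument constructor StringTokenizer(String str, String delim, boolean returnDelims); passing false behaves like the two-argument constructor. The class name CustomDelimiters and the helper tokens() are hypothetical, and the inputs are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CustomDelimiters {
    // Every character in delims acts as a delimiter on its own.
    // With returnDelims = true, each delimiter is also returned as a one-character token.
    static List<String> tokens(String s, String delims, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // ',' and ';' both act as separators
        System.out.println(String.join(" | ", tokens("a,b;c", ",;", false)));     // a | b | c
        // Delimiters come back as tokens, which is how empty fields could be detected
        System.out.println(String.join(" | ", tokens("one,,three", ",", true))); // one | , | , | three
    }
}
```

The second call hints at why recovering empty fields with StringTokenizer gets complicated: you have to notice two delimiter tokens in a row yourself.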

9) Here is code that splits a String into multiple smaller strings, first with StringTokenizer and then with split():

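 import java.util.StringTokenizer;

 // StringTokenizer: splits on whitespace by default, one token per nextToken() call
 StringTokenizer st = new StringTokenizer("this is a test");
 while (st.hasMoreTokens()) {
     System.out.println(st.nextToken());
 }

 // split(): "\\s" is a regex matching a single whitespace character
 String[] result = "this is a test".split("\\s");
 for (int x = 0; x < result.length; x++) {  // length is a field on arrays, not a method
     System.out.println(result[x]);
 }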

 Output (printed by each loop):
      this
      is
      a
      test

That's all about the difference between StringTokenizer and the split() method in Java. Both provide elegant ways to split a big string into multiple strings based on a specific delimiter, but StringTokenizer is legacy and you should avoid it in new code. Prefer the split() method of the String class whenever possible, as it also supports regular expressions.
