Java StringTokenizer Overview

The string tokenizer class allows an application to break a string into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The StringTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis. An instance of StringTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag having the value true or false:
- If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
- If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character, or a maximal sequence of consecutive characters that are not delimiters.
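The two behaviours of the returnDelims flag can be compared side by side. The following is a minimal sketch, not part of the original tutorial: the class name ReturnDelimsDemo, the helper tokens, and the sample input "a+b-c" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the name and the sample input are illustrative only.
class ReturnDelimsDemo {

    // Collects every token the tokenizer produces for the given delimiter set and flag.
    static List<String> tokens(String input, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(input, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Flag false: '+' and '-' only separate tokens.
        System.out.println(tokens("a+b-c", "+-", false)); // [a, b, c]
        // Flag true: each delimiter character comes back as its own token.
        System.out.println(tokens("a+b-c", "+-", true));  // [a, +, b, -, c]
    }
}
```

With the flag set to true, the caller can tell which delimiter separated two tokens, which is useful for inputs such as simple arithmetic expressions.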
The following is an example of the use of the tokenizer. The code:

    StringTokenizer st = new StringTokenizer("Hello java");
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }

prints the following output:

    Hello
    java
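Changing the delimiters on a per-token basis, as mentioned earlier, is done with the nextToken(String delim) overload. The sketch below is an illustration under assumed names (PerTokenDelimiterDemo, parse, and the "name:tag,tag" record layout are hypothetical). It also shows a detail that is easy to miss: the delimiter that ended the previous token is not consumed, so it has to be part of the next delimiter set if it should be skipped.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; the record layout "name:tag,tag" is an assumption for illustration.
class PerTokenDelimiterDemo {

    // Reads a name that may contain commas, followed by two comma-separated tags.
    static String[] parse(String record) {
        StringTokenizer st = new StringTokenizer(record);
        // Only ':' ends the first token, so the comma inside the name is kept.
        String name = st.nextToken(":");
        // The ':' that ended the name is still pending, so it is listed here
        // together with ',' and gets skipped before the next token starts.
        String firstTag = st.nextToken(":,");
        // The delimiter set ":," stays in effect for later calls.
        String secondTag = st.nextToken();
        return new String[] { name, firstTag, secondTag };
    }

    public static void main(String[] args) {
        for (String part : parse("Lovelace, Ada:math,engines")) {
            System.out.println(part);
        }
        // Prints:
        // Lovelace, Ada
        // math
        // engines
    }
}
```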
The following example illustrates how the String.split method can be used to break up a string into its basic tokens:

    String[] result = "this is a test".split("\\s");
    for (int x = 0; x < result.length; x++)
        System.out.println(result[x]);

prints the following output:

    this
    is
    a
    test

java.io.StreamTokenizer is another tokenizing class. It has a more complicated API and more powerful features than StringTokenizer.

Parsing a String into Tokens Using a Regular Expression (Java StringTokenizer tutorial)
This example implements a tokenizer that uses regular expressions. It is used much like the StringTokenizer class: you treat it as an iterator and pull the tokens out one at a time.

    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    CharSequence inputStr = "a 1 2 b c 3 4";
    String patternStr = "[a-z]";

    // Set to false if only the tokens that match the pattern are to be returned.
    // If true, the text between matching tokens is also returned.
    boolean returnDelims = true;

    // Create the tokenizer
    Iterator<String> tokenizer = new RETokenizer(inputStr, patternStr, returnDelims);

    // Get the tokens (and delimiters)
    while (tokenizer.hasNext()) {
        String tokenOrDelim = tokenizer.next();
    }
    // "", "a", " 1 2 ", "b", " ", "c", " 3 4"

    class RETokenizer implements Iterator<String> {
        // Holds the original input to search for tokens
        private final CharSequence input;

        // Used to find tokens; set to null once the input is exhausted
        private Matcher matcher;

        // If true, the text between tokens is also returned
        private final boolean returnDelims;

        // The current delimiter value. If non-null, it is returned
        // at the next call to next()
        private String delim;

        // The current matched value. If non-null and delim is null,
        // it is returned at the next call to next()
        private String match;

        // The value of matcher.end() from the last successful match.
        private int lastEnd = 0;

        // patternStr is a regular expression pattern that identifies tokens.
        // If returnDelims is false, only those tokens that match the
        // pattern are returned. If returnDelims is true, the text between
        // matching tokens is also returned, in the sequence delimiter, token,
        // delimiter, token, etc. Tokens can never be empty but delimiters
        // might be empty (empty string).
        public RETokenizer(CharSequence input, String patternStr, boolean returnDelims) {
            // Save values
            this.input = input;
            this.returnDelims = returnDelims;

            // Compile pattern and prepare input
            Pattern pattern = Pattern.compile(patternStr);
            matcher = pattern.matcher(input);
        }

        // Returns true if there are more tokens or delimiters.
        public boolean hasNext() {
            // A pending value is checked first so that repeated calls
            // give the same answer until next() is called.
            if (delim != null || match != null) {
                return true;
            }
            if (matcher == null) {
                return false;
            }
            if (matcher.find()) {
                if (returnDelims) {
                    delim = input.subSequence(lastEnd, matcher.start()).toString();
                }
                match = matcher.group();
                lastEnd = matcher.end();
            } else {
                if (returnDelims && lastEnd < input.length()) {
                    delim = input.subSequence(lastEnd, input.length()).toString();
                    lastEnd = input.length();
                }
                // Drop the matcher: after a failed find(), the next find()
                // would start again from the beginning of the input.
                matcher = null;
            }
            return delim != null || match != null;
        }

        // Returns the next token (or delimiter if returnDelims is true).
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String result;
            if (delim != null) {
                result = delim;
                delim = null;
            } else {
                result = match;
                match = null;
            }
            return result;
        }

        // Returns true if the call to next() will return a token rather
        // than a delimiter. Call hasNext() first so a value is pending.
        public boolean isNextToken() {
            return delim == null && match != null;
        }

        // Not supported.
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
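When only the matching tokens are needed (the returnDelims = false case), a plain Matcher loop gives the same result with far less code. This is a minimal sketch rather than a replacement for RETokenizer; the class name MatcherLoopDemo and the helper tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the RETokenizer example.
class MatcherLoopDemo {

    // Returns every substring of input that matches patternStr, in order.
    static List<String> tokens(CharSequence input, String patternStr) {
        Matcher matcher = Pattern.compile(patternStr).matcher(input);
        List<String> out = new ArrayList<>();
        // find() advances to the next match on each successful call.
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a 1 2 b c 3 4", "[a-z]")); // [a, b, c]
    }
}
```

The iterator class is still the better fit when the text between tokens is needed too, or when tokens should be produced lazily.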
Improve tokenization of information-rich strings (Java StringTokenizer tutorial)

In this article, you'll take advantage of the commonly used StringTokenizer class to perform better tokenization of complicated and information-rich strings.

StringTokenizer limitations

You can create a StringTokenizer by using any one of the following three constructors:
- StringTokenizer(String sInput): Breaks on white space (" ", "\t", "\n", "\r", "\f").
- StringTokenizer(String sInput, String sDelimiter): Breaks on sDelimiter.
- StringTokenizer(String sInput, String sDelimiter, boolean bReturnTokens): Breaks on sDelimiter, but if bReturnTokens is set to true, then the delimiter is also returned as a token.
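The three constructors listed above can be tried on a small input. The sketch below uses hypothetical names (ConstructorDemo, drain) and a made-up record "a;b;;d" with one empty field. It also illustrates one limitation: with the two-argument constructor, consecutive delimiters are skipped together, so the empty field between them is not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the inputs are illustrative only.
class ConstructorDemo {

    // Reads all remaining tokens from the tokenizer into a list.
    static List<String> drain(StringTokenizer st) {
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // One argument: the default white-space delimiters are used.
        System.out.println(drain(new StringTokenizer("a b\tc\r\nd")));    // [a, b, c, d]
        // Two arguments: ';' separates tokens; the empty field is skipped.
        System.out.println(drain(new StringTokenizer("a;b;;d", ";")));       // [a, b, d]
        // Three arguments with true: every ';' is returned as a token,
        // so the position of the empty field can be recovered by the caller.
        System.out.println(drain(new StringTokenizer("a;b;;d", ";", true))); // [a, ;, b, ;, ;, d]
    }
}
```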