Java StringTokenizer tutorials

Written by Administrator   
Friday, 16 June 2006

Java StringTokenizer Overview

The string tokenizer class allows an application to break a string into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The StringTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.

The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
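As a minimal sketch of per-token delimiters, the example below reads a key up to "=" and then switches to a different delimiter set for the remaining tokens by calling nextToken(String). The class name PerTokenDelimiters, the parse helper, and the input string are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original tutorial.
public class PerTokenDelimiters {

    // Reads the first token using "=" as the delimiter, then reads the
    // remaining tokens using the delimiter set "=,".
    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, "=");
        tokens.add(st.nextToken()); // the key, read up to '='

        while (st.hasMoreTokens()) {
            // After the first token the position still sits on the '='
            // character, so '=' stays in the new set to have it skipped.
            tokens.add(st.nextToken("=,"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [colors, red, green, blue]
        System.out.println(parse("colors=red,green,blue"));
    }
}
```

Note that the delimiter set passed to nextToken(String) stays in effect for later calls, so it does not have to be repeated unless it should change again.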

An instance of StringTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag having the value true or false:

  • If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
  • If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character, or a maximal sequence of consecutive characters that are not delimiters.
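The two behaviours can be compared side by side. The following sketch assumes a small helper, tokens, that collects every token into a list; the class name ReturnDelimsDemo and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original tutorial.
public class ReturnDelimsDemo {

    // Collects all tokens produced for the given input, delimiter set,
    // and returnDelims flag.
    static List<String> tokens(String input, String delims, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Flag false: delimiters only separate tokens.
        System.out.println(tokens("a+b-c", "+-", false)); // prints [a, b, c]

        // Flag true: each delimiter character is returned as its own token.
        System.out.println(tokens("a+b-c", "+-", true));  // prints [a, +, b, -, c]
    }
}
```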

The following is an example of the use of the tokenizer. The code:

     StringTokenizer st = new StringTokenizer("Hello java");
     while (st.hasMoreTokens()) {
         System.out.println(st.nextToken());
     }
 prints the following output:

     Hello
     java

The following example illustrates how the String.split method can be used to break up a string into its basic tokens:

     String[] result = "this is a test".split("\\s");
     for (int x=0; x<result.length; x++)
         System.out.println(result[x]);
 prints the following output:

     this
     is
     a
     test

java.io.StreamTokenizer is another tokenizing class. It has a more complicated API and has more powerful features than StringTokenizer.
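For comparison, here is a rough sketch of StreamTokenizer with its default settings, which tell words, numbers, and quoted strings apart and skip a comment that starts with "/". The class name StreamTokenizerDemo, the describe helper, and the "word:" / "number:" / "quoted:" labels are hypothetical and used only to make the token types visible.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original tutorial.
public class StreamTokenizerDemo {

    // Labels each token with the type that StreamTokenizer reports for it.
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("word:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("number:" + st.nval); // nval is a double
                        break;
                    case '"':
                        out.add("quoted:" + st.sval); // text between the quotes
                        break;
                    default:
                        out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [word:total, number:42.0, quoted:hello world]
        System.out.println(describe("total 42 \"hello world\" // note"));
    }
}
```

A StringTokenizer given the same input would simply return the whitespace-separated pieces, including the quote characters and the comment text.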

Parsing a String into Tokens Using a Regular Expression (Java StringTokenizer tutorial)


This example implements a tokenizer that uses regular expressions. Its use is similar to that of the StringTokenizer class: you use it like an iterator to extract the tokens.
    import java.util.Iterator;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
   
    CharSequence inputStr = "a 1 2 b c 3 4";
    String patternStr = "[a-z]";
   
    // Set to false if only the tokens that match the pattern are to be returned.
    // If true, the text between matching tokens is also returned.
    boolean returnDelims = true;
   
    // Create the tokenizer
    Iterator tokenizer = new RETokenizer(inputStr, patternStr, returnDelims);
   
    // Get the tokens (and delimiters)
    while (tokenizer.hasNext()) {
        String tokenOrDelim = (String)tokenizer.next();
    }
    // "", "a", " 1 2 ", "b", " ", "c", " 3 4"
   
    class RETokenizer implements Iterator {
        // Holds the original input to search for tokens
        private CharSequence input;
   
        // Used to find tokens
        private Matcher matcher;
   
        // If true, the text between tokens is also returned
        private boolean returnDelims;
   
        // The current delimiter value. If non-null, should be returned
        // at the next call to next()
        private String delim;
   
        // The current matched value. If non-null and delim=null,
        // should be returned at the next call to next()
        private String match;
   
        // The value of matcher.end() from the last successful match.
        private int lastEnd = 0;
   
        // patternStr is a regular expression pattern that identifies tokens.
        // If returnDelims is false, only those tokens that match the
        // pattern are returned. If returnDelims is true, the text between
        // matching tokens is also returned, in the following sequence:
        // delimiter, token, delimiter, token, etc. Tokens are never empty
        // as long as the pattern cannot match the empty string, but
        // delimiters might be empty (empty string).
        public RETokenizer(CharSequence input, String patternStr, boolean returnDelims) {
            // Save values
            this.input = input;
            this.returnDelims = returnDelims;
   
            // Compile pattern and prepare input
            Pattern pattern = Pattern.compile(patternStr);
            matcher = pattern.matcher(input);
        }
   
        // Returns true if there are more tokens or delimiters.
        public boolean hasNext() {
            if (matcher == null) {
                return false;
            }
            if (delim != null || match != null) {
                return true;
            }
            if (matcher.find()) {
                if (returnDelims) {
                    delim = input.subSequence(lastEnd, matcher.start()).toString();
                }
                match = matcher.group();
                lastEnd = matcher.end();
            } else {
                if (returnDelims && lastEnd < input.length()) {
                    // Return the trailing text after the last match
                    delim = input.subSequence(lastEnd, input.length()).toString();
                    lastEnd = input.length();
                }
   
                // Discard the matcher whenever the input is exhausted: after a
                // failed find(), the next find() restarts from the beginning.
                matcher = null;
            }
            return delim != null || match != null;
        }
   
        // Returns the next token (or delimiter if returnDelims is true).
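        public Object next() {
            // Advance if necessary so that next() also works without a prior
            // call to hasNext(), and fail clearly when nothing is left.
            if (!hasNext()) {
                throw new java.util.NoSuchElementException();
            }
            String result = null;
   
            if (delim != null) {
                result = delim;
                delim = null;
            } else if (match != null) {
                result = match;
                match = null;
            }
            return result;
        }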
   
        // Returns true if the call to next() will return a token rather
        // than a delimiter.
        public boolean isNextToken() {
            return delim == null && match != null;
        }
   
        // Not supported.
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

 


Improve tokenization of information-rich strings (Java StringTokenizer tutorial)

In this article, you'll take advantage of the commonly used StringTokenizer class to perform better tokenization of complicated and information-rich strings.

StringTokenizer limitations
You can create a StringTokenizer by using any one of the following three constructors:

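  • StringTokenizer(String sInput): Breaks on the default delimiters: space, tab, newline, carriage return, and form feed (" \t\n\r\f").
  • StringTokenizer(String sInput, String sDelimiter): Breaks on any character in sDelimiter. The string is treated as a set of delimiter characters, not as a single multi-character separator.
  • StringTokenizer(String sInput, String sDelimiter, boolean bReturnTokens): Breaks on the characters in sDelimiter, but if bReturnTokens is set to true, each delimiter character is also returned as a token.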
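The three constructors, and two of the limitations that follow from them, can be seen in a short sketch. The class name ConstructorDemo, the tokens helper, and the sample strings are hypothetical; String.split appears only as a point of comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original tutorial.
public class ConstructorDemo {

    // Drains a tokenizer into a list so the result is easy to inspect.
    static List<String> tokens(StringTokenizer st) {
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // One argument: breaks on the default whitespace delimiters.
        System.out.println(tokens(new StringTokenizer("one\ttwo three\nfour")));
        // prints [one, two, three, four]

        // Two arguments: "::" means "any ':' character", not the pair "::".
        System.out.println(tokens(new StringTokenizer("a::b:c", "::")));
        // prints [a, b, c]
        System.out.println(Arrays.asList("a::b:c".split("::")));
        // prints [a, b:c]

        // Adjacent delimiters never produce an empty token.
        System.out.println(tokens(new StringTokenizer("a,,b", ",")));
        // prints [a, b]

        // Three arguments with the flag set: delimiters come back as tokens.
        System.out.println(tokens(new StringTokenizer("a,,b", ",", true)));
        // prints [a, ,, ,, b]
    }
}
```

With the flag set to true, a caller can detect two delimiters in a row and treat that as an empty field, which is one way to work around the missing empty tokens.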

Last Updated ( Friday, 07 July 2006 )


 

©2006-2009 DeveloperZone.biz. All rights reserved.