Whitespace, Shmitespace

I finally have my Sirius radio installed! I already have an XM installed here, and it is aimed perfectly; I get tons of satellite signal. However, aiming the Sirius was a little problematic. The instructions say the satellite sits over the northern Midwest, so if you aim at North Dakota from wherever you are, your antenna should have a good line of sight to the satellite transmitter. For whatever reason that method is not working. I wonder if things are working a bit differently since Sirius and XM merged? I would think that I would know about any new satellites, but I can’t explain why my current aiming gets any signal at all. My guess is that all of the houses around are causing problems, but I get a good view with the XM. Radio engineering is complicated.

Most of today was spent learning how Java handles XML.  It’s like every other XML parser out there (it is basically just Xerces, I think), and at this point it is old hat.  But I still need to write my little playground tests to see just how things work. Inevitably, there is a wrinkle!

For some reason the Java (I am using Java SE 1.7) XML Text class does not correctly report when a node is only whitespace. I don’t understand the purists who insist that all of the tabs, newlines, returns, and spaces should be kept in every case. They are pretty much useless if you are just trying to grab data. I understand needing them if you want to replicate a document or something, but every parser should be able to discard those nodes during parsing, so that you don’t have to keep checking for these useless pieces when you are actually doing something with the XML.

I may be doing something wrong, but from looking at the docs I think everything is correct. In fact, I called setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory and the document still had all of the various whitespace thingies. From looking around the web it seems that this is a regression in Java SE 1.6, but I figured it would be fixed in 1.7? Guess not.
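One thing worth ruling out before blaming a regression: the javadoc for setIgnoringElementContentWhitespace says the setting relies on the content model and requires the parser to be in validating mode. Without a DTD, the parser has no way to know which whitespace is ignorable. Here is a minimal sketch of that idea, assuming a made-up maze document with an inline DTD; the class name IgnorableWhitespaceDemo, the element names, and the helper countRootChildren are all hypothetical and only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnorableWhitespaceDemo {
    // Hypothetical sample: the inline DTD says <maze> may contain only <cell> elements,
    // so whitespace between the cells is "element content whitespace".
    static final String WITH_DTD =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE maze [\n"
        + "  <!ELEMENT maze (cell*)>\n"
        + "  <!ELEMENT cell EMPTY>\n"
        + "]>\n"
        + "<maze>\n  <cell/>\n  <cell/>\n</maze>";

    // Same document body, but with no DTD to describe the content model.
    static final String WITHOUT_DTD = "<maze>\n  <cell/>\n  <cell/>\n</maze>";

    /** Parses the XML and returns how many child nodes the root element has. */
    static int countRootChildren(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // No DTD: the whitespace Text nodes stay in the tree next to the cells.
        System.out.println("without DTD: " + countRootChildren(WITHOUT_DTD, false));
        // DTD plus validating mode: only the <cell> elements should remain.
        System.out.println("with DTD:    " + countRootChildren(WITH_DTD, true));
    }
}
```

If the second count comes out smaller than the first, the setting is working and the missing piece is the DTD, not a bug in the JDK.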

So I added a method to my parser class (as many people said had to be done):

/**
* Return whether a text node is whitespace only or not
* @param tn The text node to check
* @return true if the node is whitespace only, false if it has content other than whitespace
*/
public boolean IsNodeWhitespaceOnly(Text tn)
{
    // when it starts working again, return isElementContentWhitespace
    // now, have to iterate through the content

    String value=tn.getTextContent();

    for (int i = 0; i < value.length(); ++i)
    {
        switch (value.charAt(i))
        {
        case '\n':
        case '\r':
        case '\t':
        case ' ':
            break;

        default:
            // non-whitespace found
            return false;
        }
    }

    // only whitespace found

    return true;
}

This works fine for testing Text nodes, but it is a bummer to have to incur this overhead when the flag could easily be set during the initial parse. I hate to stay focused on this much longer, but I have to rule out my Java ignorance. I need to make myself believe that Oracle is lazy and didn’t fix the bug in 1.7. Maybe I can make a tape repeating that while I sleep.
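Since the overhead comes from re-checking nodes every time the tree is walked, one option is to prune the whitespace-only Text nodes once, right after parsing. This is a rough sketch, assuming nothing downstream needs the original formatting; WhitespaceStripper and its method names are made up for illustration and are not part of my parser class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    /** Recursively removes Text nodes that contain nothing but whitespace. */
    static void stripWhitespaceNodes(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Grab the sibling first, because removing the child detaches it.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                // trim() drops every char <= U+0020, which covers space, tab, CR and LF.
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    /** Helper for the demo: parses XML held in a String. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not the real maze format.
        Document doc = parse("<maze>\n  <cell> a </cell>\n  <cell/>\n</maze>");
        stripWhitespaceNodes(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Text that mixes whitespace with real content, like the " a " inside the first cell, is left alone; only the nodes that are entirely whitespace get removed. After one pass, the rest of the code can walk the tree without testing each Text node.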

Anyway, with the basics of XML figured out, next is writing the parser that stores the maze XML data in Maze class data structures.  Things are heating up!
