It doesn't allocate or buffer, so when it says that it verifies proper nesting, I assume that doesn't mean verifying that start tags and end tags have matching names. I think that would be impossible.
I've been looking for a decent C parser similar to this to try some experiments with. Dealing with XML can be dreary, but I like to try to think of how you might make an API that doesn't make your code suck. It would be at a higher level, so I might be able to use this.
It looks like you pass the parse function a character at a time. To be fast, you'd want to somehow encourage that to get inlined. I'm not sure how easy or possible it is when you put the function in a separate .c file. There would probably be ways to stuff everything into the header.
I'm interested particularly in parsers that don't buffer. It seems like a fundamentally more flexible and potentially better way to parse XML is by memory mapping and letting the VM do what it does best.
> It doesn't allocate or buffer, so when it says that it verifies proper nesting, I assume that doesn't mean verifying that start tags and end tags have matching names. I think that would be impossible.
It does match start tags and end tags, and in order to do so it does, indeed, need a small buffer. However, the only thing that needs to be stored in this buffer is a stack of element names, and that rarely needs much space - a simple fixed-size buffer is sufficient. It can be safely allocated on the stack if the application wants to. See the documentation for the yxml_init() for more details. The "mostly-bufferless" part of the homepage refers to the lack of a complicated input buffer and the avoidance of complex buffer management for output where possible.
> It looks like you pass the parse function a character at a time. To be fast, you'd want to somehow encourage that to get inlined.
The function is quite large. You can probably get it inlined when put inside a header file and when it's only called in a single place, and you may indeed see a performance boost. I haven't measured how significant such an improvement may be.
However, if performance is the primary goal (it wasn't for yxml), a completely different design would probably be better. Yxml needs to manage an explicit state object for each input character. For each new character it has to look at the state to know how it's handled, handle it, and then update the state for the next character. I was honestly surprised to see that yxml performed quite well in my benchmarks.
A more efficient design is probably the goto-based state machine approach that, e.g. ragel[1] can output. The reason I didn't go for that approach is that it complicates buffering. You either need to have the full file mapped into memory (not possible when reading from a socket or from a compressed stream), or you need to implement your own buffering and provide hooks so that the application can fill the buffer when it is depleted. The latter approach is used by most other XML parsers.