In such cases, working with Python/BeautifulSoup4 and importing the clean and normalized data into R will save frustration over time, even offsetting the overhead of using two languages.
But isn't rvest a BeautifulSoup-inspired library that works pretty much the same way?
<jobs-list> <job> <employer> YCombinator </employer> <position> ... </position> </job> </jobs-list>
Something like that?
I know what everyone will say: it is so terse and convoluted. But maybe something like
<ul semantic-markup="jobs"> <li semantic-markup="job"> <p semantic-markup="job-employer"> YCombinator </p> <p semantic-markup="job-position"> ... </p> </li> </ul>
Seems like a lot of work, though... maybe I take that back.
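For what it's worth, here is a rough sketch of why markup like either of those would make scraping almost trivial. It uses only Python's standard library and assumes the page is well-formed enough to parse as XML, which real-world HTML often is not (that is where BeautifulSoup or rvest earn their keep). The function names and the CSV step are just illustrative, not anyone's actual pipeline; the position text stays as the literal "..." from the snippets.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The two snippets from the comment, copied as-is ("..." is left elided).
CUSTOM_TAGS = (
    "<jobs-list> <job> <employer> YCombinator </employer> "
    "<position> ... </position> </job> </jobs-list>"
)
ATTRIBUTES = (
    '<ul semantic-markup="jobs"> <li semantic-markup="job"> '
    '<p semantic-markup="job-employer"> YCombinator </p> '
    '<p semantic-markup="job-position"> ... </p> </li> </ul>'
)


def jobs_from_custom_tags(markup):
    """Read jobs from the custom-element variant (<job>, <employer>, ...)."""
    root = ET.fromstring(markup)
    return [
        {
            "employer": job.findtext("employer", "").strip(),
            "position": job.findtext("position", "").strip(),
        }
        for job in root.iter("job")
    ]


def _text_by_role(node, role):
    """Return the stripped text of the first descendant tagged with `role`."""
    for el in node.iter():
        if el.get("semantic-markup") == role:
            return (el.text or "").strip()
    return ""


def jobs_from_attributes(markup):
    """Read jobs from the attribute variant (semantic-markup="job", ...)."""
    root = ET.fromstring(markup)
    return [
        {
            "employer": _text_by_role(li, "job-employer"),
            "position": _text_by_role(li, "job-position"),
        }
        for li in root.iter("li")
        if li.get("semantic-markup") == "job"
    ]


def to_csv(rows):
    """Serialize rows to CSV text that R could load with read.csv()."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["employer", "position"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(jobs_from_custom_tags(CUSTOM_TAGS)))
    print(to_csv(jobs_from_attributes(ATTRIBUTES)))
```

Both variants yield the same rows, so the choice between custom tags and attributes is mostly about what is easier for page authors, not for whoever is scraping.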
Downvoted for calling people newbs. Also, it always depends on which tool works best.
Well done to Hadley Wickham for being inspired by libraries like Beautiful Soup and bringing a great tool to R.