Did you download the data from here? http://www.google.com/googlebooks/uspto-patents.html
If so, how much clearer do you need it to be?
> Patent Application Publication Full Text (2001 – present)
It covers applications that were published from 2001 to the present. Before 2001, applications were not published, which is why it begins in 2001.
The dataset does also have applications from prior to 2000; it just has (relatively) fewer of them.
> No guarantees are made with respect to the completeness or accuracy of this data.
And this:
> As of 2012-05-26, we have data for 1946194 patent applications, including most of the published applications in the following ranges:
... long list of application serial numbers omitted ...
The serial numbers in the list map directly to dates, because serial numbers are assigned sequentially as the applications get filed. For whatever reason, they just don't provide the serial-number-to-date mapping.
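If you want rough dates anyway, sequential assignment means any serial number whose filing date you already know puts a bound on its neighbors. Here is a minimal sketch of that idea in Python. The `ANCHORS` values and the `bracket_filing_date` helper are made up for illustration (they are not real USPTO data or part of the dataset), and it assumes every serial comes from the same numbering series:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical anchors: (serial number, filing date) pairs you looked up yourself.
# Placeholder values for illustration only, NOT real USPTO data.
ANCHORS = [
    (1000, date(2001, 1, 2)),
    (2000, date(2001, 6, 1)),
    (3000, date(2002, 1, 2)),
]


def bracket_filing_date(serial, anchors=ANCHORS):
    """Return (earliest, latest) filing dates implied by anchors sorted by serial.

    A bound is None when the serial falls outside the anchor range.
    Relies on the assumption that serials are assigned in filing order.
    """
    serials = [s for s, _ in anchors]
    # Index of the first anchor whose serial is strictly greater than `serial`.
    i = bisect_right(serials, serial)
    lower = anchors[i - 1][1] if i > 0 else None
    upper = anchors[i][1] if i < len(anchors) else None
    # An exact anchor hit means the date is known, not just bounded.
    if i > 0 and serials[i - 1] == serial:
        return lower, lower
    return lower, upper


if __name__ == "__main__":
    print(bracket_filing_date(1500))
```

The more anchors you collect, the tighter the bracket gets. It only ever gives you a window, not an exact date, unless the serial happens to be one of your anchors.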
Further, it is data retrieved by a crawler. Note the first sentence:
> Google has begun crawling patent documents, including image file wrappers, from the USPTO's public PAIR (Patent Application Information Retrieval) site.
So, unsurprisingly, it has only what it has crawled and retrieved. It is not comprehensive, and it makes no claim to be.