Managing Gigabytes (Witten)
Information Retrieval (Manning)
Text Compression (Bell)
Natural Language Processing (Manning)
Natural Language Understanding (Allen)
Speech and Language Processing (Jurafsky)
The Text Mining Handbook (Sanger)
Statistical Machine Translation (Koehn)
Data-Intensive Text Processing with MapReduce (Lin)
Algorithms on Strings (Gusfield)
Jewels of Stringology (Crochemore)
Regular Expressions (Friedl),
also: http://swtch.com/~rsc/regexp/regexp1.html
and automata theory (Hopcroft)
Practical Text Mining with Perl (Bilisoly)
Natural Language Processing with Python (Bird)
Computational Linguistics (Hausser)
Syntactic Structures (Chomsky)
Also check out these links:
http://measuringmeasures.blogspot.com/2010/01/learning-about...
http://measuringmeasures.com/blog/2010/3/12/learning-about-m...
http://www.cs.technion.ac.il/~gabr/resources/resources.html