I have over 50 unreleased patches. There are some bugfixes, including a compiler one, involving dynamically scoped variables used as optional parameters:
(defvar v)
(defun f (: (v v)))
(call (compile 'f)) ;; blows up in virtual machine with "frame level mismatch"
Patch for that:

diff --git a/share/txr/stdlib/compiler.tl b/share/txr/stdlib/compiler.tl
index e76849db..ccdbee83 100644
--- a/share/txr/stdlib/compiler.tl
+++ b/share/txr/stdlib/compiler.tl
@@ -868,7 +868,7 @@
,*(whenlet ((spec-sub [find have-sym specials : cdr]))
(set specials [remq have-sym specials cdr])
^((bindv ,have-bind.loc ,me.(get-dreg (car spec-sub))))))))))
- (benv (if specials (new env up nenv co me) nenv))
+ (benv (if need-dframe (new env up nenv co me) nenv))
(btreg me.(alloc-treg))
(bfrag me.(comp-progn btreg benv body))
(boreg (if env.(out-of-scope bfrag.oreg) btreg bfrag.oreg))
There is now support in the printer for limiting the depth and length.
I added a derived hook into the OOP system: a struct being notified that it is being inherited from.
That section made me chuckle. Admirable if true.
DSLs, OTOH, are in short supply. While awk or plain sed are great for shell programming, this is the only (open source) DSL I'm aware of targeting certain types of NLP-esque "munging". This space is mostly full of statistical approaches, which, while conceptually pure, don't allow the kind of flexibility that would be useful in many applications.
I wonder if, eventually, the DSL portion of TXR could be sheared off (possibly via metacircular evaluation of the TXR Lisp?) into something that's portable across Lisps, or at least to semi-standardized Scheme implementations?
Mostly true for very high-level languages like Lisp/Scheme, or ML/OCaml/F#/Haskell, when set against not-so-high-level languages like C, C++, or Java.
Against Racket, I wouldn't be so sure. Nor against Ruby.
Python and JavaScript are high-level languages, but they are crippled by some bad design decisions.
"Customized sort based on multiple columns of CSV". In R, something like this: `library(tidyverse); read_delim("file.tsv", delim = "@") %>% arrange(.[[2]]) %>% group_by(.[[2]]) %>% arrange(match(.[[3]], c("arch.", "var.", "ver.", "anci.", "fam.")), .[[3]]) %>% group_by(.[[2]], .[[3]]) %>% mutate(n = n()) %>% arrange(desc(n)) %>% ungroup() %>% select(1:4)`
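For comparison, the same multi-column sort can be sketched in Python. The sample rows and column meanings here are made up, and the grouped-arrange pipeline of the R version is only approximated by a single composite sort key:

```python
from collections import Counter

# Hypothetical stand-in rows for the "@"-delimited file in the R example;
# column 2 is the grouping key, column 3 the category.
rows = [
    ("a", "g1", "ver."),
    ("b", "g1", "arch."),
    ("c", "g2", "arch."),
    ("d", "g1", "arch."),
]

# Fixed category order, taken from the R example above.
ORDER = {name: i for i, name in enumerate(["arch.", "var.", "ver.", "anci.", "fam."])}

# Group sizes per (column 2, column 3) pair, mirroring the mutate(n = n()) step.
counts = Counter((r[1], r[2]) for r in rows)

# Sort by column 2, then by the fixed category order of column 3,
# then by descending group size.
rows.sort(key=lambda r: (r[1], ORDER.get(r[2], len(ORDER)), -counts[(r[1], r[2])]))
print(rows)
```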
"Extract text from HTML table". In R, something like this would suffice: `library(rvest); library(tidyverse); read_html(URL_GOES_HERE) %>% html_nodes("div.scoreTableArea") %>% html_table() %>% write_delim("out.csv", delim = "\t")`
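For the same table-extraction task without rvest, here is a minimal stdlib-only Python sketch; it handles only flat tables (no nesting, no attributes), which is far less than what rvest's html_table does:

```python
from html.parser import HTMLParser

# Collect the text of each td/th cell, one list of cells per tr row.
class TableText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self.row:
            self.rows.append(self.row)
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.row[-1] += data.strip()

p = TableText()
p.feed("<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>")
print(p.rows)
```

Writing the rows out tab-separated, as the R write_delim call does, is then one "\t".join per row.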
"Get n-th Field of Each Create Referring to Another File". In R: `library(tidyverse); file1 = read_delim("file1.txt", delim = " ", col_names = FALSE); chunks = readChar("file2.txt", 999999) %>% str_split(";") %>% unlist() %>% map(function(x) { matches = str_match(str_trim(x), '^create table "(.*)"([^(]*)\\(((.|\n)*)\\)$'); title = matches[, 2]; fields = matches[, 4] %>% str_split(",") %>% unlist() %>% str_trim(); return(tibble(table_name = rep(title, length(fields)), n = 1:length(fields), field = fields)) }) %>% bind_rows(); file1 %>% left_join(chunks, by = c("X1" = "table_name", "X2" = "n"))`
The third example trades off a little clarity for a little robustness by adding a regex instead of assuming the SQL table definition is one field per line.
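The same split-and-regex chunking can be sketched in Python; the SQL text below is made-up sample data shaped like the statements in question:

```python
import re

# Hypothetical SQL dump: split on ";", then pull the table name and the
# comma-separated field list out of each "create table" statement.
sql = '''create table "def".something (
  f01 char (10),
  f02 char (10)
);
create table "abc".something (
  x01 char (10)
);'''

chunks = []
for stmt in sql.split(";"):
    m = re.match(r'create table "(.+?)"([^(]*)\(((?:.|\n)*)\)\s*$', stmt.strip())
    if m:
        fields = [f.strip() for f in m.group(3).split(",")]
        chunks += [(m.group(1), n + 1, f) for n, f in enumerate(fields)]

print(chunks)
```

Joining these (table_name, n, field) triples against the first file is then an ordinary dict lookup, standing in for the left_join step.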
TXR Lisp has support for that type of functional transformation of structured data, with fairly tidy syntax. If a need for a full blown HTML parsing library arises, someone will come up with one; maybe me. It could end up integrated into the TXR flex/Yacc parser, which would make it fast.
In the "Get n-th Field" task, what we can do is snarf the data as a string, then remove all the commas and semicolons. It then parses as TXR Lisp with the lisp-parse function, resulting in this:
(create table (qref "def" something)
(f01 char (10) f02 char (10) f03 char (10) f04 date)
create table (qref "abc" something)
(x01 char (10) x02 char (1) x03 char (10))
create table (qref "ghi" something)
(z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05 char (10)))
That seems to open an avenue to a solution. E.g. we can now partition it into pieces that start with the create symbol:

28> (partition *26 (op where (op eq 'create)))
((create table (qref "def" something) (f01 char (10) f02 char (10) f03 char (10) f04 date))
(create table (qref "abc" something) (x01 char (10) x02 char (1) x03 char (10)))
(create table (qref "ghi" something) (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05
char (10))))
Now the (qref "def" something) parts are in fixed positions, followed by fixed-shape triplets.
The only problem with this type of solution is that it takes the example data too literally. The user's actual data might not cleanly parse this way.
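In Python terms, the partition-at-create step and the fixed-position extraction look roughly like this; strings, tuples, and lists stand in for the Lisp symbols and nested forms, and the token data is a hand-transcribed fragment of the parse above:

```python
# Stand-ins for the symbols of the parsed form above.
toks = ["create", "table", ("qref", "def", "something"),
        ["f01", "char", 10, "f02", "char", 10],
        "create", "table", ("qref", "abc", "something"),
        ["x01", "char", 10]]

# Partition into pieces that start with "create".
parts, cur = [], []
for t in toks:
    if t == "create" and cur:
        parts.append(cur)
        cur = []
    cur.append(t)
parts.append(cur)

# Each piece now has a fixed shape: the table name sits inside the qref
# form at [2][1], and the first field name at [3][0].
names = [(p[2][1], p[3][0]) for p in parts]
print(names)
```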
If you just put two spaces of indentation on every line, you get a verbatim block in typewriter font,
  like
  this.
"Good luck, you're on your own!"
The HTML version that most people would be using has a TOC with two-way navigation to the section headings and is hyperlinked. Of course, man page reading allows easy searching.
Another edit preserving more of the original would be to replace the final "with no" with something like "even excluding any"...
Basic TXR matching is really quite simple. Match some patterns, generate a report at the end. The patterns are interleaved with the text being matched, so it's more like a more powerful (but far more readable) version of regexes than a normal programming language.
You can learn it quickly based on the provided examples.
It's just a few straightforward commands, although you have to wrap your mind around how the backtracking parser works.
Most of the manual is about the Lisp. I never used that part, and I don't think it's really needed for 95+% of all text parsing/summarizing tasks.
Edit: 10 years ago in this case.
Most transformations that we do on data do not require Turing completeness or recursion. I think it would be useful to write these down in a language with semantics that are easy to analyze.
I don't see why we would want to rule out a pattern function invoking itself (directly, or through intermediaries); if that hurts, then just don't do that.
(Though I understand that there are languages deliberately designed without unbounded loops or recursion, for justifiable reasons.)
"It's statically-typed and type-infered.
It also infers memory consumption and guarantees O(n) memory use.
It is designed for concise one-liner computations right in the shell prompt.
It features both a mathematics library and a set of data slicing and aggregation primitives.
It is faster than all other interpreted languages with a similar scope. (Perl, Python, awk, ...)
It is not Turing-complete. (But can compute virtually anything nonetheless.)
It is self-contained: distributed as a single statically linked binary and nothing else.
It has no platform dependencies."
I am a little suspicious that you may be the author ;)
(PERL = Perversion Excused by Random Lispiness)
1. Parsimony.
2. Performance vs awk and friends.
3. Multi threading.
4. Ideal use cases.
For these things TXR is great.
If you want to do multithreading or need the best performance, it's probably not the thing to use.
I tend to use Notepad++ when starting out on a data-wrangling adventure. It has an uncanny ability, unlike any other editor, to open hundreds of files at the same time and to perform regex operations on all of them without dropping dead. I use Notepad++ for the initial manual exploration to get the lay of the problem, and then switch to R for the actual analysis.
I assume, then, that your file sizes are not so big. N++ is not good with big (>25% of your RAM) files, refusing to open them.
Is R/tidyverse also limited in the size of file it can handle? In my job I routinely work with up to 100GB files.
https://i.imgur.com/pvCnmSa.png
I can accept that doing something non-standard leads to some rough edges like this, but I'm not sure how many web developers know this is an issue. At least it has surprised me how many websites have this problem of assuming the default color is bright white.