Nice project! Not a quarantine project, but we've been building data frame abstractions in Python for genetics [1] [2]. We spent a lot of time studying the existing abstractions (pandas, R/dplyr, PySpark, etc.). Designing a data frame in Python is an interesting and challenging problem. Our design is far from perfect, but I think we've found an interesting design point. Here's your example in Hail:
>>> import hail as hl
>>> vehicles = hl.import_table('vehicles.csv', impute=True, delimiter=',', quote='"')
>>> t = vehicles.filter(vehicles.make == "Saab")
>>> t = t.order_by(t.year)
>>> t.show(3)
+-------+--------+-------+-------+----------------+-------------------+---------------------+-------+----------+-----------+-------+-------+
| id | make | model | year | class | trans | drive | cyl | displ | fuel | hwy | cty |
+-------+--------+-------+-------+----------------+-------------------+---------------------+-------+----------+-----------+-------+-------+
| int32 | str | str | int32 | str | str | str | int32 | float64 | str | int32 | int32 |
+-------+--------+-------+-------+----------------+-------------------+---------------------+-------+----------+-----------+-------+-------+
| 380 | "Saab" | "900" | 1985 | "Compact Cars" | "Automatic 3-spd" | "Front-Wheel Drive" | 4 | 2.00e+00 | "Regular" | 19 | 16 |
| 381 | "Saab" | "900" | 1985 | "Compact Cars" | "Automatic 3-spd" | "Front-Wheel Drive" | 4 | 2.00e+00 | "Regular" | 21 | 16 |
| 382 | "Saab" | "900" | 1985 | "Compact Cars" | "Manual 5-spd" | "Front-Wheel Drive" | 4 | 2.00e+00 | "Regular" | 23 | 17 |
+-------+--------+-------+-------+----------------+-------------------+---------------------+-------+----------+-----------+-------+-------+
showing top 3 rows
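Hail's tables are functional (immutable): operations like `filter` and `order_by` return new tables rather than modifying their input. That means it would be an error to use `vehicles.year` in the `order_by`, since the input table (`t`) and the sort expression (which refers to `vehicles`) come from different tables. Unfortunately, this means you can't use `.` chaining: each step's expressions have to refer to the table produced by the previous step, so that table needs a name.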
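To make the rule concrete without needing Hail installed, here's a minimal pure-Python toy. The `Table` and `Expr` classes are hypothetical and are not Hail's implementation: every field expression remembers which table it came from, and `filter`/`order_by` reject expressions built from any other table.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Expr:
    """Toy expression bound to the table it was built from (not Hail's API)."""

    def __init__(self, source: "Table", fn: Callable[[Row], Any]):
        self.source = source  # the table this expression refers to
        self.fn = fn          # evaluates the expression on one row

    def __eq__(self, other: Any) -> "Expr":
        # Build a new boolean expression instead of comparing eagerly.
        return Expr(self.source, lambda row: self.fn(row) == other)


class Table:
    """Toy immutable table: every operation returns a new Table."""

    def __init__(self, rows: List[Row]):
        self._rows = tuple(dict(r) for r in rows)

    def __getattr__(self, name: str) -> Expr:
        # Only called for attributes that don't exist, i.e. field references.
        if name.startswith("_"):
            raise AttributeError(name)
        return Expr(self, lambda row: row[name])

    def _check(self, expr: Expr) -> None:
        if expr.source is not self:
            raise ValueError("expression refers to a different table")

    def filter(self, cond: Expr) -> "Table":
        self._check(cond)
        return Table([r for r in self._rows if cond.fn(r)])

    def order_by(self, key: Expr) -> "Table":
        self._check(key)
        return Table(sorted(self._rows, key=key.fn))

    def collect(self) -> List[Row]:
        return [dict(r) for r in self._rows]


vehicles = Table([
    {"make": "Saab", "year": 1986},
    {"make": "Ford", "year": 1984},
    {"make": "Saab", "year": 1985},
])

# Works: each step's expressions refer to the table that step receives.
t = vehicles.filter(vehicles.make == "Saab")
t = t.order_by(t.year)
print([r["year"] for r in t.collect()])  # [1985, 1986]

# Fails: vehicles.year is bound to vehicles, not to the filtered table.
try:
    vehicles.filter(vehicles.make == "Saab").order_by(vehicles.year)
    chained_ok = True
except ValueError as e:
    chained_ok = False
    print("error:", e)
```

Binding each expression to a specific table is what lets a library catch that kind of mixup immediately instead of silently sorting by a column from some other table.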
A little more background on the project: Hail's raison d'être is a 3-dimensional generalization of data frames that we use for genetic data, called a MatrixTable [3]. Conceptually, it is a matrix of dicts rather than a list of dicts.
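A rough sketch of the matrix-of-dicts idea follows. It uses a hypothetical toy class (`ToyMatrixTable`, with made-up field names like `locus`, `s`, and `GT`), not Hail's actual MatrixTable. Row fields describe each variant, column fields describe each sample, and every cell carries its own dict of entry fields. Flattening it gives the list-of-dicts form, which is roughly what Hail's `MatrixTable.entries()` returns as a Table.

```python
from typing import Any, Callable, Dict, List

Fields = Dict[str, Any]


class ToyMatrixTable:
    """Hypothetical toy, not Hail's MatrixTable: a matrix of dicts."""

    def __init__(self, rows: List[Fields], cols: List[Fields],
                 entries: List[List[Fields]]):
        # rows[i]:       fields describing row i (e.g. a variant)
        # cols[j]:       fields describing column j (e.g. a sample)
        # entries[i][j]: fields for the cell at row i, column j (e.g. a genotype)
        if len(entries) != len(rows) or any(len(e) != len(cols) for e in entries):
            raise ValueError("entries must be len(rows) x len(cols)")
        self.rows, self.cols, self.entries = rows, cols, entries

    def entries_table(self) -> List[Fields]:
        # Flatten to a list of dicts, one per cell: the long-format
        # data frame you'd need without the matrix structure.
        return [
            {**self.rows[i], **self.cols[j], **self.entries[i][j]}
            for i in range(len(self.rows))
            for j in range(len(self.cols))
        ]

    def row_agg(self, fn: Callable[[List[Fields]], Any]) -> List[Any]:
        # Aggregate each row's entries across all columns.
        return [fn(row_entries) for row_entries in self.entries]


mt = ToyMatrixTable(
    rows=[{"locus": "1:100"}, {"locus": "1:200"}],
    cols=[{"s": "NA1"}, {"s": "NA2"}, {"s": "NA3"}],
    entries=[
        [{"GT": "0/1"}, {"GT": "0/0"}, {"GT": "1/1"}],
        [{"GT": "0/0"}, {"GT": "0/1"}, {"GT": "0/0"}],
    ],
)

# Count non-reference genotypes per variant (per row, across samples).
n_non_ref = mt.row_agg(lambda es: sum(e["GT"] != "0/0" for e in es))
print(n_non_ref)                # [2, 1]
print(len(mt.entries_table()))  # 6
```

The flattened form repeats every row and column field once per cell, which is one reason keeping the matrix structure around is attractive at genetic-data scale.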
Genetic data is massive, so all of this is lazy and works on out-of-core data. The Python front end constructs an intermediate representation (IR) of the query, which is fed through a query optimizer (written in Scala) and then executed by a backend. We're working on multiple backends, but our primary backend right now is Spark.
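The lazy pipeline can be sketched in a few lines of pure Python. This is a toy built on simplifying assumptions: a two-node IR, a single filter-fusion rewrite, and a generator-based "backend". It is not Hail's IR or its Scala optimizer. Calling `filter` only builds IR nodes, and nothing touches the data until `collect` optimizes the plan and streams rows through it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Tuple, Union

Row = Dict[str, Any]


# --- IR nodes (hypothetical toy IR, not Hail's) ---
@dataclass(frozen=True)
class Source:
    rows: Tuple[Row, ...]  # stands in for out-of-core input such as a file


@dataclass(frozen=True)
class Filter:
    child: "Node"
    pred: Callable[[Row], bool]


Node = Union[Source, Filter]


# --- Front end: methods only build IR; nothing runs yet ---
class LazyTable:
    def __init__(self, ir: Node):
        self.ir = ir

    def filter(self, pred: Callable[[Row], bool]) -> "LazyTable":
        return LazyTable(Filter(self.ir, pred))

    def collect(self) -> List[Row]:
        # Only here is the plan optimized and handed to the backend.
        return list(execute(optimize(self.ir)))


# --- Optimizer: fuse adjacent filters into a single pass ---
def optimize(node: Node) -> Node:
    if isinstance(node, Filter):
        child = optimize(node.child)
        if isinstance(child, Filter):
            inner, outer = child.pred, node.pred
            return Filter(child.child, lambda r: inner(r) and outer(r))
        return Filter(child, node.pred)
    return node


# --- Backend: streams rows one at a time instead of materializing ---
def execute(node: Node) -> Iterator[Row]:
    if isinstance(node, Source):
        yield from node.rows
    elif isinstance(node, Filter):
        for row in execute(node.child):
            if node.pred(row):
                yield row
    else:
        raise TypeError(f"unknown IR node: {node!r}")


seen: List[Row] = []


def is_saab(row: Row) -> bool:
    seen.append(row)  # record every row the predicate actually touches
    return row["make"] == "Saab"


vehicles = LazyTable(Source((
    {"make": "Saab", "year": 1985},
    {"make": "Ford", "year": 1985},
    {"make": "Saab", "year": 1990},
)))
query = vehicles.filter(is_saab).filter(lambda r: r["year"] < 1990)
seen_before = len(seen)    # 0: building the query evaluated nothing
plan = optimize(query.ir)  # a single Filter directly over the Source
result = query.collect()
print(seen_before, result)  # 0 [{'make': 'Saab', 'year': 1985}]
```

Because the backend pulls rows through generators one at a time, the same plan would still work if `Source` read a file in chunks instead of holding a tuple. That is the out-of-core property in miniature.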
[1] https://hail.is/docs/0.2/index.html
[2] https://hail.is/docs/0.2/hail.Table.html
[3] https://hail.is/docs/0.2/hail.MatrixTable.html