The data follows the relational model, but just updating one field after dumping it into PostgreSQL takes quite some time (doing an UPDATE on a join), and I'm not sure this is the most effective tool/way for this kind of work. The only queries that will be run are updates and inserts appending new data to existing tables (e.g. older files).
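For context, the update-on-join pattern I mean looks roughly like this. A minimal, self-contained sketch using SQLite purely so it runs standalone (table and column names `records`, `staging`, `id`, `price` are invented for illustration; in PostgreSQL the same update is usually written as `UPDATE records SET price = s.price FROM staging s WHERE records.id = s.id`):

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the pattern is the same.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, price REAL)")
cur.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, price REAL)")
cur.executemany("INSERT INTO records VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
cur.executemany("INSERT INTO staging VALUES (?, ?)",
                [(1, 11.0), (3, 33.0)])

# Update-on-join via a correlated subquery (works on any SQLite version);
# only rows that have a match in the staging table are touched.
cur.execute("""
    UPDATE records
    SET price = (SELECT s.price FROM staging s WHERE s.id = records.id)
    WHERE id IN (SELECT id FROM staging)
""")
conn.commit()

rows = cur.execute("SELECT id, price FROM records ORDER BY id").fetchall()
print(rows)  # -> [(1, 11.0), (2, 20.0), (3, 33.0)]
```

On tables of this size the cost is usually dominated by rewriting rows and index maintenance, which is why a single one-field update can take so long.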
Do you have any suggestions on what to look into for a workload like this?
The workflow is pretty simple: mass-update existing records or append new records once in a while (this will be done in a controlled fashion, invoked by someone - no services) -> do some ETL work -> rebuild existing tables from the source after doing some join magic -> export the tables for delivery to customers.
Tables have around 10 to 100 million rows and 10 to 300 fields (I will probably normalize a bit). The data is relational, and the current size is ~400 GB, which I expect to grow by around 3% each time. There won't be many reads, as the tables will be exported for further processing, and the writes will happen in a controlled fashion as stated above.
I'm looking at MySQL and PostgreSQL - which one would you choose? Or something else?
Here is my ordered list:

6.0001 Introduction to Computer Science and Programming in Python
6.0002 Introduction to Computational Thinking and Data Science
6.S096 Introduction to C and C++
6.001 Structure and Interpretation of Computer Programs
6.005 Software Construction
6.042J Mathematics for Computer Science
6.006 Introduction to Algorithms (Fall 2011)
6.046J Design and Analysis of Algorithms
---
6.824 Distributed Computer Systems Engineering
6.828 Operating System Engineering
Obviously it is a bit light on mathematics, and I will probably add more courses (or not), so the list might change.
Any comments/remarks on this list?
Currently I know the basics of C, Python, Java, and SQL, but I feel like I'm missing some more formal education on these subjects.