
I would've said "Pandas with Parquet files". If you're hiring a data scientist, it's implied that you want some sort of aggregate or summary statistics, which is exactly what Pandas is good at, while awk + shell scripts would require a lot of clumsy number munging. And Parquet is often an order of magnitude more storage-efficient than CSV (the exact ratio depends on the data and compression), and because it's columnar, queries that touch only a few columns run very quickly.
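To make that concrete, here's a rough sketch of the kind of summary I mean. The column names (`region`, `amount`) and the function names are made up for illustration, and `read_parquet` assumes you have a Parquet engine such as pyarrow or fastparquet installed.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-region count, mean, and total of the (hypothetical) 'amount' column."""
    return (
        df.groupby("region")["amount"]
        .agg(["count", "mean", "sum"])
        .sort_index()
    )


def summarize_parquet(path: str) -> pd.DataFrame:
    """Load only the needed columns from a Parquet file, then summarize."""
    # Parquet is columnar, so asking for two columns avoids reading the rest.
    # Needs a Parquet engine (pyarrow or fastparquet) to be installed.
    df = pd.read_parquet(path, columns=["region", "amount"])
    return summarize(df)


if __name__ == "__main__":
    # Small in-memory sample so the sketch runs without any file on disk.
    sample = pd.DataFrame(
        {
            "region": ["east", "west", "east", "west"],
            "amount": [10.0, 4.0, 20.0, 6.0],
        }
    )
    print(summarize(sample))
```

Doing the same grouped count/mean/sum in awk is possible, but you end up hand-rolling the accumulators and the CSV parsing yourself.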

