Every now and then someone asks how to deduplicate a 200GB text file. People start suggesting Spark, DuckDB, PyArrow. The answer is sort | uniq. Half of data engineering is remembering that Unix ...
One of the truly great qualities of UNIX-like operating systems is their ability to combine multiple commands. By combining commands, you can perform a wide array of tasks, limited only by your ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results