samwood.org

chdelim.awk

CSV is a great format for spreadsheets; it’s simple, portable, and easy to read. The only problem is that the delimiter character that defines the format, the comma, is rather common in actual data, meaning that it requires special treatment that can be a bit of a pain in the neck.

In my case, I wanted to be able to pull rows, columns, and individual cells from a spreadsheet using awk. AWK and CSV are a near perfect match, with AWK’s entire structure being based around lines of input delimited by some special character. AWK, though, does not have any way to handle the commas that are not delimiters, i.e. the ones in cells surrounded by double quotes.

Fortunately, AWK itself can be used to make its own perfect spreadsheet format. This script replaces all of the delimiting commas in a CSV spreadsheet with a character that can be specified by the user, so you can always pick a delimiter that won’t appear in the data, and you don’t have to worry about special cases with double quotes.

Side note on AWK

On the subject of AWK, I feel the need to say that it is a criminally underrated language. You can do all kinds of neat things with it if you know what it’s capable of, and, unlike a lot of scripting languages, it’s syntax is immediately familiar to anyone who was worked with C or anything in its family of look-alikes. Next time you write a script that shell alone isn’t suited for, don’t reach for Python or Perl right away; see if you can get away with doing it in AWK.

The manual page, as always, is a great reference, but for a more complete overview of AWK as a language, as opposed to a one-liner tool, have a look at the original book the developers wrote on how to use it. There are a great many of these really old Unix manuals out there, and they’re a fantastic resource if you can hunt them down.


Valid XHTML Basic 1.1! Valid CSS level 2.1! This website is licensed under CC-BY-ND.