How to extract specific columns from a .csv file
In my last assignment with the SQL Ledger product I had to do some imports of .csv data which always worked fine. Unfortunately the vendor of the application providing these .csv files switched the export format which caused a lot of redundancy in the .csv files which made the imports impossible at first glance.
The data necessary for imports was stored in specific files right in the middle of the particular csv columns and being a Linux fan I immediately thought of the Linux “cut” command of the GNU textutils.
However I ran into some difficulties as the csv file contained things like “first name, last name” (please note the comma which caused havoc with the specified delimiter.
So apparently “cut” wouldn't do for this purpose and I found a Python script which eventually changes the delimiter to a semicolon but that's no problem as you can specify the delimiter to use during import.
#!/usr/bin/env python3 import csv import sys for path in sys.argv[1:]: with open(path, newline='') as csvfile, open(sys.__stdout__.fileno(), mode=sys.__stdout__.mode, buffering=1, encoding=sys.__stdout__.encoding, errors=sys.__stdout__.errors, newline='') as output: reader = csv.reader(csvfile, delimiter=',') writer = csv.writer(output, delimiter=';') for row in reader: writer.writerow(row[26:36]) # Indices start a zero, ranges end excluding the last element
Please not that in the last line you can specify which rows you would like to cut out where as the first element (column) starts at zero. By default results are printed on STDOUT but you can use a redirect to get these informations to an output file.
In that respect I would like to thank seahawk1986 from ubuntuusers.de for coming up with this idea and S. Weitmann for reviewing the results.