Simple CSV Text Filters
When working with Django, it is simple to format data in your views, just use Django template filters.
What if you also need to generate CSV (Comma Separated Value) files from the same data query. The client requested that the data in the CSV file have the following formatting:
- format money with US Dollar sign, and appropriate commas
- represent percentages as xx% vs. 0.xx
I already had several Django template filters written, could I reuse them to generate .CSV files? Of course!
For example, let's look at the percent filter we're using in the Django template:
{{ result.percent_val|percentify }}
This assumes a percent filter exists as follows:
@register.filter
def percent( value):
"""Format a number as a percentage with three decimal places.
Turns 0.15 into '15%'.
If the passed value is not a number, return what came in."""
result = value
"""If input is None, turn it into a blank string. Don't want to see None on output."""
if None == result:
result = ''
try:
flt = float(value) * 100.00
result = locale.format("%0.3f", flt, grouping=True)
result += '%'
except:
result = value
return result
Could we use this same filter in our CSV file? Yes!
build_csv_array_from_dict
The following method requires three parameters:
- table: a list of dictionaries (each dictionary is a row in the database query result
- fields (and filters): for extracting the named values from each row
- header: fieldnames to put in the first row of the csv file
The header parameter is not actually required for this solution, but my specific implementation required the output to have some fields that were custom-named by the client and were different from request to request.
def build_csv_array_from_dict(table, fields, header):
"""table is a list of dictionaries
fields is a list of filenames with optional filtering:
fieledname
fieldname|filter
header is column names to put into first row of output
return is always a string
"""
cols = []
out_row = []
out_array = []
filters = {}
# map the incoming fields into a list (which is ordered) and a
# dictionary containing the optional filters
# split the string into individual columns.
in_cols = fields.split(',')
for col in in_cols:
if "|" in col:
field_name, filter = col.split('|')
cols.append(field_name)
filters[field_name] = filter
else:
cols.append(col)
# output the header information
headers = header.split(',')
for col in headers:
out_row.append(col)
out_array.append(out_row)
# walk through each table row, extract the fields by name and in order
# then apply the optional filter
for row in table:
out_row = []
for col in cols:
# get value by name from the row
val = row[col.strip()]
# try formatting it.
# throw exception if no filter associated with this column
try:
val = FILTERS[filters[col]](val)
except KeyError:
pass
out_row.append(val)
out_array.append(out_row)
return out_array
Using the Formats
Our code snippet now looks as follows:
table = (query from database with individual rows turned into dictionaries)
headers = name,return on income,gross sales
fields = name,roi|percent,gross_sales
csv_array = build_csv_array_from_dict(table, fields, headers)
But Wait!
But Wait #1! You're quite correct if you're thinking that the standard Python database interface does not return each row of a query result as a dictionary. There's a number of ways to make this transformation and selecting which one makes sense for you will depend on what sort of data is being returned and whether or not it makes sense to have all the data loaded at one time. The specific approach I'm using is provided by Conan Albrecht as an ASPN Python Recipe.
But Wait #2! The example shows only one filter, and Django allows you to string multiple filters. If you need multiple filters, you could modify the above code to split into an array, use the first element of the array as the data, and then iterate the other elements of the array and process each filter in turn. A minor modification, but beyond what I needed for my specific task at hand.
But Wait #3! Where's all the Django Object/Relational Goodness? You're calling SQL directly.
Correct! The specific instance where I'm using this is doing some extensive reporting queries (i.e., SQL queries that tend to be larger than a page of text). In several cases, we're also using the PostgreSQL cross-tab feature as well. Both things that do not fit well with the Django ORM. Sometimes thinking in SQL