How do I run a Python transformation?

Create, run, and develop a Python transformation in Keboola: write the script, map input and output CSV files, run it, confirm the result, and debug it in a workspace or locally.

You want to process data with Python where SQL is awkward. A Python transformation reads your mapped input tables as CSV files, runs your script, and writes CSV outputs back to Storage. This page gets you from nothing to a successful run, then shows how to develop and debug. For limits, file paths, and packages, see the reference.

Time: ~10 minutes · You will need: a Keboola project and one table in Storage (or the sample CSV file).

  1. Open Components → Transformations, click New Transformation, and choose Python Transformation.
  2. Name it and confirm.
  3. Upload the sample CSV file to Storage as a table.
  4. In Input Mapping, add that table and set its Destination to source (the script reads in/tables/source.csv).
  5. In Output Mapping, map result.csv (produced by the script) to a new Storage table, for example out.c-main.result.

Paste a script that reads in/tables/source.csv and writes out/tables/result.csv:

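```python
import csv

# Keboola preregisters the 'kbc' dialect; registering it here too (settings
# assumed: comma-separated, double-quoted, '\n' line endings) lets the script run locally.
csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"')

with open('in/tables/source.csv', mode='rt', encoding='utf-8', newline='') as in_file, \
        open('out/tables/result.csv', mode='wt', encoding='utf-8', newline='') as out_file:
    # Drop NUL bytes, which can break CSV parsing, before the reader sees each line.
    reader = csv.DictReader((line.replace('\0', '') for line in in_file), dialect='kbc')
    writer = csv.DictWriter(out_file, dialect='kbc', fieldnames=['col1', 'col2'])
    writer.writeheader()
    for row in reader:
        # col1: the 'first' value plus 'ping'; col2: the 'second' value times 42.
        writer.writerow({'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42})
```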
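
Pulling the per-row logic into its own function makes it easy to test without any files. This is a minimal sketch; transform_row is a hypothetical helper name, not part of Keboola or the script above.

```python
def transform_row(row: dict) -> dict:
    """Apply the example's per-row logic to one input row.

    Expects string values under 'first' and 'second', as csv.DictReader yields them.
    """
    return {'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42}
```

Inside the loop, writer.writerow(transform_row(row)) then does the same as the inline dictionary, and you can check the logic on hand-written rows.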

See the reference for variants that read rows as lists and that spell out the CSV format explicitly. You can also split the script into multiple blocks.
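
For comparison, here is one way a list-based variant with an explicit CSV format can look. It is a sketch, not the reference's version: process is a hypothetical name, the format is written out instead of taken from the kbc dialect, and the first and second columns are looked up by header name rather than assumed to be in a fixed position.

```python
import csv


def process(in_path: str, out_path: str) -> None:
    """List-based variant: rows are lists and the CSV format is spelled out."""
    with open(in_path, mode='rt', encoding='utf-8', newline='') as in_file, \
            open(out_path, mode='wt', encoding='utf-8', newline='') as out_file:
        reader = csv.reader(in_file, delimiter=',', quotechar='"')
        writer = csv.writer(out_file, delimiter=',', quotechar='"', lineterminator='\n')
        header = next(reader)
        # Find column positions by name instead of assuming their order.
        first_idx, second_idx = header.index('first'), header.index('second')
        writer.writerow(['col1', 'col2'])
        for row in reader:
            writer.writerow([row[first_idx] + 'ping', int(row[second_idx]) * 42])
```

In a transformation you would end the script with process('in/tables/source.csv', 'out/tables/result.csv'). Taking the paths as parameters keeps the function easy to run against test files.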

  1. Click Run.
  2. Wait for the job to finish with a success status.
  3. Open Storage, find your output table, and confirm that each col1 value is the input first value with ping appended and each col2 value is the input second value multiplied by 42.
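
When you develop in a workspace or locally, you can run the same check against the files before they reach Storage. This is a sketch under the example's assumptions (columns first and second in, col1 and col2 out, row order preserved); find_mismatches is a hypothetical helper.

```python
import csv


def find_mismatches(source_path: str, result_path: str) -> list[int]:
    """Return 1-based data-row numbers where the result doesn't match the expected transform."""
    with open(source_path, encoding='utf-8', newline='') as src, \
            open(result_path, encoding='utf-8', newline='') as res:
        source_rows = list(csv.DictReader(src))
        result_rows = list(csv.DictReader(res))
    mismatches = []
    for i, (s, r) in enumerate(zip(source_rows, result_rows), start=1):
        if r['col1'] != s['first'] + 'ping' or int(r['col2']) != int(s['second']) * 42:
            mismatches.append(i)
    if len(source_rows) != len(result_rows):
        # A row-count difference is reported at the first row without a partner.
        mismatches.append(min(len(source_rows), len(result_rows)) + 1)
    return mismatches
```

find_mismatches('in/tables/source.csv', 'out/tables/result.csv') returns an empty list when every row matches.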

The fastest way to iterate is a Python workspace (JupyterLab) with the same input mapping:

  1. Configure input (and optionally output) mapping, then Load Data and Connect to the workspace.
  2. Paste your script into the notebook. The in/ and out/ directory structure and the input files are already prepared.
  3. Run it; optionally Unload Data to push results to Storage, or Create Transformation to scaffold a transformation with the same mapping.
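
Before running the script in the notebook, it can help to confirm which files the input mapping actually produced. A minimal sketch; list_tables is a hypothetical helper, and it keeps only names ending in .csv so that any other files next to the tables (for example .manifest files) are skipped.

```python
from pathlib import Path


def list_tables(base: str = 'in/tables') -> list[str]:
    """Return sorted names of the CSV tables under base, or [] if base doesn't exist."""
    root = Path(base)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob('*.csv'))
```

With the mapping from the steps above, you would expect source.csv among the names that list_tables() returns.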

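To develop locally, install Python and recreate the directory structure: in/tables/ with your input files, plus an empty out/tables/. A ready example is in data.zip. Run the script from the directory that contains in/ and out/, because its paths are relative. Python's csv module has no built-in kbc dialect, so register it before running locally; transformations have it preregistered. The same script then runs unchanged as a transformation. For an exact environment, use the Keboola Docker image.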
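
If you would rather not start from data.zip, the layout can be created with a few lines. This is a sketch: scaffold is a hypothetical helper, the two sample rows are made up for illustration, and it overwrites any existing in/tables/source.csv under base.

```python
import csv
from pathlib import Path


def scaffold(base: str = '.') -> Path:
    """Create in/tables/ and out/tables/ under base and write a small source.csv."""
    root = Path(base)
    (root / 'in' / 'tables').mkdir(parents=True, exist_ok=True)
    (root / 'out' / 'tables').mkdir(parents=True, exist_ok=True)
    source = root / 'in' / 'tables' / 'source.csv'
    with source.open('w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(['first', 'second'])  # columns the example script reads
        writer.writerows([['alpha', '1'], ['beta', '2']])  # hypothetical sample rows
    return source
```

After scaffold('.'), run the transformation script from the same directory and look for out/tables/result.csv.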

For large data, raise the Backend size in the configuration (XSmall → Small → Medium → Large); see backend sizes. This affects time-credit consumption.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `FileNotFoundError` on `in/tables/source.csv` | The input mapping Destination doesn't match the path in the script. | Set the input Destination to `source`, or change the path in the script. |
| Output table empty or not created | The output mapping Source doesn't match the file the script writes. | Map `result.csv`, the file your script writes to `out/tables/`. |
| `IndentationError` / `TabError` | Inconsistent indentation, such as mixed tabs and spaces. | Indent consistently; Python is indentation-sensitive. |
| A defined `main()` never runs | The call is wrapped in `if __name__ == '__main__':`. | Call `main()` directly instead. |