Announcing Python in the canvas

Hello, Python 🐍

Today we’re excited to announce the much-anticipated release of Python in the canvas! It’s clear Python is no longer a tool just for data scientists, but is highly valued in every part of the data stack.

Our aim in bringing Python cells into the canvas is to reflect the way data teams work: use the best language for the job at each point in time, while having the flexibility of the canvas to lay everything out at once, and the transparency to communicate that with others.

You can skip ahead and check out the documentation here, and an example Python canvas here, or keep reading to learn more.

Why now?

Count has always been a SQL-first tool. We believed (and still do) that SQL is the primary language of data analytics, but over time we couldn’t deny the growing role Python was playing in data teams. SQL was the way you spoke to the database, but Python was being used for so much more: pulling data from APIs, creating predictive models and forecasts, finding outliers, creating custom visualizations, and more.

You could no longer separate SQL work from Python work. And we knew that in order to support data teams, we needed to support Python in the canvas.

How it works

Python cells in the canvas work similarly to DuckDB cells. (I encourage you to check out Jason’s article on our query architecture if you haven’t; he explains it in more detail than I will here.)

Each canvas has its own Python instance running locally in your browser using Pyodide. From this local instance, we install any packages and modules you import, and we execute any code you’ve written.

[Figure: Count’s query architecture]

The secret sauce

Running Python in the canvas is one thing, but we knew that for Python to really make a difference, it had to work seamlessly with every other object in the canvas.

This means you can:

  • query a database cell from a Python cell
  • query a pandas data frame with DuckDB
  • use control cells to inject parameters and variables into your code
  • create visualizations from any Python cell (using Count visuals or Python visual libraries)
  • create linked text off of any Python cell
  • maintain DAG relationships between all cell types

[Figure: Python interoperability]

More than that, Python cells also work with Count’s reactive cell framework. This means:

  • you don’t need to worry about the execution order of Python cells (we do that for you)
  • you don’t need to worry about things getting out of sync: every change you make is automatically propagated to all downstream cells
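
To make the idea of automatic execution order concrete, here is a rough sketch of how a reactive framework could work out which cells to run, and in what order, from the dependency graph. This is not Count's implementation; the cell names and the `execution_order` and `downstream_of` helpers are made up for illustration, and only Python's standard `graphlib` module is used.

```python
from graphlib import TopologicalSorter

# Hypothetical cells: each name maps to the set of cells it reads from.
dependencies = {
    "orders_sql": set(),
    "threshold_control": set(),
    "clean_orders_py": {"orders_sql"},
    "outliers_py": {"clean_orders_py", "threshold_control"},
    "outliers_chart": {"outliers_py"},
}


def execution_order(deps):
    """Return every cell in an order where upstream cells come first."""
    return list(TopologicalSorter(deps).static_order())


def downstream_of(changed, deps):
    """Return the cells that must re-run after `changed` is edited, in run order."""
    stale = {changed}
    ordered = []
    # Walking in topological order means a cell's inputs are checked before it is.
    for cell in execution_order(deps):
        if cell != changed and deps[cell] & stale:
            stale.add(cell)
            ordered.append(cell)
    return ordered


print(downstream_of("threshold_control", dependencies))
```

In this sketch, changing the control cell only re-runs the outlier cell and its chart; the SQL cell and the cleaning cell are left alone because nothing upstream of them changed.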

Making Python cells a seamless part of the canvas allows teams to work as they do now and choose the right language for the job, without sacrificing flexibility, transparency, or functionality.

Use cases

There are countless ways to use Python, so this is not an exhaustive list, but a reflection of the most common ways people are using Python in the canvas today.

You can check out this canvas to see these (and more) use cases in action.

[1] Getting data from APIs

While it would be great if all the data we needed had a nice, automated way to get into our data warehouse, that is usually not the case. We may need to use APIs to get information from services we use (e.g. Facebook Ads) or to pull in data we use as references (e.g. US Postcode boundaries).

To pull from APIs in the canvas, you can now:

  • use the count-requests package to send and receive information from the web
  • use secrets to keep API keys secure
  • use control cells to parameterize your request
  • merge results with other SQL or Python cells
  • visualize the data as you want
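
As a rough illustration of the steps above, the sketch below builds a parameterized request and parses a JSON response using only the standard library. It does not show the count-requests interface, which isn't described here; the endpoint, the parameter names, and the `build_request` and `parse_rows` helpers are all assumptions. In a canvas, the API key would come from a secret and the dates from control cells.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/ads"  # hypothetical endpoint


def build_request(api_key, start_date, end_date, campaign):
    """Return (url, headers) for a date-bounded request to the hypothetical API."""
    query = urlencode({"start": start_date, "end": end_date, "campaign": campaign})
    headers = {"Authorization": f"Bearer {api_key}"}
    return f"{BASE_URL}?{query}", headers


def parse_rows(body):
    """Turn a JSON response body into a list of row dicts (assumes a 'data' key)."""
    payload = json.loads(body)
    return payload.get("data", [])


# Values that would normally come from a secret and from control cells.
url, headers = build_request("my-secret-key", "2023-05-01", "2023-05-31", "spring sale")
rows = parse_rows('{"data": [{"campaign": "spring sale", "clicks": 3}]}')
```

The resulting rows are plain dictionaries, so they can be loaded into a data frame and merged with other cells.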

You can see an example of how to pull data from an API in the canvas here.

[2] When you reach the SQL ceiling

Python is far better suited than SQL to some of the complex analytical tasks we need to perform, such as forecasting, model fitting, and text analysis (to name a few).

The example below shows an interactive way to fit a model for some time series data. You’ll notice:

  • importing all the modules and packages required
  • use of control cells as parameters
  • reactive, real-time updates to any changes to control cells
  • use of Count visualizations
  • definition and use of user-defined functions

[Screenshot: interactively fitting a model to time series data in the canvas]
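
The model in the canvas example isn't reproduced here, but a minimal sketch of the same pattern might look like the following: a user-defined function fits a straight-line trend by ordinary least squares, and a `horizon` parameter (standing in for a control cell) sets how far ahead to forecast. The function names and the sample data are hypothetical.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values, horizon):
    """Extend the fitted trend `horizon` steps past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + step) for step in range(horizon)]


horizon = 3  # in the canvas this would be driven by a control cell
weekly_signups = [120.0, 132.0, 141.0, 155.0, 163.0]  # made-up sample data
predicted = forecast(weekly_signups, horizon)
```

Because the forecast depends only on its inputs, changing `horizon` or the data is enough to trigger a fresh result in a reactive setting.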

[3] Data exploration and cleaning

Functions and built-in methods can help immensely with data transformation. In the example below we’re converting a string column to an integer with some bespoke logic. We can then compare the column’s distribution before and after the conversion, shown on either side of the Python cell.

[Screenshot: converting a string column to an integer in a Python cell, with the distribution shown before and after]
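
The exact logic used in the canvas isn't shown here, so the following is only a sketch of what such bespoke conversion logic could look like. The `to_int` helper and its rules (thousands separators, `k`/`m` suffixes, unparseable values mapped to `None`) are assumptions chosen for illustration.

```python
import re

_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def to_int(raw):
    """Convert strings like '1,200' or ' 3k ' to an int; return None if unparseable."""
    if raw is None:
        return None
    text = raw.strip().lower().replace(",", "")
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([km]?)", text)
    if match is None:
        return None
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES.get(suffix, 1))


# Made-up sample column with the kinds of values that break a plain int() cast.
raw_column = ["1,200", " 3k ", "2.5k", "n/a", ""]
cleaned = [to_int(value) for value in raw_column]
```

With pandas, the same function could be applied to a column using `Series.map`.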

How to get started

To get started with Python in the canvas, we suggest importing an existing Jupyter notebook.

Also, check out these resources: