An SQL Solution for Jupyter

ktpsns · on Feb 9, 2021

This is amazing. Jupyter is already a great tool for data science and being able to directly interact with SQL (without a host/intermediate language) is great, because it is very clean and SQL is a powerful language on its own – no need for some host language. Furthermore, vega (https://vega.github.io/vega-lite/) is an amazing kind-of-declarative visualization language which mixes great with the declarative language SQL.

3l3ktr4 · on Feb 10, 2021

thanks for the kind words! :)

andreareina · on Feb 10, 2021

The only example of how to use it is an animation that is just impossible to read :(

Here are the examples in the live demo[1]:

    %LOAD sqlite3 db=chinook.db timeout=2 shared_cache=true
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN TRUE MARK square WIDTH 100 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN MAXBINS 3 MARK bar COLOR red WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo TYPE ordinal MARK bar COLOR green WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD Name TYPE nominal Y_FIELD ArtistId BIN MAXBINS 1 MARK line COLOR purple WIDTH 200 HEIGHT 200 <> SELECT Name, ArtistId FROM artists LIMIT 10
    %XVEGA_PLOT X_FIELD EmployeeId GRID false Y_FIELD ReportsTo MARK area COLOR pink WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees

[1] https://hub.gke2.mybinder.org/user/jupyter-xeus-xeus-sql-x64...

3l3ktr4 · on Feb 10, 2021

hey @andreareina! Actually we have a binder on the repo that you can try + several examples :) link to binder: https://mybinder.org/v2/gh/jupyter-xeus/xeus-sql/stable?urlp... link to examples:https://github.com/jupyter-xeus/xeus-sql/tree/master/example...

andreareina · on Feb 10, 2021

Thanks! I'm a fan of notebook-based work and just interactive programming in general and even if I don't use Jupyter myself it's exciting to see innovation in the space.

My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!

Hope that helps. Right now it looks like the sql parts are kind of the equivalent of running a subshell and displaying the results, with no communication to the "main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?

3l3ktr4 · on Feb 10, 2021

>My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!

Hum, I see. I thought you haven't found the binder, sorry. It's a good point! I'll make sure to pay attention on that on the next posts :)

>"main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?

I'm not sure if I follow what you mean with the main code. you mean having access to each individual value of a query result, for example, to be able to manipulate it? if that's what you mean, that's what we receive in the C++ backend code that runs on this kernel. so as long as you can integrate new libraries to this code you'd be able to manipulate the results of your query.

galacticdessert · on Feb 10, 2021

Would not the latter be the task of a language library? i.e. Pandas with read_sql()?

3l3ktr4 · on Feb 10, 2021

I think so too. It'd be amazing to have python to receive this output and do stuff with it as people said in other comments.

maininformer · on Feb 10, 2021

I know you pronounce it ES-KYU-EL because you used "an"

fulafel · on Feb 10, 2021

It's more common than "sequel" in non english speaking places IME.

KerryJones · on Feb 10, 2021

Yep, it through me off, I saw it somewhere years ago that the pronunciations were:

SQL = SEE-KWUHL MySQL = MY-ES-KYU-EL

And it's now engrained in my mind and this title through me off ;)

junon · on Feb 10, 2021

"An SQL", "through me off"....

I just woke up and my head is already spinning. ;)

KerryJones · on Feb 10, 2021

Oh boy, I think I need to go back to bed

jessikat · on Feb 10, 2021

"SEE-KWUHL" is actually the pronunciation for MS SQL Server. "ES-KYU-EL" is for all other uses of SQL.

adolph · on Feb 10, 2021

When SQL was established in the early 70s, it was called SEQUEL (Structured English Query Language). However, due to a copyright issue, it was changed it to SQL. In fact, SQL is typically pronounced “sequel” today, but some favor the non-acronym pronunciation of “ess-cue-el” (in case you didn’t know, an acronym is an abbreviation you can pronounce like a word, e.g. SQL or ANSI).

Read more at the ANSI Blog: The SQL Standard – ISO/IEC 9075:2016 (ANSI X3.135) https://blog.ansi.org/?p=158690

slt2021 · on Feb 10, 2021

sir, it is my-ass-que-el

fulafel · on Feb 10, 2021

The Github support for notebooks is so nice (was linked from the example pic caption: https://github.com/wangfenjin/xeus-tidb/blob/develop/example...)

stevesimmons · on Feb 10, 2021

Jupyter would be even better if it supported the seamless combination of Python and SQL code cells.

My notebook code typically involves a data prep stage with querying a SQL database, then downloading into Python for more complex analysis, ML modelling, integration with external data sources, etc. So the notebook has a Python kernel with SQL usually as embedded """-quoted strings.

Does anyone have a solution to treating selected code cells as SQL - with SQL highlighting and tooltips - exposed as string variables to the Python code?

Sparkmagic [1] does part of this for Python/SQL/Spark interoperability, but as far as I recall, doesn't support SQL syntax highlighting.

[1] https://github.com/jupyter-incubator/sparkmagic

galacticdessert · on Feb 10, 2021

Agree this would be awesome. I reckon you could obtain something similar with pandas with pd.read_sql("<query>"), but a dedicated SQL cell akin to markdown cells would make it much more pleasant

pletnes · on Feb 10, 2021

There’s a %%bash cell magic which turns one cell into a bash cell. What about a %%xeus cell to run a SELECT (or any sql)? Would that work?

krmk · on Feb 16, 2021

Like this: https://pypi.org/project/ipython-sql/ ?

3l3ktr4 · on Feb 10, 2021

that's pretty amazing work! there have been some efforts in the past to do that: https://github.com/minrk/allthekernels I think it'd be hard to do something like this, but not impossible. Just a lot of work.

aargh_aargh · on Feb 10, 2021

I followed the instructions and the sql cells work, but the xvega ones return empty output and I'm not sure how to debug that.

  conda create -n xeus-sql
  conda activate xeus-sql
  conda install xeus-sql soci-mysql soci-postgresql soci-sqlite jupyterlab -c conda-forge

In conda list I see, among others:

  xeus                      1.0.0                h78d96c3_0    conda-forge
  xeus-sql                  0.0.8                h118ccdd_1    conda-forge
  xvega                     0.0.10               h4bd325d_0    conda-forge
  xvega-bindings            0.0.10               h4bd325d_0    conda-forge

Btw there's a typo in the instructions: soci-postresql -> soci-postgresql

aargh_aargh · on Feb 10, 2021

Also, for mysql and postgres, are only socket connections supported? No TCP connections?

https://xeus-sql.readthedocs.io/en/latest/PostgreSQL.html#li...

aargh_aargh · on Feb 10, 2021

Answering my own question - works as expected, only the documentation is lacking.

%LOAD postgresql host=127.0.0.1 port=5432 dbname=DATABASE user=USERNAME password=PASSWORD

geebee · on Feb 10, 2021

This looks very promising! I will look into it.

When I want a tabular view, I currently either used the pandas read_sql_query method or the PandaSQL module. That does work, and I often do want to use pandas and sql together (often back and forth, depending on the operation).

That said, I can see a use for this tool in my work - I'm often really just interested in interacting with a database in a more visually friendly way than the command line, and it would be great to stay with Jupyter rather than having to go through a completely different UI.

So, yeah, nice looking tool, and thanks for posting!

seddonm1 · on Feb 10, 2021

We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter Notebooks - plus a whole lot more which you can see here: https://arc.tripl.ai/tutorial

After seeing many companies writing ETL using code we decided it was too hard to manage at scale so provided this abstraction layer - which is heavily centered around expressing business logic in SQL - to standardise development (JupyterLab) and allow rapid deployments.

realitysballs · on Feb 10, 2021

Does anyone know if can work with 32-bit ODBC drivers? My industry is behind the times and the only way to interact with our ERP is 32-bit ODBC

3l3ktr4 · on Feb 10, 2021

hey @realityballss, don't really know as I'm not too deep in the SOCI stuff. but we offer full support to SOCI meaning if these folks: https://github.com/SOCI/soci support it and the dependencies exist and work for 32bit, than yes. I'd say it looks a bit like a long shot, but may work.

jasmcole · on Feb 10, 2021

Those interested in SQL notebooks can also check out https://count.co

Vaslo · on Feb 10, 2021

Names almost every major sql engine except SQL Server

Says will work with any DB with an ODBC connection

Hmmmm

3l3ktr4 · on Feb 10, 2021

it’s depending directly of SOCI. I offer a wrapper around everything they do :)