This is amazing. Jupyter is already a great tool for data science and being able to directly interact with SQL (without a host/intermediate language) is great, because it is very clean and SQL is a powerful language on its own – no need for some host language. Furthermore, vega (https://vega.github.io/vega-lite/) is an amazing kind-of-declarative visualization language which mixes great with the declarative language SQL.
The only example of how to use it is an animation that is just impossible to read :(
Here are the examples in the live demo[1]:
%LOAD sqlite3 db=chinook.db timeout=2 shared_cache=true
%XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN TRUE MARK square WIDTH 100 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
%XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN MAXBINS 3 MARK bar COLOR red WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
%XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo TYPE ordinal MARK bar COLOR green WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
%XVEGA_PLOT X_FIELD Name TYPE nominal Y_FIELD ArtistId BIN MAXBINS 1 MARK line COLOR purple WIDTH 200 HEIGHT 200 <> SELECT Name, ArtistId FROM artists LIMIT 10
%XVEGA_PLOT X_FIELD EmployeeId GRID false Y_FIELD ReportsTo MARK area COLOR pink WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
Thanks! I'm a fan of notebook-based work and just interactive programming in general and even if I don't use Jupyter myself it's exciting to see innovation in the space.
My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!
Hope that helps. Right now it looks like the sql parts are kind of the equivalent of running a subshell and displaying the results, with no communication to the "main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?
>My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!
Hum, I see. I thought you haven't found the binder, sorry. It's a good point! I'll make sure to pay attention on that on the next posts :)
>"main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?
I'm not sure if I follow what you mean with the main code. you mean having access to each individual value of a query result, for example, to be able to manipulate it? if that's what you mean, that's what we receive in the C++ backend code that runs on this kernel. so as long as you can integrate new libraries to this code you'd be able to manipulate the results of your query.
When SQL was established in the early 70s, it was called SEQUEL (Structured English Query Language). However, due to a copyright issue, it was changed it to SQL. In fact, SQL is typically pronounced “sequel” today, but some favor the non-acronym pronunciation of “ess-cue-el” (in case you didn’t know, an acronym is an abbreviation you can pronounce like a word, e.g. SQL or ANSI).
Jupyter would be even better if it supported the seamless combination of Python and SQL code cells.
My notebook code typically involves a data prep stage with querying a SQL database, then downloading into Python for more complex analysis, ML modelling, integration with external data sources, etc. So the notebook has a Python kernel with SQL usually as embedded """-quoted strings.
Does anyone have a solution to treating selected code cells as SQL - with SQL highlighting and tooltips - exposed as string variables to the Python code?
Sparkmagic [1] does part of this for Python/SQL/Spark interoperability, but as far as I recall, doesn't support SQL syntax highlighting.
Agree this would be awesome. I reckon you could obtain something similar with pandas with pd.read_sql("<query>"), but a dedicated SQL cell akin to markdown cells would make it much more pleasant
that's pretty amazing work!
there have been some efforts in the past to do that: https://github.com/minrk/allthekernels
I think it'd be hard to do something like this, but not impossible. Just a lot of work.
When I want a tabular view, I currently either used the pandas read_sql_query method or the PandaSQL module. That does work, and I often do want to use pandas and sql together (often back and forth, depending on the operation).
That said, I can see a use for this tool in my work - I'm often really just interested in interacting with a database in a more visually friendly way than the command line, and it would be great to stay with Jupyter rather than having to go through a completely different UI.
So, yeah, nice looking tool, and thanks for posting!
After seeing many companies writing ETL using code we decided it was too hard to manage at scale so provided this abstraction layer - which is heavily centered around expressing business logic in SQL - to standardise development (JupyterLab) and allow rapid deployments.
hey @realityballss, don't really know as I'm not too deep in the SOCI stuff.
but we offer full support to SOCI meaning if these folks: https://github.com/SOCI/soci support it and the dependencies exist and work for 32bit, than yes.
I'd say it looks a bit like a long shot, but may work.