Datasette.io, an official project website for Datasette

rpdillon · on Dec 24, 2020

I never tire of seeing Datasette on HN. It saved me last week when business folks sent me four huge Excel sheets of data that I'd need to answer questions about during a meeting. I exported to CSV, then into SQLite, and immediately had a shareable web page I could query and filter in realtime. Vastly more useful to me (as an engineer) than Excel, and having the full power of SQL is a delight.
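The CSV-to-SQLite step described above can be sketched with nothing but the Python standard library. This is a minimal sketch, not Datasette's own import path: it assumes a well-formed CSV with a header row, stores every column as TEXT, and the file and table names in the usage note are hypothetical. Tools like sqlite-utils (mentioned later in the thread) handle type detection and messy files far better.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table):
    """Load a CSV file (first row = column names) into a new SQLite table.

    Assumes every data row has the same number of fields as the header;
    all values are stored as TEXT. Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    def quote(name):
        # Double-quote identifiers so column names with spaces still work.
        return '"' + name.replace('"', '""') + '"'

    conn = sqlite3.connect(db_path)
    try:
        columns = ", ".join(quote(h) + " TEXT" for h in header)
        conn.execute("CREATE TABLE {} ({})".format(quote(table), columns))
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(quote(table), placeholders), rows
        )
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

With a hypothetical `sales.csv`, calling `csv_to_sqlite("sales.csv", "sales.db", "sales")` and then running `datasette sales.db` would serve the table as a browsable, queryable web page.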

Wish list: I wish it were a single executable I could carry around, like fossil, jq, rclone, rg/ag, sqlite3, gitea, micro, etc.

simonw · on Dec 24, 2020

So great to hear it being used like that!

I'd really love to have a standalone executable version of Datasette too. I've been meaning to spend some time on that - bundling Python applications is tricky but it should be possible using BeeWare or PyInstaller.

The best way to install it at the moment is "brew install datasette" - but that only works if you're on a Mac. I really need to get on top of the Windows installation process too.

mastazi · on Dec 24, 2020

You could implement "scoop install datasette" or "choco install datasette". That's way easier than creating an installer. For example, this is how you create a Scoop manifest: https://github.com/lukesampson/scoop/wiki/Creating-an-app-ma...

ErikBjare · on Dec 24, 2020

It's a common misconception that PyInstaller is something that creates an installer.

Scoop and Chocolatey are nice, but you'll still need something like PyInstaller to bundle your Python app into something that can be installed by scoop or choco. (source: building ActivityWatch [1] which is bundled by PyInstaller and available on Chocolatey)

[1]: https://activitywatch.net

simonw · on Dec 24, 2020

That's really useful, thanks.

tecleandor · on Dec 24, 2020

Instead of PyInstaller, I had success using Nuitka. Doesn't work with every Python package, but I think it's easier to use.

nojito · on Dec 24, 2020

>Vastly more useful to me (as an engineer) than Excel,

Open Excel --> Click Data --> Get Data --> From File (Csv) -> If it's clean already just do load to (add to data model) --> alt+n+v to insert pivot and impress your business folks.
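For comparison, the SQL side of that pivot is a GROUP BY with one conditional aggregate per output column. This is a minimal sketch using Python's built-in sqlite3 module; the `sales(region, quarter, amount)` table and the two quarter columns are hypothetical, and unlike an Excel pivot the output columns have to be spelled out in the query.

```python
import sqlite3


def pivot_totals(conn):
    """Pivot a hypothetical sales(region, quarter, amount) table:
    one row per region, one summed column per quarter."""
    return conn.execute(
        """
        SELECT region,
               SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
        FROM sales
        GROUP BY region
        ORDER BY region
        """
    ).fetchall()
```

Pasted into Datasette's SQL box, the same query gives a shareable link to the pivoted view.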

rpdillon · on Dec 24, 2020

Thanks for the tip! The main issue in this particular case was that I never work with Excel, so I was shipped Excel files via Google Sheets (since I have no Excel license on my Mac). I was mostly trying to get into some environment that I was comfortable with so I could work confidently while others were waiting. Datasette was the best tool I could think of that would allow me full SQL access while also presenting a web page others could grok easily. The terminal is home for me, but I think it's off-putting to some, and Datasette bridges that nicely.

leonim · on Dec 24, 2020

If the terminal is your home, I would suggest looking at something someone else already mentioned: VisiData (http://visidata.org). It is great for exploring all sorts of data files like CSV, Excel, SQLite, etc. in the terminal.

I find that the Datasette author's related tools, sqlite-utils (https://github.com/simonw/sqlite-utils) and the Dogsheep tools (https://github.com/dogsheep), and VisiData are nicely complementary.

I prefer the interface of VisiData, but I don't usually need to share links.

rpdillon · on Dec 24, 2020

This is fantastic! Thanks so much for the pointer...the ability to traverse tons of data formats quickly to "get a feel" for them (much like simonw's use case for some of the sqlite-utils) is exactly the tooling I need in cases like this, and the TUI interface is an awesome bonus. Will definitely be playing around with this over the next few days...thanks!

tga · on Dec 24, 2020

I did pretty much the same thing many years ago by connecting directly to the Excel sheet from Access, and then writing queries and reports on top of it. The advantage was that there was no import step, so edits could be sent back right away.

If I remember correctly, Excel sheets can be directly used as ODBC data sources. This means that they can be accessed directly by database apps.

hobs · on Dec 24, 2020

Worth mentioning that https://github.com/dfinke/ImportExcel does excel -> powershell -> excel in a really nice package for that kind of filtering/etc work.

simonw · on Dec 24, 2020

I posted a follow-up to this article describing how I built the search engine feature for https://datasette.io/ here: https://simonwillison.net/2020/Dec/19/dogsheep-beta/

dmix · on Dec 24, 2020

Dogsheep looks neat too. Querying multiple sqlite DBs with YAML templates is basically the deal?

There are so many useful data hacking tools these days. I wish I had important data to play with just to try these tools.

simonw · on Dec 24, 2020

Yeah, Dogsheep Beta solves the "how do I search across multiple tables in multiple databases" problem by creating a single "search index" table based on YAML configuration, then using that same YAML configuration for the HTML template fragments used for the different types.
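The "single search index table" idea described above can be sketched in a few lines. This is a hedged illustration, not Dogsheep Beta's actual code or config format: a Python dict stands in for the YAML configuration, the `tweets` and `issues` tables are hypothetical, everything lives in one connection rather than across multiple databases, and plain LIKE matching replaces the SQLite full-text search a real index would use.

```python
import sqlite3

# Hypothetical stand-in for the YAML config: each content type maps to a
# query that returns (key, title, body) rows from that type's own table.
CONFIG = {
    "tweet": "SELECT id, 'Tweet ' || id, full_text FROM tweets",
    "issue": "SELECT id, title, body FROM issues",
}


def build_index(conn, config):
    """Copy rows from every configured source into one search_index table."""
    conn.execute("DROP TABLE IF EXISTS search_index")
    conn.execute(
        "CREATE TABLE search_index (type TEXT, key TEXT, title TEXT, body TEXT)"
    )
    for type_name, sql in config.items():
        rows = conn.execute(sql).fetchall()
        conn.executemany(
            "INSERT INTO search_index VALUES (?, ?, ?, ?)",
            [(type_name, str(key), title, body) for key, title, body in rows],
        )
    conn.commit()


def search(conn, term):
    """One query searches every source type, because they share one table.
    (SQLite's LIKE is case-insensitive for ASCII by default.)"""
    like = "%" + term + "%"
    return conn.execute(
        "SELECT type, key, title FROM search_index "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY type, key",
        (like, like),
    ).fetchall()
```

The `type` column is what lets the same configuration pick a different HTML fragment per result when rendering.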

nocman · on Dec 24, 2020

Oh, dang. Here I thought this was going to be 80's Commodore computers saving programs to cassette tape.

https://en.wikipedia.org/wiki/Commodore_Datasette

Oh, well. Looks interesting anyway :-D

fortyseven · on Dec 24, 2020

Wish they'd picked another name. As a Commodore fan, this trips me up every time.

myself248 · on Dec 24, 2020

Seriously. Like, if you're gonna reuse a normal word, that's one thing; of course a product named Windows is gonna run into issues.

But when you pick a unique word that only exists as another product.... why, just, why?

simonw · on Dec 24, 2020

Because I grew up using a Commodore 64 and wrote my first database program using C64 BASIC.

I mistakenly thought it would be a unique name that would make it easy for me to track mentions. Turns out there are still a LOT of C64 fans out there actively talking about their tape drives!

dmix · on Dec 24, 2020

Don't worry, your product is datasette.io and on GitHub as a Python project. Humans are smart enough to figure out the difference.

I think it's a great name for the project, even with the legacy usage.

It's not like it's a hugely popular primary brand like Macintosh or something from Atari.

lovasoa · on Dec 24, 2020

The software is great, but I think the presentation of the data makes it hard to navigate.

I made a small PR [1] to try and improve that, but if someone more knowledgeable about design could make a small contribution to this project, I think that would help a lot of people!

[1] https://github.com/simonw/datasette/pull/1159

cpfeifer · on Dec 24, 2020

I also like visidata (https://www.visidata.org/) for viewing structured files. Doesn't produce a webpage however.

zerop · on Dec 24, 2020

Awesome tool. Solves problems for the masses! Could you also put up a page on how people are using Datasette in their work and what problems they are solving? This would help others come up with more use cases.

Jarwain · on Dec 24, 2020

This looks super cool, very useful for some of the data science stuff I've been working with at work recently.

Something I'm curious about; is it possible to use datasette as a sort of "sqlite browser"? Say I have users with sqlite databases with various data, but all the dbs conform to a known schema; can I use datasette to allow my users to upload their sqlite databases and browse them?

Or would I have to do something more custom, running datasette in the background against an uploaded db and serving the result to my user?

simonw · on Dec 24, 2020

I'm working on improvements related to this at the moment.

Currently you need to run Datasette against existing SQLite databases when it starts up. Plugins like https://datasette.io/plugins/datasette-upload-csvs can be used to let people add new tables to those existing databases, but there isn't a mechanism to add a whole new database file without restarting the server.

In the next release I hope to be able to support pointing Datasette at a directory and having any new SQLite files that are added to that directory automatically show up in the interface without having to restart the server.
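One way to picture the directory-watching idea above is a polling loop that looks for SQLite files it hasn't seen yet. This is a hypothetical sketch, not how Datasette implements it (follow the linked issue for that): `find_new_databases` is an illustrative name, and the only real detail it relies on is that every SQLite database file begins with the 16-byte header `SQLite format 3\0`. A server would call it every few seconds and mount whatever it returns.

```python
from pathlib import Path

# Every SQLite database file starts with this 16-byte magic string.
SQLITE_HEADER = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Cheap check: does the file start with the SQLite header?"""
    try:
        with open(path, "rb") as f:
            return f.read(16) == SQLITE_HEADER
    except OSError:
        return False


def find_new_databases(directory, known):
    """Return SQLite files in `directory` that are not in the `known` set,
    sorted by name. Non-SQLite files are skipped."""
    new = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path not in known and is_sqlite_file(path):
            new.append(path)
    return new
```

Checking the header rather than the file extension means a half-copied upload or a stray `.db` file that isn't SQLite won't be mounted by mistake, though a real implementation would also need to cope with files still being written.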

You can follow this issue for updates: https://github.com/simonw/datasette/issues/417

EvRev · on Dec 24, 2020

Third time is a charm! Your tool is amazing, hoping to use it with my fitness tracker data out of Gadgetbridge.

How much overlap do you see with Grafana? Perhaps trying to target that group could yield more adopters?

simonw · on Dec 24, 2020

Grafana is much more focused on time series data - Datasette can handle that but it's not nearly as good a fit, since it doesn't do any rollups for you - you'd have to write your own SQL queries to summarize the data.

I imagine you could do that using SQLite window functions but you'd end up putting a lot of work in to get even a basic alternative to Grafana working.
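To make the "write your own SQL to summarize the data" point concrete, here is a minimal sketch of a rollup: a daily average via GROUP BY, then a 3-day moving average via a window function. It assumes a hypothetical `readings(ts, value)` table with ISO-format timestamps and SQLite 3.25 or later (the first release with window functions); it is the kind of query you'd otherwise get from Grafana for free.

```python
import sqlite3

ROLLUP_SQL = """
WITH daily AS (
    -- Roll raw readings up to one average per calendar day.
    SELECT date(ts) AS day, AVG(value) AS avg_value
    FROM readings
    GROUP BY date(ts)
)
SELECT day,
       avg_value,
       -- Window function: average of this day and the two before it.
       AVG(avg_value) OVER (
           ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3d
FROM daily
ORDER BY day
"""


def daily_rollup(conn):
    """Return (day, daily average, 3-day moving average) tuples."""
    return conn.execute(ROLLUP_SQL).fetchall()
```

Note that the window counts rows, not calendar days, so a day with no readings simply drops out of the window instead of counting as zero.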

The visualization features in Datasette are all provided by plugins - I'd love to see people experiment with time-series visualization plugins which use SQLite (via the Datasette JSON API) on the backend.
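A time-series plugin or external dashboard would mostly need to turn a SQL query into a list of (x, y) points via the JSON API. The sketch below assumes Datasette's `/<database>.json?sql=...` endpoint with `_shape=array` (which returns a plain JSON list of row objects) and a server on the default `http://localhost:8001`; the `fitness` database, `daily` table, and column names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, database, sql):
    """Build a Datasette JSON API URL for an arbitrary read-only SQL query.
    _shape=array asks for a list of {column: value} objects."""
    params = urlencode({"sql": sql, "_shape": "array"})
    return "{}/{}.json?{}".format(base_url.rstrip("/"), database, params)


def parse_series(payload, x_key, y_key):
    """Turn a _shape=array JSON payload into (x, y) tuples for plotting."""
    return [(row[x_key], row[y_key]) for row in json.loads(payload)]


def fetch_series(base_url, database, sql, x_key, y_key):
    """Run the query against a live Datasette instance (network access needed)."""
    with urlopen(query_url(base_url, database, sql)) as resp:
        return parse_series(resp.read().decode("utf-8"), x_key, y_key)
```

For example, `fetch_series("http://localhost:8001", "fitness", "select day, steps from daily", "day", "steps")` would hand a charting library a ready-made series, with any rollups done in the SQL itself.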

stonecharioteer · on Dec 28, 2020

How about Apache Superset?

Also, perhaps you could add a comparison between Datasette and existing tools on your webpage.

adamsau · on Dec 24, 2020

I think it makes it more complicated.