Hacker News | ElPeque's comments

It seems like Safari's support for chunked, streamed MP4 is lacking. If you have issues, please consider trying Chrome or Firefox for now.


Oh shit. I fucked up. I'll fix it in a moment :facepalm:


Ok. Should work now. :)


As of today, GribStream.com also leverages the NOAA Rapid Refresh (RAP) model: https://rapidrefresh.noaa.gov/. It enables Skew-T Log-P charting, for which I added a Python example in the GitHub repo. I hope you find it useful.

You can check the example here: https://github.com/GribStream/python-client


Every hour, these models produce an hourly forecast reaching 10 or 15 days ahead. The further ahead they go, the less accurate they become, of course.

Having all historical weather forecasts is useful for backtesting how accurate the model is, because you can compare what was being forecast at the time with what actually happened.

It is also useful for backtesting derived models. Say you want to figure out how accurately your model forecasts something like solar power generation, using one of these weather datasets as features for cloud cover and solar radiation. You will want to run the model with the forecast that was current at that point in time; otherwise you would be "cheating" by using data you wouldn't have had if you had run the model back then.

In other words, you'd want to "see" the weather as it was being predicted at the time in order to find out what your model would predict for solar power generation.

I hope that makes sense.
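To make the idea concrete, here is a minimal sketch of forecast-as-of backtesting in Python, with made-up numbers (in practice the forecasts would come from a historical archive like GribStream's): group errors by lead time, so you can see how accuracy degrades the further out the forecast was issued.

```python
from datetime import datetime, timedelta

# Toy archive of historical forecasts: (issue_time, valid_time) -> forecast value.
forecasts = {
    (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 6)): 10.0,   # 6 h lead
    (datetime(2024, 1, 1, 0), datetime(2024, 1, 2, 0)): 12.0,   # 24 h lead
    (datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 12)): 11.0,  # 6 h lead
    (datetime(2024, 1, 1, 6), datetime(2024, 1, 2, 6)): 15.0,   # 24 h lead
}

# What actually happened at each valid time.
observations = {
    datetime(2024, 1, 1, 6): 10.5,
    datetime(2024, 1, 1, 12): 11.5,
    datetime(2024, 1, 2, 0): 14.0,
    datetime(2024, 1, 2, 6): 15.5,
}

def mae_by_lead_time(forecasts, observations):
    """Mean absolute error, grouped by how far ahead the forecast was issued."""
    errors = {}
    for (issued, valid), fc in forecasts.items():
        if valid not in observations:
            continue
        lead = valid - issued
        errors.setdefault(lead, []).append(abs(fc - observations[valid]))
    return {lead: sum(e) / len(e) for lead, e in errors.items()}

print(mae_by_lead_time(forecasts, observations))
# The 24 h forecasts show a larger error than the 6 h ones, as expected.
```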


Definitely more models right now. GribStream.com will be supporting many other models soon.

But open-meteo free access is only for non-commercial use. GribStream.com allows any use.

Also, can open-meteo query forecasts 10 days out at hourly resolution for 150,000 coordinates in a single request and take just 8 seconds? At what price?

I'll do a benchmark soon.


Shoot me an email and I'll reach out when I can implement selecting based on bounds.

Off the top of my head, re-slicing into grib files for the response is probably a big lift, but some of the other formats (maybe netCDF, GeoTIFF, or just a compressed array of floats) might make a nice MVP.

info@gribstream.com


This is really great feedback. The truth is that the pricing model is still being figured out, so if you have a specific use case in mind, maybe we can work out what suits it best, and that might become the actual default pricing model.

I tried to tie the pricing to the amount of processing the API needs to do, which is closely related to the number of grib2 files it has to download and process to build the response. That cost doesn't change much whether I extract 1 point or 1,000 points. But I thought I had to draw the line somewhere, or nobody would ever pay for anything because the free tier would be enough.

But I might make it the same price for chunks of maybe 5,000 or more points.

In the line of business I come from, the main usage is actually extracting scattered coordinates (think weather where specific assets are, like hotels, solar panels, or wind farms), not whole boundaries at full resolution. But it makes a lot of sense that for other types of usage that isn't the case.

It is definitely on the roadmap to select based on lat/lon bounds and even shapes, and also to return not just time series but the gridded data itself, either as grib2, netCDF, Parquet, a plain matrix of floats, PNG, or even MP4 video.


Oh wow. Those are really cool visualizations. I can't compete :P


True!

Fixed!

Thank you!


You got it!


In theory it could be done. It is sort of analogous to what GribStream is doing already.

The grib2 files are the storage. They are sorted by time in the path, so that works like a primary index. And then grib2 is just a binary format to decode to extract what you want.

I was originally going to write this as a ClickHouse plugin, but in the end I made it a Golang API because that leaves me less constrained. For example, I'd like to create an endpoint to live-encode the grib files into MP4 so the data can be served as video. Then, with any video player, you would be able to play it back, jump to times, etc.

I might still write a ClickHouse integration though, because it would be amazing to join and combine with other datasets on the fly.
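The "time in the path works like a primary index" idea can be sketched in a few lines of Python. The path layout below is hypothetical (the real NOAA layout differs); the point is that when the timestamp leads the name, lexicographic order equals time order, so a binary search over sorted paths finds a time range without scanning anything.

```python
from bisect import bisect_left

# Hypothetical time-sorted object paths; lexicographic order == time order.
paths = [
    "archive/20240101T00/tmp.grib2",
    "archive/20240101T06/tmp.grib2",
    "archive/20240101T12/tmp.grib2",
    "archive/20240102T00/tmp.grib2",
]

def paths_in_range(paths, start, end):
    """All files whose timestamp prefix falls in [start, end):
    two binary searches over the sorted paths act as a primary index."""
    lo = bisect_left(paths, f"archive/{start}")
    hi = bisect_left(paths, f"archive/{end}")
    return paths[lo:hi]

print(paths_in_range(paths, "20240101T06", "20240102T00"))
# -> the 06:00 and 12:00 files from Jan 1
```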


> It is sort of analogous to what GribStream is doing already.

The difference is presumably that you are doing some large rechunking operation on your server to hide from the user the fact that the data is actually in multiple files?

Cool project btw, would love to hear a little more about how it works underneath :)


Yeah, exactly.

I basically scrape all the grib index files to learn the offsets into every variable for all time, and I store that in ClickHouse.

When the API gets a request for a time range, a set of coordinates, and a set of weather parameters, I first pre-compute the mapping of each (lat, lon) into the 1-dimensional index of the gridded data; that mapping is constant across the whole dataset. Then I query the ClickHouse table to find all the file+offset pairs that need to be processed, and all of them are queued into a multiprocessing pool.

Processing each parameter means parsing a grib file. I wrote a grib2 parser from scratch in Golang to extract the data in a streaming fashion. That is, I don't extract the whole grid only to look up the coordinates in it: since I already pre-computed the indexes, I can decode every value in the grid in order, and when I hit an index I'm looking for, copy it to a fixed-size buffer of extracted data. Once you have all the pre-computed indexes, you don't even need to finish downloading the file; I just drop the connection immediately.

It is pretty cool. It is running on very humble hardware, so I'm hoping I'll get some traction so I can throw more money at it. It should scale pretty linearly.

I've tested multi-year requests and the Golang program never goes over 80 MB of memory usage. The CPUs get pegged, so that is the limiting factor.

Grib2 complex packing (what the NBM dataset uses) involves lots of bit-packing, so there is a ton more to optimize with SIMD instructions. I've been toying with it a bit, but I don't want to mission-creep into that yet (fascinating though!).

I'm tempted to port this https://github.com/fast-pack/simdcomp to native go ASM.
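For readers unfamiliar with the bit-packing being discussed: grib2 stores each value in a fixed number of bits that is usually not a multiple of 8, so decoding means pulling n-bit integers out of a byte stream. Here is a scalar Python sketch of that inner loop (the simdcomp library does the same job many values at a time with SIMD, which is the optimization being considered):

```python
def unpack_bits(data: bytes, nbits: int, count: int):
    """Read `count` big-endian unsigned integers of `nbits` bits each
    from a packed byte string -- the scalar baseline that SIMD
    bit-unpacking libraries like simdcomp accelerate."""
    out, acc, have = [], 0, 0
    it = iter(data)
    for _ in range(count):
        # Refill the accumulator until it holds at least nbits bits.
        while have < nbits:
            acc = (acc << 8) | next(it)
            have += 8
        have -= nbits
        out.append(acc >> have)        # take the top nbits bits
        acc &= (1 << have) - 1         # keep the leftover low bits
    return out

# Three 5-bit values (10110, 01101, 11001) packed into two bytes.
print(unpack_bits(bytes([0b10110011, 0b01110010]), 5, 3))  # -> [22, 13, 25]
```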


That's pretty cool! Quite specific to this file format/workload, but this is an important enough problem that people might well be interested in a tailored solution like this :)


Thank you!

