
Potentially dumb question, but why does “who viewed my profile” (the feature mentioned in the post as the original use case at LinkedIn) require a realtime OLAP datastore anyway?


That's a great question. Here's a bit of history on Pinot, from when it was invented at LinkedIn, before it entered ASF incubation as "Apache Pinot."

LinkedIn had originally been using traditional OLTP systems to power the "Who Viewed My Profile" app, and those simply hit a wall. So LinkedIn looked at the other analytics databases available at the time. Most couldn't handle the high QPS that LinkedIn, as a social media platform, readily anticipated. This is why they defined the category as "user-facing, real-time analytics."

[This is terminology mostly specific to the Apache Pinot crowd, though I see that StarRocks / CelerData also recently started talking about user-facing analytics. So I wrote up an article explaining it here:

https://startree.ai/resources/what-is-user-facing-analytics ]

Other similar extant systems LinkedIn benchmarked at the time just couldn't give them the numbers they needed at the low latency, high concurrency and large scale they anticipated — terabytes to petabytes of data. So they wrote their own solution.

Pinot was originally intended for marketing purposes to capture live intent & action data.

The same Pinot infrastructure eventually grew to other real-time use cases. "Who Viewed My Profile" was followed by "Company Follow Analytics," then sales or recruiting, and even internal A/B testing.

[I wasn't at LinkedIn while this happened; this lore was passed down to me by others. Disclosure: yes I work at StarTree.

https://engineering.linkedin.com/analytics/real-time-analyti... ]


LinkedIn is just another social network, only more work-related. When you post something new, you want to see how many likes and comments it gets, just like on Instagram. "Who viewed my profile" matters on LinkedIn. It can be someone who is hiring, someone who may watch your talk, someone who may buy your product. I personally have started some business conversations after realizing someone viewed my profile, or at least added them as new connections.


Sure, I understand why you’d want real-time. I guess the OLAP part is the one I’m not sure about here.

Most of these use cases seem more “pull all data for a specific user/post/company” and less “do analytical queries on millions+ of column values.”
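
To make the distinction concrete, here is a toy Python sketch of the two query shapes. The event data and function names are made up for illustration; this is not how LinkedIn or Pinot actually stores or queries anything.

```python
from collections import Counter

# Hypothetical profile-view events: (viewed_profile, viewer, viewer_company)
events = [
    ("alice", "carol", "Acme"),
    ("alice", "dave", "Acme"),
    ("alice", "erin", "Initech"),
    ("bob", "carol", "Acme"),
]

def views_for(profile):
    """Lookup style: pull every raw event for one profile."""
    return [e for e in events if e[0] == profile]

def viewers_by_company(profile):
    """Analytical style: group one profile's viewers by company and count them."""
    return Counter(company for p, _, company in events if p == profile)

print(views_for("alice"))
print(viewers_by_company("alice"))
```

The first function is the "pull all data for a specific user" shape; the second is a small group-by aggregation over the same rows.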


Got it. To that end, Apache Pinot has a special index, called the star-tree index, that allows certain dimensions to be drilled down on more granularly than others. It's part of what makes Pinot so fast.

https://engineering.linkedin.com/blog/2019/06/star-tree-inde...
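
As a rough illustration of the general idea behind pre-aggregating over chosen dimensions, here is a toy Python sketch. It is not Pinot's actual star-tree implementation (which is a tree with configurable dimension order and split thresholds); the data, the `STAR` wildcard, and the function names are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "any value of this dimension"

def build_preaggregates(rows, dimensions, metric):
    """Sum `metric` for every combination of concrete-or-STAR dimension values."""
    totals = defaultdict(int)
    for row in rows:
        # Each row contributes to its own value and to STAR in every dimension.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            totals[key] += row[metric]
    return dict(totals)

def query(totals, dimensions, **filters):
    """Answer a filtered sum with one dictionary lookup instead of a scan."""
    key = tuple(filters.get(d, STAR) for d in dimensions)
    return totals.get(key, 0)

# Hypothetical pre-counted profile views.
rows = [
    {"profile": "alice", "viewer_company": "Acme", "views": 3},
    {"profile": "alice", "viewer_company": "Initech", "views": 1},
    {"profile": "bob", "viewer_company": "Acme", "views": 2},
]
DIMENSIONS = ("profile", "viewer_company")
totals = build_preaggregates(rows, DIMENSIONS, "views")

print(query(totals, DIMENSIONS, profile="alice"))        # 4
print(query(totals, DIMENSIONS, viewer_company="Acme"))  # 5
print(query(totals, DIMENSIONS))                         # 6
```

The trade-off in this sketch is storage for speed: every filter combination is materialized up front, so a query never touches the raw rows.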



