This is super super interesting. While I’m skeptical that many people actually want to see the content they claim to value, I reckon there are definitely large (and monied) cohorts who would value this—especially sitting atop Google. I would instabuy access for eg $10/mo to a few of these if they seemed promising. Even just filtering all the SEO-gamed crap out of my SERP would be worth that imo, which presumably you could do by weighting heavily against common formatting techniques, keyword stuffing, etc.
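To make that concrete, here's a toy sketch of the kind of re-scoring I mean (the penalty heuristics and thresholds are made up, just to show the shape of it):

```python
# Toy re-ranker: down-weight results that look SEO-gamed.
# The penalties below are illustrative guesses, not tuned values.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    body: str
    base_score: float  # relevance score from the upstream search engine

def keyword_stuffing_penalty(text: str) -> float:
    """Penalty grows as a handful of tokens start to dominate the document."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    if len(tokens) < 50:
        return 0.0
    top_share = Counter(tokens).most_common(1)[0][1] / len(tokens)
    return max(0.0, top_share - 0.05) * 10  # >5% repetition starts to hurt

def listicle_penalty(title: str) -> float:
    """Crude flag for '11 Best X in 2024'-style titles."""
    words = title.lower().split()
    return 1.0 if words and words[0].isdigit() and "best" in words else 0.0

def rerank(results: list[Result]) -> list[Result]:
    def score(r: Result) -> float:
        return r.base_score - keyword_stuffing_penalty(r.body) - listicle_penalty(r.title)
    return sorted(results, key=score, reverse=True)
```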
I also worked on feed ranking at FB - it was interesting playing around with rankmagic. Aside from all of the larger & exciting use cases (which I have a lot of thoughts on), I think even just being able to build up better intuition around what type of content gets distributed when you change the weights of various signals is useful. Otherwise the feedback loop is very long: you run an experiment and analyze the results at an aggregate content level, versus just tuning a knob and seeing what happens.
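For anyone who hasn't sat next to a ranker, the knob-tuning point is basically this (a minimal sketch with invented signal names and weights, nothing like the real system):

```python
# Minimal weighted-sum ranker: each item carries per-signal scores, and you
# re-rank by adjusting the signal weights to see what kind of content surfaces.
items = [
    {"id": "longform_essay",     "signals": {"p_click": 0.10, "p_comment": 0.30, "dwell": 0.90}},
    {"id": "meme_repost",        "signals": {"p_click": 0.80, "p_comment": 0.05, "dwell": 0.10}},
    {"id": "friend_life_update", "signals": {"p_click": 0.40, "p_comment": 0.60, "dwell": 0.50}},
]

def rank(items, weights):
    score = lambda it: sum(weights[s] * v for s, v in it["signals"].items())
    return [it["id"] for it in sorted(items, key=score, reverse=True)]

# Turn the knobs and watch the ordering change immediately:
print(rank(items, {"p_click": 1.0, "p_comment": 0.5, "dwell": 0.1}))  # clickbait rises to the top
print(rank(items, {"p_click": 0.1, "p_comment": 0.5, "dwell": 1.0}))  # longform rises to the top
```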
Curious what your thoughts are on continuous adaptation of self-hosted models via distillation from an LLM? My worry is that the classifier we build with $1,000 of LLM generation won't be robust to future shifts in the data distribution.
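Concretely, what I'm imagining is something like the loop below (placeholder function names for the LLM labeling call, the training loop, and the data store), where you periodically re-label a slice of fresh traffic with the LLM and retrain the small model so it tracks drift instead of staying frozen at the original $1,000 batch:

```python
# Sketch of periodic re-distillation to handle data drift.
# llm_label, train_classifier, and sample_recent_traffic are placeholders
# for whatever LLM API, training loop, and data store you actually use.

def llm_label(texts):           # placeholder: call the LLM with the labeling prompt
    ...

def train_classifier(dataset):  # placeholder: fine-tune the small self-hosted model
    ...

def sample_recent_traffic(n):   # placeholder: pull fresh, unlabeled production data
    ...

labeled = []  # seeded with the original LLM-labeled batch

def refresh(budget_examples=1_000):
    """Run on a schedule (e.g. weekly): label a slice of new data and retrain."""
    fresh = sample_recent_traffic(budget_examples)
    labeled.extend(zip(fresh, llm_label(fresh)))
    # Optionally cap the dataset so old examples age out as the distribution moves.
    window = labeled[-50_000:]
    return train_classifier(window)
```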
wow...amazing post Rob! Very generative for ideas.
Loved how you explained the current lengthy process to productionise a model. I am linking this in my upcoming Substack post to drive home the point of how LLMs are reducing time-to-prod. Draft preview - https://pakodas.substack.com/p/b8e847bd-3375-48cd-bc4d-cddc41620dd9
This post opened up my eyes about a whole new way to approach label generation, thank you Rob!