One of the most unforgivable things about reddit is how pathetic its search engine is, considering how much free, top-notch information is captured there; you need google +reddit to get at it. What can we do to make federated alternatives self-searchable?
This is going to be even worse than reddit search, unfortunately. There’s no easy way to make a search like this scale, even for the small number of instances we know about. Considering there are tons of instances out there and there will probably be more in the future, these problems are going to crop up a lot more. It’s actually much easier to search in one centralized location, regardless of how the reddit search ended up being implemented.
Discord, for example, means all useful information is captured by discord, never to be searched by plebs. IRC is usually ephemeral. Most web search has been diluted by SEO and content farms to the point of uselessness. Perhaps we can think about next gen search right now. A point of hope is things like gigabrain which, it would seem, use LLMs to ‘cut through the noise’ and also summarize and collate; that seems like a useful way forward if distributed. Happy to look into it myself, but would like to hear others’ input. (Pleasantly, people were commenting before I finished.)
Not sure how to deal with this, but I believe I am a competent coder with ideas. Perhaps this is an inappropriate community for this; happy to move the question.
Don’t know how to help, but I agree on how important search is, and it might be even harder to do given federation.
Also upvote for firefly user name
Eventually I hope lemmy.directory will be great for this purpose. It’s a Lemmy instance configured to pick up every Lemmy community it can find.
Simplest implementation is that an instance searches its own content while sending requests to federated instances and merging their results in with its own based on whatever method the instance admins want (whether it puts its own results at the top, or treats them as one set, or whatever). That could cause a lot of traffic and has a load of latency while your search spreads out hop by hop, to the instances that yours is federated with, to the ones they’re federated with, etc. Plus you’d need a mechanism to stop instances from sending a search to an instance that’s already got it, to avoid hammering instances that have multiple federation paths to yours. Not an easy problem.
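To make the duplicate-suppression part concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `Instance` class, the `seen_queries` set, the per-query ID and the `max_hops` limit are made-up names, not anything from Lemmy's codebase, and real instances would be talking over the network rather than calling each other directly.

```python
import uuid


class Instance:
    """Toy in-memory stand-in for a federated instance (hypothetical, not Lemmy's API)."""

    def __init__(self, name, posts):
        self.name = name
        self.posts = posts          # post titles hosted on this instance
        self.peers = []             # instances this one federates with
        self.seen_queries = set()   # IDs of queries already handled here

    def search(self, term, query_id=None, max_hops=3):
        # The originating instance mints the query ID; peers pass it along unchanged.
        query_id = query_id or uuid.uuid4().hex
        if query_id in self.seen_queries:
            # Already answered via another federation path, so stay quiet.
            return []
        self.seen_queries.add(query_id)

        # Local matches come first, then whatever the peers send back.
        results = [(self.name, p) for p in self.posts if term.lower() in p.lower()]
        if max_hops > 0:
            for peer in self.peers:
                results.extend(peer.search(term, query_id, max_hops - 1))
        return results
```

With three instances that all federate with each other, each one answers a given query only once, even though it is reachable by two paths. The hop limit is one crude way to cap the traffic and latency; putting local results first is just one of the merge policies an admin might pick.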
You might be able to do some kind of index publication where an instance publishes the most notable posts for other instances to include in their indexes, so that when you search it could show you results from among hot posts elsewhere in the fediverse - not an exhaustive list, but a search within posts that are getting attention.
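A minimal sketch of that index publication idea, again in Python and again with made-up names: `Post`, `publish_digest` and `search_index` are hypothetical, and `score` is an assumed stand-in for whatever "getting attention" ends up meaning (votes, comments, views).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    instance: str
    title: str
    score: int  # assumed activity measure, e.g. votes plus comments


def publish_digest(posts, limit=2):
    """Return the `limit` highest-scoring posts an instance would advertise to peers."""
    return sorted(posts, key=lambda p: p.score, reverse=True)[:limit]


def search_index(local_posts, remote_digests, term):
    """Search local posts in full, plus only the posts that peers chose to publish."""
    candidates = list(local_posts)
    for digest in remote_digests:
        candidates.extend(digest)
    term = term.lower()
    return [p for p in candidates if term in p.title.lower()]
```

The trade-off is visible in the sketch: a search never leaves the local instance, so there is no fan-out traffic, but a remote post that did not make its home instance's digest cannot be found this way.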
There’s also other stuff I’d be tempted to experiment with, like using some kind of TF-IDF ranking to choose what counts as “most notable”, rather than just activity or view count, so that posts that are particularly relevant to certain topics could be publicised. An instance could even choose to filter that, so for example an instance that chooses to focus on tech topics could publicise highly-relevant tech posts but filter out politics keywords even when a post gets high relevance scores, so that political discussion on that instance is less visible, even when searched for.
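Here is roughly what the TF-IDF selection plus keyword filter could look like, as a standard-library Python sketch. The function names, the `topic_terms` and `blocked_terms` parameters and the naive tokenizer are all assumptions for illustration; a real implementation would likely lean on a proper search library and a better notion of what an instance's topics are.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?:;()\"'"


def tokenize(text):
    """Lowercase, whitespace-split tokenizer that strips surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tfidf_scores(posts, topic_terms):
    """Score each post by the summed TF-IDF weight of the topic terms it contains."""
    docs = [tokenize(p) for p in posts]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per post

    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = 0.0
        for term in topic_terms:
            if counts[term]:
                tf = counts[term] / len(doc)
                # The +1 keeps a term that appears in every post from scoring zero.
                idf = math.log(len(docs) / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    return scores


def notable_posts(posts, topic_terms, blocked_terms, limit=3):
    """Pick the most topic-relevant posts, dropping any that contain a blocked keyword."""
    blocked = set(blocked_terms)
    scored = []
    for post, score in zip(posts, tfidf_scores(posts, topic_terms)):
        if score > 0 and not blocked & set(tokenize(post)):
            scored.append((score, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:limit]]
```

The filter runs after scoring, so a post can be highly relevant and still be withheld from publication, which is the behaviour described above for a tech-focused instance that does not want to advertise its political threads.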
Thank you for applying solid thought. What there would you consider actionable? As in, could likely be coded (for free).
Any of that could be done; there are some parts that are more challenging, but there are certainly harder things that have been solved by open-source software. I know almost nothing about how Lemmy’s innards are built though, so I couldn’t hazard a guess as to how much effort any of it would take. Some of it could possibly be achieved through separate services that you could host alongside a Lemmy instance, or entirely on their own, while other parts would really work best as features within Lemmy’s own codebase.