• 0 Posts
  • 13 Comments
Joined 1 year ago
cake
Cake day: July 9th, 2023

help-circle
  • This ignores the first part of my response - if I, as a legitimate user, might get caught up in one of these trees, either by mistakenly approving a bot, or approving a user who approves a bot, and I risk losing my account if this happens, what is my incentive to approve anyone?

    Additionally, let’s assume I’m a really dumb bot creator, and I keep all of my bots in the same tree. I don’t bother to maintain a few legitimate accounts, and I don’t bother to have random users approve some of the bots. If my entire tree gets nuked, it’s still only a few weeks until I’m back at full force.

    With a very slightly smarter bot creator, you also won’t have a nice tree:

    As a new user looking for an approver, how do I know I’m not requesting (or otherwise getting) approved by a bot? To appear legitimate, they would be incentivized to approve legitimate users, in addition to bots.

    A reasonably intelligent bot creator would have several accounts they directly control and use legitimately (this keeps their foot in the door), would mix reaching out to random users for approval with having bots approve bots, and would approve legitimate users in addition to bots. The tree ends up as much more of a tangled graph.


  • This ignores the first part of my response - if I, as a legitimate user, might get caught up in one of these trees, either by mistakenly approving a bot, or approving a user who approves a bot, and I risk losing my account if this happens, what is my incentive to approve anyone?

    Additionally, let’s assume I’m a really dumb bot creator, and I keep all of my bots in the same tree. I don’t bother to maintain a few legitimate accounts, and I don’t bother to have random users approve some of the bots. If my entire tree gets nuked, it’s still only a few weeks until I’m back at full force.

    With a very slightly smarter bot creator, you also won’t have a nice tree:

    As a new user looking for an approver, how do I know I’m not requesting (or otherwise getting) approved by a bot? To appear legitimate, they would be incentivized to approve legitimate users, in addition to bots.

    A reasonably intelligent bot creator would have several accounts they directly control and use legitimately (this keeps their foot in the door), would mix reaching out to random users for approval with having bots approve bots, and would approve legitimate users in addition to bots. The tree ends up as much more of a tangled graph.


  • I think this would be too limiting for humans, and not effective for bots.

    As a human, unless you know the person in real life, what’s the incentive to approve them, if there’s a chance you could be banned for their bad behavior?

    As a bot creator, you can still achieve exponential growth - every time you create a new bot, you have a new approver, so you go from 1 -> 2 -> 4 -> 8. Even if, on average, you had to wait a week between approvals, in 25 weeks (less that half a year), you could have over 33 million accounts. Even if you play it safe, and don’t generate/approve the maximal accounts every week, you’d still have hundreds of thousands to millions in a matter of weeks.



  • This kind of reminds me of Crispin Glover, from Back to the Future. He tried to negotiate a higher pay for the second movie, so the producers hired a different actor to play the role, but deliberately made the actor up to look like Glover. In response, Glover sued the producers and won. It set a critical precedent for Hollywood, about using someone’s likeness without consent.

    The article mentions they reached out to her two days before the launch - if she had said ‘OK,’ there’s no way they could have even recorded what they needed from her, let alone trained the model in time for the presentation. So they must have had a Scarlett Johansson voice ready to go. Other than training the model on movies (really not ideal for a high quality voice model), how would they have gotten the recordings they needed?

    If they hired a “random” voice actress, they might not run into issues. But if at any point they had a job listing, a discussion with a talent manager, or anything else where they mentioned wanting a “Scarlett Johansson sound-alike,” they might have dug themselves a nice hole here.

    Specifically regarding your question about hiring a voice actor that sounds like someone else - this is commonly done to replace people for cartoons. I don’t think it’s an issue if you are playing a character. But if you deliberately impersonate a person, there might be some trouble.


  • I noted in another comment that SearXNG can’t do anything about the trackers that your browser can’t do, and solving this at the browser level is a much better solution, because it protects you everywhere, rather than just on the search engine.

    Routing over Tor is similar. Yes, you can route the search from your SearXNG instance to Google (or whatever upstream engine) over Tor, and hide your identity from Google. But then you click a link, and your IP connects to the IP of whatever site the results link to, and your ISP sees that. Knowing where you land can tell your ISP a lot about what you searched for. And the site you connected to knows your IP, so they get even more information - they know every action you took on the site, and everything you viewed. If you want to protect all of that, you should just use Tor on your computer, and protect every connection.

    This is the same argument for using Signal vs WhatsApp - yes, in WhatsApp the conversation may be E2E encrypted, but the metadata about who you’re chatting with, for how long, etc is all still very valuable to Meta.

    To reiterate/clarify what I’ve said elsewhere, I’m not making the case that people shouldn’t use SearXNG at all, only that their privacy claims are overstated, and if your goal is privacy, all the levels of security you would apply to SearXNG should be applied at your device level: Use a browser/extension to block trackers, use Tor to protect all your traffic, etc.


  • They are explicitly trying to move away from Google, and are looking for a new option because their current solution is forcing them to turn off ad-blocking. Sounds to me like they are looking for a private option. Plus, given the forum in which we are having the discussion (Lemmy), even if OP is not specifically concerned with privacy, it seems likely other users are.

    As for cookies, searxng can’t do any more than your browser (possibly with extensions) can do, and relying on your browser here is a much better solution, because it protects you on all sites, rather than just on your chosen search engine.

    “Trash mountain” results is a whole separate issue - you can certainly tune the results to your liking. But literally the second sentence of their GitHub headline is touting no tracking or profiling, so it seems worth bringing attention to the limitations, and that’s all I’m trying to do here.



  • It looks like a few people are recommending this, so just a quick note in case people are unaware:

    If you want to avoid being tracked, this is not a good solution. Searxng is a meta search engine, meaning it is effectively a proxy: you search on Searxng, it searches multiple sites and sends all the results back to you. If you use a public instance, you may be protected from the actual search engine*, because many people will use the same instance, and your queries will be mixed in with all of them. If you self host, however, all the searches will be your own - there is then no difference between using Searxng and just going to the site yourself.

    *The caveat with using the public instances is while you may be protected from the upstream engine, you have to trust the admins - nothing stops them from tracking you themselves (or passing your data on).

    Despite the claims in their docs, I would not consider this a privacy tool. If you are just looking for a good search engine, this may work, and it gives you flexibility and power to tune it yourself. But it’s probably not going to do anything good for your privacy, above and beyond what you can get from other meta search engines like Startpage and DuckDuckGo, or other “private” search engines like Brave.



  • it has its flaws.

    Yep yep. I was aware of some of what you pointed out - I think this might be a “perfect is the enemy of good” scenario, though. GitHub alone accounts for over 84% (based on the awesome-selfhosted-data repo):

    $ grep -r 'source_code_url' | cut -d ' ' -f 2 | cut -d '/' -f 3 | sort | uniq -c | sort -rn | head -n 15
       1068 github.com
         36 gitlab.com
          7 git.mills.io
          6 sourceforge.net
          6 framagit.org
          4 www.atlassian.com
          4 codeberg.org
          3 git.drupalcode.org
          3 git.cloudron.io
          2 repos.goffi.org
          2 git.tt-rss.org
          2 git.sr.ht
          2 cvsweb.openbsd.org
          1 yetishare.com
          1 www.wiz.cn
    
    $ python -c "print($(grep -r 'source_code_url' . | grep github.com | wc -l) / $(ls -1 | wc -l))"
    0.8422712933753943
    

    Adding in gitlab gets you to 87%:

    $ python -c "print($(grep -r 'source_code_url' . | grep -i -e github.com -e gitlab.com | wc -l) / $(ls -1 | wc -l))" 0.8706624605678234

    Also popularity != quality.

    True, but a thriving community generally means more resources, guides, etc, which can be important, especially for self-hosted solutions.

    In any case, the project is great, and much appreciated. Additionally, the enriched html version looks fantastic, and exposes most of the metadata* I’d want to see, regardless of how it’s sorted.

    *One other item to track, that I thought about after making my previous comment - number of contributors. It gives an additional data point on the size of the community, as well as an idea of how many people can be hit by busses before the continued development of the project gets called into question.


  • I would imagine the source for most projects is hosted on GitHub, or similar platforms? Perhaps you could consider forks, stars, and followers as “votes” and sort each sub category based on the votes. I would imagine that would be scriptable - the script could be included in the awesome list repo, and run periodically. It would be kind of interesting to tag “releases” and see how the sort order changes over time. If you wanted to get fancy, the sorting could probably happen as part of a CI task.

    If workable, the obvious benefit is you don’t have to exclude anything for subjective reasons, but it’s easier for readers of the list to quickly find the “most used” options.

    Just an idea off the top of my head. You may have already thought about it, and/or it may be full of holes.