  • If you’re somewhere in the world that has a TJ Maxx/TK Maxx or similar, go buy their random products that are on sale. Not all are winners, but if you change up your products and just experiment, you’ll find something you like.

    I have long wavy hair, and right now I’m on a Shea Moisture curl and shine kick, but before then it was the Verb Ghost line of products for a long time.
    Don’t sleep on after-shower crap, either. My hair has been really dry lately, so I’ve been using a leave-in conditioner by Shea, too (now discontinued, sadly). Also in the rotation are the Verb Ghost Oil and some random peptide leave-in. JVN (Jonathan Van Ness) also has some excellent products, but we haven’t found them on sale in a while.

    I don’t use all the after-shower products at once, but each has its use. Once you get a feel for what you’re going for, it’s like having a shelf full of tools.
    And if you’ve got a beard, well… use something and tell me if you figure out what works, because I still can’t figure that out. My hair looks great and my beard looks like it got lost in the desert.

  • You say “Not even close.” in response to the suggestion that Apple’s research can be used to improve benchmarks for AI performance, but then later say the article talks about how we might need different approaches to achieve reasoning.

    Now, mind you: you can only tell whether a model is genuinely reasoning if you can measure it reliably, and reliable measurement requires good benchmarks.

    Not to belabor the point, but here’s what the article and the study say:

    The article talks at length about the reliance on a standardized set of questions - GSM8K, and how the questions themselves may have made their way into the training data. It notes that modifying the questions dynamically leads to decreases in performance of the tested models, even if the complexity of the problem to be solved has not gone up.

    The third sentence of the paper’s abstract says this: “While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.” The rest of the abstract goes on to argue (paraphrased in layman’s terms) that LLMs are ‘studying for the test’ rather than achieving real reasoning capabilities.

    By presenting their methodology - dynamically varying the evaluation questions to reduce data contamination and requiring models to ignore red herrings - the Apple researchers are offering a possible way benchmarking can be improved (a toy sketch of the idea is below).
    Which is what the person you replied to stated.

    The commenter is fairly close, it seems.
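
    For the curious, here’s a minimal sketch (my own toy illustration, not the paper’s actual code) of what dynamically generated benchmark questions look like: each question is instantiated from a template with fresh names and numbers, so the exact instance can’t have leaked into training data, and a red-herring clause can optionally be appended.

```python
import random

# Hypothetical GSM8K-style question template. Names and numbers are
# re-sampled on every call, so the exact string can't appear verbatim
# in any training set (the core idea behind the paper's approach).
NAMES = ["Ava", "Noah", "Mia", "Liam"]

def make_question(with_red_herring: bool = False) -> tuple[str, int]:
    name = random.choice(NAMES)
    apples = random.randint(3, 20)
    eaten = random.randint(1, apples - 1)
    question = f"{name} has {apples} apples and eats {eaten}."
    if with_red_herring:
        # Irrelevant detail the model must ignore.
        question += f" {name}'s basket is red."
    question += f" How many apples does {name} have left?"
    return question, apples - eaten

def evaluate(model_answer_fn, n: int = 100, with_red_herring: bool = False) -> float:
    """Score a model callable on n freshly generated questions."""
    correct = 0
    for _ in range(n):
        q, expected = make_question(with_red_herring)
        if model_answer_fn(q) == expected:
            correct += 1
    return correct / n

if __name__ == "__main__":
    import re

    # Stand-in "model" that actually does the arithmetic.
    def toy_model(q: str) -> int:
        nums = [int(x) for x in re.findall(r"\d+", q)]
        return nums[0] - nums[1]

    print(evaluate(toy_model))                        # 1.0
    print(evaluate(toy_model, with_red_herring=True)) # 1.0
```

    A toy solver passes either way; the paper’s finding is that real LLMs lose accuracy under exactly these perturbations.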

  • Honestly kind of excited for the company blogs to start spitting out their disaster-recovery and crisis-management stories.

    I mean - this is just a giant test of disaster-recovery and crisis-management plans. And while there are absolutely real-world consequences, the fix almost seems scriptable.

    If a company uses IPMI (Intel’s equivalent is branded AMT, part of its vPro platform), and their network is intact and the affected devices are reachable on it, they ought to be able to remotely address this - see the sketch below.
    But that’s obviously predicated on them having already deployed/configured the tools.
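
    To make “almost scriptable” concrete, here’s a rough sketch of the remote-remediation idea. It assumes ipmitool is installed, each affected machine has a reachable BMC, and a network-boot image that applies the fix is already staged; the host list, credentials, and that PXE image are all my assumptions for illustration, not anything confirmed about the actual incident.

```python
import subprocess

# Hypothetical inventory: BMC address and credentials per affected machine.
# In reality these would come from an asset database, not a hardcoded list.
HOSTS = [
    {"bmc": "10.0.0.101", "user": "admin", "password": "changeme"},
    {"bmc": "10.0.0.102", "user": "admin", "password": "changeme"},
]

def ipmi(host: dict, *args: str) -> None:
    """Run one ipmitool command against a machine's BMC over the LAN."""
    subprocess.run(
        ["ipmitool", "-I", "lanplus",
         "-H", host["bmc"], "-U", host["user"], "-P", host["password"],
         *args],
        check=True,
    )

for host in HOSTS:
    # Point the next boot at PXE, where (in this sketch) a network image
    # that applies the fix is waiting, then power-cycle the box.
    ipmi(host, "chassis", "bootdev", "pxe")
    ipmi(host, "chassis", "power", "cycle")
```

    The script is the easy part; as noted above, it only works if the BMCs were deployed, networked, and credentialed before the bad day.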