OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series

A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

  • stevedidWHAT@lemmy.world · 1 year ago

    It’s a complicated question that I’m not qualified to answer fully, but essentially there is some kind of baseline, either for how people write or for how GPT usually responds, and detectors can then work out which way a given text “leans”.
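    One simple way to make “which way the text leans” concrete (a toy illustration, not necessarily what any real detector does) is a log-likelihood ratio between two word-frequency baselines: one built from machine-written samples, one from human-written samples. The function names and the sample strings below are made up for illustration; real detectors use far larger corpora and much stronger language models than add-one-smoothed unigram counts.

```python
import math
from collections import Counter


def build_counts(texts):
    """Word counts for a baseline built from sample texts (hypothetical helper)."""
    return Counter(w for t in texts for w in t.lower().split())


def lean_score(text, machine_counts, human_counts):
    """Sum of log P_machine(w) - log P_human(w) over the words of `text`.

    Positive means the text leans toward the machine baseline, negative toward
    the human one. Add-one (Laplace) smoothing keeps unseen words from giving
    zero probability.
    """
    words = text.lower().split()
    vocab_size = len(set(machine_counts) | set(human_counts) | set(words))
    m_total = sum(machine_counts.values())
    h_total = sum(human_counts.values())
    score = 0.0
    for w in words:
        p_machine = (machine_counts[w] + 1) / (m_total + vocab_size)
        p_human = (human_counts[w] + 1) / (h_total + vocab_size)
        score += math.log(p_machine) - math.log(p_human)
    return score


# Made-up toy baselines, for illustration only.
machine = build_counts(["certainly here is a concise overview of the topic",
                        "certainly it is important to note the key points"])
human = build_counts(["lol no way that is wild",
                      "nah that is not how it works lol"])

print(lean_score("certainly it is important", machine, human))  # positive
print(lean_score("nah lol that is wild", machine, human))       # negative
```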

      • stevedidWHAT@lemmy.world · 1 year ago

        https://medium.com/@rxtang/the-science-of-detecting-llm-generated-texts-e816a14c18d

        But yes it is tho ;)

        “Existing detection methods can be roughly grouped into two categories: black-box detection and white-box detection, black-box detection methods are limited to API-level access to LLMs. They rely on collecting text samples from human and machine sources, respectively, to train a classification model that can be used to discriminate between LLM- and human-generated texts. An alternative is white-box detection, in this scenario, the detector has full access to the LLMs and can control the model’s generation behavior for traceability purposes. In practice, black-box detectors are commonly constructed by external entities, whereas white-box detection is generally carried out by LLM developers.”
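        The black-box approach in the quote only needs labelled text samples and a classifier. The article does not name a specific model, so the sketch below swaps in about the simplest one possible: a nearest-centroid classifier over bag-of-words vectors with cosine similarity. The helper names and the toy training samples are assumptions for illustration; practical detectors use much richer features and trained neural classifiers.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts for one text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(samples):
    """samples: list of (text, label). Returns label -> summed word counts.

    Summing instead of averaging is fine here because cosine similarity
    ignores vector length, so the sum points the same way as the mean.
    """
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def classify(text, centroids):
    """Pick the label whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Made-up labelled samples standing in for "human" and "machine" sources.
training = [
    ("certainly here is a concise overview of the topic", "machine"),
    ("certainly it is important to note the key points", "machine"),
    ("lol no way that is wild", "human"),
    ("nah that is not how it works lol", "human"),
]
centroids = train_centroids(training)
print(classify("certainly it is important", centroids))  # machine
print(classify("nah lol that is wild", centroids))       # human
```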

      • stevedidWHAT@lemmy.world · 1 year ago

        Are you going to bother explaining, or are you just here to be contrarian and pedantic?

        If you are going to explain, please do; I love learning new things! Otherwise, I’m not really interested, and my general overarching point stands.

        From my understanding, training a model on a dataset of real-world password permutations would make guessing those passwords significantly faster (by a large factor) than a full brute-force search.
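        The comment gives no specifics, so here is a minimal sketch of the general idea under stated assumptions: compare how many guesses an exhaustive search needs against trying candidates most-frequent-first, as learned from a training list. A frequency-ranked wordlist is the simplest data-driven “model” and is not an LLM; real cracking tools use richer models. The function names, the alphabet, and the leaked list are all hypothetical.

```python
from collections import Counter


def brute_force_rank(target, alphabet):
    """Guess number at which exhaustive search finds `target`, trying all
    strings of length 1, 2, ... in lexicographic order over `alphabet`.
    Raises ValueError if `target` uses a character outside `alphabet`."""
    base = len(alphabet)
    # Every shorter string is tried before any string of the target's length.
    shorter = sum(base ** length for length in range(1, len(target)))
    # Position among same-length strings: read the target as a base-`base` number.
    index = 0
    for ch in target:
        index = index * base + alphabet.index(ch)
    return shorter + index + 1


def wordlist_rank(target, training_passwords):
    """Guess number when candidates are tried most-frequent-first (ties broken
    alphabetically); None if the target never appears in the training data."""
    counts = Counter(training_passwords)
    ordered = sorted(counts, key=lambda pw: (-counts[pw], pw))
    return ordered.index(target) + 1 if target in counts else None


alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
# Hypothetical "leaked" training list, for illustration only.
leaked = ["123456", "password", "123456", "qwerty",
          "password", "123456", "letmein"]

print(wordlist_rank("qwerty", leaked))        # 4
print(brute_force_rank("qwerty", alphabet))   # over 62 million guesses
```

        The trade-off is coverage: the wordlist finds common passwords almost instantly but returns None for anything it never saw, while brute force is guaranteed to succeed eventually.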

        • redw04@lemmy.ca · 1 year ago

          Nah, you can go research it yourself and stop spreading misinformation online. Just because you don’t understand how something works doesn’t give you the right to be angry at it.

          • stevedidWHAT@lemmy.world · 1 year ago

            I don’t need to; I use models to do this stuff regularly lmao.

            You have a misunderstanding, which is apparent from your inability to actually explain. This type of response is also commonly used by trolls who just say “no, you” over and over.

            Modeling predictive text based on a data set is literally LLM basics lmao.
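            As a rough illustration of “modeling predictive text based on a data set”, here is a bigram next-word predictor: count which word follows which in a training text, then predict the most frequent successor. This is a tiny, much simpler relative of what LLMs do (they use neural networks over long contexts, not raw counts of word pairs); the helper names and the corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """For each word, count which words follow it in the training text."""
    following = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Most frequent successor of `word`, or None if it was never followed by anything."""
    counts = following.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]


model = train_bigrams("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # cat
print(predict_next(model, "dog"))  # None
```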