• andrew_bidlaw
    1 year ago

    Wilkinson … has examined several data sets generated by earlier versions of the large language model, which he says lacked convincing elements when scrutinized, because they struggled to capture realistic relationships between variables.

    This revealed a mismatch in many ‘participants’ between designated sex and the sex that would typically be expected from their name. Furthermore, no correlation was found between preoperative and postoperative measures of vision capacity and the eye-imaging test. Wilkinson and Lu also inspected the distribution of numbers in some of the columns in the data set to check for non-random patterns. The eye-imaging values passed this test, but some of the participants’ age values clustered in a way that would be extremely unusual in a genuine data set: there was a disproportionate number of participants whose age values ended with 7 or 8.

    This has “it’s 2 am and the homework is due this morning” energy. It seems they were careless and probably thought their data wouldn’t be studied at all. Relationships between columns are where forgeries like these will always suffer. Getting them right takes a good understanding of the subject, and LLMs lack it unless a human explicitly guides them to take those relationships into account. Otherwise they invent their own, where post-op condition may depend on the patient’s last name and 7 and 8 are the most popular second digits for an age.
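For anyone curious what the last-digit check from the quote could look like, here is a rough sketch. It is not the method Wilkinson and Lu actually used (the article doesn't say); it's a plain chi-square statistic against a uniform spread of final digits. The function name and both age lists are made up for illustration, and with only 20 values the chi-square approximation is shaky, so treat the threshold as a hint and not a verdict.

```python
from collections import Counter

def last_digit_chi2(values):
    """Chi-square statistic comparing last-digit counts to a uniform spread.

    Assumes each final digit 0-9 should be roughly equally common,
    which is only an approximation for real ages.
    """
    counts = Counter(int(v) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical data, invented for this example only.
suspicious_ages = [27, 38, 47, 58, 67, 28, 37, 48, 57, 68,
                   77, 78, 47, 58, 37, 28, 67, 78, 57, 48]
plain_ages = [20, 31, 42, 53, 64, 75, 26, 37, 48, 59,
              60, 71, 32, 43, 54, 65, 76, 27, 38, 49]

# Chi-square critical value for 9 degrees of freedom at p = 0.05.
CRITICAL_9DF_05 = 16.919

for label, ages in [("suspicious", suspicious_ages), ("plain", plain_ages)]:
    stat = last_digit_chi2(ages)
    verdict = "looks non-random" if stat > CRITICAL_9DF_05 else "nothing unusual"
    print(f"{label}: chi2 = {stat:.1f} ({verdict})")
```

The first list has every age ending in 7 or 8, so the statistic blows well past the threshold; the second has each final digit exactly twice and scores zero. A similar per-column check says nothing about relationships between columns, which would need something like a correlation between the pre-op and post-op measures.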