The case:

  1. We have a dump of all messages from an individual on Facebook where they play it safe (thanks Zucky).
  2. And also a dump from an anonymous user who commits some petty offense like piracy or wrongthink.
  3. We suspect they are the same person.

Probable mind process:

  1. Analyze sentence structure, punctuation patterns, and mistakes, and compare them.
  2. Compare both against a larger corpus to see whether they share words marked as rare.
  3. Thematically tag what they usually write about.
  4. Combine these three signals (or more?) into a score.
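A minimal sketch of how steps 1 and 2 could work, assuming a simple character-trigram profile (a common stylometry feature that captures punctuation habits and recurring typos) compared with cosine similarity. All names and sample texts here are illustrative, not a real tool:

```python
from collections import Counter
import math

def trigram_profile(text: str) -> Counter:
    """Count character trigrams; these capture punctuation and typo habits."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Illustrative samples: the first two share quirks ("dont", "tbh", "idk", "...")
facebook_sample = "I dont think so, tbh... but maybe youre right, idk."
anon_sample = "I dont agree tbh... maybe thats true, idk."
unrelated = "The quarterly report shows strong revenue growth this year."

same_author = cosine_similarity(trigram_profile(facebook_sample),
                                trigram_profile(anon_sample))
diff_author = cosine_similarity(trigram_profile(facebook_sample),
                                trigram_profile(unrelated))
print(same_author > diff_author)
```

A real system would add rare-word overlap against a background corpus and topic tagging, then weight the scores; this only shows the surface-pattern part.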

Qs about the system:

  1. Would it be efficient to run these tests on one pair, or even on groups of anonymous and public users, to find correlations, like pulling c/Privacy against a Facebook group? For small, medium, and large amounts of computational power?
  2. Should we be worried this tool would arrive as a complement to the usual investigative analysis done by hand?
  3. Is there any sense in varying one's own patterns and behavior to fool something like this (assuming they don't have more solid data already)?

Qs about the application:

  1. Are we in for cheap LLM solutions that automate user matching and other ways to breach our privacy?
  2. Would it be possible to use them as an additional tool in investigations, or even as proof in itself, in EU or US courts?
  3. Would commercial companies be interested in scraping and matching for profit? Something dumb like calculating your insurance premium by matching you to depression forums and boards.

What do you think?

  • @[email protected]
    link
    fedilink
    English
    66 months ago

    There is a topic called stylometry that deals with this. Yes you are on to something. Getting definite matches can be hard (think of “Francis Bacon wrote Shakespeare’s plays”) but the AI can scan through huge datasets to find previously unsuspected similarities. :(