After a Trump-era expansion of DNA collection, immigration officials are apparently funneling 1 million samples a year into the national crime database.
You can anonymize anything: you just strip the identifying data out and leave it out when you give the info out. Like scrubbing the metadata off a pic before you post it online.
The level of discourse in this sub has really fallen off a cliff recently…
So, paternity tests exist. Genetics, once enough data are collected, can absolutely be used to identify people.
Correct. Much like with a fingerprint, in both cases you can determine with good accuracy whether two samples are the same.
In addition, as tools advance, very large data sets can be used to identify you even if your specific code has never been seen, by tracing commonalities through any relatives of yours who voluntarily or involuntarily submitted their codes. However, this does still require those codes to be identifiable. A randomized set of random people’s codes could not be used for this, any more than a database of fingerprints where all the labels were deleted could be. It’s just a bunch of random fingerprints with no names attached anywhere at all; it’s just a bunch of bleh.
So it’s a big concern in the hands of anyone who has not scrubbed all the labels off, and scrubbing them would make the database much less useful overall.
You can’t anonymize genetic data because, by its very nature, it identifies an individual.
And, given enough samples, even people NOT in the database become identifiable anyway through genetic relation.
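To make the relatives point concrete, here is a toy sketch. Everything in it is an assumption for illustration: the genotypes are random made-up biallelic markers, and `random_person`, `child_of`, `similarity`, and `best_match` are hypothetical helpers, nothing like a real forensic pipeline. It only shows the idea that a sample with its name stripped off can still point to a labeled relative sitting in some other database.

```python
import random

def random_person(rng, n_markers):
    # Toy genotype: two alleles per marker, each coded 0 or 1 (frequency 0.5).
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_markers)]

def child_of(rng, mother, father):
    # The child inherits one allele from each parent at every marker.
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]

def similarity(a, b):
    # Mean allele sharing in [0, 1]; 1.0 means identical allele counts everywhere.
    total = 0
    for (a1, a2), (b1, b2) in zip(a, b):
        total += 2 - abs((a1 + a2) - (b1 + b2))
    return total / (2 * len(a))

def best_match(sample, labeled_db):
    # Return the label of the database entry most similar to the sample.
    return max(labeled_db, key=lambda name: similarity(sample, labeled_db[name]))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N = 2000                 # number of markers, chosen arbitrarily

mother = random_person(rng, N)
father = random_person(rng, N)
unlabeled_sample = child_of(rng, mother, father)  # "anonymized": no name attached

# A separate labeled database: 50 unrelated people plus the mother.
labeled_db = {f"stranger_{i}": random_person(rng, N) for i in range(50)}
labeled_db["mother"] = mother

closest = best_match(unlabeled_sample, labeled_db)
print(closest, round(similarity(unlabeled_sample, labeled_db[closest]), 3))
```

In this toy setup a parent and child share noticeably more alleles than two strangers do, so the nameless sample lands next to a named relative. Real kinship inference is far more involved, but the leak works the same way.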
By that logic you cannot anonymize a pic either. Yet not everyone who has their photo taken can be identified from it.
Anonymized data has long been problematic and you definitely cannot meaningfully anonymize a picture in the truest sense of the word.
Can’t show proof without doxxing myself, but I’ve written a patent to anonymize medical data (not genetic) and I’m a bioinformatician working with sequencing data.
While you could probably achieve reasonable privacy levels by altering genetic data, we shouldn’t play with that under false pretenses.
You can use that data for medical research, of course… but also population profiling or stratification of customers if you are an insurance company.
And very often, that data turns out not to be anonymous at all.
And a lot of “anonymized” data can actually be deanonymized again.
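A rough sketch of how that usually happens, assuming the "anonymized" release still carries a few ordinary fields (here ZIP code, birth year, and sex) that also appear in some named public list. All records, names, and the `reidentify` helper below are invented for illustration:

```python
def reidentify(records, roll, keys=("zip", "birth_year", "sex")):
    # Index the named list by the shared fields.
    index = {}
    for person in roll:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    # A record is re-identified when exactly one named person fits it.
    matches = {}
    for i, rec in enumerate(records):
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:
            matches[i] = names[0]
    return matches

# "Anonymized" release: names removed, other fields left in place.
anonymized_records = [
    {"zip": "02138", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02140", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public list that includes names.
public_roll = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1950, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
]

matches = reidentify(anonymized_records, public_roll)
print(matches)
```

Only the record with a unique combination of fields gets a name back; the one that fits two people and the one that fits nobody stay ambiguous. The rarer the combination, the less stripping the name protects you.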
True. Another guy also pointed out a pic cannot be truly, fully anonymized, and this is also true. Plenty of nuance in the issue for sure. But it is still, nonetheless, possible to render a large genetic database harmless, if proper precautions are taken.
edit: Harmless was a poor word. I really just meant it can be rendered useless for identifying specific individuals, and only really able to provide info on broad population trends. That’s if proper precautions are taken, which currently they are not. So, that’s not good.