Introduction
ARX as a Service is an Oslo Metropolitan University (OsloMet) bachelor thesis project completed in cooperation with the Norwegian Labour and Welfare Administration (NAV). The project aims to make ARX features and functionality available as a microservice. To deliver on this goal, the bachelor team created a web service implemented with Spring Boot that exposes ARX functionality as a RESTful API. Users can either use the companion packages and clients developed by the team to interact with the web service, or create their own clients.
Bachelor thesis website: https://oslomet-arx-as-a-service.github.io/
Python package: https://github.com/navikt/PyARXaaS
Web Service: https://github.com/navikt/ARXaaS
HTTP verbs
ARXaaS tries to adhere as closely as possible to standard HTTP and REST conventions in its use of HTTP verbs. The service uses the following HTTP verbs on its endpoints.
Verb | Usage |
---|---|
GET | Used to retrieve metrics and logging data |
POST | Used for requests to analyze, anonymize or generate generalization hierarchies |
HTTP status codes
ARXaaS tries to adhere as closely as possible to standard HTTP and REST conventions in its use of HTTP status codes.
Status code | Usage |
---|---|
200 OK | The request completed successfully |
400 Bad Request | The request was malformed. The response body will include an error providing further information |
404 Not Found | The requested resource did not exist |
Headers
Every response has the following header(s):
Name | Description |
---|---|
Content-Type | The Content-Type of the payload |
Resources
Index
The index provides the entry point into the service.
Accessing the index
A GET request is used to access the index.
HTTP response
HTTP/1.1 200 OK
Content-Type: application/hal+json
Content-Length: 170
{"_links":{"self":{"href":"http://localhost:8080/api"},"anonymize":{"href":"http://localhost:8080/api/anonymize"},"analyze":{"href":"http://localhost:8080/api/analyze"}}}
Links
Relation | Description |
---|---|
self | Link to the root resource |
anonymize | Link to the anonymize controller |
analyze | Link to the analyze controller |
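A client can discover the endpoints from the index instead of hard-coding them. The following is a minimal Python sketch that parses the index response body shown in this section and builds a relation-to-URL mapping. The helper name `link_map` is hypothetical and not part of the service or of PyARXaaS.

```python
import json

# Index response body copied from the example in this section.
INDEX_BODY = (
    '{"_links":{"self":{"href":"http://localhost:8080/api"},'
    '"anonymize":{"href":"http://localhost:8080/api/anonymize"},'
    '"analyze":{"href":"http://localhost:8080/api/analyze"}}}'
)


def link_map(index_body: str) -> dict:
    """Return a {relation: href} mapping from a HAL index document."""
    links = json.loads(index_body)["_links"]
    return {relation: target["href"] for relation, target in links.items()}


links = link_map(INDEX_BODY)
print(links["analyze"])
```

Looking up URLs by relation name keeps a client working if the service is deployed under a different host or base path.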
Analyze Controller
The Analyze controller is used to generate risk profiles for a dataset. It receives a request object containing the dataset to be analyzed and the attribute type list of the dataset, and returns a response object containing a risk profile that includes the re-identification risk and the distribution of risk in the dataset.
Generating a Risk profile
A POST request is used to generate a risk profile.
Request fields
Path | Type | Description |
---|---|---|
data | Array | Dataset to be analyzed |
attributes | Array | Attribute types of the dataset |
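The request body can also be assembled programmatically. The sketch below mirrors the field names used in the curl request of this section; `build_analyze_request` is a hypothetical helper, and setting `privacyModels` and `suppressionLimit` to null simply copies what that example sends.

```python
import json


def build_analyze_request(rows, attribute_types):
    """Build the JSON body for POST /api/analyze.

    rows: list of rows, where the first row holds the column names.
    attribute_types: {column name: attribute type model}.
    """
    header = rows[0]
    missing = [name for name in header if name not in attribute_types]
    if missing:
        raise ValueError(f"no attribute type given for: {missing}")
    return {
        "data": rows,
        "attributes": [
            {"field": name, "attributeTypeModel": attribute_types[name], "hierarchy": None}
            for name in header
        ],
        # Sent as null in the documented example request.
        "privacyModels": None,
        "suppressionLimit": None,
    }


rows = [["age", "gender", "zipcode"], ["34", "male", "81667"], ["35", "female", "81668"]]
body = build_analyze_request(
    rows,
    {"age": "IDENTIFYING", "gender": "SENSITIVE", "zipcode": "QUASIIDENTIFYING"},
)
payload = json.dumps(body)  # string to send as the request body
```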
Curl request
$ curl 'http://localhost:8080/api/analyze' -i -X POST \
-H 'Content-Type: application/json' \
-d '{
"data" : [ [ "age", "gender", "zipcode" ], [ "34", "male", "81667" ], [ "35", "female", "81668" ], [ "36", "male", "81669" ], [ "37", "female", "81670" ], [ "38", "male", "81671" ], [ "39", "female", "81672" ], [ "40", "male", "81673" ], [ "41", "female", "81674" ], [ "42", "male", "81675" ], [ "43", "female", "81676" ], [ "44", "male", "81677" ] ],
"attributes" : [ {
"field" : "age",
"attributeTypeModel" : "IDENTIFYING",
"hierarchy" : null
}, {
"field" : "gender",
"attributeTypeModel" : "SENSITIVE",
"hierarchy" : null
}, {
"field" : "zipcode",
"attributeTypeModel" : "QUASIIDENTIFYING",
"hierarchy" : null
} ],
"privacyModels" : null,
"suppressionLimit" : null
}'
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 4330
{
"reIdentificationRisk" : {
"measures" : {
"estimated_journalist_risk" : 1.0,
"records_affected_by_highest_prosecutor_risk" : 1.0,
"sample_uniques" : 1.0,
"lowest_risk" : 1.0,
"estimated_prosecutor_risk" : 1.0,
"highest_journalist_risk" : 1.0,
"records_affected_by_lowest_risk" : 1.0,
"average_prosecutor_risk" : 1.0,
"estimated_marketer_risk" : 1.0,
"highest_prosecutor_risk" : 1.0,
"records_affected_by_highest_journalist_risk" : 1.0,
"population_uniques" : 1.0
},
"attackerSuccessRate" : {
"successRates" : {
"Prosecutor_attacker_success_rate" : 1.0,
"Marketer_attacker_success_rate" : 1.0,
"Journalist_attacker_success_rate" : 1.0
}
},
"quasiIdentifiers" : [ "zipcode" ],
"populationModel" : "ZAYATZ"
},
"distributionOfRisk" : {
"riskIntervalList" : [ {
"interval" : "[50,100]",
"recordsWithRiskWithinInterval" : 1.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[33.4,50)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[25,33.4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[20,25)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[16.7,20)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[14.3,16.7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[12.5,14.3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[10,12.5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[9,10)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[8,9)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[7,8)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[6,7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[5,6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[4,5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[3,4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[2,3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1,2)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.1,1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.01,0.1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.001,0.01)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.0001,0.001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-5,0.0001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-6,1e-5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0,1e-6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
} ]
},
"attributeRisk" : {
"quasiIdentifierRiskList" : [ {
"identifier" : [ "zipcode" ],
"distinction" : 1.0,
"separation" : 1.0
} ]
}
}
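Most entries in the risk distribution are zero, so a client will usually want only the populated intervals. The following Python sketch works on a shortened excerpt of the response above; `populated_intervals` is a hypothetical helper, and reading the values as shares of records is an interpretation of the example rather than something this document states.

```python
# Shortened excerpt of the analyze response shown above.
RISK_PROFILE_EXCERPT = {
    "reIdentificationRisk": {
        "measures": {"highest_prosecutor_risk": 1.0, "lowest_risk": 1.0},
        "quasiIdentifiers": ["zipcode"],
    },
    "distributionOfRisk": {
        "riskIntervalList": [
            {"interval": "[50,100]", "recordsWithRiskWithinInterval": 1.0,
             "recordsWithMaximalRiskWithinInterval": 1.0},
            {"interval": "[33.4,50)", "recordsWithRiskWithinInterval": 0.0,
             "recordsWithMaximalRiskWithinInterval": 0.0},
        ]
    },
}


def populated_intervals(profile):
    """Return {interval: value} for intervals that contain any records."""
    entries = profile["distributionOfRisk"]["riskIntervalList"]
    return {
        entry["interval"]: entry["recordsWithRiskWithinInterval"]
        for entry in entries
        if entry["recordsWithRiskWithinInterval"] > 0
    }


populated = populated_intervals(RISK_PROFILE_EXCERPT)
print(populated)
```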
Anonymize Controller
The Anonymize controller is used to create a new dataset anonymized according to the provided privacy models and transformation models. The controller receives a request object containing the dataset to be anonymized, a list of attribute types containing transformation models (hierarchies), and privacy models. The controller returns a response object containing an anonymized dataset, a risk profile, and metadata for the anonymization process.
Creating an Anonymized Dataset
A POST request is used to create a new anonymized dataset.
Request fields
Path | Type | Description |
---|---|---|
data | Array | Dataset to be anonymized |
attributes | Array | Attribute types and transformation models to be applied to the dataset |
privacyModels | Array | Privacy models to be applied to the dataset |
suppressionLimit | Number | Suppression limit to be applied to the dataset |
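The `privacyModels` entries share one shape: a model name plus a `params` object. A minimal Python sketch that builds the two models used in the curl request of this section follows. `privacy_model` is a hypothetical helper, and converting parameter values to strings only mirrors how that example sends them.

```python
def privacy_model(name, **params):
    """Build one privacyModels entry; values are sent as strings, as in the example."""
    return {"privacyModel": name, "params": {key: str(value) for key, value in params.items()}}


privacy_models = [
    privacy_model("KANONYMITY", k=5),
    privacy_model("LDIVERSITY_DISTINCT", column_name="gender", l=2),
]

# Fragment to merge into the request body next to "data" and "attributes".
anonymize_fragment = {"privacyModels": privacy_models, "suppressionLimit": 0.02}
```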
Curl request
$ curl 'http://localhost:8080/api/anonymize' -i -X POST \
-H 'Content-Type: application/json' \
-d '{
"data" : [ [ "age", "gender", "zipcode" ], [ "34", "male", "81667" ], [ "35", "female", "81668" ], [ "36", "male", "81669" ], [ "37", "female", "81670" ], [ "38", "male", "81671" ], [ "39", "female", "81672" ], [ "40", "male", "81673" ], [ "41", "female", "81674" ], [ "42", "male", "81675" ], [ "43", "female", "81676" ], [ "44", "male", "81677" ] ],
"attributes" : [ {
"field" : "age",
"attributeTypeModel" : "IDENTIFYING",
"hierarchy" : null
}, {
"field" : "gender",
"attributeTypeModel" : "SENSITIVE",
"hierarchy" : null
}, {
"field" : "zipcode",
"attributeTypeModel" : "QUASIIDENTIFYING",
"hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
} ],
"privacyModels" : [ {
"privacyModel" : "KANONYMITY",
"params" : {
"k" : "5"
}
}, {
"privacyModel" : "LDIVERSITY_DISTINCT",
"params" : {
"column_name" : "gender",
"l" : "2"
}
} ],
"suppressionLimit" : 0.02
}'
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 7813
{
"anonymizeResult" : {
"data" : [ [ "age", "gender", "zipcode" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ] ],
"anonymizationStatus" : "ANONYMOUS",
"metrics" : {
"attributeGeneralization" : [ {
"name" : "zipcode",
"type" : "QUASI_IDENTIFYING_ATTRIBUTE",
"generalizationLevel" : 2
} ],
"processTimeMillisecounds" : 3,
"privacyModels" : [ {
"monotonicWithGeneralization" : true,
"k" : 5,
"minimalClassSize" : 5,
"requirements" : 1,
"riskThresholdJournalist" : 0.2,
"riskThresholdMarketer" : 0.2,
"riskThresholdProsecutor" : 0.2,
"localRecodingSupported" : true,
"minimalClassSizeAvailable" : true,
"dataSubset" : null,
"populationModel" : null,
"subset" : null,
"heuristicSearchSupported" : true,
"heuristicSearchWithTimeLimitSupported" : true,
"optimalSearchSupported" : true,
"monotonicWithSuppression" : true,
"sampleBased" : false,
"subsetAvailable" : false
}, {
"monotonicWithGeneralization" : true,
"attribute" : "gender",
"l" : 2.0,
"localRecodingSupported" : true,
"minimalClassSize" : 2,
"requirements" : 4,
"riskThresholdJournalist" : 0.5,
"riskThresholdMarketer" : 0.5,
"riskThresholdProsecutor" : 0.5,
"minimalClassSizeAvailable" : true,
"dataSubset" : null,
"populationModel" : null,
"subset" : null,
"heuristicSearchSupported" : true,
"heuristicSearchWithTimeLimitSupported" : true,
"optimalSearchSupported" : true,
"monotonicWithSuppression" : true,
"sampleBased" : false,
"subsetAvailable" : false
} ]
},
"attributes" : [ {
"field" : "age",
"attributeTypeModel" : "IDENTIFYING",
"hierarchy" : null
}, {
"field" : "gender",
"attributeTypeModel" : "SENSITIVE",
"hierarchy" : null
}, {
"field" : "zipcode",
"attributeTypeModel" : "QUASIIDENTIFYING",
"hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
} ]
},
"riskProfile" : {
"reIdentificationRisk" : {
"measures" : {
"estimated_journalist_risk" : 0.09090909090909091,
"records_affected_by_highest_prosecutor_risk" : 1.0,
"sample_uniques" : 0.0,
"lowest_risk" : 0.09090909090909091,
"estimated_prosecutor_risk" : 0.09090909090909091,
"highest_journalist_risk" : 0.09090909090909091,
"records_affected_by_lowest_risk" : 1.0,
"average_prosecutor_risk" : 0.09090909090909091,
"estimated_marketer_risk" : 0.09090909090909091,
"highest_prosecutor_risk" : 0.09090909090909091,
"records_affected_by_highest_journalist_risk" : 1.0,
"population_uniques" : 0.0
},
"attackerSuccessRate" : {
"successRates" : {
"Prosecutor_attacker_success_rate" : 0.09090909090909091,
"Marketer_attacker_success_rate" : 0.09090909090909091,
"Journalist_attacker_success_rate" : 0.09090909090909091
}
},
"quasiIdentifiers" : [ "zipcode" ],
"populationModel" : "DANKAR"
},
"distributionOfRisk" : {
"riskIntervalList" : [ {
"interval" : "[50,100]",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[33.4,50)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[25,33.4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[20,25)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[16.7,20)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[14.3,16.7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[12.5,14.3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[10,12.5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[9,10)",
"recordsWithRiskWithinInterval" : 1.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[8,9)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[7,8)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[6,7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[5,6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[4,5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[3,4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[2,3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1,2)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.1,1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.01,0.1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.001,0.01)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.0001,0.001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-5,0.0001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-6,1e-5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0,1e-6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
} ]
},
"attributeRisk" : {
"quasiIdentifierRiskList" : [ {
"identifier" : [ "zipcode" ],
"distinction" : 0.09090909090909091,
"separation" : 0.0
} ]
}
}
}
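The response above reports `generalizationLevel` 2 for `zipcode`, and every anonymized zipcode is `816**`, which is the value at index 2 of each hierarchy row. The Python sketch below reproduces that lookup on two rows of the example hierarchy. It illustrates how the reported level relates to the hierarchy and is not the ARX implementation; `generalize` is a hypothetical helper.

```python
# Two rows of the zipcode hierarchy from the example request.
ZIPCODE_HIERARCHY = [
    ["81667", "8166*", "816**", "81***", "8****", "*****"],
    ["81670", "8167*", "816**", "81***", "8****", "*****"],
]


def generalize(values, hierarchy, level):
    """Replace each value with its generalization at the given hierarchy level."""
    lookup = {row[0]: row[level] for row in hierarchy}
    return [lookup[value] for value in values]


generalized = generalize(["81667", "81670"], ZIPCODE_HIERARCHY, 2)
print(generalized)
```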
Formdata Analyze Controller
The Formdata Analyze controller is used to generate risk profiles for a dataset. The controller receives a MultipartFile object containing the dataset CSV file to be analyzed and a JSON object containing the attribute type list of the dataset. The controller returns a response object containing a risk profile that includes the re-identification risk and the distribution of risk in the dataset.
Generating a Risk profile
A POST request is used to generate a risk profile.
Request parts
Part | Description |
---|---|
file | Dataset CSV file to be analyzed |
metadata | JSON object containing the metadata for the attribute types and transformation models to be applied to the dataset |
Curl request
$ curl 'http://localhost:8080/api/analyze/file' -i -X POST \
-H 'Content-Type: multipart/form-data' \
-F 'file=@testDataset.csv;type=text/csv' \
-F 'metadata={"attributes":[{"field":"age","attributeTypeModel":"IDENTIFYING","hierarchy":null},{"field":"gender","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":0},{"field":"zipcode","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":1}],"privacyModels":[{"privacyModel":"KANONYMITY","params":{"k":5}}],"suppressionLimit":0.02};type=application/json'
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 4467
{
"reIdentificationRisk" : {
"measures" : {
"estimated_journalist_risk" : 1.0,
"records_affected_by_highest_prosecutor_risk" : 1.0,
"sample_uniques" : 1.0,
"lowest_risk" : 1.0,
"estimated_prosecutor_risk" : 1.0,
"highest_journalist_risk" : 1.0,
"records_affected_by_lowest_risk" : 1.0,
"average_prosecutor_risk" : 1.0,
"estimated_marketer_risk" : 1.0,
"highest_prosecutor_risk" : 1.0,
"records_affected_by_highest_journalist_risk" : 1.0,
"population_uniques" : 1.0
},
"attackerSuccessRate" : {
"successRates" : {
"Prosecutor_attacker_success_rate" : 1.0,
"Marketer_attacker_success_rate" : 1.0,
"Journalist_attacker_success_rate" : 1.0
}
},
"quasiIdentifiers" : [ "zipcode", "gender" ],
"populationModel" : "ZAYATZ"
},
"distributionOfRisk" : {
"riskIntervalList" : [ {
"interval" : "[50,100]",
"recordsWithRiskWithinInterval" : 1.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[33.4,50)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[25,33.4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[20,25)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[16.7,20)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[14.3,16.7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[12.5,14.3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[10,12.5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[9,10)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[8,9)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[7,8)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[6,7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[5,6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[4,5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[3,4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[2,3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1,2)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.1,1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.01,0.1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.001,0.01)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.0001,0.001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-5,0.0001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-6,1e-5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0,1e-6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
} ]
},
"attributeRisk" : {
"quasiIdentifierRiskList" : [ {
"identifier" : [ "zipcode" ],
"distinction" : 1.0,
"separation" : 1.0
}, {
"identifier" : [ "gender" ],
"distinction" : 0.18181818181818182,
"separation" : 0.5454545454545454
} ]
}
}
Formdata Anonymize Controller
The Formdata Anonymize controller is used to create a new dataset anonymized according to the provided privacy models and transformation models. The controller receives a MultipartFile object containing the dataset CSV file to be anonymized, a JSON object containing the attribute type list of the dataset and the privacy models, and a MultipartFile array containing the hierarchy CSV files. The controller returns a response object containing an anonymized dataset, a risk profile, and metadata for the anonymization process.
Creating an Anonymized Dataset
A POST request is used to create a new anonymized dataset.
Request parts
Part | Description |
---|---|
file | Dataset CSV file to be anonymized |
metadata | JSON object containing the metadata for the attribute types and transformation models to be applied to the dataset |
hierarchies | Hierarchy CSV files containing the transformation models |
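In the curl example for this endpoint, the `hierarchy` value of each attribute is an integer (0 for gender, 1 for zipcode) and the hierarchy files are uploaded in that same order. This suggests the integer is the position of the file within the `hierarchies` part, although this document does not state so explicitly. Under that assumption, the metadata part could be generated as in the following Python sketch, where `build_metadata` is a hypothetical helper.

```python
import json


def build_metadata(attribute_types, hierarchy_files, privacy_models, suppression_limit):
    """Return (metadata JSON string, hierarchy file names in upload order).

    attribute_types: list of (field, attribute type model) pairs.
    hierarchy_files: {field: CSV file name}; insertion order is the upload order.
    Assumption: "hierarchy" is the index of the file in the hierarchies part.
    """
    upload_order = list(hierarchy_files)
    attributes = [
        {
            "field": field,
            "attributeTypeModel": type_model,
            "hierarchy": upload_order.index(field) if field in hierarchy_files else None,
        }
        for field, type_model in attribute_types
    ]
    metadata = {
        "attributes": attributes,
        "privacyModels": privacy_models,
        "suppressionLimit": suppression_limit,
    }
    files = [hierarchy_files[field] for field in upload_order]
    return json.dumps(metadata, separators=(",", ":")), files


metadata_json, files = build_metadata(
    [("age", "IDENTIFYING"), ("gender", "QUASIIDENTIFYING"), ("zipcode", "QUASIIDENTIFYING")],
    {"gender": "testGenderHierarchy.csv", "zipcode": "testZipcodeHierarchy.csv"},
    [{"privacyModel": "KANONYMITY", "params": {"k": 5}}],
    0.02,
)
```

Deriving the index from the upload order keeps the metadata and the uploaded files consistent when attributes are added or reordered.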
Curl request
$ curl 'http://localhost:8080/api/anonymize/file' -i -X POST \
-H 'Content-Type: multipart/form-data' \
-F 'file=@testDataset.csv;type=text/csv' \
-F 'metadata={"attributes":[{"field":"age","attributeTypeModel":"IDENTIFYING","hierarchy":null},{"field":"gender","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":0},{"field":"zipcode","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":1}],"privacyModels":[{"privacyModel":"KANONYMITY","params":{"k":5}}],"suppressionLimit":0.02};type=application/json' \
-F 'hierarchies=@testGenderHierarchy.csv;type=text/csv' \
-F 'hierarchies=@testZipcodeHierarchy.csv;type=text/csv'
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 7438
{
"anonymizeResult" : {
"data" : [ [ "age", "gender", "zipcode" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ] ],
"anonymizationStatus" : "ANONYMOUS",
"metrics" : {
"attributeGeneralization" : [ {
"name" : "gender",
"type" : "QUASI_IDENTIFYING_ATTRIBUTE",
"generalizationLevel" : 0
}, {
"name" : "zipcode",
"type" : "QUASI_IDENTIFYING_ATTRIBUTE",
"generalizationLevel" : 2
} ],
"processTimeMillisecounds" : 2,
"privacyModels" : [ {
"monotonicWithGeneralization" : true,
"k" : 5,
"minimalClassSize" : 5,
"requirements" : 1,
"riskThresholdJournalist" : 0.2,
"riskThresholdMarketer" : 0.2,
"riskThresholdProsecutor" : 0.2,
"localRecodingSupported" : true,
"minimalClassSizeAvailable" : true,
"dataSubset" : null,
"populationModel" : null,
"subset" : null,
"heuristicSearchSupported" : true,
"heuristicSearchWithTimeLimitSupported" : true,
"optimalSearchSupported" : true,
"monotonicWithSuppression" : true,
"sampleBased" : false,
"subsetAvailable" : false
} ]
},
"attributes" : [ {
"field" : "age",
"attributeTypeModel" : "IDENTIFYING",
"hierarchy" : null
}, {
"field" : "gender",
"attributeTypeModel" : "QUASIIDENTIFYING",
"hierarchy" : [ [ "male", "*" ], [ "female", "*" ] ]
}, {
"field" : "zipcode",
"attributeTypeModel" : "QUASIIDENTIFYING",
"hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
} ]
},
"riskProfile" : {
"reIdentificationRisk" : {
"measures" : {
"estimated_journalist_risk" : 0.2,
"records_affected_by_highest_prosecutor_risk" : 0.45454545454545453,
"sample_uniques" : 0.0,
"lowest_risk" : 0.16666666666666666,
"estimated_prosecutor_risk" : 0.2,
"highest_journalist_risk" : 0.2,
"records_affected_by_lowest_risk" : 0.5454545454545454,
"average_prosecutor_risk" : 0.18181818181818182,
"estimated_marketer_risk" : 0.18181818181818182,
"highest_prosecutor_risk" : 0.2,
"records_affected_by_highest_journalist_risk" : 0.45454545454545453,
"population_uniques" : 0.0
},
"attackerSuccessRate" : {
"successRates" : {
"Prosecutor_attacker_success_rate" : 0.18181818181818182,
"Marketer_attacker_success_rate" : 0.18181818181818182,
"Journalist_attacker_success_rate" : 0.18181818181818182
}
},
"quasiIdentifiers" : [ "zipcode", "gender" ],
"populationModel" : "DANKAR"
},
"distributionOfRisk" : {
"riskIntervalList" : [ {
"interval" : "[50,100]",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[33.4,50)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[25,33.4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[20,25)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[16.7,20)",
"recordsWithRiskWithinInterval" : 0.45454545454545453,
"recordsWithMaximalRiskWithinInterval" : 1.0
}, {
"interval" : "[14.3,16.7)",
"recordsWithRiskWithinInterval" : 0.5454545454545454,
"recordsWithMaximalRiskWithinInterval" : 0.5454545454545454
}, {
"interval" : "[12.5,14.3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[10,12.5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[9,10)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[8,9)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[7,8)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[6,7)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[5,6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[4,5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[3,4)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[2,3)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1,2)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.1,1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.01,0.1)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.001,0.01)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0.0001,0.001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-5,0.0001)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[1e-6,1e-5)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
}, {
"interval" : "[0,1e-6)",
"recordsWithRiskWithinInterval" : 0.0,
"recordsWithMaximalRiskWithinInterval" : 0.0
} ]
},
"attributeRisk" : {
"quasiIdentifierRiskList" : [ {
"identifier" : [ "zipcode" ],
"distinction" : 0.09090909090909091,
"separation" : 0.0
}, {
"identifier" : [ "gender" ],
"distinction" : 0.18181818181818182,
"separation" : 0.5454545454545454
} ]
}
}
}
Hierarchy Controller
The hierarchy controller provides an interface to the ARX hierarchy builder features. The controller receives a request object containing the dataset column to create the hierarchy for, the builder type, and builder-specific attributes. The controller returns a response object containing the resulting hierarchy.
Currently the following builders are supported:
- Redaction based
- Interval based
- Order based
Create a redaction based hierarchy
This method builds hierarchies for categorical and non-categorical values using redaction:
- Dataset items are aligned left-to-right or right-to-left.
- Differences in length are filled with a padding character.
- Equally long values are redacted character by character, from left-to-right or right-to-left.
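The redaction steps can be sketched in a few lines of Python. This is a simplified illustration, not the ARX implementation: the function name and the boolean parameters `align_left` and `redact_from_right` are hypothetical, and how they correspond to the service's ordering options is not asserted here.

```python
def redaction_hierarchy(values, padding_char=" ", redaction_char="*",
                        align_left=True, redact_from_right=True):
    """Build one hierarchy row per value by padding, then redacting step by step."""
    if not values:
        return []
    width = max(len(value) for value in values)
    hierarchy = []
    for value in values:
        # Fill differences in length with the padding character.
        if align_left:
            padded = value.ljust(width, padding_char)
        else:
            padded = value.rjust(width, padding_char)
        row = [value]  # level 0 keeps the original value
        for count in range(1, width + 1):
            # Redact one more character at each level.
            if redact_from_right:
                row.append(padded[: width - count] + redaction_char * count)
            else:
                row.append(redaction_char * count + padded[count:])
        hierarchy.append(row)
    return hierarchy


zipcode_rows = redaction_hierarchy(["81667"])
digit_rows = redaction_hierarchy(["0", "1"])
```

With a five-character value, redacting from the right produces six levels, ending in a fully redacted value.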
Request fields
Path | Type | Description |
---|---|---|
column | Array | List of values to create the hierarchy for |
builder | Object | Object containing the parameters that define how to build the hierarchy for the dataset column |
builder.type | String | Hierarchy builder type to use when creating the hierarchy |
builder.paddingCharacter | String | Character to use when padding the values |
builder.redactionCharacter | String | Character to use when redacting the values |
builder.paddingOrder | String | Direction in which to pad the values in the column |
builder.redactionOrder | String | Direction in which to redact symbols from the values in the column |
HTTP request
POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 260
Host: localhost:8080
{
"column" : [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ],
"builder" : {
"type" : "redactionBased",
"paddingCharacter" : " ",
"redactionCharacter" : "*",
"paddingOrder" : "RIGHT_TO_LEFT",
"redactionOrder" : "RIGHT_TO_LEFT"
}
}
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 162
{
"hierarchy" : [ [ "0", "*" ], [ "1", "*" ], [ "2", "*" ], [ "3", "*" ], [ "4", "*" ], [ "5", "*" ], [ "6", "*" ], [ "7", "*" ], [ "8", "*" ], [ "9", "*" ] ]
}
Create an interval based hierarchy
This method builds hierarchies for non-categorical values by mapping them into given intervals.
Request fields
Path | Type | Description |
---|---|---|
column | Array | List of values to create the hierarchy for |
builder | Object | Object containing the parameters that define how to build the hierarchy for the dataset column |
builder.type | String | Hierarchy builder type to use when creating the hierarchy |
builder.intervals | Array | List containing the different intervals to be generalized from and to |
builder.intervals[].from | Number | Interval to generalize from |
builder.intervals[].to | Number | Interval to generalize to |
builder.intervals[].label | String | Optional label to replace the default generalized interval values |
builder.levels | Array | List containing parameters on how to generalize the created intervals |
builder.levels[].level | Number | Transformation level to create a generalization |
builder.levels[].groups | Array | List containing parameters on how to group the new values of the generalized column |
builder.levels[].groups[].grouping | Number | Number of items to be grouped from the new generalized column values |
builder.levels[].groups[].label | String | Optional label to replace the default generalized value |
builder.lowerRange | Object | Object containing parameters on how to define the lower range interval |
builder.lowerRange.snapFrom | Number | Value to snap from when a value lower than this is discovered |
builder.lowerRange.bottomTopCodingFrom | Number | Value to start bottom coding from |
builder.lowerRange.minMaxValue | Number | If a value smaller than this is discovered, an exception will be raised |
builder.upperRange | Object | Object containing parameters on how to define the upper range interval |
builder.upperRange.snapFrom | Number | Value to snap from when a value higher than this is discovered |
builder.upperRange.bottomTopCodingFrom | Number | Value to start top coding from |
builder.upperRange.minMaxValue | Number | If a value larger than this is discovered, an exception will be raised |
builder.dataType | String | Data type of the interval to generalize |
HTTP request
POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 832
Host: localhost:8080
{
  "column" : [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ],
  "builder" : {
    "type" : "intervalBased",
    "intervals" : [ {
      "from" : 0,
      "to" : 2,
      "label" : "young"
    }, {
      "from" : 2,
      "to" : 4,
      "label" : "adult"
    }, {
      "from" : 4,
      "to" : 8,
      "label" : "old"
    }, {
      "from" : 8,
      "to" : 9223372036854775807,
      "label" : "very-old"
    } ],
    "levels" : [ {
      "level" : 0,
      "groups" : [ {
        "grouping" : 2,
        "label" : null
      } ]
    } ],
    "lowerRange" : {
      "snapFrom" : 0,
      "bottomTopCodingFrom" : 0,
      "minMaxValue" : -2305843009213693952
    },
    "upperRange" : {
      "snapFrom" : 81,
      "bottomTopCodingFrom" : 100,
      "minMaxValue" : 2305843009213693951
    },
    "dataType" : "LONG"
  }
}
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 352
{
  "hierarchy" : [ [ "0", "young", "[0, 4[", "*" ], [ "1", "young", "[0, 4[", "*" ], [ "2", "adult", "[0, 4[", "*" ], [ "3", "adult", "[0, 4[", "*" ], [ "4", "old", "[4, 8[", "*" ], [ "5", "old", "[4, 8[", "*" ], [ "6", "old", "[4, 8[", "*" ], [ "7", "old", "[4, 8[", "*" ], [ "8", "very-old", "[8, 12[", "*" ], [ "9", "very-old", "[8, 12[", "*" ] ]
}
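In the response above, the value 2 is labelled "adult" and the value 4 is labelled "old", which suggests that each interval includes its lower bound and excludes its upper bound (the `[0, 4[` notation). The following Python sketch reproduces only the first generalization level of the example under that assumption. It is a local illustration, not the service's implementation, and the names `INTERVALS` and `interval_label` are hypothetical.

```python
# Interval definitions copied from the example request: (from, to, label).
INTERVALS = [
    (0, 2, "young"),
    (2, 4, "adult"),
    (4, 8, "old"),
    (8, 9223372036854775807, "very-old"),
]


def interval_label(value, intervals=INTERVALS):
    """Return the label of the half-open interval [from, to[ that contains value."""
    number = int(value)
    for lower, upper, label in intervals:
        # Assumption: lower bound inclusive, upper bound exclusive.
        if lower <= number < upper:
            return label
    raise ValueError(f"{value!r} is not covered by any interval")


column = [str(n) for n in range(10)]
first_level = [interval_label(value) for value in column]
```

Here `first_level` holds the labels that appear in the second position of each row in the response. The grouped level (`[0, 4[`, `[4, 8[`, `[8, 12[`) and the range settings are not modelled in this sketch.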
Create an order based hierarchy
This method builds hierarchies for categorical and non-categorical values by ordering the dataset items and merging them into groups with the defined sizes.
Request fields
Path | Type | Description |
---|---|---|
`column` | `Array` | List of values to create the hierarchy for |
`builder` | `Object` | Object containing the different parameters on how to build the hierarchy for the dataset column |
`builder.type` | `String` | Hierarchy builder type to use when creating the hierarchy |
`builder.levels` | `Array` | List containing parameters on how to generalize the dataset column |
`builder.levels[].level` | `Number` | Transformation level to create a generalization |
`builder.levels[].groups` | `Array` | List containing parameters on how to group the dataset column |
`builder.levels[].groups[].grouping` | `Number` | Number of items to be grouped from the dataset column values |
`builder.levels[].groups[].label` | `String` | Optional label to replace the default generalized value |
HTTP request
POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 371
Host: localhost:8080
{
  "column" : [ "Oslo", "Bergen", "Stockholm", "London", "Paris" ],
  "builder" : {
    "type" : "orderBased",
    "levels" : [ {
      "level" : 0,
      "groups" : [ {
        "grouping" : 3,
        "label" : "nordic-city"
      } ]
    }, {
      "level" : 0,
      "groups" : [ {
        "grouping" : 2,
        "label" : "mid-european-city"
      } ]
    } ]
  }
}
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 204
{
  "hierarchy" : [ [ "Oslo", "nordic-city", "*" ], [ "Bergen", "nordic-city", "*" ], [ "Stockholm", "nordic-city", "*" ], [ "London", "mid-european-city", "*" ], [ "Paris", "mid-european-city", "*" ] ]
}
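The grouping in this example can be followed by hand: the first three values, in the order they are submitted, receive the first label, and the next two receive the second. The Python sketch below mirrors that behaviour for the documented input only. It is an illustration of the idea rather than the service's algorithm, the function name `order_based_hierarchy` is hypothetical, and it assumes the group sizes cover the column exactly.

```python
def order_based_hierarchy(column, groups):
    """Group consecutive values and append the '*' root level.

    groups is a list of (size, label) pairs applied in the given order.
    """
    # Simplifying assumption: the sizes must add up to the column length.
    if sum(size for size, _ in groups) != len(column):
        raise ValueError("group sizes must cover the column exactly")
    rows = []
    position = 0
    for size, label in groups:
        for value in column[position:position + size]:
            rows.append([value, label, "*"])
        position += size
    return rows


cities = ["Oslo", "Bergen", "Stockholm", "London", "Paris"]
hierarchy = order_based_hierarchy(
    cities, [(3, "nordic-city"), (2, "mid-european-city")]
)
```

For the input above, `hierarchy` matches the rows of the documented response. Because the result depends on the order of the values in `column`, reordering the input changes which values share a group.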
Create a date based hierarchy
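This method builds hierarchies for date values. The values are parsed with the Java SimpleDateFormat pattern supplied in the request and generalized according to the requested granularities.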
Request fields
Path | Type | Description |
---|---|---|
`column` | `Array` | List of values to create the hierarchy for |
`builder` | `Object` | Object containing the different parameters on how to build the hierarchy for the dataset column |
`builder.type` | `String` | Hierarchy builder type to use when creating the hierarchy |
`builder.granularities` | `Array` | List of date granularities used to create the hierarchy levels |
`builder.dateFormat` | `String` | SimpleDateFormat string describing how the date values should be parsed |
HTTP request
POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 386
Host: localhost:8080
{
  "column" : [ "2020-07-16 15:28:024", "2019-07-16 16:38:025", "2019-07-16 17:48:025", "2019-07-16 18:48:025", "2019-06-16 19:48:025", "2019-06-16 20:48:025" ],
  "builder" : {
    "type" : "dateBased",
    "dateFormat" : "yyyy-MM-dd HH:mm:SSS",
    "granularities" : [ "SECOND_MINUTE_HOUR_DAY_MONTH_YEAR", "MINUTE_HOUR_DAY_MONTH_YEAR", "HOUR_DAY_MONTH_YEAR", "DAY_MONTH_YEAR" ]
  }
}
HTTP response
HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 652
{
  "hierarchy" : [ [ "2020-07-16 15:28:024", "16.07.2020-15:28:00", "16.07.2020-15:28", "16.07.2020-15:00", "16.07.2020" ], [ "2019-07-16 16:38:025", "16.07.2019-16:38:00", "16.07.2019-16:38", "16.07.2019-16:00", "16.07.2019" ], [ "2019-07-16 17:48:025", "16.07.2019-17:48:00", "16.07.2019-17:48", "16.07.2019-17:00", "16.07.2019" ], [ "2019-07-16 18:48:025", "16.07.2019-18:48:00", "16.07.2019-18:48", "16.07.2019-18:00", "16.07.2019" ], [ "2019-06-16 19:48:025", "16.06.2019-19:48:00", "16.06.2019-19:48", "16.06.2019-19:00", "16.06.2019" ], [ "2019-06-16 20:48:025", "16.06.2019-20:48:00", "16.06.2019-20:48", "16.06.2019-20:00", "16.06.2019" ] ]
}
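In SimpleDateFormat, `SSS` denotes milliseconds, so with the pattern `yyyy-MM-dd HH:mm:SSS` the trailing `024` is read as 24 milliseconds and the seconds field stays at zero. That is consistent with the `:00` seconds in the response above. The Python sketch below reproduces one response row locally. The output formats in `LEVEL_FORMATS` are inferred from the example response, the names are hypothetical, and time zone handling is ignored, so treat it as an illustration rather than the service's behaviour.

```python
from datetime import datetime

# Python counterpart of "yyyy-MM-dd HH:mm:SSS"; %f reads the fractional digits.
INPUT_FORMAT = "%Y-%m-%d %H:%M:%f"

# Assumption: one output format per requested granularity, inferred from
# the example response (second, minute, hour, day).
LEVEL_FORMATS = [
    "%d.%m.%Y-%H:%M:%S",
    "%d.%m.%Y-%H:%M",
    "%d.%m.%Y-%H:00",
    "%d.%m.%Y",
]


def date_levels(value):
    """Return the original value followed by its generalized forms."""
    parsed = datetime.strptime(value, INPUT_FORMAT)
    return [value] + [parsed.strftime(fmt) for fmt in LEVEL_FORMATS]


row = date_levels("2020-07-16 15:28:024")
```

Here `row` equals the first row of the documented response. If the seconds are meant to be kept, the pattern sent to the service would need `ss` in place of `SSS`.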