Introduction

ARX as a Service is an Oslo Metropolitan University (OsloMet) bachelor thesis project completed in cooperation with the Norwegian Labour and Welfare Administration (NAV). The project aims to make ARX features and functionality available as a microservice. To deliver on this goal, the bachelor team created a web service, implemented with Spring Boot, that exposes ARX functionality as a RESTful API. Users can interact with the web service either through the companion packages and clients developed by the team or through clients of their own.

HTTP verbs

ARX as a Service tries to adhere as closely as possible to standard HTTP and REST conventions in its use of HTTP verbs. The service uses the following HTTP verbs on its endpoints.

| Verb | Usage |
|------|-------|
| GET  | Used to retrieve metrics and logging data |
| POST | Used for requests to analyze, anonymize or generate generalization hierarchies |

HTTP status codes

ARX as a Service tries to adhere as closely as possible to standard HTTP and REST conventions in its use of HTTP status codes.

| Status code     | Usage |
|-----------------|-------|
| 200 OK          | The request completed successfully. |
| 400 Bad Request | The request was malformed. The response body will include an error providing further information. |
| 404 Not Found   | The requested resource did not exist. |

Headers

Every response has the following header(s):

| Name         | Description |
|--------------|-------------|
| Content-Type | The Content-Type of the payload, e.g. application/hal+json |

Resources

Index

The index provides the entry point into the service.

Accessing the index

A GET request is used to access the index.

HTTP response

HTTP/1.1 200 OK
Content-Type: application/hal+json
Content-Length: 170

{"_links":{"self":{"href":"http://localhost:8080/api"},"anonymize":{"href":"http://localhost:8080/api/anonymize"},"analyze":{"href":"http://localhost:8080/api/analyze"}}}

| Relation  | Description |
|-----------|-------------|
| self      | Link to the root resource |
| anonymize | Link to the anonymize controller |
| analyze   | Link to the analyze controller |
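
A client can discover the endpoint URLs from the index instead of hard-coding them. The following is a minimal sketch, assuming the index body has the HAL shape shown in the example response; the helper name `extract_links` is hypothetical and not part of the service or its companion clients.

```python
import json

# Index response body copied from the example response above.
INDEX_BODY = (
    '{"_links":{"self":{"href":"http://localhost:8080/api"},'
    '"anonymize":{"href":"http://localhost:8080/api/anonymize"},'
    '"analyze":{"href":"http://localhost:8080/api/analyze"}}}'
)


def extract_links(body: str) -> dict:
    """Map each link relation of a HAL document to its href."""
    document = json.loads(body)
    return {rel: link["href"] for rel, link in document.get("_links", {}).items()}


links = extract_links(INDEX_BODY)
print(links["analyze"])  # http://localhost:8080/api/analyze
```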

Analyze Controller

The Analyze controller is used to generate risk profiles for a dataset. The controller receives a request object containing the dataset to be analyzed and the list of attribute types for the dataset. It returns a response object containing a risk profile that includes the re-identification risk and the distribution of risk in the dataset.

Generating a Risk profile

A POST request is used to generate a risk profile.

Request fields

| Path       | Type  | Description |
|------------|-------|-------------|
| data       | Array | Dataset to be analyzed |
| attributes | Array | Attribute types of the dataset |

Curl request

$ curl 'http://localhost:8080/api/analyze' -i -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "data" : [ [ "age", "gender", "zipcode" ], [ "34", "male", "81667" ], [ "35", "female", "81668" ], [ "36", "male", "81669" ], [ "37", "female", "81670" ], [ "38", "male", "81671" ], [ "39", "female", "81672" ], [ "40", "male", "81673" ], [ "41", "female", "81674" ], [ "42", "male", "81675" ], [ "43", "female", "81676" ], [ "44", "male", "81677" ] ],
  "attributes" : [ {
    "field" : "age",
    "attributeTypeModel" : "IDENTIFYING",
    "hierarchy" : null
  }, {
    "field" : "gender",
    "attributeTypeModel" : "SENSITIVE",
    "hierarchy" : null
  }, {
    "field" : "zipcode",
    "attributeTypeModel" : "QUASIIDENTIFYING",
    "hierarchy" : null
  } ],
  "privacyModels" : null,
  "suppressionLimit" : null
}'
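
The same request can be assembled programmatically. The sketch below, using only the Python standard library, mirrors the curl example: it builds the JSON body and an unsent POST request. The function name `build_analyze_request` and its parameters are hypothetical; only the URL, field names and attribute type values come from the example above.

```python
import json
import urllib.request

ANALYZE_URL = "http://localhost:8080/api/analyze"  # taken from the curl example


def build_analyze_request(rows, attribute_types, url=ANALYZE_URL):
    """Return an unsent POST request for the analyze endpoint.

    rows: list of rows with the header row first (the "data" field).
    attribute_types: dict mapping field name -> attribute type model.
    """
    payload = {
        "data": rows,
        "attributes": [
            {"field": field, "attributeTypeModel": model, "hierarchy": None}
            for field, model in attribute_types.items()
        ],
        # The curl example sends these two fields as null, so they are mirrored here.
        "privacyModels": None,
        "suppressionLimit": None,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    [["age", "gender", "zipcode"], ["34", "male", "81667"], ["35", "female", "81668"]],
    {"age": "IDENTIFYING", "gender": "SENSITIVE", "zipcode": "QUASIIDENTIFYING"},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the payload easy to inspect before it reaches a running service.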

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 4330

{
  "reIdentificationRisk" : {
    "measures" : {
      "estimated_journalist_risk" : 1.0,
      "records_affected_by_highest_prosecutor_risk" : 1.0,
      "sample_uniques" : 1.0,
      "lowest_risk" : 1.0,
      "estimated_prosecutor_risk" : 1.0,
      "highest_journalist_risk" : 1.0,
      "records_affected_by_lowest_risk" : 1.0,
      "average_prosecutor_risk" : 1.0,
      "estimated_marketer_risk" : 1.0,
      "highest_prosecutor_risk" : 1.0,
      "records_affected_by_highest_journalist_risk" : 1.0,
      "population_uniques" : 1.0
    },
    "attackerSuccessRate" : {
      "successRates" : {
        "Prosecutor_attacker_success_rate" : 1.0,
        "Marketer_attacker_success_rate" : 1.0,
        "Journalist_attacker_success_rate" : 1.0
      }
    },
    "quasiIdentifiers" : [ "zipcode" ],
    "populationModel" : "ZAYATZ"
  },
  "distributionOfRisk" : {
    "riskIntervalList" : [ {
      "interval" : "[50,100]",
      "recordsWithRiskWithinInterval" : 1.0,
      "recordsWithMaximalRiskWithinInterval" : 1.0
    }, {
      "interval" : "[33.4,50)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[25,33.4)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[20,25)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[16.7,20)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[14.3,16.7)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[12.5,14.3)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[10,12.5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[9,10)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[8,9)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[7,8)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[6,7)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[5,6)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[4,5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[3,4)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[2,3)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1,2)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.1,1)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.01,0.1)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.001,0.01)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.0001,0.001)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1e-5,0.0001)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1e-6,1e-5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0,1e-6)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    } ]
  },
  "attributeRisk" : {
    "quasiIdentifierRiskList" : [ {
      "identifier" : [ "zipcode" ],
      "distinction" : 1.0,
      "separation" : 1.0
    } ]
  }
}
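
Most entries in `riskIntervalList` are zero, so a client will usually want only the populated intervals. The following sketch filters them out; the helper `populated_intervals` is hypothetical, and the sample is an abbreviated copy of the response above. Reading `recordsWithRiskWithinInterval` as a fraction of the dataset's records is an interpretation of the example values, not something the response itself states.

```python
def populated_intervals(risk_profile: dict) -> list:
    """Return (interval, value) pairs for intervals that contain records."""
    intervals = risk_profile["distributionOfRisk"]["riskIntervalList"]
    return [
        (entry["interval"], entry["recordsWithRiskWithinInterval"])
        for entry in intervals
        if entry["recordsWithRiskWithinInterval"] > 0.0
    ]


# Abbreviated copy of the response above (only the first two intervals).
sample = {
    "distributionOfRisk": {
        "riskIntervalList": [
            {"interval": "[50,100]", "recordsWithRiskWithinInterval": 1.0,
             "recordsWithMaximalRiskWithinInterval": 1.0},
            {"interval": "[33.4,50)", "recordsWithRiskWithinInterval": 0.0,
             "recordsWithMaximalRiskWithinInterval": 0.0},
        ]
    },
}

print(populated_intervals(sample))  # [('[50,100]', 1.0)]
```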

Anonymize Controller

The Anonymize controller is used to create a new dataset anonymized according to the provided privacy models and transformation models. The controller receives a request object containing the dataset to be anonymized, a list of attribute types with their transformation models (hierarchies), and the privacy models. The controller returns a response object containing an anonymized dataset, a risk profile, and metadata for the anonymization process.

Creating an Anonymized Dataset

A POST request is used to create a new anonymized dataset.

Request fields

| Path             | Type   | Description |
|------------------|--------|-------------|
| data             | Array  | Dataset to be anonymized |
| attributes       | Array  | Attribute types and transformation models to be applied to the dataset |
| privacyModels    | Array  | Privacy models to be applied to the dataset |
| suppressionLimit | Number | Suppression limit to be applied to the dataset |

Curl request

$ curl 'http://localhost:8080/api/anonymize' -i -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "data" : [ [ "age", "gender", "zipcode" ], [ "34", "male", "81667" ], [ "35", "female", "81668" ], [ "36", "male", "81669" ], [ "37", "female", "81670" ], [ "38", "male", "81671" ], [ "39", "female", "81672" ], [ "40", "male", "81673" ], [ "41", "female", "81674" ], [ "42", "male", "81675" ], [ "43", "female", "81676" ], [ "44", "male", "81677" ] ],
  "attributes" : [ {
    "field" : "age",
    "attributeTypeModel" : "IDENTIFYING",
    "hierarchy" : null
  }, {
    "field" : "gender",
    "attributeTypeModel" : "SENSITIVE",
    "hierarchy" : null
  }, {
    "field" : "zipcode",
    "attributeTypeModel" : "QUASIIDENTIFYING",
    "hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
  } ],
  "privacyModels" : [ {
    "privacyModel" : "KANONYMITY",
    "params" : {
      "k" : "5"
    }
  }, {
    "privacyModel" : "LDIVERSITY_DISTINCT",
    "params" : {
      "column_name" : "gender",
      "l" : "2"
    }
  } ],
  "suppressionLimit" : 0.02
}'
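
The zipcode hierarchy in the request above follows a right-to-left redaction pattern: each level masks one more trailing character. A minimal local sketch of generating such rows is shown here, assuming all values have the same length. The function `redaction_hierarchy` is hypothetical and is not the service's own hierarchy builder, which is exposed through the Hierarchy Controller.

```python
def redaction_hierarchy(values, redaction_char="*"):
    """Build one hierarchy row per value by masking characters right to left.

    Each row starts with the original value and ends with a fully masked value.
    Assumes all values share the same length, as the zipcodes in the example do.
    """
    rows = []
    for value in values:
        row = [value]
        for masked in range(1, len(value) + 1):
            row.append(value[: len(value) - masked] + redaction_char * masked)
        rows.append(row)
    return rows


hierarchy = redaction_hierarchy(["81667", "81670"])

# The rows can be placed in the "hierarchy" field of a quasi-identifying attribute.
zipcode_attribute = {
    "field": "zipcode",
    "attributeTypeModel": "QUASIIDENTIFYING",
    "hierarchy": hierarchy,
}
print(hierarchy[0])  # ['81667', '8166*', '816**', '81***', '8****', '*****']
```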

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 7813

{
  "anonymizeResult" : {
    "data" : [ [ "age", "gender", "zipcode" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ] ],
    "anonymizationStatus" : "ANONYMOUS",
    "metrics" : {
      "attributeGeneralization" : [ {
        "name" : "zipcode",
        "type" : "QUASI_IDENTIFYING_ATTRIBUTE",
        "generalizationLevel" : 2
      } ],
      "processTimeMillisecounds" : 3,
      "privacyModels" : [ {
        "monotonicWithGeneralization" : true,
        "k" : 5,
        "minimalClassSize" : 5,
        "requirements" : 1,
        "riskThresholdJournalist" : 0.2,
        "riskThresholdMarketer" : 0.2,
        "riskThresholdProsecutor" : 0.2,
        "localRecodingSupported" : true,
        "minimalClassSizeAvailable" : true,
        "dataSubset" : null,
        "populationModel" : null,
        "subset" : null,
        "heuristicSearchSupported" : true,
        "heuristicSearchWithTimeLimitSupported" : true,
        "optimalSearchSupported" : true,
        "monotonicWithSuppression" : true,
        "sampleBased" : false,
        "subsetAvailable" : false
      }, {
        "monotonicWithGeneralization" : true,
        "attribute" : "gender",
        "l" : 2.0,
        "localRecodingSupported" : true,
        "minimalClassSize" : 2,
        "requirements" : 4,
        "riskThresholdJournalist" : 0.5,
        "riskThresholdMarketer" : 0.5,
        "riskThresholdProsecutor" : 0.5,
        "minimalClassSizeAvailable" : true,
        "dataSubset" : null,
        "populationModel" : null,
        "subset" : null,
        "heuristicSearchSupported" : true,
        "heuristicSearchWithTimeLimitSupported" : true,
        "optimalSearchSupported" : true,
        "monotonicWithSuppression" : true,
        "sampleBased" : false,
        "subsetAvailable" : false
      } ]
    },
    "attributes" : [ {
      "field" : "age",
      "attributeTypeModel" : "IDENTIFYING",
      "hierarchy" : null
    }, {
      "field" : "gender",
      "attributeTypeModel" : "SENSITIVE",
      "hierarchy" : null
    }, {
      "field" : "zipcode",
      "attributeTypeModel" : "QUASIIDENTIFYING",
      "hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
    } ]
  },
  "riskProfile" : {
    "reIdentificationRisk" : {
      "measures" : {
        "estimated_journalist_risk" : 0.09090909090909091,
        "records_affected_by_highest_prosecutor_risk" : 1.0,
        "sample_uniques" : 0.0,
        "lowest_risk" : 0.09090909090909091,
        "estimated_prosecutor_risk" : 0.09090909090909091,
        "highest_journalist_risk" : 0.09090909090909091,
        "records_affected_by_lowest_risk" : 1.0,
        "average_prosecutor_risk" : 0.09090909090909091,
        "estimated_marketer_risk" : 0.09090909090909091,
        "highest_prosecutor_risk" : 0.09090909090909091,
        "records_affected_by_highest_journalist_risk" : 1.0,
        "population_uniques" : 0.0
      },
      "attackerSuccessRate" : {
        "successRates" : {
          "Prosecutor_attacker_success_rate" : 0.09090909090909091,
          "Marketer_attacker_success_rate" : 0.09090909090909091,
          "Journalist_attacker_success_rate" : 0.09090909090909091
        }
      },
      "quasiIdentifiers" : [ "zipcode" ],
      "populationModel" : "DANKAR"
    },
    "distributionOfRisk" : {
      "riskIntervalList" : [ {
        "interval" : "[50,100]",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[33.4,50)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[25,33.4)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[20,25)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[16.7,20)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[14.3,16.7)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[12.5,14.3)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[10,12.5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[9,10)",
        "recordsWithRiskWithinInterval" : 1.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[8,9)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[7,8)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[6,7)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[5,6)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[4,5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[3,4)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[2,3)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1,2)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.1,1)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.01,0.1)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.001,0.01)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.0001,0.001)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1e-5,0.0001)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1e-6,1e-5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0,1e-6)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      } ]
    },
    "attributeRisk" : {
      "quasiIdentifierRiskList" : [ {
        "identifier" : [ "zipcode" ],
        "distinction" : 0.09090909090909091,
        "separation" : 0.0
      } ]
    }
  }
}
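
The anonymize response is large, so clients typically reduce it to a few values. The sketch below pulls out the status, the generalization level chosen per attribute, the record count and one risk measure. The helper `summarize_anonymization` is hypothetical, and the sample is a heavily abbreviated copy of the response above.

```python
def summarize_anonymization(response: dict) -> dict:
    """Reduce an anonymize response to a handful of headline values."""
    result = response["anonymizeResult"]
    levels = {
        entry["name"]: entry["generalizationLevel"]
        for entry in result["metrics"]["attributeGeneralization"]
    }
    _header, *records = result["data"]  # the first row holds the column names
    measures = response["riskProfile"]["reIdentificationRisk"]["measures"]
    return {
        "status": result["anonymizationStatus"],
        "generalization_levels": levels,
        "record_count": len(records),
        "highest_prosecutor_risk": measures["highest_prosecutor_risk"],
    }


# Abbreviated copy of the response above (two of the eleven records).
sample_response = {
    "anonymizeResult": {
        "data": [["age", "gender", "zipcode"], ["*", "male", "816**"], ["*", "female", "816**"]],
        "anonymizationStatus": "ANONYMOUS",
        "metrics": {
            "attributeGeneralization": [
                {"name": "zipcode", "type": "QUASI_IDENTIFYING_ATTRIBUTE", "generalizationLevel": 2}
            ]
        },
    },
    "riskProfile": {
        "reIdentificationRisk": {"measures": {"highest_prosecutor_risk": 0.09090909090909091}}
    },
}

summary = summarize_anonymization(sample_response)
print(summary["status"], summary["generalization_levels"])
```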

Formdata Analyze Controller

The Formdata Analyze controller is used to generate risk profiles for a dataset. The controller receives a multipart file containing the dataset as a CSV file to be analyzed and a JSON object containing the list of attribute types for the dataset. It returns a response object containing a risk profile that includes the re-identification risk and the distribution of risk in the dataset.

Generating a Risk profile

A POST request is used to generate a risk profile.

Request parts

| Part     | Description |
|----------|-------------|
| file     | Dataset CSV file to be analyzed |
| metadata | JSON object containing the metadata for the attribute types and transformation models to be applied to the dataset |

Curl request

$ curl 'http://localhost:8080/api/analyze/file' -i -X POST \
    -H 'Content-Type: multipart/form-data' \
    -F 'file=@testDataset.csv;type=text/csv' \
    -F 'metadata={"attributes":[{"field":"age","attributeTypeModel":"IDENTIFYING","hierarchy":null},{"field":"gender","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":0},{"field":"zipcode","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":1}],"privacyModels":[{"privacyModel":"KANONYMITY","params":{"k":5}}],"suppressionLimit":0.02};type=application/json'
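
For clients without a multipart helper, the two request parts can be encoded by hand. The following is a sketch of a generic multipart/form-data encoder using only the standard library; `encode_multipart` is a hypothetical helper, and the CSV bytes are placeholder content because the layout of testDataset.csv is not shown here.

```python
import json
import uuid


def encode_multipart(parts):
    """Encode (name, filename, content_type, payload) tuples as multipart/form-data.

    filename may be None for non-file parts; payload must be bytes.
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, filename, content_type, payload in parts:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        part_header = (
            f"--{boundary}\r\n"
            f"Content-Disposition: {disposition}\r\n"
            f"Content-Type: {content_type}\r\n\r\n"
        )
        chunks.append(part_header.encode("utf-8"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


metadata = {
    "attributes": [
        {"field": "age", "attributeTypeModel": "IDENTIFYING", "hierarchy": None},
        {"field": "zipcode", "attributeTypeModel": "QUASIIDENTIFYING", "hierarchy": None},
    ]
}
csv_bytes = b"age;zipcode\n34;81667\n"  # placeholder; the real file layout is an assumption

content_type, body = encode_multipart([
    ("file", "testDataset.csv", "text/csv", csv_bytes),
    ("metadata", None, "application/json", json.dumps(metadata).encode("utf-8")),
])
```

The returned header value carries the boundary, so it should be sent as the request's Content-Type together with the body.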

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 4467

{
  "reIdentificationRisk" : {
    "measures" : {
      "estimated_journalist_risk" : 1.0,
      "records_affected_by_highest_prosecutor_risk" : 1.0,
      "sample_uniques" : 1.0,
      "lowest_risk" : 1.0,
      "estimated_prosecutor_risk" : 1.0,
      "highest_journalist_risk" : 1.0,
      "records_affected_by_lowest_risk" : 1.0,
      "average_prosecutor_risk" : 1.0,
      "estimated_marketer_risk" : 1.0,
      "highest_prosecutor_risk" : 1.0,
      "records_affected_by_highest_journalist_risk" : 1.0,
      "population_uniques" : 1.0
    },
    "attackerSuccessRate" : {
      "successRates" : {
        "Prosecutor_attacker_success_rate" : 1.0,
        "Marketer_attacker_success_rate" : 1.0,
        "Journalist_attacker_success_rate" : 1.0
      }
    },
    "quasiIdentifiers" : [ "zipcode", "gender" ],
    "populationModel" : "ZAYATZ"
  },
  "distributionOfRisk" : {
    "riskIntervalList" : [ {
      "interval" : "[50,100]",
      "recordsWithRiskWithinInterval" : 1.0,
      "recordsWithMaximalRiskWithinInterval" : 1.0
    }, {
      "interval" : "[33.4,50)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[25,33.4)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[20,25)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[16.7,20)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[14.3,16.7)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[12.5,14.3)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[10,12.5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[9,10)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[8,9)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[7,8)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[6,7)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[5,6)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[4,5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[3,4)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[2,3)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1,2)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.1,1)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.01,0.1)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.001,0.01)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0.0001,0.001)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1e-5,0.0001)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[1e-6,1e-5)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    }, {
      "interval" : "[0,1e-6)",
      "recordsWithRiskWithinInterval" : 0.0,
      "recordsWithMaximalRiskWithinInterval" : 0.0
    } ]
  },
  "attributeRisk" : {
    "quasiIdentifierRiskList" : [ {
      "identifier" : [ "zipcode" ],
      "distinction" : 1.0,
      "separation" : 1.0
    }, {
      "identifier" : [ "gender" ],
      "distinction" : 0.18181818181818182,
      "separation" : 0.5454545454545454
    } ]
  }
}

Formdata Anonymize Controller

The Formdata Anonymize controller is used to create a new dataset anonymized according to the provided privacy models and transformation models. The controller receives a multipart file containing the dataset as a CSV file to be anonymized, a JSON object containing the attribute type list of the dataset and the privacy models, and an array of multipart files containing the hierarchy CSV files. The controller returns a response object containing an anonymized dataset, a risk profile, and metadata for the anonymization process.

Creating an Anonymized Dataset

A POST request is used to create a new anonymized dataset.

Request parts

| Part        | Description |
|-------------|-------------|
| file        | Dataset CSV file to be anonymized |
| metadata    | JSON object containing the metadata for the attribute types and transformation models to be applied to the dataset |
| hierarchies | Hierarchy CSV files containing the transformation models |

Curl request

$ curl 'http://localhost:8080/api/anonymize/file' -i -X POST \
    -H 'Content-Type: multipart/form-data' \
    -F 'file=@testDataset.csv;type=text/csv' \
    -F 'metadata={"attributes":[{"field":"age","attributeTypeModel":"IDENTIFYING","hierarchy":null},{"field":"gender","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":0},{"field":"zipcode","attributeTypeModel":"QUASIIDENTIFYING","hierarchy":1}],"privacyModels":[{"privacyModel":"KANONYMITY","params":{"k":5}}],"suppressionLimit":0.02};type=application/json' \
    -F 'hierarchies=@testGenderHierarchy.csv;type=text/csv' \
    -F 'hierarchies=@testZipcodeHierarchy.csv;type=text/csv'
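
In the curl example, the integer `hierarchy` values in the metadata (0 for gender, 1 for zipcode) match the order in which the `hierarchies` files are attached. The sketch below builds the metadata under that reading, which is inferred from the example rather than stated by the service; the function `build_file_metadata` and its parameters are hypothetical.

```python
import json


def build_file_metadata(attribute_types, hierarchy_files, privacy_models, suppression_limit):
    """Build the metadata JSON and the upload order of the hierarchy files.

    attribute_types: dict mapping field name -> attribute type model.
    hierarchy_files: dict mapping field name -> hierarchy CSV filename.
    Each attribute's "hierarchy" is the index of its file in the returned list,
    or None when the attribute has no hierarchy file.
    """
    ordered_files = []
    attributes = []
    for field, model in attribute_types.items():
        index = None
        if field in hierarchy_files:
            index = len(ordered_files)
            ordered_files.append(hierarchy_files[field])
        attributes.append({"field": field, "attributeTypeModel": model, "hierarchy": index})
    metadata = {
        "attributes": attributes,
        "privacyModels": privacy_models,
        "suppressionLimit": suppression_limit,
    }
    return json.dumps(metadata), ordered_files


metadata_json, files = build_file_metadata(
    {"age": "IDENTIFYING", "gender": "QUASIIDENTIFYING", "zipcode": "QUASIIDENTIFYING"},
    {"gender": "testGenderHierarchy.csv", "zipcode": "testZipcodeHierarchy.csv"},
    [{"privacyModel": "KANONYMITY", "params": {"k": 5}}],
    0.02,
)
print(files)  # ['testGenderHierarchy.csv', 'testZipcodeHierarchy.csv']
```

Deriving the indices from one ordered list keeps the metadata and the attached files from drifting apart when attributes are added or removed.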

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 7438

{
  "anonymizeResult" : {
    "data" : [ [ "age", "gender", "zipcode" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ], [ "*", "female", "816**" ], [ "*", "male", "816**" ] ],
    "anonymizationStatus" : "ANONYMOUS",
    "metrics" : {
      "attributeGeneralization" : [ {
        "name" : "gender",
        "type" : "QUASI_IDENTIFYING_ATTRIBUTE",
        "generalizationLevel" : 0
      }, {
        "name" : "zipcode",
        "type" : "QUASI_IDENTIFYING_ATTRIBUTE",
        "generalizationLevel" : 2
      } ],
      "processTimeMillisecounds" : 2,
      "privacyModels" : [ {
        "monotonicWithGeneralization" : true,
        "k" : 5,
        "minimalClassSize" : 5,
        "requirements" : 1,
        "riskThresholdJournalist" : 0.2,
        "riskThresholdMarketer" : 0.2,
        "riskThresholdProsecutor" : 0.2,
        "localRecodingSupported" : true,
        "minimalClassSizeAvailable" : true,
        "dataSubset" : null,
        "populationModel" : null,
        "subset" : null,
        "heuristicSearchSupported" : true,
        "heuristicSearchWithTimeLimitSupported" : true,
        "optimalSearchSupported" : true,
        "monotonicWithSuppression" : true,
        "sampleBased" : false,
        "subsetAvailable" : false
      } ]
    },
    "attributes" : [ {
      "field" : "age",
      "attributeTypeModel" : "IDENTIFYING",
      "hierarchy" : null
    }, {
      "field" : "gender",
      "attributeTypeModel" : "QUASIIDENTIFYING",
      "hierarchy" : [ [ "male", "*" ], [ "female", "*" ] ]
    }, {
      "field" : "zipcode",
      "attributeTypeModel" : "QUASIIDENTIFYING",
      "hierarchy" : [ [ "81667", "8166*", "816**", "81***", "8****", "*****" ], [ "81668", "8166*", "816**", "81***", "8****", "*****" ], [ "81669", "8166*", "816**", "81***", "8****", "*****" ], [ "81670", "8167*", "816**", "81***", "8****", "*****" ], [ "81671", "8167*", "816**", "81***", "8****", "*****" ], [ "81672", "8167*", "816**", "81***", "8****", "*****" ], [ "81673", "8167*", "816**", "81***", "8****", "*****" ], [ "81674", "8167*", "816**", "81***", "8****", "*****" ], [ "81675", "8167*", "816**", "81***", "8****", "*****" ], [ "81676", "8167*", "816**", "81***", "8****", "*****" ], [ "81677", "8167*", "816**", "81***", "8****", "*****" ] ]
    } ]
  },
  "riskProfile" : {
    "reIdentificationRisk" : {
      "measures" : {
        "estimated_journalist_risk" : 0.2,
        "records_affected_by_highest_prosecutor_risk" : 0.45454545454545453,
        "sample_uniques" : 0.0,
        "lowest_risk" : 0.16666666666666666,
        "estimated_prosecutor_risk" : 0.2,
        "highest_journalist_risk" : 0.2,
        "records_affected_by_lowest_risk" : 0.5454545454545454,
        "average_prosecutor_risk" : 0.18181818181818182,
        "estimated_marketer_risk" : 0.18181818181818182,
        "highest_prosecutor_risk" : 0.2,
        "records_affected_by_highest_journalist_risk" : 0.45454545454545453,
        "population_uniques" : 0.0
      },
      "attackerSuccessRate" : {
        "successRates" : {
          "Prosecutor_attacker_success_rate" : 0.18181818181818182,
          "Marketer_attacker_success_rate" : 0.18181818181818182,
          "Journalist_attacker_success_rate" : 0.18181818181818182
        }
      },
      "quasiIdentifiers" : [ "zipcode", "gender" ],
      "populationModel" : "DANKAR"
    },
    "distributionOfRisk" : {
      "riskIntervalList" : [ {
        "interval" : "[50,100]",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[33.4,50)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[25,33.4)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[20,25)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[16.7,20)",
        "recordsWithRiskWithinInterval" : 0.45454545454545453,
        "recordsWithMaximalRiskWithinInterval" : 1.0
      }, {
        "interval" : "[14.3,16.7)",
        "recordsWithRiskWithinInterval" : 0.5454545454545454,
        "recordsWithMaximalRiskWithinInterval" : 0.5454545454545454
      }, {
        "interval" : "[12.5,14.3)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[10,12.5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[9,10)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[8,9)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[7,8)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[6,7)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[5,6)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[4,5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[3,4)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[2,3)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1,2)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.1,1)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.01,0.1)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.001,0.01)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0.0001,0.001)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1e-5,0.0001)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[1e-6,1e-5)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      }, {
        "interval" : "[0,1e-6)",
        "recordsWithRiskWithinInterval" : 0.0,
        "recordsWithMaximalRiskWithinInterval" : 0.0
      } ]
    },
    "attributeRisk" : {
      "quasiIdentifierRiskList" : [ {
        "identifier" : [ "zipcode" ],
        "distinction" : 0.09090909090909091,
        "separation" : 0.0
      }, {
        "identifier" : [ "gender" ],
        "distinction" : 0.18181818181818182,
        "separation" : 0.5454545454545454
      } ]
    }
  }
}

Hierarchy Controller

The hierarchy controller provides an interface to the ARX hierarchy builder features. The controller receives a request object containing the dataset column to create the hierarchy for, the builder type, and builder-specific attributes. It returns a response object containing the resulting hierarchy.

Currently the following builders are supported:

  • Redaction based

  • Interval based

  • Order based

  • Date based

Create a redaction based hierarchy

This method builds hierarchies for categorical and non-categorical values using redaction. Dataset items are processed as follows:

  1. Values are aligned left-to-right or right-to-left.

  2. Differences in length are filled with a padding character.

  3. The resulting equally long values are redacted character by character, from left-to-right or right-to-left.
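
The padding and redaction steps can be illustrated with a minimal Python sketch. It is not the ARX implementation: the function name redaction_hierarchy and its boolean parameters are hypothetical, and the exact alignment semantics of the service's paddingOrder and redactionOrder values are assumptions here.

```python
def redaction_hierarchy(column, padding_char=" ", redaction_char="*",
                        align_left=True, redact_from_right=True):
    """Illustrative sketch only: pad values to equal length, then redact
    one more character per generalization level."""
    # Longest value decides the common width (0 for an empty column).
    width = max((len(value) for value in column), default=0)
    rows = []
    for value in column:
        # Assumption: left alignment means padding is appended on the right.
        if align_left:
            padded = value.ljust(width, padding_char)
        else:
            padded = value.rjust(width, padding_char)
        levels = [value]  # level 0 is the original value
        for n in range(1, width + 1):
            if redact_from_right:
                levels.append(padded[:width - n] + redaction_char * n)
            else:
                levels.append(redaction_char * n + padded[n:])
        rows.append(levels)
    return rows


if __name__ == "__main__":
    for row in redaction_hierarchy(["47677", "4767"]):
        print(row)
```

For single-character values such as "0" to "9", the sketch yields one generalization level per value, for example [ "0", "*" ].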

Request fields

Path Type Description

column

Array

List of values to create the hierarchy for

builder

Object

Object containing the parameters that control how the hierarchy is built for the dataset column

builder.type

String

Hierarchy builder type to use when creating the hierarchy

builder.paddingCharacter

String

Character to use when padding the values

builder.redactionCharacter

String

Character to use when redacting the values

builder.paddingOrder

String

Direction in which to pad the values in the column

builder.redactionOrder

String

Direction in which to redact symbols from the values in the column

HTTP request

POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 260
Host: localhost:8080

{
  "column" : [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ],
  "builder" : {
    "type" : "redactionBased",
    "paddingCharacter" : " ",
    "redactionCharacter" : "*",
    "paddingOrder" : "RIGHT_TO_LEFT",
    "redactionOrder" : "RIGHT_TO_LEFT"
  }
}

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 162

{
  "hierarchy" : [ [ "0", "*" ], [ "1", "*" ], [ "2", "*" ], [ "3", "*" ], [ "4", "*" ], [ "5", "*" ], [ "6", "*" ], [ "7", "*" ], [ "8", "*" ], [ "9", "*" ] ]
}
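
As a rough sketch, the request above could be sent from Python using only the standard library. The helper names build_redaction_request and post_hierarchy are hypothetical, and the base URL assumes a service instance running on localhost:8080 as in the examples; the field names are taken from the request fields listed above.

```python
import json
import urllib.request


def build_redaction_request(column, padding_char=" ", redaction_char="*",
                            padding_order="RIGHT_TO_LEFT",
                            redaction_order="RIGHT_TO_LEFT"):
    """Build the JSON payload for a redaction based hierarchy request."""
    return {
        "column": list(column),
        "builder": {
            "type": "redactionBased",
            "paddingCharacter": padding_char,
            "redactionCharacter": redaction_char,
            "paddingOrder": padding_order,
            "redactionOrder": redaction_order,
        },
    }


def post_hierarchy(payload, base_url="http://localhost:8080"):
    """POST the payload to /api/hierarchy and return the 'hierarchy' list.

    Assumes a running service at base_url; network errors are not handled.
    """
    request = urllib.request.Request(
        base_url + "/api/hierarchy",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["hierarchy"]
```

Calling post_hierarchy(build_redaction_request([str(i) for i in range(10)])) against a running instance should correspond to the request and response shown above.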

Create an interval based hierarchy

This method builds hierarchies for non-categorical values by mapping them into given intervals.

Request fields

Path Type Description

column

Array

List of values to create the hierarchy for

builder

Object

Object containing the parameters that control how the hierarchy is built for the dataset column

builder.type

String

Hierarchy builder type to use when creating the hierarchy

builder.intervals

Array

List containing the different intervals to be generalized from and to

builder.intervals[].from

Number

Lower bound of the interval (inclusive)

builder.intervals[].to

Number

Upper bound of the interval (exclusive)

builder.intervals[].label

String

Optional label to replace the default generalized interval values

builder.levels

Array

List containing parameters on how to generalize the created intervals

builder.levels[].level

Number

Transformation level to create a generalization

builder.levels[].groups

Array

List containing parameters on how to group the new generalized column values

builder.levels[].groups[].grouping

Number

Number of items to be grouped from the new generalized column values

builder.levels[].groups[].label

String (nullable)

Optional label to replace the default generalized value

builder.lowerRange

Object

Object containing parameters on how to define the lower range interval

builder.lowerRange.snapFrom

Number

Value to snap from when a value lower than this is discovered

builder.lowerRange.bottomTopCodingFrom

Number

Value to start bottom coding from

builder.lowerRange.minMaxValue

Number

If a value is discovered which is smaller than this value an exception will be raised.

builder.upperRange

Object

Object containing parameters on how to define the upper range interval

builder.upperRange.snapFrom

Number

Value to snap from when a value higher than this is discovered

builder.upperRange.bottomTopCodingFrom

Number

Value to start top coding from

builder.upperRange.minMaxValue

Number

If a value is discovered which is larger than this value an exception will be raised.

builder.dataType

String

Data type of the interval to generalize

HTTP request

POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 832
Host: localhost:8080

{
  "column" : [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ],
  "builder" : {
    "type" : "intervalBased",
    "intervals" : [ {
      "from" : 0,
      "to" : 2,
      "label" : "young"
    }, {
      "from" : 2,
      "to" : 4,
      "label" : "adult"
    }, {
      "from" : 4,
      "to" : 8,
      "label" : "old"
    }, {
      "from" : 8,
      "to" : 9223372036854775807,
      "label" : "very-old"
    } ],
    "levels" : [ {
      "level" : 0,
      "groups" : [ {
        "grouping" : 2,
        "label" : null
      } ]
    } ],
    "lowerRange" : {
      "snapFrom" : 0,
      "bottomTopCodingFrom" : 0,
      "minMaxValue" : -2305843009213693952
    },
    "upperRange" : {
      "snapFrom" : 81,
      "bottomTopCodingFrom" : 100,
      "minMaxValue" : 2305843009213693951
    },
    "dataType" : "LONG"
  }
}

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 352

{
  "hierarchy" : [ [ "0", "young", "[0, 4[", "*" ], [ "1", "young", "[0, 4[", "*" ], [ "2", "adult", "[0, 4[", "*" ], [ "3", "adult", "[0, 4[", "*" ], [ "4", "old", "[4, 8[", "*" ], [ "5", "old", "[4, 8[", "*" ], [ "6", "old", "[4, 8[", "*" ], [ "7", "old", "[4, 8[", "*" ], [ "8", "very-old", "[8, 12[", "*" ], [ "9", "very-old", "[8, 12[", "*" ] ]
}
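
The first generalization level in the response above can be reproduced with a small Python sketch. It only covers the mapping of values to labelled intervals, treats each interval as half-open (lower bound inclusive, upper bound exclusive, as the "[0, 4[" notation suggests), and ignores the levels, lowerRange and upperRange settings. The function name label_for is hypothetical.

```python
def label_for(value, intervals):
    """Return the label of the first interval with from <= value < to,
    or None if no interval matches."""
    for interval in intervals:
        if interval["from"] <= value < interval["to"]:
            return interval["label"]
    return None


# Intervals copied from the example request above.
INTERVALS = [
    {"from": 0, "to": 2, "label": "young"},
    {"from": 2, "to": 4, "label": "adult"},
    {"from": 4, "to": 8, "label": "old"},
    {"from": 8, "to": 9223372036854775807, "label": "very-old"},
]

if __name__ == "__main__":
    for raw in ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]:
        print(raw, label_for(int(raw), INTERVALS))
```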

Create an order based hierarchy

This method builds hierarchies for categorical and non-categorical values by ordering the dataset items and merging them into groups with the defined sizes.

Request fields

Path Type Description

column

Array

List of values to create the hierarchy for

builder

Object

Object containing the parameters that control how the hierarchy is built for the dataset column

builder.type

String

Hierarchy builder type to use when creating the hierarchy

builder.levels

Array

List containing parameters on how to generalize the dataset column

builder.levels[].level

Number

Transformation level to create a generalization

builder.levels[].groups

Array

List containing parameters on how to group the dataset column

builder.levels[].groups[].grouping

Number

Number of items to be grouped from the dataset column values

builder.levels[].groups[].label

String

Optional label to replace the default generalized value

HTTP request

POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 371
Host: localhost:8080

{
  "column" : [ "Oslo", "Bergen", "Stockholm", "London", "Paris" ],
  "builder" : {
    "type" : "orderBased",
    "levels" : [ {
      "level" : 0,
      "groups" : [ {
        "grouping" : 3,
        "label" : "nordic-city"
      } ]
    }, {
      "level" : 0,
      "groups" : [ {
        "grouping" : 2,
        "label" : "mid-european-city"
      } ]
    } ]
  }
}

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 204

{
  "hierarchy" : [ [ "Oslo", "nordic-city", "*" ], [ "Bergen", "nordic-city", "*" ], [ "Stockholm", "nordic-city", "*" ], [ "London", "mid-european-city", "*" ], [ "Paris", "mid-european-city", "*" ] ]
}
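
The grouping in this example can be sketched in Python as follows. This is a simplified illustration, not the ARX algorithm: it keeps the column in the order given, consumes each group size once, and appends "*" as the top level. The function name order_based_hierarchy and the (size, label) tuple format are hypothetical.

```python
def order_based_hierarchy(column, groups):
    """Assign consecutive items of an ordered column to labelled groups.

    groups is a list of (size, label) pairs; items beyond the total group
    size are left out of this simplified sketch.
    """
    rows = []
    start = 0
    for size, label in groups:
        for value in column[start:start + size]:
            rows.append([value, label, "*"])
        start += size
    return rows


if __name__ == "__main__":
    cities = ["Oslo", "Bergen", "Stockholm", "London", "Paris"]
    groups = [(3, "nordic-city"), (2, "mid-european-city")]
    for row in order_based_hierarchy(cities, groups):
        print(row)
```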

Create a date based hierarchy

This method builds hierarchies for date values, which are parsed using a Java SimpleDateFormat pattern.

Request fields

Path Type Description

column

Array

List of values to create the hierarchy for

builder

Object

Object containing the parameters that control how the hierarchy is built for the dataset column

builder.type

String

Hierarchy builder type to use when creating the hierarchy

builder.granularities

Array

List of date granularities used to build the hierarchy levels

builder.dateFormat

String

SimpleDateFormat string describing how the date values should be parsed

HTTP request

POST /api/hierarchy HTTP/1.1
Content-Type: application/json
Content-Length: 386
Host: localhost:8080

{
  "column" : [ "2020-07-16 15:28:024", "2019-07-16 16:38:025", "2019-07-16 17:48:025", "2019-07-16 18:48:025", "2019-06-16 19:48:025", "2019-06-16 20:48:025" ],
  "builder" : {
    "type" : "dateBased",
    "dateFormat" : "yyyy-MM-dd HH:mm:SSS",
    "granularities" : [ "SECOND_MINUTE_HOUR_DAY_MONTH_YEAR", "MINUTE_HOUR_DAY_MONTH_YEAR", "HOUR_DAY_MONTH_YEAR", "DAY_MONTH_YEAR" ]
  }
}

HTTP response

HTTP/1.1 200 OK
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Content-Type: application/json
Content-Length: 652

{
  "hierarchy" : [ [ "2020-07-16 15:28:024", "16.07.2020-15:28:00", "16.07.2020-15:28", "16.07.2020-15:00", "16.07.2020" ], [ "2019-07-16 16:38:025", "16.07.2019-16:38:00", "16.07.2019-16:38", "16.07.2019-16:00", "16.07.2019" ], [ "2019-07-16 17:48:025", "16.07.2019-17:48:00", "16.07.2019-17:48", "16.07.2019-17:00", "16.07.2019" ], [ "2019-07-16 18:48:025", "16.07.2019-18:48:00", "16.07.2019-18:48", "16.07.2019-18:00", "16.07.2019" ], [ "2019-06-16 19:48:025", "16.06.2019-19:48:00", "16.06.2019-19:48", "16.06.2019-19:00", "16.06.2019" ], [ "2019-06-16 20:48:025", "16.06.2019-20:48:00", "16.06.2019-20:48", "16.06.2019-20:00", "16.06.2019" ] ]
}
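
The way each date is coarsened level by level can be approximated in Python. This sketch uses Python strptime/strftime directives in place of the Java SimpleDateFormat pattern, and the output formats are inferred from the example response above rather than taken from ARX. The names date_hierarchy and LEVEL_FORMATS are hypothetical.

```python
from datetime import datetime

# Output formats inferred from the example response, finest to coarsest:
# second, minute, hour and day granularity.
LEVEL_FORMATS = [
    "%d.%m.%Y-%H:%M:%S",
    "%d.%m.%Y-%H:%M",
    "%d.%m.%Y-%H:00",
    "%d.%m.%Y",
]


def date_hierarchy(column, input_format="%Y-%m-%d %H:%M:%f"):
    """Parse each date string and emit one generalized value per level.

    The default input_format is a rough Python counterpart of the pattern
    "yyyy-MM-dd HH:mm:SSS" (no seconds field, fractional part last).
    """
    rows = []
    for value in column:
        parsed = datetime.strptime(value, input_format)
        rows.append([value] + [parsed.strftime(fmt) for fmt in LEVEL_FORMATS])
    return rows


if __name__ == "__main__":
    for row in date_hierarchy(["2020-07-16 15:28:024", "2019-06-16 20:48:025"]):
        print(row)
```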