Using the client¶
The django-crosswalk client lets you interact with your crosswalk database much like you would any standard library, albeit through an API.
Generally, we do not recommend interacting with django-crosswalk’s API directly. Instead, use the methods built into the client, which have more verbose validation and error messages and are well tested.
Install¶
The client is maintained as a separate package, which you can install via pip.
$ pip install django-crosswalk-client
Client configuration¶
Creating a client instance¶
Create a client instance by passing your API token and the URL to the root of your hosted django-crosswalk API.
from crosswalk_client import Client
# Your API token, created in Django admin
token = "<TOKEN>"
# Address of django-crosswalk's API
service = "https://mysite.com/crosswalk/api/"
client = Client(token, service)
You can also set the client's defaults when you instantiate it.
client = Client(
token,
service,
domain=None, # default
scorer="fuzzywuzzy.default_process", # default
threshold=80, # default
)
Set the default domain¶
In order to query, create or edit entities, you must specify a domain. You can set a default anytime:
# Using domain instance
client.set_domain(states)
# ... or a domain's slug
client.set_domain("states")
Set the default scorer¶
Pass the string module path to a scorer function in crosswalk.scorers.
client.set_scorer("fuzzywuzzy.token_sort_ratio_process")
Set the default threshold¶
The default threshold is used when creating entities based on a match score. For all scorers, the match score should be an integer between 0 and 100.
client.set_threshold(90)
Client domain methods¶
Create a domain¶
states = client.create_domain("U.S. states")
states.name == "U.S. states"
states.slug == "u-s-states" # Name of domain is always slugified!
# Create with a parent domain instance
client.create_domain("counties", parent=states)
# ... or a parent domain's slug
client.create_domain("cities", parent="u-s-states")
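Because domains are looked up by slug, it helps to predict what slug a name will produce. The following is a minimal sketch that reproduces the documented result ("U.S. states" becomes "u-s-states"). The function name slugify_name is hypothetical, and the library's own slug routine may behave differently for other inputs.

```python
import re


def slugify_name(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens.

    Assumption: this only mimics the slug shown in the docs above; it is
    not the routine django-crosswalk itself uses.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


slug = slugify_name("U.S. states")
```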
Get a domain¶
# Use a domain's slug
states = client.get_domain("u-s-states")
states.name == "U.S. states"
Get all domains¶
states = client.get_domains()[0]
states.slug == "u-s-states"
# Filter domains by a parent domain instance
client.get_domains(parent=states)
# ... or parent domain's slug
client.get_domains(parent="u-s-states")
Update a domain¶
# Using the domain's slug
states = client.update_domain("u-s-states", {"parent": "countries"})
# ... or the domain instance
client.update_domain(states, {"parent": "country"})
Delete a domain¶
# Using domain's slug
client.delete_domain('u-s-states')
# ... or the domain instance
client.delete_domain(states)
Client entity methods¶
Create entities¶
Create a single entity as a shallow dictionary.
entities = client.create({"name": "Kansas", "postal_code": "KS"}, domain=states)
To create many entities at once, pass a list of shallow dictionaries, one for each entity you’d like to create. This method uses Django’s bulk_create method.
import us
state_entities = [
{
"name": state.name,
"fips": state.fips,
"postal_code": state.abbr,
} for state in us.states.STATES
]
entities = client.bulk_create(state_entities, domain=states)
Note
Django-crosswalk will create UUIDs for any new entities, which are automatically serialized and deserialized by the client.
You can also create entities with your own UUIDs. For example:
from uuid import uuid4
uuid = uuid4()
entities = [
{
"uuid": uuid,
"name": "some entity",
}
]
entity = client.bulk_create(entities)[0]
entity.uuid == uuid
# True
Warning
You can’t re-run a bulk create. If your script needs the equivalent of get_or_create or update_or_create, use the match or match_or_create methods and then, if needed, update the entity using the built-in entity update method.
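A re-runnable script could wrap that pattern in a small helper. This is a sketch under assumptions: upsert_entity is a hypothetical name, the client is assumed to have a default domain set, and a matched (not newly created) entity is assumed to report a falsy created attribute.

```python
def upsert_entity(client, query, attrs):
    """Match an entity exactly or create it; update it if it already existed.

    Assumes a default domain was set with client.set_domain(...).
    """
    entity = client.match_or_create(query, create_attrs=attrs)
    # Assumption: entities that were matched rather than created
    # have a falsy (or missing) `created` attribute.
    if not getattr(entity, "created", False):
        entity.update(attrs)
    return entity
```

Calling upsert_entity(client, {"name": "Kansas"}, {"postal_code": "KS"}) twice should then leave a single Kansas entity rather than failing on the second run.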
Get entities in a domain¶
entities = client.get_entities(domain=states)
entities[0].name
# Alabama
Pass a dictionary of block attributes to filter entities in the domain.
entities = client.get_entities(
domain=states,
block_attrs={"postal_code": "KS"}
)
entities[0].name
# Kansas
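Conceptually, block attributes act as an exact-match filter applied before anything else. The sketch below shows that idea on plain dictionaries; filter_by_block_attrs is a hypothetical helper, not part of the client, and the real filtering happens server-side.

```python
def filter_by_block_attrs(entities, block_attrs):
    """Keep entities whose attributes equal every key/value in block_attrs."""
    return [
        entity
        for entity in entities
        if all(entity.get(key) == value for key, value in block_attrs.items())
    ]


sample_states = [
    {"name": "Kansas", "postal_code": "KS"},
    {"name": "Kentucky", "postal_code": "KY"},
]
kansas_only = filter_by_block_attrs(sample_states, {"postal_code": "KS"})
```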
Find an entity¶
Pass a query dictionary to find an entity that exactly matches.
client.match({"name": "Missouri"}, domain=states)
# Pass block attributes to filter possible matches
client.match(
{"name": "Texas"},
block_attrs={"postal_code": "TX"},
domain=states
)
You can also fuzzy match on your query dictionary and return the entity that best matches.
entity = client.best_match({"name": "Kalifornia"}, domain=states)
# Pass block attributes to filter possible matches
entity = client.best_match(
{"name": "Kalifornia"},
block_attrs={"postal_code": "CA"},
domain=states
)
entity.name == "California"
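To illustrate how blocking and fuzzy scoring combine, here is a minimal sketch that narrows candidates by block attributes and then picks the highest-scoring one. It uses the standard library's difflib as a stand-in scorer; the function best_match_sketch and the score averaging are assumptions, not how django-crosswalk's scorers work.

```python
from difflib import SequenceMatcher


def best_match_sketch(query, candidates, block_attrs=None):
    """Return (candidate, score) for the closest candidate, or (None, 0)."""
    if not query:
        return None, 0
    block_attrs = block_attrs or {}
    # Blocking: discard candidates that disagree on any block attribute.
    pool = [
        c for c in candidates
        if all(c.get(k) == v for k, v in block_attrs.items())
    ]
    best, best_score = None, 0
    for candidate in pool:
        # Average the string similarity across every field in the query.
        ratios = [
            SequenceMatcher(
                None, str(value).lower(), str(candidate.get(key, "")).lower()
            ).ratio()
            for key, value in query.items()
        ]
        score = round(100 * sum(ratios) / len(ratios))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


sample_states = [
    {"name": "California", "postal_code": "CA"},
    {"name": "Colorado", "postal_code": "CO"},
    {"name": "Kansas", "postal_code": "KS"},
]
match, score = best_match_sketch({"name": "Kalifornia"}, sample_states)
```

Blocking first keeps the expensive fuzzy comparison confined to a small pool and prevents a close spelling in the wrong block from winning.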
Note
If the match for your query is an alias of another entity, this method will return the canonical entity with entity.aliased = True. To ignore aliased entities, set return_canonical=False and the method will return the best match for your query, regardless of whether it is an alias for another entity.
client.best_match(
{"name": "Misouri"},
return_canonical=False
)
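The idea of returning the canonical entity can be pictured as following alias_for links until they run out. This is only an illustrative sketch on plain dictionaries: resolve_canonical and the entities_by_uuid lookup are hypothetical, and the service may resolve aliases differently.

```python
def resolve_canonical(entity, entities_by_uuid):
    """Follow alias_for links until reaching an entity that is not an alias."""
    seen = set()
    current = entity
    while current.get("alias_for") is not None:
        if current["uuid"] in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError("alias cycle detected")
        seen.add(current["uuid"])
        current = entities_by_uuid[current["alias_for"]]
    return current


sample_entities = {
    "a1": {"uuid": "a1", "name": "Missouri", "alias_for": None},
    "b2": {"uuid": "b2", "name": "Misouri", "alias_for": "a1"},
}
canonical = resolve_canonical(sample_entities["b2"], sample_entities)
```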
Find a match or create a new entity¶
You can create a new entity if an exact match isn’t found.
entity = client.match_or_create({"name": "Narnia"})
entity.created
# True
Or use a fuzzy matcher to find the best match. If no candidate scores above the match threshold, a new entity is created.
entity = client.best_match_or_create({"name": "Narnia"})
entity.created
# True
# Set a custom threshold for the match scorer instead of using the default
entity = client.best_match_or_create(
{"name": "Narnia"},
threshold=80,
)
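The threshold decision can be sketched as follows. This is a simplified illustration using difflib on in-memory dictionaries; best_match_or_create_sketch is a hypothetical name, and treating a score equal to the threshold as a match is an assumption.

```python
from difflib import SequenceMatcher


def best_match_or_create_sketch(name, entities, threshold=80):
    """Return (entity, created) based on a 0-100 name similarity score."""

    def score(candidate):
        ratio = SequenceMatcher(
            None, name.lower(), candidate["name"].lower()
        ).ratio()
        return round(ratio * 100)

    best = max(entities, key=score, default=None)
    # Assumption: a score equal to the threshold counts as a match.
    if best is not None and score(best) >= threshold:
        return best, False
    new_entity = {"name": name}
    entities.append(new_entity)
    return new_entity, True


sample_states = [{"name": "California"}, {"name": "Kansas"}]
narnia, narnia_created = best_match_or_create_sketch("Narnia", sample_states)
```

A higher threshold makes the sketch create new entities more readily, while a lower one makes it more willing to accept a loose match.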
Note
If the best match for your query is an alias of another entity and is above your match threshold, this method will return the canonical entity with entity.aliased = True. To ignore aliased entities, set return_canonical=False.
client.best_match_or_create(
{"name": "Misouri"},
return_canonical=False,
)
Pass a dictionary of block attributes to filter match candidates.
entity = client.match_or_create(
{"name": "Narnia"},
block_attrs={"postal_code": "NA"},
)
entity = client.best_match_or_create(
{"name": "Narnia"},
block_attrs={"postal_code": "NA"},
)
If a sufficient match is not found, you can pass a dictionary of attributes to create your entity with. These will be combined with your query when creating a new entity.
import uuid
id = uuid.uuid4()
entity = client.match_or_create(
{"name": "Xanadu"},
create_attrs={"uuid": id},
)
entity = client.best_match_or_create(
{"name": "Xanadu"},
create_attrs={"uuid": id},
)
entity.name
# Xanadu
entity.uuid == id
# True
entity.created
# True
Create an alias or create a new entity¶
Create an alias if an entity above a certain match score threshold is found; otherwise, create a new entity. The method returns the aliased entity.
client.set_domain('states')
entity = client.alias_or_create({"name": "Kalifornia"}, threshold=85)
entity.name
# California
entity.aliased
# True
entity = client.alias_or_create(
{"name": "Alderaan"},
create_attrs={"galaxy": "Far, far away"},
threshold=90
)
entity.name
# Alderaan
entity.aliased
# False
Note
If the best match for your query is an alias of another entity, this method will return the canonical entity with entity.aliased = True. To ignore aliased entities, set return_canonical=False and the method will return the best match for your query, regardless of whether it is an alias for another entity.
client.alias_or_create(
{"name": "Missouri"},
return_canonical=False
)
Update an entity by ID¶
entity = client.best_match({"name": "Kansas"})
entity = client.update_by_id(
entity.uuid,
{"capital": "Topeka"}
)
entity.capital
# Topeka
Update a matched entity¶
entity = client.update_match(
{"name": "Missouri"},
update_attrs={"capital": "Jefferson City"},
domain=states
)
entity.capital
# Jefferson City
entity = client.update_match(
{"name": "Texas", "postal_code": "TX"},
update_attrs={"capital": "Austin"},
domain=states
)
entity.capital
# Austin
Note
If your block attributes return more than one matched entity to be updated, an UnspecificQueryError will be raised and no entities will be updated.
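The all-or-nothing behavior described in the note can be sketched like this. The exception class below is a local stand-in that only shares a name with the client's error, and update_single_match is a hypothetical helper operating on plain dictionaries.

```python
class UnspecificQueryError(Exception):
    """Local stand-in for the client's error of the same name."""


def update_single_match(entities, query, update_attrs):
    """Update the one entity matching every key in query, or change nothing."""
    matches = [
        e for e in entities
        if all(e.get(k) == v for k, v in query.items())
    ]
    if len(matches) > 1:
        # Refuse to guess which entity was meant; nothing is modified.
        raise UnspecificQueryError(f"Query matched {len(matches)} entities")
    if not matches:
        return None
    matches[0].update(update_attrs)
    return matches[0]


sample_cities = [
    {"name": "Springfield", "state": "IL"},
    {"name": "Springfield", "state": "MO"},
]
updated = update_single_match(
    sample_cities, {"name": "Springfield", "state": "IL"}, {"capital": True}
)
```

Adding more attributes to the query, as in the Texas example above, is the way to narrow an ambiguous match down to one entity.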
Delete an entity by ID¶
entity = client.match({"name": "New York"})
deleted = client.delete_by_id(entity.uuid)
deleted
# True
Delete a matched entity¶
deleted = client.delete_match({"name": "Xanadu"})
deleted
# True
deleted = client.delete_match({"name": "Narnia", "postal_code": "NA"})
deleted
# True
Note
If your block attributes return more than one matched entity to be deleted, an UnspecificQueryError will be raised and no entities will be deleted.
Domain object methods¶
Update a domain¶
domain = client.get_domain('u-s-states')
domain.update({"parent": "countries"})
Set a parent domain¶
parent_domain = client.get_domain('countries')
domain = client.get_domain('u-s-states')
domain.set_parent(parent_domain)
Remove a parent domain¶
domain = client.get_domain('u-s-states')
domain.remove_parent()
domain.parent
# None
Delete a domain¶
domain = client.get_domain('u-s-states')
domain.delete()
domain.deleted
# True
Get domain’s entities¶
domain = client.get_domain('u-s-states')
# Get all states
domain.get_entities()
# Filter entities using block attributes
entities = domain.get_entities({"postal_code": "KS"})
entities[0].name == "Kansas"
Entity object methods¶
Access an entity’s attributes¶
entity = client.match({"name": "Texas"})
# See what user-defined attributes are set
entity.attrs() == ["fips", "name", "postal_code", "uuid"]
# Access a specific attribute
entity.attrs("postal_code") == "TX"
entity.postal_code == "TX"
# Raise AttributeError if undefined
entity.attrs("undefined_attr")
entity.undefined_attr
Update an entity¶
entity = client.best_match({"name": "Texas"})
entity.update({"capital": "Austin"})
Alias entities¶
entity = client.best_match({"name": "Missouri"})
alias = client.best_match({"name": "Show me state"})
alias.set_alias_for(entity)
alias.alias_for == entity.uuid
# True
Remove an alias¶
alias = client.best_match({"name": "Show me state"})
alias.remove_alias_for()
alias.alias_for
# None
Set a superseding entity¶
superseded = client.best_match({"name": "George W. Bush"}, domain="politicians")
entity = client.best_match({"name": "George W. Bush"}, domain="presidents")
superseded.set_superseded_by(entity)
superseded.superseded_by == entity.uuid
# True
Remove a superseding entity¶
superseded = client.best_match({"name": "George W. Bush"}, domain="politicians")
superseded.remove_superseded_by()
superseded.superseded_by
# None
Delete an entity¶
entity = client.best_match({"name": "Texas"})
entity.delete()
entity.deleted
# True