frex.utils package
Submodules
frex.utils.class_generator module
- class frex.utils.class_generator.ClassGenerator(*, onto_file: str, save_dir: pathlib.Path)[source]
Bases:
object
The ClassGenerator utility is used to generate python dataclasses based on an ontology’s data models. These generated dataclasses should be suitable for use with the DomainKgQueryService, as its basic query implementation relies on some properties that are automatically included in classes generated by this tool.
Dataclasses generated by this utility have some type hints, but the hints are not very detailed. In particular, where a property is known to point to another domain object, this utility simply adds a type hint of “URIRef”. This choice is partly because some restrictions on property ranges in OWL are difficult to parse in a meaningful way, and partly because we can’t guarantee that a user would want to fully resolve the URIs that a given property points to. Re-querying and converting results to objects might also produce cycles if URIs point to each other for certain properties, so simply stopping at the point where a property refers to a URI simplifies the process.
- add_restriction(*, p: owlready2.class_construct.Restriction, properties: List)[source]
Add properties based on owl restrictions. Restrictions should correspond to class restrictions in owl, such as requiring some property be filled for a class to be valid.
- Parameters
p – The target class restriction to parse into a property for code generation
properties – The ongoing list of properties for the current class that is being updated
- convert_to_py_class(c: owlready2.entity.ThingClass) Tuple[str, List[str]] [source]
Produce a string to generate a python dataclass corresponding to the input owl class.
- Parameters
c – The target owl class to generate code for
- Returns
A tuple containing the string that will be output to a file for the class and a list of superclasses that the generated class will inherit from. The superclasses are necessary to ensure that import ordering is correct and circular import errors aren’t caused down the line.
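The import-ordering concern can be illustrated with a small sketch. It assumes a hypothetical mapping from each generated class name to its superclass names (the kind of information convert_to_py_class returns) and uses the standard library's graphlib to find an order in which every superclass comes before its subclasses. Whether ClassGenerator orders its output this way internally is not stated here; the class names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical data: generated class name -> names of its superclasses.
superclasses_by_class = {
    "VeganRecipe": ["Recipe"],
    "Recipe": ["FoodItem"],
    "FoodItem": [],
}


def emission_order(deps):
    """Return class names so that each superclass precedes its subclasses."""
    # TopologicalSorter takes a mapping of node -> predecessors,
    # which is exactly "class -> superclasses" here.
    return list(TopologicalSorter(deps).static_order())


order = emission_order(superclasses_by_class)
print(order)
```

If the superclass relationships contained a cycle, static_order() would raise graphlib.CycleError, which is one way to detect a circular import before any file is written.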
- get_inner_restrictions(*, p: owlready2.class_construct, properties: List)[source]
Parse restrictions that are nested within a class construct. This is called when a restriction that is a logical construct (AND and OR types) occurs.
- Parameters
p – The target class construct to parse into a property for code generation
properties – The ongoing list of properties for the current class that is being updated
- get_property_names_and_types(c: owlready2.entity.ThingClass) List[Tuple[str, Any, str]] [source]
For the target owl class, extract the property names and types that the class should have. These properties are based on owl restrictions that define the class.
- Parameters
c – The target class to extract property names for
- Returns
A list of tuples, ordered as (prop_name, prop_type, prop_iri).
- get_superclass_names(c: owlready2.entity.ThingClass) List[str] [source]
Identify all superclasses of a given owl class, and if those classes are present in the main ontology, return a list of their names.
- Parameters
c – The target owl class to get superclasses for
- Returns
a list of class names that are valid superclasses of the target class
- populate_template(*, name: str, superclasses: List[str], properties: List[Tuple[str, Any, str]]) str [source]
Populate a template for producing generated python dataclasses. The current template is based on implementations in python 3.8 - in future versions, some minor details (like keyword-only dataclasses) might be introduced, which may call for change. For the moment, templates are populated to assume that none of the dataclass’s properties have default values, and instead we will assume that the querying service will properly handle adding default values in cases where the appropriate properties weren’t returned as part of a SPARQL query.
- Parameters
name – The name of the class to generate
superclasses – A list of superclass names that the generated class should inherit from. All superclasses in this list are expected to also be generated by this same code generation script.
properties – A list of tuples, ordered as (prop_name, prop_type, prop_iri), corresponding to the properties that this dataclass should include.
- Returns
A string, corresponding to the content of the new python dataclass that will be written to a file
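As a rough illustration of what populating such a template could involve, the sketch below builds dataclass source code from a name, a list of superclasses, and (prop_name, prop_type, prop_iri) tuples. The function render_dataclass, the layout of the emitted code, and the example IRIs are all hypothetical; the actual template used by populate_template is not shown in this documentation and may differ.

```python
from typing import Any, List, Tuple


def render_dataclass(name: str, superclasses: List[str],
                     properties: List[Tuple[str, Any, str]]) -> str:
    """Hypothetical stand-in for a template: returns dataclass source code."""
    bases = f"({', '.join(superclasses)})" if superclasses else ""
    lines = ["@dataclass", f"class {name}{bases}:"]
    if not properties:
        lines.append("    pass")
    for prop_name, prop_type, prop_iri in properties:
        # Use the type's name when it has one, otherwise its string form.
        type_name = getattr(prop_type, "__name__", str(prop_type))
        # No default values are emitted, in line with the description above.
        lines.append(f"    {prop_name}: {type_name}  # {prop_iri}")
    return "\n".join(lines) + "\n"


source = render_dataclass(
    name="Recipe",
    superclasses=[],
    properties=[
        ("name", str, "http://example.org/name"),
        ("serves", int, "http://example.org/serves"),
    ],
)
print(source)
```

The emitted text still needs a `from dataclasses import dataclass` line (and imports for any superclasses) before it can be used as a module.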
frex.utils.common module
frex.utils.constraint_solver module
- class frex.utils.constraint_solver.ConstraintSolver(*, scaling: int = 1)[source]
Bases:
object
A class to perform constraint solving to produce a final solution of items using constraints on the overall set of items.
- add_item_selection_constraint(*, item_a_uri: rdflib.term.URIRef, item_b_uri: rdflib.term.URIRef, constraint_type: frex.models.constraints.constraint_type.ConstraintType)[source]
Require that candidates chosen in the final solution have some relationship based on the constraint, e.g., EQ to ensure that item_a and item_b are either both selected or both not selected, or LEQ to ensure that if item_a is selected then item_b must also be selected.
- Parameters
item_a_uri – The domain object’s URI of the first item
item_b_uri – The domain object’s URI of the second item
constraint_type – The type of constraint to apply for how the final items are selected
- Returns
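The EQ and LEQ semantics for add_item_selection_constraint can be checked by treating each item's selection as a 0/1 variable and enumerating the combinations a comparison permits. This is a minimal sketch: the string keys and the COMPARISONS mapping are assumptions for illustration and are not the ConstraintType members themselves.

```python
import operator
from itertools import product

# Hypothetical mapping from constraint names to comparisons on 0/1 selection variables.
COMPARISONS = {"EQ": operator.eq, "LEQ": operator.le}


def allowed_selections(constraint_type: str):
    """List the (item_a_selected, item_b_selected) pairs the constraint permits."""
    compare = COMPARISONS[constraint_type]
    return [(a, b) for a, b in product((0, 1), repeat=2) if compare(a, b)]


print(allowed_selections("EQ"))   # [(0, 0), (1, 1)]
print(allowed_selections("LEQ"))  # [(0, 0), (0, 1), (1, 1)]
```

Under LEQ the only excluded pair is (1, 0), i.e. item_a selected without item_b.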
- add_overall_count_constraint(*, min_count: Optional[int] = None, max_count: Optional[int] = None, exact_count: Optional[int] = None)[source]
Set constraints on the total number of items chosen for the solution. This function will check for an exact count first, and if it exists it will only create a constraint for making sure the number of items assigned to the target section is equal to that quantity. Otherwise, both a min and max count of items assigned to a section can be specified.
- Parameters
min_count – The minimum number of items to assign to the target section
max_count – The maximum number of items to assign to the target section
exact_count – An exact number of items to assign to the target section
- Returns
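The precedence described for add_overall_count_constraint (an exact count overrides any min/max) can be sketched as a small helper that resolves the three optional arguments into a pair of bounds. The helper name resolve_count_bounds is hypothetical and is not part of ConstraintSolver.

```python
from typing import Optional, Tuple


def resolve_count_bounds(
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
    exact_count: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (lower, upper) bounds on the number of selected items."""
    if exact_count is not None:
        # An exact count wins: both bounds collapse to the same value.
        return exact_count, exact_count
    # Otherwise either bound may be absent (None means unbounded).
    return min_count, max_count


print(resolve_count_bounds(min_count=2, max_count=5))
print(resolve_count_bounds(min_count=2, max_count=5, exact_count=3))
```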
- add_overall_item_constraint(*, attribute_name: str, constraint_type: frex.models.constraints.constraint_type.ConstraintType, constraint_value: int) frex.utils.constraint_solver.ConstraintSolver [source]
Add a constraint to be applied to the entire solution. E.g., a constraint on the cost of all items chosen across all sections of the solution.
- Parameters
attribute_name – The domain_object’s attribute to apply the constraint to
constraint_type – The type of constraint - i.e. ==, <=, or >=
constraint_value – The value to constrain the solution to
- Returns
self, with a new Constraint added to the overall_constraints list
- add_required_item_selection(*, target_uri: rdflib.term.URIRef)[source]
Require that the final solution selects a candidate whose domain object has the target URI.
- Parameters
target_uri – the URI of the item that must be included in the final solution
- Returns
- set_candidates(*, candidates: Tuple[frex.models.candidate.Candidate, ...])[source]
Set the candidates that will be used to produce the solution. Candidates are expected to be produced as the output of some pipeline, which handles scoring. The solver will not handle any sort of scoring for the candidates, but rather it will produce an optimized solution based on the Candidates’ total_score (which should be computed by a pipeline) and other constraints.
- Parameters
candidates – A tuple of Candidate objects, with corresponding domain_objects and scores
- Returns
self, with an updated list of candidates
- set_section_set_constraints(*, section_sets: Tuple[frex.models.constraints.section_set_constraint.SectionSetConstraint, ...])[source]
Set all the SectionSetConstraints that need to be solved to produce a valid solution.
- Parameters
section_sets – A tuple of SectionSetConstraints that will be applied to the solution
- Returns
- solve(*, output_uri: rdflib.term.URIRef) Optional[frex.models.constraints.constraint_solution.ConstraintSolution] [source]
Perform integer programming to solve constraints and maximize an objective function based on the total scores applied to candidates. This function expects candidates that are the result of some recommendation pipeline (i.e., candidates have scores, and problematic candidates have already been filtered out).
This will produce outputs assigning candidates to ‘sections’. A section can be thought of as e.g. a day in a meal plan, or a semester in a student’s plan-of-study.
Currently assumes that (1) the objective function is always to maximize the total score of the final output, (2) each candidate can only be a part of one section, (3) each section must have an exact number of candidates assigned to it, and (4) the order of sections does not matter.
- Parameters
output_uri – The URI to attach to the output constraint solution
- Returns
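To make the optimization behind solve concrete, the sketch below solves a tiny instance by exhaustive search rather than integer programming. It follows the stated assumptions: maximize total score, each candidate in at most one section, and an exact number of candidates per section. The candidate names, scores, and section sizes are made up, and brute force is a stand-in for whatever solver the library actually uses.

```python
from itertools import product

# Hypothetical scored candidates (name -> total score) and exact section sizes.
scores = {"oatmeal": 0.9, "salad": 0.7, "pasta": 0.8, "soup": 0.4}
section_sizes = {"day_1": 1, "day_2": 2}


def best_assignment(scores, section_sizes):
    """Exhaustively search assignments; None means the candidate is left out."""
    names = list(scores)
    options = [None] + list(section_sizes)
    best, best_total = None, float("-inf")
    for choice in product(options, repeat=len(names)):
        # Each candidate receives a single option, so it is in at most one section.
        counts = {section: choice.count(section) for section in section_sizes}
        if counts != section_sizes:
            continue  # every section needs exactly its required count
        total = sum(scores[n] for n, s in zip(names, choice) if s is not None)
        if total > best_total:
            best, best_total = dict(zip(names, choice)), total
    return best, best_total


assignment, total = best_assignment(scores, section_sizes)
print(assignment, round(total, 2))
```

Because the order of sections does not affect the score, several assignments tie for the best total; the sketch returns the first one it finds. Exhaustive search grows exponentially with the number of candidates, which is why an integer programming solver is the practical choice.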
frex.utils.vector_similarity_utils module
- class frex.utils.vector_similarity_utils.VectorSimilarityUtils[source]
Bases:
object
- static cosine_sim(*, comparison_vector: numpy.array, comparison_matrix: numpy.array) numpy.array [source]
Return the cosine similarity between a given vector and the rows of a matrix.
- Parameters
comparison_vector – The vector to serve as the source of comparison
comparison_matrix – A matrix containing rows with which the comparison_vector will be compared
- Returns
An array of cosine similarities between the comparison_vector and each row of the comparison_matrix
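For reference, cosine similarity between a vector v and a row r is the dot product of v and r divided by the product of their Euclidean norms. The sketch below computes it row by row, with plain Python lists standing in for numpy arrays so that it has no dependencies. The function name cosine_sim_sketch and the handling of all-zero vectors are assumptions, not the library's behavior.

```python
import math
from typing import List, Sequence


def cosine_sim_sketch(vector: Sequence[float],
                      matrix: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity between `vector` and each row of `matrix`."""
    vec_norm = math.sqrt(sum(x * x for x in vector))
    sims = []
    for row in matrix:
        dot = sum(x * y for x, y in zip(vector, row))
        row_norm = math.sqrt(sum(y * y for y in row))
        denom = vec_norm * row_norm
        # Assumption: similarity with an all-zero vector is reported as 0.0.
        sims.append(dot / denom if denom else 0.0)
    return sims


sims = cosine_sim_sketch([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(sims)
```

The three rows give a similarity of 1 (same direction), 0 (orthogonal), and about 0.707 (45 degrees apart).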
- static get_item_vector_similarity(*, target_item: Any, target_vector: numpy.array, comparison_items: List[Any], comparison_contents: List[numpy.array]) List[Tuple[Any, float]] [source]
Convert a list of comparison_items and their corresponding vectors into a matrix and return a list of items and scores. The shape of item content vectors is expected to be (1, N) for each item. The target item and its vector should not be contained in comparison_items or comparison_contents.
- Parameters
target_item – The item to get similarities for. Currently unused.
target_vector – A vector representing the target_item.
comparison_items – A list of other items to compare the target_item with.
comparison_contents – A list of vectors that represent each item in comparison_items
- Returns
A list of tuples (x, y) where x is an item and y is the similarity of that item and the target_item
- static get_top_n_candidates(*, candidate_score_dict: List[Tuple[Any, float]], top_n: int) List[Tuple[Any, float]] [source]
Get the top N candidates out of a list of tuples, where the second element of each tuple is the item’s score. This score should typically be something like a similarity score, e.g. what comes out of the get_item_vector_similarity function.
- Parameters
candidate_score_dict – A list of tuples (x, y) where x is an item and y is some score for that item
top_n – The number of items to return
- Returns
A list of the top N items from candidate_score_dict in descending order of score
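Selecting the top N entries from a list of (item, score) tuples can be sketched with the standard library's heapq.nlargest, which returns results in descending order of the key. The function name top_n_sketch and the sample data are illustrative; the library's own implementation may sort differently.

```python
import heapq
from typing import Any, List, Tuple


def top_n_sketch(scored_items: List[Tuple[Any, float]],
                 top_n: int) -> List[Tuple[Any, float]]:
    """Return the `top_n` highest-scoring (item, score) tuples, best first."""
    return heapq.nlargest(top_n, scored_items, key=lambda pair: pair[1])


scored = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
print(top_n_sketch(scored, 2))  # [('b', 0.9), ('c', 0.5)]
```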
- static jaccard_sim(*, comparison_vector: numpy.array, comparison_matrix: numpy.array) numpy.array [source]
Return the jaccard similarity between a given vector and the rows of a matrix.
- Parameters
comparison_vector – The vector to serve as the source of comparison
comparison_matrix – A matrix containing rows with which the comparison_vector will be compared
- Returns
An array of jaccard similarities between the comparison_vector and each row of the comparison_matrix
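For binary vectors, Jaccard similarity is the number of positions where both vectors are non-zero divided by the number of positions where at least one is non-zero. The sketch below assumes binary (0/1) inputs and uses plain Python lists in place of numpy arrays. The name jaccard_sim_sketch and the choice to return 0.0 when both vectors are all zeros are assumptions, and the library may define the measure differently for non-binary data.

```python
from typing import List, Sequence


def jaccard_sim_sketch(vector: Sequence[int],
                       matrix: Sequence[Sequence[int]]) -> List[float]:
    """Jaccard similarity between a binary `vector` and each binary row."""
    sims = []
    for row in matrix:
        intersection = sum(1 for x, y in zip(vector, row) if x and y)
        union = sum(1 for x, y in zip(vector, row) if x or y)
        # Assumption: two all-zero vectors are given a similarity of 0.0.
        sims.append(intersection / union if union else 0.0)
    return sims


jaccard = jaccard_sim_sketch(
    [1, 1, 0, 0],
    [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]],
)
print(jaccard)
```

The rows share two, one, and zero active positions with the vector, giving similarities of 1, 1/3, and 0.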