How To Get Probability Of Prediction Per Entity From Spacy Ner Model?

June 29, 2023 Post a Comment

I used this official example code to train a NER model from scratch using my own training samples. When I predict using this model on new text, I want to get the probability of pr

Solution 1:

Getting the probabilities of prediction per entity from a Spacy NER model is not trivial. Here is the solution adapted from here :

import spacy
from collections import defaultdict

texts = ['John works at Microsoft.']

# Number of alternate analyses to consider. More is slower, and not necessarily better -- you need to experiment on your problem.
beam_width = 16# This clips solutions at each step. We multiply the score of the top-ranked action by this value, and use the result as a threshold. This prevents the parser from exploring options that look very unlikely, saving a bit of efficiency. Accuracy may also improve, because we've trained on greedy objective.
beam_density = 0.0001 
nlp = spacy.load('en_core_web_md')


docs = list(nlp.pipe(texts, disable=['ner']))
beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

for doc, beam inzip(docs, beams):
    entity_scores = defaultdict(float)
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

l= []
for k, v in entity_scores.items():
    l.append({'start': k[0], 'end': k[1], 'label': k[2], 'prob' : v} )

for a insorted(l, key= lambda x: x['start']):
    print(a)

### Output: ####

{'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
{'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
{'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
{'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
{'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
{'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
{'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
{'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
{'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}

Solution 2:

Sorry I do not have any better answer - I can only confirm that the 'beam' solution does provide some 'probabilities' - though in my case I am getting way too many entities with prob=1.0, even in cases where I can only shake my head and blame it on too little training data.

I find it quite strange that Spacy reports an 'entity' without having any confidence attached to it. I would assume there is some threshold to decide WHEN Spacy reports an entity and when it does NOT (perhaps I missed it). In my case, I see confidences 0.6 reported as 'this is an entity' while entity with confidence 0.001 is NOT reported.

In my use-case, the confidence is essential. For a given text, Spacy (and for example Google ML) report multiple instances of 'MY_ENTITY'. My code has to decide which ones are to be 'trusted' and which ones are false positive. I have yet to see IF the 'probability' returned by the above code has any practical value.

Python College

How To Get Probability Of Prediction Per Entity From Spacy Ner Model?

Solution 1:

Solution 2:

Post a Comment for "How To Get Probability Of Prediction Per Entity From Spacy Ner Model?"