ckip_transformers.nlp.util module
This module implements the utilities for CKIP Transformers NLP drivers.
class ckip_transformers.nlp.util.CkipTokenClassification(model_name: str, tokenizer_name: Optional[str] = None, *, device: int = -1)
Bases: object
The base class for token classification tasks.
Parameters
    model_name (str) – The pretrained model name (e.g. 'ckiplab/bert-base-chinese-ws').
    tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name (e.g. 'bert-base-chinese').
    device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative value runs the model on the CUDA device with that id.
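The device ordinal convention can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_device` is a hypothetical helper, and treating 0 as the first CUDA device is an assumption based on the usual PyTorch ordinal convention.

```python
def resolve_device(device: int = -1) -> str:
    """Map a device ordinal to a PyTorch-style device string.

    Hypothetical helper for illustration; not part of ckip_transformers.
    """
    # Assumption: a negative ordinal selects the CPU, and any other value
    # names the CUDA device with that id.
    return "cpu" if device < 0 else f"cuda:{device}"


print(resolve_device())   # default ordinal -1
print(resolve_device(0))  # first CUDA device under the assumed convention
```

The resulting string is in the form accepted by `torch.device`, so a driver built along these lines could pass it straight through when moving the model.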
__call__(input_text: Union[List[str], List[List[str]]], *, use_delim: bool = False, delim_set: Optional[str] = ',,。::;;!!??', batch_size: int = 256, max_length: Optional[int] = None, show_progress: bool = True)
Call the driver.
Parameters
    input_text (List[str] or List[List[str]]) – The input sentences. Each sentence is a string or a list of strings.
    use_delim (bool, optional, defaults to False) – Segment sentences (internally) using delim_set.
    delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.
    batch_size (int, optional, defaults to 256) – The size of each mini-batch.
    max_length (int, optional) – The maximum length of the sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).
    show_progress (bool, optional, defaults to True) – Show progress bar.
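To make use_delim, delim_set, and batch_size more concrete, here is a minimal sketch of delimiter-based segmentation followed by mini-batching. It only illustrates the idea and may differ from what the driver does internally; `split_by_delims` and `make_batches` are hypothetical names, not library functions.

```python
from typing import List

# Default delimiter set copied from the documented __call__ signature.
DEFAULT_DELIMS = ",,。::;;!!??"


def split_by_delims(sentence: str, delim_set: str = DEFAULT_DELIMS) -> List[str]:
    """Split a sentence after each delimiter character, keeping the delimiter."""
    pieces: List[str] = []
    current: List[str] = []
    for ch in sentence:
        current.append(ch)
        if ch in delim_set:
            # Close the current piece right after a delimiter.
            pieces.append("".join(current))
            current = []
    if current:
        # Trailing text without a closing delimiter forms the last piece.
        pieces.append("".join(current))
    return pieces


def make_batches(items: List[str], batch_size: int = 256) -> List[List[str]]:
    """Group items into consecutive mini-batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


pieces = split_by_delims("你好,世界!再見")
batches = make_batches(pieces, batch_size=2)
print(pieces)
print(batches)
```

Splitting on delimiters keeps each piece short, which can help long inputs stay under max_length.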
class ckip_transformers.nlp.util.NerToken(word: str, ner: str, idx: Tuple[int, int])
Bases: tuple
A named-entity recognition token.
property word
    str, the token word.
property ner
    str, the NER tag.
property idx
    Tuple[int, int], the starting / ending index in the sentence.
__getnewargs__()
    Return self as a plain tuple. Used by copy and pickle.
static __new__(_cls, word: str, ner: str, idx: Tuple[int, int])
    Create new instance of NerToken(word, ner, idx).
__repr__()
    Return a nicely formatted representation string.
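The members listed above match what a Python named tuple provides. The sketch below defines a local stand-in with the same three fields so it runs without the library installed; it is an assumption that the real NerToken behaves the same way, and the sample values are made up for illustration.

```python
import copy
from typing import NamedTuple, Tuple


class NerToken(NamedTuple):
    """Local stand-in mirroring the documented fields; not the library class."""
    word: str
    ner: str
    idx: Tuple[int, int]


# Illustrative values only, not model output.
token = NerToken(word="傅達仁", ner="PERSON", idx=(0, 3))

# Fields are readable by name or by tuple unpacking.
word, ner, idx = token

# __getnewargs__ returns the fields as a plain tuple, which copy and pickle
# use to rebuild an equal instance.
plain = token.__getnewargs__()
clone = copy.copy(token)

print(repr(token))
print(plain)
```

Because the token is a tuple, it is immutable and hashable, so it can be used in sets or as a dictionary key.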
-
property