ckip_transformers.nlp.util module

This module implements the utilities for CKIP Transformers NLP drivers.

class ckip_transformers.nlp.util.CkipTokenClassification(model_name: str, tokenizer_name: Optional[str] = None, *, device: Union[int, torch.device] = -1)

Bases: object

The base class for token classification tasks.

Parameters:
  • model_name (str) – The pretrained model name (e.g. 'ckiplab/bert-base-chinese-ws').

  • tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name (e.g. 'bert-base-chinese').

  • device (int or torch.device, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative integer runs it on the CUDA device with that id.
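The device ordinal convention can be sketched as below. This is only an illustration: device_name is a hypothetical helper, not part of ckip_transformers, and it assumes that ordinal 0 selects the first CUDA device, as is the usual PyTorch convention.

```python
def device_name(device: int) -> str:
    """Map a device ordinal to a PyTorch-style device string.

    Hypothetical helper for illustration only; the library may resolve
    devices differently.
    """
    # Assumption: any negative ordinal means CPU, otherwise a CUDA index.
    return "cpu" if device < 0 else f"cuda:{device}"


cpu_target = device_name(-1)
gpu_target = device_name(0)
```

A string of this form could then be handed to torch.device(...) if PyTorch is installed.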

__call__(input_text: Union[List[str], List[List[str]]], *, use_delim: bool = False, delim_set: Optional[str] = ',,。::;;!!??', batch_size: int = 256, max_length: Optional[int] = None, show_progress: bool = True, pin_memory: bool = True)

Call the driver.

Parameters:
  • input_text (List[str] or List[List[str]]) – The input sentences. Each sentence is a string or a list of strings.

  • use_delim (bool, optional, defaults to False) – Segment sentence (internally) using delim_set.

  • delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.

  • batch_size (int, optional, defaults to 256) – The size of mini-batch.

  • max_length (int, optional) – The maximum length of the sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).

  • show_progress (bool, optional, defaults to True) – Show progress bar.

  • pin_memory (bool, optional, defaults to True) – Pin memory in order to accelerate the speed of data transfer to the GPU. This option is incompatible with multiprocessing. Disabled on CPU device.
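How the driver segments sentences internally is not described here. The sketch below shows one plausible reading of use_delim and delim_set: cut the sentence after every character found in delim_set, keeping the delimiter attached to the preceding piece. The function split_by_delims is a hypothetical name introduced for illustration, not a library function.

```python
from typing import List


def split_by_delims(sentence: str, delim_set: str) -> List[str]:
    """Split a sentence after each delimiter character.

    Hypothetical helper; the library's internal segmentation may differ.
    """
    pieces: List[str] = []
    start = 0
    for i, ch in enumerate(sentence):
        if ch in delim_set:
            # Keep the delimiter at the end of the current piece.
            pieces.append(sentence[start : i + 1])
            start = i + 1
    if start < len(sentence):
        # Trailing text with no closing delimiter.
        pieces.append(sentence[start:])
    return pieces


pieces = split_by_delims("a,b。c", ",。")
```

Because each delimiter stays with its piece, joining the pieces reproduces the original sentence, so per-piece results can be mapped back to positions in the full input.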

class ckip_transformers.nlp.util.NerToken(word: str, ner: str, idx: Tuple[int, int])

Bases: tuple

A named-entity recognition token.

property word

str, the token word.

property ner

str, the NER-tag.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, word: str, ner: str, idx: Tuple[int, int])

Create new instance of NerToken(word, ner, idx)

__repr__()

Return a nicely formatted representation string

property idx

Tuple[int, int], the starting / ending index in the sentence.
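Since NerToken derives from tuple and exposes word, ner and idx, it behaves like a named tuple. The sketch below uses a local stand-in class, NerTokenSketch (a hypothetical name), so that it runs without the library installed. The sample sentence and tag are illustrative data, and treating idx as a half-open (start, end) range is an assumption.

```python
from typing import NamedTuple, Tuple


class NerTokenSketch(NamedTuple):
    """Local stand-in mirroring the documented fields of NerToken."""

    word: str
    ner: str
    idx: Tuple[int, int]


sentence = "傅達仁今將執行安樂死"
token = NerTokenSketch(word="傅達仁", ner="PERSON", idx=(0, 3))

# Fields are available by name or by tuple unpacking.
word, ner, (start, end) = token

# Assumption: idx is a half-open range into the original sentence.
recovered = sentence[start:end]
```

Under that assumption, slicing the sentence with idx yields the same text as token.word.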