ckip_transformers.nlp.driver module

This module implements the CKIP Transformers NLP drivers.

class ckip_transformers.nlp.driver.CkipWordSegmenter(model_name: Optional[str] = 'ckiplab/bert-base-chinese-ws', tokenizer_name: Optional[str] = None)

Bases: ckip_transformers.nlp.util.CkipTokenClassification

The word segmentation driver.

Parameters
  • model_name (str, optional, defaults to 'ckiplab/bert-base-chinese-ws') – The pretrained model name.

  • tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name.

__call__(input_text: List[str], *, max_length: Optional[int] = None) → List[List[str]]

Call the driver.

Parameters
  • input_text (List[str]) – The input sentences. Each sentence is a string.

  • max_length (int, optional) – The maximum length of a sentence. Must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).

Returns

List[List[str]] – A list of lists of words (str), one inner list per input sentence.
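The calling convention above can be sketched without downloading a model. The snippet below is a minimal illustration, not the library's implementation: `run_segmenter` and `char_segmenter` are hypothetical names, the stand-in driver only mimics the documented `__call__` signature by splitting each sentence into characters, and the default limit of 512 is an assumption (in practice the limit would come from `tokenizer.model_max_length`).

```python
from typing import Callable, List, Optional


def run_segmenter(
    driver: Callable[..., List[List[str]]],
    sentences: List[str],
    max_length: Optional[int] = None,
    model_max_length: int = 512,  # assumed limit; read it from the tokenizer in practice
) -> List[List[str]]:
    """Validate max_length, then call a driver with the documented signature."""
    if max_length is not None and not 0 < max_length <= model_max_length:
        raise ValueError(
            f"max_length={max_length} must be in the range 1..{model_max_length}"
        )
    # max_length is keyword-only in the documented __call__ signature.
    return driver(sentences, max_length=max_length)


def char_segmenter(input_text: List[str], *, max_length: Optional[int] = None) -> List[List[str]]:
    """Hypothetical stand-in driver: splits each sentence into single characters."""
    return [list(sentence) for sentence in input_text]


words = run_segmenter(char_segmenter, ["今天天氣好"], max_length=128)
print(words)
```

With the library installed, a `CkipWordSegmenter()` instance could presumably be passed as `driver` in place of the stand-in, since it is documented to accept the same arguments.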

class ckip_transformers.nlp.driver.CkipPosTagger(model_name: Optional[str] = 'ckiplab/bert-base-chinese-pos', tokenizer_name: Optional[str] = None)

Bases: ckip_transformers.nlp.util.CkipTokenClassification

The part-of-speech tagging driver.

Parameters
  • model_name (str, optional, defaults to 'ckiplab/bert-base-chinese-pos') – The pretrained model name.

  • tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name.

__call__(input_text: List[List[str]], *, max_length: Optional[int] = None) → List[List[str]]

Call the driver.

Parameters
  • input_text (List[List[str]]) – The input sentences. Each sentence is a list of strings (words).

  • max_length (int, optional) – The maximum length of a sentence. Must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).

Returns

List[List[str]] – A list of lists of POS tags (str), one inner list per input sentence.
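Because the tagger consumes word lists and returns tag lists of the same nested shape, the two results can be zipped together. The sketch below assumes the tagger yields exactly one tag per input word; `pair_words_with_tags` is a hypothetical helper, and the sample words and tags are illustrative values rather than real model output.

```python
from typing import List, Tuple


def pair_words_with_tags(
    words: List[List[str]],
    tags: List[List[str]],
) -> List[List[Tuple[str, str]]]:
    """Pair each word with its POS tag, sentence by sentence."""
    if len(words) != len(tags):
        raise ValueError("words and tags must contain the same number of sentences")
    paired = []
    for sentence_words, sentence_tags in zip(words, tags):
        # Assumption: one tag per word; fail loudly if that does not hold.
        if len(sentence_words) != len(sentence_tags):
            raise ValueError("each sentence needs exactly one tag per word")
        paired.append(list(zip(sentence_words, sentence_tags)))
    return paired


# Illustrative data in the documented shapes (not real model output).
ws_output = [["傅達仁", "今", "將", "執行", "安樂死"]]
pos_output = [["Nb", "Nd", "D", "VC", "Na"]]

paired = pair_words_with_tags(ws_output, pos_output)
print(paired[0][0])
```

In a real pipeline, `ws_output` would come from a word segmentation driver and would be passed unchanged as `input_text` to the POS driver.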

class ckip_transformers.nlp.driver.CkipNerChunker(model_name: Optional[str] = 'ckiplab/bert-base-chinese-ner', tokenizer_name: Optional[str] = None)

Bases: ckip_transformers.nlp.util.CkipTokenClassification

The named-entity recognition driver.

Parameters
  • model_name (str, optional, defaults to 'ckiplab/bert-base-chinese-ner') – The pretrained model name.

  • tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name.

__call__(input_text: List[str], *, max_length: Optional[int] = None) → List[List[ckip_transformers.nlp.util.NerToken]]

Call the driver.

Parameters
  • input_text (List[str]) – The input sentences. Each sentence is a string.

  • max_length (int, optional) – The maximum length of a sentence. Must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).

Returns

List[List[NerToken]] – A list of lists of entities (NerToken), one inner list per input sentence.
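The nested return value can be post-processed per sentence, for example by grouping entity strings by type. The sketch below is only illustrative: the `NerToken` class defined here is a local stand-in, and its field names (`word`, `ner`, `idx`) are an assumption about `ckip_transformers.nlp.util.NerToken` that should be checked against that class's documentation. `group_entities_by_type` is a hypothetical helper and the sample entities are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class NerToken(NamedTuple):
    """Local stand-in for ckip_transformers.nlp.util.NerToken (field names assumed)."""
    word: str              # surface text of the entity
    ner: str               # entity type label
    idx: Tuple[int, int]   # assumed (start, end) character offsets in the sentence


def group_entities_by_type(sentence_entities: List[NerToken]) -> Dict[str, List[str]]:
    """Collect the entity strings of one sentence under their entity type."""
    grouped = defaultdict(list)
    for token in sentence_entities:
        grouped[token.ner].append(token.word)
    return dict(grouped)


# Illustrative data in the documented shape: one list of entities per sentence.
ner_output = [[
    NerToken(word="傅達仁", ner="PERSON", idx=(0, 3)),
    NerToken(word="瑞士", ner="GPE", idx=(10, 12)),
    NerToken(word="台灣", ner="GPE", idx=(20, 22)),
]]

grouped = [group_entities_by_type(sentence) for sentence in ner_output]
print(grouped)
```

Entities keep their order of appearance within each type, so the grouping does not lose the reading order of the sentence.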