ckip_transformers.nlp.driver module
This module implements the CKIP Transformers NLP drivers.
- class ckip_transformers.nlp.driver.CkipWordSegmenter(model: str = 'bert-base', **kwargs)[source]
Bases: CkipTokenClassification
The word segmentation driver.
- Parameters
  - model (str, optional, defaults to 'bert-base') – The pretrained model name provided by CKIP Transformers.
  - model_name (str, optional, overwrites model) – The custom pretrained model name (e.g. 'ckiplab/bert-base-chinese-ws').
  - device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative value will run the model on the associated CUDA device id.
- __call__(input_text: List[str], *, use_delim: bool = False, **kwargs) List[List[str]] [source]
Call the driver.
- Parameters
  - input_text (List[str]) – The input sentences. Each sentence is a string.
  - use_delim (bool, optional, defaults to False) – Segment sentences internally using delim_set.
  - delim_set (str, optional, defaults to ',,。::;;!!??') – The delimiters used for sentence segmentation when use_delim=True.
  - batch_size (int, optional, defaults to 256) – The size of the mini-batch.
  - max_length (int, optional) – The maximum sentence length; it must not exceed the model's maximum sequence length (i.e. tokenizer.model_max_length).
  - show_progress (bool, optional, defaults to True) – Show a progress bar.
  - pin_memory (bool, optional, defaults to True) – Pin memory in order to accelerate data transfer to the GPU. This option is incompatible with multiprocessing.
- Returns
  List[List[str]] – A list of lists of words (str).
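A minimal usage sketch (not part of the original reference; the example sentence and keyword values are illustrative):

    from ckip_transformers.nlp.driver import CkipWordSegmenter

    # Load the default 'bert-base' word segmentation model on CPU (device=-1).
    ws_driver = CkipWordSegmenter(model="bert-base", device=-1)

    # Each input sentence is a plain string.
    sentences = ["中文自然語言處理很有趣,我們來試試斷詞。"]
    ws_results = ws_driver(sentences, batch_size=256, show_progress=False)

    # ws_results[0] is a List[str] holding the segmented words of the first sentence.
    print(ws_results[0])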
- class ckip_transformers.nlp.driver.CkipPosTagger(model: str = 'bert-base', **kwargs)[source]
Bases: CkipTokenClassification
The part-of-speech tagging driver.
- Parameters
  - model (str, optional, defaults to 'bert-base') – The pretrained model name provided by CKIP Transformers.
  - model_name (str, optional, overwrites model) – The custom pretrained model name (e.g. 'ckiplab/bert-base-chinese-pos').
  - device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative value will run the model on the associated CUDA device id.
- __call__(input_text: List[List[str]], *, use_delim: bool = True, **kwargs) List[List[str]] [source]
Call the driver.
- Parameters
  - input_text (List[List[str]]) – The input sentences. Each sentence is a list of strings (words).
  - use_delim (bool, optional, defaults to True) – Segment sentences internally using delim_set.
  - delim_set (str, optional, defaults to ',,。::;;!!??') – The delimiters used for sentence segmentation when use_delim=True.
  - batch_size (int, optional, defaults to 256) – The size of the mini-batch.
  - max_length (int, optional) – The maximum sentence length; it must not exceed the model's maximum sequence length (i.e. tokenizer.model_max_length).
  - show_progress (bool, optional, defaults to True) – Show a progress bar.
  - pin_memory (bool, optional, defaults to True) – Pin memory in order to accelerate data transfer to the GPU. This option is incompatible with multiprocessing.
- Returns
  List[List[str]] – A list of lists of POS tags (str).
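Because CkipPosTagger expects pre-segmented input, a common pattern is to feed it the output of CkipWordSegmenter. A minimal sketch (the example sentence and the printed pairing are illustrative, not from the original reference):

    from ckip_transformers.nlp.driver import CkipPosTagger, CkipWordSegmenter

    ws_driver = CkipWordSegmenter(model="bert-base", device=-1)
    pos_driver = CkipPosTagger(model="bert-base", device=-1)

    sentences = ["中文自然語言處理很有趣。"]
    ws_results = ws_driver(sentences)     # List[List[str]]: words per sentence
    pos_results = pos_driver(ws_results)  # List[List[str]]: one POS tag per word

    # Pair each word with its tag for inspection.
    for words, tags in zip(ws_results, pos_results):
        print(list(zip(words, tags)))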
- class ckip_transformers.nlp.driver.CkipNerChunker(model: str = 'bert-base', **kwargs)[source]
Bases: CkipTokenClassification
The named-entity recognition driver.
- Parameters
  - model (str, optional, defaults to 'bert-base') – The pretrained model name provided by CKIP Transformers.
  - model_name (str, optional, overwrites model) – The custom pretrained model name (e.g. 'ckiplab/bert-base-chinese-ner').
  - device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative value will run the model on the associated CUDA device id.
- __call__(input_text: List[str], *, use_delim: bool = False, **kwargs) List[List[NerToken]] [source]
Call the driver.
- Parameters
  - input_text (List[str]) – The input sentences. Each sentence is a string.
  - use_delim (bool, optional, defaults to False) – Segment sentences internally using delim_set.
  - delim_set (str, optional, defaults to ',,。::;;!!??') – The delimiters used for sentence segmentation when use_delim=True.
  - batch_size (int, optional, defaults to 256) – The size of the mini-batch.
  - max_length (int, optional) – The maximum sentence length; it must not exceed the model's maximum sequence length (i.e. tokenizer.model_max_length).
  - show_progress (bool, optional, defaults to True) – Show a progress bar.
  - pin_memory (bool, optional, defaults to True) – Pin memory in order to accelerate data transfer to the GPU. This option is incompatible with multiprocessing.
- Returns
  List[List[NerToken]] – A list of lists of entities (NerToken).
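A minimal usage sketch (not part of the original reference; the example sentence is illustrative, and the word, ner, and idx attributes are assumed to match the NerToken tuple exposed by the package):

    from ckip_transformers.nlp.driver import CkipNerChunker

    ner_driver = CkipNerChunker(model="bert-base", device=-1)

    sentences = ["蔡英文今天在台北出席活動。"]
    ner_results = ner_driver(sentences, show_progress=False)

    # Each entity is a NerToken; the attributes below (entity text, entity
    # class, character span) are assumed from the upstream package.
    for token in ner_results[0]:
        print(token.word, token.ner, token.idx)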