Products

Studio

Studio Web

StudioX

Automation Ops

Document Understanding

Explore

Professional Services

Products

Document Understanding

Explore

Professional Services

Apache Open NLP Activities

Create your first automation in just a few minutes.Try Studio Web →

Apache Open NLP Activities

by YouTube

Activity

<100

Summary

Natural Language Processing Toolkit from Apache Open NLP platform

Overview

Activities:

Detect Language
Detect Sentences
Named Entity Recognition
POS Tagger
Tokenize Sentences

NOTE: The activities work only for English language models.

Detect Language

The OpenNLP Language Detector classifies a document in ISO-639-3 languages according to the model capabilities.

Input Parameters:

Input Sentence - String
OpenNLP_ModelsPath - Path of the OpenNLP Models folder

Output Parameters:

Confidence - Confidence of language extraction
Language - Detected language

Detect Sentences

The OpenNLP Sentence Detector can detect if a punctuation character marks the end of a sentence or not. In this sense a sentence is defined as the longest white space trimmed character sequence between two punctuation marks. The first and last sentence make an exception to this rule. The first non whitespace character is assumed to be the beginning of a sentence and the last non whitespace character is assumed to be a sentence end.

Input Parameters:

InputParagraph - String
OpenNLP_ModelsPath - Path of the OpenNLP Models folder

Output Parameter:

Sentences - Array of sentences found from the InputParagraph

Named Entity Recognition

The Name Finder can detect named entities and numbers in text. To be able to detect entities the Name Finder needs a model. The model is dependent on the language and entity type it was trained for. The OpenNLP projects offers a number of pre-trained name finder models which are trained on various freely available corpora. They can be downloaded at our model download page. To find names in raw text the text must be segmented into tokens and sentences. A detailed description is given in the sentence detector and tokenizer tutorial. It is important that the tokenization for the training data and the input text is identical.

Input Parameters:

SelectNER - Select one of the NER models from the dropdown
SentenceTokens - Tokenized sentence
OpenNLP_ModelsPath - Path of the OpenNLP Models folder

Output Parameters:

NamesList - List of entities found in the SentenceTokens
Rawoutput - Raw op of the NER

POS Tagger

The Part of Speech Tagger marks tokens with their corresponding word type based on the token itself and the context of the token. A token might have multiple pos tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to predict the correct pos tag out of the tag set. To limit the possible tags for a token a tag dictionary can be used which increases the tagging and runtime performance of the tagger.

Input Parameters:

InputSentence - Input sentence
OpenNLP_ModelsPath - Path of the OpenNLP Models folder

Output Parameter:

POSTagString - POS Tagged string

Tokenize Sentences

The OpenNLP Tokenizers segment an input character sequence into tokens. Tokens are usually words, punctuation, numbers, etc.

OpenNLP offers multiple tokenizer implementations:

Whitespace Tokenizer - A whitespace tokenizer, non whitespace sequences are identified as tokens
Simple Tokenizer - A character class tokenizer, sequences of the same character class are tokens
Learnable Tokenizer - A maximum entropy tokenizer, detects token boundaries based on probability model

Most part-of-speech taggers, parsers and so on, work with text tokenized in this manner. It is important to ensure that your tokenizer produces tokens of the type expected by your later text processing components.

With OpenNLP (as with many systems), tokenization is a two-stage process: first, sentence boundaries are identified, then tokens within each sentence are identified.

Input Parameters:

InputSentence - Input sentence
OpenNLP_ModelsPath - Path of the OpenNLP Models folder

Output Parameter:

Tokens - Tokens Array

Features

Out of the box NLP Activities which don't need any server to be setup like in Stanford NLP. This also has a lot of options for customization and training custom models as compared to Stanford NLP.

Additional Information

Dependencies

NOTE: No dependency is required to be installed in UiPath Studio. The dependencies mentioned below are just for information purpose: OpenNLP nuget pakage: Install-Package OpenNLP -Version 1.3.5 System.Runtime.Caching nuget package :Install-Package System.Runtime.Caching

Code Language

Visual Basic

Runtime

Windows Legacy (.Net Framework 4.6.1)

Publisher

YouTube

Visit publisher's page

Trusted Source

License & Privacy

License Agreement

Privacy Terms

Technical

Version

1.0.0

Updated

February 18, 2020

Works with

All previous and current versions of UiPath Studio

Certification

Silver Certified

Similar Listings

Activity

R.Activities

by YouTube

<100

A package of custom activities that provides seamless integration of UiPath with R. Use R Scope and Run R Script activities to invoke your prediction and machine-learning models direct from UiPath.

Free

Activity

Voice enabled RPA - AI

by YouTube

1.2k

Google TextToSpeech and SpeechToText activities that allows you to give voice to your robots

Free

Activity

Indico Pre-Trained Machine Learning Activities

by YouTube

<100

This package contains a set of activities that allow you to use Indico's machine learning APIs directly in your workflows.