Products & Services
Via our GitHub, you can experiment with IAHLT's open-source annotated content and decide whether you would like to become an IAHLT member to access our large Hebrew & Arabic datasets and models.
Our products on the IAHLT GitHub
Services and Tools for Hebrew & Arabic
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing nearly 200 treebanks in over 100 languages.
IAHLT public contribution
The UD Hebrew-IAHLTwiki treebank consists of 5,000 contemporary Hebrew sentences drawn from a variety of Wikipedia entries:
https://github.com/UniversalDependencies/UD_Hebrew-IAHLTwiki
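UD treebanks such as this one are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and a blank line after each sentence. The minimal reader below is an illustrative sketch, not IAHLT tooling. The sample sentence and its annotation are invented and simplified, and the reader deliberately skips multiword-token range lines (such as `1-2`, common in Hebrew, where prefixes like the definite article ה are split into their own syntactic words) and empty nodes. For real work, an established parser such as the `conllu` package on PyPI is a safer choice.

```python
from typing import Dict, Iterator, List

# The ten CoNLL-U columns, in the order defined by the UD format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Gender=Masc|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_conllu(text: str) -> Iterator[List[Dict[str, object]]]:
    """Yield one sentence at a time as a list of token dicts.

    Comment lines are skipped; multiword-token ranges (e.g. '1-2') and
    empty nodes (e.g. '1.1') are ignored to keep the sketch short.
    """
    sentence: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments (sent_id, text)
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        token: Dict[str, object] = dict(zip(FIELDS, cols))
        token["id"] = int(cols[0])
        token["head"] = int(cols[6])  # 0 marks the root of the sentence
        token["feats"] = parse_feats(cols[5])
        sentence.append(token)
    if sentence:                      # file may lack a final blank line
        yield sentence


# Invented, simplified example ("The boy read a book."); not from the treebank.
SAMPLE = "\n".join([
    "# text = הילד קרא ספר.",
    "1-2\tהילד\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tה\tה\tDET\t_\tPronType=Art\t2\tdet\t_\t_",
    "2\tילד\tילד\tNOUN\t_\tGender=Masc|Number=Sing\t3\tnsubj\t_\t_",
    "3\tקרא\tקרא\tVERB\t_\tNumber=Sing|Person=3|Tense=Past\t0\troot\t_\t_",
    "4\tספר\tספר\tNOUN\t_\tGender=Masc|Number=Sing\t3\tobj\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])

if __name__ == "__main__":
    for tok in next(read_conllu(SAMPLE)):
        print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Because the range line `1-2` is skipped, the surface word הילד appears only as its two syntactic words, ה and ילד, which is the level at which UD attaches HEAD and DEPREL.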
Named-entity recognition (NER), also known as (named) entity identification, entity chunking, or entity extraction, is a subtask of information extraction that locates named entities mentioned in unstructured text and classifies them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
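Many NER datasets and taggers encode entity annotations at the token level with BIO tags: `B-` begins an entity, `I-` continues it, and `O` is outside any entity. The sketch below converts between token spans and BIO tags in both directions. The label names (`PER`, `GPE`) and the example sentence are illustrative assumptions, not the IAHLT tag set or data, and this token-level sketch does not deal with clitic prefixes (such as Hebrew ב, "in") that can be attached to an entity word.

```python
from typing import List, Optional, Tuple

# An entity span: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; a stray I- tag starts a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# Illustrative sentence: "David Ben-Gurion declared the establishment of Israel".
tokens = ["דוד", "בן", "גוריון", "הכריז", "על", "הקמת", "ישראל"]
entities = [(0, 3, "PER"), (6, 7, "GPE")]

if __name__ == "__main__":
    tags = spans_to_bio(len(tokens), entities)
    print(list(zip(tokens, tags)))
    print(bio_to_spans(tags))
```

Keeping the two directions separate makes it easy to check that an annotation survives a round trip before it is used for training or evaluation.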
IAHLT Automatic Annotation Demos
Automatic Hebrew NER demo
https://huggingface.co/spaces/iahlt/iahlt-span-marker-alephbert-small-nemo-mt-he
Automatic Arabic NER demo
https://huggingface.co/spaces/iahlt/iahlt-span-marker-xlm-roberta-base-ar
Open-Source Tools Used by IAHLT
Name | Description | Use | URL |
---|---|---|---|
Sourcehut | Forge with hosting for git/mailing list/CI and more | Code hosting/continuous integration/mailing list/issue tracking | https://sourcehut.org |
NECKar | Wikidata entity extractor | Entity linking and NE pre-annotation | https://event.ifi.uni-heidelberg.de/?page_id=532 |
sacr | Coreference annotation tool | Coreference annotation | http://boberle.com/projects/sacr |
trankit | UD parser and NE recognizer | UD parsing and NE recognition | https://github.com/nlp-uoregon/trankit |
udpipe | Classical UD parser | Sentence segmentation (HE + AR) and lemmatization | https://github.com/ufal/udpipe |
arborator | Universal Dependencies annotation tool | Annotation for UD | https://github.com/Arborator |
Doccano | Named entity annotation tool | Named entity annotation | https://github.com/doccano |
Grew | Graph-based corpus search tool | Corpus search and validation for lemmatization and UD | https://grew.fr |
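One basic consistency check behind UD validation is that each sentence's HEAD column forms a tree: every HEAD points to an existing token or to 0, exactly one token is the root, and no token is caught in a cycle that never reaches the root. The plain-Python function below is a hypothetical illustration of that single check; it is not Grew's query language or API, and UD's own validation tooling checks far more (labels, features, language-specific rules).

```python
from typing import List


def tree_errors(heads: List[int]) -> List[str]:
    """Check that a HEAD column forms a single-rooted tree.

    heads[i] is the HEAD of token i+1; 0 means the token is the root.
    Returns a list of human-readable problems (empty if the tree is valid).
    """
    n = len(heads)
    errors: List[str] = []
    for i, h in enumerate(heads, start=1):
        if not 0 <= h <= n:
            errors.append(f"token {i}: HEAD {h} out of range")
        elif h == i:
            errors.append(f"token {i}: attached to itself")
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:
        errors.append(f"expected exactly one root, found {len(roots)}")
    if errors:
        return errors  # cycle detection below assumes valid, in-range heads
    # Follow HEAD links from every token; a token that revisits a node
    # before reaching 0 is stuck in a cycle.
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen:
                errors.append(f"token {i}: cycle, never reaches the root")
                break
            seen.add(node)
            node = heads[node - 1]
    return errors
```

For example, `tree_errors([2, 3, 0, 3, 3])` returns an empty list, while `tree_errors([2, 1, 0])` reports a cycle for tokens 1 and 2, which point at each other.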
Hebrew LLM Project
A joint project for a large, open, and powerful generative language model in Hebrew
The Israeli Association of Human Language Technologies (IAHLT)
Dicta Center, in collaboration with MAFAT / the National Program
Intel