Java Read Xml File From Inside Jar Memory Usage

RMLMapper

The RMLMapper executes RML rules to generate Linked Data. It is a Java library, which is also available via the command line (API docs online). The RMLMapper loads all data in memory, so be aware of this when working with big datasets.

Table of contents

  • Features
    • Supported
    • Future
  • Releases
  • Build
  • Usage
    • CLI
    • Library
    • Docker
    • Including functions
    • Generating metadata
  • Testing
    • RDBs
  • Deploy on Central Repository
  • Dependencies
  • Commercial Support
  • Remarks
    • XML file parsing performance
    • Language tag support
    • Duplicate removal and serialization format
  • Documentation
    • UML Diagrams

Features

Supported

  • local data sources:
    • Excel (.xlsx)
    • LibreOffice (.ods)
    • CSV files (including CSVW)
    • JSON files (JSONPath)
    • XML files (XPath)
  • remote data sources:
    • relational databases (MySQL, PostgreSQL, Oracle, and SQLServer)
    • Web APIs with W3C Web of Things
    • SPARQL endpoints
    • files via HTTP urls (via GET)
      • CSV files
      • JSON files (JSONPath (@ can be used to select the current object.))
      • XML files (XPath)
  • functions (most cases)
    • For examples on how to use functions inside RML mapping documents, you can have a look at the RML+FnO test cases
  • configuration file
  • metadata generation
  • output formats: nquads (default), turtle, trig, trix, jsonld, hdt
  • join conditions
  • targets:
    • local file
    • VoID dataset
    • SPARQL endpoint with SPARQL UPDATE

Future

  • functions (all cases)
  • conditions (all cases)
  • data sources:
    • NoSQL databases
    • TPF servers

Releases

The standalone jar file for every release can be found on the release's page on GitHub. You can find the latest release here.

Build

The RMLMapper is built using Maven: mvn install. A standalone jar can be found in /target.

Two jars are found in /target: a slim jar without bundled dependencies, and a standalone jar (suffixed with -all.jar) with all dependencies bundled.

Usage

CLI

The following options are most common.

  • -m, --mapping <arg>: one or more mapping file paths and/or strings (multiple values are concatenated).
  • -o, --output <arg>: path to output file
  • -s, --serialization <arg>: serialization format (nquads (default), turtle, trig, trix, jsonld, hdt)

All options can be found by executing java -jar rmlmapper.jar --help. The output is shown below.

    usage: java -jar mapper.jar <options>
    options:
     -c,--configfile <arg>               path to configuration file
     -d,--duplicates                     remove duplicates in the output
     -dsn,--r2rml-jdbcDSN <arg>          DSN of the database when using R2RML
                                         rules
     -e,--metadatafile <arg>             path to output metadata file
     -f,--functionfile <arg>             one or more function file paths (dynamic
                                         functions with relative paths are found
                                         relative to the cwd)
     -h,--help                           show help info
     -l,--metadataDetailLevel <arg>      generate metadata on given detail level
                                         (dataset - triple - term)
     -m,--mappingfile <arg>              one or more mapping file paths and/or
                                         strings (multiple values are
                                         concatenated). r2rml is converted to rml
                                         if needed using the r2rml arguments.
     -psd,--privatesecuritydata <arg>    one or more private security files
                                         containing all private security
                                         information such as usernames, passwords,
                                         certificates, etc.
     -o,--outputfile <arg>               path to output file (default: stdout)
     -p,--r2rml-password <arg>           password of the database when using
                                         R2RML rules
     -s,--serialization <arg>            serialization format (nquads (default),
                                         turtle, trig, trix, jsonld, hdt)
     -t,--triplesmaps <arg>              IRIs of the triplesmaps that should be
                                         executed in order, separated by ','
                                         (default is all triplesmaps)
     -u,--r2rml-username <arg>           username of the database when using
                                         R2RML rules
     -v,--verbose                        show more details in debugging output
     --strict                            Enable strict mode. In strict mode, the
                                         mapper will fail on invalid IRIs instead
                                         of skipping them.
     -b,--base-IRI <arg>                 base IRI used to expand relative IRIs in
                                         mapped terms. If not set and not in --strict
                                         mode, will default to the @base directive
                                         inside the provided mapping file.
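
If you want to start the CLI from another Java program, the most common options can be assembled into a command line as in the minimal sketch below. The class name MapperCommandSketch and the file names mapping.ttl and output.nq are assumptions made for illustration; only the -m, -o, and -s options and the jar invocation come from the help output above. The sketch only builds and prints the command; the line that would actually launch the process is left commented out because it requires rmlmapper.jar in the working directory.

```java
import java.util.ArrayList;
import java.util.List;

public class MapperCommandSketch {

    // Builds the argument list for:
    // java -jar <jar> -m <mapping> -o <output> -s <format>
    public static List<String> buildCommand(String jar, String mapping,
                                            String output, String format) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jar);
        command.add("-m");      // mapping file path
        command.add(mapping);
        command.add("-o");      // output file path
        command.add(output);
        command.add("-s");      // serialization format
        command.add(format);
        return command;
    }

    public static void main(String[] args) {
        // File names are hypothetical placeholders.
        List<String> command =
                buildCommand("rmlmapper.jar", "mapping.ttl", "output.nq", "nquads");
        System.out.println(String.join(" ", command));
        // To actually run the mapper (assumes rmlmapper.jar is in the cwd):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.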

Accessing Web APIs with authentication

The W3C Web of Things Security Ontology is used to describe how Web API authentication should be performed, but it does not include the necessary credentials to access the Web API. These credentials can be supplied using the -psd <PATH> CLI argument. The PATH argument must point to one or more private security files which contain the necessary credentials to access the Web API.

An example can be found in the test cases src/test/resources/web-of-things.

Accessing Oracle Database

You need to add the Oracle JDBC driver manually to the class path if you want to access an Oracle Database. The required driver is ojdbc8.

  • Download ojdbc8.jar from Oracle.
  • Execute the RMLMapper via
    java -cp 'rmlmapper.jar:ojdbc8-12.2.0.1.jar' be.ugent.rml.cli.Main -m rules.rml.ttl

The options do the following:

  • -cp 'rmlmapper.jar:ojdbc8-12.2.0.1.jar': Put the jar of the RMLMapper and the JDBC driver in the classpath.
  • be.ugent.rml.cli.Main: the entry point of the RMLMapper.
  • -m rules.rml.ttl: Use the RML rules in the file rules.rml.ttl. The exact same options as the ones mentioned earlier are supported.
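
The classpath separator in the command above is ':' on Unix-like systems, while Windows uses ';'. As a rough sketch, assuming you start the mapper from Java code, the classpath can be joined with File.pathSeparator so the same code works on both. The class name OracleClasspathSketch is hypothetical; the jar names, the entry point be.ugent.rml.cli.Main, and the -m option are taken from the command above.

```java
import java.io.File;
import java.util.List;

public class OracleClasspathSketch {

    // Builds: java -cp <mapperJar><sep><driverJar> be.ugent.rml.cli.Main -m <mapping>
    // where <sep> is the platform's classpath separator (':' or ';').
    public static List<String> buildCommand(String mapperJar, String driverJar,
                                            String mapping) {
        String classpath = mapperJar + File.pathSeparator + driverJar;
        return List.of("java", "-cp", classpath,
                "be.ugent.rml.cli.Main", "-m", mapping);
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "rmlmapper.jar", "ojdbc8-12.2.0.1.jar", "rules.rml.ttl");
        System.out.println(String.join(" ", command));
    }
}
```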

Library

An example of how you can use the RMLMapper as an external library can be found at ./src/test/java/be/ugent/rml/readme/ReadmeTest.java

Docker

Dockerhub

We publish our Docker images automatically on Dockerhub for every release. You can find our images here: rmlio/rmlmapper-java.

Build image

You can use Docker to run the RMLMapper by following these steps:

  • Build the Docker image: docker build -t rmlmapper .
  • Run a Docker container: docker run --rm -v $(pwd):/data rmlmapper -m mapping.ttl

The same parameters are available as via the CLI. The RMLMapper is executed in the /data folder in the Docker container.

Including functions

There are two ways to include (new) functions inside the RMLMapper:

  • dynamic loading: you add links to java files or jar files, and those files are loaded dynamically at runtime
  • preloading: you register functionality via code, and you need to rebuild the mapper to use that functionality

Registration of functions is done using a Turtle file, which you can find in src/main/resources/functions.ttl

The snippet below, for example, links an fno:function to a library provided by a jar-file (GrelFunctions.jar).

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix doap:    <http://usefulinc.com/ns/doap#> .
    @prefix fno:     <https://w3id.org/function/ontology#> .
    @prefix fnoi:    <https://w3id.org/function/vocabulary/implementation#> .
    @prefix fnom:    <https://w3id.org/function/vocabulary/mapping#> .
    @prefix grel:    <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
    @prefix grelm:   <http://fno.io/grel/rmlmapping#> .
    @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .

    grel:toUpperCase a fno:Function ;
      fno:name "to Uppercase" ;
      rdfs:label "to Uppercase" ;
      dcterms:description "Returns the input with all letters in upper case." ;
      fno:expects ( grel:valueParam ) ;
      fno:returns ( grel:stringOut ) .

    grelm:javaString a fnoi:JavaClass ;
      doap:download-page "GrelFunctions.jar" ;
      fnoi:class-name "io.fno.grel.StringFunctions" .

    grelm:uppercaseMapping a fnoi:Mapping ;
      fno:function grel:toUpperCase ;
      fno:implementation grelm:javaString ;
      fno:methodMapping [ a fnom:StringMethodMapping ;
                          fnom:method-name "toUppercase" ] .

Dynamic loading

Just put the java or jar-file in the resources folder, at the root folder of the jar-location, or in the parent folder of the jar-location, and it will be found dynamically.

Note: the java or jar-files are found relative to the cwd. You can change the functions.ttl path (or use multiple functions.ttl paths) using a commandline-option (-f).

Preloading

This overrides the dynamic loading. An example of how you can preload a custom function can be found at ./src/test/java/be/ugent/rml/readme/ReadmeFunctionTest.java

Generating metadata

In line with how it is described in the scientific paper [1], the RMLMapper can automatically generate PROV-O metadata. Specifically, you need the CLI arguments below. You can specify in which output file the metadata should be stored, and up to which level metadata should be stored (dataset, triple, or term level metadata).

    -e,--metadatafile <arg>          path to output metadata file
    -l,--metadataDetailLevel <arg>   generate metadata on given detail level
                                     (dataset - triple - term)

Testing

Run the tests via test.sh.

Derived tests

Some tests (Excel, ODS) are derived from other tests (CSV) using a script (./generate_spreadsheet_test_cases.sh)

RDBs

Make sure you have Docker running.

Problems

  • A problem with Docker (can't start the container) causes the SQLServer tests to fail locally. These tests will always succeed locally.
  • A problem with Docker (can't start the container) causes the PostgreSQL tests to fail locally on Windows 7 machines.

Dependencies

Dependency                                 License
ch.qos.logback logback-classic             Eclipse Public License 1.0 & GNU Lesser General Public License 2.1
commons-cli commons-lang                   Apache License 2.0
com.opencsv opencsv                        Apache License 2.0
commons-cli commons-cli                    Apache License 2.0
org.eclipse.rdf4j rdf4j-runtime            Eclipse Public License 1.0
junit junit                                Eclipse Public License 1.0
com.jayway.jsonpath json-path              Apache License 2.0
javax.xml.parsers jaxp-api                 Apache License 2.0
org.jsoup                                  MIT
mysql mysql-connector-java                 GNU General Public License v2.0
ch.vorbuger.mariaDB4j mariaDB4j            Apache License 2.0
postgresql postgresql                      BSD
com.microsoft.sqlserver mssql-jdbc         MIT
com.spotify docker-client                  Apache License 2.0
com.fasterxml.jackson.core jackson-core    Apache License 2.0
org.eclipse.jetty jetty-server             Eclipse Public License 1.0 & Apache License 2.0
org.eclipse.jetty jetty-security           Eclipse Public License 1.0 & Apache License 2.0
org.apache.jena apache-jena-libs           Apache License 2.0
org.apache.jena jena-fuseki-embedded       Apache License 2.0
com.github.bjdmeest hdt-java               GNU Lesser General Public License v3.0
commons-validator commons-validator        Apache License 2.0
com.github.fnoio grel-functions-java       MIT

Commercial Support

Do you need...

  • training?
  • specific features?
  • different integrations?
  • bugfixes, on your timeline?
  • custom code, built by experts?
  • commercial support and licensing?

You're welcome to contact us regarding on-premise, enterprise, and internal installations, integrations, and deployments.

We have commercial support available.

We also offer consulting for all-things-RML.

Remarks

Typed spreadsheet files

All spreadsheet files are as of yet regarded as plain CSV files. No type information like Currency, Date... is used.

XML file parsing performance

The RMLMapper's XML parsing implementation (javax.xml.parsers) has been chosen to support full XPath. This implementation causes a large memory consumption (up to ten times larger than the original XML file size). However, the RMLMapper can be easily adapted to use a different XML parsing implementation that might be better suited for a specific use case.
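
To illustrate where the memory goes, here is a minimal sketch of DOM-based parsing with the JDK's javax.xml.parsers and javax.xml.xpath packages. It is not the RMLMapper's actual code; the class name DomXPathSketch and the sample XML are made up for illustration. The point is that parse(...) builds the complete document tree in memory before any XPath expression is evaluated, which is what makes full XPath possible and also what makes large files expensive.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomXPathSketch {

    // Parses the whole XML string into an in-memory DOM tree and returns
    // the text content of every node matched by the XPath expression.
    public static List<String> select(String xml, String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String xml = "<people><person><name>Alice</name></person>"
                + "<person><name>Bob</name></person></people>";
        System.out.println(select(xml, "/people/person/name")); // prints [Alice, Bob]
    }
}
```

A streaming parser would use less memory, but it typically cannot evaluate arbitrary XPath expressions, which is the trade-off described above.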

Language tag support

The processor checks whether language tags are correct or not, using a regular expression. The regex has no support for languages of length 5-8, but this currently only applies to 'qaa..qtz'.
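
As a rough illustration of this kind of check, the sketch below validates tags with a deliberately simplified pattern. The pattern and the class name LanguageTagSketch are assumptions for illustration and are not the regular expression the RMLMapper actually uses; the sketch only mirrors the limitation described above by rejecting primary language subtags of 5-8 letters.

```java
import java.util.regex.Pattern;

public class LanguageTagSketch {

    // Simplified, hypothetical pattern: a primary language subtag of 2-4
    // letters, followed by zero or more '-'-separated subtags of 1-8
    // alphanumeric characters. Primary subtags of 5-8 letters are rejected.
    private static final Pattern TAG =
            Pattern.compile("^[a-zA-Z]{2,4}(-[a-zA-Z0-9]{1,8})*$");

    public static boolean isValid(String tag) {
        return tag != null && TAG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("en"));      // true
        System.out.println(isValid("nl-BE"));   // true
        System.out.println(isValid("english")); // false: primary subtag too long
        System.out.println(isValid("en_US"));   // false: '_' is not a separator
    }
}
```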

Duplicate removal and serialization format

Performance depends on the serialization format (--serialization <format>) and on whether duplicate removal is enabled (--duplicates). Experimenting with various configurations may lead to better performance for your use case.

I have a question! Where can I get help?

Do you have any question related to writing RML mapping rules, the RML specification, etc.? Feel free to ask them here: https://github.com/kg-construct/rml-questions ! If you have found a bug or need a feature for the RMLMapper itself, you can make an issue in this repository.

Documentation

Generate static files at /docs/apidocs with:

UML Diagrams

Architecture UML Diagram

How to generate with IntelliJ IDEA

(Requires Ultimate edition)

  • Right click on package: "be.ugent.rml"
  • Diagrams > Show Diagram > Java Class Diagrams
  • Choose what properties of the classes you want to show in the upper left corner
  • Export to file > .png | Save diagram > .uml

Sequence Diagram

Edit on draw.io
  • Go to draw.io
  • Click on 'Open Existing Diagram' and choose the .html file
  • Click on 'Open Existing Diagram' and choose the .html file

[1]: A. Dimou, T. De Nies, R. Verborgh, E. Mannens, P. Mechant, and R. Van de Walle, "Automated metadata generation for linked data generation and publishing workflows," in Proceedings of the 9th Workshop on Linked Data on the Web, Montreal, Canada, 2016, pp. 1–10. PDF

Source: https://github.com/RMLio/rmlmapper-java