Java Read Xml File From Inside Jar Memory Usage
RMLMapper
The RMLMapper executes RML rules to generate Linked Data. It is a Java library, which is also available via the command line (API docs online). The RMLMapper loads all data in memory, so be aware of this when working with big datasets.
Table of contents
- Features
- Supported
- Future
- Releases
- Build
- Usage
- CLI
- Library
- Docker
- Including functions
- Generating metadata
- Testing
- RDBs
- Deploy on Central Repository
- Dependencies
- Commercial Support
- Remarks
- XML file parsing performance
- Language tag support
- Duplicate removal and serialization format
- Documentation
- UML Diagrams
Features
Supported
- local data sources:
- Excel (.xlsx)
- LibreOffice (.ods)
- CSV files (including CSVW)
- JSON files (JSONPath)
- XML files (XPath)
- remote data sources:
- relational databases (MySQL, PostgreSQL, Oracle, and SQLServer)
- Web APIs with W3C Web of Things
- SPARQL endpoints
- files via HTTP urls (via GET)
- CSV files
- JSON files (JSONPath; @ can be used to select the current object)
- XML files (XPath)
- functions (most cases)
- For examples on how to use functions inside RML mapping documents, you can have a look at the RML+FnO test cases
- configuration file
- metadata generation
- output formats: nquads (default), turtle, trig, trix, jsonld, hdt
- join conditions
- targets:
- local file
- VoID dataset
- SPARQL endpoint with SPARQL UPDATE
Future
- functions (all cases)
- conditions (all cases)
- data sources:
- NoSQL databases
- TPF servers
Releases
The standalone jar file for every release can be found on the release's page on GitHub. You can find the latest release here.
Build
The RMLMapper is built using Maven: mvn install. Two jars are then found in /target: a slim jar without bundled dependencies, and a standalone jar (suffixed with -all.jar) with all dependencies bundled.
Usage
CLI
The following options are most common.
- -m, --mapping <arg>: one or more mapping file paths and/or strings (multiple values are concatenated).
- -o, --output <arg>: path to output file
- -s, --serialization <arg>: serialization format (nquads (default), trig, trix, jsonld, hdt)
All options can be found when executing java -jar rmlmapper.jar --help; that output is found below.
usage: java -jar mapper.jar <options>
options:
  -c,--configfile <arg>               path to configuration file
  -d,--duplicates                     remove duplicates in the output
  -dsn,--r2rml-jdbcDSN <arg>          DSN of the database when using R2RML rules
  -e,--metadatafile <arg>             path to output metadata file
  -f,--functionfile <arg>             one or more function file paths (dynamic functions with relative paths are found relative to the cwd)
  -h,--help                           show help info
  -l,--metadataDetailLevel <arg>      generate metadata on given detail level (dataset - triple - term)
  -m,--mappingfile <arg>              one or more mapping file paths and/or strings (multiple values are concatenated). r2rml is converted to rml if needed using the r2rml arguments.
  -psd,--privatesecuritydata <arg>    one or more private security files containing all private security information such as usernames, passwords, certificates, etc.
  -o,--outputfile <arg>               path to output file (default: stdout)
  -p,--r2rml-password <arg>           password of the database when using R2RML rules
  -s,--serialization <arg>            serialization format (nquads (default), turtle, trig, trix, jsonld, hdt)
  -t,--triplesmaps <arg>              IRIs of the triplesmaps that should be executed in order, separated by ',' (default is all triplesmaps)
  -u,--r2rml-username <arg>           username of the database when using R2RML rules
  -v,--verbose                        show more details in debugging output
  --strict                            Enable strict mode. In strict mode, the mapper will fail on invalid IRIs instead of skipping them.
  -b --base-IRI <arg>                 base IRI used to expand relative IRIs in mapped terms. If not set and not in --strict mode, will default to the @base directive inside the provided mapping file.
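For scripted runs, a call using the options above can be assembled programmatically. The sketch below only builds the argument list from the documented flags (-m, -o, -s, -d). The class RmlMapperCommand and its method are hypothetical helpers, not part of the RMLMapper, and the jar and file names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the RMLMapper): assembles a CLI call from the documented flags. */
final class RmlMapperCommand {

    static List<String> build(String jar, String mapping, String output,
                              String serialization, boolean removeDuplicates) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", jar, "-m", mapping));
        if (output != null) {          // omit -o to write to stdout
            cmd.add("-o");
            cmd.add(output);
        }
        if (serialization != null) {   // omit -s to use the default (nquads)
            cmd.add("-s");
            cmd.add(serialization);
        }
        if (removeDuplicates) {
            cmd.add("-d");
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("rmlmapper.jar", "mapping.ttl", "out.ttl", "turtle", true);
        // Prints the command line that would be executed.
        System.out.println(String.join(" ", cmd));
        // To actually run it, assuming rmlmapper.jar is in the working directory:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The list form can be passed directly to ProcessBuilder, which avoids shell quoting problems with paths that contain spaces.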
Accessing Web APIs with authentication
The W3C Web of Things Security Ontology is used to describe how Web API authentication should be performed, but it does not include the credentials needed to access the Web API. These credentials can be supplied using the -psd <PATH> CLI argument. The PATH argument must point to one or more private security files which contain the necessary credentials to access the Web API.
An example can be found in the test cases src/test/resources/web-of-things.
Accessing Oracle Database
You need to add the Oracle JDBC driver manually to the class path if you want to access an Oracle Database. The required driver is ojdbc8.
- Download ojdbc8.jar from Oracle.
- Execute the RMLMapper via java -cp 'rmlmapper.jar:ojdbc8-12.2.0.1.jar' be.ugent.rml.cli.Main -m rules.rml.ttl
The options do the following:
- -cp 'rmlmapper.jar:ojdbc8-12.2.0.1.jar': Put the jar of the RMLMapper and the JDBC driver in the classpath.
- be.ugent.rml.cli.Main: be.ugent.rml.cli.Main is the entry point of the RMLMapper.
- -m rules.rml.ttl: Use the RML rules in the file rules.rml.ttl. The exact same options as the ones mentioned earlier are supported.
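The classpath in the command above uses ':' as separator, which is the Unix convention; on Windows the JVM expects ';'. The sketch below builds the same command with the platform's separator. The class OracleClasspathCommand is a hypothetical helper, and the jar names are simply the ones used in the example above.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper: builds the Oracle-enabled command with a platform-appropriate classpath. */
final class OracleClasspathCommand {

    /** Joins jar paths with the given classpath separator (':' on Unix, ';' on Windows). */
    static String classpath(String separator, String... jars) {
        return String.join(separator, jars);
    }

    static List<String> command(String separator) {
        return List.of(
                "java",
                "-cp", classpath(separator, "rmlmapper.jar", "ojdbc8-12.2.0.1.jar"),
                "be.ugent.rml.cli.Main",   // entry point named in the text above
                "-m", "rules.rml.ttl");
    }

    public static void main(String[] args) {
        // File.pathSeparator is the classpath separator of the JVM's own platform.
        System.out.println(String.join(" ", command(File.pathSeparator)));
    }
}
```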
Library
An example of how you can use the RMLMapper as an external library can be found at ./src/test/java/be/ugent/rml/readme/ReadmeTest.java
Docker
Dockerhub
We publish our Docker images automatically on Dockerhub for every release. You can find our images here: rmlio/rmlmapper-java.
Build image
You can use Docker to run the RMLMapper by following these steps:
- Build the Docker image: docker build -t rmlmapper .
- Run a Docker container: docker run --rm -v $(pwd):/data rmlmapper -m mapping.ttl
The same parameters are available as via the CLI. The RMLMapper is executed in the /data folder in the Docker container.
Including functions
There are two ways to include (new) functions inside the RMLMapper:
- dynamic loading: you add links to java files or jar files, and those files are loaded dynamically at runtime
- preloading: you register functionality via code, and you need to rebuild the mapper to use that functionality
Registration of functions is done using a Turtle file, which you can find in src/main/resources/functions.ttl
The snippet below, for example, links an fno:function to a library provided by a jar-file (GrelFunctions.jar).
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix doap:     <http://usefulinc.com/ns/doap#> .
@prefix fno:      <https://w3id.org/function/ontology#> .
@prefix fnoi:     <https://w3id.org/function/vocabulary/implementation#> .
@prefix fnom:     <https://w3id.org/function/vocabulary/mapping#> .
@prefix grel:     <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix grelm:    <http://fno.io/grel/rmlmapping#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .

grel:toUpperCase a fno:Function ;
    fno:name "to Uppercase" ;
    rdfs:label "to Uppercase" ;
    dcterms:description "Returns the input with all letters in upper case." ;
    fno:expects ( grel:valueParam ) ;
    fno:returns ( grel:stringOut ) .

grelm:javaString a fnoi:JavaClass ;
    doap:download-page "GrelFunctions.jar" ;
    fnoi:class-name "io.fno.grel.StringFunctions" .

grelm:uppercaseMapping a fnoi:Mapping ;
    fno:function grel:toUpperCase ;
    fno:implementation grelm:javaString ;
    fno:methodMapping [ a fnom:StringMethodMapping ;
                        fnom:method-name "toUppercase" ] .
Dynamic loading
Just put the java or jar-file in the resources folder, at the root folder of the jar-location, or the parent folder of the jar-location, and it will be found dynamically.
Note: the java or jar-files are found relative to the cwd. You can change the functions.ttl path (or use multiple functions.ttl paths) using a commandline-option (-f).
Preloading
This overrides the dynamic loading. An example of how you can preload a custom function can be found at ./src/test/java/be/ugent/rml/readme/ReadmeFunctionTest.java
Generating metadata
As described in the scientific paper [1], the RMLMapper can automatically generate PROV-O metadata. Specifically, you need the CLI arguments below. You can specify in which output file the metadata should be stored, and up to which level metadata should be stored (dataset, triple, or term level metadata).
  -e,--metadatafile <arg>             path to output metadata file
  -l,--metadataDetailLevel <arg>      generate metadata on given detail level (dataset - triple - term)
Testing
Run the tests via test.sh.
Derived tests
Some tests (Excel, ODS) are derived from other tests (CSV) using a script (./generate_spreadsheet_test_cases.sh).
RDBs
Make sure you have Docker running.
Problems
- A problem with Docker (can't start the container) causes the SQLServer tests to fail locally. These tests will always succeed locally.
- A problem with Docker (can't start the container) causes the PostgreSQL tests to fail locally on Windows 7 machines.
Dependencies
Dependency | License |
---|---|
ch.qos.logback logback-classic | Eclipse Public License 1.0 & GNU Lesser General Public License 2.1 |
commons-cli commons-lang | Apache License 2.0 |
com.opencsv opencsv | Apache License 2.0 |
commons-cli commons-cli | Apache License 2.0 |
org.eclipse.rdf4j rdf4j-runtime | Eclipse Public License 1.0 |
junit junit | Eclipse Public License 1.0 |
com.jayway.jsonpath json-path | Apache License 2.0 |
javax.xml.parsers jaxp-api | Apache License 2.0 |
org.jsoup | MIT |
mysql mysql-connector-java | GNU General Public License v2.0 |
ch.vorbuger.mariaDB4j mariaDB4j | Apache License 2.0 |
postgresql postgresql | BSD |
com.microsoft.sqlserver mssql-jdbc | MIT |
com.spotify docker-client | Apache License 2.0 |
com.fasterxml.jackson.core jackson-core | Apache License 2.0 |
org.eclipse.jetty jetty-server | Eclipse Public License 1.0 & Apache License 2.0 |
org.eclipse.jetty jetty-security | Eclipse Public License 1.0 & Apache License 2.0 |
org.apache.jena apache-jena-libs | Apache License 2.0 |
org.apache.jena jena-fuseki-embedded | Apache License 2.0 |
com.github.bjdmeest hdt-java | GNU Lesser General Public License v3.0 |
commons-validator commons-validator | Apache License 2.0 |
com.github.fnoio grel-functions-java | MIT |
Commercial Support
Do you need...
- training?
- specific features?
- different integrations?
- bugfixes, on your timeline?
- custom code, built by experts?
- commercial support and licensing?
You're welcome to contact us regarding on-premise, enterprise, and internal installations, integrations, and deployments.
We have commercial support available.
We also offer consulting for all-things-RML.
Remarks
Typed spreadsheet files
All spreadsheet files are as of yet regarded as plain CSV files. No type information like Currency, Date... is used.
XML file parsing performance
The RMLMapper's XML parsing implementation (javax.xml.parsers) has been chosen to support full XPath. This implementation causes a large memory consumption (up to ten times larger than the original XML file size). However, the RMLMapper can be easily adapted to use a different XML parsing implementation that might be better suited for a specific use case.
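To illustrate the trade-off, the sketch below first evaluates an XPath expression over a fully parsed DOM tree (the javax.xml.parsers approach, which holds the whole document in memory) and then extracts the same values with the JDK's streaming StAX parser, which processes one event at a time but offers no XPath. This is not the RMLMapper's code; the sample document, element names, and the class XmlParsingSketch are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Illustrative only: compares DOM + XPath with streaming StAX on a tiny in-memory document. */
final class XmlParsingSketch {

    static final String SAMPLE =
            "<people><person><name>Ann</name></person><person><name>Bob</name></person></people>";

    static InputStream sample() {
        return new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM + XPath: full XPath support, but the whole tree is built in memory first. */
    static List<String> namesWithDom(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/people/person/name", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM/XPath parsing failed", e);
        }
    }

    /** StAX: reads events one by one, so no document tree is built, but there is no XPath. */
    static List<String> namesWithStax(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    names.add(reader.getElementText());
                }
            }
            reader.close();
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesWithDom(sample()));
        System.out.println(namesWithStax(sample()));
    }
}
```

Both methods return the same names here. The streaming variant only works because the selection is a simple element-name match; arbitrary XPath expressions generally need the tree-based approach.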
Language tag support
The processor checks whether language tags are correct or not, using a regular expression. The regex has no support for languages of length 5-8, but this currently only applies to 'qaa..qtz'.
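A regex-based check of this kind could look like the sketch below. The pattern is a simplified assumption, not the regular expression the RMLMapper actually uses: it accepts a primary language subtag of 2-3 letters followed by optional subtags of 1-8 alphanumeric characters, and therefore rejects primary subtags of length 5-8. The class LanguageTagSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Illustrative only: a simplified language tag check, not the RMLMapper's actual regex. */
final class LanguageTagSketch {

    // Assumption: 2-3 letter primary language subtag, then zero or more
    // '-'-separated subtags of 1-8 letters or digits.
    private static final Pattern TAG =
            Pattern.compile("[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*");

    static boolean isValid(String tag) {
        return tag != null && TAG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en", "en-US", "zh-Hant-TW", "english", "en_US"}) {
            System.out.println(tag + " -> " + isValid(tag));
        }
    }
}
```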
Duplicate removal and serialization format
Performance depends on the serialization format (--serialization <format>) and on whether duplicate removal is enabled (--duplicates). Experimenting with various configurations may lead to better performance for your use case.
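One general reason duplicate removal can affect performance is that a deduplicating step has to remember every statement it has already seen. The sketch below shows that idea on plain N-Quads strings. It is an illustration under that assumption, not the RMLMapper's implementation of --duplicates, and the class DuplicateRemovalSketch and the sample statements are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: order-preserving duplicate removal over serialized statements. */
final class DuplicateRemovalSketch {

    /** Keeps the first occurrence of each statement; memory grows with the number of distinct statements. */
    static List<String> removeDuplicates(List<String> statements) {
        Set<String> seen = new LinkedHashSet<>(statements);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> statements = List.of(
                "<http://example.org/a> <http://example.org/p> \"1\" .",
                "<http://example.org/b> <http://example.org/p> \"2\" .",
                "<http://example.org/a> <http://example.org/p> \"1\" .");
        // The third statement repeats the first, so two statements remain.
        removeDuplicates(statements).forEach(System.out::println);
    }
}
```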
I have a question! Where can I get help?
If you have any question related to writing RML mapping rules, the RML specification, etc., feel free to ask it here: https://github.com/kg-construct/rml-questions ! If you have found a bug or need a feature for the RMLMapper itself, you can open an issue in this repository.
Documentation
Generate static files at /docs/apidocs with:
UML Diagrams
Architecture UML Diagram
How to generate with IntelliJ IDEA
(Requires Ultimate edition)
- Right click on package: "be.ugent.rml"
- Diagrams > Show Diagram > Java Class Diagrams
- Choose what properties of the classes you want to show in the upper left corner
- Export to file > .png | Save diagram > .uml
Sequence Diagram
Edit on draw.io
- Go to draw.io
- Click on 'Open Existing Diagram' and choose the .html file
[1]: A. Dimou, T. De Nies, R. Verborgh, E. Mannens, P. Mechant, and R. Van de Walle, "Automated metadata generation for linked data generation and publishing workflows," in Proceedings of the 9th Workshop on Linked Data on the Web, Montreal, Canada, 2016, pp. 1–10. PDF
Source: https://github.com/RMLio/rmlmapper-java