Big data collections with MapDB

Posted by Unknown on June 11, 2014

This article gives a short overview of the open source software MapDB, which is now at version 1.0.3.

What is MapDB?

Originally designed as a storage engine for an astronomical desktop application, MapDB had two design goals: minimal overhead and simplicity. Over time the engine evolved, and a third goal was added: to provide an alternative Java memory model. Today it is a storage engine that specializes in big data collections and offers some cool features for that purpose.
For example:

Write to Heap, OffHeap, File or TempFile
Synchronization of Maps/TreeMaps/Sets and Queues

    Maps can also be built with composite keys
    Bidirectional maps
    Synchronization between maps (in case you have a 1-N association)

Caching

    Expiration on disk usage, access or write time

Compression
Faceting aka Histogram
Simulated Auto-Increment
Transactions (Note: a single transaction can only be used once)
Querying
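
To make the composite-key idea more concrete, here is a minimal sketch that uses only the JDK, not MapDB. The class ReadingKey, the comparator ORDER and the method readingsOf are hypothetical names invented for this illustration; MapDB has its own key types and serializers for this purpose, which are not shown here. The point is only that ordering a sorted map by (sensor, timestamp) keeps all entries of one sensor next to each other, so they can be fetched with a single range query.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class CompositeKeySketch {

    /** Hypothetical composite key: a sensor name plus a reading timestamp. */
    static final class ReadingKey {
        final String sensor;
        final long timestamp;

        ReadingKey(String sensor, long timestamp) {
            this.sensor = sensor;
            this.timestamp = timestamp;
        }
    }

    // Order by sensor first, then by timestamp, so one sensor's readings are contiguous.
    static final Comparator<ReadingKey> ORDER =
            Comparator.comparing((ReadingKey k) -> k.sensor)
                      .thenComparingLong(k -> k.timestamp);

    /** Returns a view of all readings that belong to the given sensor. */
    static NavigableMap<ReadingKey, Integer> readingsOf(
            NavigableMap<ReadingKey, Integer> readings, String sensor) {
        return readings.subMap(
                new ReadingKey(sensor, Long.MIN_VALUE), true,
                new ReadingKey(sensor, Long.MAX_VALUE), true);
    }

    public static void main(String[] args) {
        NavigableMap<ReadingKey, Integer> readings = new ConcurrentSkipListMap<>(ORDER);
        readings.put(new ReadingKey("kitchen", 1L), 21);
        readings.put(new ReadingKey("kitchen", 2L), 22);
        readings.put(new ReadingKey("garage", 1L), 8);

        // Only the two kitchen readings fall into the requested key range.
        System.out.println(readingsOf(readings, "kitchen").values());
    }
}
```

The same pattern, a sorted map plus a range over the leading key component, is what makes a 1-N association cheap to query; how MapDB implements it internally is outside the scope of this sketch.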

Small Example

The following example shows this simplicity in an IoT context: I put one million temperature values into a collection backed by an off-heap store and group the values into five categories (cold, fresh, warm, hot and burns). To fill the cache I also use the auto-increment feature.

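import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

import org.mapdb.Atomic;
import org.mapdb.Bind;
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;

public class TemperatureRepository {
    private final Atomic.Long keyinc;
    private final ConcurrentHashMap<String, Long> histogram;
    private final HTreeMap<Long, Integer> temperatureMap;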

    public TemperatureRepository() {
        //Create off-heap memory cache
        temperatureMap = DBMaker.newCache(1.0);

        //Get Autoincrement counter
        DB db = new DB(temperatureMap.getEngine());
        keyinc = db.getAtomicLong("map_temp");

        // histogram, category is a key, count is a value
        histogram = new ConcurrentHashMap<String, Long>(); //any map will do

        // bind histogram to primary map
        // we need function which returns category for each map entry
        Bind.histogram(temperatureMap, histogram, (key, value) -> {
            String ret = null;

            if (value < 0) {
                ret = "cold";
            } else if (value < 10) {
                ret = "fresh";
            } else if (value < 20) {
                ret = "warm";
            } else if (value < 30) {
                ret = "hot";
            } else {
                ret = "burns";
            }
            return ret;
        });
    }

    public void add(int temperature) {
        temperatureMap.put(keyinc.incrementAndGet(), temperature);
    }

    public void printHistogram() {
        System.out.println(histogram);
    }

    public static void main(String[] args) {
        TemperatureRepository temperatureRepository = new TemperatureRepository();
        // Generate random temperatures in [-10, 40) and add them in parallel
        new Random().ints(-10, 40)
                .parallel()
                .limit(1_000_000)
                .forEach(temperatureRepository::add);
        temperatureRepository.printHistogram();
    }
}
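
As a cross-check that does not depend on MapDB, the categorisation and counting can be sketched with the JDK alone. The class HistogramSketch and its methods category and histogram are hypothetical names introduced for this illustration; the thresholds are copied from the lambda in the example above. Unlike a histogram that is bound to a map, this sketch is a one-shot count over an array and does not react to later changes.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class HistogramSketch {

    /** Maps a temperature to a category, using the thresholds from the example above. */
    static String category(int temperature) {
        if (temperature < 0) {
            return "cold";
        } else if (temperature < 10) {
            return "fresh";
        } else if (temperature < 20) {
            return "warm";
        } else if (temperature < 30) {
            return "hot";
        }
        return "burns";
    }

    /** Counts how many of the given values fall into each category. */
    static Map<String, Long> histogram(int[] temperatures) {
        Map<String, Long> counts = new ConcurrentHashMap<>();
        for (int t : temperatures) {
            // merge() inserts 1 for a new category or adds 1 to the existing count
            counts.merge(category(t), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1,000 pseudo-random temperatures in [-10, 40)
        int[] temperatures = new Random().ints(1_000, -10, 40).toArray();
        System.out.println(histogram(temperatures));
    }
}
```

Because the values are drawn uniformly from [-10, 40), each of the five categories covers ten of the fifty possible values, so the counts should come out roughly equal.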

Conclusion

So far I have not had the chance to use MapDB in a production environment, but on our playground at www.rapidpm.org it makes a very good impression.
