
Redis key design for real-time stock application


I am trying to build a real-time stock application. Every second I can get some data from a web service, like below:

[{"amount":"20","date":1386832664,"price":"183.8","tid":5354831,"type":"sell"},{"amount":"22","date":1386832664,"price":"183.61","tid":5354833,"type":"buy"}]

tid is the ticket ID for the stock buy or sell; date is the number of seconds since 1970-01-01 (Unix time); price/amount tell at what price and how many shares were traded.

Requirement

My requirement is to show the user the highest/lowest price in every minute/5 minutes/hour/day in real time, and to show the user the sum of the traded amounts in every minute/5 minutes/hour/day in real time.

Question

My question is how to store the data in Redis so that I can easily and quickly get the highest/lowest trade from the DB for different periods.

My design is something like this:

[date]:[tid]:amount

[date]:[tid]:price

[date]:[tid]:type

I am new to Redis. If I go with this design, does that mean I need to use a sorted set, and will there be any performance issues? Or is there another way to get the highest/lowest price for different periods? Looking forward to your suggestions and design.

Answer

My suggestion is to store min/max/total for all intervals you are interested in and update them for the current intervals with every arriving data point. To avoid the network latency of reading previous data back for comparison, you can do it entirely inside the Redis server using Lua scripting.

One key per data point (or, even worse, per data point field) is going to consume too much memory. For best results, you should group points into small lists/hashes (see http://redis.io/topics/memory-optimization). Redis only allows one level of nesting in its data structures: if your data has multiple fields and you want to store more than one item per key, you need to encode it yourself somehow.
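As a side note, the savings from small hashes rely on Redis keeping them in its compact encoding, which is controlled by configuration thresholds. The values below are the historical defaults (newer Redis versions renamed these settings to hash-max-listpack-entries/-value); the 52-53-byte msgpack entries discussed next fit under the 64-byte value limit:

    hash-max-ziplist-entries 128
    hash-max-ziplist-value 64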

Fortunately, the standard Redis Lua environment includes msgpack support, which is a very efficient binary JSON-like format. The JSON entries in your example, encoded with msgpack "as is", will be 52-53 bytes long.
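For illustration, a minimal round trip inside a Redis Lua script would look like this (field values taken from the example above):

    -- cmsgpack is preloaded in the Redis Lua environment
    local point = {amount = "20", date = 1386832664, price = "183.8",
                   tid = 5354831, type = "sell"}
    local packed = cmsgpack.pack(point)      -- binary string, ~52 bytes
    local restored = cmsgpack.unpack(packed)
    assert(restored.tid == 5354831)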

I suggest grouping by time so that you have 100-1000 entries per key. Suppose a one-minute interval fits this requirement. Then the keying scheme would be like this:

YYmmddHHMM — a hash from tid to msgpack-encoded data points for the given minute.

5m:YYmmddHHMM, 1h:YYmmddHH, 1d:YYmmdd — window data hashes which contain min, max and sum fields.
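Deriving the key names from a data point's date field happens on the application side (the os library is not available inside Redis scripts). A minimal sketch in plain Lua, applying the scheme above to the example timestamp, with the 5-minute bucket rounded down:

    -- derive all four key names from a Unix timestamp, in UTC ('!' prefix)
    local ts = 1386832664
    local currkey    = os.date('!%y%m%d%H%M', ts)                    -- "1312120717"
    local fiveminkey = '5m:' .. os.date('!%y%m%d%H%M', ts - ts % 300)
    local hourkey    = '1h:' .. os.date('!%y%m%d%H', ts)
    local daykey     = '1d:' .. os.date('!%y%m%d', ts)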

Let's look at a sample Lua script that will accept one data point and update all keys as necessary. Due to the way Redis scripting works, we need to explicitly pass the names of all keys that will be accessed by the script, i.e. the live data key and all three window keys.

Redis Lua also has a JSON parsing library available, so for the sake of simplicity let's assume we just pass the script the JSON dictionary. That means we have to parse the data twice, on the application side and on the Redis side, but the performance effects of this are not clear.

    local function update_window(winkey, price, amount)
        -- HGETALL returns a flat {field1, value1, ...} array to Lua,
        -- so read the three fields we need directly with HMGET instead
        local windata = redis.call('HMGET', winkey, 'min', 'max', 'sum')
        local wmin = tonumber(windata[1])
        local wmax = tonumber(windata[2])
        local wsum = tonumber(windata[3])
        if price > (wmax or 0) then
            redis.call('HSET', winkey, 'max', price)
        end
        if price < (wmin or 1e12) then
            redis.call('HSET', winkey, 'min', price)
        end
        redis.call('HSET', winkey, 'sum', (wsum or 0) + amount)
    end

    -- the live data key plus the three window keys, passed via KEYS
    local currkey, fiveminkey, hourkey, daykey = unpack(KEYS)

    local data = cjson.decode(ARGV[1])
    local packed = cmsgpack.pack(data)
    local tid = data.tid
    redis.call('HSET', currkey, tid, packed)

    local price = tonumber(data.price)
    local amount = tonumber(data.amount)

    update_window(fiveminkey, price, amount)
    update_window(hourkey, price, amount)
    update_window(daykey, price, amount)
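A possible invocation, assuming the script is saved in update_point.lua (a hypothetical file name) and the key names were derived as in the earlier sketch:

    redis-cli EVAL "$(cat update_point.lua)" 4 \
        1312120717 5m:1312120715 1h:13121207 1d:131212 \
        '{"amount":"20","date":1386832664,"price":"183.8","tid":5354831,"type":"sell"}'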

This setup can handle thousands of updates per second, is not very memory-hungry, and the window data can be retrieved instantly.
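Reading a window back is then a single hash lookup, for example (output shown as it would look after just the one sample point above; illustrative only):

    redis-cli HGETALL 1h:13121207
    1) "max"
    2) "183.8"
    3) "min"
    4) "183.8"
    5) "sum"
    6) "20"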

UPDATE: On the memory side, 50-60 bytes per point is still a lot if you want to store more than a few million points. With this kind of data I think you can get as low as 2-3 bytes per point using a custom binary format, delta encoding, and subsequent compression of chunks using something like snappy. Whether it's worth doing depends on your requirements.
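As a rough sketch of the delta-encoding part only (the pack_delta name and the field widths are made up for illustration, and price/amount are assumed to already be numbers), each point can be stored as fixed-size integer deltas against the previous point using the struct library that ships with Redis Lua; compressing chunks of such records is what would take you toward 2-3 bytes per point:

    -- hypothetical: ~11 bytes per point before compression
    local function pack_delta(prev, cur)
        return struct.pack('>i2i2i4i2B',
            math.floor(cur.price * 100 + 0.5)
                - math.floor(prev.price * 100 + 0.5),  -- price delta in cents
            cur.amount - prev.amount,
            cur.tid - prev.tid,                        -- tids grow slowly
            cur.date - prev.date,                      -- usually 0-1 seconds
            cur.type == 'sell' and 1 or 0)
    end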