
A Middle-Aged Engineer's Monologue - crumbjp

The Melancholy of Linux and Apache

Mongo threads

This article is for anyone who wants to read the source code.

I have to take a lot of notes when I read a large part of the source code.

This is a summary of those notes.

Startup sequence

Threads

The thread name appears in the log.
For instance,

Tue Feb  5 19:15:33.544 [rsBackgroundSync] replSet syncing to: 192.168.159.134:27017

The [rsBackgroundSync] part is the thread name.
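
As a small illustration, the thread name can be pulled out of such a log line with a regular expression. This is only a sketch: the helper name threadNameOf is hypothetical, and it assumes the thread name is the first token wrapped in square brackets.

```javascript
// Hypothetical helper: extract the thread name from a mongod log line.
// Assumes the name is the first token wrapped in square brackets.
function threadNameOf(logLine) {
  const m = /\[([^\]]+)\]/.exec(logLine);
  return m ? m[1] : null;
}

const line = "Tue Feb  5 19:15:33.544 [rsBackgroundSync] replSet syncing to: 192.168.159.134:27017";
console.log(threadNameOf(line)); // rsBackgroundSync
```

Grouping log lines by this name is a handy way to follow one thread at a time while reading the source.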

main

The name of this thread changes with its phase.

  • (noname): At the very beginning. (https://github.com/mongodb/mongo/blob/r2.3.2/src/mongo/db/db.cpp#L740)
  • initandlisten: The initial phase of mongod. See the mongo-startup sequence diagram. (https://github.com/mongodb/mongo/blob/r2.3.2/src/mongo/db/db.cpp#L568)
  • conn(class MyMessageHandler): The service phase of mongod. Server socket thread. (https://github.com/mongodb/mongo/blob/r2.3.2/src/mongo/util/net/listen.cpp#L208)
interruptThread

The only signal handler in this process.

This is a common design.

DataFileSync

Calls msync() to flush the memory-mapped (MMAP) data files.

sleep
[--syncdelay]*1000 (default is 60*1000; 0 means never)

You can change the "syncdelay" parameter online (run the command against the admin database).

  db.adminCommand({'setParameter':1,syncdelay:1})


But I think this msync() interval is too long to be useful.
The kernel will flush the dirty pages sooner on its own.

So this may be a useless thread!

journal

Write to JOURNAL and DATAFILE (group commit feature)

sleep
(journalCommitInterval/3) + 1

This thread sleeps for 1/3 of journalCommitInterval so that it can check the limit on uncommitted bytes.

This is the reason why "--journalCommitInterval" ranges from 2 to 300.
I had wondered why it starts from 2 until I learned this. (^^
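
The sleep formula above can be tried with a few values. This is a minimal sketch: the function name journalSleepMs is hypothetical, and both the integer division and the millisecond unit are assumptions on my part.

```javascript
// Sketch of "(journalCommitInterval/3) + 1".
// Assumptions: the interval is in milliseconds and the division is integer division.
function journalSleepMs(journalCommitIntervalMs) {
  return Math.floor(journalCommitIntervalMs / 3) + 1;
}

console.log(journalSleepMs(2));   // 1
console.log(journalSleepMs(100)); // 34
console.log(journalSleepMs(300)); // 101
```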

Implements:

  1. Write to the journal
  2. Notify getlasterror that the commit is done (awaitCommit() with the "j" option, at db/dbcommand.cpp)
  3. Write to the data files
indexRebuilder : (since 2.4)

Temporary thread.
Tries to repair half-built indexes at startup.

sleep
none (exits after finishing its work)

At startup, mongod may detect an index build that was interrupted by a crash.

This thread then retries the index build. (It also obeys the "--noIndexBuildRetry" option.)

SnapshotThread

Logging thread ?

sleep
4000

It outputs lines like

   cpu: elapsed: 4000  writelock: 0%

This thread does not seem to do anything important.

clientcursormon

Reports warnings and collects stale cursors.

sleep
4000


Outputs a warning if the number of cursors is more than 100000.

warning number of open cursors is very large: ??

Collects timed-out cursors, logging...

  killing old cursor [id] [ns] idle: ??ms
timeout of cursor
600000 msec
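
One pass of this monitor can be modelled roughly as follows. This is a sketch of the behaviour described above, not the mongod code: cursorMonitorPass and the { id, ns, idleMs } cursor shape are hypothetical, and whether the real comparison is strict is an assumption.

```javascript
// Thresholds taken from the text above.
const MAX_CURSORS_BEFORE_WARNING = 100000;
const CURSOR_TIMEOUT_MS = 600000;

// Hypothetical model: each cursor is { id, ns, idleMs }.
// Returns the log lines one pass would emit and the cursors that survive.
function cursorMonitorPass(cursors) {
  const logs = [];
  if (cursors.length > MAX_CURSORS_BEFORE_WARNING) {
    logs.push("warning number of open cursors is very large: " + cursors.length);
  }
  const alive = [];
  for (const c of cursors) {
    if (c.idleMs > CURSOR_TIMEOUT_MS) {
      // Assumption: a cursor idle for longer than the timeout is killed.
      logs.push("killing old cursor " + c.id + " " + c.ns + " idle: " + c.idleMs + "ms");
    } else {
      alive.push(c);
    }
  }
  return { logs, alive };
}

const result = cursorMonitorPass([
  { id: 1, ns: "test.foo", idleMs: 700000 },
  { id: 2, ns: "test.foo", idleMs: 1000 },
]);
console.log(result.logs);         // one "killing old cursor" line for id 1
console.log(result.alive.length); // 1
```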
PeriodicTask::Runner

Runs the periodic tasks.

sleep
60000

The tasks below run in mongos and the mongo client.

  • Cleaner (writeback query cleaner)
  • DBConnectionPool (stale connection cleaner)
TTLMonitor : (since 2.2)

Collects (deletes) expired documents.

sleep
60000
   db.foo.ensureIndex({"status":1},{expireAfterSeconds:3600})
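
The expiry test that such an index implies can be sketched as below. The predicate isExpired is hypothetical and only models the rule "indexed date older than expireAfterSeconds". I assume here that a value that is not a date never expires.

```javascript
// Hypothetical predicate: is a document past its TTL?
// field is the indexed key, expireAfterSeconds comes from the index option.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // assumption: non-date values never expire
  return now.getTime() - value.getTime() > expireAfterSeconds * 1000;
}

const now = new Date(10000000);
console.log(isExpired({ status: new Date(10000000 - 3601 * 1000) }, "status", 3600, now)); // true
console.log(isExpired({ status: new Date(10000000 - 3599 * 1000) }, "status", 3600, now)); // false
```

Since the thread sleeps for 60000 between passes, an expired document may stay visible for up to about a minute after its deadline.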
rsStart

See startThreads()

Temporary thread.
Merely starts the threads that are required for a replica set.

sleep
none (exits after finishing its work)
rsHealthPoll

Sends heartbeat messages to the other mongod instances.

sleep
2000
  1. Heartbeat request ( & get response)
  2. Send "update heartbeat" message to rsMgr
  3. Send "check new state" message to rsMgr
rsMgr on task::Server

Async messaging framework.

sleep
by mutex cond wait

Runs the lambda functions (messages) pushed by other threads.

It seems to be used mainly by rsHealthPoll.
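
The messaging idea can be sketched as a mailbox of functions. This is a single-threaded model only: the class TaskServer below and its send/drain methods are hypothetical names, and the real thread blocks on a condition variable instead of being drained by hand.

```javascript
// Hypothetical single-consumer mailbox modelled after the description above.
class TaskServer {
  constructor(name) {
    this.name = name;
    this.queue = [];
  }
  // Called by other threads (e.g. rsHealthPoll) to post a message.
  send(fn) {
    this.queue.push(fn);
  }
  // One wake-up of the server: run everything queued so far, in order.
  // Returns the number of messages that were run.
  drain() {
    let ran = 0;
    while (this.queue.length > 0) {
      const fn = this.queue.shift();
      fn();
      ran++;
    }
    return ran;
  }
}

const events = [];
const rsMgr = new TaskServer("rsMgr");
rsMgr.send(() => events.push("update heartbeat"));
rsMgr.send(() => events.push("check new state"));
rsMgr.drain();
console.log(events); // [ 'update heartbeat', 'check new state' ]
```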

rsSync

It seems to work with the following threads

  • rsBackgroundSync
  • rsSyncNotifier
  • rsGhostSync

But these are too complicated for me to understand precisely...

sleep
1000

First of all, the sync task is for slaves, so it does nothing on the primary.

  1. Do the initial sync the first time.
  2. Enter the loop.
  3. Sync data from OPQueue, which is the internal oplog queue.

In SyncTail::oplogApplication():

  1. Pop oplogs from OPQueue up to replBatchLimitBytes, taking --slaveDelay into account.
  2. Multi-apply them to myself.
  3. Write lastOp.
  4. Notify rsBackgroundSync and rsSyncNotifier.
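
Step 1 of the list above can be modelled as follows. This is a rough sketch under my own assumptions, not the SyncTail code: popBatch and the { ts, bytes } op shape are hypothetical, and how the real code combines the byte limit with --slaveDelay may differ.

```javascript
// Hypothetical model of step 1: take ops from the front of the queue
// until the byte limit is reached, stopping at ops still inside the slaveDelay window.
// Each op is { ts: <timestamp in ms>, bytes: <size> }.
function popBatch(opQueue, batchLimitBytes, slaveDelayMs, nowMs) {
  const batch = [];
  let bytes = 0;
  while (opQueue.length > 0) {
    const op = opQueue[0];
    if (nowMs - op.ts < slaveDelayMs) break; // too recent: wait for slaveDelay
    if (batch.length > 0 && bytes + op.bytes > batchLimitBytes) break; // batch is full
    batch.push(opQueue.shift());
    bytes += op.bytes;
  }
  return batch;
}

const opQueue = [
  { ts: 0, bytes: 60 },
  { ts: 10, bytes: 60 },
  { ts: 1000, bytes: 10 },
];
console.log(popBatch(opQueue, 100, 1500, 2000).length); // 1 (second op would exceed 100 bytes)
console.log(popBatch(opQueue, 100, 1500, 2000).length); // 1 (third op is still delayed)
console.log(opQueue.length);                            // 1
```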
rsBackgroundSync

Reads the oplog of the remote member and queues it into OPQueue.

sleep
no wait
  1. Determine _currentSyncTarget (by getOplogReader).
  2. Read oplog from network (by OplogReader).
  3. Push it to OPQueue
rsSyncNotifier

I could not understand the role of this thread...

sleep
no wait
  1. Cond wait (notified by rsSync)
  2. _oplogMarker.more(): it is used to compare against the cursor of rsBackgroundSync. Maybe...

My impression is that this thread ends up doing only logging... but it also seems too complicated for just logging.

I could not understand the role of the oplogMarker cursor.

rsGhostSync on task::Server

I could not understand this one precisely either...

sleep
no wait

GhostSync::percolate() is called when a (ghost) slave connects to sync from me even though I am a slave myself.

percolate() appears to compare my sync target with the ghost slave.

But... I could not find how it avoids cyclic sync.

slaveTracking

Watch and notify.

sleep
1000

Kicked from processGetMore() in db/ops/query.cpp (on rsSync ?)

  1. Get the slave change event.
  2. Write to local.slaves.
  3. Notify other threads. (Maybe used in mongos)

Accept client sequence

Threads

conn%d

Mongod creates one thread for each client socket.