Data Collection [Complete]¶
All the CLIx schools are running either of the platforms: [1]. gStudio [2]. unplatform, in schools where servers are not available, etc.
GENERIC STRUCTURE:
The directory/folder structure of collected data is maintained as LEVEL 0/.../LEVEL N
Read below for description of LEVELs.
Year (YYYY) [ LEVEL 0 ]
State (State Code) [ LEVEL 1 ]
- School (School Code + Server ID) [ LEVEL 2 ]
- gstudio (Clix Platform) [ LEVEL 3 ]
- db [ LEVEL 4 ]
- media [ LEVEL 4 ]
- rcs-repo [ LEVEL 4 ]
- postgres-dump [ LEVEL 4 ]
- pg_dump_all.sql [ LEVEL 5 ]
- system-heartbeat [ LEVEL 4 ]
- gstudio-exported-users-analytics-csvs [ LEVEL 4 ]
- gstudio_tools_logs [ LEVEL 4 ]
- gstudio-logs [ LEVEL 4 ]
- local_settings.py [ LEVEL 4 ]
- server_settings.py [ LEVEL 4 ]
- git-commit.log [ LEVEL 4 ]
- assessment-media [ LEVEL 4 ]
- repository [ LEVEL 5 ]
- studentResponseFiles [ LEVEL 5 ]
- nginx-logs [ LEVEL 4 ]
- unplatform (Optional) [ LEVEL 3 ]
[ LEVEL 0 ] : Year (YYYY)
- Year when data is collected.
- Example:
- 2016
- 2017
- 2018
[ LEVEL 1 ] : State (State Code)
- Small case state code, which we have used in our school instances.
- Possible values:
- Mizoram : mz
- Rajasthan : rj
- Chhattisgarh : ct
- Telangana : tg
[ LEVEL 2 ] : School (School Code + Server ID)
- It's combination of "school code" and "server id"
- i.e:
<school code>-<server id> - Example: 2031001-mz1
NOTE: In order to address scenario where the schools might run both gStudio and unplatform in parallel, we have a provision to collect data from the said platforms and both may reside at [LEVEL 3]. To maintain this structure as generic, we are namingclixserverasgstudio.
[ LEVEL 3 ] : gstudio
- A CLIx Platform, clixserver.
[ LEVEL 4 ] : data
- gstudio Folder will have following [LEVEL 4] items:
db: mongoDB data (gStudio + qbank).media: Files uploaded on clixserver (gStudio).rcs-repo: rcs, versioned json files (gStudio).postgres-dump: Postgres DB dump folder containing filepg_dump_all.sql. This is dump of user data.system-heartbeat: Contains log of following:- Server Id
- School Id
- School Name
- Internal IP Address
- RAM usage details (
free -h) - HDD usage details (
df -h) uptime- Current Processes (
ps aux)
gstudio-exported-users-analytics-csvs: Folder, contains students unitwise quantitative data in CSV form.gstudio_tools_logs: Tools (which supports data logging and configured gstudio end points) data injsonformat will be stored in this folder.gstudio-logs: Folder containing gstudio level general logs for scripts and updates.local_settings.py: Copy of file in deployed instance (gStudio).server_settings.py: Copy of file in deployed instance (gStudio).git-commit.log: Snapshot of git records at time of backup. It will have output of following git commands (gStudio + qbank):git statusgit diffgit log -5git branch
assessment-media: Assessment files and user uploaded files in assessments. This folder will have following [LEVEL 5] directories (qbank)repository: Files/Media used in assessments.studentResponseFiles: User uploaded files in assessments e.g: recorded-audio, images etc.
nginx-logs: Contains logs produced by nginx.
EXAMPLE STRUCTURE:
Example-data-collection-dir-str/
├── 2016
└── 2017
├── ct
│ └── 01011011-ct11
│ ├── gstudio
│ │ ├── db
│ │ ├── media
│ │ ├── rcs-repo
│ │ ├── postgres-dump
│ │ | └── pg_dump_all.sql
│ | ├── gstudio-exported-users-analytics-csvs
│ | ├── gstudio_tools_logs
│ | ├── gstudio-logs
│ | ├── system-heartbeat
│ │ ├── local_settings.py
│ │ ├── server_settings.py
│ │ ├── git-commit.log
│ │ ├── nginx-logs
│ │ └── assessment-media
│ │ ├── repository
│ │ └── studentResponseFiles
│ └── unplatform
└── mz
└── 02031001-mz1
├── gstudio
│ ├── db
│ ├── media
│ ├── rcs-repo
│ ├── postgres-dump
│ | └── pg_dump_all.sql
│ ├── gstudio-exported-users-analytics-csvs
│ ├── gstudio_tools_logs
│ ├── gstudio-logs
│ ├── system-heartbeat
│ ├── local_settings.py
│ ├── server_settings.py
│ ├── git-commit.log
│ ├── nginx-logs
│ └── assessment-media
│ ├── repository
│ └── studentResponseFiles
└── unplatform
Derived Data¶
Derived data is one which we can pull out from above complete data by running scripts on it.
1. Activity Timestamp [format: CSV]¶
1.1 For Single User:
- Script Name:
activity_timestamp - Prerequisite:
- Get in docker (ref. Utility 1)
- Reach at required gstudio location within docker (ref. Utility 2)
- RUN :
python manage.py activity_timestamp <username> - Example RUN :
python manage.py activity_timestamp green-apple-sp100
1.2 For all user's:
- Script Name:
get_all_users_activity_timestamp_csvs.py - Prerequisite:
- Get in docker (ref. Utility 1)
- Reach at required gstudio location within docker (ref. Utility 2)
- Get into project shell (ref. Utility 3)
2. Progress CSV's: [format: CSV]¶
- Script name:
export_users_analytics - Prerequisite:
- Get in docker (ref. Utility 1)
- Reach at required gstudio location within docker (ref. Utility 2)
- RUN :
python manage.py export_users_analytics
3. Assessments data [format: DB data dumps]¶
- Script Name:
single_school_get_MIT_activity_data.py - Prerequisite:
- Get in docker (ref. Utility 1)
- Reach at required gstudio location within docker (ref. Utility 4)
- RUN :
python single_school_get_MIT_activity_data.py