<decision_guidance>
# Assessment Developer Decision Guidance

## When to Use Each Skill

### assessment-validator
- **Use when:** Creating new assessments
- **Use when:** Reviewing existing assessments for alignment
- **Skip when:** Assessment has already been validated

### item-analysis-tool
- **Use when:** Analyzing pilot test data
- **Use when:** Reviewing item performance after administration
- **Skip when:** Items have not yet been administered

### rubric-generator
- **Use when:** Creating rubrics for performance assessments
- **Use when:** Developing scoring guides for open-ended items
- **Skip when:** Assessment is entirely selected-response

### bias-detector
- **Use when:** Reviewing all new items
- **Use when:** Preparing assessment for diverse audience
- **Skip when:** Items have already been reviewed for bias

### adaptive-logic-designer
- **Use when:** Designing computerized adaptive tests
- **Use when:** Creating branching assessment paths
- **Skip when:** Assessment is linear/fixed
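
The skill rules above can be read as a set of filters. The sketch below is one possible encoding, not an implementation of the skills themselves: `AssessmentContext`, its field names, and `select_skills` are hypothetical, and each skill's two "Use when" cases are collapsed into a single flag for brevity (performance tasks are treated as open-ended items).

```python
from dataclasses import dataclass


@dataclass
class AssessmentContext:
    """Hypothetical flags describing an assessment; not part of any real API."""
    already_validated: bool = False
    items_administered: bool = False
    has_open_ended_items: bool = False  # assumption: includes performance tasks
    bias_reviewed: bool = False
    is_adaptive: bool = False


def select_skills(ctx: AssessmentContext) -> list[str]:
    """Return the skills to run, treating each "Skip when" rule as a filter."""
    skills = []
    if not ctx.already_validated:
        skills.append("assessment-validator")
    if ctx.items_administered:
        skills.append("item-analysis-tool")
    if ctx.has_open_ended_items:
        skills.append("rubric-generator")
    if not ctx.bias_reviewed:
        skills.append("bias-detector")
    if ctx.is_adaptive:
        skills.append("adaptive-logic-designer")
    return skills


# A new, fixed-form assessment with open-ended items, not yet administered.
new_fixed_form = AssessmentContext(has_open_ended_items=True)
print(select_skills(new_fixed_form))
# ['assessment-validator', 'rubric-generator', 'bias-detector']
```

Keeping the conditions as plain boolean flags makes each "Skip when" rule easy to audit against the lists above.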

## Assessment Decisions

### Item Type Selection
- **Multiple choice:** Efficient to score and reliable; well suited to knowledge and comprehension
- **Short answer:** Tests recall and brief explanation
- **Essay:** Tests higher-order thinking and writing skills
- **Performance task:** Tests application in an authentic context
- **Portfolio:** Shows growth over time

### Number of Items
- **Rule of thumb:** 3-5 items per objective for reliable measurement
- **Short quiz:** 5-10 items (formative)
- **Unit test:** 20-30 items (summative)
- **Comprehensive exam:** 50-100 items
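
As a rough worked example of the 3-5 items per objective rule, the sketch below multiplies an objective count by that range. The function name `item_count_range` and the sample objective counts are illustrative assumptions, not part of the guidance.

```python
def item_count_range(num_objectives: int,
                     per_objective: tuple[int, int] = (3, 5)) -> tuple[int, int]:
    """Return the (low, high) item count implied by the per-objective rule of thumb."""
    if num_objectives < 1:
        raise ValueError("num_objectives must be at least 1")
    low, high = per_objective
    return num_objectives * low, num_objectives * high


# Hypothetical objective counts for a quiz, a unit test, and a comprehensive exam.
for n in (2, 6, 20):
    low, high = item_count_range(n)
    print(f"{n} objectives -> {low}-{high} items")
# 2 objectives -> 6-10 items
# 6 objectives -> 18-30 items
# 20 objectives -> 60-100 items
```

Under these assumed objective counts, the results fall close to the quiz, unit test, and comprehensive exam ranges listed above.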

### Difficulty Distribution
- **p-value:** Proportion of test takers who answer the item correctly (higher means easier)
- **Easy (p > 0.7):** 20-30% of items
- **Medium (0.4 ≤ p ≤ 0.7):** 40-50% of items
- **Hard (p < 0.4):** 20-30% of items
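
The difficulty targets can be checked against a set of item p-values with a few lines of code. This is a minimal sketch under stated assumptions: the names `difficulty_band`, `band_shares`, and `out_of_target` are hypothetical, the sample p-values are made up for illustration, and the boundary values 0.4 and 0.7 are assumed to count as medium.

```python
from collections import Counter

# Target share of items per band, taken from the list above.
TARGET_SHARE = {"easy": (0.20, 0.30), "medium": (0.40, 0.50), "hard": (0.20, 0.30)}


def difficulty_band(p: float) -> str:
    """Classify an item by its p-value (proportion answering correctly).

    Assumption: the boundary values 0.4 and 0.7 are counted as medium.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p > 0.7:
        return "easy"
    if p < 0.4:
        return "hard"
    return "medium"


def band_shares(p_values: list[float]) -> dict[str, float]:
    """Return the share of items that falls in each difficulty band."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    counts = Counter(difficulty_band(p) for p in p_values)
    return {band: counts[band] / len(p_values) for band in TARGET_SHARE}


def out_of_target(p_values: list[float]) -> list[str]:
    """Return the bands whose share lies outside the target range."""
    shares = band_shares(p_values)
    return [band for band, (low, high) in TARGET_SHARE.items()
            if not low <= shares[band] <= high]


# Made-up p-values for a 10-item test: 3 easy, 5 medium, 2 hard.
sample = [0.90, 0.85, 0.75, 0.65, 0.60, 0.55, 0.50, 0.45, 0.35, 0.20]
print(band_shares(sample))    # {'easy': 0.3, 'medium': 0.5, 'hard': 0.2}
print(out_of_target(sample))  # []
```

An empty result from `out_of_target` means every band sits inside its target range; any band it returns needs items added or removed.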

## Trade-offs to Consider

| Decision | Benefit | Cost |
|----------|---------|------|
| More items | More reliable measurement | Longer test time, fatigue |
| More multiple-choice items | Easier to score, reliable | Limited measurement of higher-order thinking |
| More open-ended items | Richer data, taps higher-order thinking | Harder to score, more subjective |
| Adaptive testing | Precise, efficient measurement | Complex to develop, requires an item bank |
</decision_guidance>